To understand the relevance of extract, transform, and load (ETL) components and how they fit into business intelligence (BI), one should first appreciate what data integration is and the significance of having clean, accurate data that enables sound business decisions. Within the BI industry, data integration is essential. By capturing the right information, organizations can perform analyses, create reports, and develop strategies that help them not only to survive but, more importantly, to thrive.
Informatica, a leading provider of enterprise data integration software, defines data integration as "the process of combining two or more data sets together for sharing and analysis, in order to support information management inside a business". In BI terms, this means that data is extracted in its original form and stored in an interim location, where it is transformed into the format that will be used in the data warehouse. The transformation process includes validating data (e.g., filling in null zip code information in the customer database) and reformatting data fields (e.g., separating Last Name and First Name fields of customer records that are merged in one database but not others). The next step is to load the data into the data warehouse. The data is then used to create queries and data analysis builds, such as on-line analytical processing (OLAP) cubes and scorecard analyses. In a sense, extracting the proper data, transforming it by cleansing and merging records, and loading it into the target database is what allows BI solutions to build analytical tools successfully. It is also the essence of ETL functionality.
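The extract, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the source rows, field names, and the placeholder zip code are all hypothetical, and SQLite stands in for the data warehouse.

```python
import sqlite3

# Extract: raw customer rows as they might arrive from a source system
# (hypothetical data; note the merged "Last, First" name field and null zip).
raw_rows = [
    {"name": "Smith, Jane", "zip": "30301"},
    {"name": "Doe, John",   "zip": None},
]

DEFAULT_ZIP = "00000"  # assumed placeholder for missing zip codes

def transform(row):
    # Validate: fill in null zip code information.
    zip_code = row["zip"] if row["zip"] else DEFAULT_ZIP
    # Reformat: split the merged name field into Last Name and First Name.
    last, first = [part.strip() for part in row["name"].split(",", 1)]
    return (last, first, zip_code)

# Load: write the transformed rows into a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (last TEXT, first TEXT, zip TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [transform(r) for r in raw_rows])
rows = conn.execute("SELECT last, first, zip FROM customers "
                    "ORDER BY rowid").fetchall()
```

Once loaded in this standardized form, the rows are ready to feed queries, OLAP cubes, or other analytical builds.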
Data Integration Components
To determine the most suitable ETL solution, organizations should evaluate their needs in terms of the core components of the data integration process, as listed below.
* Data Identification. What data does the organization need to extract and where does it come from? What end result, in terms of the data, does the organization want to analyze? Essentially, answering these questions means identifying the origin of the data, and what the relationship is between the different data sources.
* Data Extraction. How frequently does the organization require the data? Is it monthly, weekly, daily, or hourly? Where should data storage and transformation activities occur (e.g., on a dedicated server or in the data warehouse)? Considering these factors helps identify the organization's data frequency needs. For example, analysis of sales data may require the organization to load data monthly or quarterly, whereas other data transfers may be performed multiple times a day. In determining how frequently data is loaded and transformed in the data warehouse or on a dedicated server, the organization should also consider the volume of data to be transferred and its effect on system performance.
* Data Standardization. What is the format of the organization's data, and is it currently compatible with the same data elements in other systems? For example, if the organization wants to analyze customer information and to merge customer buying patterns with customer service data, it must know if the customer is identified in the same way in both places (e.g., by customer identification [ID], phone number, or first and last name). This is crucial for ensuring that the correct data is merged and that the data is attached to the right customer throughout the data standardization process. Another data standardization issue the organization should deal with is identifying how it will manage data cleansing and data integrity functions within the data warehouse over time.
* Data Transformation. The organization should consider its data transformation requirements and the interaction between the transformed data components. The critical questions are how the data will be represented in the new database and how it will be merged on a row-by-row basis. Answering these questions involves identifying the business and data rules associated with the data to ensure accuracy in data loads.
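The standardization and transformation concerns above, such as merging customer buying patterns with customer service data, hinge on both systems identifying the customer the same way. The sketch below assumes a shared customer ID as the merge key; the records and field names are hypothetical.

```python
# Hypothetical data from two systems, both keyed by customer ID.
buying = {
    "C001": {"purchases": 12},
    "C002": {"purchases": 3},
}
service = [
    {"customer_id": "C001", "tickets": 2},
    {"customer_id": "C003", "tickets": 1},  # no match in the buying data
]

def merge_by_customer_id(buying, service):
    """Merge records row by row, keeping only customers found in both systems."""
    merged = []
    for rec in service:
        cid = rec["customer_id"]
        if cid in buying:  # the data rule: merge only on a matching ID
            merged.append({"customer_id": cid,
                           "purchases": buying[cid]["purchases"],
                           "tickets": rec["tickets"]})
    return merged

result = merge_by_customer_id(buying, service)
```

Here only customer C001 appears in both systems and is merged; unmatched records are dropped. In practice, the business rules would also dictate how to handle such unmatched rows rather than silently discarding them.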