Data transformation in the context of Extract, transform, load


Data transformation in the context of Extract, transform, load

Data transformation Study page number 1 of 1

Play TriviaQuestions Online!

or

Skip to study material about Data transformation in the context of "Extract, transform, load"


⭐ Core Definition: Data transformation

In computing, data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integration and data management tasks such as data wrangling, data warehousing, data integration and application integration.

Data transformation can be simple or complex based on the required changes to the data between the source (initial) data and the target (final) data. Data transformation is typically performed via a mixture of manual and automated steps. Tools and technologies used for data transformation can vary widely based on the format, structure, complexity, and volume of the data being transformed.

↓ Menu
HINT:

👉 Data transformation in the context of Extract, transform, load

Extract, transform, load (ETL) is a three-phase computing process where data is extracted from an input source, transformed (including cleaning), and loaded into an output data container. The data can be collected from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

A properly designed ETL system extracts data from source systems and enforces data type and data validity standards and ensures it conforms structurally to the requirements of the output. Some ETL systems can also deliver data in a presentation-ready format so that application developers can build applications and end users can make decisions.

↓ Explore More Topics
In this Dossier

Data transformation in the context of Data analysis

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a variety of unstructured data. All of the above are varieties of data analysis.

View the full Wikipedia page for Data analysis
↑ Return to Menu

Data transformation in the context of Coding (social sciences)

In the social sciences, coding is an analytical process in which data, in both quantitative form (such as questionnaires results) or qualitative form (such as interview transcripts) are categorized to facilitate analysis.

One purpose of coding is to transform the data into a form suitable for computer-aided analysis. This categorization of information is an important step, for example, in preparing data for computer processing with statistical software. Prior to coding, an annotation scheme is defined. It consists of codes or tags. During coding, coders manually add codes into data where required features are identified. The coding scheme ensures that the codes are added consistently across the data set and allows for verification of previously tagged data.

View the full Wikipedia page for Coding (social sciences)
↑ Return to Menu

Data transformation in the context of Data mapping

In computing and data management, data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks, including:

  • Data transformation or data mediation between a data source and a destination
  • Identification of data relationships as part of data lineage analysis
  • Discovery of hidden sensitive data such as the last four digits of a social security number hidden in another user id as part of a data masking or de-identification project
  • Consolidation of multiple databases into a single database and identifying redundant columns of data for consolidation or elimination

For example, a company that would like to transmit and receive purchases and invoices with other companies might use data mapping to create data maps from a company's data to standardized ANSI ASC X12 messages for items such as purchase orders and invoices.

View the full Wikipedia page for Data mapping
↑ Return to Menu