Course Catalog Help
DATAENG 02 (Builder): Introduction to Data Transformation with Pipeline Builder

Use Pipeline Builder to normalize and format data using some basic transforms.

About this course

Once your team has agreed on the datasets and transformation steps needed to achieve your outcome, it’s time to start developing your data assets. The Pipeline Builder application contains a fully integrated suite of tools that let you configure transformation logic and then build new data transformations as part of a production pipeline. There are several Foundry applications capable of transforming and outputting datasets (e.g., Code Repositories, Contour, Code Workbook, Preparation, Fusion), but for reasons we’ll explore throughout the learning path, production pipelines should only be built with Pipeline Builder or—if specialized code is needed—the Code Repositories application.

⚠️ Course prerequisites

  • DATAENG 01: Data Pipeline Foundations: If you have not completed this previous course, please do so now.

In the previous course, you created a series of folders that implement a recommended pipeline project structure. You’ll now use the Pipeline Builder application to generate the initial datasets in your pipeline. The inputs to this training are simulated raw datasets from an upstream source, and the outputs will be “pre-processed” datasets formatted for further cleaning in the next tutorial.

Learning Objectives

  1. Start your pipeline in the Pipeline Builder application.
  2. Understand the importance of pre-processing and cleaning in data pipeline development.
  3. Gain additional practice transforming data in Pipeline Builder.

Foundry Skills

  • Create a pipeline using Pipeline Builder.
  • Transform data with Pipeline Builder and generate output datasets.

Curriculum

  • Introduction
  • About this Course
  • Getting Started
  • Preview the Project in Data Lineage
  • Simulate your Datasource
  • Exercise Summary
  • Transform your data
  • Add a Preprocessing Pipeline
  • Preprocessing Logic: Flight Alerts
  • Preprocessing Logic: Mapping Datasets
  • Exercise Summary
  • Conclusion
  • Key Takeaways
  • Next steps