DATAENG 05c (Repositories): Multiple Outputs with Data Transforms in Code Repositories

About this course

After a Datasource Project has generated a set of clean outputs, the next stage in a pipeline — the Transform Project — prepares data to feed into the Ontology layer. These projects import the cleaned datasets from one or more Datasource Projects, join them with lookup datasets to expand values, normalize or de-normalize relationships to create object-centric or time-centric datasets, or aggregate data to create standard, shared metrics.

Up to this point in the Data Engineering Learning Path, you’ve authored code-based data transformations that output a single dataset. Foundry transform APIs provide at least two ways to generate multiple outputs from a single transform file, which is helpful when you want to programmatically break inputs into distinct parts. In this tutorial, you’ll explore one of the available methods for outputting multiple datasets from a single transform as you take your pipeline into the Transform Project phase.
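As a rough illustration of the two approaches, the plain-Python sketch below mimics them without using Foundry's transform API. The function names (`make_country_transform`, `split_by_country`), the `country` field, and the sample rows are all hypothetical; in a real repository the transforms would be declared through the transforms library rather than as plain functions.

```python
# Plain-Python analogy only; this is NOT Foundry's transforms API.
from collections import defaultdict

# Hypothetical sample input rows.
ROWS = [
    {"alert_id": 1, "country": "US"},
    {"alert_id": 2, "country": "FR"},
    {"alert_id": 3, "country": "US"},
]
COUNTRIES = ["US", "FR"]


# "Generated" style: a factory builds one single-output transform per country.
def make_country_transform(country):
    def transform(rows):
        return [r for r in rows if r["country"] == country]
    return transform


GENERATED = {c: make_country_transform(c) for c in COUNTRIES}


# "Multi-output" style: one transform reads the input once and fills several outputs.
def split_by_country(rows, countries):
    outputs = defaultdict(list)
    for r in rows:
        if r["country"] in countries:
            outputs[r["country"]].append(r)
    return {c: outputs[c] for c in countries}


generated_result = {c: fn(ROWS) for c, fn in GENERATED.items()}
multi_result = split_by_country(ROWS, COUNTRIES)
```

Both styles end with one collection of rows per country. The difference is structural: the generated style defines several independent single-output transforms in a loop, while the multi-output style is a single transform that writes to several outputs.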

⚠️ Course Prerequisites

Publishing and Using Shared Libraries in Code Repositories: If you have not completed the previous course, please do so now.

Outcomes

The exercises in this tutorial take the clean outputs from your Datasource Project: Flight Alerts and Datasource Project: Passengers projects and process them further using a multi-output Python transform. You’ll first create an intermediate transform that joins the flight alerts data with the passenger data. Then you’ll create a multi-output transform that produces a separate dataset of alerts for each passenger country.
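The two steps described above can be sketched in plain Python. This is only an assumption-laden illustration: the column names (`passenger_id`, `alert_id`, `country`), the sample rows, and both helper functions are hypothetical, and the actual exercises use Foundry Python transforms rather than lists of dictionaries.

```python
# Hypothetical sample data; the real datasets' schemas are not given in this overview.
passengers = [
    {"passenger_id": "p1", "country": "US"},
    {"passenger_id": "p2", "country": "FR"},
]
flight_alerts = [
    {"alert_id": 10, "passenger_id": "p1"},
    {"alert_id": 11, "passenger_id": "p2"},
    {"alert_id": 12, "passenger_id": "p1"},
]


def join_alerts_with_passengers(alerts, passengers):
    """Intermediate step: attach passenger fields to each alert (inner join)."""
    by_id = {p["passenger_id"]: p for p in passengers}
    return [
        {**a, **by_id[a["passenger_id"]]}
        for a in alerts
        if a["passenger_id"] in by_id
    ]


def alerts_by_country(joined):
    """Second step: group the joined rows into one collection per country."""
    out = {}
    for row in joined:
        out.setdefault(row["country"], []).append(row)
    return out


joined = join_alerts_with_passengers(flight_alerts, passengers)
per_country = alerts_by_country(joined)
```

Here each key of `per_country` stands in for one output dataset of the multi-output transform.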

🥅 Learning Objectives

Gain familiarity with the Transform Project stage of a production pipeline.
Understand the difference between a multi-output and a generated transform, both of which are capable of producing more than one dataset output from a single transform file.

💪 Foundry Skills

Create, schedule, and document the Transform Project portion of a production data pipeline.
Write a generated Python transform and a multi-output Python transform.

Curriculum

About this Course
Create a Transform Project and Multiple Output Transforms
Create Your Folder Structure and Repository
Add Code for Your “Transformed” Datasets
Multiple Outputs with “Generated” Transforms
Multi-output Transforms
Exercise Summary
Document and Schedule Your Pipeline
Add a README File
Add a Data Lineage Graph for Documentation
Configure a Connecting Build Schedule
Take Stock of Your Pipeline
Exercise Summary
Conclusion
Key Takeaways
Next Steps

Learn how to use multi-output and generated transforms to produce more than one dataset output from a single transform file.
