- Overview
  - Introduction
- Getting Started
  - Choose or create your training location
  - Download and upload the data sources
- Creating a Pipeline 01 (Data Preprocessing and Cleaning)
  - Introduction
  - Create a new pipeline
  - Basic transforms: Clean the products dataset
  - Advanced transforms: Clean the customers dataset
  - Work with different file types: Preprocess and clean the transactions dataset
- Creating a Pipeline 02 (Data Integration, Joining and Aggregations)
  - Introduction
  - Simple joins: Join transactions and products
  - Chaining joins: Join transactions x products with customers
  - Materializing outputs
  - Identify a bad join: Drill into transactions x products
  - Create an operationally valuable dataset
- Additional Considerations and Best Practices
  - Introduction
  - Other Pipeline Builder Capabilities
  - User-Defined Functions (UDFs)
  - Segmenting your pipeline and materializing outputs: Best practices
  - Pipeline Maintainability
Deepdive: Building your first Pipeline
Learn to build a data pipeline in Foundry's Pipeline Builder, from data preprocessing to integration and best practices, with minimal coding.
In this course, you will gain hands-on experience building a data pipeline with Foundry's Pipeline Builder and applying key data processing techniques to transform raw data into valuable insights.
- Understanding Data Pipelines and their Importance: Understand how data pipelines systematically transform raw data into valuable insights, and recognize their importance in driving data-driven analysis and decision-making within an organization.
- Utilizing Pipeline Builder for Data Processing: Explore the capabilities of Foundry's Pipeline Builder and its pre-built Spark modules to create and manage data pipelines effectively without extensive coding, and apply them to common data processing challenges.
- Applying Data Cleaning and Preprocessing Techniques: Use pre-built Pipeline Builder modules to address common data quality issues, such as data type casting and format conversion, ensuring the data is consistent, accurate, and ready for further analysis.
- Integrating and Aggregating Data: Integrate and aggregate datasets using joins and aggregations, understand the significance of materializing outputs, and validate joined datasets to confirm they yield accurate, meaningful information.
- Deploying Advanced Pipeline Techniques: Explore advanced Pipeline Builder capabilities such as streaming pipelines, outputting to the Ontology, and User-Defined Functions (UDFs) for custom functionality, and learn best practices for segmenting pipelines, materializing outputs, and maintaining data pipelines for performance and reliability.
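To make the clean → join → aggregate flow above concrete before you build it visually in Pipeline Builder, here is a minimal plain-Python sketch of the same ideas. The datasets and column names (`transactions`, `products`, `amount`, `product_id`) are hypothetical stand-ins, not the course's actual schemas, and Pipeline Builder performs these steps with pre-built Spark modules rather than hand-written code.

```python
# Hypothetical toy data standing in for the course's transactions and products datasets.
transactions = [
    {"txn_id": 1, "product_id": "P1", "customer_id": "C1", "amount": "19.50"},
    {"txn_id": 2, "product_id": "P2", "customer_id": "C1", "amount": "5.25"},
    {"txn_id": 3, "product_id": "P1", "customer_id": "C2", "amount": "19.50"},
]
products = {"P1": "Widget", "P2": "Gadget"}

# Cleaning: cast the string `amount` column to a numeric type (data type casting).
for txn in transactions:
    txn["amount"] = float(txn["amount"])

# Join: enrich each transaction with its product name (a left join on product_id).
joined = [{**txn, "product_name": products.get(txn["product_id"])} for txn in transactions]

# Aggregation: total revenue grouped by product.
revenue: dict[str, float] = {}
for row in joined:
    revenue[row["product_name"]] = revenue.get(row["product_name"], 0.0) + row["amount"]

print(revenue)  # {'Widget': 39.0, 'Gadget': 5.25}
```

The course's "identify a bad join" exercise corresponds to checking the joined rows here: a `product_id` missing from `products` would surface as a `None` product name, which is exactly the kind of defect you drill into before materializing an output.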