Deep Dive: Building your first Pipeline

Learn to build a data pipeline in Foundry's Pipeline Builder, from data preprocessing to integration and best practices, with minimal coding.

About this course

In this hands-on course, you will learn how to build a data pipeline with Foundry's Pipeline Builder and apply key data processing techniques to transform raw data into valuable insights.

  • Understanding Data Pipelines and their Importance: Gain an understanding of data pipelines and their role in systematically transforming raw data into valuable insights, and recognize their importance in driving data-driven analyses and decision-making within an organization.

  • Utilizing Pipeline Builder for Data Processing: Explore the capabilities of Foundry's Pipeline Builder and its pre-built Spark modules to create and manage data pipelines effectively, without the need for extensive coding, and leverage it for common data processing challenges.

  • Applying Data Cleaning and Preprocessing Techniques: Apply fundamental cleaning and preprocessing steps using pre-built Pipeline Builder modules to address common data quality issues, using techniques such as data type casting and format conversion, while ensuring consistency, accuracy, and readiness for further analysis (a sketch of the equivalent Spark logic follows this list).

  • Integrating and Aggregating Data: Learn how to integrate and aggregate datasets using joins and aggregations, understand the significance of materializing outputs, and validate joined datasets to ensure they carry accurate and meaningful information (see the join-and-aggregate sketch below).

  • Deploying Advanced Pipeline Techniques: Explore advanced Pipeline Builder capabilities such as streaming pipelines, outputting to the Ontology, and leveraging User-Defined Functions (UDFs) for custom functionality, and understand best practices for segmenting pipelines, materializing outputs, and maintaining data pipelines for performance and reliability (see the UDF sketch below).

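The cleaning objectives above map onto ordinary Spark operations. The following PySpark sketch is illustrative only: Pipeline Builder is point-and-click, so you will not write this code in the course, and the dataset and column names here (product_id, product_name, price, added_at) are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleaning-sketch").getOrCreate()

# Toy stand-in for the raw products dataset; every column arrives as a string.
raw_products = spark.createDataFrame(
    [
        ("p1", " Widget ", "9.99", "2024-01-05"),
        ("p1", " Widget ", "9.99", "2024-01-05"),  # exact duplicate row
        ("p2", "GADGET", "not-a-number", "2024-02-10"),
    ],
    ["product_id", "product_name", "price", "added_at"],
)

clean_products = (
    raw_products
    # Data type casting: string -> double (unparseable values become null)
    .withColumn("price", F.col("price").cast("double"))
    # Format conversion: parse the text date into a proper date type
    .withColumn("added_at", F.to_date(F.col("added_at"), "yyyy-MM-dd"))
    # Cleaning: trim whitespace and normalize casing
    .withColumn("product_name", F.trim(F.lower(F.col("product_name"))))
    # Deduplicate on the business key
    .dropDuplicates(["product_id"])
    # Drop rows that became unusable after casting
    .filter(F.col("price").isNotNull())
)

clean_products.show()
```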
 
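The same goes for the join-and-aggregate work in Pipeline 02. This sketch uses the transactions, products, and customers dataset names from the curriculum, but the schemas and key columns (product_id, customer_id) are assumed stand-ins; it also shows a row-count check, one simple way to catch the kind of "bad join" the curriculum has you drill into.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# Tiny stand-in datasets; the real course data is richer.
transactions = spark.createDataFrame(
    [("t1", "p1", "c1", 2, 9.99), ("t2", "p1", "c2", 1, 9.99)],
    ["transaction_id", "product_id", "customer_id", "quantity", "unit_price"],
)
products = spark.createDataFrame([("p1", "widget")], ["product_id", "product_name"])
customers = spark.createDataFrame(
    [("c1", "Ada"), ("c2", "Grace")], ["customer_id", "customer_name"]
)

# A simple join, then a chained join, mirroring the Pipeline 02 exercises.
enriched = (
    transactions
    .join(products, on="product_id", how="left")
    .join(customers, on="customer_id", how="left")
)

# A "bad join" often shows up as row-count inflation from duplicated keys;
# comparing counts before and after the join is a cheap validation step.
assert enriched.count() == transactions.count(), "join fanned out rows"

# Aggregation: total spend per customer.
spend_per_customer = enriched.groupBy("customer_id", "customer_name").agg(
    F.sum(F.col("quantity") * F.col("unit_price")).alias("total_spend")
)
spend_per_customer.show()
```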

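Finally, a note on UDFs: Pipeline Builder UDFs are authored through Foundry's own UDF framework rather than as Spark UDFs, so the plain PySpark UDF below is only a conceptual stand-in for the idea of wrapping custom row-level logic that the pre-built transforms cannot express. The loyalty_tier rule is invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Toy aggregated input, as might come out of the previous sketch.
spend_per_customer = spark.createDataFrame(
    [("c1", 1250.0), ("c2", 80.0)], ["customer_id", "total_spend"]
)

# Custom row-level logic registered so the engine can call it per row.
@F.udf(returnType=StringType())
def loyalty_tier(total_spend):
    # Invented business rule, purely for illustration.
    if total_spend is None:
        return "unknown"
    return "gold" if total_spend >= 1000 else "standard"

tiered = spend_per_customer.withColumn("tier", loyalty_tier(F.col("total_spend")))
tiered.show()
```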
Curriculum

  • Overview
      • Introduction
  • Getting Started
      • Choose or create your training location
      • Download and upload the data sources
  • Creating a Pipeline 01 (Data Preprocessing and Cleaning)
      • Introduction
      • Create a new pipeline
      • Basic transforms: Clean the products dataset
      • Advanced transforms: Clean the customers dataset
      • Work with different file types: Preprocess and clean the transactions dataset
  • Creating a Pipeline 02 (Data Integration, Joining and Aggregations)
      • Introduction
      • Simple joins: Join transactions and products
      • Chaining joins: Join transactions x products with customers
      • Materializing outputs
      • Identify a bad join: Drill into transactions x products
      • Create an operationally valuable dataset
  • Additional Considerations and Best Practices
      • Introduction
      • Other Pipeline Builder Capabilities
      • User-Defined Functions (UDFs)
      • Segmenting your pipeline and materializing outputs: Best practices
      • Pipeline Maintainability
