- Overview
  - Introduction
- Getting Started
  - Choose or create your training location
  - Download and upload the data sources
- Creating a Pipeline 01 (Data Preprocessing and Cleaning)
  - Introduction
  - Create a new pipeline
  - Basic transforms: Clean the products dataset
  - Advanced transforms: Clean the customers dataset
  - Work with different file types: Preprocess and clean the transactions dataset
- Creating a Pipeline 02 (Data Integration, Joining and Aggregations)
  - Introduction
  - Simple joins: Join transactions and products
  - Chaining joins: Join transactions x products with customers
  - Materializing outputs
  - Identify a bad join: Drill into transactions x products
  - Create an operationally valuable dataset
- Additional Considerations and Best Practices
  - Introduction
  - Other Pipeline Builder Capabilities
  - User-Defined Functions (UDFs)
  - Segmenting your pipeline and materializing outputs: Best practices
  - Pipeline Maintainability
Deepdive: Building your first Pipeline
Learn to build a data pipeline in Foundry's Pipeline Builder, from data preprocessing to integration and best practices, with minimal coding.
In this course, you will gain hands-on experience building a data pipeline with Foundry's Pipeline Builder and applying key data processing techniques to transform raw data into valuable insights.
- Understanding Data Pipelines and their Importance: Understand how data pipelines systematically transform raw data into valuable insights, and recognize their importance in driving data-driven analysis and decision-making within an organization.
- Utilizing Pipeline Builder for Data Processing: Explore the capabilities of Foundry's Pipeline Builder and its pre-built Spark modules to create and manage data pipelines effectively without extensive coding, and apply them to common data processing challenges.
- Applying Data Cleaning and Preprocessing Techniques: Use pre-built Pipeline Builder modules to address common data quality issues, such as data type casting and format conversion, ensuring the data is consistent, accurate, and ready for further analysis.
- Integrating and Aggregating Data: Integrate and aggregate datasets using joins and aggregations, understand the significance of materializing outputs, and validate joined datasets to confirm they yield accurate, meaningful information.
- Deploying Advanced Pipeline Techniques: Explore advanced Pipeline Builder capabilities such as streaming pipelines, outputting to the Ontology, and User-Defined Functions (UDFs) for custom functionality, and learn best practices for segmenting pipelines, materializing outputs, and maintaining data pipelines for performance and reliability.
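To make the clean → join → aggregate flow above concrete before you build it visually in Pipeline Builder, here is a minimal plain-Python sketch of the same ideas. The datasets and column names (`transactions`, `products`, `amount`, `product_id`) are hypothetical stand-ins, not the course's actual schemas, and Pipeline Builder performs these steps with pre-built Spark modules rather than hand-written code.

```python
# Hypothetical toy data standing in for the course's transactions and products datasets.
transactions = [
    {"txn_id": 1, "product_id": "P1", "customer_id": "C1", "amount": "19.50"},
    {"txn_id": 2, "product_id": "P2", "customer_id": "C1", "amount": "5.25"},
    {"txn_id": 3, "product_id": "P1", "customer_id": "C2", "amount": "19.50"},
]
products = {"P1": "Widget", "P2": "Gadget"}

# Cleaning: cast the string `amount` column to a numeric type (data type casting).
for txn in transactions:
    txn["amount"] = float(txn["amount"])

# Join: enrich each transaction with its product name (a left join on product_id).
joined = [{**txn, "product_name": products.get(txn["product_id"])} for txn in transactions]

# Aggregation: total revenue grouped by product.
revenue: dict[str, float] = {}
for row in joined:
    revenue[row["product_name"]] = revenue.get(row["product_name"], 0.0) + row["amount"]

print(revenue)  # {'Widget': 39.0, 'Gadget': 5.25}
```

The course's "identify a bad join" exercise corresponds to checking the joined rows here: a `product_id` missing from `products` would surface as a `None` product name, which is exactly the kind of defect you drill into before materializing an output.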