Azure Data Engineer with ADF, Synapse, Fabric, Databricks and PySpark – Live Training
(A complete hands-on program covering Microsoft Fabric, Synapse, Databricks, PySpark & DevOps pipelines)
Isha presents a comprehensive, hands-on training program focused on Microsoft Fabric, Azure Databricks, Apache Spark, and end-to-end data engineering. The curriculum spans critical areas such as Fabric Warehouse, Data Lakehouse architecture, PySpark transformations, Delta Lake operations, Unity Catalog security, and real-time data streaming. Learners gain expertise in modern data integration techniques, CI/CD workflows using Azure DevOps, and secure data governance. With a strong emphasis on Medallion architecture and real-world projects, our training equips professionals with the technical depth and practical skills required to succeed in today’s data-driven industry.
Prerequisites for training: working knowledge of Python programming and SQL.
About the Instructor:
Raj |
Live Sessions Price:
For LIVE sessions – Offer price after discount: 125 USD (regular price 259 USD) or 9,900 INR (regular price 13,900 INR)
What will I learn by the end of this course?
- Gain in-depth proficiency in Microsoft Fabric, including Warehouse setup, Dataflows Gen2, Access Control, Lakehouse vs Warehouse design decisions, and SQL analytics features like Time Travel and Zero Copy Clones.
- Master PySpark and Apache Spark SQL, covering DataFrames, window functions, joins, transformations, partitioning, UDFs, and advanced performance optimization techniques.
- Develop hands-on expertise in Azure Databricks, including cluster setup, DBFS, notebook management, REST API integration, Delta Lake fundamentals, and Medallion Architecture implementation.
- Learn Delta Lake features such as Schema Enforcement, Evolution, Time Travel, Vacuum, Optimize, Z-Ordering, and efficient ingestion using Auto Loader and Structured Streaming.
- Understand and configure Unity Catalog for enterprise-grade data governance with external locations, storage credentials, roles, permissions, and security layers like Row-Level and Column-Level Security.
- Design and orchestrate end-to-end data engineering projects, leveraging CI/CD pipelines using Azure DevOps, GIT integration, and automation for Data Factory-linked notebooks.
- Implement real-world Medallion architecture pipelines from raw to bronze, silver, and gold layers with practical exercises, optimized transformations, and data modeling for reporting.
Free Demo Session:
21st May @ 9 PM – 10 PM (IST) (Indian Timings)
21st May @ 11:30 AM – 12:30 PM (EST) (U.S Timings)
21st May @ 4:30 PM – 5:30 PM (BST) (UK Timings)
Class Schedule:
For Participants in India: Monday to Friday 9 PM – 10 PM (IST)
For Participants in US: Monday to Friday 11:30 AM – 12:30 PM (EST)
For Participants in UK: Monday to Friday 4:30 PM – 5:30 PM (BST)
What students have to say about the Trainer:
Fantastic trainer! Each session was well-structured and full of actionable insights – Smitha
The sessions were super interactive, and the trainer made even the most complex Azure components feel simple. I now understand Data Factory pipelines and workspace organization much better than before – Chandu
Thank you for such an informative and well-organized training – Swarna
Loved the way the trainer explained Azure Synapse and Databricks—very hands-on and easy to follow – Anu
Excellent at maintaining engagement throughout. Every session felt well-paced and thoughtfully delivered. – Amaresh
I gained a lot more than I expected, mainly due to the trainer’s teaching style and attention to individual progress – Megha
Salient Features:
- 40 Hours of Live Training along with recorded videos
- Lifetime access to the recorded videos
- Course Completion Certificate
Who can enroll in this course?
- Data Engineers looking to deepen their skills in Microsoft Fabric, Databricks, and Delta Lake.
- Data Analysts and BI Developers aiming to transition into data engineering or work with large-scale analytics solutions.
- Software Developers wanting to learn big data processing using Apache Spark and PySpark.
- ETL Developers and Azure Data Factory users interested in advanced data orchestration and automation.
- DevOps Engineers and Cloud Engineers working with CI/CD pipelines, Git integration, and Azure DevOps.
- Database Administrators (DBAs) moving toward cloud-based data platforms.
- Anyone preparing for roles in modern data platforms, including Lakehouse and streaming data architectures.
Course syllabus:
Azure Data Factory + Synapse: 8 hrs
Microsoft Fabric: 10 hrs
PySpark: 8 hrs
Databricks: 14 hrs
Azure Data Factory
- What is Azure Data Factory?
- Create Azure Data Factory service
- Building Blocks of Data Factory
- ADF Provisioning
- Linked Services
ADF Activity, Control Flow & Copy Activity
- Lookup Activity
- Get Metadata Activity
- Filter Activity
- For Each Loop
- If else condition
- Execute Pipeline activity
- First Pipeline – Lookup / Set Variable / Datasets
- Foreach Activity – Processing Items In A Loop
- Using Stored Procedures With Lookup Activity
- Read File/Folder Properties Using Get Metadata Activity
- Validation Activity Vs Get Metadata Activity
- Conditional Execution Using IF Activity
- Copy Data Activity – Scenario 1
- Copy Data Activity – Scenario 2
- Assignment 1: Copy files from local filesystem to Azure SQL Database
- Assignment 2: Load a table from one db to another db based on a condition
- Project LAB-1: Design & Build First Metadata-Driven ETL Framework
- Project LAB-2: Design & Build First Metadata-Driven ETL Framework
- Using Wait Activity As A Timer
- Using Fail Activity To Raise Exceptions
- Using Append Activity With Array Variable
- Using Filter Activity To Selectively Process Files
- Using Delete Activity To Cleanup Files After Processing
- Copy A Single JSON File
- Copy A Single TEXT File
- Copy A Single PARQUET File
- Copy All Files In A Folder
- Binary File Copy & Implementing File Move
- File To Table Copy Using Insert & Upsert Techniques
- File To Table Copy Using Stored Procedure
- File To Table Copy – Large File Issue
- Table To File Copy
- Master-Child Pattern Using Execute Pipeline Activity
- Using Self Hosted Integration Runtime To Ingest On-Prem Data
- Parameterized Linked Service & ADF Global Parameters
- Automated Execution Of Pipeline Using Scheduled Triggers
- Event-Based Execution Of Pipeline Using Event Triggers
- ADF Limitation – No Iterative Activity Within Foreach Activity
- ADF Limitation – No Iterative Activity Within IF Activity
- ADF Limitation – No Iterative Activity Within SWITCH Activity
- Sequential Vs Parallel Batch In A Foreach Activity
- ADF Limitation – Dynamic Execution Of Pipeline
- ADF Limitation – Record Number & Data Size Restrictions
- ADF Limitation – Dynamic Variable Name In Set Variable
Project: Metadata-Driven ETL Framework
- Set Up Azure Active Directory Users/Groups & Key Vault
- Set Up Azure Storage
- Set Up Azure SQL Database
- Set Up Additional Groups & Users
- ETL Framework Metadata Tables
- Set Up Azure Data Factory
- Modular & Reusable Design
- Generic Pipeline to Extract From SQL Database -1
- Generic Pipeline to Extract From SQL Database -2
- Generic Pipeline to Extract From SQL Database -3
- Use Case: Historical or Initial Load executed with a dynamic configuration approach
- Use Case: Incremental Load from Azure SQL to Azure Data Lake with a 10-minute SLA
Data Ingestion
- Data Ingestion – Integration Runtimes
- Data Ingestion – What is Self Hosted Integration Runtime
- Overview of On-premise data source and Datalake
- Downloading and installing Self Hosted IR in On-premise
- UPDATE – Self Hosted IR Files Access issue
- Creating and adding Secrets to Azure Key vault
- Creating Linked Service for Azure Key vault – Demo
- Creating Linked Service and Dataset for On-premise File
- UPDATE – Fix access issue: Create Azure VM and install Self-Hosted IR
- UPDATE – Fix ‘host’ is not allowed error
- Creating Linked Service and Dataset for Azure Datalake
- Creating Copy Activity to copy all files from On-premise to Azure
- Incremental data loading using Last Modified Date of File
- Incremental Load based on File Name – Demo
- Incremental Data loading based on Filename – Practical
Parameterize
- Parameterize Linked Service, DataSets, Pipeline
- Monitor Visually
- Azure Monitor
Real-Time Use Cases – Frequently Used in Projects
- Apply UPSERT in ADF using Copy Activity
- On-Prem to Azure Cloud Migration
- Remove duplicate records in ADF
- How to handle NULLs in ADF
- Remove specific rows in a file using ADF
- Remove the first few and last few rows of a file using ADF
- Error handling in Mapping Data Flows
- Get File Name From Source
- Copy Files based on last modified Date
- Build ETL Pipeline
- Modular & Reusable Design
- Passing Parent Pipeline Run ID & Parent Pipeline Name to Child Pipeline
- Slowly Changing Dimension Type 1
- Lab: Slowly Changing Dimension Type 1
- Artifacts for tables used in the Lab session of SCD Type 1
- Slowly Changing Dimension Type 2 (Concepts)
- Artifacts for tables used in the Lab session of SCD Type 2
- Lab: Slowly Changing Dimension Type 2 (a PySpark sketch of the equivalent merge logic follows this list)
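The SCD labs above are built with ADF mapping data flows; for readers who prefer code, the same Type 1 (overwrite-on-match) logic can also be expressed as a Delta Lake MERGE in PySpark. This is a minimal sketch with hypothetical table and column names (silver.dim_customer, customer_id), not the course's ADF solution.

```python
# Minimal SCD Type 1 sketch in PySpark + Delta Lake (hypothetical names and paths).
# The course implements this pattern in ADF mapping data flows; this shows the
# equivalent overwrite-on-match logic as a Delta MERGE.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet("/mnt/landing/customers/")      # incoming batch (placeholder path)
dim = DeltaTable.forName(spark, "silver.dim_customer")       # existing dimension table

(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # Type 1: overwrite changed attributes in place
    .whenNotMatchedInsertAll()   # new business keys become new rows
    .execute())
```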
Azure Synapse Analytics
- Why Warehousing in Cloud
- Traditional vs Modern Warehouse architecture
- What is Synapse Analytics Service
- Demo: Create Dedicated SQL Pool
- Demo: Connect Dedicated SQL Pool with SSMS
- Demo: Create Azure Synapse Analytics Studio Workspace
- Demo: Explore Synapse Studio V2
- Demo: Create Dedicated SQL Pool and Spark Pool
- Demo: Analyse Data using Dedicated SQL Pool
- Demo: Analyse Data using Apache Spark Notebook
- Demo: Analyse Data using Serverless SQL Pool
- Demo: Data Factory from Synapse Analytics Studio
- Demo: Monitor Synapse Studio
Azure Synapse Benefits
- Introduction:
- What is Microsoft Fabric?
- Fabric Signup
- Creating Fabric Workspace
- Fabric Pricing
- Creating storage account in Azure
- Creating Azure Synapse Analytics Service in Azure
- Evolution of Data Architectures
- Delta Lake Structure
- Why Microsoft Fabric is needed
- Microsoft’s definition of Fabric
- How to enable and access Microsoft Fabric
- Fabric License and costing
- Update in Fabric UI
- Experiences in Microsoft Fabric
- Fabric Terminology
- OneLake in Fabric
- One copy for all Computes in Microsoft Fabric
Fabric Lakehouse
- Understanding Fabric Workspaces
- Enable Fabric Trial and Create Workspace
- Purchasing Fabric Capacity from Azure
- Workspace roles in Microsoft Fabric
- Update in the create items UI
- Creating a Lakehouse
- What is inside lakehouse
- Uploading data to Lakehouse
- Uploading Folder into Lakehouse
- SQL analytics endpoint in Lakehouse
- Access SQL analytics endpoint using SSMS
- Visual Query in SQL endpoint
- Default Semantic Model
- OneLake File Explorer
Fabric Datafactory
- Fabric Data Factory UI
- Ways to load data into Lakehouse
- Fabric Data Factory vs Azure Data Factory Scenario
- Gateway types in Microsoft Fabric
- Installing On-prem data gateway
- Create Connection to SQL Server
- Pipeline to ingest OnPrem SQL data to Lakehouse
- Scenario completed using Fabric data factory
- Dataflow Gen2 – Intro
- Creating DataFlow Gen2
- DataFlow Gen2 in Fabric vs Dataflows in ADF
OneLake in Fabric
- Shortcuts in Fabric – Intro
- Prerequisites to Create a shortcut
- Creating a shortcut in Files of Lakehouse
- Criteria to create shortcuts in table section
- Uploading required files and access for synapse
- Right way to create a shortcut in the Tables section
- Creating a delta file
- Creating a shortcut in the Tables section
- Scenario – Creating shortcut with delta in a subfolder
- Scenario – Creating shortcut with only parquet format
- Requirements to create shortcuts in Table and files section
- Update Scenario 1 – Lakehouse to Datalake
- Update Scenario 2 – Datalake to Lakehouse
- Shortcut deletion scenarios intro
- Deletion Scenario 1 – Delete in Lakehouse files
- Deletion Scenario 2 – Delete in ADLS
- Deletion Scenario 3 – Delete table data in Lakehouse
- Deletion Scenario 4 – Delete table data in ADLS
- Deletion Scenario 5 – Deleting entire shortcut
- Shortcut deleting scenario summary
Fabric Synapse Data Engineering
- Ingestion to Lakehouse status
- Spark in Microsoft Fabric
- Spark pools in Microsoft Fabric
- Spark pool node size
- Customizing Starter pools
- Creating a custom pool in Workspace
- Standard vs High Concurrency Sessions
- Changing Spark Settings to StarterPool
- Update in attaching Lakehouse to Notebook Option
- Understanding Notebooks UI
- Fabric Notebook basics
- MSSparkUtils – Intro
- MSSparkUtils – FS- Mount
- MSSparkUtils – FS – Other utils
- MSSparkUtils – FS – FastCp
- Creating Folders in Microsoft Fabric
- MSSparkUtils – Notebook Utils – Run exit
- MSSparkUtils – Notebook – RunMultiple
- Access ADLS data to Lakehouse – Intro
- Access ADLS using Entra ID
- Access ADLS using Service principal
- Access ADLS using SP with keyvault
- Call Fabric notebook from Fabric pipeline
- Managed vs External table – Intro
- Create a Managed Table
- Create an External Table
- Shortcut Table is an external or managed table
- Data Wrangler in Fabric Notebook
- Environments in Microsoft Fabric
- Understanding V-order optimization
- Spark Job Definition
- What is a data mesh
- Creating domains in Fabric
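For quick reference, here is a minimal sketch of the MSSparkUtils calls covered above, as they would run inside a Fabric notebook (mssparkutils is pre-installed there). The folder and notebook names are placeholders, and the exact fastcp signature may vary by runtime.

```python
# MSSparkUtils sketch for a Fabric notebook (paths and notebook names are placeholders).
from notebookutils import mssparkutils

# File system utilities against the attached Lakehouse
mssparkutils.fs.mkdirs("Files/raw/2024")                    # create a folder
print(mssparkutils.fs.ls("Files/raw"))                      # list contents
mssparkutils.fs.fastcp("Files/raw/2024", "Files/archive/2024", True)  # bulk copy (recursive)

# Notebook utilities: run a child notebook with parameters and read its exit value
result = mssparkutils.notebook.run("nb_load_bronze", 600, {"run_date": "2024-01-01"})
print(result)   # whatever the child returned via mssparkutils.notebook.exit(...)
```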
Synapse Migration to Fabric
- Manual import from Synapse to Fabric
- Automated way to import and export notebooks – Intro
- Migrate all notebooks from Synapse to fabric
- Possibility of Migration of Pipelines to Fabric pipelines
- Ways to migrate ADLS data to Fabric OneLake
- Migrate ADLS data to Onelake using Storage Explorer
- Install Capacity Metrics App
- Understanding UI of Capacity Metrics App
- Capacity Units consumption
- Throttling vs Smoothing
- Throttling stage- Overage Protection Policy
- Other throttling stages
- Throttling stages Summary
- Overages in Fabric
- System Events in Fabric
- Matrix Visual
Fabric Synapse Data Warehouse
- Creating a Warehouse in Fabric
- Warehouse vs SQL Analytics Endpoint
- Creating a table and Limitations
- Ways to Load Data into Warehouse
- Loading Data using COPY INTO Command
- Loading Data using Pipeline to Warehouse
- Loading Data using DataFlow Gen2
- Data Sharing – Lakehouse & Warehouse
- Cross Database Ingestion in Warehouse
- Lakehouse vs Warehouse when to choose what
- Different Medallion Architectural patterns
- Update Lakehouse data from WH and vice versa
- SQL query as session in Fabric
- Zero Copy clone within and across Schema
- Time Travel in Warehouse
- Benefits & Limitations of Zero Copy clones
- Cloning single or multiple tables using UI
- Query Insights in Warehouse
Fabric Access Control and Permission
- Microsoft Fabric Structure
- Tenant Level permissions
- Capacity Level Permissions
- Creating new user in Entra ID
- Workspace roles- Workspace Administration
- Workspace roles – Data pipeline permissions
- Workspace Roles – Notebook, Spark jobs, etc
- Data Warehouse permissions – Intro
- Workspace Roles – Accessing shortcuts internal to fabric – Theory
- Workspace Roles – Accessing Shortcuts Internal to Fabric – Practical
- Workspace Roles – Accessing ADLS shortcuts – Theory
- Workspace Roles – Accessing ADLS shortcuts – Practical
- Workspace Roles – Lakehouse permissions
- Item level permissions – Intro
- Warehouse Sharing – No additional permissions
- Warehouse Sharing – ReadData permissions
- Warehouse Sharing – ReadAll permissions
- Warehouse Sharing – Build permissions
- Extend Microsoft Fabric Trial
- Lakehouse Sharing – All permissions
- Notebook – Item Sharing
- Manage OneLake data access
- Row-Level Security in Warehouse and SQL endpoint
- Dynamic Data Masking in Warehouse and SQL endpoint
- Column & Object level security in Warehouse and SQL endpoint
End to End project using Fabric
- Different Medallion architectures in Fabric
- Understanding domain and dataset information
- Project Architecture
- Creating workspace for project and review dataset
- Get data from Raw to landing – theory
- Raw to landing zone
- Different incremental loading patterns
- Incrementally ingest from Raw to landing zone
- Automate ingest from Raw to Landing using pipeline
- Ingest data from Landing to Bronze layer – Theory
- Understanding UPSERT logic for Landing to Bronze ingestion
- Landing to Bronze layer – practical
- Reading landing to bronze from next partition
- UPSERT scenario practical – Landing to bronze
- Bronze layer to Silver layer – Theory
- Understanding data transformations and UPSERT logic for Silver table
- Silver table – Data cleaning
- Silver Layer – data transformations
- Gold Layer – Facts and dimensions table – Theory
- Gold Layer – Facts and dimension tables – Practical
- Data modelling and creating a report
- Orchestrate end to end pipeline and execute it
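To make the shape of the project concrete, here is a heavily simplified PySpark sketch of the landing-to-bronze-to-silver-to-gold flow as it might look in a Fabric notebook with a Lakehouse attached. Table names, columns and paths are illustrative only, not the project's actual dataset.

```python
# Simplified medallion-flow sketch for a Fabric notebook (all names are illustrative).
from pyspark.sql import functions as F

# Bronze: land the latest files as-is, tagging the load timestamp
bronze = (spark.read.parquet("Files/landing/sales/")
               .withColumn("ingested_at", F.current_timestamp()))
bronze.write.mode("append").saveAsTable("bronze_sales")

# Silver: deduplicate on the business key and drop bad rows
silver = (spark.table("bronze_sales")
               .dropDuplicates(["order_id"])
               .filter(F.col("amount").isNotNull()))
silver.write.mode("overwrite").saveAsTable("silver_sales")

# Gold: aggregate into a reporting-friendly fact table
gold = (silver.groupBy("order_date", "region")
              .agg(F.sum("amount").alias("total_amount"),
                   F.count("order_id").alias("order_count")))
gold.write.mode("overwrite").saveAsTable("gold_sales_daily")
```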
GIT Integration
- Creating data sources for PROD
- Changes made to support Git integration
- Executing to check if changes were working
- Sign up with Azure DevOps account
- Connect Fabric workspace to Azure DevOps
- Git integration permissions and Limitations
- Locking main branch with branch policy
- Understanding Continuous Integration (CI) in Fabric
- Continuous Integration in Fabric Workspace
- Status of workspace created for feature branch
- Understanding Continuous Deployment in Fabric
- Deploying Fabric items from Dev to Prod
- Deployment rules to Change data sources of Prod workspace
- End to End execution in PROD
- Git integration for Power BI developers
PySpark
Apache Spark using SQL – Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Example of working with DataFrames
- DataFrames with the Spark SQL shell
- Spark DataFrame
- Working with DataFrame rows
- Working with DataFrame rows and unit tests
- Working with DataFrame rows and unstructured data
- Working with DataFrame columns
- DataFrame partitions and Executors
- Creating and using UDFs
- Aggregation in DataFrames
- Windowing in DataFrames
- Grouping Aggregation in DataFrames
- DataFrame joins
- Internal Joins & shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformations and Actions
- Spark Jobs, Stages & Tasks
- Understanding the Execution plan
- Unit Testing in Spark
- Debugging the Spark Driver and Executors
- Spark Application logs in a cluster
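As a taste of the DataFrame topics above, here is a small, self-contained PySpark sketch covering a join, a window function and a UDF. The data and column names are made up for illustration.

```python
# Compact DataFrame sketch: join, window function, UDF (illustrative data only).
from pyspark.sql import SparkSession, Window, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("df-basics").getOrCreate()

orders = spark.createDataFrame(
    [(1, "A", 100.0), (2, "A", 250.0), (3, "B", 75.0)],
    ["order_id", "customer", "amount"])
customers = spark.createDataFrame([("A", "Asha"), ("B", "Bala")],
                                  ["customer", "name"])

# Inner join on the shared key
joined = orders.join(customers, "customer", "inner")

# Window function: rank each customer's orders by amount
w = Window.partitionBy("customer").orderBy(F.desc("amount"))
ranked = joined.withColumn("rank_in_customer", F.row_number().over(w))

# UDF (prefer built-ins where possible; Python UDFs bypass Catalyst optimizations)
tier = F.udf(lambda amt: "high" if amt >= 100 else "low", StringType())
ranked.withColumn("tier", tier(F.col("amount"))).show()
```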
Assignment:
Spark SQL Exercise
Apache Spark using SQL – Pre-defined Function
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
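The snippet below strings several of these predefined functions together (string and date manipulation, casting, NULL handling, CASE/WHEN) against a tiny in-memory table; all names and values are illustrative.

```python
# Predefined-function sketch in Spark SQL, run from PySpark (illustrative data).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.createDataFrame(
    [("  asha ", "2024-03-15 10:20:00", "120.5", None),
     ("bala",    "2024-03-16 08:05:00", "80",    "5")],
    ["customer_name", "order_ts", "amount", "discount"]
).createOrReplaceTempView("orders_raw")

spark.sql("""
  SELECT upper(trim(customer_name))                     AS customer_name,
         to_date(order_ts)                              AS order_date,
         date_format(order_ts, 'yyyy-MM')               AS order_month,
         cast(amount AS decimal(10,2))                  AS amount,
         coalesce(cast(discount AS int), 0)             AS discount,
         CASE WHEN cast(amount AS double) >= 100
              THEN 'high' ELSE 'low' END                AS tier
  FROM orders_raw
""").show()
```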
Apache Spark using SQL – Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL – Inner
- Joining Tables using Spark SQL – Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
Apache Spark using SQL – Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL – Local
- Loading Data Into Spark Metastore Tables using Spark SQL – HDFS
- Loading Data into Spark Metastore Tables using Spark SQL – Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise – Managed Spark Metastore Tables
Apache Spark using SQL – DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
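Below is a small sketch of a partitioned metastore table with a dynamic-partition insert. It uses Spark-native USING PARQUET syntax for brevity (the course also covers the Hive STORED AS variant); table and column names are illustrative.

```python
# Partitioned table + dynamic-partition insert sketch (illustrative names).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.createDataFrame(
    [(1, 100.0, "2024-03-15"), (2, 80.0, "2024-04-02")],
    ["order_id", "amount", "order_ts"]
).createOrReplaceTempView("orders_staging")

spark.sql("""
  CREATE TABLE IF NOT EXISTS orders_part (
      order_id    INT,
      amount      DOUBLE,
      order_month STRING
  )
  USING PARQUET
  PARTITIONED BY (order_month)
""")

# Partition values are derived from the data: each distinct order_month
# in the SELECT becomes (or reuses) a partition directory.
spark.sql("""
  INSERT INTO orders_part
  SELECT order_id, amount, date_format(order_ts, 'yyyy-MM') AS order_month
  FROM orders_staging
""")

spark.sql("SHOW PARTITIONS orders_part").show()
```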
Azure Databricks
Introduction and Basic Understanding
- Creating a budget for project
- Creating an Azure Databricks Workspace
- Creating an Azure Datalake Storage Gen2
- Walkthrough of the Databricks Workspace UI
- Introduction to Distributed Data Processing
- What is Azure Databricks
- Azure Databricks Architecture
- Cluster types and configuration
- Behind the scenes when creating cluster
- Sign up for Databricks Community Edition
- Understanding notebook and Markdown basics
- Notebook – Magic Commands
- DBUtils – File System Utilities
- DBUtils – Widget Utilities
- DBUtils – Notebook Utils
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Clusters Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore – Setup external Metastore
- Metastore – Setup external Metastore II
- Hands-on: How to navigate to the Databricks service?
- Hands-on: How to create a workspace?
- Hands-on: How to create a spark cluster?
- Hands-on: How to create a notebook?
- Hands-on: How to create a table?
- Hands-on: How to delete a spark cluster?
- Hands-on: How to delete all resources in Azure Cloud?
- What is a workspace?
- What is Resource Group?
- What is Databricks Runtime?
- What is a notebook?
- Hands-on: Using a notebook to visualize data
- Hands-on: Set up Apache Spark with Delta Lake
- Hands-on: Using Python to operate Delta Lake
- Hands-on: Download and install postman
- Hands-on: Generate a token
- Hands-on: Create a spark cluster using REST API
- Hands-on: Delete a spark cluster using REST API
- Hands-on: Permanently delete a spark cluster using REST API
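A minimal sketch of the REST API exercises above, calling the Databricks Clusters API (2.0) from Python instead of Postman. The workspace URL, token, runtime version and node type are placeholders; use whatever your workspace actually offers.

```python
# Create, terminate and permanently delete a cluster via the Clusters API 2.0.
# Host, token and cluster settings below are placeholders.
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # your workspace URL
TOKEN = "<personal-access-token>"                              # generated under User Settings
headers = {"Authorization": f"Bearer {TOKEN}"}

create = requests.post(f"{HOST}/api/2.0/clusters/create", headers=headers, json={
    "cluster_name": "demo-cluster",
    "spark_version": "14.3.x-scala2.12",     # pick a runtime your workspace lists
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "autotermination_minutes": 30,
})
create.raise_for_status()
cluster_id = create.json()["cluster_id"]

# Terminate the cluster, then remove it from the workspace entirely
requests.post(f"{HOST}/api/2.0/clusters/delete", headers=headers,
              json={"cluster_id": cluster_id}).raise_for_status()
requests.post(f"{HOST}/api/2.0/clusters/permanent-delete", headers=headers,
              json={"cluster_id": cluster_id}).raise_for_status()
```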
Databricks Developer Tools with Hands-on Session Examples
- Databricks Notebook, REST API, Delta Lake – What are the Databricks Developer Tools?
- Hands-on: Download and install python
- Hands-on: How to set up the Databricks CLI?
- Hands-on: How to use the Databricks CLI?
- Hands-on: How to use Databricks Utilities?
- Hands-on: Download and install JDK
- Hands-on: Download and install IntelliJ IDEA
- Hands-on: Using Databricks Utilities API Library in IDE
- Hands-on: How to use databricks in Azure Data Factory
- Hands-on: How to debug the notebook in pipeline?
- Hands-on: ETL with Azure Databricks
- Hands-on: How to debug ETL notebook in ETL pipeline?
Databricks CLI and REST API
- Databricks CLI
- Setting up the Databricks CLI
- Lab: Workspace CLI
- Lab: Cluster CLI
- Lab: DBFS CLI
- Lab: Jobs CLI
- Databricks CLI on Windows
- REST API
- Lab: Invoke REST API
- Lab: Job REST API
- Lab: Token REST API
- Lab: Group API
Working with Databricks File System & Security
- Working with DBFS Root
- Mounting ADLS to DBFS
- Drawbacks of Azure Datalake
- What is Delta Lake
- Understanding Lakehouse Architecture
- Databricks Security
- Lab: Secret management
- Part I – Column-level Security
- Part II – Column-level Security
- Row-level Security
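For reference, mounting an ADLS Gen2 container to DBFS with a service principal (as covered above) looks roughly like the sketch below; the secret scope, key names, tenant, container and storage account are all placeholders.

```python
# Mount an ADLS Gen2 container to DBFS using a service principal.
# Runs in a Databricks notebook, where `dbutils` and `display` are predefined.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("kv-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

display(dbutils.fs.ls("/mnt/raw"))   # verify the mount
```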
Delta Lake & Delta Table
- Drawbacks of Azure Datalake
- What is Delta Lake
- Understanding Lakehouse Architecture
- Creating databricks workspace and ADLS for delta lake
- Accessing Datalake storage using service principal
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Drawbacks of ADLS – practical
- Medallion Lakehouse architecture
- Creating Delta Lake
- Understanding the delta format
- Understanding Transaction Log
- Creating delta tables using SQL Command
- Creating Delta table using PySpark Code
- Uploading files for next lectures
- Schema Enforcement
- Schema Evolution
- Delta Table Time Travel
- Time Travel and Versioning
- Vacuum Command
- Convert to Delta
- Understanding Optimize Command – Demo
- Optimize Command – Practical
- UPSERT using MERGE
- Lab : Create Delta Table (SQL & Python)
- Lab : Read & Write Delta Table
- Lab: Convert a Parquet table to a Delta table
- Lab : Incremental ETL load
- Lab : Incremental ETL load (@version property)
- Convert Parquet to Delta
- Details of Delta Table Schema Validation
- Details of Delta Table Schema Evolution
- Look Inside Delta Table
- Delta Table Utilities and Optimization
- Processing XML, JSON & Delta Tables
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table – Time Travel and Vacuum
- UDFs using PySpark – hands-on example
- Spark ingestion
- Disk partitioning
- Storage
- Predicate Pushdown
- Serialization
- Bucketing
- Z-Ordering
- Adaptive Query Execution
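The sketch below touches several of the Delta Lake features listed above (schema evolution, time travel, OPTIMIZE/Z-ORDER, VACUUM) on a throwaway table; the MERGE/UPSERT pattern is sketched earlier in the SCD section. Table and column names are illustrative.

```python
# Delta feature sketch: schema evolution, time travel, OPTIMIZE/Z-ORDER, VACUUM.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()   # on Databricks, `spark` already exists

events = spark.createDataFrame([(1, "2024-01-01", "click"),
                                (2, "2024-01-01", "view")],
                               ["event_id", "event_date", "event_type"])
events.write.format("delta").mode("overwrite").saveAsTable("demo_events")

# Schema evolution: append a batch that carries an extra column
extra = spark.createDataFrame([(3, "2024-01-02", "click", "mobile")],
                              ["event_id", "event_date", "event_type", "device"])
extra.write.format("delta").mode("append") \
     .option("mergeSchema", "true").saveAsTable("demo_events")

# Time travel: read the table as it was at version 0
spark.read.option("versionAsOf", 0).table("demo_events").show()

# Compact small files and co-locate rows by a common filter column
spark.sql("OPTIMIZE demo_events ZORDER BY (event_date)")

# Remove data files no longer referenced by the table (default retention: 7 days)
spark.sql("VACUUM demo_events")
```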
Unity Catalog
- What is Unity Catalog
- Creating Access Connector for Databricks
- Creating Metastore in Unity Catalog
- Unity Catalog Object Model
- Roles in Unity Catalog
- Creating users in Azure Entra ID
- User and groups management Practical
- Cluster Policies
- What are cluster pools
- Creating Cluster Pool
- Creating a Dev Catalog
- Unity Catalog Privileges
- Understanding Unity Catalog
- Creating and accessing External location and storage credential
- Managed and External Tables in Unity Catalog
- Working with Securable Objects
- Setup Unity Catalog
- Unity Catalog User Provisioning
Unity Catalog – Mini Project
- Create External Location
- Create Catalogs and Schema
- Create External Tables
- Create Managed Tables
- Create Databricks Workflow
- Data Discovery
- Data Audit
- Data Lineage
- Data Access Control Overview
- Data Access Control Demo
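A rough sketch of the mini-project setup above, issued as SQL from a notebook on a Unity Catalog-enabled cluster. The storage credential, external location URL, catalog/schema/table names and group names are all placeholders, and the statements assume the metastore and credential already exist.

```python
# Unity Catalog setup sketch: catalog, schema, external location, external table,
# and grants. All names and URLs are placeholders; run on a UC-enabled cluster.
spark.sql("CREATE CATALOG IF NOT EXISTS dev")
spark.sql("CREATE SCHEMA IF NOT EXISTS dev.sales")

# External location backed by a pre-created storage credential
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS sales_raw
  URL 'abfss://raw@<storage-account>.dfs.core.windows.net/sales'
  WITH (STORAGE CREDENTIAL sales_credential)
""")

# External table on that location
spark.sql("""
  CREATE TABLE IF NOT EXISTS dev.sales.orders_ext
  USING DELTA
  LOCATION 'abfss://raw@<storage-account>.dfs.core.windows.net/sales/orders'
""")

# Grants: engineers get the schema, analysts read one table
spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data-engineers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA dev.sales TO `data-engineers`")
spark.sql("GRANT SELECT ON TABLE dev.sales.orders_ext TO `analysts`")
```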
Spark Structured Streaming & Auto Loader in Databricks
- Spark Structured Streaming – basics
- Understanding micro batches and background query
- Supported Sources and Sinks
- WriteStream and checkpoints
- Community Edition Drop databases
- Understanding outputModes
- Understanding Triggers
- Auto Loader – Intro
- Auto Loader – Schema inference
- What is Auto Loader & Demo
- Auto Loader Schema Evolution
- How to build an incremental pipeline using Auto Loader
- Schema Evolution – Demo
- Schema Evolution – Practical
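As a reference for the Auto Loader topics above, here is a minimal incremental-ingestion sketch (Databricks-only, since cloudFiles is an Auto Loader source). Paths and the table name are placeholders.

```python
# Auto Loader sketch: stream new files from a landing folder into a bronze Delta
# table with schema inference and evolution. Paths and table name are placeholders.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/events")
          .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
          .load("/mnt/landing/events/"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/bronze/_checkpoints/events")
       .outputMode("append")
       .trigger(availableNow=True)      # process all pending files, then stop
       .toTable("bronze_events"))
```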
Databricks Incremental Ingestion Tools
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Auto Loader
- Auto Loader with Automatic Schema Evolution
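The COPY INTO approach above can be sketched as below: it loads only files it has not already ingested (so re-running is idempotent), and mergeSchema lets the target table evolve. The landing path, storage account and table name are placeholders.

```python
# COPY INTO sketch: idempotent, incremental file loading into a Delta table.
# Path, storage account and table name are placeholders.
spark.sql("CREATE TABLE IF NOT EXISTS bronze_raw_traffic")   # empty Delta table; schema comes from the files
spark.sql("""
  COPY INTO bronze_raw_traffic
  FROM 'abfss://landing@<storage-account>.dfs.core.windows.net/traffic/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true')
""")
```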
Notebook CI/CD via Azure DevOps with GitHub
- Integrate Databricks notebooks with Git providers like GitHub.
- Configure Continuous Integration – artifacts to be deployed to clusters.
- Configure Continuous Delivery using datathirst templates.
- Run notebooks on Azure Databricks via Jobs.
- Secure clusters via cluster policies and permissions
- Data Factory Linked Services
- Orchestrate notebooks via Data Factory
Project Details
- Typical Medallion Architecture
- Project Architecture
- Understanding the dataset
- Expected Setup
- Creating containers and External Locations
- Creating all schemas dynamically
- Creating bronze Tables Dynamically
Ingestion to Bronze
- Ingesting data to bronze layer – Demo
- Ingesting raw_traffic data to bronze table
- Assignment to get the raw_roads data to bronze table
- Ingesting raw_roads data to bronze Table
- Proving Auto Loader handles incremental loading
Silver & Gold Layer Transformation
- Transforming Silver Traffic data
- Proving that only incremental records are transformed
- Creating a common Notebook
- Run one notebook from another notebook
- Transforming Silver Roads data
- Getting data to Gold Layer
- Gold Layer Transformations and loading
Live Sessions Price:
For LIVE sessions – Offer price after discount: 125 USD (regular price 259 USD) or 9,900 INR (regular price 13,900 INR)
Sample Course Completion Certificate:
Your course completion certificate will look like this:
Course Features
- Duration: 40 hours
- Skill level: All levels
- Language: English
- Assessments: Yes