Azure Data Engineer with ADF, Synapse, Fabric, Databricks and PySpark – Live Training
(A complete hands-on program covering Microsoft Fabric, Synapse, Databricks, PySpark & DevOps pipelines)
Isha presents a comprehensive, hands-on training program focused on Microsoft Fabric, Azure Databricks, Apache Spark, and end-to-end data engineering. The curriculum spans critical areas such as Fabric Warehouse, Data Lakehouse architecture, PySpark transformations, Delta Lake operations, Unity Catalog security, and real-time data streaming. Learners gain expertise in modern data integration techniques, CI/CD workflows using Azure DevOps, and secure data governance. With a strong emphasis on Medallion architecture and real-world projects, the training equips professionals with the technical depth and practical skills required to succeed in today’s data-driven industry.
Prerequisites for the training: working knowledge of Python programming and SQL.
About the Instructor:
Raj is a seasoned data engineering professional with over 18 years of experience in multinational corporations (MNCs). Throughout his career, he has specialized in architecting and implementing scalable data solutions using Microsoft Azure services, including Azure Data Factory, Synapse Analytics, Databricks, and Microsoft Fabric. His expertise encompasses designing end-to-end data pipelines, optimizing data workflows, and leveraging cloud technologies to drive business intelligence and analytics.
With 8 years of dedicated training experience, Raj has successfully mentored numerous professionals in the field of data engineering. His teaching approach combines theoretical knowledge with practical, real-world applications, ensuring that learners are equipped with the skills necessary to excel in the industry. Known for his clear communication and hands-on training style, he has been instrumental in guiding students through complex concepts and preparing them for successful careers in data engineering.
Live Sessions Price:
For LIVE sessions – Offer price after discount is 209 USD (regular price 259 USD) or 17,000 INR (regular price 25,000 INR)
What will I learn by the end of this course?
- Gain in-depth proficiency in Microsoft Fabric, including Warehouse setup, Dataflows Gen2, Access Control, Lakehouse vs Warehouse design decisions, and SQL analytics features like Time Travel and Zero Copy Clones.
- Master PySpark and Apache Spark SQL, covering DataFrames, window functions, joins, transformations, partitioning, UDFs, and advanced performance optimization techniques.
- Develop hands-on expertise in Azure Databricks, including cluster setup, DBFS, notebook management, REST API integration, Delta Lake fundamentals, and Medallion Architecture implementation.
- Learn Delta Lake features such as Schema Enforcement, Evolution, Time Travel, Vacuum, Optimize, Z-Ordering, and efficient ingestion using Auto Loader and Structured Streaming.
- Implement real-world medallion architecture pipelines from raw to bronze, silver, and gold layers with practical exercises, optimized transformations, and data modeling for reporting.
Free Demo Session:
17th June @ 9 PM – 10 PM (IST) (Indian Timings)
17th June @ 11:30 AM – 12:30 PM (EST) (U.S. Timings)
17th June @ 4:30 PM – 5:30 PM (BST) (UK Timings)
Class Schedule:
For Participants in India: Monday to Friday, 9 PM – 10 PM (IST)
For Participants in the US: Monday to Friday, 11:30 AM – 12:30 PM (EST)
For Participants in the UK: Monday to Friday, 4:30 PM – 5:30 PM (BST)
What students have to say about the trainer:
Fantastic trainer! Each session was well-structured and full of actionable insights. – Smitha
The sessions were super interactive, and the trainer made even the most complex Azure components feel simple. I now understand Data Factory pipelines and workspace organization much better than before. – Chandu
Thank you for such an informative and well-organized training. – Swarna
Loved the way the trainer explained Azure Synapse and Databricks – very hands-on and easy to follow. – Anu
Excellent at maintaining engagement throughout. Every session felt well-paced and thoughtfully delivered. – Amaresh
I gained a lot more than I expected, mainly due to the trainer’s teaching style and attention to individual progress. – Megha
Salient Features:
- 40+ Hours of Live Training along with recorded videos
- Lifetime access to the recorded videos
- Course Completion Certificate
Who can enroll in this course?
- Data Engineers looking to deepen their skills in Microsoft Fabric, Databricks, and Delta Lake.
- Data Analysts and BI Developers aiming to transition into data engineering or work with large-scale analytics solutions.
- Software Developers wanting to learn big data processing using Apache Spark and PySpark.
- ETL Developers and Azure Data Factory users interested in advanced data orchestration and automation.
- DevOps Engineers and Cloud Engineers working with CI/CD pipelines, Git integration, and Azure DevOps.
- Database Administrators (DBAs) moving toward cloud-based data platforms.
- Anyone preparing for roles in modern data platforms, including Lakehouse and streaming data architectures.
Course syllabus:
Azure Data Factory + Synapse: 8 hrs
Microsoft Fabric: 10 hrs
PySpark: 8 hrs
Databricks: 14 hrs
Azure Data Factory
- What is Azure Data Factory?
- Create Azure Data Factory service
- Building Blocks of Data Factory
- ADF Provisioning
- Linked Services
ADF Activity, Control Flow & Copy Activity
- Lookup Activity
- Get Metadata Activity
- Filter Activity
- For Each Loop
- If else condition
- Execute Pipeline activity
- First Pipeline – Lookup / Set Variable / Datasets
- Foreach Activity – Processing Items In A Loop
- Using Stored Procedures With Lookup Activity
- Read File/Folder Properties Using Get Metadata Activity
- Validation Activity Vs Get Metadata Activity
- Conditional Execution Using IF Activity
- Copy Data Activity – Scenario 1
- Copy Data Activity – Scenario 2
- Assignment 1: Copy files from local filesystem to Azure SQL Database
- Assignment 2: Load a table from one database to another based on a condition
- Project – LAB-1: Design & Build First Metadata-Driven ETL Framework
- Project – LAB-2: Design & Build First Metadata-Driven ETL Framework
- Using Wait Activity As A Timer
- Using Fail Activity To Raise Exceptions
- Using Append Activity With Array Variable
- Using Filter Activity To Selectively Process Files
- Using Delete Activity To Cleanup Files After Processing
- Copy A Single JSON File
- Copy A Single TEXT File
- Copy A Single PARQUET File
- Copy All Files In A Folder
- Binary File Copy & Implementing File Move
- File To Table Copy Using Insert & Upsert Techniques
- File To Table Copy Using Stored Procedure
- File To Table Copy – Large File Issue
- Table To File Copy
- Master-Child Pattern Using Execute Pipeline Activity
- Using Self Hosted Integration Runtime To Ingest On-Prem Data
- Parameterized Linked Service & ADF Global Parameters
- Automated Execution Of Pipeline Using Scheduled Triggers
- Event-Based Execution Of Pipeline Using Event Triggers
- ADF Limitation – No Iterative Activity Within Foreach Activity
- ADF Limitation – No Iterative Activity Within IF Activity
- ADF Limitation – No Iterative Activity Within SWITCH Activity
- Sequential Vs Parallel Batch In A Foreach Activity
- ADF Limitation – Dynamic Execution Of Pipeline
- ADF Limitation – Record Number & Data Size Restrictions
- ADF Limitation – Dynamic Variable Name In Set Variable
Project: Metadata-Driven ETL Framework
- Set Up Azure Active Directory Users/Groups & Key Vault
- Set Up Azure Storage
- Set Up Azure SQL Database
- Set Up Additional Groups & Users
- ETL Framework Metadata Tables
- Set Up Azure Data Factory
- Modular & Reusable Design
- Generic Pipeline to Extract From SQL Database -1
- Generic Pipeline to Extract From SQL Database -2
- Generic Pipeline to Extract From SQL Database -3
- Use Case: Historical or Initial Load executed with a dynamic configuration approach
- Use Case: Incremental Load from Azure SQL to Azure Data Lake with a 10-minute SLA
Data Ingestion
- Data Ingestion – Integration Runtimes
- Data Ingestion – What is Self Hosted Integration Runtime
- Overview of On-premise data source and Datalake
- Downloading and installing Self Hosted IR in On-premise
- UPDATE – Self Hosted IR Files Access issue
- Creating and adding Secrets to Azure Key vault
- Creating Linked Service for Azure Key vault – Demo
- Creating Linked Service and Dataset for On-premise File
- UPDATE – Fix access issue – Create Azure VM and install Self-Hosted IR
- UPDATE – Fix ‘host is not allowed’ error
- Creating Linked Service and Dataset for Azure Datalake
- Creating Copy Activity to copy all files from On-premise to Azure
- Incremental data loading using Last Modified Date of File
- Incremental Load based on File Name – Demo
- Incremental Data loading based on Filename – Practical
Parameterize
- Parameterize Linked Service, DataSets, Pipeline
- Monitor Visually
- Azure Monitor
Real-Time Use Cases – Frequently used in projects
- Apply UPSERT in ADF using Copy Activity
- On-Prem to Azure Cloud Migration
- Remove Duplicate Records in ADF
- How to Handle NULLs in ADF
- Remove Specific Rows in a File using ADF
- Remove First Few Rows and Last Few Rows using ADF
- Error Handling in Mapping Data Flows
- Get File Name from Source
- Copy Files Based on Last Modified Date
- Build ETL Pipeline
- Modular & Reusable Design
- Passing Parent Pipeline Run ID & Parent Pipeline Name to Child Pipeline
- Slowly Changing Dimension Type 1
- Lab: Slowly Changing Dimension Type 1
- Artifacts for Tables used in the Lab Session of SCD Type 1
- Slowly Changing Dimension Type 2 (Concepts)
- Artifacts for Tables used in the Lab Session of SCD Type 2
- Lab: Slowly Changing Dimension Type 2
Azure Synapse Analytics
- Why Warehousing in Cloud
- Traditional vs Modern Warehouse architecture
- What is Synapse Analytics Service
- Demo: Create Dedicated SQL Pool
- Demo: Connect Dedicated SQL Pool with SSMS
- Demo: Create Azure Synapse Analytics Studio Workspace
- Demo: Explore Synapse Studio V2
- Demo: Create Dedicated SQL Pool and Spark Pool
- Demo: Analyse Data using Dedicated SQL Pool
- Demo: Analyse Data using Apache Spark Notebook
- Demo: Analyse Data using Serverless SQL Pool
- Demo: Data Factory from Synapse Analytics Studio
- Demo: Monitor Synapse Studio
Azure Synapse Benefits
- Introduction:
- What is Microsoft Fabric?
- Fabric Signup
- Creating Fabric Workspace
- Fabric Pricing
- Creating storage account in Azure
- Creating Azure Synapse Analytics Service in Azure
- Evolution of Data Architectures
- Delta Lake Structure
- Why Microsoft Fabric is needed
- Microsoft’s definition of Fabric
- How to enable and access Microsoft Fabric
- Fabric License and costing
- Update in Fabric UI
- Experiences in Microsoft Fabric
- Fabric Terminology
- OneLake in Fabric
- One copy for all Computes in Microsoft Fabric
Fabric Lakehouse
- Understanding Fabric Workspaces
- Enable Fabric Trial and Create Workspace
- Purchasing Fabric Capacity from Azure
- Workspace roles in Microsoft Fabric
- Update in the create items UI
- Creating a Lakehouse
- What is inside lakehouse
- Uploading data to Lakehouse
- Uploading Folder into Lakehouse
- SQL analytics endpoint in Lakehouse
- Access SQL analytics endpoint using SSMS
- Visual Query in SQL endpoint
- Default Semantic Model
- OneLake File Explorer
Fabric Data Factory
- Fabric Data Factory UI
- Ways to load data into Lakehouse
- Fabric Data Factory vs Azure Data Factory Scenario
- Gateway types in Microsoft Fabric
- Installing On-prem data gateway
- Create Connection to SQL Server
- Pipeline to ingest OnPrem SQL data to Lakehouse
- Scenario completed using Fabric data factory
- Dataflow Gen2 – Intro
- Creating DataFlow Gen2
- DataFlow Gen2 in Fabric vs Dataflows in ADF
OneLake in Fabric
- Shortcuts in Fabric – Intro
- Prerequisites to Create a shortcut
- Creating a shortcut in Files of Lakehouse
- Criteria to create shortcuts in table section
- Uploading required files and access for synapse
- Right way to create a shortcut in the Tables section
- Creating a Delta file
- Creating a shortcut in the Tables section
- Scenario – Creating shortcut with delta in a subfolder
- Scenario – Creating shortcut with only parquet format
- Requirements to create shortcuts in the Tables and Files sections
- Update Scenario 1 – Lakehouse to Data Lake
- Update Scenario 2 – Data Lake to Lakehouse
- Shortcut deletion scenarios intro
- Deletion Scenario 1 – Delete in Lakehouse files
- Deletion Scenario 2 – Delete in ADLS
- Deletion Scenario 3 – Delete table data in Lakehouse
- Deletion Scenario 4 – Delete table data in ADLS
- Deletion Scenario 5 – Deleting entire shortcut
- Shortcut deleting scenario summary
Fabric Synapse Data Engineering
- Ingestion to Lakehouse status
- Spark in Microsoft Fabric
- Spark pools in Microsoft Fabric
- Spark pool node size
- Customizing Starter pools
- Creating a custom pool in Workspace
- Standard vs High Concurrency Sessions
- Changing Spark Settings to StarterPool
- Update in attaching Lakehouse to Notebook Option
- Understanding Notebooks UI
- Fabric Notebook basics
- MSSparkUtils – Intro (see the sketch at the end of this section)
- MSSparkUtils – FS- Mount
- MSSparkUtils – FS – Other utils
- MSSparkUtils – FS – FastCp
- Creating Folders in Microsoft Fabric
- MSSparkUtils – Notebook Utils – Run exit
- MSSparkUtils – Notebook – RunMultiple
- Access ADLS data to Lakehouse – Intro
- Access ADLS using Entra ID
- Access ADLS using Service principal
- Access ADLS using SP with keyvault
- Call Fabric notebook from Fabric pipeline
- Managed vs External table – Intro
- Create a Managed Table
- Create an External Table
- Is a shortcut table an external or a managed table?
- Data Wrangler in Fabric Notebook
- Environments in Microsoft Fabric
- Understanding V-order optimization
- Spark Job Definition
- What is a data mesh
- Creating domains in Fabric
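To give a flavour of the MSSparkUtils lessons in this section, here is a minimal sketch of the file-system and notebook utilities, assuming a Microsoft Fabric Spark notebook with a Lakehouse attached (where `mssparkutils` is pre-loaded); the paths and the child notebook name are placeholders, not course code.

```python
# Assumes a Fabric Spark notebook with a Lakehouse attached; mssparkutils is pre-loaded.
# All paths and the child notebook name below are placeholders.

# File system utilities: list files and do a high-throughput copy (FastCp)
files = mssparkutils.fs.ls("Files/raw")
mssparkutils.fs.fastcp("Files/raw/sales.csv", "Files/staging/sales.csv")

# Notebook utilities: run a child notebook with parameters and capture its exit value
result = mssparkutils.notebook.run("child_notebook", 300, {"run_date": "2024-01-01"})
print(result)  # whatever the child notebook passed to mssparkutils.notebook.exit(...)
```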
Synapse Migration to Fabric
- Manual import from Synapse to Fabric
- Automated way to import and export notebooks – Intro
- Migrate all notebooks from Synapse to Fabric
- Possibility of Migration of Pipelines to Fabric pipelines
- Ways to migrate ADLS data to Fabric OneLake
- Migrate ADLS data to Onelake using Storage Explorer
- Install Capacity Metrics App
- Understanding UI of Capacity Metrics App
- Capacity Units consumption
- Throttling vs Smoothing
- Throttling stage- Overage Protection Policy
- Other throttling stages
- Throttling stages Summary
- Overages in Fabric
- System Events in Fabric
- Matrix Visual
Fabric Warehouse Synapse
- Creating a Warehouse in Fabric
- Warehouse vs SQL Analytics Endpoint
- Creating a table and Limitations
- Ways to Load Data into Warehouse
- Loading Data using COPY INTO Command
- Loading Data using Pipeline to Warehouse
- Loading Data using DataFlow Gen2
- Data Sharing – Lakehouse & Warehouse
- Cross Database Ingestion in Warehouse
- Lakehouse vs Warehouse when to choose what
- Different Medallion Architectural patterns
- Update Lakehouse data from WH and vice versa
- SQL query as session in Fabric
- Zero Copy clone within and across Schema
- Time Travel in Warehouse
- Benefits & Limitations of Zero Copy clones
- Cloning single or multiple tables using UI
- Query Insights in Warehouse
Fabric Access Control and Permission
- Microsoft Fabric Structure
- Tenant Level permissions
- Capacity Level Permissions
- Creating new user in Entra ID
- Workspace roles- Workspace Administration
- Workspace roles – Data pipeline permissions
- Workspace Roles – Notebook, Spark jobs, etc
- Data Warehouse permissions – Intro
- Workspace Roles – Accessing shortcuts internal to fabric – Theory
- Workspace Roles – Accessing Shortcuts Internal to Fabric – Practical
- Workspace Roles – Accessing ADLS shortcuts – Theory
- Workspace Roles – Accessing ADLS shortcuts – Practical
- Workspace Roles – Lakehouse permissions
- Item level permissions – Intro
- Warehouse Sharing – No additional permissions
- Warehouse Sharing – ReadData permissions
- Warehouse Sharing – ReadAll permissions
- Warehouse Sharing – Build permissions
- Extend Microsoft Fabric Trial
- Lakehouse Sharing – All permissions
- Notebook – Item Sharing
- Manage OneLake data access
- Row-Level Security in Warehouse and SQL endpoint
- Dynamic Data Masking in Warehouse and SQL endpoint
- Column & Object level security in Warehouse and SQL endpoint
End to End project using Fabric
- Different Medallion architectures in Fabric
- Understanding domain and dataset information
- Project Architecture
- Creating workspace for project and review dataset
- Get data from Raw to landing – theory
- Raw to landing zone
- Different incremental loading patterns
- Incrementally ingest from Raw to landing zone
- Automate ingest from Raw to Landing using pipeline
- Ingest data from Landing to Bronze layer – Theory
- Understanding UPSERT logic for Landing to Bronze ingestion (see the sketch at the end of this section)
- Landing to Bronze layer – practical
- Reading landing to bronze from next partition
- UPSERT scenario practical – Landing to bronze
- Bronze layer to Silver layer – Theory
- Understanding data transformations and UPSERT logic for Silver table
- Silver table – Data cleaning
- Silver Layer – data transformations
- Gold Layer – Facts and dimensions table – Theory
- Gold Layer – Facts and dimension tables – Practical
- Data modelling and creating a report
- Orchestrate end to end pipeline and execute it
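As a taste of the Landing-to-Bronze UPSERT logic referenced above, here is a hedged sketch using the Delta Lake merge API in a Fabric notebook (where `spark` is pre-defined). It is not the project code: the table name, landing path, and key column are invented for illustration.

```python
# Hypothetical Landing-to-Bronze UPSERT; table name, path, and key column are invented.
from delta.tables import DeltaTable

updates = spark.read.parquet("Files/landing/traffic/2024-01-01/")  # latest landing batch

bronze = DeltaTable.forName(spark, "bronze_traffic")
(
    bronze.alias("t")
    .merge(updates.alias("s"), "t.record_id = s.record_id")  # match on the business key
    .whenMatchedUpdateAll()      # update records that changed
    .whenNotMatchedInsertAll()   # insert brand-new records
    .execute()
)
```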
GIT Integration
- Creating data sources for PROD
- Changes made to support Git integration
- Executing to check if changes were working
- Sign up with Azure DevOps account
- Connect Fabric workspace to Azure DevOps
- Git integration permissions and Limitations
- Locking main branch with branch policy
- Understanding Continuous Integration (CI) in Fabric
- Continuous Integration in Fabric Workspace
- Status of workspace created for feature branch
- Understanding Continuous Deployment in Fabric
- Deploying Fabric items from Dev to Prod
- Deployment rules to Change data sources of Prod workspace
- End to End execution in PROD
- Git integration for Power BI developers
PySpark
Apache Spark using SQL – Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Example of working with DataFrames
- DataFrames with the Spark SQL shell
- Spark DataFrame
- Working with DataFrame Rows
- Working with DataFrame Rows and Unit Tests
- Working with DataFrame Rows and Unstructured Data
- Working with DataFrame Columns
- DataFrame Partitions and Executors
- Creating and Using UDFs
- Aggregations in DataFrames
- Windowing in DataFrames (see the sketch after this list)
- Grouping Aggregations in DataFrames
- DataFrame joins
- Internal Joins & shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformation and Actions
- Spark Jobs Stages & Task
- Understanding Execution plan
- Unit Testing in Spark
- Debugging Spark Driver and Executor
- Spark Application logs in cluster
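The sketch below ties several of the DataFrame lessons together (a window function, a grouping aggregation, and a UDF). It is a standalone PySpark example with made-up data, not code taken from the course materials.

```python
# Standalone PySpark example with made-up data: window function, aggregation, UDF.
from pyspark.sql import SparkSession, Window, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "north", 120.0), (2, "north", 80.0), (3, "south", 200.0)],
    ["order_id", "region", "amount"],
)

# Window function: rank orders by amount within each region
w = Window.partitionBy("region").orderBy(F.desc("amount"))
ranked = orders.withColumn("rank_in_region", F.rank().over(w))

# Grouping aggregation
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Simple UDF (prefer built-in functions where possible for performance)
band = F.udf(lambda amt: "high" if amt >= 100 else "low", StringType())
ranked.withColumn("amount_band", band(F.col("amount"))).show()
totals.show()
```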
Assignment: Spark SQL Exercise
Apache Spark using SQL – Pre-defined Functions
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
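A quick illustration of how several of these pre-defined functions look in practice, assuming a notebook or session where `spark` is already available; the inline `orders` data is invented for this example.

```python
# Illustrative pass over pre-defined functions via spark.sql; the data is made up.
spark.sql("""
    SELECT * FROM VALUES
        (1, 'north', 120.0),
        (2, 'south', CAST(NULL AS DOUBLE))
    AS orders(order_id, region, amount)
""").createOrReplaceTempView("orders")

spark.sql("""
    SELECT
        upper(region)                              AS region_uc,     -- string function
        date_format(current_date(), 'yyyy-MM')     AS load_month,    -- date function
        round(coalesce(amount, 0.0), 1)            AS amount_clean,  -- numeric + null handling
        cast(order_id AS STRING)                   AS order_id_str,  -- type conversion
        CASE WHEN amount IS NULL THEN 'unknown'
             WHEN amount >= 100  THEN 'high'
             ELSE 'low' END                        AS amount_band    -- CASE / WHEN
    FROM orders
""").show()
```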
Apache Spark using SQL – Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL – Inner
- Joining Tables using Spark SQL – Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
Apache Spark using SQL – Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL – Local
- Loading Data Into Spark Metastore Tables using Spark SQL – HDFS
- Loading Data into Spark Metastore Tables using Spark SQL – Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise – Managed Spark Metastore Tables
Apache Spark using SQL – DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
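A rough sketch of creating a partitioned metastore table and inserting into it in dynamic partition mode, assuming a Spark session (`spark`) backed by a metastore; the table and column names are illustrative only.

```python
# Assumes a Spark session (`spark`) with a metastore; names are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_part (
        order_id    INT,
        amount      DOUBLE,
        order_month STRING
    )
    USING PARQUET
    PARTITIONED BY (order_month)
""")

# Dynamic partition insert: partition values come from the data itself.
# (Hive-format tables additionally need hive.exec.dynamic.partition.mode=nonstrict.)
spark.sql("""
    INSERT INTO orders_part PARTITION (order_month)
    SELECT order_id, amount, order_month
    FROM VALUES (1, 120.0, '2024-01'), (2, 80.0, '2024-02')
         AS stage(order_id, amount, order_month)
""")

spark.sql("SHOW PARTITIONS orders_part").show()
```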
Azure Databricks
Introduction and Basic Understanding
- Creating a budget for project
- Creating an Azure Databricks Workspace
- Creating an Azure Datalake Storage Gen2
- Walkthrough of the Databricks Workspace UI
- Introduction to Distributed Data Processing
- What is Azure Databricks
- Azure Databricks Architecture
- Cluster types and configuration
- Behind the scenes when creating cluster
- Sign up for Databricks Community Edition
- Understanding notebook and Markdown basics
- Notebook – Magic Commands
- DBUtils – File System Utilities
- DBUtils – Widget Utilities
- DBUtils – Notebook Utils
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Cluster Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore – Setup external Metastore
- Metastore – Setup external Metastore II
- Hands-on: How to navigate to the Databricks service?
- Hands-on: How to create a workspace?
- Hands-on: How to create a spark cluster?
- Hands-on: How to create a notebook?
- Hands-on: How to create a table?
- Hands-on: How to delete a spark cluster?
- Hands-on: How to delete all resources in Azure Cloud?
- What is a workspace?
- What is a Resource Group?
- What is Databricks Runtime?
- What is a notebook?
- Hands-on: Using notebook to visualize data
- Hands-on: Set up Apache Spark with Delta Lake
- Hands-on: Using python to operate delta lake
- Hands-on: Download and install postman
- Hands-on: Generate a token
- Hands-on: Create a spark cluster using REST API
- Hands-on: Delete a spark cluster using REST API
- Hands-on: Permanently delete a spark cluster using REST API
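For reference, a hedged example of the cluster REST API calls listed above, using Python’s requests library; the workspace URL, token, runtime version, and node type are placeholders you would replace with your own values.

```python
# Illustrative Databricks Clusters REST API calls; all values below are placeholders.
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"                                # personal access token
headers = {"Authorization": f"Bearer {TOKEN}"}

# Create a small cluster
create = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers=headers,
    json={
        "cluster_name": "demo-cluster",
        "spark_version": "14.3.x-scala2.12",  # pick a runtime available in your workspace
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
    },
)
cluster_id = create.json()["cluster_id"]

# Terminate (delete) the cluster, then permanently delete it
requests.post(f"{HOST}/api/2.0/clusters/delete", headers=headers, json={"cluster_id": cluster_id})
requests.post(f"{HOST}/api/2.0/clusters/permanent-delete", headers=headers, json={"cluster_id": cluster_id})
```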
Databricks Developer Tools with Hands on Session Example
- Databricks Notebooks, REST API, Delta Lake – What are the Databricks Developer Tools?
- Hands-on: Download and install python
- Hands-on: How to set up databricks cli?
- Hands-on: How to use databricks cli?
- Hands-on: How to use Databricks Utilities?
- Hands-on: Download and install JDK
- Hands-on: Download and install IntelliJ IDEA
- Hands-on: Using Databricks Utilities API Library in IDE
- Hands-on: How to use databricks in Azure Data Factory
- Hands-on: How to debug the notebook in pipeline?
- Hands-on: ETL with Azure Databricks
- Hands-on: How to debug ETL notebook in ETL pipeline?
Databricks CLI and Rest API
- Databricks CLI
- Setting up Databricks CLI
- Lab: Workspace CLI
- Lab: Cluster CLI
- Lab: DBFS CLI
- Lab: Jobs CLI
- Databricks CLI on Windows
- REST API
- Lab: Invoke REST API
- Lab: Job REST API
- Lab: Token REST API
- Lab: Group API
Working with Databricks File System & Security
- Working with DBFS Root
- Mounting ADLS to DBFS
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Databricks Security
- Lab: Secret Management
- Column-level Security – Part I
- Column-level Security – Part II
- Row-level Security
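A typical pattern for the mounting and secret-management lessons above, assuming a Databricks notebook where `dbutils` is available; the secret scope, service principal IDs, and storage account names are placeholders, not values from the course.

```python
# Mount ADLS Gen2 to DBFS with a service principal whose secret lives in a secret scope.
# All names/IDs below are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="demo-scope", key="sp-secret"),  # secret management
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)
display(dbutils.fs.ls("/mnt/raw"))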
Delta Lake & Delta Table
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Creating databricks workspace and ADLS for delta lake
- Accessing Datalake storage using service principal
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Drawbacks of ADLS – practical
- Medallion Lakehouse architecture
- Creating Delta Lake
- Understanding the delta format
- Understanding Transaction Log
- Creating delta tables using SQL Command
- Creating Delta table using PySpark Code
- Uploading files for next lectures
- Schema Enforcement
- Schema Evolution
- Delta Table Time Travel
- Time Travel and Versioning
- Vacuum Command
- Convert to Delta
- Understanding Optimize Command – Demo
- Optimize Command – Practical
- UPSERT using MERGE
- Lab: Create Delta Table (SQL & Python)
- Lab: Read & Write Delta Table
- Lab: Convert a Parquet table to a Delta table
- Lab: Incremental ETL load
- Lab: Incremental ETL load (@version property)
- Convert Parquet to Delta
- Details of Delta Table Schema Validation
- Details of Delta Table Schema Evolution
- Look Inside Delta Table
- Delta Table Utilities and Optimization
- Processing XML, JSON & Delta Tables:
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table – Time Travel and Vacuum
- UDFs using PySpark – hands-on example
- Spark ingestion
- Disk partitioning
- Storage
- Predicate Pushdown
- Serialization
- Bucketing
- Z-Ordering (see the sketch after this list)
- Adaptive Query Execution
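A compact Delta Lake sketch touching several of the lessons above (create, MERGE upsert, time travel, OPTIMIZE/Z-Order, VACUUM), assuming a Databricks notebook where `spark` is pre-defined; the table and columns are examples only.

```python
# Assumes a Databricks notebook (spark pre-defined); table and columns are examples.
spark.sql("CREATE TABLE IF NOT EXISTS customers (id INT, name STRING, city STRING) USING DELTA")

# UPSERT with MERGE
spark.sql("""
    MERGE INTO customers AS t
    USING (SELECT 1 AS id, 'Asha' AS name, 'Pune' AS city) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Time travel: query an earlier version of the table
spark.sql("SELECT * FROM customers VERSION AS OF 0").show()

# Compaction / clustering, then clean up files older than the retention window
spark.sql("OPTIMIZE customers ZORDER BY (city)")
spark.sql("VACUUM customers RETAIN 168 HOURS")
```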
Unity Catalog
- What is Unity Catalog
- Creating Access Connector for Databricks
- Creating Metastore in Unity Catalog
- Unity Catalog Object Model
- Roles in Unity Catalog
- Creating users in Azure Entra ID
- User and groups management Practical
- Cluster Policies
- What are cluster pools
- Creating Cluster Pool
- Creating a Dev Catalog
- Unity Catalog Privileges
- Understanding Unity Catalog
- Creating and accessing External location and storage credential
- Managed and External Tables in Unity Catalog
- Working with Securable Objects
- Setup Unity Catalog
- Unity Catalog User Provisioning
Unity Catalog- Mini Project
- Create External Location
- Create Catalogs and Schema
- Create External Tables
- Create Managed Tables
- Create Databricks Workflow
- Data Discovery
- Data Audit
- Data Lineage
- Data Access Control Overview
- Data Access Control Demo
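An illustrative Unity Catalog snippet covering objects and access control, assuming a Databricks notebook run by a user with the necessary privileges and an external location already configured; all catalog, schema, table, group, and storage names are invented.

```python
# Assumes a Databricks notebook; requires appropriate Unity Catalog privileges and an
# external location covering the storage path. All names below are invented.
spark.sql("CREATE CATALOG IF NOT EXISTS dev")
spark.sql("CREATE SCHEMA IF NOT EXISTS dev.sales")

spark.sql("""
    CREATE TABLE IF NOT EXISTS dev.sales.orders_ext (id INT, amount DOUBLE)
    USING DELTA
    LOCATION 'abfss://data@<storage-account>.dfs.core.windows.net/orders'
""")

# Access control: grant read access to a group defined in the account console
spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data_readers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA dev.sales TO `data_readers`")
spark.sql("GRANT SELECT ON TABLE dev.sales.orders_ext TO `data_readers`")
```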
Spark Structured Streaming & Auto Loader in Databricks
- Spark Structured Streaming – basics
- Understanding micro batches and background query
- Supported Sources and Sinks
- WriteStream and checkpoints
- Community Edition Drop databases
- Understanding outputModes
- Understanding Triggers
- Auto Loader – Intro
- Auto Loader – Schema Inference
- What is Auto Loader & Demo
- Auto Loader Schema Evolution
- How to build an incremental pipeline using Auto Loader (see the sketch after this list)
- Schema Evolution – Demo
- Schema Evolution – Practical
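A minimal Auto Loader sketch of an incremental ingestion stream, assuming a Databricks notebook; the source path, checkpoint and schema locations, and the target table name are placeholders.

```python
# Assumes a Databricks notebook (spark pre-defined); paths and table names are placeholders.
raw_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/raw_traffic")  # schema inference + evolution
    .option("header", "true")
    .load("/mnt/landing/raw_traffic/")
)

(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/raw_traffic")  # enables exactly-once, incremental loads
    .outputMode("append")
    .trigger(availableNow=True)  # process all new files, then stop (batch-style incremental run)
    .toTable("bronze.raw_traffic")
)
```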
Databricks Incremental Ingestion Tools
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Auto Loader
- Auto Loader with Automatic Schema Evolution
Notebook CI/CD via Azure DevOps with GitHub
- Integrate Databricks notebooks with Git providers like GitHub.
- Configure Continuous Integration – artifacts to be deployed to clusters.
- Configure Continuous Delivery using DataThirst templates.
- Run notebooks on Azure Databricks via Jobs.
- Secure clusters via cluster policies and permissions
- Data Factory Linked Services
- Orchestrate notebooks via Data Factory
Project Details
- Typical Medallion Architecture
- Project Architecture
- Understanding the dataset
- Expected Setup
- Creating containers and External Locations
- Creating all schemas dynamically
- Creating bronze Tables Dynamically
Ingestion to Bronze
- Ingesting data to bronze layer – Demo
- Ingesting raw_traffic data to bronze table
- Assignment to get the raw_roads data to bronze table
- Ingesting raw_roads data to bronze Table
- Demonstrating that Auto Loader handles incremental loading
Silver & Gold Layer Transformation
- Transforming Silver Traffic data
- Verifying that only incremental records are transformed
- Creating a common Notebook
- Run one notebook from another notebook
- Transforming Silver Roads data
- Getting data to Gold Layer
- Gold Layer Transformations and loading
Live Sessions Price:
For LIVE sessions – Offer price after discount is 209 USD (regular price 259 USD) or 17,000 INR (regular price 25,000 INR)
Sample Course Completion Certificate:
Your course completion certificate looks like this:
Important Note:
To maintain the quality of our training and ensure smooth progress for all learners, we do not allow batch repetition or switching between courses. Once you enroll in a batch, please attend the classes regularly as per the schedule and plan your learning accordingly. Thank you for your support and understanding.
Course Features
- Lectures 305
- Quiz 0
- Duration 40 hours
- Skill level All levels
- Language English
- Students 0
- Assessments Yes
Curriculum
- 18 Sections
- 305 Lessons
- 40 Hours
- Azure Data Factory – 5 lessons
- ADF Activity, Control Flow & Copy Activity – 44 lessons
- Project: Metadata-Driven ETL Framework – 12 lessons
- Data Ingestion – 15 lessons
- Parameterize – 3 lessons
- Real-Time Use Cases – Frequently used in projects – 18 lessons
- Azure Synapse Analytics – 13 lessons
- Azure Synapse Benefits – 18 lessons
- Fabric Lakehouse – 14 lessons
- Fabric Data Factory – 11 lessons
- OneLake in Fabric – 20 lessons
- Fabric Synapse Data Engineering – 34 lessons
- Synapse Migration to Fabric – 18 lessons
- Fabric Warehouse Synapse – 18 lessons
- Fabric Access Control and Permission – 25 lessons
- End to End project using Fabric – 22 lessons
- GIT Integration – 15 lessons
- And many more