Azure Data Engineer with ADF, Synapse, Fabric, Databricks and PySpark – Live Training
(A complete hands-on program covering Microsoft Fabric, Synapse, Databricks, PySpark & DevOps pipelines)
Isha presents a comprehensive, hands-on training program focused on Microsoft Fabric, Azure Databricks, Apache Spark, and end-to-end data engineering. The curriculum spans critical areas such as Fabric Warehouse, Data Lakehouse architecture, PySpark transformations, Delta Lake operations, Unity Catalog security, and real-time data streaming. Learners gain expertise in modern data integration techniques, CI/CD workflows using Azure DevOps, and secure data governance. With a strong emphasis on Medallion architecture and real-world projects, the training equips professionals with the technical depth and practical skills required to succeed in today’s data-driven industry.
Prerequisites for the training: working knowledge of Python programming and SQL.
About the Instructor:
Raj is a seasoned data engineering professional with over 18 years of experience in multinational corporations (MNCs). Throughout his career, he has specialized in architecting and implementing scalable data solutions using Microsoft Azure services, including Azure Data Factory, Synapse Analytics, Databricks, and Microsoft Fabric. His expertise encompasses designing end-to-end data pipelines, optimizing data workflows, and leveraging cloud technologies to drive business intelligence and analytics.
With 8 years of dedicated training experience, Raj has successfully mentored numerous professionals in the field of data engineering. His teaching approach combines theoretical knowledge with practical, real-world applications, ensuring that learners are equipped with the skills necessary to excel in the industry. Known for his clear communication and hands-on training style, he has been instrumental in guiding students through complex concepts and preparing them for successful careers in data engineering.
Live Sessions Price:
For LIVE sessions – Offer price after discount is 209 USD (regular price 259 USD) or 17,000 INR (regular price 25,000 INR)
What will I learn by the end of this course?
- Gain in-depth proficiency in Microsoft Fabric, including Warehouse setup, Dataflows Gen2, Access Control, Lakehouse vs Warehouse design decisions, and SQL analytics features like Time Travel and Zero Copy Clones.
- Master PySpark and Apache Spark SQL, covering DataFrames, window functions, joins, transformations, partitioning, UDFs, and advanced performance optimization techniques.
- Develop hands-on expertise in Azure Databricks, including cluster setup, DBFS, notebook management, REST API integration, Delta Lake fundamentals, and Medallion Architecture implementation.
- Learn Delta Lake features such as Schema Enforcement, Evolution, Time Travel, Vacuum, Optimize, Z-Ordering, and efficient ingestion using Auto Loader and Structured Streaming.
- Implement real-world medallion architecture pipelines from raw to bronze, silver, and gold layers with practical exercises, optimized transformations, and data modeling for reporting.
Free Demo Session:
17th June @ 9 PM – 10 PM (IST) (Indian Timings)
17th June @ 11:30 AM – 12:30 PM (EST) (U.S. Timings)
17th June @ 4:30 PM – 5:30 PM (BST) (UK Timings)
Class Schedule:
For Participants in India: Monday to Friday, 9 PM – 10 PM (IST)
For Participants in the US: Monday to Friday, 11:30 AM – 12:30 PM (EST)
For Participants in the UK: Monday to Friday, 4:30 PM – 5:30 PM (BST)
What students have to say about the trainer:
Fantastic trainer! Each session was well-structured and full of actionable insights. – Smitha
The sessions were super interactive, and the trainer made even the most complex Azure components feel simple. I now understand Data Factory pipelines and workspace organization much better than before. – Chandu
Thank you for such an informative and well-organized training. – Swarna
Loved the way the trainer explained Azure Synapse and Databricks – very hands-on and easy to follow. – Anu
Excellent at maintaining engagement throughout. Every session felt well-paced and thoughtfully delivered. – Amaresh
I gained a lot more than I expected, mainly due to the trainer’s teaching style and attention to individual progress. – Megha
Salient Features:
- 40+ Hours of Live Training along with recorded videos
- Lifetime access to the recorded videos
- Course Completion Certificate
Who can enroll in this course?
- Data Engineers looking to deepen their skills in Microsoft Fabric, Databricks, and Delta Lake.
- Data Analysts and BI Developers aiming to transition into data engineering or work with large-scale analytics solutions.
- Software Developers wanting to learn big data processing using Apache Spark and PySpark.
- ETL Developers and Azure Data Factory users interested in advanced data orchestration and automation.
- DevOps Engineers and Cloud Engineers working with CI/CD pipelines, Git integration, and Azure DevOps.
- Database Administrators (DBAs) moving toward cloud-based data platforms.
- Anyone preparing for roles in modern data platforms, including Lakehouse and streaming data architectures.
Course syllabus:
Azure Data Factory + Synapse: 8 hrs
Microsoft Fabric: 10 hrs
PySpark: 8 hrs
Databricks: 14 hrs
Azure Data Factory
- What is Azure Data Factory?
- Create Azure Data Factory service
- Building Blocks of Data Factory
- ADF Provisioning
- Linked Services
ADF Activity, Control Flow & Copy Activity
- Lookup Activity
- Get Metadata Activity
- Filter Activity
- For Each Loop
- If else condition
- Execute Pipeline activity
- First Pipeline – Lookup / Set Variable / Datasets
- Foreach Activity – Processing Items In A Loop
- Using Stored Procedures With Lookup Activity
- Read File/Folder Properties Using Get Metadata Activity
- Validation Activity Vs Get Metadata Activity
- Conditional Execution Using IF Activity
- Copy Data Activity – Scenario 1
- Copy Data Activity – Scenario 2
- Assignment 1: Copy files from local filesystem to Azure SQL Database
- Assignment 2: Load a table from one database to another based on a condition
- Project – LAB-1: Design & Build First Metadata-Driven ETL Framework
- Project – LAB-2: Design & Build First Metadata-Driven ETL Framework
- Using Wait Activity As A Timer
- Using Fail Activity To Raise Exceptions
- Using Append Activity With Array Variable
- Using Filter Activity To Selectively Process Files
- Using Delete Activity To Cleanup Files After Processing
- Copy A Single JSON File
- Copy A Single TEXT File
- Copy A Single PARQUET File
- Copy All Files In A Folder
- Binary File Copy & Implementing File Move
- File To Table Copy Using Insert & Upsert Techniques
- File To Table Copy Using Stored Procedure
- File To Table Copy – Large File Issue
- Table To File Copy
- Master-Child Pattern Using Execute Pipeline Activity
- Using Self Hosted Integration Runtime To Ingest On-Prem Data
- Parameterized Linked Service & ADF Global Parameters
- Automated Execution Of Pipeline Using Scheduled Triggers
- Event-Based Execution Of Pipeline Using Event Triggers
- ADF Limitation – No Iterative Activity Within Foreach Activity
- ADF Limitation – No Iterative Activity Within IF Activity
- ADF Limitation – No Iterative Activity Within SWITCH Activity
- Sequential Vs Parallel Batch In A Foreach Activity
- ADF Limitation – Dynamic Execution Of Pipeline
- ADF Limitation – Record Number & Data Size Restrictions
- ADF Limitation – Dynamic Variable Name In Set Variable
Project: Metadata-Driven ETL Framework
- Set Up Azure Active Directory Users/Groups & Key Vault
- Set Up Azure Storage
- Set Up Azure SQL Database
- Set Up Additional Groups & Users
- ETL Framework Metadata Tables
- Set Up Azure Data Factory
- Modular & Reusable Design
- Generic Pipeline to Extract From SQL Database -1
- Generic Pipeline to Extract From SQL Database -2
- Generic Pipeline to Extract From SQL Database -3
- Use Case: Historical or Initial Load executed with a dynamic configuration approach
- Use Case: Incremental Load from Azure SQL to Azure Data Lake with a 10-minute SLA
Data Ingestion
- Data Ingestion – Integration Runtimes
- Data Ingestion – What is Self Hosted Integration Runtime
- Overview of On-premise data source and Datalake
- Downloading and installing Self Hosted IR in On-premise
- UPDATE – Self Hosted IR Files Access issue
- Creating and adding Secrets to Azure Key vault
- Creating Linked Service for Azure Key vault – Demo
- Creating Linked Service and Dataset for On-premise File
- UPDATE – Fix access issue – Create Azure VM and install Self-Hosted IR
- UPDATE – Fix ‘host is not allowed’ error
- Creating Linked Service and Dataset for Azure Datalake
- Creating Copy Activity to copy all files from On-premise to Azure
- Incremental data loading using Last Modified Date of File
- Incremental Load based on File Name – Demo
- Incremental Data loading based on Filename – Practical
Parameterize
- Parameterize Linked Service, DataSets, Pipeline
- Monitor Visually
- Azure Monitor
Real-Time Use Cases – Frequently used in projects
- Apply UPSERT in ADF using Copy Activity
- On-Prem to Azure Cloud Migration
- Remove Duplicate Records in ADF
- How to Handle NULLs in ADF
- Remove Specific Rows in a File using ADF
- Remove First Few Rows and Last Few Rows using ADF
- Error Handling in Mapping Data Flows
- Get File Name from Source
- Copy Files Based on Last Modified Date
- Build ETL Pipeline
- Modular & Reusable Design
- Passing Parent Pipeline Run ID & Parent Pipeline Name to Child Pipeline
- Slowly Changing Dimension Type 1
- Lab: Slowly Changing Dimension Type 1
- Artifacts for Tables used in the Lab Session of SCD Type 1
- Slowly Changing Dimension Type 2 (Concepts)
- Artifacts for Tables used in the Lab Session of SCD Type 2
- Lab: Slowly Changing Dimension Type 2
Azure Synapse Analytics
- Why Warehousing in Cloud
- Traditional vs Modern Warehouse architecture
- What is Synapse Analytics Service
- Demo: Create Dedicated SQL Pool
- Demo: Connect Dedicated SQL Pool with SSMS
- Demo: Create Azure Synapse Analytics Studio Workspace
- Demo: Explore Synapse Studio V2
- Demo: Create Dedicated SQL Pool and Spark Pool
- Demo: Analyse Data using Dedicated SQL Pool
- Demo: Analyse Data using Apache Spark Notebook
- Demo: Analyse Data using Serverless SQL Pool
- Demo: Data Factory from Synapse Analytics Studio
- Demo: Monitor Synapse Studio
Azure Synapse Benefits
- Introduction:
- What is Microsoft Fabric?
- Fabric Signup
- Creating Fabric Workspace
- Fabric Pricing
- Creating storage account in Azure
- Creating Azure Synapse Analytics Service in Azure
- Evolution of Data Architectures
- Delta Lake Structure
- Why Microsoft Fabric is needed
- Microsoft’s definition of Fabric
- How to enable and access Microsoft Fabric
- Fabric License and costing
- Update in Fabric UI
- Experiences in Microsoft Fabric
- Fabric Terminology
- OneLake in Fabric
- One copy for all Computes in Microsoft Fabric
Fabric Lakehouse
- Understanding Fabric Workspaces
- Enable Fabric Trial and Create Workspace
- Purchasing Fabric Capacity from Azure
- Workspace roles in Microsoft Fabric
- Update in the create items UI
- Creating a Lakehouse
- What is inside lakehouse
- Uploading data to Lakehouse
- Uploading Folder into Lakehouse
- SQL analytics endpoint in Lakehouse
- Access SQL analytics endpoint using SSMS
- Visual Query in SQL endpoint
- Default Semantic Model
- OneLake File Explorer
Fabric Data Factory
- Fabric Data Factory UI
- Ways to load data into Lakehouse
- Fabric Data Factory vs Azure Data Factory Scenario
- Gateway types in Microsoft Fabric
- Installing On-prem data gateway
- Create Connection to SQL Server
- Pipeline to ingest OnPrem SQL data to Lakehouse
- Scenario completed using Fabric data factory
- Dataflow Gen2 – Intro
- Creating DataFlow Gen2
- DataFlow Gen2 in Fabric vs Dataflows in ADF
OneLake in Fabric
- Shortcuts in Fabric – Intro
- Prerequisites to Create a shortcut
- Creating a shortcut in Files of Lakehouse
- Criteria to create shortcuts in table section
- Uploading required files and access for synapse
- Right way to create a shortcut in the Tables section
- Creating a Delta file
- Creating a shortcut in the Tables section
- Scenario – Creating shortcut with delta in a subfolder
- Scenario – Creating shortcut with only parquet format
- Requirements to create shortcuts in the Tables and Files sections
- Update Scenario 1 – Lakehouse to Data Lake
- Update Scenario 2 – Data Lake to Lakehouse
- Shortcut deletion scenarios intro
- Deletion Scenario 1 – Delete in Lakehouse files
- Deletion Scenario 2 – Delete in ADLS
- Deletion Scenario 3 – Delete table data in Lakehouse
- Deletion Scenario 4 – Delete table data in ADLS
- Deletion Scenario 5 – Deleting entire shortcut
- Shortcut deleting scenario summary
Fabric Synapse Data Engineering
- Ingestion to Lakehouse status
- Spark in Microsoft Fabric
- Spark pools in Microsoft Fabric
- Spark pool node size
- Customizing Starter pools
- Creating a custom pool in Workspace
- Standard vs High Concurrency Sessions
- Changing Spark Settings to StarterPool
- Update in attaching Lakehouse to Notebook Option
- Understanding Notebooks UI
- Fabric Notebook basics
- MSSparkUtils – Intro (see the sketch at the end of this section)
- MSSparkUtils – FS- Mount
- MSSparkUtils – FS – Other utils
- MSSparkUtils – FS – FastCp
- Creating Folders in Microsoft Fabric
- MSSparkUtils – Notebook Utils – Run exit
- MSSparkUtils – Notebook – RunMultiple
- Access ADLS data to Lakehouse – Intro
- Access ADLS using Entra ID
- Access ADLS using Service principal
- Access ADLS using SP with keyvault
- Call Fabric notebook from Fabric pipeline
- Managed vs External table – Intro
- Create a Managed Table
- Create an External Table
- Is a shortcut table an external or a managed table?
- Data Wrangler in Fabric Notebook
- Environments in Microsoft Fabric
- Understanding V-order optimization
- Spark Job Definition
- What is a data mesh
- Creating domains in Fabric
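To give a flavour of the MSSparkUtils lessons in this section, here is a minimal sketch of the file-system and notebook utilities, assuming a Microsoft Fabric Spark notebook with a Lakehouse attached (where `mssparkutils` is pre-loaded); the paths and the child notebook name are placeholders, not course code.

```python
# Assumes a Fabric Spark notebook with a Lakehouse attached; mssparkutils is pre-loaded.
# All paths and the child notebook name below are placeholders.

# File system utilities: list files and do a high-throughput copy (FastCp)
files = mssparkutils.fs.ls("Files/raw")
mssparkutils.fs.fastcp("Files/raw/sales.csv", "Files/staging/sales.csv")

# Notebook utilities: run a child notebook with parameters and capture its exit value
result = mssparkutils.notebook.run("child_notebook", 300, {"run_date": "2024-01-01"})
print(result)  # whatever the child notebook passed to mssparkutils.notebook.exit(...)
```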
Synapse Migration to Fabric
- Manual import from Synapse to Fabric
- Automated way to import and export notebooks – Intro
- Migrate all notebooks from Synapse to Fabric
- Possibility of Migration of Pipelines to Fabric pipelines
- Ways to migrate ADLS data to Fabric OneLake
- Migrate ADLS data to Onelake using Storage Explorer
- Install Capacity Metrics App
- Understanding UI of Capacity Metrics App
- Capacity Units consumption
- Throttling vs Smoothing
- Throttling stage- Overage Protection Policy
- Other throttling stages
- Throttling stages Summary
- Overages in Fabric
- System Events in Fabric
- Matrix Visual
Fabric Warehouse Synapse
- Creating a Warehouse in Fabric
- Warehouse vs SQL Analytics Endpoint
- Creating a table and Limitations
- Ways to Load Data into Warehouse
- Loading Data using COPY INTO Command
- Loading Data using Pipeline to Warehouse
- Loading Data using DataFlow Gen2
- Data Sharing – Lakehouse & Warehouse
- Cross Database Ingestion in Warehouse
- Lakehouse vs Warehouse when to choose what
- Different Medallion Architectural patterns
- Update Lakehouse data from WH and vice versa
- SQL query as session in Fabric
- Zero Copy clone within and across Schema
- Time Travel in Warehouse
- Benefits & Limitations of Zero Copy clones
- Cloning single or multiple tables using UI
- Query Insights in Warehouse
Fabric Access Control and Permission
- Microsoft Fabric Structure
- Tenant Level permissions
- Capacity Level Permissions
- Creating new user in Entra ID
- Workspace roles- Workspace Administration
- Workspace roles – Data pipeline permissions
- Workspace Roles – Notebook, Spark jobs, etc
- Data Warehouse permissions – Intro
- Workspace Roles – Accessing shortcuts internal to fabric – Theory
- Workspace Roles – Accessing Shortcuts Internal to Fabric – Practical
- Workspace Roles – Accessing ADLS shortcuts – Theory
- Workspace Roles – Accessing ADLS shortcuts – Practical
- Workspace Roles – Lakehouse permissions
- Item level permissions – Intro
- Warehouse Sharing – No additional permissions
- Warehouse Sharing – ReadData permissions
- Warehouse Sharing – ReadAll permissions
- Warehouse Sharing – Build permissions
- Extend Microsoft Fabric Trial
- Lakehouse Sharing – All permissions
- Notebook – Item Sharing
- Manage OneLake data access
- Row-Level Security in Warehouse and SQL endpoint
- Dynamic Data Masking in Warehouse and SQL endpoint
- Column & Object level security in Warehouse and SQL endpoint
End to End project using Fabric
- Different Medallion architectures in Fabric
- Understanding domain and dataset information
- Project Architecture
- Creating workspace for project and review dataset
- Get data from Raw to landing – theory
- Raw to landing zone
- Different incremental loading patterns
- Incrementally ingest from Raw to landing zone
- Automate ingest from Raw to Landing using pipeline
- Ingest data from Landing to Bronze layer – Theory
- Understanding UPSERT logic for Landing to Bronze ingestion (see the sketch at the end of this section)
- Landing to Bronze layer – practical
- Reading landing to bronze from next partition
- UPSERT scenario practical – Landing to bronze
- Bronze layer to Silver layer – Theory
- Understanding data transformations and UPSERT logic for Silver table
- Silver table – Data cleaning
- Silver Layer – data transformations
- Gold Layer – Facts and dimensions table – Theory
- Gold Layer – Facts and dimension tables – Practical
- Data modelling and creating a report
- Orchestrate end to end pipeline and execute it
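As a taste of the Landing-to-Bronze UPSERT logic referenced above, here is a hedged sketch using the Delta Lake merge API in a Fabric notebook (where `spark` is pre-defined). It is not the project code: the table name, landing path, and key column are invented for illustration.

```python
# Hypothetical Landing-to-Bronze UPSERT; table name, path, and key column are invented.
from delta.tables import DeltaTable

updates = spark.read.parquet("Files/landing/traffic/2024-01-01/")  # latest landing batch

bronze = DeltaTable.forName(spark, "bronze_traffic")
(
    bronze.alias("t")
    .merge(updates.alias("s"), "t.record_id = s.record_id")  # match on the business key
    .whenMatchedUpdateAll()      # update records that changed
    .whenNotMatchedInsertAll()   # insert brand-new records
    .execute()
)
```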
GIT Integration
- Creating data sources for PROD
- Changes made to support Git integration
- Executing to check if changes were working
- Sign up with Azure DevOps account
- Connect Fabric workspace to Azure DevOps
- Git integration permissions and Limitations
- Locking main branch with branch policy
- Understanding Continuous Integration (CI) in Fabric
- Continuous Integration in Fabric Workspace
- Status of workspace created for feature branch
- Understanding Continuous Deployment in Fabric
- Deploying Fabric items from Dev to Prod
- Deployment rules to Change data sources of Prod workspace
- End to End execution in PROD
- Git integration for Power BI developers
PySpark
Apache Spark using SQL – Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Example of working with DataFrames
- DataFrames with the Spark SQL shell
- Spark DataFrame
- Working with DataFrame Rows
- Working with DataFrame Rows and Unit Tests
- Working with DataFrame Rows and Unstructured Data
- Working with DataFrame Columns
- DataFrame Partitions and Executors
- Creating and Using UDFs
- Aggregations in DataFrames
- Windowing in DataFrames (see the sketch after this list)
- Grouping Aggregations in DataFrames
- DataFrame joins
- Internal Joins & shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformation and Actions
- Spark Jobs Stages & Task
- Understanding Execution plan
- Unit Testing in Spark
- Debugging Spark Driver and Executor
- Spark Application logs in cluster
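The sketch below ties several of the DataFrame lessons together (a window function, a grouping aggregation, and a UDF). It is a standalone PySpark example with made-up data, not code taken from the course materials.

```python
# Standalone PySpark example with made-up data: window function, aggregation, UDF.
from pyspark.sql import SparkSession, Window, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "north", 120.0), (2, "north", 80.0), (3, "south", 200.0)],
    ["order_id", "region", "amount"],
)

# Window function: rank orders by amount within each region
w = Window.partitionBy("region").orderBy(F.desc("amount"))
ranked = orders.withColumn("rank_in_region", F.rank().over(w))

# Grouping aggregation
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Simple UDF (prefer built-in functions where possible for performance)
band = F.udf(lambda amt: "high" if amt >= 100 else "low", StringType())
ranked.withColumn("amount_band", band(F.col("amount"))).show()
totals.show()
```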
Assignment: Spark SQL Exercise
Apache Spark using SQL – Pre-defined Functions
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
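A quick illustration of how several of these pre-defined functions look in practice, assuming a notebook or session where `spark` is already available; the inline `orders` data is invented for this example.

```python
# Illustrative pass over pre-defined functions via spark.sql; the data is made up.
spark.sql("""
    SELECT * FROM VALUES
        (1, 'north', 120.0),
        (2, 'south', CAST(NULL AS DOUBLE))
    AS orders(order_id, region, amount)
""").createOrReplaceTempView("orders")

spark.sql("""
    SELECT
        upper(region)                              AS region_uc,     -- string function
        date_format(current_date(), 'yyyy-MM')     AS load_month,    -- date function
        round(coalesce(amount, 0.0), 1)            AS amount_clean,  -- numeric + null handling
        cast(order_id AS STRING)                   AS order_id_str,  -- type conversion
        CASE WHEN amount IS NULL THEN 'unknown'
             WHEN amount >= 100  THEN 'high'
             ELSE 'low' END                        AS amount_band    -- CASE / WHEN
    FROM orders
""").show()
```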
Apache Spark using SQL – Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL – Inner
- Joining Tables using Spark SQL – Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
Apache Spark using SQL – Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL – Local
- Loading Data Into Spark Metastore Tables using Spark SQL – HDFS
- Loading Data into Spark Metastore Tables using Spark SQL – Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise – Managed Spark Metastore Tables
Apache Spark using SQL – DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
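A rough sketch of creating a partitioned metastore table and inserting into it in dynamic partition mode, assuming a Spark session (`spark`) backed by a metastore; the table and column names are illustrative only.

```python
# Assumes a Spark session (`spark`) with a metastore; names are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_part (
        order_id    INT,
        amount      DOUBLE,
        order_month STRING
    )
    USING PARQUET
    PARTITIONED BY (order_month)
""")

# Dynamic partition insert: partition values come from the data itself.
# (Hive-format tables additionally need hive.exec.dynamic.partition.mode=nonstrict.)
spark.sql("""
    INSERT INTO orders_part PARTITION (order_month)
    SELECT order_id, amount, order_month
    FROM VALUES (1, 120.0, '2024-01'), (2, 80.0, '2024-02')
         AS stage(order_id, amount, order_month)
""")

spark.sql("SHOW PARTITIONS orders_part").show()
```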
Azure Databricks
Introduction and Basic Understanding
- Creating a budget for project
- Creating an Azure Databricks Workspace
- Creating an Azure Datalake Storage Gen2
- Walkthrough of the Databricks Workspace UI
- Introduction to Distributed Data Processing
- What is Azure Databricks
- Azure Databricks Architecture
- Cluster types and configuration
- Behind the scenes when creating cluster
- Sign up for Databricks Community Edition
- Understanding notebook and Markdown basics
- Notebook – Magic Commands
- DBUtils – File System Utilities
- DBUtils – Widget Utilities
- DBUtils – Notebook Utils
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Cluster Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore – Setup external Metastore
- Metastore – Setup external Metastore II
- Hands-on: How to navigate to the Databricks service?
- Hands-on: How to create a workspace?
- Hands-on: How to create a spark cluster?
- Hands-on: How to create a notebook?
- Hands-on: How to create a table?
- Hands-on: How to delete a spark cluster?
- Hands-on: How to delete all resources in Azure Cloud?
- What is a workspace?
- What is a Resource Group?
- What is Databricks Runtime?
- What is a notebook?
- Hands-on: Using notebook to visualize data
- Hands-on: Set up Apache Spark with Delta Lake
- Hands-on: Using python to operate delta lake
- Hands-on: Download and install postman
- Hands-on: Generate a token
- Hands-on: Create a spark cluster using REST API
- Hands-on: Delete a spark cluster using REST API
- Hands-on: Permanently delete a spark cluster using REST API
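For reference, a hedged example of the cluster REST API calls listed above, using Python’s requests library; the workspace URL, token, runtime version, and node type are placeholders you would replace with your own values.

```python
# Illustrative Databricks Clusters REST API calls; all values below are placeholders.
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"                                # personal access token
headers = {"Authorization": f"Bearer {TOKEN}"}

# Create a small cluster
create = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers=headers,
    json={
        "cluster_name": "demo-cluster",
        "spark_version": "14.3.x-scala2.12",  # pick a runtime available in your workspace
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
    },
)
cluster_id = create.json()["cluster_id"]

# Terminate (delete) the cluster, then permanently delete it
requests.post(f"{HOST}/api/2.0/clusters/delete", headers=headers, json={"cluster_id": cluster_id})
requests.post(f"{HOST}/api/2.0/clusters/permanent-delete", headers=headers, json={"cluster_id": cluster_id})
```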
Databricks Developer Tools with Hands on Session Example
- Databricks Notebooks, REST API, Delta Lake – What are the Databricks Developer Tools?
- Hands-on: Download and install python
- Hands-on: How to set up databricks cli?
- Hands-on: How to use databricks cli?
- Hands-on: How to use Databricks Utilities?
- Hands-on: Download and install JDK
- Hands-on: Download and install IntelliJ IDEA
- Hands-on: Using Databricks Utilities API Library in IDE
- Hands-on: How to use databricks in Azure Data Factory
- Hands-on: How to debug the notebook in pipeline?
- Hands-on: ETL with Azure Databricks
- Hands-on: How to debug ETL notebook in ETL pipeline?
Databricks CLI and Rest API
- Databricks CLI
- Setting up Databricks CLI
- Lab: Workspace CLI
- Lab: Cluster CLI
- Lab: DBFS CLI
- Lab: Jobs CLI
- Databricks CLI on Windows
- REST API
- Lab: Invoke REST API
- Lab: Job REST API
- Lab: Token REST API
- Lab: Group API
Working with Databricks File System & Security
- Working with DBFS Root
- Mounting ADLS to DBFS
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Databricks Security
- Lab: Secret Management
- Column-level Security – Part I
- Column-level Security – Part II
- Row-level Security
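A typical pattern for the mounting and secret-management lessons above, assuming a Databricks notebook where `dbutils` is available; the secret scope, service principal IDs, and storage account names are placeholders, not values from the course.

```python
# Mount ADLS Gen2 to DBFS with a service principal whose secret lives in a secret scope.
# All names/IDs below are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="demo-scope", key="sp-secret"),  # secret management
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)
display(dbutils.fs.ls("/mnt/raw"))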
Delta Lake & Delta Table
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Creating databricks workspace and ADLS for delta lake
- Accessing Datalake storage using service principal
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Drawbacks of ADLS – practical
- Medallion Lakehouse architecture
- Creating Delta Lake
- Understanding the delta format
- Understanding Transaction Log
- Creating delta tables using SQL Command
- Creating Delta table using PySpark Code
- Uploading files for next lectures
- Schema Enforcement
- Schema Evolution
- Delta Table Time Travel
- Time Travel and Versioning
- Vacuum Command
- Convert to Delta
- Understanding Optimize Command – Demo
- Optimize Command – Practical
- UPSERT using MERGE
- Lab: Create Delta Table (SQL & Python)
- Lab: Read & Write Delta Table
- Lab: Convert a Parquet table to a Delta table
- Lab: Incremental ETL load
- Lab: Incremental ETL load (@version property)
- Convert Parquet to Delta
- Details of Delta Table Schema Validation
- Details of Delta Table Schema Evolution
- Look Inside Delta Table
- Delta Table Utilities and Optimization
- Processing XML, JSON & Delta Tables:
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table – Time Travel and Vacuum
- UDFs using PySpark – hands-on example
- Spark ingestion
- Disk partitioning
- Storage
- Predicate Pushdown
- Serialization
- Bucketing
- Z-Ordering (see the sketch after this list)
- Adaptive Query Execution
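A compact Delta Lake sketch touching several of the lessons above (create, MERGE upsert, time travel, OPTIMIZE/Z-Order, VACUUM), assuming a Databricks notebook where `spark` is pre-defined; the table and columns are examples only.

```python
# Assumes a Databricks notebook (spark pre-defined); table and columns are examples.
spark.sql("CREATE TABLE IF NOT EXISTS customers (id INT, name STRING, city STRING) USING DELTA")

# UPSERT with MERGE
spark.sql("""
    MERGE INTO customers AS t
    USING (SELECT 1 AS id, 'Asha' AS name, 'Pune' AS city) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Time travel: query an earlier version of the table
spark.sql("SELECT * FROM customers VERSION AS OF 0").show()

# Compaction / clustering, then clean up files older than the retention window
spark.sql("OPTIMIZE customers ZORDER BY (city)")
spark.sql("VACUUM customers RETAIN 168 HOURS")
```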
Unity Catalog
- What is Unity Catalog
- Creating Access Connector for Databricks
- Creating Metastore in Unity Catalog
- Unity Catalog Object Model
- Roles in Unity Catalog
- Creating users in Azure Entra ID
- User and groups management Practical
- Cluster Policies
- What are cluster pools
- Creating Cluster Pool
- Creating a Dev Catalog
- Unity Catalog Privileges
- Understanding Unity Catalog
- Creating and accessing External location and storage credential
- Managed and External Tables in Unity Catalog
- Working with Securable Objects
- Setup Unity Catalog
- Unity Catalog User Provisioning
Unity Catalog- Mini Project
- Create External Location
- Create Catalogs and Schema
- Create External Tables
- Create Managed Tables
- Create Databricks Workflow
- Data Discovery
- Data Audit
- Data Lineage
- Data Access Control Overview
- Data Access Control Demo
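An illustrative Unity Catalog snippet covering objects and access control, assuming a Databricks notebook run by a user with the necessary privileges and an external location already configured; all catalog, schema, table, group, and storage names are invented.

```python
# Assumes a Databricks notebook; requires appropriate Unity Catalog privileges and an
# external location covering the storage path. All names below are invented.
spark.sql("CREATE CATALOG IF NOT EXISTS dev")
spark.sql("CREATE SCHEMA IF NOT EXISTS dev.sales")

spark.sql("""
    CREATE TABLE IF NOT EXISTS dev.sales.orders_ext (id INT, amount DOUBLE)
    USING DELTA
    LOCATION 'abfss://data@<storage-account>.dfs.core.windows.net/orders'
""")

# Access control: grant read access to a group defined in the account console
spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data_readers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA dev.sales TO `data_readers`")
spark.sql("GRANT SELECT ON TABLE dev.sales.orders_ext TO `data_readers`")
```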
Spark Structured Streaming & Auto Loader in Databricks
- Spark Structured Streaming – basics
- Understanding micro batches and background query
- Supported Sources and Sinks
- WriteStream and checkpoints
- Community Edition Drop databases
- Understanding outputModes
- Understanding Triggers
- Auto Loader – Intro
- Auto Loader – Schema Inference
- What is Auto Loader & Demo
- Auto Loader Schema Evolution
- How to build an incremental pipeline using Auto Loader (see the sketch after this list)
- Schema Evolution – Demo
- Schema Evolution – Practical
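A minimal Auto Loader sketch of an incremental ingestion stream, assuming a Databricks notebook; the source path, checkpoint and schema locations, and the target table name are placeholders.

```python
# Assumes a Databricks notebook (spark pre-defined); paths and table names are placeholders.
raw_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/raw_traffic")  # schema inference + evolution
    .option("header", "true")
    .load("/mnt/landing/raw_traffic/")
)

(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/raw_traffic")  # enables exactly-once, incremental loads
    .outputMode("append")
    .trigger(availableNow=True)  # process all new files, then stop (batch-style incremental run)
    .toTable("bronze.raw_traffic")
)
```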
Databricks Incremental Ingestion Tools
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Auto Loader
- Auto Loader with Automatic Schema Evolution
Notebook CI/CD via Azure DevOps with GitHub
- Integrate Databricks notebooks with Git providers like GitHub.
- Configure Continuous Integration – artifacts to be deployed to clusters.
- Configure Continuous Delivery using DataThirst templates.
- Run notebooks on Azure Databricks via Jobs.
- Secure clusters via cluster policies and permissions
- Data Factory Linked Services
- Orchestrate notebooks via Data Factory
Project Details
- Typical Medallion Architecture
- Project Architecture
- Understanding the dataset
- Expected Setup
- Creating containers and External Locations
- Creating all schemas dynamically
- Creating bronze Tables Dynamically
Ingestion to Bronze
- Ingesting data to bronze layer – Demo
- Ingesting raw_traffic data to bronze table
- Assignment to get the raw_roads data to bronze table
- Ingesting raw_roads data to bronze Table
- Demonstrating that Auto Loader handles incremental loading
Silver & Gold Layer Transformation
- Transforming Silver Traffic data
- Verifying that only incremental records are transformed
- Creating a common Notebook
- Run one notebook from another notebook
- Transforming Silver Roads data
- Getting data to Gold Layer
- Gold Layer Transformations and loading
Live Sessions Price:
For LIVE sessions – Offer price after discount is 209 USD (regular price 259 USD) or 17,000 INR (regular price 25,000 INR)
Sample Course Completion Certificate:
Your course completion certificate looks like this:
Important Note:
To maintain the quality of our training and ensure smooth progress for all learners, we do not allow batch repetition or switching between courses. Once you enroll in a batch, please attend the classes regularly as per the schedule and plan your learning accordingly. Thank you for your support and understanding.
Course Features
- Lectures 305
- Quiz 0
- Duration 40 hours
- Skill level All levels
- Language English
- Students 0
- Assessments Yes
Curriculum
- 18 Sections
- 305 Lessons
- 40 Hours
- Azure Data Factory – 5 lessons
- ADF Activity, Control Flow & Copy Activity – 44 lessons
- Project: Metadata-Driven ETL Framework – 12 lessons
- Data Ingestion – 15 lessons
- Parameterize – 3 lessons
- Real-Time Use Cases – Frequently used in projects – 18 lessons
- Azure Synapse Analytics – 13 lessons
- Azure Synapse Benefits – 18 lessons
- Fabric Lakehouse – 14 lessons
- Fabric Data Factory – 11 lessons
- OneLake in Fabric – 20 lessons
- Fabric Synapse Data Engineering – 34 lessons
- Synapse Migration to Fabric – 18 lessons
- Fabric Warehouse Synapse – 18 lessons
- Fabric Access Control and Permission – 25 lessons
- End to End project using Fabric – 22 lessons
- GIT Integration – 15 lessons
- And many more