Azure Data Engineer with ADF, Synapse, Fabric, Databricks and PySpark – Live Training
(A complete hands-on program covering Microsoft Fabric, Synapse, Databricks, PySpark & DevOps pipelines)
Isha presents a comprehensive, hands-on training program focused on Microsoft Fabric, Azure Databricks, Apache Spark, and end-to-end data engineering. The curriculum spans critical areas such as Fabric Warehouse, Data Lakehouse architecture, PySpark transformations, Delta Lake operations, Unity Catalog security, and real-time data streaming. Learners gain expertise in modern data integration techniques, CI/CD workflows using Azure DevOps, and secure data governance. With a strong emphasis on Medallion architecture and real-world projects, our training equips professionals with the technical depth and practical skills required to succeed in today's data-driven industry.
Prerequisite for training: Knowledge of Python programming and SQL.
About the Instructor:
Expert in Microsoft Azure | 18+ Years of MNC Experience | 8+ Years of Training Excellence
Raj is a seasoned IT professional with over 18 years of hands-on experience in top Multinational Companies (MNCs), specializing in Microsoft Azure and cloud-based enterprise solutions. Throughout his career, he has successfully led multiple cloud transformation projects, DevOps implementations, and infrastructure modernization initiatives for global clients. With a deep understanding of Azure services, cloud architecture, security, and automation, Raj brings real-world expertise into the classroom. His technical depth is matched by his ability to simplify complex topics for learners of all levels. For the past 8 years, Raj has been dedicated to training and mentoring professionals and freshers alike. His passion for teaching and clarity in delivery have helped hundreds of learners gain the confidence and skills needed to excel in cloud computing and Microsoft Azure technologies.
Live Sessions Price:
For LIVE sessions – Offer price after discount is 200 USD (down from 340 USD / 300 USD) or INR 17,000 (down from INR 29,000 / 25,000)
What will I learn by the end of this course?
- Gain in-depth proficiency in Microsoft Fabric, including Warehouse setup, Dataflows Gen2, Access Control, Lakehouse vs Warehouse design decisions, and SQL analytics features like Time Travel and Zero Copy Clones.
- Master PySpark and Apache Spark SQL, covering DataFrames, window functions, joins, transformations, partitioning, UDFs, and advanced performance optimization techniques.
- Develop hands-on expertise in Azure Databricks, including cluster setup, DBFS, notebook management, REST API integration, Delta Lake fundamentals, and Medallion Architecture implementation.
- Learn Delta Lake features such as Schema Enforcement, Evolution, Time Travel, Vacuum, Optimize, Z-Ordering, and efficient ingestion using Auto Loader and Structured Streaming.
- Understand and configure Unity Catalog for enterprise-grade data governance with external locations, storage credentials, roles, permissions, and security layers like Row-Level and Column-Level Security.
- Implement real-world medallion architecture pipelines from raw to bronze, silver, and gold layers with practical exercises, optimized transformations, and data modeling for reporting.
Free Demo Session:
24th June @ 9 PM – 10 PM (IST) (Indian Timings)
24th June @ 11:30 AM – 12:30 PM (EST) (U.S. Timings)
24th June @ 4:30 PM – 5:30 PM (BST) (UK Timings)
Class Schedule:
For Participants in India: Monday to Friday 9 PM – 10 PM (IST)
For Participants in the US: Monday to Friday 11:30 AM – 12:30 PM (EST)
For Participants in the UK: Monday to Friday 4:30 PM – 5:30 PM (BST)
What students have to say about the Trainer:
Fantastic trainer! Each session was well-structured and full of actionable insights. – Smitha
The sessions were super interactive, and the trainer made even the most complex Azure components feel simple. I now understand Data Factory pipelines and workspace organization much better than before. – Chandu
Thank you for such an informative and well-organized training. – Swarna
Loved the way the trainer explained Azure Synapse and Databricks – very hands-on and easy to follow. – Anu
Excellent at maintaining engagement throughout. Every session felt well-paced and thoughtfully delivered. – Amaresh
I gained a lot more than I expected, mainly due to the trainer's teaching style and attention to individual progress. – Megha
Salient Features:
- 40 Hours of Live Training along with recorded videos
- Lifetime access to the recorded videos
- Course Completion Certificate
Who can enroll in this course?
- Data Engineers looking to deepen their skills in Microsoft Fabric, Databricks, and Delta Lake.
- Data Analysts and BI Developers aiming to transition into data engineering or work with large-scale analytics solutions.
- Software Developers wanting to learn big data processing using Apache Spark and PySpark.
- ETL Developers and Azure Data Factory users interested in advanced data orchestration and automation.
- DevOps Engineers and Cloud Engineers working with CI/CD pipelines, Git integration, and Azure DevOps.
- Database Administrators (DBAs) moving toward cloud-based data platforms.
- Anyone preparing for roles in modern data platforms, including Lakehouse and streaming data architectures.
Course syllabus:
Azure Data Factory + Synapse: 8 hrs
Microsoft Fabric: 10 hrs
PySpark: 8 hrs
Databricks: 14 hrs
Azure Data Factory
- What is Azure Data Factory?
- Create Azure Data Factory service
- Building Blocks of Data Factory
- ADF Provisioning
- Linked Services
ADF Activity, Control Flow & Copy Activity
- Lookup Activity
- Get Metadata Activity
- Filter Activity
- For Each Loop
- If else condition
- Execute Pipeline activity
- First Pipeline – Lookup / Set Variable / Datasets
- Foreach Activity – Processing Items In A Loop
- Using Stored Procedures With Lookup Activity
- Read File/Folder Properties Using Get Metadata Activity
- Validation Activity Vs Get Metadata Activity
- Conditional Execution Using IF Activity
- Copy Data Activity – Scenario 1
- Copy Data Activity – Scenario 2
- Assignment 1: Copy files from local filesystem to Azure SQL Database
- Assignment 2: Load a table from one db to another db based on a condition
- Project -LAB-1 Design & Build First Metadata Driven ETL Framework
- Project-LAB-2 Design & Build First Metadata Driven ETL Framework
- Using Wait Activity As A Timer
- Using Fail Activity To Raise Exceptions
- Using Append Activity With Array Variable
- Using Filter Activity To Selectively Process Files
- Using Delete Activity To Cleanup Files After Processing
- Copy A Single JSON File
- Copy A Single TEXT File
- Copy A Single PARQUET File
- Copy All Files In A Folder
- Binary File Copy & Implementing File Move
- File To Table Copy Using Insert & Upsert Techniques
- File To Table Copy Using Stored Procedure
- File To Table Copy – Large File Issue
- Table To File Copy
- Master-Child Pattern Using Execute Pipeline Activity
- Using Self Hosted Integration Runtime To Ingest On-Prem Data
- Parameterized Linked Service & ADF Global Parameters
- Automated Execution Of Pipeline Using Scheduled Triggers
- Event-Based Execution Of Pipeline Using Event Triggers
- ADF Limitation – No Iterative Activity Within Foreach Activity
- ADF Limitation – No Iterative Activity Within IF Activity
- ADF Limitation – No Iterative Activity Within SWITCH Activity
- Sequential Vs Parallel Batch In A Foreach Activity
- ADF Limitation – Dynamic Execution Of Pipeline
- ADF Limitation – Record Number & Data Size Restrictions
- ADF Limitation – Dynamic Variable Name In Set Variable
Project: Metadata Driven ETL Framework
- Set Up Azure Active Directory Users/Groups & Key Vault
- Set Up Azure Storage
- Set Up Azure SQL Database
- Set Up Additional Groups & Users
- ETL Framework Metadata Tables
- Set Up Azure Data Factory
- Modular & Reusable Design
- Generic Pipeline to Extract From SQL Database -1
- Generic Pipeline to Extract From SQL Database -2
- Generic Pipeline to Extract From SQL Database -3
- Use Case: Historical or Initial Load executed with a dynamic configuration approach
- Use Case: Incremental Load from Azure SQL to Azure Data Lake with a 10-minute SLA
Data Ingestion
- Data Ingestion – Integration Runtimes
- Data Ingestion – What is Self Hosted Integration Runtime
- Overview of On-premise data source and Datalake
- Downloading and installing Self Hosted IR in On-premise
- UPDATE – Self Hosted IR Files Access issue
- Creating and adding Secrets to Azure Key vault
- Creating Linked Service for Azure Key vault – Demo
- Creating Linked Service and Dataset for On-premise File
- UPDATE – Fix access issue – Create Azure VM and install Self-Hosted IR
- UPDATE – Fix 'host' is not allowed error
- Creating Linked Service and Dataset for Azure Datalake
- Creating Copy Activity to copy all files from On-premise to Azure
- Incremental data loading using Last Modified Date of File
- Incremental Load based on File Name – Demo
- Incremental Data loading based on Filename – Practical
Parameterize
- Parameterize Linked Service, DataSets, Pipeline
- Monitor Visually
- Azure Monitor
Real-Time Use Cases – Frequently Used in Projects
- Apply UPSERT in ADF using Copy Activity
- On-Prem to Azure Cloud Migration
- Remove Duplicate Records in ADF
- How to Handle NULLs in ADF
- Remove Specific Rows in a File using ADF
- Remove the First Few and Last Few Rows of a File using ADF
- Error Handling in Mapping Data Flows
- Get File Name From Source
- Copy Files based on last modified Date
- Build ETL Pipeline
- Modular & Reusable Design
- Passing Parent Pipeline Run ID & Parent Pipeline Name to Child Pipeline
- Slowly Changing Dimension Type I
- Lab: Slowly Changing Dimension Type 1
- Artifacts for Tables used in the Lab session of SCD Type 1
- Slowly Changing Dimension Type 2 (Concepts)
- Artifacts for Tables used in the Lab Session of SCD Type II
- Lab: Slowly Changing Dimension Type 2
Azure Synapse Analytics
- Why Warehousing in Cloud
- Traditional vs Modern Warehouse architecture
- What is Synapse Analytics Service
- Demo: Create Dedicated SQL Pool
- Demo: Connect Dedicated SQL Pool with SSMS
- Demo: Create Azure Synapse Analytics Studio Workspace
- Demo: Explore Synapse Studio V2
- Demo: Create Dedicated SQL Pool and Spark Pool
- Demo: Analyse Data using Dedicated SQL Pool
- Demo: Analyse Data using Apache Spark Notebook
- Demo: Analyse Data using Serverless SQL Pool
- Demo: Data Factory from Synapse Analytics Studio
- Demo: Monitor Synapse Studio
- Azure Synapse Benefits
Microsoft Fabric Introduction
- What is Microsoft Fabric?
- Fabric Signup
- Creating Fabric Workspace
- Fabric Pricing
- Creating storage account in Azure
- Creating Azure Synapse Analytics Service in Azure
- Evolution of Data Architectures
- Delta Lake Structure
- Why Microsoft Fabric is needed
- Microsoft’s definition of Fabric
- How to enable and access Microsoft Fabric
- Fabric License and costing
- Update in Fabric UI
- Experiences in Microsoft Fabric
- Fabric Terminology
- OneLake in Fabric
- One copy for all Computes in Microsoft Fabric
Fabric Lakehouse
- Understanding Fabric Workspaces
- Enable Fabric Trial and Create Workspace
- Purchasing Fabric Capacity from Azure
- Workspace roles in Microsoft Fabric
- Update in the create items UI
- Creating a Lakehouse
- What is inside lakehouse
- Uploading data to Lakehouse
- Uploading Folder into Lakehouse
- SQL analytics endpoint in Lakehouse
- Access SQL analytics endpoint using SSMS
- Visual Query in SQL endpoint
- Default Semantic Model
- OneLake File Explorer
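To give a feel for the Lakehouse topics above, here is a minimal PySpark sketch of loading an uploaded file into a Delta table from a Fabric notebook attached to the Lakehouse; the file path and table name are hypothetical.

```python
# Minimal sketch (Fabric notebook attached to a Lakehouse); "spark" is pre-defined there.
# "Files/sales/raw_sales.csv" and "sales_bronze" are hypothetical names.
df = (spark.read
           .option("header", True)
           .option("inferSchema", True)
           .csv("Files/sales/raw_sales.csv"))   # Files/ maps to the Lakehouse Files area

# Saving as a managed Delta table makes it queryable through the SQL analytics endpoint
# and the default semantic model.
df.write.mode("overwrite").format("delta").saveAsTable("sales_bronze")
```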
Fabric Data Factory
- Fabric Data Factory UI
- Ways to load data into Lakehouse
- Fabric Data Factory vs Azure Data Factory Scenario
- Gateway types in Microsoft Fabric
- Installing On-prem data gateway
- Create Connection to SQL Server
- Pipeline to ingest OnPrem SQL data to Lakehouse
- Scenario completed using Fabric data factory
- Dataflow Gen2 – Intro
- Creating DataFlow Gen2
- DataFlow Gen2 in Fabric vs Dataflows in ADF
OneLake in Fabric
- Shortcuts in Fabric – Intro
- Prerequisites to Create a shortcut
- Creating a shortcut in Files of Lakehouse
- Criteria to create shortcuts in the Tables section
- Uploading required files and access for Synapse
- Right way to create a shortcut in the Tables section
- Creating a delta file
- Creating a shortcut in the Tables section
- Scenario – Creating a shortcut with delta in a subfolder
- Scenario – Creating a shortcut with only parquet format
- Requirements to create shortcuts in the Tables and Files sections
- Update Scenario 1 – Lakehouse to Data Lake
- Update Scenario 2 – Data Lake to Lakehouse
- Shortcut deletion scenarios intro
- Deletion Scenario 1 – Delete in Lakehouse files
- Deletion Scenario 2 – Delete in ADLS
- Deletion Scenario 3 – Delete table data in Lakehouse
- Deletion Scenario 4 – Delete table data in ADLS
- Deletion Scenario 5 – Deleting entire shortcut
- Shortcut deleting scenario summary
Fabric Synapse Data Engineering
- Ingestion to Lakehouse status
- Spark in Microsoft Fabric
- Spark pools in Microsoft Fabric
- Spark pool node size
- Customizing Starter pools
- Creating a custom pool in Workspace
- Standard vs High Concurrency Sessions
- Changing Spark Settings to StarterPool
- Update in attaching Lakehouse to Notebook Option
- Understanding Notebooks UI
- Fabric Notebook basics
- MSSparkUtils – Intro
- MSSparkUtils – FS- Mount
- MSSparkUtils – FS – Other utils
- MSSparkUtils – FS – FastCp
- Creating Folders in Microsoft Fabric
- MSSparkUtils – Notebook Utils – Run exit
- MSSparkUtils – Notebook – RunMultiple
- Access ADLS data to Lakehouse – Intro
- Access ADLS using Entra ID
- Access ADLS using Service principal
- Access ADLS using SP with keyvault
- Call Fabric notebook from Fabric pipeline
- Managed vs External table – Intro
- Create a Managed Table
- Create an External Table
- Is a Shortcut Table an external or a managed table?
- Data Wrangler in Fabric Notebook
- Environments in Microsoft Fabric
- Understanding V-order optimization
- Spark Job Definition
- What is a data mesh
- Creating domains in Fabric
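As a quick illustration of the MSSparkUtils lectures above, a small sketch of the file-system and notebook utilities; the paths, notebook name and parameters are hypothetical.

```python
# Sketch of common MSSparkUtils calls inside a Fabric notebook.
from notebookutils import mssparkutils   # also available as a built-in object in Fabric

mssparkutils.fs.ls("Files/landing")                                 # list Lakehouse files
mssparkutils.fs.fastcp("Files/landing/", "Files/archive/", True)    # recursive fast copy

# Run a child notebook with parameters and capture the value it returns via notebook.exit()
result = mssparkutils.notebook.run("nb_transform", 600, {"run_date": "2024-01-01"})
print(result)
```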
Synapse Migration to Fabric
- Manual import from Synapse to Fabric
- Automated way to import and export notebooks – Intro
- Migrate all notebooks from Synapse to fabric
- Possibility of Migration of Pipelines to Fabric pipelines
- Ways to migrate ADLS data to Fabric OneLake
- Migrate ADLS data to Onelake using Storage Explorer
- Install Capacity Metrics App
- Understanding UI of Capacity Metrics App
- Capacity Units consumption
- Throttling vs Smoothing
- Throttling stage- Overage Protection Policy
- Other throttling stages
- Throttling stages Summary
- Overages in Fabric
- System Events in Fabric
- Matrix Visual
Fabric Synapse Data Warehouse
- Creating a Warehouse in Fabric
- Warehouse vs SQL Analytics Endpoint
- Creating a table and Limitations
- Ways to Load Data into Warehouse
- Loading Data using COPY INTO Command
- Loading Data using Pipeline to Warehouse
- Loading Data using DataFlow Gen2
- Data Sharing – Lakehouse & Warehouse
- Cross Database Ingestion in Warehouse
- Lakehouse vs Warehouse when to choose what
- Different Medallion Architectural patterns
- Update Lakehouse data from WH and vice versa
- SQL query as session in Fabric
- Zero Copy clone within and across Schema
- Time Travel in Warehouse
- Benefits & Limitations of Zero Copy clones
- Cloning single or multiple tables using UI
- Query Insights in Warehouse
Fabric Access Control and Permission
- Microsoft Fabric Structure
- Tenant Level permissions
- Capacity Level Permissions
- Creating new user in Entra ID
- Workspace roles- Workspace Administration
- Workspace roles – Data pipeline permissions
- Workspace Roles – Notebook, Spark jobs, etc
- Data Warehouse permissions – Intro
- Workspace Roles – Accessing shortcuts internal to fabric – Theory
- Workspace Roles – Accessing Shortcuts Internal to Fabric – Practical
- Workspace Roles – Accessing ADLS shortcuts – Theory
- Workspace Roles – Accessing ADLS shortcuts – Practical
- Workspace Roles – Lakehouse permissions
- Item level permissions – Intro
- Warehouse Sharing – No additional permissions
- Warehouse Sharing – ReadData permissions
- Warehouse Sharing – ReadAll permissions
- Warehouse Sharing – Build permissions
- Extend Microsoft Fabric Trial
- Lakehouse Sharing – All permissions
- Notebook – Item Sharing
- Manage OneLake data access
- Row-Level Security in Warehouse and SQL endpoint
- Dynamic Data Masking in Warehouse and SQL endpoint
- Column & Object level security in Warehouse and SQL endpoint
End to End project using Fabric
- Different Medallion architectures in Fabric
- Understanding domain and dataset information
- Project Architecture
- Creating workspace for project and review dataset
- Get data from Raw to landing – theory
- Raw to landing zone
- Different incremental loading patterns
- Incrementally ingest from Raw to landing zone
- Automate ingest from Raw to Landing using pipeline
- Ingest data from Landing to Bronze layer – Theory
- Understanding UPSERT logic for Landing to Bronze ingestion
- Landing to Bronze layer – practical
- Reading landing to bronze from next partition
- UPSERT scenario practical – Landing to bronze
- Bronze layer to Silver layer – Theory
- Understanding data transformations and UPSERT logic for Silver table
- Silver table – Data cleaning
- Silver Layer – data transformations
- Gold Layer – Facts and dimensions table – Theory
- Gold Layer – Facts and dimension tables – Practical
- Data modelling and creating a report
- Orchestrate end to end pipeline and execute it
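The UPSERT logic used between the layers boils down to a Delta MERGE. A minimal sketch, assuming a bronze Delta table named bronze_traffic, a landing Delta folder, and a hypothetical record_id key:

```python
# Sketch of the layer-to-layer UPSERT pattern with a Delta MERGE.
# Table name, source path and key column are hypothetical.
from delta.tables import DeltaTable

incoming = spark.read.format("delta").load("Files/landing/traffic")   # new/changed rows

bronze = DeltaTable.forName(spark, "bronze_traffic")
(bronze.alias("t")
       .merge(incoming.alias("s"), "t.record_id = s.record_id")
       .whenMatchedUpdateAll()          # update rows that already exist
       .whenNotMatchedInsertAll()       # insert rows seen for the first time
       .execute())
```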
GIT Integration
- Creating data sources for PROD
- Changes made to support Git integration
- Executing to check if changes were working
- Sign up with Azure DevOps account
- Connect Fabric workspace to Azure DevOps
- Git integration permissions and Limitations
- Locking main branch with branch policy
- Understanding Continuous Integration (CI) in Fabric
- Continuous Integration in Fabric Workspace
- Status of workspace created for feature branch
- Understanding Continuous Deployment in Fabric
- Deploying Fabric items from Dev to Prod
- Deployment rules to Change data sources of Prod workspace
- End to End execution in PROD
- Git integration for Power BI developers
PySpark
Apache Spark using SQL – Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Example of Working with DataFrames
- DataFrame with the Spark SQL Shell
- Spark DataFrame
- Working with DataFrame Rows
- Working with DataFrame Rows and Unit Tests
- Working with DataFrame Rows and Unstructured Data
- Working with DataFrame Columns
- DataFrame partition and Executors
- Creating and using UDF
- Aggregation in DataFrame
- Windowing in DataFrames
- Grouping Aggregation in DataFrames
- DataFrame joins
- Internal Joins & shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformations and Actions
- Spark Jobs, Stages & Tasks
- Understanding Execution Plans
- Unit Testing in Spark
- Debugging Spark Driver and Executors
- Spark Application logs in cluster
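A small PySpark sketch tying together a few of the topics above (aggregation, a window function and a UDF); the sample data and column names are illustrative only.

```python
# Minimal sketch of a DataFrame aggregation, a window function and a UDF.
from pyspark.sql import SparkSession, functions as F, Window
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

orders = spark.createDataFrame(
    [("C1", "2024-01-05", 120.0), ("C1", "2024-02-10", 80.0), ("C2", "2024-01-20", 200.0)],
    ["customer_id", "order_date", "amount"])

# Aggregation
totals = orders.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

# Window function: rank each customer's orders by amount
w = Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
ranked = orders.withColumn("rank", F.row_number().over(w))

# Simple UDF (prefer built-in functions where possible)
tier = F.udf(lambda amt: "high" if amt >= 100 else "low", StringType())
orders.withColumn("tier", tier("amount")).show()
```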
Assignment: Spark SQL Exercise
Apache Spark using SQL – Pre-defined Functions
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
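A short Spark SQL sketch of the pre-defined functions covered above (string, date, numeric, null handling and CASE/WHEN); the orders table and its columns are hypothetical.

```python
# Sketch of a few pre-defined functions in Spark SQL.
spark.sql("""
    SELECT upper(customer_name)               AS customer_name,
           date_format(order_date, 'yyyy-MM') AS order_month,
           round(order_amount, 2)             AS order_amount,
           coalesce(discount, 0)              AS discount,
           CASE WHEN order_amount >= 1000 THEN 'large'
                ELSE 'regular' END            AS order_size,
           cast(order_id AS STRING)           AS order_id
    FROM orders
""").show()
```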
Apache Spark using SQL – Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL – Inner
- Joining Tables using Spark SQL – Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
Apache Spark using SQL – Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL – Local
- Loading Data Into Spark Metastore Tables using Spark SQL – HDFS
- Loading Data into Spark Metastore Tables using Spark SQL – Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise – Managed Spark Metastore Tables
Apache Spark using SQL – DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
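A sketch of a dynamic partition insert into a partitioned Spark Metastore table, assuming Hive support is enabled; table and column names are hypothetical.

```python
# Sketch of dynamic partition mode with a partitioned Spark Metastore (Hive-format) table.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_part (
        order_id     INT,
        order_date   DATE,
        order_amount DOUBLE
    )
    PARTITIONED BY (order_month STRING)
    STORED AS PARQUET
""")

spark.sql("""
    INSERT INTO TABLE orders_part PARTITION (order_month)
    SELECT order_id,
           order_date,
           order_amount,
           date_format(order_date, 'yyyyMM') AS order_month   -- partition column comes last
    FROM orders
""")
```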
Azure Databricks
Introduction and Basic Understanding
- Creating a budget for project
- Creating an Azure Databricks Workspace
- Creating an Azure Datalake Storage Gen2
- Walkthrough of the Databricks Workspace UI
- Introduction to Distributed Data Processing
- What is Azure Databricks
- Azure Databricks Architecture
- Cluster types and configuration
- Behind the scenes when creating cluster
- Sign up for Databricks Community Edition
- Understanding notebook and Markdown basics
- Notebook – Magic Commands
- DBUtils – File System Utilities
- DBUtils – Widget Utilities
- DBUtils – Notebook Utils
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Cluster Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore – Setup external Metastore
- Metastore – Setup external Metastore II
- Hands-on: How to navigate to the databricks service?
- Hands-on: How to create a workspace?
- Hands-on: How to create a spark cluster?
- Hands-on: How to create a notebook?
- Hands-on: How to create a table?
- Hands-on: How to delete a spark cluster?
- Hands-on: How to delete all resources in Azure Cloud?
- What is a workspace?
- What is a Resource Group?
- What is the Databricks Runtime?
- What is a notebook?
- Hands-on: Using notebook to visualize data
- Hands-on: Set up Apache Spark with Delta Lake
- Hands-on: Using python to operate delta lake
- Hands-on: Download and install postman
- Hands-on: Generate a token
- Hands-on: Create a spark cluster using REST API
- Hands-on: Delete a spark cluster using REST API
- Hands-on: Permanently delete a spark cluster using REST API
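For the REST API hands-on items above, a sketch of creating a cluster with Python's requests library against the Clusters API; the workspace URL, token and node type are placeholders.

```python
# Sketch: create a Databricks cluster through the REST API (Clusters API 2.0 shown here).
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"   # hypothetical workspace URL
token = "<personal-access-token>"

payload = {
    "cluster_name": "demo-cluster",
    "spark_version": "13.3.x-scala2.12",     # pick a runtime available in your workspace
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())        # returns the new cluster_id
```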
Databricks Developer Tools with Hands on Session Example
- Databricks Notebook, REST API, Delta Lake – What are Databricks Developer Tools?
- Hands-on: Download and install python
- Hands-on: How to set up databricks cli?
- Hands-on: How to use databricks cli?
- Hands-on: How to use Databricks Utilities?
- Hands-on: Download and install JDK
- Hands-on: Download and install IntelliJ IDEA
- Hands-on: Using Databricks Utilities API Library in IDE
- Hands-on: How to use databricks in Azure Data Factory
- Hands-on: How to debug the notebook in pipeline?
- Hands-on: ETL with Azure Databricks
- Hands-on: How to debug ETL notebook in ETL pipeline?
Databricks CLI and Rest API
- DataBricks CLI
- Setting up Databricks CLI
- Lab : Workspace CLI
- Lab : Cluster CLI
- Lab : DBFS CLI
- Lab : Jobs CLI
- Databricks CLI on Windows
- REST API
- Lab : Invoke REST API
- Lab : Job REST API
- Lab : Token REST API
- Lab : Group API
Working with Databricks File System & Security
- Working with DBFS Root
- Mounting ADLS to DBFS
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Databricks Security
- Lab : Secret Management
- Part I – Column-Level Security
- Part II – Column-Level Security
- Row-Level Security
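A sketch of the classic ADLS-to-DBFS mount using a service principal and secrets pulled from a secret scope; the storage account, container, scope and key names are hypothetical.

```python
# Sketch: mount an ADLS Gen2 container to DBFS with a service principal.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("kv-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@mystorageacct.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

display(dbutils.fs.ls("/mnt/raw"))
```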
Delta Lake & Delta Table
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Creating databricks workspace and ADLS for delta lake
- Accessing Datalake storage using service principal
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Drawbacks of ADLS – practical
- Medallion Lakehouse architecture
- Creating Delta Lake
- Understanding the delta format
- Understanding Transaction Log
- Creating delta tables using SQL Command
- Creating Delta table using PySpark Code
- Uploading files for next lectures
- Schema Enforcement
- Schema Evolution
- Delta Table Time Travel
- Time Travel and Versioning
- Vacuum Command
- Convert to Delta
- Understanding Optimize Command – Demo
- Optimize Command – Practical
- UPSERT using MERGE
- Lab : Create Delta Table (SQL & Python)
- Lab : Read & Write Delta Table
- Lab : Convert a Parquet table to a Delta table
- Lab : Incremental ETL load
- Lab : Incremental ETL load (@version property)
- Convert Parquet to Delta
- Details of Delta Table Schema Validation
- Details of Delta Table Schema Evolution
- Look Inside Delta Table
- Delta Table Utilities and Optimization
- Processing XML, JSON & Delta Tables:
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table – Time Travel and Vacuum
- UDF using PySpark – hands-on example
- Spark ingestion
- Disk partitioning
- Storage
- Predicate Pushdown
- Serialization
- Bucketing
- Z-Ordering
- Adaptive Query Execution
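To make the Delta Lake features above concrete, a short sketch covering schema evolution, time travel, OPTIMIZE with Z-Ordering and VACUUM; the table name and sample data are hypothetical.

```python
# Sketch of a few Delta Lake operations covered above.
df = spark.createDataFrame([(1, "laptop", 1200.0)], ["id", "product", "amount"])
df.write.format("delta").mode("overwrite").saveAsTable("sales_delta")

# Schema evolution: append rows that carry an extra column
extra = spark.createDataFrame([(2, "phone", 650.0, "web")],
                              ["id", "product", "amount", "channel"])
extra.write.format("delta").mode("append") \
     .option("mergeSchema", "true").saveAsTable("sales_delta")

# Time travel: query the first version of the table
spark.sql("SELECT * FROM sales_delta VERSION AS OF 0").show()

# Compaction with Z-Ordering, then clean up unreferenced files
spark.sql("OPTIMIZE sales_delta ZORDER BY (id)")
spark.sql("VACUUM sales_delta")   # default retention: 7 days
```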
Unity Catalog
- What is Unity Catalog
- Creating Access Connector for Databricks
- Creating Metastore in Unity Catalog
- Unity Catalog Object Model
- Roles in Unity Catalog
- Creating users in Azure Entra ID
- User and groups management Practical
- Cluster Policies
- What are cluster pools
- Creating Cluster Pool
- Creating a Dev Catalog
- Unity Catalog Privileges
- Understanding Unity Catalog
- Creating and accessing External location and storage credential
- Managed and External Tables in Unity Catalog
- Working with Securable Objects
- Setup Unity Catalog
- Unity Catalog User Provisioning
Unity Catalog- Mini Project
- Create External Location
- Create Catalogs and Schema
- Create External Tables
- Create Managed Tables
- Create Databricks Workflow
- Data Discovery
- Data Audit
- Data Lineage
- Data Access Control Overview
- Data Access Control Demo
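A small sketch of Unity Catalog's three-level namespace and privilege grants, run from a notebook attached to a UC-enabled cluster; the catalog, schema, table and group names are hypothetical.

```python
# Sketch of Unity Catalog's catalog.schema.table namespace and a couple of grants.
spark.sql("CREATE CATALOG IF NOT EXISTS dev")
spark.sql("CREATE SCHEMA IF NOT EXISTS dev.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS dev.sales.orders (
        order_id INT,
        amount   DOUBLE
    )
""")

# Privileges are granted on securable objects to users or groups
spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA dev.sales TO `data_engineers`")
spark.sql("GRANT SELECT ON TABLE dev.sales.orders TO `data_engineers`")
```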
Spark Structured Streaming & Autoloader in Databricks
- Spark Structured Streaming – basics
- Understanding micro batches and background query
- Supported Sources and Sinks
- WriteStream and checkpoints
- Community Edition Drop databases
- Understanding outputModes
- Understanding Triggers
- Autoloader – Intro
- Autoloader – Schema inference
- What is Autoloader & Demo
- Autoloader Schema Evolution
- How to build incremental pipeline using Autoloader
- Schema Evolution – Demo
- Schema Evolution – Practical
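A minimal Auto Loader sketch for the incremental ingestion pattern above; the source path, schema/checkpoint locations and target table name are hypothetical.

```python
# Sketch of an incremental Auto Loader ingest into a Delta table.
stream = (spark.readStream
               .format("cloudFiles")
               .option("cloudFiles.format", "csv")
               .option("cloudFiles.schemaLocation", "/mnt/raw/_schemas/traffic")
               .option("header", "true")
               .load("/mnt/raw/traffic/"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/bronze/_checkpoints/traffic")
       .option("mergeSchema", "true")           # allow schema evolution on the sink
       .trigger(availableNow=True)              # process new files, then stop
       .toTable("bronze_traffic"))
```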
Databricks Incremental Ingestion Tools
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Autoloader
- Autoloader with Automatic Schema Evolution
Notebook CI/CD via Azure DevOps with GitHub
- Integrate Databricks notebooks with Git providers like GitHub.
- Configure Continuous Integration – artifacts to be deployed to clusters.
- Configure Continuous Delivery using datathirst templates.
- Run notebooks on Azure Databricks via Jobs.
- Secure clusters via cluster policies and permissions
- Data Factory Linked Services
- Orchestrate notebooks via Data Factory
Project Details
- Typical Medallion Architecture
- Project Architecture
- Understanding the dataset
- Expected Setup
- Creating containers and External Locations
- Creating all schemas dynamically
- Creating bronze Tables Dynamically
Ingestion to Bronze
- Ingesting data to bronze layer – Demo
- Ingesting raw_traffic data to bronze table
- Assignment to get the raw_roads data to bronze table
- Ingesting raw_roads data to bronze Table
- To prove autoloader handles incremental loading
Silver & Gold Layer Transformation
- Transforming Silver Traffic data
- To prove that only incremental records are transformed
- Creating a common Notebook
- Run one notebook from another notebook
- Transforming Silver Roads data
- Getting data to Gold Layer
- Gold Layer Transformations and loading
Live Sessions Price:
For LIVE sessions – Offer price after discount is 125 USD (down from 259 USD / 199 USD) or INR 9,900 (down from INR 13,900 / 11,900)
Sample Course Completion Certificate:
Your course completion certificate will look like this:
Typically, there is a one-day break following public sessions.
Important Note:
To maintain the quality of our training and ensure smooth progress for all learners, we do not allow batch repetition or switching between courses. Once you enroll in a batch, please make sure to attend the classes regularly as per the schedule. We kindly request you to plan your learning accordingly. Thank you for your support and understanding.
Course Features
- Duration 40 hours
- Skill level All levels
- Language English
- Assessments Yes