Azure Data Engineer with ADF, Synapse, Fabric, Databricks and PySpark – Live Training
(A complete hands-on program covering Microsoft Fabric, Synapse, Databricks, PySpark & DevOps pipelines)
Isha presents a comprehensive, hands-on training program focused on Microsoft Fabric, Azure Databricks, Apache Spark, and end-to-end data engineering. The curriculum spans critical areas such as Fabric Warehouse, Data Lakehouse architecture, PySpark transformations, Delta Lake operations, Unity Catalog security, and real-time data streaming. Learners gain expertise in modern data integration techniques, CI/CD workflows using Azure DevOps, and secure data governance. With a strong emphasis on Medallion architecture and real-world projects, our training equips professionals with the technical depth and practical skills required to succeed in today’s data-driven industry.
Prerequisites for training: knowledge of Python programming and SQL.
About the Instructor:
Expert in Microsoft Azure | 18+ Years of MNC Experience | 8+ Years of Training Excellence

Raj is a seasoned IT professional with over 18 years of hands-on experience in top multinational companies (MNCs), specializing in Microsoft Azure and cloud-based enterprise solutions. Throughout his career, he has successfully led multiple cloud transformation projects, DevOps implementations, and infrastructure modernization initiatives for global clients. With a deep understanding of Azure services, cloud architecture, security, and automation, Raj brings real-world expertise into the classroom. His technical depth is matched by his ability to simplify complex topics for learners of all levels. For the past 8 years, Raj has been dedicated to training and mentoring professionals and freshers alike. His passion for teaching and clarity in delivery have helped hundreds of learners gain the confidence and skills needed to excel in cloud computing and Microsoft Azure technologies.
Live Sessions Price:
For LIVE sessions – offer price after discount: 200 USD (marked down from 340 USD / 300 USD) or INR 17,000 (marked down from INR 29,000 / INR 25,000).
What will I learn by the end of this course?
- Gain in-depth proficiency in Microsoft Fabric, including Warehouse setup, Dataflows Gen2, Access Control, Lakehouse vs Warehouse design decisions, and SQL analytics features like Time Travel and Zero Copy Clones.
- Master PySpark and Apache Spark SQL, covering DataFrames, window functions, joins, transformations, partitioning, UDFs, and advanced performance optimization techniques.
- Develop hands-on expertise in Azure Databricks, including cluster setup, DBFS, notebook management, REST API integration, Delta Lake fundamentals, and Medallion Architecture implementation.
- Learn Delta Lake features such as Schema Enforcement, Evolution, Time Travel, Vacuum, Optimize, Z-Ordering, and efficient ingestion using Auto Loader and Structured Streaming.
- Understand and configure Unity Catalog for enterprise-grade data governance with external locations, storage credentials, roles, permissions, and security layers like Row-Level and Column-Level Security.
- Implement real-world medallion architecture pipelines from raw to bronze, silver, and gold layers with practical exercises, optimized transformations, and data modeling for reporting.
Free Demo Session:
24th June @ 9 PM – 10 PM (IST) (Indian Timings)
24th June @ 11:30 AM – 12:30 PM (EST) (U.S. Timings)
24th June @ 4:30 PM – 5:30 PM (BST) (UK Timings)
Class Schedule:
For Participants in India: Monday to Friday 9 PM – 10 PM (IST)
For Participants in US: Monday to Friday 11:30 AM – 12:30 PM (EST)
For Participants in UK: Monday to Friday 4:30 PM – 5:30 PM (BST)
What students have to say about the Trainer:
Fantastic trainer! Each session was well-structured and full of actionable insights. – Smitha
The sessions were super interactive, and the trainer made even the most complex Azure components feel simple. I now understand Data Factory pipelines and workspace organization much better than before. – Chandu
Thank you for such an informative and well-organized training. – Swarna
Loved the way the trainer explained Azure Synapse and Databricks—very hands-on and easy to follow. – Anu
Excellent at maintaining engagement throughout. Every session felt well-paced and thoughtfully delivered. – Amaresh
I gained a lot more than I expected, mainly due to the trainer’s teaching style and attention to individual progress. – Megha
Salient Features:
- 40 Hours of Live Training along with recorded videos
- Lifetime access to the recorded videos
- Course Completion Certificate
Who can enroll in this course?
- Data Engineers looking to deepen their skills in Microsoft Fabric, Databricks, and Delta Lake.
- Data Analysts and BI Developers aiming to transition into data engineering or work with large-scale analytics solutions.
- Software Developers wanting to learn big data processing using Apache Spark and PySpark.
- ETL Developers and Azure Data Factory users interested in advanced data orchestration and automation.
- DevOps Engineers and Cloud Engineers working with CI/CD pipelines, Git integration, and Azure DevOps.
- Database Administrators (DBAs) moving toward cloud-based data platforms.
- Anyone preparing for roles in modern data platforms, including Lakehouse and streaming data architectures.
Course syllabus:
Azure Data Factory + Synapse: 8 hrs
Microsoft Fabric: 10 hrs
PySpark: 8 hrs
Databricks: 14 hrs
Azure Data Factory
- What is Azure Data Factory?
- Create Azure Data Factory service
- Building Blocks of Data Factory
- ADF Provisioning
- Linked Services
ADF Activity, Control Flow & Copy Activity
- Lookup Activity
- Get Metadata Activity
- Filter Activity
- For Each Loop
- If else condition
- Execute Pipeline activity
- First Pipeline – Lookup / Set Variable / Datasets
- Foreach Activity – Processing Items In A Loop
- Using Stored Procedures With Lookup Activity
- Read File/Folder Properties Using Get Metadata Activity
- Validation Activity Vs Get Metadata Activity
- Conditional Execution Using IF Activity
- Copy Data Activity – Scenario 1
- Copy Data Activity – Scenario 2
- Assignment 1: Copy files from local filesystem to Azure SQL Database
- Assignment 2: Load a table from one db to another db based on a condition
- Project LAB-1 – Design & Build First Metadata Driven ETL Framework
- Project LAB-2 – Design & Build First Metadata Driven ETL Framework
- Using Wait Activity As A Timer
- Using Fail Activity To Raise Exceptions
- Using Append Activity With Array Variable
- Using Filter Activity To Selectively Process Files
- Using Delete Activity To Cleanup Files After Processing
- Copy A Single JSON File
- Copy A Single TEXT File
- Copy A Single PARQUET File
- Copy All Files In A Folder
- Binary File Copy & Implementing File Move
- File To Table Copy Using Insert & Upsert Techniques
- File To Table Copy Using Stored Procedure
- File To Table Copy – Large File Issue
- Table To File Copy
- Master-Child Pattern Using Execute Pipeline Activity
- Using Self Hosted Integration Runtime To Ingest On-Prem Data
- Parameterized Linked Service & ADF Global Parameters
- Automated Execution Of Pipeline Using Scheduled Triggers
- Event-Based Execution Of Pipeline Using Event Triggers
- ADF Limitation – No Iterative Activity Within Foreach Activity
- ADF Limitation – No Iterative Activity Within IF Activity
- ADF Limitation – No Iterative Activity Within SWITCH Activity
- Sequential Vs Parallel Batch In A Foreach Activity
- ADF Limitation – Dynamic Execution Of Pipeline
- ADF Limitation – Record Number & Data Size Restrictions
- ADF Limitation – Dynamic Variable Name In Set Variable
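Everything in this module is authored in the ADF UI, but the same pipelines can also be triggered and monitored from code, which is handy while testing activities and triggers. The snippet below is a minimal sketch using the azure-identity and azure-mgmt-datafactory Python packages; the subscription, resource group, factory, pipeline and parameter names are placeholders.

```python
# Minimal sketch: trigger and monitor an ADF pipeline run from Python.
# Requires azure-identity and azure-mgmt-datafactory; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group  = "rg-adf-training"
factory_name    = "adf-training-demo"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off the pipeline with a runtime parameter (e.g. a file name for Copy Activity).
run = adf_client.pipelines.create_run(
    resource_group, factory_name, "pl_copy_files",
    parameters={"sourceFileName": "sales_2024.csv"},
)

# Poll the run status until it completes.
status = adf_client.pipeline_runs.get(resource_group, factory_name, run.run_id)
print(status.status)   # InProgress / Succeeded / Failed
```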
Project: Metadata Driven ETL Framework
- Set Up Azure Active Directory Users/Groups & Key Vault
- Set Up Azure Storage
- Set Up Azure SQL Database
- Set Up Additional Groups & Users
- ETL Framework Metadata Tables
- Set Up Azure Data Factory
- Modular & Reusable Design
- Generic Pipeline to Extract From SQL Database -1
- Generic Pipeline to Extract From SQL Database -2
- Generic Pipeline to Extract From SQL Database -3
- Use Case : Historical or Initial Load executed with a dynamic configuration approach
- Use Case : Incremental Load from Azure SQL to Azure Data Lake with a 10-minute SLA
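The heart of this project is a control (metadata) table that describes every load, so one generic pipeline can serve many sources. The sketch below is a hypothetical PySpark rendering of that idea; the control rows, paths and table names are invented purely for illustration.

```python
# Hypothetical metadata-driven loop: each control-table row describes one
# source-to-target load, and a single generic routine processes them all.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

control_rows = [
    # source_path,               target_table,       load_type
    ("Files/landing/customers", "bronze.customers", "full"),
    ("Files/landing/orders",    "bronze.orders",    "incremental"),
]

for source_path, target_table, load_type in control_rows:
    df = spark.read.format("parquet").load(source_path)
    if load_type == "full":
        df.write.mode("overwrite").saveAsTable(target_table)
    else:
        # Incremental loads append only new data; watermark logic is omitted here.
        df.write.mode("append").saveAsTable(target_table)
```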
Data Ingestion
- Data Ingestion – Integration Runtimes
- Data Ingestion – What is Self Hosted Integration Runtime
- Overview of On-premise data source and Datalake
- Downloading and installing Self Hosted IR in On-premise
- UPDATE – Self Hosted IR Files Access issue
- Creating and adding Secrets to Azure Key vault
- Creating Linked Service for Azure Key vault – Demo
- Creating Linked Service and Dataset for On-premise File
- UPDATE – Fix access issue – Create Azure VM and install Self-Hosted IR
- UPDATE – Fix ‘host’ is not allowed error
- Creating Linked Service and Dataset for Azure Datalake
- Creating Copy Activity to copy all files from On-premise to Azure
- Incremental data loading using Last Modified Date of File
- Incremental Load based on File Name – Demo
- Incremental Data loading based on Filename – Practical
Parameterize
- Parameterize Linked Service, DataSets, Pipeline
- Monitor Visually
- Azure Monitor
Real-Time Use Cases – Frequently Used in Projects
- Apply UPSERT in ADF using Copy Activity
- On-Prem to Azure Cloud Migration
- Remove Duplicate Records in ADF
- How to Handle NULLs in ADF
- Remove Specific Rows from a File using ADF
- Remove the First and Last Few Rows using ADF
- Error Handling in Mapping Data Flows
- Get File Name From Source
- Copy Files based on last modified Date
- Build ETL Pipeline
- Modular & Reusable Design
- Passing Parent Pipeline Run ID & Parent Pipeline Name to Child Pipeline
- Slowly Changing Dimension Type I
- Lab: Slowly Changing Dimension Type 1
- Artifacts for Tables used in the Lab session of SCD Type 1
- Slowly Changing Dimension Type 2 (Concepts)
- Artifacts for Tables used in the Lab Session of SCD Type II
- Lab: Slowly Changing Dimension Type 2
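The SCD Type 2 labs above are built with ADF data flows, but the same idea can be expressed compactly with a Delta Lake MERGE, which is also how it resurfaces later in the Databricks modules. The sketch below is a simplified illustration using the delta-spark Python API; table and column names are placeholders and the change-detection condition is reduced to a single attribute.

```python
# Simplified SCD Type 2 sketch with Delta Lake: expire the current row and
# append a new version when an attribute changes. All names are illustrative.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.customer_updates")          # incoming changes
dim     = DeltaTable.forName(spark, "dw.dim_customer")     # existing dimension

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"), "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.city <> u.city",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append the new versions as current rows.
new_rows = (updates
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dw.dim_customer")
```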
Azure Synapse Analytics
- Why Warehousing in Cloud
- Traditional vs Modern Warehouse architecture
- What is Synapse Analytics Service
- Demo: Create Dedicated SQL Pool
- Demo: Connect Dedicated SQL Pool with SSMS
- Demo: Create Azure Synapse Analytics Studio Workspace
- Demo: Explore Synapse Studio V2
- Demo: Create Dedicated SQL Pool and Spark Pool
- Demo: Analyse Data using Dedicated SQL Pool
- Demo: Analyse Data using Apache Spark Notebook
- Demo: Analyse Data using Serverless SQL Pool
- Demo: Data Factory from Synapse Analytics Studio
- Demo: Monitor Synapse Studio
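Of the demos above, analysing data with an Apache Spark notebook reduces to a few lines of PySpark once a Spark pool is attached; `spark` is pre-defined in a Synapse notebook, and the storage account, container and path below are placeholders.

```python
# Minimal sketch for a Synapse Spark notebook: read Parquet files from ADLS Gen2
# into a DataFrame and run a quick aggregation. The abfss path is a placeholder.
df = spark.read.parquet(
    "abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/"
)

(df.groupBy("region")
   .count()
   .orderBy("count", ascending=False)
   .show())
```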
Azure Synapse Benefits
- Introduction:
- What is Microsoft Fabric?
- Fabric Signup
- Creating Fabric Workspace
- Fabric Pricing
- Creating storage account in Azure
- Creating Azure Synapse Analytics Service in Azure
- Evolution of Data Architectures
- Delta Lake Structure
- Why Microsoft Fabric is needed
- Microsoft’s definition of Fabric
- How to enable and access Microsoft Fabric
- Fabric License and costing
- Update in Fabric UI
- Experiences in Microsoft Fabric
- Fabric Terminology
- OneLake in Fabric
- One copy for all Computes in Microsoft Fabric
Fabric Lakehouse
- Understanding Fabric Workspaces
- Enable Fabric Trial and Create Workspace
- Purchasing Fabric Capacity from Azure
- Workspace roles in Microsoft Fabric
- Update in the create items UI
- Creating a Lakehouse
- What is inside lakehouse
- Uploading data to Lakehouse
- Uploading Folder into Lakehouse
- SQL analytics endpoint in Lakehouse
- Access SQL analytics endpoint using SSMS
- Visual Query in SQL endpoint
- Default Semantic Model
- OneLake File Explorer
Fabric Datafactory
- Fabric Data Factory UI
- Ways to load data into Lakehouse
- Fabric Data Factory vs Azure Data Factory Scenario
- Gateway types in Microsoft Fabric
- Installing On-prem data gateway
- Create Connection to SQL Server
- Pipeline to ingest OnPrem SQL data to Lakehouse
- Scenario completed using Fabric data factory
- Dataflow Gen2 – Intro
- Creating DataFlow Gen2
- DataFlow Gen2 in Fabric vs Dataflows in ADF
OneLake in Fabric
- Shortcuts in Fabric – Intro
- Prerequisites to Create a shortcut
- Creating a shortcut in Files of Lakehouse
- Criteria to create shortcuts in table section
- Uploading required files and access for synapse
- Right way to create a shortcut in the Tables section
- Creating a Delta file
- Creating a shortcut in the Tables section
- Scenario – Creating shortcut with delta in a subfolder
- Scenario – Creating shortcut with only parquet format
- Requirements to create shortcuts in Table and files section
- Update Scenario 1 – Lakehouse to Datalake
- Update Scenario 2 – Datalake to Lakehouse
- Shortcut deletion scenarios intro
- Deletion Scenario 1 – Delete in Lakehouse files
- Deletion Scenario 2 – Delete in ADLS
- Deletion Scenario 3 – Delete table data in Lakehouse
- Deletion Scenario 4 – Delete table data in ADLS
- Deletion Scenario 5 – Deleting entire shortcut
- Shortcut deleting scenario summary
Fabric Synapse Data Engineering
- Ingestion to Lakehouse status
- Spark in Microsoft Fabric
- Spark pools in Microsoft Fabric
- Spark pool node size
- Customizing Starter pools
- Creating a custom pool in Workspace
- Standard vs High Concurrency Sessions
- Changing Spark Settings to StarterPool
- Update in attaching Lakehouse to Notebook Option
- Understanding Notebooks UI
- Fabric Notebook basics
- MSSparkUtils – Intro
- MSSparkUtils – FS- Mount
- MSSparkUtils – FS – Other utils
- MSSparkUtils – FS – FastCp
- Creating Folders in Microsoft Fabric
- MSSparkUtils – Notebook Utils – Run exit
- MSSparkUtils – Notebook – RunMultiple
- Access ADLS data to Lakehouse – Intro
- Access ADLS using Entra ID
- Access ADLS using Service principal
- Access ADLS using SP with keyvault
- Call Fabric notebook from Fabric pipeline
- Managed vs External table – Intro
- Create a Managed Table
- Create an External Table
- Shortcut Table is an external or managed table
- Data Wrangler in Fabric Notebook
- Environments in Microsoft Fabric
- Understanding V-order optimization
- Inspire us with your Thoughts
- Spark Job Definition
- What is a data mesh
- Creating domains in Fabric
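MSSparkUtils is covered hands-on in this module; the calls below are a minimal sketch of its file-system and notebook utilities as used from a Fabric notebook. The paths, the child notebook name and its parameters are placeholders.

```python
# Minimal sketch of MSSparkUtils usage inside a Fabric notebook.
# Paths and the child notebook name are placeholders.
from notebookutils import mssparkutils

# File-system utilities against the attached Lakehouse.
files = mssparkutils.fs.ls("Files/raw")          # list files
mssparkutils.fs.cp("Files/raw/sales.csv",        # copy a file
                   "Files/staging/sales.csv")

# Notebook utilities: run a child notebook with parameters and read its exit value.
result = mssparkutils.notebook.run("nb_transform_sales", 600, {"run_date": "2024-06-01"})
print(result)
```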
Synapse Migration to Fabric
- Manual import from Synapse to Fabric
- Automated way to import and export notebooks – Intro
- Migrate all notebooks from Synapse to fabric
- Possibility of Migration of Pipelines to Fabric pipelines
- Ways to migrate ADLS data to Fabric OneLake
- Migrate ADLS data to Onelake using Storage Explorer
- Install Capacity Metrics App
- Understanding UI of Capacity Metrics App
- Capacity Units consumption
- Throttling vs Smoothing
- Throttling stage- Overage Protection Policy
- Other throttling stages
- Throttling stages Summary
- Overages in Fabric
- System Events in Fabric
- Matrix Visual
Fabric Warehouse Synapse
- Creating a Warehouse in Fabric
- Warehouse vs SQL Analytics Endpoint
- Creating a table and Limitations
- Ways to Load Data into Warehouse
- Loading Data using COPY INTO Command
- Loading Data using Pipeline to Warehouse
- Loading Data using DataFlow Gen2
- Data Sharing – Lakehouse & Warehouse
- Cross Database Ingestion in Warehouse
- Lakehouse vs Warehouse when to choose what
- Different Medallion Architectural patterns
- Update Lakehouse data from WH and vice versa
- SQL query as session in Fabric
- Zero Copy clone within and across Schema
- Time Travel in Warehouse
- Benefits & Limitations of Zero Copy clones
- Cloning single or multiple tables using UI
- Query Insights in Warehouse
Fabric Access Control and Permission
- Microsoft Fabric Structure
- Tenant Level permissions
- Capacity Level Permissions
- Creating new user in Entra ID
- Workspace roles- Workspace Administration
- Workspace roles – Data pipeline permissions
- Workspace Roles – Notebook, Spark jobs, etc
- Data Warehouse permissions – Intro
- Workspace Roles – Accessing shortcuts internal to fabric – Theory
- Workspace Roles – Accessing Shortcuts Internal to Fabric – Practical
- Workspace Roles – Accessing ADLS shortcuts – Theory
- Workspace Roles – Accessing ADLS shortcuts – Practical
- Workspace Roles – Lakehouse permissions
- Item level permissions – Intro
- Warehouse Sharing – No additional permissions
- Warehouse Sharing – ReadData permissions
- Warehouse Sharing – ReadAll permissions
- Warehouse Sharing – Build permissions
- Extend Microsoft Fabric Trial
- Lakehouse Sharing – All permissions
- Notebook – Item Sharing
- Manage OneLake data access
- Row-Level Security in Warehouse and SQL endpoint
- Dynamic Data Masking in Warehouse and SQL endpoint
- Column & Object level security in Warehouse and SQL endpoint
End to End project using Fabric
- Different Medallion architectures in Fabric
- Understanding domain and dataset information
- Project Architecture
- Creating workspace for project and review dataset
- Get data from Raw to landing – theory
- Raw to landing zone
- Different incremental loading patterns
- Incrementally ingest from Raw to landing zone
- Automate ingest from Raw to Landing using pipeline
- Ingest data from Landing to Bronze layer – Theory
- Understanding UPSERT logic for Landing to Bronze ingestion
- Landing to Bronze layer – practical
- Reading landing to bronze from next partition
- UPSERT scenario practical – Landing to bronze
- Bronze layer to Silver layer – Theory
- Understanding data transformations and UPSERT logic for Silver table
- Silver table – Data cleaning
- Silver Layer – data transformations
- Gold Layer – Facts and dimensions table – Theory
- Gold Layer – Facts and dimension tables – Practical
- Data modelling and creating a report
- Orchestrate end to end pipeline and execute it
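To give a flavour of the silver-layer work in this project, the sketch below shows the kind of cleaning and deduplication applied when promoting bronze data in a Fabric notebook; the table and column names are invented for illustration and `spark` is the notebook's built-in session.

```python
# Illustrative bronze-to-silver transformation: standardise types, drop
# duplicates, and write a Delta table. Names are placeholders.
from pyspark.sql import functions as F

bronze = spark.read.table("bronze_traffic")

silver = (bronze
          .dropDuplicates(["record_id"])
          .withColumn("event_time", F.to_timestamp("event_time"))
          .withColumn("vehicle_count", F.col("vehicle_count").cast("int"))
          .filter(F.col("vehicle_count").isNotNull()))

silver.write.format("delta").mode("overwrite").saveAsTable("silver_traffic")
```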
GIT Integration
- Creating data sources for PROD
- Changes made to support Git integration
- Executing to check if changes were working
- Sign up with Azure DevOps account
- Connect Fabric workspace to Azure DevOps
- Git integration permissions and Limitations
- Locking main branch with branch policy
- Understanding Continuous Integration (CI) in Fabric
- Continuous Integration in Fabric Workspace
- Status of workspace created for feature branch
- Understanding Continuous Deployment in Fabric
- Deploying Fabric items from Dev to Prod
- Deployment rules to Change data sources of Prod workspace
- End to End execution in PROD
- Git integration for Power BI developers
PySpark
Apache Spark using SQL – Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Example of working with DataFrames
- DataFrame with Spark SQL shell
- Spark DataFrame
- Working with DataFrame rows
- Working with DataFrame rows and unit tests
- Working with DataFrame rows and unstructured data
- Working with DataFrame columns
- DataFrame partition and Executors
- Creating and using UDF
- Aggregation in DataFrame
- Windowing in DataFrames
- Grouping Aggregation in DataFrames
- DataFrame joins
- Internal Joins & shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformation and Actions
- Spark Jobs, Stages & Tasks
- Understanding Execution plan
- Unit Testing in Spark
- Debugging Spark Driver and Executor
- Spark Application logs in cluster
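Most of the DataFrame topics above come together in small exercises like the one sketched below: a join followed by a window function to rank rows within a group. The sample data and column names are assumptions made for the example.

```python
# Small illustrative exercise: join two DataFrames, then rank orders per customer
# with a window function. Sample data and column names are invented.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, 101, 250.0), (2, 101, 900.0), (3, 102, 120.0)],
    ["order_id", "customer_id", "amount"])
customers = spark.createDataFrame(
    [(101, "Asha"), (102, "Ravi")],
    ["customer_id", "name"])

w = Window.partitionBy("customer_id").orderBy(F.desc("amount"))

(orders.join(customers, "customer_id")                 # inner join on the key
       .withColumn("rank", F.row_number().over(w))     # rank orders per customer
       .show())
```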
Assignment:
- Spark SQL Exercise
Apache Spark using SQL – Pre-defined Function
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
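A quick taste of the pre-defined functions covered here, run through `spark.sql` in any PySpark shell or notebook where `spark` is available; the values are made up for the example.

```python
# Illustrative Spark SQL built-in functions: string, date, conversion and NULL handling.
spark.sql("""
    SELECT upper('spark')                              AS shouting,
           date_format(current_date(), 'yyyy-MM')      AS this_month,
           cast('42' AS INT) + 1                       AS converted,
           coalesce(NULL, 'fallback')                  AS null_handled,
           CASE WHEN 10 > 5 THEN 'yes' ELSE 'no' END   AS case_demo
""").show()
```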
Apache Spark using SQL – Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL – Inner
- Joining Tables using Spark SQL – Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
Apache Spark using SQL – Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL – Local
- Loading Data Into Spark Metastore Tables using Spark SQL – HDFS
- Loading Data into Spark Metastore Tables using Spark SQL – Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise – Managed Spark Metastore Tables
Apache Spark using SQL – DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
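The dynamic-partition pattern at the end of this module looks roughly like the sketch below; table and column names are placeholders, and it assumes a session with Hive support where `spark` is available.

```python
# Illustrative dynamic-partition insert with Spark SQL. Table names are placeholders.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_part (
        order_id INT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_month STRING)
    STORED AS PARQUET
""")

# Partition values come from the data itself, so every month in the staging
# table lands in its own partition without being listed explicitly.
spark.sql("""
    INSERT INTO TABLE orders_part PARTITION (order_month)
    SELECT order_id, amount, date_format(order_date, 'yyyy-MM') AS order_month
    FROM   orders_stage
""")
```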
Azure Databricks
Introduction and Basic Understanding
- Creating a budget for project
- Creating an Azure Databricks Workspace
- Creating an Azure Datalake Storage Gen2
- Walkthrough of the Databricks Workspace UI
- Introduction to Distributed Data Processing
- What is Azure Databricks
- Azure Databricks Architecture
- Cluster types and configuration
- Behind the scenes when creating cluster
- Sign up for Databricks Community Edition
- Understanding notebook and Markdown basics
- Notebook – Magic Commands
- DBUtils – File System Utilities
- DBUtils – Widget Utilities
- DBUtils – Notebook Utils
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Cluster Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore – Setup external Metastore
- Metastore – Setup external Metastore II
- Hands-on: How to navigate to the databricks service?
- Hands-on: How to create a workspace?
- Hands-on: How to create a spark cluster?
- Hands-on: How to create a notebook?
- Hands-on: How to create a table?
- Hands-on: How to delete a spark cluster?
- Hands-on: How to delete all resources in Azure Cloud?
- What is workspace?
- What is Resource Group?
- What is Databricks Runtime?
- What is notebook?
- Hands-on: Using notebook to visualize data
- Hands-on: Set up Apache Spark with Delta Lake
- Hands-on: Using python to operate delta lake
- Hands-on: Download and install postman
- Hands-on: Generate a token
- Hands-on: Create a spark cluster using REST API
- Hands-on: Delete a spark cluster using REST API
- Hands-on: Permanently delete a spark cluster using REST API
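The REST API exercises above boil down to authenticated HTTP calls with a personal access token. A minimal sketch with the requests library follows; the workspace URL and token are placeholders, and the runtime version and node type will differ per workspace.

```python
# Minimal sketch: create a small cluster through the Databricks REST API.
# The workspace URL, token, runtime version and node type are placeholders.
import requests

host  = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "cluster_name": "training-cluster",
    "spark_version": "13.3.x-scala2.12",   # pick a runtime available in your workspace
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```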
Databricks Developer Tools with Hands on Session Example
- Databricks Notebook, REST API, Delta Lake – What are Databricks Developer Tools?
- Hands-on: Download and install python
- Hands-on: How to set up databricks cli?
- Hands-on: How to use databricks cli?
- Hands-on: How to use Databricks Utilities?
- Hands-on: Download and install JDK
- Hands-on: Download and install IntelliJ IDEA
- Hands-on: Using Databricks Utilities API Library in IDE
- Hands-on: How to use databricks in Azure Data Factory
- Hands-on: How to debug the notebook in pipeline?
- Hands-on: ETL with Azure Databricks
- Hands-on: How to debug ETL notebook in ETL pipeline?
Databricks CLI and REST API
- Databricks CLI
- Setting up Databricks CLI
- Lab : Workspace CLI
- Lab : Cluster CLI
- Lab : DBFS CLI
- Lab : Jobs CLI
- Databricks CLI on Windows
- REST API
- Lab : Invoke REST API
- Lab : Job REST API
- Lab : Token REST API
- Lab : Group API
Working with Databricks File System & Security
- Working with DBFS Root
- Mounting ADLS to DBFS
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- DataBricks Security
- Lab : Secret management
- Column-Level Security – Part I
- Column-Level Security – Part II
- Row-Level Security
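Two of the hands-on pieces in this module, mounting ADLS Gen2 to DBFS and pulling credentials from a secret scope, follow the pattern sketched below in a Databricks notebook (where `dbutils` and `display` are built in). Every ID, scope, container and account name is a placeholder.

```python
# Illustrative ADLS Gen2 mount using a service principal whose secret is kept
# in a Databricks secret scope. All IDs, scope and container names are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="kv-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@mydatalake.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

display(dbutils.fs.ls("/mnt/raw"))
```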
Delta Lake & Delta Table
- Drawbacks of Azure Datalake
- What is delta lake
- Understanding Lakehouse Architecture
- Creating databricks workspace and ADLS for delta lake
- Accessing Datalake storage using service principal
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Drawbacks of ADLS – practical
- Medallion Lakehouse architecture
- Creating Delta Lake
- Understanding the delta format
- Understanding Transaction Log
- Creating delta tables using SQL Command
- Creating Delta table using PySpark Code
- Uploading files for next lectures
- Schema Enforcement
- Schema Evolution
- Delta Table Time Travel
- Time Travel and Versioning
- Vacuum Command
- Convert to Delta
- Understanding Optimize Command – Demo
- Optimize Command – Practical
- UPSERT using MERGE
- Lab : Create Delta Table (SQL & Python)
- Lab : Read & Write Delta Table
- Lab : Convert a Parquet table to a Delta table
- Lab : Incremental ETL load
- Lab : Incremental ETL load (@version property)
- Convert Parquet to Delta
- Delta Table Schema Validation in Detail
- Delta Table Schema Evolution in Detail
- Look Inside Delta Table
- Delta Table Utilities and Optimization
- Processing XML, JSON & Delta Tables:
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table – Time Travel and Vacuum
- UDFs using PySpark – hands-on example
- Spark ingestion
- Disk partitioning
- Storage
- Predicate Pushdown
- Serialization
- Bucketing
- Z-Ordering
- Adaptive Query Execution
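Several of the maintenance topics above (time travel, OPTIMIZE, Z-Ordering, VACUUM) can be tried in a few lines against any Delta table from a Databricks notebook; the table and column names below are placeholders.

```python
# Illustrative Delta Lake maintenance commands. The table name is a placeholder.

# Time travel: read an earlier version of the table.
v0 = spark.sql("SELECT * FROM bronze.sales VERSION AS OF 0")
v0.show(5)

# Inspect the history that makes time travel possible.
spark.sql("DESCRIBE HISTORY bronze.sales").show(truncate=False)

# Compact small files and co-locate data on a frequently filtered column.
spark.sql("OPTIMIZE bronze.sales ZORDER BY (customer_id)")

# Remove files no longer referenced (default retention is 7 days).
spark.sql("VACUUM bronze.sales")
```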
Unity Catalog
- What is Unity Catalog
- Creating Access Connector for Databricks
- Creating Metastore in Unity Catalog
- Unity Catalog Object Model
- Roles in Unity Catalog
- Creating users in Azure Entra ID
- User and groups management Practical
- Cluster Policies
- What are cluster pools
- Creating Cluster Pool
- Creating a Dev Catalog
- Unity Catalog Privileges
- Understanding Unity Catalog
- Creating and accessing External location and storage credential
- Managed and External Tables in Unity Catalog
- Working with Securable Objects
- Setup Unity Catalog
- Unity Catalog User Provisioning
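Privileges in Unity Catalog are plain SQL GRANT statements scoped to the catalog.schema.table namespace; a minimal sketch follows, with the catalog, schema, table and group names used here as placeholders.

```python
# Illustrative Unity Catalog privilege management from a notebook.
# Catalog, schema, table and group names are placeholders.
spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data_engineers`")
spark.sql("GRANT USE SCHEMA  ON SCHEMA  dev.silver TO `data_engineers`")
spark.sql("GRANT SELECT      ON TABLE   dev.silver.customers TO `data_analysts`")

# Review what has been granted on the table.
spark.sql("SHOW GRANTS ON TABLE dev.silver.customers").show(truncate=False)
```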
Unity Catalog- Mini Project
- Create External Location
- Create Catalogs and Schema
- Create External Tables
- Create Managed Tables
- Create Databricks Workflow
- Data Discovery
- Data Audit
- Data Lineage
- Data Access Control Overview
- Data Access Control Demo
Spark Structured Streaming & Autoloader in Databricks
- Spark Structured Streaming – basics
- Understanding micro batches and background query
- Supported Sources and Sinks
- WriteStream and checkpoints
- Community Edition Drop databases
- Understanding outputModes
- Understanding Triggers
- Autoloader – Intro
- Autoloader – Schema inference
- What is Autoloader & Demo
- Autoloader Schema Evolution
- How to build incremental pipeline using Autoloader
- Schema Evolution – Demo
- Schema Evolution – Practical
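Auto Loader and Structured Streaming come together in an incremental ingestion query like the sketch below; the source path, schema location, checkpoint location and target table are all placeholders.

```python
# Illustrative Auto Loader stream: incrementally pick up new JSON files and
# append them to a Delta table. All paths and names are placeholders.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/raw/_schemas/events")
          .load("/mnt/raw/events/"))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/raw/_checkpoints/events")
       .outputMode("append")
       .trigger(availableNow=True)          # process what is available, then stop
       .toTable("bronze.events"))
```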
Databricks Incremental Ingestion Tools
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Autoloader
- Autoloader with Automatic Schema Evolution
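COPY INTO, the other incremental ingestion tool covered here, is a single idempotent SQL command: files already loaded are skipped on re-runs. A minimal sketch with placeholder table and path names:

```python
# Illustrative COPY INTO ingestion: only files not yet loaded are picked up on re-runs.
# The target table and source path are placeholders.
spark.sql("""
    COPY INTO bronze.sales
    FROM 'abfss://raw@mydatalake.dfs.core.windows.net/sales/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```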
Notebook CI/CD via Azure DevOps with GitHub
- Integrate Databricks notebooks with Git providers like GitHub
- Configure Continuous Integration – artifacts to be deployed to clusters
- Configure Continuous Delivery using datathirst templates
- Run notebook on Azure Databricks via Jobs.
- Secure cluster via cluster policy and permission
- DataFactory LinkedServices
- Orchestrate notebook via DataFactory
Project Details
- Typical Medallion Architecture
- Project Architecture
- Understanding the dataset
- Expected Setup
- Creating containers and External Locations
- Creating all schemas dynamically
- Creating bronze Tables Dynamically
Ingestion to Bronze
- Ingesting data to bronze layer – Demo
- Ingesting raw_traffic data to bronze table
- Assignment to get the raw_roads data to bronze table
- Ingesting raw_roads data to bronze Table
- Proving that Autoloader handles incremental loading
Silver & Gold Layer Transformation
- Transforming Silver Traffic data
- Proving that only incremental records are transformed
- Creating a common Notebook
- Run one notebook from another notebook
- Transforming Silver Roads data
- Getting data to Gold Layer
- Gold Layer Transformations and loading
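The gold layer finally shapes the silver tables into reporting-ready facts and dimensions; the sketch below conveys the flavour of that step, with table and column names invented for the example.

```python
# Illustrative gold-layer build: aggregate the silver table into a small fact table.
# Table and column names are invented for the example.
from pyspark.sql import functions as F

silver_traffic = spark.read.table("silver_traffic")

fact_daily_traffic = (silver_traffic
    .groupBy("road_id", F.to_date("event_time").alias("event_date"))
    .agg(F.sum("vehicle_count").alias("total_vehicles"),
         F.avg("vehicle_count").alias("avg_vehicles")))

fact_daily_traffic.write.format("delta").mode("overwrite").saveAsTable("gold_fact_daily_traffic")
```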
Live Sessions Price:
For LIVE sessions – offer price after discount: 125 USD (marked down from 259 USD / 199 USD) or INR 9,900 (marked down from INR 13,900 / INR 11,900).
Sample Course Completion Certificate:
Your course completion certificate will look like this:
Typically, there is a one-day break following public sessions.
Important Note:
To maintain the quality of our training and ensure smooth progress for all learners, we do not allow batch repetition or switching between courses. Once you enroll in a batch, please make sure to attend the classes regularly as per the schedule. We kindly request you to plan your learning accordingly. Thank you for your support and understanding.
Course Features
- Lectures 591
- Quiz 0
- Duration 40 hours
- Skill level All levels
- Language English
- Students 0
- Assessments Yes
Curriculum
- 37 Sections
- 591 Lessons
- 40 Hours