Babin Joshi

Data Governance with Databricks Unity Catalog

Data governance answers four practical questions: what data exists, who can use it, what they can do with it, and where it came from. Unity Catalog provides a centralized governance layer for Data...

Production Pipelines, Workflows, and Databricks SQL

Moving a notebook into production requires more than scheduling it. A production pipeline needs explicit dependencies, data-quality rules, repeatable configuration, monitoring, notifications, and a...

Incremental Data Processing with Structured Streaming and Auto Loader

A data stream is any source that grows over time: new files arriving in cloud storage, events published to Kafka, change-data-capture records, or rows appended to a Delta table. Spark Structured S...

ELT with Spark SQL and PySpark in Databricks

Databricks supports an ELT workflow in which raw data is loaded into the lakehouse and transformed using Spark SQL or PySpark. Engineers can query files directly, register external data, create Del...

Databricks Lakehouse and Delta Lake Fundamentals

Delta Lake is the storage layer that gives lakehouse tables reliable transactions, schema controls, version history, and efficient data management while retaining data in cloud object storage. It ...

Introduction to Databricks, Lakehouse Architecture, and Compute

Databricks is a multi-cloud data and AI platform built around Apache Spark and the lakehouse architecture. It brings data engineering, analytics, machine learning, and governance into one environme...

Practical AWS Lab Sessions - S3, EC2, Glue, Lambda, RDS and API Gateway

The best way to understand AWS is to build small practical labs. Reading about S3, EC2, Lambda, RDS, Glue, and API Gateway is useful, but the ideas become much clearer when we create resources, wir...

AWS Glue - Serverless ETL and Data Catalog Fundamentals

AWS Glue is a serverless data integration service for discovering, preparing, transforming, and moving data. It is commonly used in data lake pipelines where raw files land in Amazon S3, metadata i...

AWS Lambda - Serverless Compute Fundamentals

AWS Lambda is a serverless compute service that lets us run code without provisioning or managing servers. We write a function, configure how it should be invoked, give it permissions through IAM, ...

Amazon RDS - Managed Relational Databases in AWS

Amazon RDS, short for Amazon Relational Database Service, is AWS’s managed service for running relational databases in the cloud. Instead of installing database software on an EC2 instance and mana...