Data governance determines what assets exist, who can discover them, who can access them, what operations are allowed, where data is stored, and how it moves through the platform. This module begi...
A production data platform needs reliable dependencies, quality controls, source control, orchestration, monitoring, SQL serving, dashboards, and alerts. The original module uses Delta Live Tables...
A data stream is any data source that grows over time. It may be a directory receiving files, a Kafka topic, a CDC feed, or a Delta table receiving new commits. Spark Structured Streaming processe...
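To make the "source that grows over time" idea concrete, here is a minimal sketch in plain Python (not Spark) of incremental processing over a directory that keeps receiving files. The function name `process_new_files`, the `seen` set, and the `*.txt` pattern are hypothetical choices for illustration. The `seen` set is only loosely analogous to the progress tracking a streaming engine does.

```python
from pathlib import Path


def process_new_files(source_dir: Path, seen: set[str]) -> list[str]:
    """Return lines from files not processed before, and record them in `seen`.

    `seen` is a stand-in for persisted progress state: each call handles
    only the files that arrived since the previous call.
    """
    new_lines: list[str] = []
    for path in sorted(source_dir.glob("*.txt")):
        if path.name in seen:
            continue  # already handled in an earlier run
        new_lines.extend(path.read_text().splitlines())
        seen.add(path.name)
    return new_lines
```

Calling the function repeatedly with the same `seen` set returns each file's lines once, so the source can keep growing without earlier data being reprocessed.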
Extract, load, and transform (ELT) is a natural pattern for the lakehouse. Data is first loaded into inexpensive, scalable storage and is then transformed with the distributed processing capabiliti...
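The load-then-transform order can be sketched on a small scale. The example below uses Python's built-in `sqlite3` module as a stand-in for lakehouse storage and compute, which is an assumption made only to keep the sketch runnable; the table names `raw_orders` and `clean_orders` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source records, kept exactly as received (all text, one bad row).
RAW_ROWS = [
    ("1", "alice", "19.99"),
    ("2", "BOB", "5.00"),
    ("3", "carol", "not-a-number"),
]


def load_raw(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Load step: store the source data unchanged, every column as text."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Transform step: runs after loading, inside the storage engine."""
    conn.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(customer)           AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*.[0-9]*'  -- keep only numeric-looking amounts
        """
    )
```

Because the raw table is left untouched, the transform can be rerun or revised later without extracting the data from the source again.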
Delta Lake is the open storage framework that gives lakehouse tables reliable transactions, schema controls, version history, and data-management operations while keeping data in cloud object stora...
Databricks is a multi-cloud data and AI platform built around Apache Spark. It provides a common workspace for data engineering, analytics, business intelligence, streaming, machine learning, and g...
The best way to understand AWS is to build small practical labs. Reading about S3, EC2, Lambda, RDS, Glue, and API Gateway is useful, but the ideas become much clearer when we create resources, wir...
TTL in Managed VNet IR in ADF focuses on the compute and network boundary that Azure Data Factory uses to move data, dispatch activities, and connect to private or on-premises systems. This ...
Managed Virtual Integration Runtime in ADF focuses on the compute and network boundary that Azure Data Factory uses to move data, dispatch activities, and connect to private or on-premises sy...
Deactivate an Activity in ADF is one lesson in the broader Azure Data Factory series, focused on turning ADF from a collection of screens into a practical data integration workflow. This post is p...