A data architect at a mid-sized distribution company described her team’s situation. Five separate storage systems, each serving a different team, each with its own access management, each producing slightly different numbers for the same metric. Finance trusted their ADLS container. Operations trusted their SQL Server. The BI team had built a third version in Synapse just to reconcile the two. Nobody fully trusted anyone else’s data.
Microsoft Fabric was built for exactly this problem. The mechanism it uses is OneLake, Microsoft Fabric’s built-in data lake that every Fabric workload writes to automatically, in open Delta Parquet format, with governance active from day one. Organizations adopting it aren’t just changing storage technology. They’re collapsing three to five fragmented systems into one platform, and the decisions made in the first 90 days determine whether that delivers.
In this article, we’ll cover how OneLake works, how it compares to ADLS Gen2 and Databricks, what production implementation looks like, and the mistakes that send projects sideways.
Key Takeaways
- OneLake is provisioned automatically with every Fabric tenant, storing all data in open Delta Parquet format. No infrastructure to configure.
- Shortcuts let you point to data in AWS S3, GCS, ADLS Gen2, Dataverse, or other OneLake workspaces. No copying required.
- Mirroring replicates live data from Azure SQL, Cosmos DB, Snowflake, SQL Server 2016+, PostgreSQL, and Oracle into OneLake via CDC. No ingestion pipeline needed.
- Medallion architecture (Bronze/Silver/Gold) maps directly to OneLake’s folder structure and is the recommended production pattern.
- Governance is powered by Purview, built into Fabric and active from day one.
- FLIP migration accelerators automate up to 80% of pipeline migration work (ADF, SSIS, Alteryx, Informatica to Fabric), compressing multi-year codebases into 90-day completions.
What Is OneLake: Microsoft Fabric’s Built-In Data Lake?
Every Fabric tenant gets exactly one data lake instance: OneLake. Every workload (Lakehouses, Warehouses, Semantic Models, Eventhouses) writes there automatically. No storage accounts to create, no containers to configure. As of March 2026, Microsoft reports that more than 28,000 customers, including 80% of the Fortune 500, use Fabric.
- Built on ADLS Gen2: Fully abstracted. Teams interact with Fabric objects, not the storage layer beneath.
- Delta Parquet by default: Data is open and readable outside Fabric by Spark, Python, or any Parquet-compatible tool. Switch compute engines without moving data.
- Full ACID support via Delta Lake: Every write is atomic, concurrent reads and writes stay consistent, and time travel lets teams query earlier table versions when a pipeline failure needs tracing.
- One instance per org: All teams share the same governed storage tier. No duplicating data across team-level storage accounts.
- V-Order optimization applied automatically: Fabric rewrites Delta Parquet files on write for faster analytical reads across all workloads. No configuration required.
OneLake Architecture: Medallion Layers, Shortcuts, and Mirroring
OneLake Storage Hierarchy: Tenant, Workspace, and Item
The Fabric data lake is organized into three levels. The top level is the tenant, with one OneLake instance per organization. Below that are workspaces, which act as containers typically mapped to business domains, teams, or departments. Below workspaces are items, the actual Fabric objects like Lakehouses, Warehouses, and Eventhouses, each with its own dedicated folder in OneLake’s storage structure.
OneLake supports the same ADLS Gen2 APIs and SDKs, so external tools like Azure Storage Explorer, Spark clusters, and custom Python scripts connect directly without learning a new interface. On Windows, OneLake File Explorer lets anyone browse, upload, and download files the same way they would in Office.
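To make the workspace and item hierarchy concrete, here is a minimal sketch of how a OneLake address could be assembled for an ADLS Gen2-compatible client. The host name and the `<item>.<item type>` folder layout reflect our understanding of Microsoft's documented URI format and should be checked against current documentation; the workspace, item, and file names are hypothetical.

```python
from urllib.parse import quote

# Assumed OneLake endpoint; verify against current Microsoft documentation.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build an HTTPS URL for a file or folder inside a Fabric item."""
    segments = [workspace, f"{item}.{item_type}"] + [p for p in path.split("/") if p]
    return f"https://{ONELAKE_HOST}/" + "/".join(quote(s) for s in segments)


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build the abfss:// form typically used by Spark and other ADLS Gen2 clients."""
    tail = "/".join(p for p in path.split("/") if p)
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    return f"{base}/{tail}" if tail else base


# Hypothetical Finance workspace containing a Lakehouse named Sales.
print(onelake_https_url("Finance", "Sales", "Lakehouse", "Files/raw/orders.csv"))
print(onelake_abfss_uri("Finance", "Sales", "Lakehouse", "Tables/orders"))
```

The workspace sits where a storage container would in a plain ADLS Gen2 address, which is why existing tools can connect without a new interface.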
Data stays in the Azure region where the Fabric capacity is provisioned. Cross-geo movement is opt-in and auditable, enforced at the infrastructure level. A packaging company with separate workspaces for Finance, Operations, and Supply Chain still has one underlying infrastructure. A Power BI report in Finance can query data from Operations’ Lakehouse without copying anything or building a pipeline.
Medallion Architecture in Microsoft Fabric OneLake
Most production Fabric implementations organize OneLake using the medallion pattern, three logical layers within each Lakehouse.
- Bronze is raw ingested data, stored as-is in the Files section. No transformation, no quality enforcement. The record of what arrived.
- Silver is cleaned, conformed data stored as Delta tables. Business rules applied, duplicates removed, schema standardized.
- Gold is aggregated, business-ready data, consumed by Power BI reports, Warehouses, and Semantic Models.
Each layer has a clear owner. Bronze belongs to ingestion, Silver to data engineering, Gold to analytics. Skipping Silver, sending semi-raw data straight to reporting, is one of the most common failure patterns in enterprise lake implementations.
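The three layers can be sketched in plain Python to show what each one is responsible for. This is an illustration of the pattern only, not Fabric code: in a real Lakehouse the same steps would usually run in Spark against Delta tables, and the order feed and field names below are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived (hypothetical order feed).
bronze = [
    {"order_id": "1001", "region": " east ", "amount": "250.00"},
    {"order_id": "1001", "region": " east ", "amount": "250.00"},  # duplicate delivery
    {"order_id": "1002", "region": "WEST", "amount": "100.50"},
    {"order_id": "1003", "region": "East", "amount": "49.50"},
]


def to_silver(rows):
    """Deduplicate on order_id, standardize region casing, and cast types."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().title(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(rows):
    """Aggregate Silver rows into revenue per region for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Running `to_gold` directly on `bronze` would fail on the string amounts and would split the East region across inconsistent spellings, which is the kind of problem the Silver layer exists to absorb.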
Delta Table Maintenance in OneLake
Delta tables accumulate small files over time as pipelines append, merge, and update data. Without regular maintenance, read performance degrades and storage costs grow. Three operations keep tables healthy.
- OPTIMIZE: Compacts small files into larger ones. Run daily on active tables, weekly on stable ones.
- VACUUM: Removes data files no longer referenced by the current table version. Reclaims storage from deleted and updated rows. Recommended weekly with a seven-day retention window.
- Z-ORDER: Co-locates related data within files for faster predicate evaluation on high-cardinality filter columns. Apply on initial load and after major schema changes.
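As a rough sketch, the three operations above map onto two Delta Lake SQL statements, since Z-ordering is expressed as a clause of OPTIMIZE. The helper below only builds the statement text; the function name, table name, and column name are hypothetical, and 168 hours corresponds to the seven-day retention window mentioned above.

```python
def maintenance_statements(table, zorder_columns=(), retain_hours=168):
    """Return Delta Lake SQL for compaction (optionally Z-ordered) and cleanup."""
    optimize = f"OPTIMIZE {table}"
    if zorder_columns:
        # Z-ordering is requested as part of the OPTIMIZE statement.
        optimize += " ZORDER BY (" + ", ".join(zorder_columns) + ")"
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    return [optimize, vacuum]


# Hypothetical Silver table filtered mostly by customer_id.
for statement in maintenance_statements("silver.orders", ["customer_id"]):
    print(statement)
```

In a notebook, each string would typically be passed to a Spark session (for example `spark.sql(statement)`, assuming a session named `spark` is available) on whatever schedule the table's activity calls for.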
OneLake Shortcuts: Access External Data Without Copying It
A shortcut is a reference to data stored in other file locations. These locations can be within the same workspace or across different workspaces, within OneLake or external to OneLake like ADLS, S3, or Dataverse. No matter the location, shortcuts make files and folders look like you have them stored locally.
Supported sources include AWS S3, Google Cloud Storage, Azure Data Lake Storage Gen2, Dataverse, and other OneLake workspaces.
Because shortcuts prevent duplication, many organizations reduce storage costs compared to traditional architectures. The bigger shift is in migration economics. Organizations don’t have to move everything to Fabric to start using it. They can shortcut to existing stores on day one and migrate progressively.
OneLake Mirroring: Near-Real-Time Replication Without ETL Pipelines
Shortcuts point to existing files without moving them. Mirroring is different. It uses Change Data Capture (CDC) to take an initial snapshot and then keep the data in sync in near real-time as transactions happen. The result lands in OneLake as Delta Parquet tables, queryable across every Fabric workload. Teams can join a mirrored Snowflake table with S3-shortcutted data in the same SQL query.
GA sources include PostgreSQL, Cosmos DB, SQL Server 2016–2022 and 2025, Azure SQL Database, Azure SQL Managed Instance, Oracle, and Snowflake. For operational databases that analytics teams have historically extracted on a batch schedule, Mirroring eliminates the pipeline and keeps the Fabric data lake current without custom ingestion code.
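The snapshot-then-sync idea behind CDC can be illustrated with a few lines of Python. This is a conceptual sketch only and makes no claim about how Fabric Mirroring is implemented internally; the event shape (`op`, `key`, `row`) is an assumption made for the example.

```python
def apply_changes(snapshot, changes):
    """Apply ordered change events to a keyed snapshot and return the new state."""
    state = dict(snapshot)
    for event in changes:
        if event["op"] in ("insert", "update"):
            state[event["key"]] = event["row"]
        elif event["op"] == "delete":
            state.pop(event["key"], None)
    return state


# Initial snapshot of a hypothetical orders table, keyed by order id.
snapshot = {1: {"status": "open"}, 2: {"status": "open"}}

# Changes captured from the source after the snapshot was taken.
changes = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"status": "open"}},
]

current = apply_changes(snapshot, changes)
print(current)
```

The point of the pattern is that only the changes travel after the initial snapshot, which is what removes the need for a scheduled full extract.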
Shortcuts and Mirroring both give access to external data inside OneLake, but they solve different problems.
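One way to capture that split during planning is a small lookup that sorts each source into a pattern. The source lists come from the two sections above; the function name and the "ingest" fallback for anything not listed are assumptions made for this sketch.

```python
# File and object stores that the article lists as shortcut sources.
SHORTCUT_SOURCES = {"aws s3", "google cloud storage", "adls gen2", "dataverse", "onelake"}

# Transactional databases that the article lists as Mirroring sources.
MIRRORING_SOURCES = {
    "postgresql", "cosmos db", "sql server", "azure sql database",
    "azure sql managed instance", "oracle", "snowflake",
}


def access_pattern(source):
    """Suggest 'shortcut', 'mirroring', or 'ingest' for a named source system."""
    name = source.strip().lower()
    if name in SHORTCUT_SOURCES:
        return "shortcut"
    if name in MIRRORING_SOURCES:
        return "mirroring"
    # Assumption: anything else needs a conventional ingestion pipeline.
    return "ingest"


for source in ["AWS S3", "Snowflake", "SharePoint"]:
    print(source, "->", access_pattern(source))
```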
OneLake vs. ADLS Gen2: Key Differences for Enterprise Data Teams
Most organizations considering Fabric are already running ADLS Gen2. The instinct is to treat the Fabric data lake as just another storage layer to evaluate. It’s not. The real question is what responsibilities shift from your engineering team to the platform.
| Dimension | ADLS Gen2 (Self-Managed) | OneLake (Microsoft Fabric) |
| --- | --- | --- |
| Setup | Manual: storage accounts, containers, ACLs | Automatic: provisioned with every Fabric workspace |
| Governance | Manual Purview scanning setup | Native Purview integration. Automatic lineage and classification |
| Access control | ACL-based, container-level | Workspace roles + item-level; column/row-level security in Warehouse |
| Data format | Any (CSV, JSON, Parquet, etc.) | Delta Parquet by default; Files section accepts any format |
| ACID transactions | Not natively supported | Full Delta Lake ACID support |
| Cross-workload access | Requires copying or mounting | All Fabric workloads read the same data natively |
| External data | N/A | Shortcuts + Mirroring |
| Compute | Separate (Synapse, Databricks, ADF) | Integrated (Spark, SQL, Pipelines, Notebooks) |
| Desktop access | Azure Storage Explorer | OneLake File Explorer |
| Cost model | Per GB + per-transaction | Capacity units (F2–F2048): compute + storage billed together |
ADLS Gen2 is infrastructure your team manages. OneLake is a managed Fabric data lake service: Fabric handles the storage layer, and your team focuses on data quality and analytics. Organizations migrating from Synapse follow the same logic. Fabric Lakehouses replace Synapse lake databases and Fabric Pipelines replace Synapse pipelines, with everything under one governed tier.
OneLake vs. Databricks Lakehouse: How to Choose
Both Databricks and OneLake use Delta Lake as the underlying table format, both support medallion architecture, and both run Python and Spark. Choosing between a Fabric data lake and a Databricks Lakehouse comes down to ecosystem fit, governance model, and where the operational complexity sits.
| Dimension | OneLake (Fabric) | Databricks Lakehouse |
| --- | --- | --- |
| Ecosystem | Microsoft-first (Power BI, Purview, Teams, Azure) | Multi-cloud (Azure, AWS, GCP) |
| Governance | Microsoft Purview (native, automatic) | Unity Catalog (powerful, requires configuration) |
| BI integration | Power BI native, Direct Lake sub-second refresh | Power BI via connector; higher latency |
| SQL experience | Fabric Warehouse + SQL endpoint on Lakehouse | Databricks SQL Warehouse |
| Streaming | Eventstream + Eventhouse (KQL-based) | Structured Streaming |
| ML workloads | Fabric ML experiments (maturing) | MLflow-native, mature MLOps tooling |
| External data | Shortcuts + Mirroring | External tables via Unity Catalog |
| Multi-cloud | Azure-primary (Shortcuts reach S3/GCS) | Native multi-cloud |
| Best for | Microsoft-first enterprises, Power BI-heavy workloads | Multi-cloud, ML-intensive, open-source-heavy workloads |
On governance, Purview activates automatically in Fabric with zero manual setup. Unity Catalog is more configurable for complex Databricks deployments but requires deliberate architecture decisions up front. For teams already inside the Microsoft 365 perimeter, that difference matters.
Kanerika works with both platforms as a Microsoft Solutions Partner for Data & AI and a Databricks consulting partner. From production work across both, most mid-market enterprises deep in Microsoft 365 get faster ROI from Fabric. Organizations with intensive ML workloads or hard multi-cloud mandates often get more from Databricks.
OneLake Governance: How Microsoft Purview Works in Practice
Governance is where many data lake implementations quietly fall apart. Organizations build the lake, fill it with data, then discover nobody knows what’s in it, nobody trusts it, and compliance teams are asking uncomfortable questions about where sensitive data lives.
The Fabric data lake addresses this at the architecture level. Four things activate automatically.
- Lineage tracking: All datasets and storage within OneLake are automatically part of the Microsoft Purview governance boundary. Lineage, classification, and sensitivity labels are available out of the box.
- Sensitivity label propagation: A dataset labeled “Confidential Finance” carries that classification into every downstream report. No manual re-labeling.
- Granular access control: Permissions are manageable at the item, folder, row, and column level, enforced by all Fabric engines. One unified model that follows the data across Spark notebooks, Power BI reports, and Fabric data agents.
- Data residency enforcement: Data stays in the Azure region where the capacity is provisioned. Cross-geo replication is opt-in and auditable, enforceable at the infrastructure level. Application-layer controls don’t govern it.
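Label propagation is easier to picture as a walk over the lineage graph. The sketch below is a conceptual model, not the Purview API: the lineage dictionary, item names, and function name are all hypothetical, and the real service handles this without any code from the data team.

```python
from collections import deque


def propagate_label(lineage, source, label):
    """Return {item: label} for the source and every item downstream of it."""
    labeled = {source: label}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in labeled:
                labeled[child] = label
                queue.append(child)
    return labeled


# Hypothetical lineage: each key feeds the items listed as its value.
lineage = {
    "finance_ledger": ["finance_model"],
    "finance_model": ["margin_report", "cash_report"],
    "ops_orders": ["ops_report"],
}

labels = propagate_label(lineage, "finance_ledger", "Confidential Finance")
print(sorted(labels))
```

Items fed only by `ops_orders` stay unlabeled, which mirrors the idea that a classification follows the data rather than the workspace.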
Organizations that try to manage access through ADLS Gen2-style ACLs underneath Fabric end up with two conflicting security layers. It’s expensive to untangle. For regulated industries, Kanerika holds ISO 27001, ISO 27701, and SOC 2 Type II certifications, a meaningful baseline for clients with audit requirements tied to their analytics infrastructure.
One practical extension of this governance model: the Fabric Data Agent, now generally available, lets any user query governed OneLake data in plain English. Answers are sourced from the same permission-enforced data, so access controls apply at query time regardless of how the question is asked.
How to Implement a Microsoft Fabric Data Lake: The IMPACT Framework
Enterprise implementation has a different shape than the sandbox tutorials Microsoft publishes. What determines whether an implementation delivers is the sequencing of decisions in the first 90 days.
Phase 1: Identify, Map, and Prioritize Your Data Estate
Kanerika’s IMPACT Framework maps a structured assessment path (Identify, Map, Prioritize, Architect, Configure, Transition), with a defined deliverable at each phase before work moves forward. Most organizations discover here that 20–30% of existing pipelines either produce data nobody uses or duplicate work happening elsewhere. Retiring those before migration starts compresses total scope materially.
Phase 2: Architect the Workspace and Security Model
This is where the technical blueprint gets locked before any provisioning happens. Workspace structure follows two rules: one workspace per data domain (Finance, Operations, Supply Chain), and separate workspaces for each environment (Dev, Test, Prod). Medallion layer definitions and the Shortcuts vs. Mirroring decision per data source are confirmed here too.
Security gets designed at this stage. Sensitivity labels, workspace ownership per domain, and row and column-level controls need to be in the architecture before data starts moving. Retrofitting these after the lake is populated costs far more than building them in from the start.
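The two workspace rules from this phase (one per domain, one per environment) multiply out quickly, so it can help to generate the list up front. A minimal sketch follows; the `Domain-Environment` naming convention is an assumption for illustration, not a Fabric requirement.

```python
from itertools import product


def workspace_plan(domains, environments=("Dev", "Test", "Prod")):
    """Return one workspace name per (domain, environment) pair."""
    return [f"{domain}-{env}" for domain, env in product(domains, environments)]


plan = workspace_plan(["Finance", "Operations", "SupplyChain"])
for name in plan:
    print(name)
```

Three domains across three environments already means nine workspaces, each needing an owner and sensitivity defaults, which is one reason to settle the security model before provisioning.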
Phase 3: Configure the Environment and Migrate Pipelines
Fabric capacity gets provisioned, Lakehouses and Warehouses set up, and Microsoft Purview activated before the first pipeline runs. FLIP then handles the migration from SSIS, ADF, Alteryx, and Informatica to Fabric equivalents with business logic and scheduling preserved.
FLIP provides purpose-built modules for each migration path.
- FLIP for SSIS: extracts package logic (transformations, lookups, conditional splits, data conversions) and converts to Fabric Pipeline or Notebook equivalents with output validation.
- FLIP for ADF: migrates pipelines including linked services, datasets, triggers, and mapping data flows to Fabric Data Factory with scheduling logic preserved.
- FLIP for Alteryx: converts workflows to Fabric Notebooks, preserving blending logic and macro structures.
- FLIP for Informatica: handles PowerCenter mapping and session migration with business rule preservation.
Across all modules, FLIP automates up to 80% of migration work, preserving business logic, transformation rules, scheduling, and data quality checks. Verified results across engagements: 50–60% reduction in migration effort, 40–60% faster data loading post-migration, 75% reduction in annual licensing costs, and 90-day completions for codebases that would take 12–24 months manually.
Delta table maintenance schedules get established here too. OPTIMIZE, VACUUM, and Z-ORDER cadences configured at setup prevent the performance degradation that teams typically discover six months into production.
Once the data layer is stable, Gold-layer Semantic Models connect directly to Power BI via Direct Lake mode for sub-second query performance. Microsoft Fabric Data Agents and Fabric Copilot can then query OneLake using natural language.
Phase 4: Transition to Production
Migrated pipelines run in parallel against source environments until output parity is confirmed. Cutover happens once both sides match. A hypercare period follows where the team monitors for silent data quality issues that only surface under production load.
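A parity check during the parallel run can be as simple as comparing a row count and an order-independent checksum from each side. The sketch below shows one generic way to do that in Python; it is not a description of FLIP's validation, and the sample rows are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest), ignoring row order and dict key order."""
    row_hashes = sorted(
        hashlib.sha256(repr(sorted(row.items())).encode("utf-8")).hexdigest()
        for row in rows
    )
    digest = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), digest


def outputs_match(source_rows, migrated_rows):
    """True when both outputs contain exactly the same rows."""
    return table_fingerprint(source_rows) == table_fingerprint(migrated_rows)


legacy = [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.5}]
migrated = [{"total": 5.5, "id": 2}, {"id": 1, "total": 10.0}]  # same rows, new order
drifted = [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.6}]   # one value differs

print(outputs_match(legacy, migrated))
print(outputs_match(legacy, drifted))
```

Because values are compared through `repr`, a type change such as `10` versus `10.0` counts as a mismatch, which is usually the desired behavior when checking for silent drift.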
Knowledge transfer closes the engagement. Internal teams own the platform operationally before the implementation partner steps back.
6 Mistakes Enterprises Make with OneLake
These patterns show up repeatedly in real implementations.
1. Managing the Fabric data lake with storage-level ACLs: OneLake has its own unified security model covering roles, row-level, and column-level controls. Layering ADLS Gen2-style ACLs underneath creates two conflicting access layers. A user removed from a Fabric workspace can still hold storage-level access.
2. Skipping medallion architecture: Sending raw data straight to reporting without a Silver layer produces unreliable results and creates significant retrofitting work later. An ungoverned OneLake becomes a data swamp as fast as any other lake technology.
3. Migrating pipelines manually: Hand-translating SSIS or ADF pipelines takes months per workload. FLIP automates the majority of it, which is the difference between 90-day completions and 24-month manual efforts.
4. Activating Purview governance after the lake is populated: Retrofitting governance onto an ungoverned data lake is roughly three times harder than building it in from day one. Classification, lineage, and sensitivity labeling should be active before the first pipeline runs.
5. Re-ingesting data that already lives in ADLS Gen2 or S3: OneLake aims to give you the most value possible from a single copy of data without data movement or duplication. Map which data needs physical ingestion and which can be shortcutted before designing a single pipeline.
6. Over-partitioning Delta tables: Teams migrating from Hadoop environments often partition by high-cardinality columns like customer ID or transaction ID. On Delta Lake, this creates thousands of tiny files, slows every query, and negates partition pruning. Partition large fact tables by year and month at most. Tables under 100 million rows rarely need partitioning at all.
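The partitioning guidance in the last item can be turned into a quick pre-flight check. In this sketch the 100-million-row threshold comes from the text above, while the 1,000-partition cutoff and the function name are illustrative assumptions rather than Delta Lake limits.

```python
ROW_THRESHOLD = 100_000_000  # below this, the guidance above says partitioning is rarely needed


def partition_advice(row_count, candidate_column, distinct_values, max_partitions=1_000):
    """Suggest a partitioning approach for a table and a candidate partition column."""
    if row_count < ROW_THRESHOLD:
        return "no partitioning"
    if distinct_values > max_partitions:
        # High-cardinality columns produce one small folder of files per value.
        return f"avoid {candidate_column}; partition by year and month instead"
    return f"partition by {candidate_column}"


print(partition_advice(20_000_000, "customer_id", 500_000))
print(partition_advice(2_000_000_000, "customer_id", 500_000))
print(partition_advice(2_000_000_000, "order_month", 120))
```

Ten years of monthly partitions is 120 folders, while half a million customer IDs would mean half a million, which is where the tiny-file problem comes from.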
Is Microsoft Fabric Data Lake Right for Your Organization?
The Fabric data lake is a strong fit when you’re Microsoft-first, already running Azure, Microsoft 365, and Power BI, and want a unified platform with one billing model, one governance layer, and one access control system. It’s especially compelling if you’re migrating from ADLS Gen2, ADF, Synapse Analytics, or SSIS, where FLIP accelerators compress timelines by months.
A Fabric data lake is a harder fit for organizations with hard multi-cloud mandates and no Azure commitment, ML-intensive workloads already optimized in Databricks Unity Catalog, or on-premises data residency requirements in regions where Azure capacity is limited.
| Strong Fit for OneLake | Harder Fit / Consider Alternatives |
| --- | --- |
| Already running Azure, Microsoft 365, Power BI | Hard multi-cloud mandate with no Azure commitment |
| Migrating from ADLS Gen2, ADF, SSIS, Synapse, or Informatica | Existing Databricks Unity Catalog investment with mature MLOps |
| Power BI is the primary BI layer | ML-intensive workloads needing mature MLflow/MLOps tooling |
| Want unified governance under Microsoft Purview | On-premises data residency in non-Azure-served regions |
| Need to connect multi-cloud file stores without copying data | Deep GCP-native workloads with no Azure footprint |
| Compliance teams already in the Microsoft 365 security ecosystem | Organization committed to open-source-only compute |
OneLake in Production: How Kanerika Builds Enterprise Fabric Data Lakes
Kanerika is a Microsoft Solutions Partner for Data and AI with Analytics Specialization, a Microsoft Fabric Featured Partner, and one of the earliest Microsoft Purview implementors globally. The firm has deployed Fabric data lake architectures across manufacturing, distribution, logistics, financial services, and healthcare, recognized by Everest Group as a Top Aspirant in Data and AI Services (North America, 2025). Across 100+ enterprise clients, retention sits at 98%.
On the migration side, FLIP handles the mechanical work: automated conversion of SSIS, ADF, Alteryx, and Informatica pipelines to Fabric equivalents, with business logic and scheduling preserved. FLIP is available on the Microsoft Azure Marketplace. On the governance side, KANGovern, KANComply, and KANGuard extend Microsoft Purview with structured enforcement across data ownership, regulatory compliance, and access monitoring. Both capabilities are productized software IP, deployed across engagements rather than rebuilt from scratch each time.
Qualifying organizations may also be eligible for Microsoft Azure Migrate and Modernize funding to offset project costs. Kanerika can assess eligibility as part of an initial architecture review.
The Challenge
- Data from fleet management, service operations, and inventory control sat in separate systems (SQL Server, SharePoint, and others) with no unified view
- KPIs were unreliable and reporting was manual
- Branch managers had no real-time visibility into their own operations
The Solution
- Built a Microsoft Fabric-based Data Lakehouse with OneLake as the central storage tier
- Ingested data from SQL Server and SharePoint, conformed through medallion architecture
- Deployed a 1:3:10 Power BI reporting framework: one executive dashboard, three managerial scorecards, and ten operational reports across parts, service, and branch operations
- Role-based dashboards gave each function a view tailored to their decisions
The Results
- 90% data accuracy across unified reporting
- 85% improvement in operational visibility
- 8–10% reduction in inventory costs
- 3–5% improvement in labor efficiency
- 5%+ improvement in customer ratings
“The ability to bring in many data sources and shape a strong analytics setup will be a game changer for SSMH. Kanerika’s flexibility in matching Microsoft Fabric with our goals ensures we are building a system that will lead to even better results across our operations.”
— Delano Gordon, CIO, Southern States Material Handling
Wrapping Up
The best data infrastructure is the kind engineers stop thinking about. They focus on data quality and business outcomes, not storage accounts or access management spreadsheets.
What trips organizations up is the implementation choices made in the first 90 days. Medallion architecture or not. Governance from the start or retrofitted. Accelerators or manual migration. Those decisions shape whether a Fabric data lake delivers or becomes another silo with a new name.
Schedule a consultation with Kanerika to map out your OneLake architecture, assess migration scope, and explore whether your project qualifies for Microsoft Azure Migrate and Modernize funding.
FAQs
What is the Microsoft Fabric data lake called?
Microsoft Fabric’s built-in data lake is called OneLake. Every Fabric tenant gets one instance automatically, with no storage accounts to configure. All workloads, including Lakehouses, Warehouses, and Eventhouses, write data there automatically in open Delta Parquet format. It’s designed to be the single storage tier for your entire analytics estate, shared across teams without data duplication.
Is OneLake the same as Azure Data Lake Storage Gen2?
OneLake is built on top of ADLS Gen2 but operates as a fully managed, abstracted service. Engineers don’t create storage accounts or configure containers. Fabric handles that layer. OneLake supports the same ADLS Gen2 APIs and SDKs, so external tools like Azure Databricks and Storage Explorer connect directly without Fabric compute.
What is the difference between OneLake Shortcuts and Mirroring?
Shortcuts create a logical pointer to data stored elsewhere; nothing moves. Mirroring uses Change Data Capture to take an initial snapshot and keep data in sync in near real-time as transactions happen. Use Shortcuts for existing file stores like S3 or ADLS Gen2. Use Mirroring for live transactional databases like Azure SQL, Snowflake, Cosmos DB, PostgreSQL, and Oracle.
What is medallion architecture in Microsoft Fabric?
Medallion architecture organizes OneLake data into three layers: Bronze for raw ingestion, Silver for cleaned and conformed data, and Gold for aggregated, business-ready output. Each layer maps to folder and table structures within a Fabric Lakehouse. It gives teams clear data quality ownership at each stage and prevents semi-raw data from reaching reporting before it has been validated.
How does governance work in OneLake?
Governance is powered by Microsoft Purview, which is built into Fabric and activates automatically. Permissions, sensitivity labels, and lineage tracking are inherited across all Fabric items without manual setup. Access is manageable at the item, folder, row, and column level, and those controls are enforced across every engine, including Spark notebooks, Power BI reports, and Fabric Data Agents.
What is the difference between a Fabric Lakehouse and a Fabric Data Warehouse?
A Lakehouse stores structured and unstructured data as Delta tables in OneLake, queryable via a T-SQL endpoint or Spark notebooks, suited for data engineering, ELT, and exploratory analytics. A Fabric Data Warehouse is optimized for structured, relational workloads with full T-SQL support including DDL, DML, and stored procedures. Both sit on the same OneLake storage tier.
How long does migrating to a Fabric data lake typically take?
Using Kanerika’s FLIP migration accelerators covering ADF, SSIS, Alteryx, and Informatica to Fabric, complex multi-year codebases have been completed in 90 days. Without accelerators, similar migrations typically take 12 to 24 months of manual effort. FLIP automates up to 80% of migration work while preserving business logic, transformation rules, scheduling configurations, and data quality checks.
When is a Fabric data lake the wrong choice?
OneLake is a harder fit for organizations with hard multi-cloud mandates and no Azure commitment, teams with ML-intensive workloads already optimized in Databricks Unity Catalog, or environments with on-premises data residency requirements in regions where Azure capacity is limited. For GCP-native workloads or open-source-only compute requirements, Databricks or a cloud-native lake may be a better starting point.