For years, choosing Snowflake meant choosing its proprietary storage format. Your data lived inside Snowflake, and only Snowflake could read it. That trade was easy to accept when Snowflake was the only engine your team used.
Then the data lake grew up. Teams now run Apache Spark for machine learning, Trino for ad hoc queries, and Snowflake for business intelligence, often over the same datasets. Copying data between these engines is slow, expensive, and a governance headache.
This is the promise of the data lakehouse: one open storage layer that serves both lake and warehouse workloads. Iceberg is the table format that makes that promise real on Snowflake.
Apache Iceberg tables in Snowflake solve that copy problem. They let Snowflake read and write an open table format that other engines can use at the same time, with the files sitting in your own cloud storage. You get Snowflake performance on data that is no longer locked inside Snowflake.
This guide explains what Snowflake Iceberg tables actually are, how managed and externally managed Iceberg differ, how the catalog layer ties everything together, and when an Iceberg table beats a native Snowflake table. We close with a practical migration path and the trade-offs that matter in production.
Key Takeaways
- Snowflake Iceberg tables let Snowflake read and write the open Apache Iceberg format, with the data files sitting in your own cloud storage instead of inside Snowflake.
- A Snowflake-managed Iceberg table (Snowflake as the catalog) gives full read and write access plus platform features like cloning, replication, and automatic compaction.
- An externally managed Iceberg table uses an external catalog such as AWS Glue or Snowflake Open Catalog, trading some platform features for a neutral source of truth across engines.
- The catalog layer is the architectural decision: it holds the current metadata pointer that every engine must agree on, so pick the catalog the most engines can share.
- Use native Snowflake tables for Snowflake-only hot paths, Snowflake-managed Iceberg when you need speed plus openness, and externally managed Iceberg when many engines share one lake.
- Kanerika, a Snowflake Select Tier Partner, has used governed Snowflake migrations to cut manual reconciliation effort by 60% for a distributed global enterprise.
What Are Apache Iceberg Tables in Snowflake?
Apache Iceberg is an open table format originally built at Netflix and now governed by the Apache Software Foundation. It adds a metadata layer on top of plain data files so that a folder of files behaves like a real database table.
That metadata layer is what makes Iceberg useful. According to the Apache Iceberg table specification, it brings ACID transactions, full schema evolution, hidden partitioning, and point-in-time snapshots to files in a data lake.
If you are new to the platform itself, our overview of the Snowflake data warehouse explains how storage and compute separate, which is the same principle that lets Iceberg keep files outside the engine.
A Snowflake Iceberg table is an Iceberg table that Snowflake can query and, in most cases, write to directly. Per the Snowflake Iceberg documentation, these tables combine the performance and SQL semantics of regular Snowflake tables with external cloud storage that you manage.
The data and metadata files always live in your own object storage, such as Amazon S3, Azure Storage, or Google Cloud Storage. Snowflake reaches that storage through a named object called an external volume, and the table data uses the Apache Parquet file format.
This is a different idea from Snowflake’s older external tables. An external table exposes raw files as read-only rows with no real transaction support. An Iceberg table is a first-class, ACID-compliant table you can update, and that other engines can read concurrently. Many of these concepts build on the broader platform, which we cover in our guide to Snowflake architecture and the wider cloud data warehouse model.
The result is interoperability without copies. The same Iceberg table can be queried by Snowflake for dashboards and by Spark for feature engineering, because both engines speak the same open format and read the same files.
How Snowflake Iceberg Tables Work Under the Hood
An Iceberg table in Snowflake has three moving parts: the storage that holds your files, the catalog that tracks the current table state, and the compute that runs your queries. Understanding these three pieces is the key to every decision that follows.
Storage: your cloud, your files
Iceberg tables store their data and metadata in an external cloud storage location that you own and control. Snowflake does not put these files in its internal storage by default, so you handle backup, recovery, and lifecycle rules for that bucket.
Snowflake connects to that bucket through an external volume, an account-level object that holds the identity and access management details for the location. One external volume can serve many Iceberg tables, which keeps setup tidy.
Because the files live in your storage, Iceberg tables incur no Snowflake storage charges when you manage the bucket yourself. Your cloud provider bills you for that storage directly instead. This sits on top of whatever you already use for your data lake.
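Before creating tables on an external volume, it can help to confirm that Snowflake can actually reach the bucket. The following is a minimal sketch, assuming an external volume named iceberg_vol already exists in your account (the name is illustrative and matches the example later in this guide).

```sql
-- Show the volume's configured storage locations. For S3, the output
-- includes the Snowflake IAM user ARN and external ID that belong in
-- the trust policy of the IAM role you granted to Snowflake.
DESC EXTERNAL VOLUME iceberg_vol;

-- Ask Snowflake to test access to the storage location.
SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('iceberg_vol');
```

Running the verification once after setup tends to surface IAM or path mistakes before they show up as failed table creation.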
Catalog: the pointer that finds your table
The catalog is the part most newcomers miss. An Iceberg catalog stores the current metadata pointer for each table, mapping a table name to the location of its latest metadata file, and it performs the atomic swap that commits a new table version.
Snowflake supports two broad catalog choices. You can let Snowflake itself act as the Iceberg catalog, or you can connect Snowflake to an external catalog such as AWS Glue or Snowflake Open Catalog through a catalog integration.
This single choice, who owns the catalog, decides almost everything else: whether Snowflake can write to the table, whether it manages compaction, and which platform features you get. We unpack that decision in detail in the next section.
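The metadata pointer described above can be inspected from Snowflake. This is a small sketch, assuming an Iceberg table named customer_iceberg in a database and schema called my_db and my_schema (all three names are hypothetical).

```sql
-- Returns a JSON string that includes the location of the table's
-- current metadata file, i.e. the pointer the catalog is tracking.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_db.my_schema.customer_iceberg');
```

Comparing that location before and after a write is a simple way to see the pointer move to a new metadata file on each commit.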
Compute: Snowflake virtual warehouses
Queries against Iceberg tables run on the same Snowflake virtual warehouses that power native tables. Snowflake bills you for that compute and for cloud services usage, exactly as it would for any other query.
Snowflake also uses a snapshot-based query model, where a snapshot captures the state of the table at a point in time. That snapshot model is what powers Snowflake Time Travel and reproducible queries on Iceberg data.
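As an illustration of the snapshot model, Time Travel queries use the same AT clause as on native tables. This sketch assumes a table named customer_iceberg and uses an illustrative timestamp; the point in time you ask for has to fall within the table's retention period.

```sql
-- State of the table one hour ago (the offset is in seconds).
SELECT COUNT(*)
FROM customer_iceberg AT(OFFSET => -60*60);

-- State of the table at a specific moment (illustrative value).
SELECT COUNT(*)
FROM customer_iceberg AT(TIMESTAMP => '2025-01-15 09:00:00'::TIMESTAMP_LTZ);
```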
Managed vs Externally Managed Iceberg Tables
Snowflake supports two ways to run an Iceberg table, and they behave very differently. The split comes down to which system owns the catalog, which in turn controls what Snowflake is allowed to do.
Snowflake-managed Iceberg (Snowflake as the catalog)
When Snowflake acts as the Iceberg catalog, you get a Snowflake-managed Iceberg table. This gives you full read and write access plus the full Snowflake platform, with performance close to a native table.
Snowflake handles lifecycle maintenance for these tables, including compaction that keeps small files from piling up. You can also use familiar features such as cloning, replication, and clustering keys, which externally managed tables do not support.
With a Snowflake-managed table you still choose where the files live. They can sit in your own external volume, or, as of 2026, in Snowflake-managed storage where Snowflake stores and optimizes the files for you. These tables also work with Snowflake Snowpark for Python-based processing.
Externally managed Iceberg (external catalog)
When an external system such as AWS Glue or Snowflake Open Catalog owns the catalog, you get an externally managed Iceberg table. Snowflake uses a catalog integration to read the table’s metadata and schema from that external source.
Historically these tables were read-only inside Snowflake. Snowflake now supports writes to externally managed tables that use a remote Iceberg REST catalog, though it does not take over lifecycle management, so you handle compaction and retention with your own tools.
This pattern shines when many engines share one lake and a neutral catalog must be the source of truth. Snowflake becomes one well-behaved participant among Spark, Trino, and others, rather than the owner. It fits naturally into a data mesh where domains own their own data products.
You can convert between the two
The choice is not permanent. If you start with an externally managed table for fast onboarding and later make Snowflake your primary engine, you can convert the table to use Snowflake as the catalog with a single ALTER ICEBERG TABLE command.
That flexibility is worth planning around. Many teams onboard external data quickly as read-mostly tables, then convert the high-value ones to managed once usage patterns settle. Our team applies this same staged thinking to broader Snowflake data engineering work.
The Catalog Layer: Open Catalog, Polaris, and AWS Glue
The catalog is where open interoperability is won or lost, so it deserves its own section. The catalog is the only component that knows which metadata file is the current truth for a table, which is why every engine must agree on it.
Snowflake Open Catalog and Polaris
Snowflake Open Catalog is Snowflake’s managed catalog service built on Apache Polaris, the open-source Iceberg REST catalog that Snowflake contributed to the community. It lets any Iceberg-compatible engine discover and query tables through a standard REST interface.
You can sync a Snowflake-managed table to Open Catalog so that third-party engines can read it. You can also point Snowflake at tables that Open Catalog already manages, which makes it a neutral hub for a multi-engine lakehouse.
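Pointing Snowflake at a table that Open Catalog already manages might look like the sketch below. Every name and value here is a placeholder assumption (the integration name, namespace, catalog name, account URL, OAuth credentials, and the orders table), and the parameter set should be checked against the current CREATE CATALOG INTEGRATION reference for Open Catalog before use.

```sql
-- Catalog integration for Snowflake Open Catalog (placeholder values).
CREATE OR REPLACE CATALOG INTEGRATION open_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'analytics'
  REST_CONFIG = (
    CATALOG_URI = 'https://<org>-<open_catalog_account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

-- Register an existing Open Catalog table in Snowflake.
CREATE ICEBERG TABLE orders_open_catalog
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG = 'open_catalog_int'
  CATALOG_TABLE_NAME = 'orders';
```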
AWS Glue and other REST catalogs
If your lake already runs on AWS Glue, Snowflake connects to it with a catalog integration and reads those Iceberg tables in place. The same pattern works for any catalog that speaks the Iceberg REST protocol.
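A Glue-backed table could be registered along these lines. This is a sketch under stated assumptions: the integration name, Glue database (analytics_db), role ARN, account ID, region, and the orders table are all hypothetical, and the external volume is assumed to cover the location where the Glue table's files live.

```sql
-- Catalog integration that lets Snowflake read Iceberg metadata from Glue.
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'analytics_db'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-1'
  ENABLED = TRUE;

-- Externally managed Iceberg table that points at the Glue table.
CREATE ICEBERG TABLE orders_glue
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG = 'glue_catalog_int'
  CATALOG_TABLE_NAME = 'orders';

-- Pick up the latest snapshot written by another engine.
ALTER ICEBERG TABLE orders_glue REFRESH;
```

The explicit REFRESH reflects that Glue, not Snowflake, owns the metadata pointer: Snowflake sees new snapshots only after it re-reads the catalog.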
Snowflake adds a convenience layer on top of this called a catalog-linked database. The database automatically discovers and stays in sync with the namespaces and tables in your remote catalog, so new tables appear in Snowflake without manual registration. Strong metadata management is what keeps that sync trustworthy.
Bidirectional access with Unity Catalog
Snowflake also supports bidirectional access with Databricks Unity Catalog, so an Iceberg table written by one platform can be read by the other. This matters for teams that run both stacks, a pattern we explore in our Snowflake vs Databricks comparison and alongside work like Databricks vector search.
The catalog decision is genuinely architectural. Pick the catalog that the most engines can agree on, then let Snowflake connect to it rather than forcing every engine to adopt a Snowflake-only catalog. If you are still evaluating, our roundup of data catalog tools is a useful starting point.
The table below maps each catalog option to the source it connects and the access you get, so you can match a catalog to your existing lake.
| Catalog option | Connects | Snowflake access | Best for |
|---|---|---|---|
| Snowflake as catalog | Snowflake-managed tables | Full read and write | Snowflake is the primary engine |
| Snowflake Open Catalog (Polaris) | Iceberg REST | Read and REST writes | A neutral multi-engine hub |
| AWS Glue Data Catalog | Iceberg REST | Read and REST writes | An existing AWS data lake |
| Object store (metadata files) | Iceberg or Delta files | Read, refresh-based | Files with no live catalog |

A Simple CREATE ICEBERG TABLE Example
The fastest way to make this concrete is to create a Snowflake-managed Iceberg table. The steps are: set up an external volume, then create the table with Snowflake as the catalog.
First, create the external volume that points Snowflake at your storage bucket. This is a one-time account-level setup that many tables can share.
    -- One external volume can back many Iceberg tables.
    CREATE OR REPLACE EXTERNAL VOLUME iceberg_vol
      STORAGE_LOCATIONS = (
        (
          NAME = 'my-iceberg-data'
          STORAGE_PROVIDER = 'S3'
          STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
          -- Placeholder ARN; AWS account IDs are 12 digits.
          STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-role'
        )
      );

Next, create the Iceberg table itself with Snowflake named as the catalog. Snowflake now owns the metadata, so you get full read and write access.
    CREATE OR REPLACE ICEBERG TABLE customer_iceberg (
      c_custkey INTEGER,
      c_name    STRING,
      c_address STRING,
      c_acctbal NUMBER(12,2)
    )
      CATALOG = 'SNOWFLAKE'            -- Snowflake acts as the Iceberg catalog
      EXTERNAL_VOLUME = 'iceberg_vol'  -- volume created in the previous step
      BASE_LOCATION = 'customer/';     -- directory under the volume's base URL

From here you load and query the table with ordinary SQL, including INSERT, UPDATE, and DELETE. Loading typically follows an ELT pattern, landing raw data first and transforming it in place. To turn an existing externally managed table into a managed one, a single command does the job.
    -- Only valid for a table that is currently externally managed.
    ALTER ICEBERG TABLE customer_iceberg CONVERT TO MANAGED
      BASE_LOCATION = 'customer/';

That is the whole arc: one external volume, one table definition, and full SQL access to open data. The same table is now readable by Spark or Trino through the catalog.
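To round out the walkthrough, the statements below sketch what "ordinary SQL" looks like against the customer_iceberg table defined above. The row values are made up for illustration.

```sql
-- Load a couple of rows.
INSERT INTO customer_iceberg (c_custkey, c_name, c_address, c_acctbal)
VALUES
  (1, 'Acme Corp', '12 Harbor Road', 1500.00),
  (2, 'Globex', '99 Summit Ave', -42.50);

-- Row-level update; Snowflake commits a new Iceberg snapshot.
UPDATE customer_iceberg
SET c_acctbal = c_acctbal + 100
WHERE c_custkey = 2;

-- Row-level delete.
DELETE FROM customer_iceberg
WHERE c_acctbal < 0;

-- Read the current state of the table.
SELECT c_custkey, c_name, c_acctbal
FROM customer_iceberg
ORDER BY c_custkey;
```

Each write produces new Parquet and metadata files under the table's base location, which is what other engines read through the catalog.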
Iceberg Tables vs Native Snowflake Tables: When to Use Which
Iceberg tables are powerful, but they are not a replacement for native Snowflake tables in every case. The right answer depends on who owns the data and how many engines need to read it.
Native Snowflake tables remain the best choice when Snowflake is your only engine and raw performance is the priority. They support every Snowflake feature and need no external storage to manage. The same trade-off shows up across data warehouse concepts in general.
Iceberg tables earn their place when you need open interoperability, want to avoid vendor lock-in on the storage layer, or already have a data lake that several engines query. The cost is a bit more setup and, for externally managed tables, fewer platform features. Iceberg has quickly become a core building block of the modern data stack.
Performance is closer than many expect. Independent testing by Prequel found native Snowflake tables still edged out Iceberg, but the margin was small for Snowflake-managed Iceberg and much wider for purely external tables. In short, managed Iceberg gives you most of native speed with open access.
The practical rule: use native tables for Snowflake-only hot paths, use Snowflake-managed Iceberg when you need both speed and openness, and use externally managed Iceberg when a shared catalog must stay neutral across engines. This same logic guides how we compare options like Snowflake dynamic tables and Snowflake hybrid tables for different workloads.
Performance and Cost Trade-offs You Should Plan For
Iceberg changes your cost model in ways that are easy to miss until the bill arrives. Because the files sit in your storage, you trade some Snowflake convenience for direct control, and that control comes with new line items.
Where the costs move
You still pay Snowflake for compute and cloud services on every query. You no longer pay Snowflake for storage on tables you manage in your own bucket, but your cloud provider bills you for that storage instead.
The cost most teams forget is data egress. If your external volume sits in a different region or cloud from your Snowflake account, querying the table triggers cross-region transfer charges, which Snowflake documents under cross-cloud and cross-region support. Disciplined Snowflake cost optimization keeps these surprises small.
Performance levers
File layout drives Iceberg performance more than anything else. Many small files slow queries, so regular compaction into larger files is essential, and Snowflake handles this automatically for managed tables. The same care applies to any data pipeline architecture feeding the lake.
For externally managed tables, you own that maintenance. You must schedule compaction and clean up old delete files with your own engine, or read performance degrades over time. We treat this as a core part of every data integration design.
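One common way to run that maintenance is with the stored procedures that Apache Iceberg ships for Spark. The sketch below is Spark SQL, not Snowflake SQL, and assumes a Spark session configured with the Iceberg runtime and SQL extensions, a catalog registered as my_catalog, and a table analytics.orders (the catalog name, table name, and cutoff timestamp are all hypothetical).

```sql
-- Compact small data files into larger ones.
CALL my_catalog.system.rewrite_data_files(table => 'analytics.orders');

-- Compact position delete files left behind by row-level deletes.
CALL my_catalog.system.rewrite_position_delete_files(table => 'analytics.orders');

-- Drop snapshots older than the cutoff so their files can be cleaned up.
CALL my_catalog.system.expire_snapshots(
  table => 'analytics.orders',
  older_than => TIMESTAMP '2025-01-01 00:00:00.000'
);

-- Remove files in the table location that no snapshot references.
CALL my_catalog.system.remove_orphan_files(table => 'analytics.orders');
```

Scheduling these as a recurring job is a design choice: how often to compact depends on write frequency, and the snapshot cutoff bounds how far back any engine can time travel.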
Governance is the final lever, and the one with the biggest downstream cost. Open data is easy to copy and easy to misuse, so policy enforcement and lineage need to be designed in, not bolted on, a point we stress in our data governance practice.
Limitations and Common Pitfalls
Snowflake Iceberg tables are production-ready, but they carry real constraints that you should know before you commit. Most pitfalls trace back to one fact: externally managed tables give Snowflake less control, so they support fewer features.
Here are the limitations that most often surprise teams in their first project:
- Iceberg tables support only Apache Parquet data files, so other formats must be converted first.
- Externally managed tables do not support cloning, clustering, or replication, and only insert-only streams work on them.
- Several Snowflake features, including Fail-safe, hybrid tables, and Snowflake's own schema evolution feature, are not available on Iceberg tables (Iceberg's native schema evolution remains available).
- For tables you manage yourself, Snowflake does not delete orphan files, so storage can drift above what the table actually uses.
- Cross-region external volumes add egress costs that can quietly dominate the bill if you ignore region placement.
None of these are dealbreakers, but each one is a design decision. The teams that struggle are the ones that treat an Iceberg table exactly like a native table and discover the gaps in production. A clean data migration plan and a solid data governance framework head off most of them.
To make the choice concrete, the matrix below maps common workloads to the table type we would recommend and the main reason behind it.
| Workload | Recommended table | Why |
|---|---|---|
| Snowflake-only BI dashboards | Native Snowflake table | Best raw speed, every feature available |
| Shared data read by Spark and Snowflake | Snowflake-managed Iceberg | Open format with near-native speed |
| Lake owned by a neutral catalog | Externally managed Iceberg | No single engine owns the truth |
| Onboard external data fast, then optimize | External, convert to managed later | Quick start, upgrade when usage settles |

How Kanerika Builds Open Lakehouses on Snowflake Iceberg
Kanerika is a Snowflake Select Tier Partner that designs, migrates, and operates Snowflake environments end to end. We treat Iceberg not as a feature to switch on, but as an architecture decision that shapes cost, governance, and how many engines your data can serve. Our delivery follows a clear, staged path.
Assess. We start by mapping your tables, engines, and storage, then decide managed or externally managed Iceberg per workload rather than applying one rule everywhere. Hot Snowflake-only paths stay native, shared datasets move to Iceberg, and the catalog choice is made deliberately so the most engines can agree on it.
Design and build. We stand up the external volumes, catalog integrations, and catalog-linked databases, then wire Snowflake Open Catalog or AWS Glue so Spark, Trino, and Snowflake read the same tables without copies. Our Snowflake consulting and implementation practice does this with repeatable patterns, not one-off scripts.
Govern and enable. Open data demands stronger governance, so we design masking, row access policies, and lineage into the lakehouse from day one, then set up compaction and cost guardrails so unit costs stay flat as adoption grows. Our accelerator FLIP handles the integration and quality plumbing that keeps these pipelines reliable.
This approach is grounded in real delivery. For a global technology consulting firm, Kanerika replaced manual reconciliation across regional systems with governed, centralized Snowflake data, cutting reconciliation effort by 60% and giving distributed teams real-time operational visibility. As a CMMI Level 3, ISO 27001, and SOC 2 Type II assessed firm, we build these platforms to enterprise governance standards.
The pitfalls we watch for most: choosing externally managed tables when the team really needs Snowflake’s platform features, ignoring region placement until egress costs spike, and skipping compaction on tables Snowflake does not maintain. Catching these early is the difference between an open lakehouse that scales and one that quietly bleeds cost.
Frequently Asked Questions
What are Snowflake Iceberg tables?
Snowflake Iceberg tables are tables that use the open Apache Iceberg table format, with their data and metadata files stored in your own cloud storage such as Amazon S3, Azure, or Google Cloud. Snowflake queries and, in most cases, writes to them directly, while other engines like Spark and Trino can read the same files at the same time. This gives you Snowflake performance on data that is not locked inside Snowflake’s proprietary format.
What is the difference between an external table and an Iceberg table in Snowflake?
A Snowflake external table exposes raw files in cloud storage as read-only rows with no transaction support and limited performance. An Iceberg table is a first-class, ACID-compliant table built on the open Iceberg format, so it supports inserts, updates, deletes, schema evolution, and snapshots. In short, external tables are a read-only window onto files, while Iceberg tables are full tables that happen to live in open storage.
What is the difference between managed and externally managed Iceberg tables?
A Snowflake-managed Iceberg table uses Snowflake as the Iceberg catalog, which gives full read and write access plus platform features such as cloning, replication, clustering, and automatic compaction. An externally managed table uses an external catalog like AWS Glue or Snowflake Open Catalog, so Snowflake reads the metadata from there and has fewer platform features. You can convert an externally managed table to a Snowflake-managed one with a single ALTER ICEBERG TABLE command.
Are Snowflake Iceberg tables slower than native Snowflake tables?
Native Snowflake tables are still slightly faster, but the gap is small for Snowflake-managed Iceberg tables and much larger for purely external tables. Independent testing has shown Snowflake-managed Iceberg performs close to native, because Snowflake handles compaction and optimization. The trade-off is that Iceberg gives you open interoperability and no storage lock-in, which native tables do not.
What catalogs can Snowflake use for Iceberg tables?
Snowflake can act as the Iceberg catalog itself, or connect to an external catalog through a catalog integration. Supported external catalogs include Snowflake Open Catalog (built on Apache Polaris), AWS Glue, and any catalog that speaks the Iceberg REST protocol; Snowflake can also read Iceberg or Delta metadata files directly from object storage. Snowflake also supports bidirectional access with Databricks Unity Catalog.
Do Iceberg tables cost more than native Snowflake tables?
The cost model shifts rather than simply rising. You still pay Snowflake for compute and cloud services on every query, but you no longer pay Snowflake for storage on tables you keep in your own bucket, because your cloud provider bills that directly. The cost most teams forget is cross-region data egress, which appears when your external storage sits in a different region or cloud from your Snowflake account.
What are the main limitations of Snowflake Iceberg tables?
Iceberg tables support only Apache Parquet data files, and externally managed tables do not support cloning, clustering, replication, or standard streams (only insert-only streams work on them). Several Snowflake features, including Fail-safe, hybrid tables, and Snowflake's own schema evolution feature, are not available on Iceberg tables, although Iceberg's native schema evolution remains available. For tables you manage yourself, Snowflake does not delete orphan files, so storage can drift above what the table actually uses unless you maintain it.
How do you create an Iceberg table in Snowflake?
First create an external volume that points Snowflake at your cloud storage bucket, which is a one-time account-level setup that many tables can share. Then run CREATE ICEBERG TABLE with CATALOG set to SNOWFLAKE for a managed table, naming the external volume and a base location. From there you load and query the table with ordinary SQL, and you can convert an externally managed table to managed later with ALTER ICEBERG TABLE CONVERT TO MANAGED.