“Every company is now an AI company. The question is whether every worker will be an AI worker.” — Satya Nadella
But before any of that AI promise becomes real, someone has to build the pipes. And anyone who has spent time building data warehouses knows that a surprising amount of that work isn’t sophisticated. It’s repetitive. Writing MAX(ID) + 1 for the hundredth time. Adding ROW_NUMBER() logic to yet another pipeline. Babysitting surrogate key generation so nothing breaks when two records land at the same time.
Microsoft Fabric is built to modernize that entire layer, bringing data engineering, integration, analytics, and business intelligence into one unified platform. And with its Fabric Data Warehouse, the focus is squarely on making large-scale analytical workloads simpler to build and maintain.
One of its recent additions quietly solves one of those everyday frustrations: IDENTITY columns, a feature that automatically generates unique numeric keys whenever a new row is inserted, no custom logic required. In this post, we’ll walk through how they work, how they differ from SQL Server, and what you need to know before using them.
Key Takeaways
- IDENTITY Columns automatically generate unique values: Fabric Data Warehouse can automatically create a unique numeric identifier whenever a new row is inserted into a table.
- Useful for creating surrogate keys in warehouse tables: Identity columns are commonly used in dimensional models where tables require stable numeric identifiers to maintain relationships between fact and tables.
- Designed for Fabric’s distributed architecture: Unlike traditional SQL Server identity columns, Fabric generates identity values across multiple compute nodes to support parallel processing and scalable workloads.
- Identity values may not always be sequential: Because values are generated across different nodes, identity numbers may contain gaps or appear out of order. This behavior is expected in distributed data systems.
- Reduces manual SQL logic in ETL pipelines: Identity columns eliminate the need for methods like MAX(ID) + 1 or ROW_NUMBER() to generate keys, making data pipelines simpler and easier to maintain.
What Are IDENTITY Columns in Fabric Data Warehouse?
IDENTITY columns in Fabric Data Warehouse automatically generate a unique numeric value whenever a new row is inserted into a table. These values are commonly used as surrogate keys to uniquely identify each record without requiring manual ID generation.
By letting the system assign identifiers during data insertion, identity columns simplify table design and remove the need for custom key logic. They also help maintain reliable relationships between tables, especially in dimensional models where fact and dimension tables depend on consistent and stable keys.
Understanding Natural Keys vs Surrogate Keys
In data warehouse design, keys are used to uniquely identify records and maintain relationships between tables. Two common types of keys are natural keys and surrogate keys. Identity columns are mainly used to generate surrogate keys automatically.
| Feature | Natural Key | Surrogate Key |
| Definition | A key derived from real business data | A system-generated key with no business meaning |
| Source | Comes from the source system or business data | Generated by the database or data warehouse |
| Business Meaning | Has meaning in the business context | No business meaning |
| Stability | May change if business data changes | Usually stable once created |
| Data Type | Can be text, numbers, or mixed formats | Typically numeric |
| Example | Email ID, Product Code, Order Number | CustomerID, ProductID, OrderID |
| Usage in Warehouse | Sometimes used but may cause issues if values change | Commonly used as primary keys in dimension tables |
| Generation Method | Provided by source systems | Often generated using identity columns |
Example
If a customer table uses email address as the key, that is a natural key because it comes from business data. However, if the table uses a system-generated CustomerID (1, 2, 3, …), that is a surrogate key. Identity columns help create these surrogate keys automatically whenever new rows are inserted into the table.
Why Identity Columns Are Better Than Manual Surrogate Key Methods
Before IDENTITY Columns, data engineers handled surrogate key generation manually — using MAX(ID) + 1, row numbers, or GUIDs. It worked, but it added steps to pipelines and required careful handling to avoid duplicates or performance issues. IDENTITY Columns remove that overhead by letting Fabric automatically generate unique numeric values at insertion, making key management simpler and more reliable.
Common Manual Methods Used Earlier
- Using MAX(ID) + 1: This method finds the highest existing ID in a table and adds one to generate the next value. While simple, it can cause conflicts when multiple records are inserted at the same time. Concurrent inserts may attempt to generate the same ID, leading to errors or duplicates.
- Using ROW_NUMBER(): Some pipelines generate surrogate keys using the ROW_NUMBER() window function during data transformation. This assigns sequential numbers to rows based on a defined order. However, it adds an extra processing step and may slow down transformations on large datasets.
- Using GUID or Hash Values: Another approach is to generate globally unique identifiers (GUIDs) or hash-based keys. These values ensure uniqueness across systems and datasets. However, they require more storage and can reduce query performance compared to numeric keys.
Advantages of Identity Columns
- Automatic Key Generation: Identity columns automatically generate a unique numeric value whenever a new row is inserted into the table. Users do not need to calculate or assign the ID manually. The system handles the key creation during data insertion.
- Simpler ETL Pipelines: Since the database generates the keys, ETL pipelines no longer need extra SQL logic to create surrogate IDs. This reduces transformation steps and makes pipelines easier to maintain. It also helps keep the data loading process cleaner.
- Reduced Risk of Duplicate Keys: Identity columns ensure that each generated value is unique within the table. This removes the risk of duplicate IDs that can occur when keys are generated manually. As a result, table relationships remain consistent and reliable.
- Better Performance with Numeric Keys: Numeric surrogate keys are usually more efficient for joins and indexing compared to larger identifiers like GUIDs. They also require less storage space. This can help improve query performance in warehouse environments.
By removing the need for manual key generation logic, identity columns make data warehouse design cleaner and easier to maintain.
Partner with Kanerika to Modernize Your Enterprise Operations with High-Impact Data & AI Solutions
How IDENTITY Columns Work in Fabric Data Warehouse
Identity columns in Fabric Data Warehouse behave differently from traditional SQL Server because Fabric runs on a distributed architecture. Instead of pulling from a single counter, values are generated across multiple compute nodes simultaneously. Each row gets a unique numeric value on insertion, but the values may not appear in strict sequence. This is expected behavior — distributed ingestion across nodes means the order of value generation cannot be guaranteed.
How Identity Values Are Generated
- Data Insertion Request: When rows are inserted into a table with an identity column, the request is processed by the Fabric warehouse engine. The system distributes the workload across multiple compute nodes instead of handling the operation in a single instance. This approach helps improve performance during large data loads.
- Range Allocation to Compute Nodes: To generate identity values efficiently, the system assigns a specific range of numbers to each compute node. These ranges allow nodes to independently generate unique values without needing to coordinate with other nodes. This reduces contention and allows faster processing.
- Parallel Value Assignment: Each compute node processes a portion of the incoming data and assigns identity values from its allocated range. Since multiple nodes work at the same time, identity values are generated in parallel. This parallel processing helps Fabric handle high-volume data ingestion.
- Data Written to the Table: After identity values are assigned, the processed rows are stored in the target warehouse table. Each row now contains a unique numeric identifier generated by the system. These values can then be used as surrogate keys for relationships between tables.
How to Use IDENTITY Columns in Fabric Data Warehouse
IDENTITY Columns in Fabric Data Warehouse provide a built-in way to generate unique numeric values automatically for each new row inserted into a table. This feature was introduced in November 2025 (Preview) to simplify the process of creating surrogate keys in warehouse tables. Instead of writing custom SQL logic to generate IDs, Fabric now assigns the values automatically during data insertion.
In a data warehouse, tables often need a unique identifier so that relationships can be created between fact tables and dimension tables. Earlier, engineers handled this using manual methods like MAX(ID) + 1, ROW_NUMBER(), or GUID values. Identity columns remove that extra effort by letting the system generate the keys automatically, making table design and data pipelines easier to manage.
Key Benefits of IDENTITY Columns in Fabric Data Warehouse
- Automatic unique value generation: When a new record is inserted into a table that contains an identity column, Fabric automatically assigns a unique numeric value to that column. Users do not need to include the column in the insert statement.
- Simplifies surrogate key creation: Surrogate keys are widely used in dimensional models. Identity columns make it easy to generate these keys without building custom scripts or transformations in the pipeline.
- Reduces custom SQL logic in ETL pipelines: Earlier methods required additional SQL operations to compute the next available ID. Identity columns move this responsibility to the system, reducing the amount of code needed in ETL processes.
- Helps maintain consistent table relationships: Since identity columns always generate unique values, they provide a reliable key that can be used to connect related tables in a warehouse model.
- Designed for modern data workloads: The feature is built to support the architecture of Fabric Data Warehouse, which focuses on handling large data volumes and analytics workloads efficiently.
This feature plays an important role in simplifying warehouse design and improving the way surrogate keys are created in Fabric-based analytics solutions.
Difference Between SQL Server Identity and Fabric Identity Columns
Although identity columns exist in both SQL Server and Fabric Data Warehouse, their behavior differs because Fabric is built on a distributed architecture. SQL Server typically generates identity values from a single database instance, which results in sequential numbers. In Fabric Data Warehouse, identity values are generated across multiple compute nodes working in parallel, which improves scalability and data ingestion performance but may produce gaps or non-sequential values.
Key Differences Between SQL Server and Fabric Identity Columns
| Feature | SQL Server Identity Columns | Fabric Data Warehouse Identity Columns |
| System Architecture | SQL Server usually runs on a single database instance where identity values are generated centrally. | Fabric Data Warehouse uses a distributed architecture with multiple compute nodes processing data in parallel. |
| Identity Value Generation | Values are generated from a single counter within the database engine. | Identity values are generated across multiple nodes, each working with its own allocated range of numbers. |
| Sequence Behavior | Identity values are typically sequential and continuous when records are inserted. | Values may not appear strictly sequential because different nodes generate them simultaneously. |
| Handling of Gaps | Gaps are uncommon unless rows are deleted or transactions fail. | Gaps can occur naturally because identity ranges are assigned to different nodes. |
| Parallel Data Processing | Limited parallel identity generation since a single instance manages the values. | Designed for parallel processing where multiple nodes assign identity values at the same time. |
| Data Ingestion Performance | Suitable for traditional database workloads with moderate insert operations. | Optimized for large-scale data ingestion and cloud analytics workloads. |
| Scalability | Works well for standard OLTP or smaller analytical workloads. | Built for large distributed workloads that require high scalability and performance. |
Syntax for Creating IDENTITY Columns in Fabric Data Warehouse
Creating an identity column in Fabric Data Warehouse is straightforward and is done during table creation using T-SQL. The identity column is defined by adding the IDENTITY keyword to a numeric column, which instructs Fabric to automatically generate unique values whenever new rows are inserted into the table. Unlike SQL Server, Fabric currently supports identity columns in a simplified form and does not require specifying seed or increment values.
Basic Syntax
Below is a simple example of creating a table with an identity column.

In this example:
- ID is defined as an identity column.
- Fabric automatically generates a unique numeric value for this column whenever a new row is inserted.
- Users do not need to include the identity column in insert statements.
Key Things to Know About the Syntax
- Identity Column Must Use BIGINT: Currently, Fabric supports identity columns only with the BIGINT data type. This allows the system to generate a very large range of unique numeric values.
- No Seed or Increment Parameters: Unlike SQL Server syntax such as IDENTITY(1,1), Fabric only uses the IDENTITY keyword. The starting value and increments are handled internally by the system.
- Automatic Value Assignment During Inserts: When inserting rows into the table, users should not include the identity column in the insert statement. Fabric automatically assigns the next available identity value.
- Typically Used as Primary Keys: Identity columns are commonly used as primary keys or surrogate keys in warehouse tables, helping maintain relationships between dimension and fact tables.
How to Drive Greater Analytics ROI with Microsoft Fabric Migration Services
Leverage Kanerika’s Microsoft Fabric migration services to modernize your data platform, ensure smooth ETL, and enable AI-ready analytics
Supported Data Type for Identity Columns
In Fabric Data Warehouse, identity columns currently support only the BIGINT data type. This means that when defining an identity column in a table, the column must be created using BIGINT IDENTITY. Other integer types such as INT are not supported for identity generation in Fabric at this time.
The BIGINT data type uses 8 bytes of storage and supports a very large range of numeric values. Because of this large capacity, identity columns can generate a massive number of unique identifiers without running out of values, making them suitable for large-scale data warehouse environments.
Key Details About BIGINT Identity Columns
- Large Numeric Range: BIGINT supports values up to approximately 9.22 quintillion, which provides a huge range for generating unique identifiers. This makes it suitable for large datasets where billions of records may be inserted over time.
- Suitable for Large Data Warehouses: Data warehouse tables often contain millions or even billions of records. Using BIGINT ensures that the identity column can continue generating unique values even as the table grows significantly.
- Stable Numeric Identifier for Rows: Because BIGINT values are numeric and compact, they work well for joins, indexing, and relationships between tables. This helps maintain efficient query performance in warehouse workloads.
- Low Risk of Running Out of Identity Values: Given the extremely large range of BIGINT values, it is very unlikely that a Fabric warehouse table will exhaust the available identity values under normal workloads.
Is IDENTITY INSERT Supported?
Currently, IDENTITY INSERT is not supported in Fabric Data Warehouse during the preview stage. This means users cannot manually insert values into an identity column. Instead, Fabric automatically generates the identity value whenever a new row is inserted into the table.
Because identity generation is managed by the system, the identity column should not be included in the INSERT statement. Fabric assigns the unique numeric value automatically during the data load process.
What This Means in Practice
- You Cannot Manually Set Identity Values: Unlike SQL Server where SET IDENTITY_INSERT ON allows manual ID insertion, Fabric currently does not support this capability. Every inserted row must rely on system-generated identity values.
- Identity Values Are Fully System Controlled: Fabric manages identity generation internally across its distributed architecture. This ensures that every generated value remains unique without requiring user intervention.
- Possible Enhancement in Future Releases: Microsoft has indicated that future updates may allow inserting negative values into identity columns. This could provide more flexibility for advanced data loading scenarios.
Enable Responsible and Ethical AI in Your Enterprise
Discover a practical roadmap for implementing transparent and accountable AI systems.
Step-by-Step Example: Creating and Using IDENTITY Columns in Fabric Data Warehouse
Now that we understand how identity columns work, let’s look at a simple example of creating and using them in Fabric Data Warehouse. In this example, we will create a table with an identity column, insert a few records, and then query the table to see how the identity values are generated automatically.
This demonstration uses the Microsoft Fabric portal where the warehouse environment can be accessed through the workspace.
Step 1: Open Your Fabric Workspace
First, log in to the Microsoft Fabric portal and navigate to your workspace.
Steps to follow:
- Go to the Fabric portal.
- Open Workspaces.
- Select the workspace where your warehouse is stored.
- Open your warehouse instance.
In the example from the transcript, the warehouse named WH01 is used.

Step 2: Create a Table with an Identity Column
Next, create a table that contains an identity column.
Open a New SQL Query and run the following script.

Here:
- ID is defined as an identity column.
- Fabric automatically generates a numeric value for this column when rows are inserted.
Step 3: Insert Data into the Table
After creating the table, insert some sample records. Notice that the identity column is not included in the insert statement.

When this query runs, Fabric automatically generates identity values for the ID column.
Step 4: Query the Table
Now run a query to check the inserted data.

You should see the identity column populated with unique numeric values assigned by the system.
Step 5: Insert More Records
Insert additional rows into the table.

When you query the table again, you may notice that the new identity values do not always follow the previous sequence exactly.
Why the Identity Values May Look Different
This behavior occurs because Fabric generates identity values across multiple compute nodes. Each node receives its own range of numbers, which means the final values may appear slightly out of order or contain gaps.
This design is intentional and helps Fabric support high-performance parallel data ingestion for large analytics workloads.
Key Limitations of IDENTITY Columns in Fabric Data Warehouse
While IDENTITY Columns in Fabric Data Warehouse simplify surrogate key generation, the feature is currently available in preview and comes with a few limitations. Understanding these limitations helps data engineers design their warehouse tables and data pipelines more effectively.
- No Seed and Increment Configuration: Unlike SQL Server where identity columns can be defined using syntax like IDENTITY(1,1), Fabric currently supports only the IDENTITY keyword. The starting value and increment behavior are handled internally by the system.
- Identity Values May Not Be Sequential: Since Fabric generates identity values across multiple compute nodes, the numbers may not always appear strictly sequential. Gaps or out-of-order values can occur, which is expected in distributed processing environments.
- Manual Identity Insert Not Supported: Fabric does not currently support IDENTITY INSERT, meaning users cannot manually specify identity values during data insertion. All values are automatically generated by the system.
- Limited Data Type Support: Identity columns currently support only the BIGINT data type. Other integer types, such as INT, are not supported for identity generation.
Despite these limitations, identity columns still provide a reliable way to generate surrogate keys automatically and simplify data warehouse design in Fabric. As the feature evolves beyond preview, additional capabilities may be introduced.
Why Kanerika is Your Trusted Partner for Microsoft Fabric Implementation
Implementing Microsoft Fabric data agents requires more than just technical knowledge. You need a partner who understands both the platform’s capabilities and your specific business context. Kanerika brings that combination as a certified Microsoft Solutions Partner for Data and AI, along with Databricks partnership credentials.
We specialize in deploying Microsoft’s analytics ecosystem including Fabric, Copilot, Power BI, and Purview. Our team holds advanced Microsoft specializations in Analytics on Microsoft Azure and Data Warehouse Migration to Microsoft Azure. This isn’t theoretical expertise. We’ve built production solutions that handle real enterprise data challenges across industries.
Security and quality standards matter when you’re working with business-critical data. Kanerika maintains CMMI Level 3, ISO 27001, ISO 27701, and SOC 2 Type 2 certifications. These aren’t just compliance checkboxes. They represent audited processes that protect your data throughout implementation and beyond.
Our approach combines Microsoft Fabric’s native capabilities with Databricks’ data intelligence platform when your use case demands it. We don’t push specific tools. We assess what you actually need, design solutions that address those requirements, and deploy them with minimal disruption to ongoing operations.
Whether you’re starting fresh with Fabric data agents or integrating them into existing analytics infrastructure, we handle the technical complexity while you focus on the business outcomes. That’s the partnership model that works.
Drive Business Growth with Microsoft Fabric Analytics!
Partner with Kanerika for Smooth and Scalable Fabric Adoption
FAQs
Why are IDENTITY Columns used in Fabric Data Warehouse?
Identity columns are mainly used to generate surrogate keys that help maintain relationships between tables in a data warehouse model. They simplify key generation and reduce the need for manual SQL logic in ETL pipelines.
- Common benefits include:
- Automatic generation of unique numeric identifiers
- Simplified data loading processes
- Reduced risk of duplicate key values
- Better performance for joins compared to non-numeric identifiers
How do IDENTITY Columns work in Fabric Data Warehouse?
When a row is inserted into a table that contains an identity column, Fabric automatically generates a unique numeric value for that column. Unlike traditional SQL Server implementations, identity values in Fabric are generated across multiple compute nodes in parallel. This distributed generation improves scalability and performance for large data workloads.
Are identity values always sequential in Fabric Data Warehouse?
No, identity values in Fabric Data Warehouse are not guaranteed to be sequential. Because the platform uses distributed processing across multiple nodes, identity values may contain gaps or appear out of order. This behavior is expected and does not affect the uniqueness of the generated values.
What data type is supported for IDENTITY Columns in Fabric Data Warehouse?
Currently, Fabric supports identity columns only with the BIGINT data type.
Key details:
- Uses 8 bytes of storage
- Supports a very large numeric range
- Suitable for large-scale data warehouse tables
- Provides billions of unique identity values
Can we manually insert values into an IDENTITY column in Fabric?
No, manual insertion of identity values is not supported during the current preview stage. Fabric automatically generates identity values whenever new rows are inserted into the table. Users should exclude the identity column from insert statements so the system can generate the value automatically.
How are identity columns different from SQL Server identity columns?
While both systems generate unique numeric identifiers, Fabric implements identity columns differently due to its distributed architecture.
Key differences include:
- Fabric generates values across multiple compute nodes
- Identity numbers may not always be sequential
- Fabric currently supports only BIGINT IDENTITY
- Seed and increment values are not configurable
What are surrogate keys and why are they important?
Surrogate keys are system-generated identifiers used to uniquely identify records in a table. They do not have business meaning and are typically numeric values. In data warehouses, surrogate keys are commonly used to maintain stable relationships between dimension tables and fact tables.
When should you use IDENTITY Columns in Fabric Data Warehouse?
- Identity columns are useful in several scenarios, including:
- Creating surrogate keys for dimension tables
- Generating primary keys for warehouse tables
- Simplifying ETL processes that require unique identifiers
- Maintaining relationships between fact and dimension tables
Using identity columns helps reduce complexity in data pipelines and ensures consistent key generation.
Do identity columns improve performance in Fabric Data Warehouse?
Identity columns can help improve performance by simplifying key generation and enabling efficient numeric joins. Since Fabric generates identity values automatically during data insertion, ETL pipelines require fewer transformations. Numeric surrogate keys also perform better for joins and indexing compared to larger identifiers such as GUIDs.



