Power BI gives developers two options when blending datasets. They can either merge two or more tables together, or they can build a relationship between the two and leave them as individual tables in the data model. While they can both produce similar results, their use cases, performance implications, and underlying mechanics differ significantly.
We’ll provide a brief overview of what each function does when combining datasets together, and then go into more detail about how Power BI treats each one, and what you should know when thinking through report optimization to ensure that your users are receiving the best possible experience.
Table of Contents
Methods for Combining Data in Power BI
There are three different methods for combining datasets together in Power BI. The first thing to know is that Power BI doesn’t use standard SQL terms such as join and union, instead they refer to the same operations as merge and append.
An append or union stacks two datasets on top of each other. Imagine you have a separate file for each month of the year with the exact same columns and data structure. An append or union would match up each dataset based on column name and provide you with one giant table that includes all months of the year.
A merge or join combines tables based on a common value and presents data as one wide table with matching values from each one aligned to the appropriate row. For example, if you had one table of addresses with state abbreviations, and another with the state abbreviation and full state name you could use a merge like a vlookup and match up the full the state name on from the second table to the state abbreviation on the first table.
In the follow article, we may use append and union, or join and merge interchangeably.
Merging Tables in Power BI
Merging tables in Power BI is the same as performing a join in SQL. It combines rows from two or more tables based on common columns, creating a single, extended table. This process is done in Power Query Editor. There are different types of merges available depending on the desired results.
- Inner Merge – Combines rows that have matching values in both tables.
- Left Outer Merge – Includes all rows from the left table and matched rows from the right table.
- Right Outer Merge – Opposite of the left outer, it includes all rows from the right table.
- Full Outer Merge – Combines all rows from both tables, with nulls where there is no match.
- Anti Join – Returns rows from one table that have no match in the other.
Our Comprehensive Guide to performing merges and joins in Power BI goes into even more detail about the different types of merges that are available and when you may want to consider using each one.
Tables can either be merged into each other, or merged as an entirely new table using the Merge table wizard in the Power Query Editor when creating your data model.
When to Merge tables
Merging tables can be a necessary data modeling step, or can also be used as a data analysis step. For example, if you are writing formulas in Power Query M it can be easier to combine tables prior to creating the formula. It can also be useful to identify which values in a dataset match vs. ones that do not match such as performing a reconciliation. Both matching and non-matching records could be displayed as individual tables.
Merging is useful when:
- Consolidating data from multiple sources.
- Perform row-level operations before loading data into the model.
- Flattening a star schema into a single large table.
- Performing analysis and comparing values across tables for a single or multiple columns.
Power BI Relationships
Creating relationships in Power BI involves linking two tables through key columns. These relationships allow you to model data in a way that is more visual, easier to keep track of, and can quickly and easily be changed when new tables are added or changed. Rather than making changes to the data model using Power Query, these changes are made using the Power BI relationship editor.
Relationships can be created in several different methods.
- One-to-One (1:1): Each row in Table A corresponds to one row in Table B.
- One-to-Many (1:N): Each row in Table A can relate to many rows in Table B.
- Many-to-One (N:1): Many rows in Table A relate to one row in Table B.
- Many-to-Many (N:N): Introduced in recent versions of Power BI for more complex scenarios.
Power BI relationships are also directional and have two different options.
- Single Direction: The default setting, filters flow from one table to another in one direction.
- Both Directions: Filters flow in both directions, which can be useful but also risk creating ambiguity in the data model.
Microsoft provides further detail about relationship cardinality and explains how to Model relationships in Power BI Desktop in further detail. It’s worth a read to understand the nuance of each one of these settings that can be established.
Instead of using the Power Query Editor, relationships are managed in the relationship editor screen of Power BI Desktop. It is accessible on the left side of Power BI Desktop, and shows you a diagram of related tables, along with annotations for which fields are being used as keys and the direction of each relationship.
When to Build Power BI Relationships
First, it’s worth noting that depending on the source of data that you are using the merge option is not always available. When working with direct query datasets you may not have the ability to modify or merge the dataset using Power Query. There are some exceptions with composite models, but in general relationships could be your only option to consolidate data.
Other cases where a relationship will make sense over merging.
- Use relationships for large datasets where merging would be inefficient.
- You don’t want to, or don’t have the ability to merge tables in Power Query.
- Complex data models containing many different tables that would be hard to keep track of multiple merges.
VertiPaq Engine Treatment of Merge vs Relationships
Power BI uses the VertiPaq engine to process calculations and store tabular data within the semantic layer. Similar to how the VertiPaq engine processes changes in the Power Query Editor differently than creating a DAX Measure, merge and relationships are also handled at different points.
Merged tables are calculated at the time that a dataset is refreshed along with other Power Query operations. The new merged tables are stored in the compressed data model, but will also take up more storage space in the .pbix file.
This has several important implications.
- Scheduled refreshes will take longer than using relationships.
- You may hit the .pbix data size limit sooner when publishing to the Power BI Service.
- There is no visibility to the merges happening once data is loaded into the Power BI reporting layer.
- Once the data is loaded into the model, the merged table behaves as a single table, and any calculations or transformations have already been applied.
- Reduces the amount of time spent to execute queries when report viewers interact with reports.
Relationships are executed at query time. This makes them more dynamic but also means that the execution of a query may take longer when interacting with a report compared to offloading the processing to a flattened data model by merging queries.
There are several important implications of using relationships.
- Relationships between tables are defined in the data model but are utilized at query time.
- When you establish a relationship between tables, Power BI maintains them as separate entities in the model.
- Smaller data model size
- Increases flexibility, allowing for more dynamic and complex analyses.
- Potentially increases the processing time when a query is executed.
When to Use Relationships vs. Merge
The decision to use a relationship vs merging data tables often comes down to the size of data being used, the specific use case, and how dynamic the data source is. Meaning how often the data source is subject to change or how likely you are to have to modify the data model in the future.
Because it’s often unknown how often data will change in the future, and the scale of corporate data is ever increasing our general recommendation is to lean towards relationships over merging, but there are several considerations worth noting.
1.) How large is your dataset?
- Small to Medium Datasets: Merging tables can be viable, but always be mindful of the potential for increased complexity and size.
- Large Datasets and Enterprise Solutions: Lean towards relationships to ensure scalability and maintainability of the data model.
2.) Query Performance
Merged tables process during data refreshes and can reduce query time. However, the VertiPaq engine does optimize query execution when utilizing relationships and working with complex data models.
- Merged tables can slow down scheduled data refreshes.
- Relationships allow the VertiPaq engine to optimize queries, resulting in faster performance.
We would not automatically assume that we should use merges over relationships to offload processing to a central refresh however. There are a number of tools available in Power BI to analyze query performance and speed up reports.
The following video from Guy in a Cube goes into depth about one of the optimization tools available.
Microsoft also provides information specific to the topic available here: Optimization guide for Power BI. It’s a useful starting point but is not necessarily comprehensive in all of the ways to optimize Power Query, Power BI, and the VertiPaq engine.
3.) Data Integrity and Transformation Visibility
If you were to merge all tables together from a complex data model, it could be incredibly difficult to go back and trace the transformations performed in Power Query. If data were ever to change or there was an error in an individual table transformation it could be incredibly time consuming to troubleshoot.
By utilizing relationships, you can keep source tables intact allowing you to take advantage of a star schema and maintain the integrity of individual tables.
4.) Best Practices
As a general rule, merging should be used sparingly and mostly with small datasets. The Power BI Service is subject to data size limits and different connection methods may dictate that relationships have to be used.
Relationships are the go-to for most data modeling needs, and allows you to have a smaller data model size while ensuring visibility to the connections between tables.
Table of Power BI Relationships and Equivalent Merge Function
As you can see, the options to merge vs build a relationship are semi-interchangeable. The following table shows the relationship type and corresponding merge type under different scenarios.
|Power BI Relationship Type
|Power BI Merge Type
|Inner Merge (with unique keys)
|Both relationship and merge involve matching unique records from two tables. In a 1:1 relationship, each record in Table A corresponds to one unique record in Table B and vice versa. Similarly, an inner merge with unique keys combines rows that have matching, unique values in both tables.
|Left Outer Merge
|In a 1:N relationship, each record in Table A (typically a dimension table) can relate to multiple records in Table B (typically a fact table). This corresponds to a left outer merge, where all rows from the left table (Table A) are included along with matched rows from the right table (Table B).
|Right Outer Merge
|The inverse of One-to-Many. In N:1, many rows in Table A relate to a single row in Table B. This is akin to a right outer merge where all rows from the right table are included along with those that match from the left table.
|Full Outer Merge
|In an N:N relationship, rows from both tables can have multiple corresponding rows in the other table. This is similar to a full outer merge, which combines all rows from both tables, filling in with nulls where there is no match.
|Not Directly Applicable
|Anti joins in Power BI merges (like Left Anti or Right Anti) return rows from one table where there are no corresponding matches in the other. There’s no direct equivalent in Power BI relationships, as relationships are inherently designed to define connections, not to filter out non-matching rows.
There are also scenarios such as an Anti-Join where you would have to perform the operation in Power Query using a Merge and no direct equivalent is available using a relationship.
Merging tables and building relationships in Power BI serve different purposes and have distinct implications for data model performance and maintainability. Merging is best suited for data transformation and consolidation before loading into the model, whereas relationships are essential for efficient data modeling and large-scale reporting.