Merging tables in Power Query is a way to automate the consolidation of multiple spreadsheets or datasets. The merge function is similar to performing a VLOOKUP where columns are combined based on a common value. For those familiar with the SQL database world, a Power Query Merge performs the same function as a SQL JOIN.
Learning how to Merge in Power Query is an essential skill for anyone looking to automate their Excel workflows or build data models in Power BI. We’ll explain single and multi-column selects, various merge types, and the nuances of fuzzy matching.
Let’s jump in!
Table of Contents
What Merging Tables Does in Power Query
Merging tables in Power Query enables you to combine data from two different tables based on a common key. This operation is similar to a VLOOKUP in Excel but is more powerful and efficient. Instead of bringing in one column at a time, you can bring in a single column or an entire table of columns.
Merging allows you to combine two separate Power Query Tables based on a common value. In the following example, we’ll use a dataset where we want to combine sales data on the left, with a table of customer data on the right.
Within Excel, we could combine both of these data sets with a VLOOKUP but it wouldn’t be practical if we had lots of columns we wanted to combine, or if we had such a large number of rows that regular Excel formulas slow down. Power Query is a much more efficient solution for both.
Power Query Merge vs. SQL Join
Power Query’s merge operation is comparable to SQL’s join clause, but with a more user-friendly interface suited for non-programmers. Power Query provides a graphical interface that simplifies the process, making it accessible to users without extensive SQL knowledge. Both perform similar functions, allowing for the combination of data from different tables, but Power Query’s merge feature is most commonly used within Excel or Power BI environments.
Merge vs Append in Power Query
There are two primary methods of combining tables in Power Query, merging and appending. Merging combines two tables based on a common column or columns, effectively joining rows from these tables to create a single, unified table. Appending stacks one table on top of another and aligns the columns based on column name. An append combines multiple data sets into one long continues table for analysis.
How to Merge Tables in Power Query
To combine tables using Power Query import the data sets into the Power Query Editor. Next, navigate to the Home Tab of the Power Query Editor and click Merge Queries. Choose to merge the tables into an existing one or create a new table. Then choose the columns that share common values and a join type.
Here’s how in more detail.
Start by importing multiple data tables into Power Query. The data tables will show up on the left side of the screen under the Queries section.
Then select the Merge Queries button under the Combine section of the Power Query Ribbon. The table you are currently viewing becomes the starting point for the merge. In general, it will be the one that will be on the left side of the joined tables.
Select whether to Merge Queries, or Merge as New. Both of these options will combine the tables based on a common column. Merge Queries will combine a second table with the first selected table. Merge Queries as New will leave both starting tables intact and create a third table of consolidated data.
After selecting a method, the Merge Dialog box will appear. Select the starting table, and the table you want to merge with it
Next, choose columns with common values. You can select a single column, or multiple columns at this point. Selecting multiple columns is useful as it avoids having to perform a concatenate to create a complex common key for an Excel VLOOKUP.
The final option is to choose a Join Kind. This is how Power Query will react when values exist on one data table and not the other.
Another less frequently used option is to Fuzzy Match, we’ll leave this alone for the moment but explain it in further detail later.
Press OK and the tables will be merged together.
Expanding Merged Tables in Power Query
When tables are merged in Power Query it’s not immediately clear that the operation worked. This is because Power Query presents the merged table as a collapsed field. To expand the table, click on the double-sided arrow on the title of the merged column. A dropdown appears to select columns to expand.
When you choose to expand the columns, the dropdown gives a list of columns from the second table being merged in. You can also choose whether to use the original column names as a prefix, or you can unselect this to use the original column headers.
Once you click OK, the column will be expanded to show all of the original columns and the selected columns for the expansion.
Any columns with the same name will automatically be re-named to prevent duplication of column titles.
Merge Types in Power Query
Power Query offers several merge types, including Inner, Left Outer, Right Outer, and Full Outer joins. These options determine how rows from the merged tables are included in the final dataset. For example, an Inner join includes only rows with matching keys in both tables, while a Full Outer join includes all rows from both tables, matching or not. Selecting the appropriate merge type is crucial for achieving the desired outcome in your data analysis.
Here’s a table explaining the different types of merges (joins) available in Power Query, along with some brief examples for each.
Here’s a more concise version of the table explaining Power Query merge types:
|Combines matching rows from two tables based on a specified column.
|Only employees with departments.
|Left Outer Join
|Includes all rows from the left table and matched rows from the right table.
|All products, with sales info where available.
|Right Outer Join
|Includes all rows from the right table and matched rows from the left table.
|All shipping details, including unsent orders as NULLs.
|Full Outer Join
|Combines all rows from both tables, filling in NULLs for non-matches.
|All customers and orders, with NULLs for unmatched records.
|Returns rows from the first table that have no match in the second table.
|Employees who haven’t completed training.
Be careful when selecting a merge type to make sure that you’re not accidentally dropping off necessary data from your data table. Keeping track of the row count can be a useful tool for making sure that the full number of rows are being accounted for or being dropped off as expected.
Editing an Existing Merge in Power Query
To edit a Power Query Merge after it’s been completed, select the resulting table and click on the cog icon next to the Source Applied Step on the right side of the screen. This will re-open the Merge Dialog box allowing you to adjust settings, change merge types and common columns.
People new to Power Query may find it useful to experiment with multiple join types until they’re comfortable with what each one days. Editing a query is a fast way to adjust it.
Fuzzy Matching Options When Merging Queries
Fuzzy matching in Power Query allows you to merge tables based on keys that are not exactly the same but similar. This feature is invaluable when dealing with data inconsistencies, such as slight variations in spelling. Fuzzy matching options enable you to set thresholds for similarity, allowing users to combine tables when there isn’t a 100% exact match between common columns.
To enable fuzzy matching, click on Use Fuzzy Matching to Perform the Merge. You can then set options such as similarity threshold, whether or not to take into account upper case, lower case, combining based on parts of a text column etc.
Whenever you use Fuzzy Match, we recommend reviewing the results to make sure that they’re as expected. Depending on the threshold and matching algorithm there is a risk of false positives and negatives that can adversely affect the analysis of data.
Merging More than Two Tables Together
Power Query only allows you to Merge two tables together at one time. To merge multiple tables in Power Query, you have to do so sequentially. Merge the first two tables, then perform another merge against the newly combined table. Make sure to rename datasets to keep track of the merge order.
Merging tables in Power Query is a highly effective way to consolidate data, avoid multiple VLOOKUPS in Excel and manage large data models. Perform a merge by first importing all required data into the Power Query Editor, then select a table and launch the Merge Dialog box. Select a common column, or multiple columns and a join type. The results will be a collapsed column combined with the initial data set to expand and view the matching values based on the set criteria.