Excel users can group and aggregate data in Power Query to collapse duplicate rows while preserving the values stored in the dataset. It’s a common data prep technique for avoiding duplicated rows during merge operations, and it is equally useful for speeding up queries and reducing the total size of your data model.
We’ll explain what grouping does in Excel Power Query, along with several considerations to help you decide whether grouping and aggregating at the Power Query level is appropriate, or whether it’s better to load the full dataset and aggregate inside a pivot table or a regular Excel sheet.
Let’s jump in!
What is Microsoft Excel Power Query?
Power Query is an Excel feature that allows users to import, transform, and automate the processing of data from various data sources. It enables users to work with large amounts of data, and to automate repetitive processes with minimal technical knowledge required.
While it’s one of our favorite features of Excel, we find that many people are unfamiliar with what it’s capable of and tend to underutilize it. To learn more about Power Query, check out our Explanation of Power Query for Excel Users.
Why You Should Group Data in Excel Power Query
Grouping in Excel Power Query is a technique used to consolidate data into categories or groups, making it easier to perform aggregate calculations like sums, averages, counts, or more complex operations. For example, sales data can be grouped by region or product category to reduce the number of rows you work with when loading data back into Excel, or summarized down to a single row per region so you can merge it with another table or perform a VLOOKUP against unique values.
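To see why a single row per region matters before a merge, here is a rough sketch in plain Python rather than Power Query itself. The tables, column names, and the `left_join` helper are all made up for illustration; the helper only imitates what happens when a merged column is expanded and every matching pair of rows becomes an output row.

```python
# Hypothetical tables, invented for this illustration.
targets = [
    {"Region": "East", "Target": 300},
    {"Region": "West", "Target": 250},
]
sales = [
    {"Region": "East", "Sales": 100},
    {"Region": "East", "Sales": 150},
    {"Region": "West", "Sales": 200},
]

def left_join(left, right, key):
    """Emit one output row per matching pair, keeping unmatched left rows."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if not matches:
            out.append(dict(l))
        for r in matches:
            out.append({**l, **r})
    return out

# Merging against the raw sales rows repeats the East target twice.
raw_merge = left_join(targets, sales, "Region")

# Grouping first leaves one row per region, so the merge stays one-to-one.
totals = {}
for row in sales:
    totals[row["Region"]] = totals.get(row["Region"], 0) + row["Sales"]
grouped_sales = [{"Region": r, "Sales": s} for r, s in totals.items()]
clean_merge = left_join(targets, grouped_sales, "Region")

print(len(raw_merge), sum(r["Target"] for r in raw_merge))      # 3 850
print(len(clean_merge), sum(r["Target"] for r in clean_merge))  # 2 550
```

In the ungrouped merge the East target is counted twice, so any total built on the Target column is inflated. Grouping the sales table first avoids that.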
Some of the additional advantages include:
- Reduced Data Model Size – An Excel worksheet is limited to 1,048,576 rows; anything above that limit has to be split across sheets or loaded into a Power Pivot data model. Grouping reduces the row count before the data is loaded.
- Power Query Automation – Because you can connect Power Query to folders of files or live data connections, you can set up a process once and reuse it many times, which eliminates the need to manually repeat data prep.
- Power Query Audit Trail – An added benefit of Power Query is that it tracks Applied Steps, meaning that you can quickly see what actions have been applied to a dataset and can click through the history of it.
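- Processing Efficiency – Even well before the row limit, Excel can become unusable with large datasets due to the time it takes to calculate formulas. Power Query performs its transformations in its own query engine before the data reaches the worksheet, and the Power Pivot data model it can load into uses the VertiPaq engine, which is optimized for these situations.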
Now let’s look at how to do some grouping!
How to Group using Power Query
To group in Power Query, first launch the Power Query Editor. Apply any necessary data transformations to clean the data. Then either click the Group By button in the Transform section of the Power Query Ribbon or right-click a column header and select Group By. A grouping dialog will appear where you can define grouping and aggregation settings.
Here’s how in more detail.
Step 1.) Launch the Power Query Editor and Select Group By
We’ll work through an example of Cookie Sales by Date and Cookie Type. Once the data is prepped and cleaned, go to the Transform section of the Power Query Ribbon and select Group By.
Tip: You can also access many common features of Power Query by right-clicking a column header. A contextual menu will appear with the most common Power Query actions, which saves you from navigating the ribbon.
Step 2.) Adjust the Group By Settings
The Group By dialog has several settings. By default, it opens with the Basic grouping settings, which ask you to choose the column you want to group by, the name of the resulting column, and the aggregation method.
Because Power Query keeps an audit trail of steps and works mostly by column name reference, you will need to provide a New Column Name.
In our example, we tell Power Query to Aggregate the Quantity column into a Total Quantity Column based on the Sum of the Quantity Column.
Values will be aggregated at the Cookie Type column level.
The results are as follows:
As you can see, the result has one row per Cookie Type, with a Total Quantity column holding the sum of the Quantity column for each type.
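The same Sum grouping can be sketched outside of Excel to make the logic explicit. This is a minimal illustration in plain Python, not what Power Query runs internally; the sample rows and the `group_sum` helper are hypothetical, since the article's actual cookie sales data is not reproduced here.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the cookie sales table.
sales = [
    {"Date": "2024-01-01", "Cookie Type": "Chocolate Chip", "Quantity": 10},
    {"Date": "2024-01-01", "Cookie Type": "Oatmeal Raisin", "Quantity": 4},
    {"Date": "2024-01-02", "Cookie Type": "Chocolate Chip", "Quantity": 6},
    {"Date": "2024-01-02", "Cookie Type": "Sugar", "Quantity": 5},
    {"Date": "2024-01-03", "Cookie Type": "Oatmeal Raisin", "Quantity": 3},
]

def group_sum(rows, key, value, new_name):
    """Return one row per distinct `key`, with `value` summed into `new_name`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    # Dictionaries keep insertion order, so groups appear in first-seen order.
    return [{key: k, new_name: total} for k, total in totals.items()]

grouped = group_sum(sales, "Cookie Type", "Quantity", "Total Quantity")
for row in grouped:
    print(row)
```

Five input rows collapse to three, one per cookie type, and the Date column disappears because it was neither grouped on nor aggregated.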
Power Query Group By Aggregation Methods
While grouping based on the Sum of values is the most common, Power Query provides several other ways to aggregate data. Alongside the numeric operations (Sum, Average, Median, Min, and Max), the Group By dialog offers the row-based options below. Count Distinct Rows in particular is often an easier way to count distinct values in Excel than a Pivot Table or Excel formulas.
- Count Rows
- Count Distinct Rows
- All Rows
Note: Power Query Grouping allows you to use multiple aggregation methods in one step, meaning that you can create multiple result columns from a single dataset. If you need to aggregate the same data at different levels, you may also want to duplicate the query or create a reference to it.
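The three row-based options are easier to tell apart with a small example. The sketch below uses plain Python and invented rows; it assumes Count Distinct Rows compares entire rows within each group, and it represents All Rows as a nested list holding the group's original rows.

```python
# Hypothetical rows as (Date, Cookie Type, Quantity) tuples.
sales = [
    ("2024-01-01", "Chocolate Chip", 10),
    ("2024-01-01", "Chocolate Chip", 10),  # exact duplicate of the row above
    ("2024-01-02", "Chocolate Chip", 6),
    ("2024-01-01", "Sugar", 5),
]

# Collect the rows that belong to each cookie type.
groups = {}
for row in sales:
    groups.setdefault(row[1], []).append(row)

summary = [
    {
        "Cookie Type": cookie,
        "Count Rows": len(rows),                # every row in the group
        "Count Distinct Rows": len(set(rows)),  # duplicate rows counted once
        "All Rows": rows,                       # the group's rows, kept nested
    }
    for cookie, rows in groups.items()
]

for item in summary:
    print(item["Cookie Type"], item["Count Rows"], item["Count Distinct Rows"])
```

Chocolate Chip has three rows but only two distinct ones, which is the difference between the two counts. All Rows keeps the detail available for later steps instead of discarding it.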
Grouping by Multiple Criteria in Excel Power Query
To group by multiple columns in Power Query, launch the Power Query Editor, then click the Group By button in the Transform section of the Power Query Ribbon. When the Group By settings window appears, choose Advanced. An Add Grouping option will appear. Each grouping you add becomes an additional level of grouping. Power Query reads the groupings from top to bottom and produces the grouped columns from left to right.
When grouping in Power Query, you can add multiple aggregation methods to create multiple results columns at one time. Similar to Adding Grouping, you can click Add Aggregation at the bottom of the grouping settings editor.
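As a rough illustration of how several groupings and several aggregations combine, here is a plain Python sketch. The `group_by` helper, the sample rows, and the result column names are hypothetical and only mimic the shape of the result; they are not how Power Query implements the step.

```python
# Hypothetical sample rows.
sales = [
    {"Date": "2024-01-01", "Cookie Type": "Chocolate Chip", "Quantity": 10},
    {"Date": "2024-01-01", "Cookie Type": "Chocolate Chip", "Quantity": 2},
    {"Date": "2024-01-01", "Cookie Type": "Sugar", "Quantity": 5},
    {"Date": "2024-01-02", "Cookie Type": "Chocolate Chip", "Quantity": 6},
]

def group_by(rows, keys, aggregations):
    """Group rows by several key columns and apply each named aggregation.

    `aggregations` maps a new column name to (source column, function).
    """
    groups = {}
    for row in rows:
        group_key = tuple(row[k] for k in keys)
        groups.setdefault(group_key, []).append(row)
    result = []
    for group_key, members in groups.items():
        out = dict(zip(keys, group_key))  # key columns come first, left to right
        for new_name, (column, func) in aggregations.items():
            out[new_name] = func([m[column] for m in members])
        result.append(out)
    return result

result = group_by(
    sales,
    keys=["Date", "Cookie Type"],
    aggregations={
        "Total Quantity": ("Quantity", sum),
        "Largest Sale": ("Quantity", max),
        "Sale Count": ("Quantity", len),
    },
)
for row in result:
    print(row)
```

Each distinct combination of Date and Cookie Type becomes one output row, and each aggregation adds one result column to the right of the key columns.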
Power Query Grouping vs. Pivot Tables Explained
Deciding whether to group using a Pivot Table or Power Query will largely depend on your use case and the size of your dataset. As a general rule, Pivot Tables are quicker and easier to use, and more widely understood by end users, when working with small datasets. Power Query is better suited to large datasets and repeatable multi-step processes.
- Ease of Use – Pivot Tables are more straightforward while Power Query offers more advanced capabilities.
- Data Size – Power Query can handle larger datasets more efficiently.
- Data Source Flexibility – Power Query supports a broader range of data sources.
- Refresh Capability – Power Query can automate data refreshes and repeatable processes.
There are also scenarios where you can mix and match to improve performance. Data that is initially grouped in Power Query can be loaded into a worksheet that becomes the basis of a pivot table. The one drawback is that users will not be able to drill into the lowest level of detail directly in a Pivot Table.
Excel Power Query is one of the most powerful features of Excel. By understanding when to group data, you can work with large datasets more efficiently and automate processes. Grouping is most often used either to limit the data model size or to create aggregations so that the summarized rows have unique keys to merge against or apply other formulas to.
Even if the end result of your Power Query steps is to load the data into a Pivot Table, pre-aggregating it can give many users significant performance gains and shorter refresh times when adjusting the pivot table.