Grouping data using the Power Query Editor in Power BI is a fast and efficient way to aggregate data, combine duplicate rows, and improve the speed of your report. In addition to Power Query, there are also options to group and summarize data using DAX queries which are more dynamic but have implications on report speed that are important to understand.
Grouping in Power BI is a feature that allows you to combine data into groups or categories, making it easier to analyze and visualize data in your reports. It’s especially useful in scenarios where you have a large number of rows that can be consolidated based on common values. The one drawback is that you may lose a lower level of detail that can be drilled into.
Let’s jump in!
Pros and Cons of Grouping Data in Power BI
There are two primary methods for grouping data in Power BI, Power Query and DAX (Data Analysis Expressions). Power Query transforms data during the refresh of the semantic model, while DAX is more dynamic and executed when a report viewer interacts with a report.
Each approach has its own pros and cons. Grouping in Power Query can reduce the overall size of the dataset, can offload the calculations to a scheduled refresh but is more static in nature. Once data is aggregated in Power Query, users of the report will be unable to drill into the pre-aggregated level of data.
Grouping data using DAX is more dynamic, and in some ways more forgiving in that you won’t reduce the amount of underlying data, but it can force complex calculations to occur as a report viewer is filtering and drilling into a report. With a large enough dataset size, it can have an adverse impact on visual load and refresh times.
We’ll cover how to group using both methods..
How to Group Data in Power BI with Power Query
To Group data in Power BI, first launch the Power Query Editor. Then navigate to the Transform section of the Power Query Ribbon, and press the Group By button. Alternatively, you can right click on a column header and select Group By. This launches the Group By Dialog box to choose columns to group and aggregation methods.
We will use an example dataset of cookie sales by date that will be grouped and aggregated. After launching the Power Query Editor, we navigate to the Transform section of the Power Query Ribbon and select Group By
When the Group By dialog box appears, we can choose a column to group the data by, and then we have to define a new column for the aggregated values to be shown, along with an aggregation method and which column to apply it to.
Because Power Query performs all operations at the column level, it does not allow us to use the name of an existing column. However, we can add a second step later to the rename the column after the data has been grouped.
Once grouped, the table of data will be reduced, showing only the unique categories that were grouped and the new aggregated values.
How to Group By Multiple Columns in Power Query
To group by multiple columns in Power Query, go to the Transform section of the Power Query Ribbon, and select Group By. When the Group By dialog box appears, toggle it from Basic to Advanced. With advanced mode enabled, you can add additional columns to group by at the top, and you can choose multiple output columns with different aggregation methods.
As you can see from the dialog box below, the Add Grouping buttons appear after toggling from basic to advanced mode.
When you add grouping, Power BI will group data based on the hierarchy top to bottom of the multiple columns assigned to the group. When you click OK and accept the grouping the group at the top of the dialog box will be the column furtherst to the left, and each column below in the dialog box will be to the right.
Grouping by multiple columns and aggregation methods is similar to using a pivot table to group data, meaning that each column you add becomes part of the hierarchy, the order in which they are grouped will make a difference in your resulting dataset.
How to Group Data with DAX SUMMARIZE
To create a similar result of the Power Query Group By function using DAX, we can do so with SUMMARIZE formula. To SUMMARIZE data in Power BI, create a New Table, then use the SUMMARIZE formula, defining the Table Name, Grouping Column, Resulting Column Name, and the Aggregation Method.
The Summarize Syntax is:
SUMMARIZE( <Table>, <GroupBy_ColumnName1>[, <GroupBy_ColumnName2>, ...], ["NewColumnName1", <Expression1>][, "NewColumnName2", <Expression2>, ...] )
Using our dataset the filled in formula would look like the following.
SUMMARIZE = SUMMARIZE( CookieSales, CookieSales[Category], "TotalAmount", SUM(CookieSales[Amount]) )
After creating the New Table, it will show up in the data panel of the Power BI report page. When filled into a table visual it will look the same as if it were aggregated using Power Query.
The benefit of using DAX is that we don’t have to modify the underlying dataset in Power Query, and the full level of detail is retained for reporting in other parts of our report.
Note: When using SUMMARIZE, you have to create a New Table. Power BI offers the ability to create New Measures, New Columns, and New Tables. Each one of them serves a different purpose and must fit the function being utilized to avoid errors.
Implications of Using Power Query Vs DAX
Choosing between Power Query and DAX for data grouping in Power BI depends on your specific needs. Power Query is best for pre-processing and structuring data before loading it into Power BI. It’s great for static groupings and reducing data size.
On the other hand, DAX is used for dynamic data analysis within Power BI. It allows for more complex, on-the-fly groupings based on filters and interactions in your reports. Both have their strengths, and often, the best Power BI reports use a combination of both to achieve efficient and dynamic data analysis.
Grouping data in Power BI is quick and easy. There are multiple methods to perform the operation, and even DAX formulas that provide more flexibility than the SUMMARIZE function, such as GROUPBY. With so many ways to perform a similar operation it’s important to choose which one is best for you, the size of your data, and the audience of other developers that will have to manage and maintain your reports over time.
The group by function in Power Query is probably the most common method to aggregate data, but it limits the level of detail that gets loaded into the reporting layer. DAX SUMMARIZE is dynamic and doesn’t impact the underlying data but may not be practical to execute on large datasets.