Power BI provides a number of different ways to remove duplicates and the method used will depend on your unique dataset. Duplicates can either be aggregated or removed. Aggregation is appropriate when you do not want to lose data when removing duplicates, other times duplicate rows can be automatically detected and deleted.

Identifying Columns with Duplicates in Power BI

Duplicates are most easily identified using the Power Query editor within Power BI. Either click Transform Data or right click on a data table and click on “Edit Query”. Then enable the Column Distribution and Column Profile options to view the count and distinct count of the values in your data table.

Right click a data table to launch the Power BI Power Query Editor

This will place a small chart at the top of each column of your dataset. The bars tell you how many times a value appears within a column. By comparing the count of values in a column against the unique count you will identify which columns contain duplicate values.

Make sure to click on the Column Profiling based on 1000 rows option at the bottom of the page and tell Power Query to sample the entire data set.

Finding Which Rows Duplicate in Power BI

To determine which rows duplicate in Power BI, launch the Power Query Editor, and add an index column that will number all rows from 1 to the last row in your dataset. Then remove duplicate rows. You will be able to determine which rows were duplicates based on the missing index value numbers.

Step 1.) Right Click on a Query and Select Reference

Reference queries in Power Query allow you to apply data transformations to a table without modifying the underlying data of your original table. We do this so we can see the duplicated values before we remove them from our original table.

Create a reference table in Power BI Power Query to view duplicate rows

Step 2.) Keep Duplicates

On the new reference table, go to Home > Keep Rows > Keep Duplicates. This is the opposite of removing duplicates and allows us to see which columns contain duplicate values before removing them. You could also choose to add an index column before keeping duplicates to see the exact row numbers the duplicates occur on.

Use Power BI Keep Duplicates option to filter on only duplicated values. This will let you preview duplicates before removing them for a dataset

The results are a second data table that you could re-name and keep in your data model as a reference.

Screenshot of a reference column created in Power Query

Once you remove the duplicates from your primary data source, this table will no longer show the duplicates if it is a reference table. A reference table starts where the previous query ends.

Removing Duplicate Values in Power BI

To remove duplicates in Power BI, right click on a column header in Power Query and select Remove Duplicates. Power Query will add a step to remove any duplicate values for that specific column.

Right click a column name in Power BI and choose Remove Duplicates to add a step to remove duplicate values in a column

A second option for removing duplicate rows is by navigating to Home > Remove Rows in the Power Query ribbon of Power BI. It works the same as right clicking on a column header. There are some additional options to remove blanks, errors, as well.

Power BI developers can also use the remove duplicates option under Remove rows on the Power BI ribbon

Aggregate Duplicate Rows with Group By

When you want to aggregate rows instead of filtering them, use the Power BI Group By function to consolidate duplicate rows. This method is appropriate when you do not want to lose data, but also don’t want duplicate rows.

Step 1.) Launch the Group by Wizard

From the Power Query editor, go to Home > Group By

Screenshot of the Power BI Group By button in Power Query

The following screen will appear with a Basic and an Advanced mode. In our example below, we will tell Power BI to keep all of our columns as groups within our dataset.

Step 2.) Define Aggregation Columns and Methods

Then we add an aggregation method to Sum our Sales Column.

Screenshot of Power BI Advanced Group BY Settings used to add duplicate rows together

Power BI will go through all of the rows in a dataset, and remove the duplicate rows while consolidating the values into a single row.

Final result of grouped columns in Power BI that add duplicate rows together instead of filtering them from a dataset.

By using the Power BI Group By function, you can consolidate rows. Ultimately this will take duplicates out of your dataset, but also ensures that no data is lost in the process.

How to Find and Remove Duplicates in Power BI

Table of Contents

Identifying Columns with Duplicates in Power BI

Finding Which Rows Duplicate in Power BI

Step 1.) Right Click on a Query and Select Reference

Step 2.) Keep Duplicates

Removing Duplicate Values in Power BI

Aggregate Duplicate Rows with Group By

Step 1.) Launch the Group by Wizard

Step 2.) Define Aggregation Columns and Methods

Table of Contents

Identifying Columns with Duplicates in Power BI

Finding Which Rows Duplicate in Power BI

Step 1.) Right Click on a Query and Select Reference

Step 2.) Keep Duplicates

Removing Duplicate Values in Power BI

Aggregate Duplicate Rows with Group By

Step 1.) Launch the Group by Wizard

Step 2.) Define Aggregation Columns and Methods

Related Articles