Power Query makes trimming data to remove white space easy. If you have one column or multiple columns to trim Power Query can be much faster than working with traditional Excel formulas. It’s also a must know functionality for Power BI and other Microsoft solutions where Power Query is utilized to clean and transform data.
We’ll explain the different ways of managing whitespace and non-printable characters in Power Query, including practical tips on trimming and cleaning data across single columns, multiple columns, or entire datasets. Understanding these techniques is crucial for anyone looking to refine their data transformation skills.
Let’s jump in!
Table of Contents
Getting Started with Power Query
In our example we are going to use an Excel dataset that has a number of problems that can be anywhere from annoying to tricky to address. Some of the problems in this specific dataset include, extra whitespace around words, carriage returns with new lines, and white space in the column headers.
We’ll start by bringing this data into Power Query, and even though it’s only a few columns the same techniques can be applied to datasets with many more columns and thousands of rows of data.
Understanding Power Query Data Types
One of the first things to be aware of when trying to clean up a dataset in Power Query using a trim function is that the functionality is only available when working with Text columns. Each column has a data type assigned to it, most common they are either dates, numbers, or text but Power Query has a number of additional options.
The data type can be identified by the icon at the top of a column in Power Query next to the header name.
Power Query automatically attempts to detect a data type for each column when it is imported. Some of the data types will automatically force a regular formatting to a date or numeric data type that removes leading or trailing white space.
In the example below, we selected all of the columns, right clicked on a column header, and changed the data type to Text. Once it’s been changed you’ll be able to see the original data that went into Power Query and the white space that was automatically trimmed during import.
When you import data into Power Query it is not always evident which columns have white space in their values, and which ones do not.
How to Identify Whitespace in Power Query
Identifying whitespace in Power Query is crucial for cleaning data effectively. Whitespace can include spaces, tabs, and line breaks. It’s usually invisible when you first import data into Power Query and only noticed when a merge or some other operation doesn’t act as expected, or even worse the end report is incorrect and someone points it out to you.
To see whitespace in Power Query, navigate to the View Tab of the Power Query Ribbon and turn on Show Whitespace. Below are two screenshots comparing how power Query presents data with Show Whitespace disabled and enabled.
Show Whitespace Disabled
Show Whitespace Enabled
We would caution users to enable Show Whitespace whenever possible. Just because Power Query shows that there is no white space when you look at a field doesn’t mean that it doesn’t exist when you double click into the values of a field to explore them further.
How to Trim in Power Query
To Trim in Power Query, right click on a column header, go to the Transform section, and select Trim. You can also hold CTRL or SHIFT to Multi-select columns to right click on, Transform, and Trim. Once applied all preceding and trailing white space will be removed.
The following screenshot shows where the Trim option is available in Power Query.
As a general rule of thumb, we prefer right clicking on column headers in Power Query to perform as many operations as possible because it’s quicker than navigating the Power Query Ribbon. However, the Transform options are available under the Transform Tab, and Format button to Trim, Clean or change text case.
The operations are applied to all of the selected columns at once.
How to Trim All Columns in Power Query
To trim all columns in Power Query at one time, first select all by pressing CTRL + A. Then right click on a column header, and navigate to Transform, Trim. The Trim operation will be applied to all columns within a dataset in a single step.
Using Power Query to work with large datasets is a massive time saver compared to working in regular Excel spreadsheets where an operation would have to be applied to all columns in a dataset separately and then copied and pasted back into place as values.
The Difference Between Trim and Clean in Power Query
The main difference between the Trim and Clean functions in Power Query lies in their purposes and applications. Trim focuses on removing leading and trailing spaces from text, a common necessity for ensuring data consistency. On the other hand, Clean is designed to remove non-printable characters from text data, which are often invisible.
How to Clean Data in Power Query
To remove invisible, special characters like carriage returns and new lines in Power Query, right click the column header, select Transform, and Clean.
As you can see from the example dataset below, we applied a trim function remove the whitespace around the values that had it. However, it left the word Sugar Cookie split into two separate lines.
While Trim is essential for addressing visible whitespace issues, Clean is crucial for tackling hidden data integrity problems caused by non-printable characters. The two functions are often used together and compliment each other.
How to Replace Special Characters in Power Query
For more control over the data prep process than using Trim or Clean, you can right click on a column header in Power Query and select Replace Values. Expand Advanced Options and select a special character to replace. Options exist for Tab, Carriage Return, Line Feed, and Non-breaking Space.
To remove the special characters from a dataset, use the Insert Special Character as the Value to Find and then leave the Replace With field empty. When it’s empty it will find and delete the especial character from the selected column.
You can also replace values from a single column, multiple columns or an entire dataset all at once using this method.
Trim and Clean Data Using Power Query M
Power Query M is the formula language of Power Query. You can use formulas when creating custom columns or adjusting queries using the Advanced Editor. The syntax to trim text is: Text.Trim(Text) while the syntax for cleaning is: Text.Clean(Text) where the parenthesis contains the column name that you’re applying the formula to.
Sometimes using a formula to apply a trim or clean can give you more control over a process, or combined with conditional logic when you don’t necessarily want to apply the same function to a full column at one time.
Trimming and cleaning data in Power Query is essential for any data professional looking to ensure the accuracy and reliability of their analyses. Each one of the operations has the capability of causing a merge to fail, or for incorrect analysis to occur.
If you have low confidence in the data coming into Power Query, it may be worthwhile to apply a Trim and Clean function on a full dataset as an early data transformation step even if it’s not readily apparent that it’s needed. When Power Query is based on reports that are manually prepared it’s pretty common to have extra spaces and typos that people add without even knowing it that can disrupt Power Query. Because of this, we recommend using base reports directly out of computer systems where even if the formatting is bad, it’s consistently bad and can be more easily managed.