Microsoft gives users a number of ways to parse, extract and delimit text using Power Query in Power BI and Excel. We explain the different options users have when working with text along with when you would use each one.
Text manipulation is an indispensable tool to keep in your Power BI tool belt. Data comes in all shapes and sizes. Power BI lets users interact with it and parse it in a number of ways to prep your data for providing meaningful insights.
Power BI Users can leverage the “Split Column” feature to split text into columns using defined delimiters. The “Extract” feature can be used to pull text from between two delimiters. The “Column by Example” function allows you to provide Power BI with sample output and extract text with zero manual setup.
In our examples today, we are going to use Power BI.
Many of the concepts will be similar in Microsoft Excel Power Query there could be some differences in button locations and exact methodologies.
Table of Contents
Start by Launching the Power Query Editor
The first step to splitting, delimiting, or extracting text in Power BI is launching the Power Query editor. Select the table that you wish to work with and you can open Power Query 1 of two ways.
Option 1.) Right Click the Table Name and Select Edit Query
Option 2.) Click Home on the Power BI Ribbon, Select Transform Data, and Transform Data again.
This launches the Power Query Editor.
Power Query is Microsoft’s data preparation tool that’s embedded into a number of their products like Power BI and Excel. It lets users connect to various data sources, blend them, and prep them for further analysis and visualization.
How to Split and Delimit Columns in Power BI
From the Power Query Editor navigate to the Split Column button located in the middle of the Home Tab.
From here, you have a number of delimiting options to split a single column into multiple other columns.
Note: Most tasks in Power Query can be accessed by right clicking on a column name, if you prefer to avoid going to the ribbon each time. Right clicking a column is typically faster.
Once you’ve chosen to split a column there are a number of options available to you.
Split a Power BI Column by Delimiter
Splitting columns a delimiter is the most similar function to Microsoft Excel’s text to columns button. It allows you define a specific character and separate out portions of a single column into new columns based on it.
An advantage of Power Query is that it gives you many more options than what is normally available in Excel.
Rather than splitting on each occurrence of a delimiter, you can tell Power Query to split on the first occurrence or the last occurrence of it.
Split a Power BI Column with a Custom Delimiter
Custom delimiters are especially useful when working with specific datasets.
Some of the scenarios that custom delimiters come in useful for are:
- Double Spaces
- Carraige Return (New Lines)
- Line Feed
- Any Combination of Multiple Letters and Characters
To delimit on any of these Select or Enter Delimiter Custom
The box below it allows you to type in multiple characters such as double spaces as the identifier to split columns or any other combination of characters.
Some special characters are already defined for you, by selecting Insert Special Characters a dropdown box will be activated.
Split a Power BI Column into Rows
If you are working with data that has everything concatenated into a single column but need to split a column or cell into multiple rows, you can do so with Split Column by Delimiter.
Under Advanced Options, Select Split into Rows
Rather than creating additional columns to work with that you may have to pivot or unpivot, Power BI will create a new row for each of the different values in the dataset.
The example below is delimited by comma.
Note: A benefit of delimiting to rows is that large datasets can be easier to work with. It’s easier to work with 100 rows than it would be to interact with that many ore more individual columns. You can always pivot the data later if they need to be switched back to columns.
Delimiting a Power BI Column Based on Number of Characters
You can split a column in Power Query based on the number of characters or position of each characters.
This is especially useful when working with legacy systems or reports. SAP, Oracle and IBM (just to name a few) all have systems that will generate reports that output to a CSV file and have to be delimited based on the number of characters in a row instead of having a specific delimiting character to key off of.
To handle these Microsoft Power Query offers two options.
- By Number of Characters – Delimits based on an equal number of characters
- By Position – Delimits based on different number of characters for each column
When using by number of characters Power Query assumes that it is an equal distance to split in the data. For example, split into a new column every 20 characters.
Splitting by Position will create new columns based on different numbers of characters. You must define each record position separated by a comma.
Here’s an example
This will split a column on the 23rd character, the 55th character and 67th character when counting from left to right.
Note: Unless the report you output provides a map or explanation you may need to do some trial and error. We recommend limiting the dataset to a handful of rows at first to ensure that they work prior to extending to a larger dataset.
How to Extract Specific Text From a Column in Power BI
To extract specific text from a column in Power BI, use the Power Query Extract feature. The extract function allows users to isolate a specific portion of text if it is surrounded by delimiters.
Extract is available in Power Query under Transform, Extract
Along with the ability to separate out specific subsections of text from a string, you can also use the extract function to only pull in a set number of characters such as the Left and Right functions of Excel or set it up to be more dynamic and pull in letters up until a delimiter.
In our example, we will use a windows file path.
Our goal is to extract only the name of the file without the extension.
While it’s possible to hard code the text extraction, we want it to be dynamic in case it is moved into another folder at a future point in time.
To Extract Numbers, you have to define where to start and end the extraction. These are defined with specific characters.
Open the Advanced Options
The advanced options tell Power BI which direction to look when searching for the start and end delimiters. You can tell it to start at the end of the string, or from the beginning. Then once it finds the first delimiter, which direction it should go to search for the ending delimiter.
Quickly Extract Text Between Delimiters in Power BI with Columns from Examples
We love using the Column From Examples feature of Power BI. The feature samples the data in a column, and will attempt to create a column based on examples of what you want the end result to look like. It’s available under Add Column, Column From Examples. By using this feature you can avoid writing Power Query or setting up complex delimiting steps.
If we use our example of extracting text above, we will now use Column From Example on a similar column of data.
The interface is a bit odd.
Once you launch it, look to the right. There is an additional set of columns that you can type a text pattern into.
Power BI will allow you to type in as many “hints” as you need to in additional rows. The more data that you provide Power BI the better it will do at creating a formula to extract the data for you.
You will see Column From Example auto-complete additional rows. When the answers seem like they are doing what you want, click okay and Power BI will generate the code for you to create a column with the extracted text.
There are many great ways to use Columns from Examples. Check out this video to learn about some additional interesting use cases to help you speed up your development.
Does Power BI Support RegEx?
Power BI does not support Regular Expressions or RegEx. Regular Expressions are part of the Pearl programming language. They are used to define complex text patterns and that can later be parsed or replaced.
As a substitute for RegEx, users can try to replicate some of the functionality by combining the Extract and Delimit functions of Power BI. It may take multiple steps to delimit, split and extract columns, but many times a similar result can be achieved although in more steps.
Another option is to delimit and split text using the Power Query M language.
Manipulating Text with Power Query M
For those that want to type in the code to extract text in Power BI manually. Create a Custom Column in the Power Query Editor. From there, you can type in a formula to extract or manipulate text as needed.
Below is a compiled list of Power Query M functions for text along with example formulas for a column named “CookieNames”
|Text Manipulation||Text.Length||Returns the number of characters in a text value.|
|Text Manipulation||Text.Start||Returns the first specified number of characters from a text value.|
|Text Manipulation||Text.End||Returns the last specified number of characters from a text value.|
|Text Manipulation||Text.Middle||Returns a string that is the specified part of a text value.|
|Text Manipulation||Text.Trim||Removes all leading and trailing white-space characters from a text value.|
|Text Manipulation||Text.Lower||Converts all alphabetic characters in a text value to lowercase.|
|Text Manipulation||Text.Upper||Converts all alphabetic characters in a text value to uppercase.|
|Text Manipulation||Text.Proper||Converts a text value to proper case, or title case.|
|Text Manipulation||Text.Replace||Replaces all occurrences of a substring within a text value.|
|Text Manipulation||Text.PadStart||Pads the start of a text value with a specified character until the string reaches a specified length.|
|Text Manipulation||Text.PadEnd||Pads the end of a text value with a specified character until the string reaches a specified length.|
|Text Manipulation||Text.Clean||Returns a text value with non-printable characters removed.|
|Text Manipulation||Text.Combine||Combines a list of text values into a single text value.|
|Text Manipulation||Text.Split||Splits a text value into a list of text values based on a separator.|
|Text Manipulation||Text.ToList||Converts a text value into a list of text values, where each value corresponds to a character.|
|Text Manipulation||Text.From||Converts a value of any primitive type into a text value.|
|Text Manipulation||Text.Contains||Returns a logical (true/false) value indicating whether a text value contains a specified substring.|
|Text Manipulation||Text.StartsWith||Returns a logical (true/false) value indicating whether a text value starts with a specified substring.|
|Text Manipulation||Text.EndsWith||Returns a logical (true/false) value indicating whether a text value ends with a specified substring.|
Power BI offers a number of ways to delimit, split, extract and work with text. For an easy way to extract text from the middle of a string, try creating a custom column from an example. If it works, you will save a lot of time. If it is not granular enough, try using extract formulas. For ultimate flexibility users can resort to using Power Query M and write custom code themselves or ask for the help of ChatGPT.