Since the introduction of Power BI Data Flows Gen 2 as part of the Microsoft Fabric Preview there has been a lot of confusion around what they are and when to use this exciting new feature. The easiest way to understand Power BI Data Flows Gen2 is to think about it as a modern cloud first implementation of Power Query.
Power BI developers can create new Data Flows as part of the Data Factory section of new Microsoft Fabric Features. Over 158 different connectors are currently supported to bring in files of different types whether they are on-premises or on a cloud service. Once the data is imported, users can leverage an interface that should look very familiar if they’re used to editing and transforming data with Power Query in Power BI.
Let’s look at how to use Data Flows and some of the unique features that it offers compared to Power BI Desktop.
How to Create a New Power BI Gen 2 Data Flow
To create a Power BI Data Flow Gen 2, first log into PowerBI.com rather than starting in Power BI Desktop. Navigate to+ Create, See All, Data Factory, and Select Power BI Data Flow Gen 2.
After selecting Dataflow Gen2 you’ll be prompted to select a data source. In our example we’ll use Excel but there are many different sources to choose from.
Once the files are imported, the experience will look very similar to Power Query for Desktop. You can right click on column headers to make changes or navigate to View and turn on Diagram View. It’s incredibly useful when working with multiple data sources to visually see how your incoming data is being transformed.
The Diagram View is only available on the Power BI Service
Many of the screens look very similar to Power BI Desktop, but they have been improved with new visuals, and some of the wording has been updated to be easier to understand if you don’t work in Power BI as often as you work in SQL.
Tip: You can also drag and drop files from your Windows Desktop computer and add them to Dataflows Gen2
As you can see there is a lot of familiar functionality with some welcome refinements and cool new tricks for the Power Query faithful. Even with all of this, power users can still write and edit Power Query M for custom data transformations and advanced modeling.
How and Where to Publish a Power BI Dataflow
Once you’ve setup a dataflow, you have two options. You can either publish it which makes the Dataflow available as a data source that you can connect to using Power BI Desktop, or you can publish it to another part of the Microsoft Fabric such as an Azure SQL Database, a Data Warehouse or Lakehouse.
After your last data transformation, click on the Add Data Destination button at the top right corner of applied steps.
You’ll then be prompted to setup a connection to a data source you can write to.
There are a several key take aways from the new Dataflow Gen2 Experience in Power BI.
- Developers will be able to either publish a dataflow then connect to it from Power BI Desktop. Even though you can create dashboards using PowerBI.com there does not seem to be a way to start building dashboards directly from a Dataflow at this time. We expect it will come in the future.
- Dataflows can write to common data sources like SQL, data lakes, data warehouses making the Power Query experience easy to use but much more powerful.
Patrick from Guy in a Cube also does a great overview of Dataflows Gen2 and goes into even more detail about the benefits of being able to output data into multiple systems.
Schedule Automatic Refreshes of Dataflows Gen2
After creating a Power BI Gen2 Dataflow, you have the same abilities to schedule automatic refreshes as you would when working directly with a Power BI Data Source. Navigate to the workspace where the dataflow is saved, and click on the schedule refresh button when hovering your mouse over the dataflow name.
From there you can turn on a refresh schedule, assign a time and frequency.
These settings will look identical if you’ve scheduled a refresh of a data source.
Creating Dashboards from Dataflows Gen2
The only way to create a dashboard and visualizations with a Power BI Dataflow is to connect to it from Power BI Desktop. Navigate to Home, Get Data, Dataflows and select the newly created dataflow. From there you can work with it the same as you would any other Power BI Dataset.
We’re assuming that it’s only a matter of time until you can create a dashboard directly in PowerBI.com from a dataflow but the experience is not currently available.
So, what is the point of Dataflows Gen2?
Dataflows are likely going to be a big part of replacing Power BI Desktop at some point in the future. Most of the features in Power BI Desktop are already in the Power BI Service under different names. We’re assuming it’s only a matter of time until Microsoft combines them into an easier to use experience and stop supporting Power BI Desktop altogether.
This is just speculation and likely years in the future. Microsoft has an excellent track record of supporting legacy software well beyond the time that most people have already moved on from using it.
Dataflows make the entire ETL, Extract Transform and Load functions of working with data much easier than would be possible using SQL and does not require users to have access to an Azure tenant to work in Data Factory. You can use the same drag and drop style interface of Power Query with much larger, much more complex datasets in a cloud first environment.
After a dataflow is created, you can either publish it and use it to create a dashboard, or you can use the same Dataflow to output data to a data warehouse, SQL database, or Lakehouse making it easier to transform data once and re-use it across an entire organization in a platform agnostic way.
Dataflow Gen1 vs Gen2
If you’ve already bee using Power BI dataflows, Microsoft provides a feature overview comparing Gen1 vs Gen2. It’s important to note that Dataflow Gen2 is currently in preview and features are subject to change without notice.
While many new features have been added to Gen2 there are some notable omissions at this point in time, like the lack of Incremental Refresh, AI Insights and Direct Query support.
The screenshot below shows a Feature overview from the Microsoft Learn website as of 10/16/2023.
Hopefully the old features are added in the near future. Microsoft publishes monthly releases of Power BI and is constantly adding new features.
Power BI Dataflows Gen2 is a modern interpretation of Power Query designed for the Cloud. The interface will be familiar to anybody that’s already using Power BI and creating data models in Power Query. It is also an early step at future proofing the Power BI Platform as increasingly more data becomes available in the cloud.
Microsoft has a number of data solutions available as part of Microsoft Azure cloud services, but they have historically been walled off from Power BI users. One of the major benefits of Microsoft Fabric is the integration of advanced data engineering tools of Azure with the easy to use interface of Power BI workspaces. By placing all of the tools under one umbrella and one place they’re available for less technical users to take advantage of.
At the moment, Power BI dataflows Gen2 might be somewhat underwhelming but it’s important to remember that this is the starting point of the next generation of Power BI.