OneLake is Microsoft’s big data solution that removes the technological barriers that have kept advanced data storage and sharing reserved for the most technical of data engineers. With OneLake, you can share data at scale as easily as dragging and dropping files into OneDrive.
This makes it possible for anyone in your organization to access and use data, regardless of their technical expertise.
OneLake combines Azure Data Lake Storage Gen 2 with an easy-to-use desktop app that will feel familiar to most business users.
There are a number of benefits to this approach that we’ll dive into.
But first, let’s define the key terms at a high level.
What is a Data Lake vs Data Warehouse vs Database?
It’s important to note the differences between databases, warehouses, lakes and lakehouses to fully appreciate the value that Microsoft OneLake will bring to the Power BI ecosystem.
The following is a brief breakdown of the differences between the major data storage types used in today’s business intelligence landscape.
- SQL Database: A SQL (Structured Query Language) database is a type of database that allows data to be stored in a structured format. It’s typically used for OLTP (Online Transaction Processing) operations. It requires data to be organized in a predefined schema – tables, rows, columns, data types, relationships, etc. Examples include MySQL, Oracle Database, Microsoft SQL Server, PostgreSQL, etc.
- Data Warehouse: A data warehouse is a large, centralized repository of data that can come from various sources. It’s used for OLAP (Online Analytical Processing) operations, like data analysis and reporting. Data warehouses are structured, schema-on-write systems that transform and load the data into a format ready for analysis. They are highly optimized for complex queries over large volumes of data. Examples include Amazon Redshift, Google BigQuery, and Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
- Data Lake: A data lake is a vast pool of raw data, the purpose of which is to store all types of data (structured, semi-structured, and unstructured) in its native format until it’s needed. Unlike a data warehouse, data lakes employ a schema-on-read approach, which means data doesn’t have to be pre-processed before it’s stored. Examples include Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage.
- Data Lakehouse: A data lakehouse is a new paradigm that combines the best features of data lakes and data warehouses. It aims to provide the low-cost scalable storage of a data lake, the performance and strong schema of a data warehouse, and the low-latency streaming capabilities of recent data systems. It provides support for all kinds of data (structured, semi-structured, unstructured) and uses a schema-on-read and schema-on-write approach. Examples include Databricks Delta Lake and Apache Hudi.
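The schema-on-write and schema-on-read distinction in the list above can be illustrated with a small Python sketch. This is a toy model, not how any real warehouse or lake is implemented: the `WarehouseTable` and `LakeFolder` classes, the `coerce` helper, and the `SCHEMA` fields are all hypothetical names made up for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> type used to cast it.
SCHEMA = {"order_id": int, "amount": float}


def coerce(record, schema):
    """Cast a record to the schema's types, or raise ValueError if it does not fit."""
    try:
        return {name: cast(record[name]) for name, cast in schema.items()}
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"record does not match schema: {record!r}") from exc


class WarehouseTable:
    """Schema-on-write: every record is validated before it is stored."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        # A record that does not fit the schema is rejected at load time.
        self.rows.append(coerce(record, self.schema))


class LakeFolder:
    """Schema-on-read: raw text is stored as-is and interpreted only when read."""

    def __init__(self):
        self.raw_lines = []

    def write(self, raw_line):
        # Anything can be stored; no validation happens here.
        self.raw_lines.append(raw_line)

    def read(self, schema):
        rows = []
        for line in self.raw_lines:
            try:
                rows.append(coerce(json.loads(line), schema))
            except ValueError:
                # Lines that do not fit this particular schema are skipped.
                continue
        return rows
```

In this sketch the warehouse refuses a malformed record when it is written, while the lake accepts everything and only filters when a reader supplies a schema. That is the trade-off between up-front structure and flexible storage described above.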
A comparison of data storage methods:
Attribute | SQL Database | Data Warehouse | Data Lake | Data Lakehouse |
---|---|---|---|---|
Data Types | Structured | Structured | All types | All types |
Schema | Predefined (schema-on-write) | Predefined (schema-on-write) | On demand (schema-on-read) | Both (schema-on-read and schema-on-write) |
Storage | Moderate | Large | Massive | Massive |
Cost | Varies | High | Low | Varies |
Query Performance | High (for transactional operations) | High (for complex analytical queries) | Varies | High (optimized for both analytical and transactional operations) |
Data latency | Low | High | Varies | Low |
Suitable Use Case | Transaction Processing, Traditional Applications | Business Reporting, BI, Analytics | Big Data, Machine Learning, AI | All-purpose, both OLTP and OLAP |
As you can see, each type of data storage has its own use cases, pros and cons. Within each technology, you could dig even further into the nuances of how these architectures are set up and what the impact is on performance.
The most recent of these architectures are the data lake and the data lakehouse.
These are the technologies that Microsoft aims to make more approachable with OneLake.
What is Microsoft Fabric?
Microsoft Fabric is a new unified analytics platform that brings together all the data and analytics tools that organizations need. It integrates technologies like Azure Data Factory, Azure Synapse Analytics, and Power BI into a single unified product, empowering data and business professionals alike to unlock the potential of their data and lay the foundation for the era of AI.
Microsoft Fabric will enable a number of new features for Power BI users.
- Improved performance and scalability: Fabric will help to improve the performance and scalability of Power BI reports and dashboards by providing a single platform for managing data and workloads.
- Enhanced security and governance: Fabric will help to enhance the security and governance of Power BI solutions by providing a centralized platform for managing security policies and permissions.
- Automated delivery of reports and dashboards: Fabric will help to automate the delivery of Power BI reports and dashboards, ensuring that users always have access to the latest data.
- Integration with other data sources and applications: Fabric will help to integrate Power BI with other data sources and applications, extending the reach of Power BI and making it easier to share data insights across the organization.
Fabric is one of the most significant upgrades for Power BI in recent history. It will enable new functionality and set up the infrastructure for a future where data lakes are the most common way to store data.
It will also bring more of Power BI’s functionality to the cloud.
We also expect that a big part of this move is making it easier to share data between systems, so that Power BI developers can more easily integrate AI features like ChatGPT to answer questions about business data.
What is Microsoft OneLake?
Microsoft OneLake was announced as part of Microsoft Fabric.
OneLake is Microsoft’s solution for making it incredibly easy for business users to add data into a Data Lake.
Just as you can drag and drop files from your desktop computer into OneDrive and click a Share button to send them to other people within your company, you will be able to do the same thing with large amounts of data.
Just as OneDrive revolutionized file sharing by making it super simple, OneLake takes a similar approach to making it easy for business users to share large amounts of data, such as databases, with other users.
Think of Microsoft OneLake as your organization’s OneDrive, but on steroids, designed specifically to handle large amounts of data for the entire organization. Similar to how OneDrive acts as a centralized repository for your personal files, OneLake serves as a comprehensive platform to store, manage, and analyze your organization’s data.
The following video also gives a good overview of some of the benefits and features coming with Microsoft Fabric and OneLake.
The Advantages of Microsoft OneLake
- Centralized Management: Like having your personal files neatly organized in OneDrive, OneLake streamlines data management across the board. It offers a single, unified platform that simplifies tracking, securing, and governing your data.
- Optimized Performance: While OneDrive facilitates easy access to your files, OneLake takes a leap forward. It significantly improves data analysis performance with its centralized data store, further amplified by a suite of optimization techniques.
- Enhanced Security: Your files are safe in OneDrive, but with OneLake, your data’s security reaches a new zenith. It offers robust encryption, stringent access control, and detailed auditing.
- Scalability: Your data needs grow with your organization. Hence, OneLake, much like OneDrive, ensures scalability to meet the demands of organizations of all sizes.
Microsoft OneLake and OneDrive: Similarities and Differences
Although Microsoft OneLake and OneDrive share certain similarities, like being cloud-based platforms that offer a range of features for managing and accessing data from anywhere, they serve different purposes and offer distinct capabilities.
While OneDrive is a general-purpose file storage platform, OneLake is specially designed for managing data lakes. It provides features not available in OneDrive, such as comprehensive data management, superior performance optimization, and advanced security measures.
How to Use OneLake?
If your organization already has a data lake set up, or an instance of Azure Data Lake Storage Gen 2, then you can already begin testing out OneLake.
The interface is very underwhelming at the moment, and that’s the best feature!
OneLake integrates with Windows Explorer and looks the same as OneDrive does when it’s installed on a PC.
The difference is that when you drag and drop files into a OneLake folder, they are stored in a data lake and can either be ingested into a data model or made immediately accessible through folders shared with a Power BI workspace.
This eliminates the need to upload files to SharePoint or OneDrive, and in many cases you could even skip using a Power BI gateway if your organization allows you to drag and drop files onto the cloud.
If you’d like to give it a try, OneLake File Explorer can currently be downloaded from Microsoft.
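Because OneLake is built on Azure Data Lake Storage Gen 2, a file dropped into a OneLake folder can also be addressed from code with an ADLS-style URI. The sketch below only builds such a URI as a string. The host name and the `Files/` layout reflect our understanding of Microsoft’s OneLake addressing and should be checked against the official documentation before you rely on them. The function name and the sample workspace and lakehouse names are hypothetical.

```python
# Assumed OneLake endpoint for ADLS Gen 2 style access; verify against Microsoft's docs.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace, item, item_type, relative_path):
    """Build an abfss:// URI for a file stored under an item's Files area.

    Assumed shape: abfss://<workspace>@<host>/<item>.<itemtype>/Files/<path>
    """
    # Drop empty segments so leading, trailing, or doubled slashes do not leak into the URI.
    clean_path = "/".join(part for part in relative_path.split("/") if part)
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/Files/{clean_path}"


# Hypothetical example: a CSV dragged into the "Orders" lakehouse of the "Sales" workspace.
print(onelake_uri("Sales", "Orders", "Lakehouse", "/2023/june.csv"))
```

A URI like this is what a Spark notebook or an ADLS-compatible client would typically be pointed at, while business users keep working with the same file through File Explorer.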
Conclusion
As heavy users of Power BI, we are excited that Microsoft is making it so easy to add files to data lakes. On the surface it seems like an incremental upgrade from OneDrive, but its simplicity is the key benefit.
Without direct drag-and-drop integration, you would typically need to set up jobs to pull files from FTP, log into an instance of Azure and navigate around a bunch of screens, or ask a data engineer to build a pipeline to load the files, which you would then have to wait for as a business user.
We’re excited to see Microsoft taking a big step into the future with this new solution and keeping it easy to use for the average user!