Power BI can handle large data volumes, but just how much data can you load into Power BI? Can Power BI handle big data? How big is “big” anyway? These are questions you may have when you’re starting out on a new Power BI project and the answers can be hard to find. Indeed the answer in most cases is “it depends”, which isn’t very helpful. In this post I will try to explain the various limits on Power BI dataset size and how you can know if you’re likely to hit them.
Power BI storage modes
First of all, what does it even mean to “load data into” Power BI? There are two fundamental ways that Power BI can work with data: Import mode and DirectQuery mode. Different tables in your Power BI dataset can have different storage modes; there’s also an option called Dual mode where a table can switch between Import mode and DirectQuery mode depending on the circumstances.
In DirectQuery mode no data is stored in your Power BI dataset and when your report renders and queries are sent to the dataset, the dataset sends queries back to the data source to get data on demand. This means the only limits on the amount of data you can work with are the limits set by your datasource.
In Import mode Power BI stores a copy of your data inside the dataset in its own internal database engine, known as either the Vertipaq engine or the Analysis Services engine. Import mode is the default storage mode and for good reason – it will almost always give you the best performance for your reports and it allows you to use all Power BI functionality. The “how much data?” question is only relevant to Import mode because when you use it Power BI imposes various limits on the amount of data that can be stored inside a dataset.
How big is your dataset?
In order to answer the question of “how much data can you load into Power BI?” you need to know how to measure the size of your dataset. There are various different ways of doing this but the best way is to install a free, community-developed tool called DAX Studio: its Model Metrics feature shows you the total size of your dataset on the Summary tab:
This figure is good enough for most purposes. It’s not quite that simple though, and this video by Marco Russo and Patrick Leblanc has a great explanation of all the different ways of measuring dataset size and what the figures really represent.
Can you predict how large your dataset is if you know how much data you have in your data source?
You have to import all your data into your dataset to find out how large your dataset is. But can you work out how large your dataset will be before you do this? The answer is no. You can be pretty sure that the size of your dataset will be smaller than the size your data is in your data source but it could only be a bit smaller or it could be a lot smaller, maybe even just 10-20% of the original size. This is because Power BI compresses your data very efficiently when you import it. What’s more, how you model your data can have a big impact on how well the compression works and making a few changes can result in a much smaller dataset as, for example, this post by Nikola Ilic shows.
How much data can you load into Power BI Desktop?
The only practical limit on the amount of data you can load into Power BI Desktop is the amount of memory you have on your PC; you’ll need at least 16GB of RAM, ideally 32GB, to get the best experience. However Power BI Desktop is just a development tool – you’ll need to publish to the Power BI Service for other people to view your report and that’s where the limits take effect. What’s more there’s a 10GB limit on the size of a dataset you can publish to the Power BI Service from Power BI Desktop, although you can have datasets much larger than that in Power BI Premium.
To be honest you should never be working with anything like 10GB of data in Power BI Desktop anyway: the file size will be huge, saving will be slow and you’ll spend a long time waiting for data to import while you’re development. What you should do is work with a small subset of your data in Desktop and only load the full volume after you’ve published. You can do this in a few different ways, for example using incremental refresh or by using M parameters to limit the amount of data you load in Desktop and then changing the parameter after publishing, as I showed here.
How much data can you load in the Power BI Service if you’re using Shared capacity (aka “Power BI Pro”)?
If you are not using Power BI Premium or Premium Per User, when you publish to the Power BI Service you are using Shared capacity (often called “Power BI Pro” by Power BI users because you only need a Power BI Pro licence to use it). The maximum size of a dataset in Shared capacity is 1GB; if you hit that limit you’ll get the “Unable to save the changes” error I blogged about here. There is one exception though: as mentioned here, if you use the rather obscure option in the Power BI Service to upload an Excel workbook with data in the Excel Data Model/Power Pivot, then the resulting dataset is limited to 250MB.
How much data can you load in the Power BI Service if you’re using Power BI Premium or Premium Per User (PPU)?
The default maximum dataset size in a Power BI Premium capacity or PPU is 10GB, but if you turn on the Large Dataset storage format you can have datasets larger than that and the maximum size depends on the amount of memory available in your Premium capacity. The “Max memory per dataset” column in the table here shows the maximum amount of memory available in each Premium or Power BI Embedded capacity SKU to an individual dataset; the maximum amount of memory available per dataset in PPU is 100GB. However the maximum amount of memory available to a dataset is not the maximum size of a dataset: apart from the memory used to store the data for a dataset, more memory will be needed when the dataset is queried or refreshed. If you do a full refresh of a dataset it may require almost double the amount of memory needed to store the dataset itself; incremental refresh may require less memory. You can use the Premium Capacity Metrics App to see how your dataset’s memory usage changes over time.
Other Import mode limits
There are a few other limitations of Import mode that should be mentioned. As documented here you can only have 1,999,999,997 distinct values in a single column and there is a limit of 16,000 on the combined total number of tables and columns in those tables – but if you have that many tables and columns you have definitely made a mistake with how you model your data. Also, some DAX functions such as Median() only work on tables with less than 2 billion rows in them, as I blogged here.
Power BI can load all my data, but will my reports be fast enough to use?
Loading all your data into Power BI is one thing but what really matters is whether the reports you build are fast enough to use. In my experience performance problems relating to data volume are not as common as performance problems caused by inefficient DAX in measures or poor modelling decisions, although they do occur – for example distinct counts on columns with a very large number of unique values can be slow. If you’re following best practices you’re unlikely to encounter them unless you’re working with the kind of data volumes that require Premium. DAX Studio’s Server Timings feature can help you understand why your queries are slow and whether data volumes are an issue.
This is a lot to take in – what does it all mean?
Let me finish up by making a few broad generalisations about how much data you can load into Power BI Import mode. Assuming you have followed all the best practices around modelling data you should be able to work with tables with up to a few million rows, probably tens of millions of rows, in Shared capacity and tables with up to a few billion rows in Premium. If your data source is Excel then Power BI can definitely handle the amount of data you have; if you are working with a relational database like SQL Server then it’s still very likely Import mode will work; even if you’re working with hundreds of gigabytes of data or more in a source like Azure Synapse, Snowflake or BigQuery then Import mode may still be ok. You will need to know Power BI very well to get good performance in Import mode with the largest data volumes but it is certainly possible and I know of several customers that are doing it.