Why Load Testing Power BI Is Important

If you’re about to put a big Power BI project into production, have you planned to do any load testing? If not, stop whatever you’re doing and read this!

In my job on the Power BI CAT team at Microsoft it’s common for us to be called in to deal with poorly performing reports. Often, these performance problems only become apparent after the reports have gone live: performance was fine when it was just one developer testing it, but as soon as multiple end-users start running reports they complain about slowness and timeouts. At this point it’s much harder to change or fix anything because time is short, everyone’s panicking and blaming each other, and the users are trying to do their jobs with a solution that isn’t fit for purpose. Of course if the developers had done some load testing then these problems would have been picked up much earlier.

What is the cause of the problem? As the saying goes, the cloud is just somebody else's computer, and when you publish your datasets to the Power BI Service they run (at least at the time of writing) on a single physical node, which of course has a finite amount of memory and CPU. If too many users try to run reports at the same time then that memory and CPU will be exhausted and performance will suffer. This is a problem that the new dataset scale-out feature for Premium, currently in preview, will partially solve by creating multiple replicas of your dataset that run on separate physical nodes in the Power BI Service. I say "partially solve" because there is another constraint: the amount of memory and CPU available to you, whether that's the limits in Shared capacity (commonly called Pro) or with Premium Per User, neither of which is fully documented, or the amount you have bought with your Premium capacity. If you try to use more memory than you're allowed you'll get errors; if you try to use more CPU than is available you'll be throttled. It's worth taking some time to read the Premium documentation and understand how limits on CPU usage are calculated and enforced.

As a result of all this it's important to check that normal production usage of your dataset does not hit any of these limits. The limits are very generous, and in the vast majority of cases your reports will perform well without you doing anything, but if you're working with large data volumes, complex calculations, reports with many visuals and/or thousands of users, you can't assume that everything will be OK. This applies whether you're building an internal BI solution or using Power BI Embedded for B2B reporting; similar questions arise when you're trying to work out how much Premium capacity you should buy to support a large number of small self-service solutions, or how much Premium capacity is needed when you're migrating from Azure Analysis Services. If you're using DirectQuery mode you might be surprised at how much CPU still gets used on your Premium capacity, and of course you have the additional task of ensuring your data source can handle all the queries Power BI needs to run against it; if you're connecting to that source via an on-premises data gateway, you also need to ensure the machine your gateway runs on is properly specified.

How can you do load testing with Power BI? There are a few options for automated load testing; the most widely used is the Realistic Load Test Tool (see also my colleague Sergei Gundorov's simplified version here), and Kristyna Hughes has a great post on how to use it here. If automated testing sounds too complex then you can always round up ten or so people and get them to run through a series of interactions on the reports; this won't give you results you can compare reliably from test to test, but it will tell you whether you have a problem or not. While running a load test you should measure report rendering performance on the front end (the Realistic Load Test Tool will capture this information for you), and if you're using Premium you should also capture information on the back end, either by running a Profiler trace or by enabling Log Analytics on your workspace, so you can get accurate DAX query timings. The Premium Capacity Metrics App will tell you how close you're coming to the limits of your Premium capacity.
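If you do roll your own test, the core pattern is simple: fire a number of simulated report sessions in parallel and record per-run timings. Here is a minimal sketch in Python; the function names are invented, and the `run_report` callable stands in for whatever actually drives the report in your test (a browser automation step, for example, or a call to the executeQueries REST API):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(run_report, n_users=10, runs_per_user=3):
    """Run n_users simulated sessions concurrently; each session calls
    run_report() runs_per_user times and records wall-clock duration."""
    def session(_):
        timings = []
        for _ in range(runs_per_user):
            start = time.perf_counter()
            run_report()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        all_timings = [t for ts in pool.map(session, range(n_users)) for t in ts]

    all_timings.sort()
    return {
        "runs": len(all_timings),
        "median_s": statistics.median(all_timings),
        # crude 95th percentile: index into the sorted timings
        "p95_s": all_timings[int(0.95 * (len(all_timings) - 1))],
    }

if __name__ == "__main__":
    # A sleep stands in for a real report interaction in this demo
    print(run_load_test(lambda: time.sleep(0.05), n_users=5, runs_per_user=2))
```

Swapping in a real `run_report` while keeping the timing logic unchanged lets you compare medians and tail latencies between one test run and the next.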

One mistake that people often make when load testing Power BI is over-estimating how many concurrent users they'll have. Even with an enterprise BI solution it's relatively rare to have more than ten people running queries at exactly the same time. The most concurrent users I've ever seen was at a large retail customer where, every morning at 8:30am, staff in every store had a meeting where they all looked at the same Power BI report, and even then there were no more than 200 to 300 concurrent users.
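One related wrinkle: when everyone opens the same report with the same filters, many of the resulting queries can be answered from cache, so a load test that fires identical queries over and over may look unrealistically fast. A simple way to avoid this is to generate distinct queries by cycling through parameter values. A hypothetical Python sketch, where the 'Sales' table, its columns and the filter values are all invented for illustration:

```python
import itertools

# Invented parameter values; in a real test these would come from your
# own dimension tables so each simulated user hits a different slice.
STORES = ["London", "Paris", "Berlin", "Madrid"]
YEARS = [2021, 2022, 2023]

def query_for(store, year):
    """Build one DAX query for a specific parameter combination.
    The 'Sales' table and column names are assumptions, not a real model."""
    return (
        "EVALUATE CALCULATETABLE("
        "SUMMARIZECOLUMNS('Sales'[Month], \"Total\", SUM('Sales'[Amount])), "
        f"'Sales'[Store] = \"{store}\", 'Sales'[Year] = {year})"
    )

def distinct_queries(n):
    """Return n queries, cycling through every parameter combination so
    consecutive runs don't all hit the same cached result."""
    combos = itertools.cycle(itertools.product(STORES, YEARS))
    return [query_for(s, y) for s, y in itertools.islice(combos, n)]
```

Feeding these into whatever executes your test queries gives each simulated user a different slice of data, which is much closer to real-world usage than replaying one query.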

What can you do if your load testing shows you have a problem? The first thing is to do some tuning, and remember, it's more important to tune your queries to reduce CPU time than to make them faster. Look at the way your data is modelled, make your DAX calculations more efficient, reduce the number of visuals in your report and so on. Once you've done that, if you're using Shared (Pro) capacity then you should consider buying Premium Per User or a Premium capacity. If you're already using Premium you should consider testing the scale-out feature mentioned above, and either moving up to a larger SKU or enabling autoscale.

9 thoughts on "Why Load Testing Power BI Is Important"

  1. Thanks for the article.

    You mentioned the new "dataset scale-out" feature for Premium, currently in preview. That sounds interesting. In the past there have been some other cool features for moving analytical SSAS data around. E.g. from the on-prem SSAS team we used to have scalability via NLB load balancing. Another thing we've seen in the past is "offline cubes" that could be hosted locally on the client machine, with a copy of the same data that would otherwise be found on a server.

    It would be awesome if Microsoft could consider options that would "stretch" our PBI datasets down into our local environment. E.g. perhaps a local SSAS server in tabular mode could receive a replica of a dataset from the PBI server. Or perhaps we could even get a Power Pivot data model (that is running in Excel) to receive an approximate replica of our dataset that lives in the cloud. That would be a pretty slick experience!

    While the PBI service is already capable of servicing pivot tables (Excel), it isn't really the "snappy" experience that it used to be when SSAS was running on dedicated hardware and was less than 1 ms away from the local Excel client. I'd guess most PBI datasets are a long way away (30 ms) from the Excel users, and the experience of pivot tables always suffers as a result. No matter how much improvement is made in the PBI infrastructure, it won't ever eliminate the latency issues that occur when a complex Excel workbook is making lots of round-trips to a PBI server.

    Our power users would definitely run their datasets locally in the PowerPivot engine, if such an option were ever supported. There would be overhead in syncing to the server, but it is a one-time (daily) penalty that users would find acceptable.

    1. Chris Webb says:

      I don’t think we’ll ever consider a local sync option. To be honest, though, while I have seen a few cases where latency affects query performance it’s not a factor most of the time in my experience.

  2. Would be nice if you could discuss the impact of query caching on load testing. In a real-world scenario, as you mentioned with the retail customer, does it really matter how many queries you get if they are all the same and caching is enabled?

    1. Chris Webb says:

      You’re right – there’s no point running the same queries over and over when you’re doing load testing. The load testing tool allows you to pass different parameters to the report you’re testing with.

  3. We work with shared datasets connected to Databricks using Import mode, but we are currently having some trouble when users decide to use external tables (Excel, SharePoint) and relate them to data in the dataset. These tables have ~50k rows.

    Once the dataset is changed from a live connection to a composite model, we see high CPU usage in Power BI and constant errors on queries/visuals when combining data from the dataset with data from the external tables.

    Do you have any articles, documentation or tips on how to improve performance, or anything we could check, when working with composite models?

    Thanks, Chris.

  4. Hey Chris,

    The scale-out feature currently in preview looks great. We do have quite a few concurrent users running queries at the same time.
    We refresh using XMLA and REST; it would be nice if, at GA, we could avoid having to sync manually.
    The fact it’s no extra cost is great 🙂
