Memory Overhead Of Distinct Count Measures In Power BI

The series of blog posts I wrote last year on semantic model memory usage, in particular this post on the query memory limit and the “This query uses more memory than the configured limit” error in Power BI, gets a lot of traffic. Since writing that post on the query memory limit I’ve written a few follow-ups on common mistakes that lead to increased query memory usage, such as this one on measures that never return a blank. Today’s post is sort of in that series but it isn’t about a design mistake – it’s just to point out that distinct count measures can be surprisingly memory-hungry.

To illustrate this I built a semantic model consisting of a single table with two columns and 99,999,000 rows, published it and ensured the Large Semantic Model format was enabled:

I created two measures:

Number Of Rows = COUNTROWS('MyTable')
Distinct Customers = DISTINCTCOUNT(MyTable[CustomerID])

Here’s what the model metrics looked like in DAX Studio:

The total model size in memory was 255MB.

I then ran the following DAX query to get the number of rows in the table for each of the 1800 dates in the Date column:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Number Of Rows", [Number Of Rows]
)

The approximatePeakMemConsumptionKB from the Execution Metrics Profiler event showed that this query consumed 2109KB – so not much.

Then I ran a similar query to get the number of distinct customer IDs by date:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Distinct Customers", [Distinct Customers]
)

The approximatePeakMemConsumptionKB metric for this query was 800325KB – so a lot more than the previous query. In fact even though this model was well under the 1GB size limit for a model not in Premium capacity, the query here used a lot more memory (782MB) than the size of the model itself in memory and it came close to the 1GB limit on the amount of memory a query can consume when the model is not in Premium capacity.

Is there something wrong here? Can the query or model be tuned to reduce memory usage? Not really, no – distinct count queries are almost always more memory intensive than other types of measures. I tested a number of different things such as forcing the use of hash encoding on the CustomerID column, partitioning (Phil Seamark suggested creating one partition for each of the 1800 dates and actually that did reduce memory consumption but it also made the queries extremely slow), changing the ordering of the source data to change how well each column was compressed, calculating the distinct count using the SUMX method, and nothing resulted in lower query memory usage.

What I did find for the model above was that the number of rows returned by the query influenced the memory consumption of the query. So reducing the number of dates returned on rows in my DAX query from 1800 to 366 resulted in approximatePeakMemConsumptionKB going down to 200278KB. So if you’re running into memory errors when running queries with distinct count measures the first thing you should ask yourself is whether you need to show so many distinct counts: I recently ran into this problem with a customer that wanted to plot a line chart of distinct values with dates on the x axis, and we solved the problem by only plotting one day per week for the time period shown on the chart instead of every date. The chart looked almost identical, the DAX query was a lot faster and the memory usage of the DAX query was a lot lower. Distinct count measures combined with table visuals with lots of rows can be dangerous.

The other thing you can do is see if you can remodel your data to turn a distinct count into a count because, as shown above, counts are a lot faster and memory efficient than distinct counts. For example, if you have a fact table containing line items for orders and you need to find the distinct count of order ids, then consider creating a second fact table at the order granularity so you can count the number of rows in it to find the number of distinct orders. This may increase the size of your model but it should certainly reduce your query memory consumption for many queries because you won’t need to do a distinct count.

Current Status Of Snowflake Query Tags In Power BI

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this blog I thought I would outline the current status.

The query tagging feature for the Power BI Snowflake connector actually didn’t get released in November 2024 and even now, in April 2025, it’s only available for DirectQuery connections and Import mode refreshes that use the V1.0 connector (the V2.0 connector will support query tags soon). Here’s an example of what a query tag looks like for a SQL query generated by Power BI from a DirectQuery semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery"}

And here’s what a query tag for a SQL query generated for a semantic model refresh:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import"}

At the time of writing only SQL queries sent from the Power BI Service contain query tags, not those sent from Power BI Desktop. Also there is no way to customise the contents and unlike SQL queries sent to SQL Server-related sources there is no information on the report or visual that generated the SQL query. In the future some of these limitations may go away.

Fabric Data Agents: Unlocking The Full Power Of DAX For Data Analysis

Now that Fabric Data Agents (what used to be called AI Skills) can use Power BI semantic models as a data source I’ve been spending some time playing around with them, and while I was doing that I realised something – maybe something obvious, but I think still worth writing about. It’s that there are a lot of amazing things you can do in DAX that rarely get done because of the constraints of exposing semantic models through a Power BI report, and because Data Agents generate DAX queries they unlock that hitherto untapped potential for the first time. Up until now I’ve assumed that natural language querying of data in Power BI was something only relatively low-skilled end users (the kind of people who can’t build their own Power BI reports and who struggle with Excel PivotTables) would benefit from; now I think it’s something that will also benefit highly-skilled Power BI data analysts as well. That’s a somewhat vague statement, I know, so let me explain what I mean with an example.

Consider the following semantic model:

There are two dimension tables, Customer and Product, and a fact table called Sales with one measure defined as follows:

Count Of Sales = COUNTROWS('Sales')

There’s one row in the fact table for each sale of a Product to a Customer. Here’s all the data dumped to a table:

So, very simple indeed. Even so there are some common questions that an analyst might want to ask about this data that aren’t easy to answer without some extra measures or modelling – and if you don’t have the skills or time to do this, you’re in trouble. One example is basket analysis type questions like this: which customers bought Apples and also bought Lemons? You can’t easily answer this question with the model as it is in a Power BI report; what you’d need to do is create a disconnected copy of the Product dimension table so that a user can select Apples on the original Product dimension table and select Lemons on this new dimension, and then you’d need to write some DAX to find the customers who bought Apples and Lemons. All very doable but, like I said, needing changes to the model and strong DAX skills.

I published my semantic model to the Service and created a Data Agent that used that model as a source. I added two instructions to the Data Agent:

Always show results as a table, never as bullet points
You can tell customers have bought a product when the Count of Sales measure is greater than 0

The first instruction I added because I got irritated by the way Data Agent shows the results with bullet points rather than as a table. The second probably wasn’t necessary because in most cases Data Agent knew that the Sales table represented a sale of a Product to a Customer, but I added it after one incorrect response just to make that completely clear.

I then asked the Data Agent the following question:

Show me customers who bought apples and who also bought lemons

And I got the correct response:

In this case it solved the problem in two steps, writing a DAX query to get the customers who bought lemons and writing another DAX query to get the customers who bought apples and finding the intersection itself:

At other times I’ve seen it solve the problem more elegantly in a single query and finding the customers who bought apples and lemons using the DAX Intersect() function.

I then asked a similar question:

For customers who bought apples, which other products did they buy?

And again, I got the correct answer:

In this case it ran five separate DAX queries, one for each customer, which I’m not thrilled about but again at other times it solved the problem in a single DAX query more elegantly.

Next I tried to do some ABC analysis:

Group customers into two categories: one that contains all the customers with just one sale, and one that contains all the customers with more than one sale. Show the total count of sales for both categories but do not show individual customer names.

And again I got the correct answer:

I could go on but this post is long enough already. I did get incorrect answers for some prompts and also there were some cases where the Data Agent asked for more details or a simpler question – but that’s what you’d expect. I was pleasantly surprised at how well it worked, especially since I don’t have any previous experience with using AI for data analysis, crafting prompts or anything like that. No complex configuration was required and I didn’t supply any example DAX queries (in fact Data Agents don’t allow you to provide example queries for semantic models yet) or anything like that. What does this all mean though?

I’m not going to argue that your average end user is going to start doing advanced data analysis with semantic models using Data Agents. The results were impressive and while I think Data Agents (and Copilot for that matter) do a pretty good job with simpler problems, I wouldn’t want anyone to blindly trust the results for more advanced problems like these. However if you’re a data analyst who is already competent with DAX and is aware that they always need to verify the results they get from Data Agent, I think this kind of DAX vibe-coding has a lot of value. Imagine you’re a data analyst and you’re asked that question about which products customers who bought apples also bought. You could search the web, probably find this article by the Italians, get scared, spend a few hours digesting it, create a new semantic model with all the extra tables and measures you need, and then finally get the answer you want. Maybe you could try to write a DAX query from scratch that you can run in DAX Studio or DAX Query View, but that requires more skill because no-one blogs about solving problems like this by writing DAX queries. Or you could ask a Data Agent, check the DAX query it spits out to make sure it does what you want, and get your answer much, much faster and easier. I know which option I’d choose.

To finish, let me answer a few likely questions:

Why are you doing this with Fabric Data Agents and not Power BI Copilot?

At the time of writing Data Agents, the Power BI Copilot that you access via the side pane in a report and Power BI Copilot in DAX Query View all have slightly different capabilities. Power BI Copilot in the side pane (what most people think of as Power BI Copilot) couldn’t answer any of these questions when I asked them but I didn’t expect it to because even though it can now create calculations it can still only answer questions that can be answered as a Power BI visual. Copilot in DAX Query View is actually very closely related to the Data Agent’s natural language-to-DAX functionality (in fact at the moment it can see and use more model metadata than Data Agent) and unsurprisingly it did a lot better but the results were still not as good as Data Agent. Expect these differences to go away over time and everything I say here about Data Agents to be equally applicable to Power BI Copilot.

This isn’t anything new or exciting – I see people posting about using AI for data analysis all the time on LinkedIn, Twitter etc. What’s different?

Fair point. I see this type of content all the time too (for example in the Microsoft data community Brian Julius and my colleague Mim always have interesting things to say on this subject) and I was excited to read the recent announcement about Analyst agent in M365 Copilot. But typically people are talking about taking raw data and analysing it in Python or generating SQL queries. What if your data is already in Power BI? If so then DAX is the natural way of analysing it. More importantly there are many advantages to using AI to analyse data via a semantic model: all the joins are predefined, there’s a lot of other rich metadata to improve results, plus all those handy DAX calculations (and one day DAX UDFs) that you’ve defined. You’re much more likely to get reliable results when using AI on top of a semantic model compared to something that generates Python or SQL because a lot more of the hard work has been done in advance.

Is this going to replace Power BI reports?

No, I don’t think this kind of conversational BI is going to replace Power BI reports, paginated reports, Analyze in Excel or any of the other existing ways of interacting with data in Power BI. I think it will be a new way of analysing data in Power BI. And to restate the point I’ve been trying to make in this post: conversational BI will not only empower low-skilled end users, it will also empower data analysts, who may not feel they are true “data scientists” but who do have strong Power BI and DAX skills, to solve more advanced problems like basket analysis or ABC analysis much more easily.

Speed Up Dataflow Publishing/ Validation Times In Power BI And Fabric

If you’re working with slow data sources in Power BI/Fabric dataflows then you’re probably aware that validation (for Gen1 dataflows) or publishing (for Gen2 dataflows) them can sometimes take a long time. If you’re working with very slow data sources then you may run into the 10 minute timeout on validation/publishing that is documented here. For a Gen1 dataflow you’ll see the following error message if you try to save your dataflow and validation takes more than 10 minutes:

Failed to analyze issues in the query

For a Gen2 Dataflow, where you can save the Dataflow and publishing takes place in the background, you’ll see the following error in your workspace:

Dataflow publish failed

Apart from tuning your data source and tuning your queries, what can you do about this? Well one of the things that happens when you publish a dataflow is that it works out the columns returned, and the data types of those columns, for all of the queries in the dataflow. It does this by trying to run the queries until they return data by applying a top 0 row filter to them; if you can make that faster then validation/publishing will be faster. Obviously query folding is important here because that top 0 filter should fold, as are more obscure, source-specific settings like this one for ODBC sources. However, there is another trick that you can use if you are happy writing some moderately complicated M code – the trick I blogged about here for making Power Query in Power BI Desktop faster.

Let’s see an example with Dataflows Gen2. Conside the following M code which returns a table with three columns and is deliberately written to take 11 minutes and 1 second to return (see this post for more details on how to create artificially slow Power Query queries).

let
  Source = Function.InvokeAfter(
    () => 
    #table(
      type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
      {
        {1,"Hello",#date(2016,1,1)},
        {2,"World",#date(2017,12,12)}
      }
      )
    
    , 
    #duration(0, 0, 11, 1)
  )
in
  Source

As you would expect, trying to publish a Gen1 or Gen2 dataflow that uses this query will fail because it takes more than 10 minutes before it returns any rows. However in this case – as in most cases – you know what columns the query returns so it’s possible to use the Table.View M function to intercept the zero-row filter applied during validation/publishing and return a table with no rows in and the columns that the query above returns. You can do this by adding two extra steps in the M code like so:

let
  Source = Function.InvokeAfter(
    () => 
    #table(
      type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
      {
        {1,"Hello",#date(2016,1,1)},
        {2,"World",#date(2017,12,12)}
      }
      )
    
    , 
    #duration(0, 0, 11, 1)
  ),
  TableTypeToReturn = 
    type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
  OverrideZeroRowFilter = Table.View(
    null, 
    [
    GetType = () => 
      TableTypeToReturn, 
    GetRows = () => 
      Source, 
    OnTake = (count as number) => 
      if count = 0 then 
      #table(
        TableTypeToReturn, 
        {}
      ) 
    else 
     Table.FirstN(Source, count)]
  )
in
  OverrideZeroRowFilter

The first step added here, called TableTypeToReturn, defines the columns and data types of the table returned by the query; if you use this technique yourself, you will need to alter it so it returns the columns and data types of your query. You can read more about #table and table types here and I have a function that will automatically generate this code from an existing query for you here. The second step, called OverrideZeroRowFilter, looks for situations where a Top N filter is being applied and if N=0 returns a table of the type defined in the previous step with zero rows. For a more detailed explanation see that original blog post.

This new version of the query validates/publishes immediately, although it still takes 11 minutes and 1 second to refresh. Of course if you use this technique and then change your query so that different columns or data types are returned you have to update the extra code every time, which can be fiddly, but if you’re running into a timeout then you don’t have any choice and even if validation/publishing is slow it’s probably worth the extra effort.

Refreshing Hidden Tables In Power BI Semantic Models With Fabric Data Pipelines

Following on from my recent post about refreshing semantic models with Fabric Data Pipelines and the semantic model refresh activity, a few people asked me how to refresh hidden tables because they are not displayed in the Pipeline configuration UI. I got the answer from my colleague Alex Powers (aka reddit celebrity u/itsnotaboutthecell) who kindly allowed me to blog about it.

To demonstrate how to do this, I created a semantic model with two tables: one visible, called VisibleTable, and one hidden, calledHiddenTable.

I then published the semantic model, created a Data Pipeline and added a semantic model refresh activity; selected the connection, workspace and semantic model; waited for the Table(s) dropdown to populate (yes I know it’s slow, we’re working on it):

…and then, when it loaded, noted that only the visible table was shown in the dropdown:

I didn’t select anything and instead clicked “Add dynamic content” to use an expression to select the table instead:

Then in the Pipeline expression builder I entered the following:

@json('
    [
        {
            "table":"HiddenTable"
        }
    ]
')

Having done this I ran the Pipeline and just the hidden table was refreshed. Easy!

The expression needs to be a JSON array of table and partition names. Here’s an example showing how to refresh the table called HiddenTable and the sole partition of the table called VisibleTable (which also happens to be called VisibleTable) in the same refresh:

@json('
    [
        {
            "table":"HiddenTable"
        },
        {
            "table": "VisibleTable",
            "partition": "VisibleTable"
        }
    ]
')

It’s useful to know how to construct the expression even if you don’t need to refresh hidden tables – for example, you might want to dynamically generate the list of tables or partitions to refresh with an expression.

Monthly Power BI Semantic Model Refreshes With Fabric Data Pipelines

I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.

The semantic model refresh activity in Fabric Data Pipelines was released last year and at the time all the focus was on the extra control it gives you over what happens in a refresh: it allows you to refresh individual tables or partitions and control the amount of parallelism, for example; it also allows you to schedule your refresh after other ETL operations, which in Fabric will probably also be scheduled using Pipelines, have completed. What I want to draw your attention to is the fact that Fabric Data Pipelines use the new Fabric scheduler which offers more flexibility for controlling when they run.

There’s some documentation here on how to schedule a Data Pipeline run but it’s very straightforward to use. I created a Fabric Pipeline consisting of a single semantic model refresh activity like so:

..then hit the Schedule button on the toolbar, selected “Monthly” on the Repeat dropdown and configured it to run on the first Sunday of the month:

Apart from the option to run on the Nth instance of a given day of the week you can also run the Data Pipeline on a given day number of the month; you can also run every N months, add up to 10 times to run per day, and set a start and end date.

There are two other options for scheduling that aren’t available in the scheduler for semantic models: the ability to run the Data Pipeline every N hours or minutes.

Be warned though: refreshing your semantic model every few minutes is dangerous because it can result in excessive CU consumption on your capacity and maybe even throttling if you’re not careful.

The same options to run a Data Pipeline daily and weekly that exist in the scheduler for semantic models with one notable limitation: the semantic model scheduler allows you to specify up to 48 times to refresh every day for models stored on a Premium/Fabric capacity, whereas the Fabric scheduler used by Pipelines only allows you to specify 10 times per day.

Of course you need a capacity to be able to use Fabric Data Pipelines but orchestration activities only cost 0.0056CU hours per activity run, so using Pipelines to refresh a semantic model in this way will only use a tiny fraction of even the smallest capacity’s resources.

Even if you don’t think you’re interested in anything that Fabric offers beyond Power BI, it’s features like this that, in my opinion, still make it worthwhile to flip the switch to enable Fabric and make your life as a Power BI developer easier.

Reading Delta Metadata In Power Query

There’s a new M function rolling out now that allows you to read metadata from Delta tables (at the time of writing it’s available in Dataflows Gen2 and will be available soon in Desktop). It builds on the DeltaLake.Table M function that allows you to read data from Delta tables and is similar to the Parquet.Metadata function that was released last year. Here’s an example of how to use it to get metadata from a Delta table in OneLake:

let
  Source = AzureStorage.DataLake(
    "https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/inserttablenamehere", 
    [HierarchicalNavigation = true]
  ), 
  ToDelta = DeltaLake.Metadata(
    DeltaLake.Table(Source)
  )
in
  ToDelta

The function returns a table of records containing the metadata from the Delta table such as the schema, how the table is partitioned, and whether the table is V-Ordered or not:

Improve Power Query Performance On CSV Files Containing Date Columns

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns from their query made their Power Query query much faster and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Let’s see an example. I created a CSV file with five date columns and one million rows, then created a Power Query query to import this data into Power BI Desktop using the default M code generated by the Power Query Editor:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"Extra Spaces", type date},
      {"Extra Spaces - 2", type date},
      {"Extra Spaces - 3", type date},
      {"Extra Spaces - 4", type date},
      {"Extra Spaces - 5", type date}
    }
  )
in
  #"Changed Type"

The dates in the CSV file were in the following format:

02  Jan   1901

…and this is important: there are two spaces between the day and the month name and three spaces between the month name and the year.

Using SQL Server Profiler I found that this query took around 14 seconds to run.

I then created a second query that, instead of using Table.TransformColumnTypes to set the data type on the columns, used Date.FromText and the Format option:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  DateConversionFunction = (inputDate) as date =>
    Date.FromText(
      inputDate,
      [Format = "dd  MMM   yyyy"]
    ),
  ChangeDate = Table.TransformColumns(
    #"Promoted Headers",
    {
      {
        "Extra Spaces",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 2",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 3",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 4",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 5",
        DateConversionFunction,
        Date.Type
      }
    }
  )
in
  ChangeDate

This version of the query took around 10.5 seconds to run, so not a huge improvement but a noticeable one. It’s certainly not the 6/7x performance improvement seen on the reddit post but I’m sure different data, different date formats and different hardware might result in bigger differences.

I was told by Curt that when Power Query uses Table.TransformColumnTypes to parse date data from CSV files it tries a series of different date formats in order: first it tries ISO-8601 (for example 9th February 2025 would be “2025-02-09”), then a long date format, then a short date format, and finally it uses a generic .NET date parsing function which is slower than the others. It does this to make sure date parsing “just works” as often as possible. The dates in the example above, with the extra spaces, were deliberately designed to be slow for Table.TransformColumnTypes. When I tested on CSV files that contained dates in IS-8601 format I found that Table.TransformColumnTypes performed the same as Date.FromText.

So, to sum up, if you’re using CSV files containing date columns as a source for Power Query and you’re experiencing performance problems, try changing your M code to use Date.FromText instead of Table.TransformColumnTypes to set the data types on the date columns.

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users that are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do which will help a lot: reducing the query timeout.

Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.

There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.

Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:

The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.

Hitting the query timeout in an Excel PivotTable will give you the same message:

What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so all queries run under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise but you should experiment with different timeouts to see what the minimum value you can get away with is.

It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, and/or when there are very large data volumes and/or when there are complex but highly-optimised measures. Also there relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks if the timeout has been exceeded at certain points in the code and depending on the query there could be several seconds (sometimes more) between these checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.

[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.

But

People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.

If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:

Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)

The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.

Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:

…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:

While the query builder generated the MDX for me, you can see that it was not a simple query:

I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:

The query took 54 seconds to complete
CPU Time was also 54 seconds
The approximate peak memory usage of the query was 626292KB

I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:

EVALUATE 
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])

[You have the option of entering a customer MDX or DAX query when you create your Power Query query]

This time, Execution Metrics showed me that:

The query took 6 seconds to complete
CPU Time was 6 seconds too
The approximate peak memory usage was 142493KB

So the DAX query was simple to write and maintain, took 11% of the time that the MDX query to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient there’s no way it would be as simple or as efficient as the DAX query.

[Thanks to Akshai Mirchandani for the information in this post]