Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.

But

People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.

So

If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:

  • Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
  • DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)

The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.

Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:

…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:

While the query builder generated the MDX for me, you can see that it was not a simple query:

I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:

  • The query took 54 seconds to complete
  • CPU Time was also 54 seconds
  • The approximate peak memory usage of the query was 626,292 KB

I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:

EVALUATE 
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])

[You have the option of entering a custom MDX or DAX query when you create your Power Query query]

This time, Execution Metrics showed me that:

  • The query took 6 seconds to complete
  • CPU Time was 6 seconds too
  • The approximate peak memory usage was 142,493 KB

So the DAX query was simple to write and maintain, took 11% of the time that the MDX query took to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient, there’s no way it would be as simple or as efficient as the DAX query.
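
One other advantage of writing the DAX yourself is that you can trim the extract down to just the columns you actually need, which reduces the amount of data returned even further. Here’s a minimal sketch of that pattern (the Transaction Date and Price columns are hypothetical, for illustration only):

EVALUATE
SELECTCOLUMNS (
    'Property Transactions',
    // hypothetical columns: pick only the ones you actually need
    "Transaction Date", 'Property Transactions'[Transaction Date],
    "Price", 'Property Transactions'[Price],
    // the measure is still evaluated for each row, as in ADDCOLUMNS
    "Count of Sales", [Count of Sales]
)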

[Thanks to Akshai Mirchandani for the information in this post]

20 thoughts on “Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI”

  1. Out of curiosity, do we know the reason why Excel is still using “inefficient MDX” for querying Tabular models? I mean, we never hear anyone complaining about slow performance in Power BI; it’s only when they switch over to Excel that they get into trouble.

      1. Chris, I need to migrate a solution from Excel PivotTables connecting to MDX cubes to a similar-style UI in Power BI or Excel, with a Power BI semantic model or SQL Server as the data source.
        I am considering these 3 options:
        1) Connect Excel to Power BI using the Analysis Services connector, so users can continue using the PivotTable design
        2) Create a Matrix visual in a Power BI report and enable the Personalize visuals feature, so users can slice the data the way they want. The issue here is flexibility for totals and subtotals: personalized visuals don’t let users enable/disable totals.
        3) Use a Matrix visual and create field parameters for rows/columns with all possible attributes (50+) and values (all possible measures), so users can populate the matrix the way they want. There are multiple issues with this option: the Matrix visual doesn’t expand rows when extra field parameters are selected, so it requires manual steps from the end user to expand them, plus the same issue with flexible totals and subtotals for attributes as above.

        Are there any other options I’m missing here? What would be the better choice from a performance and data volume perspective?

  2. There’s a reason you never wrote this blog post before: 50% of the discussion comes down to the personal preferences of the customer when they use one query language or another. For the other 50% of the discussion we are totally at the mercy of Microsoft to optimize the execution plans of both languages equally well, so that they retrieve data as efficiently as possible.

    … If Microsoft wanted to optimize the performance of that particular monstrous nested crossjoin they would be doing so. If they wanted to optimize MDX on tabular models, then they would do so. There is nothing inherent about the DAX *language* that makes it faster than MDX when retrieving the same data. They are only slow or fast based on what syntax Microsoft chooses to optimize and enhance.

    I do agree that Microsoft gave DAX quite a bit more love, when it comes to their optimization efforts.

    It seems unfortunate that Microsoft won’t give their MDX execution plans a bit more TLC. It is a very clean and powerful query language. It is no accident that the Power BI team uses it for their import queries in PQ when moving rows from one dataset to another. And it is no accident that pivot tables use it as well.

    When Microsoft came up with MDX, the world took notice, and other data engines saw the value and adopted it as well. (including Tableau among others)

    When Microsoft came up with DAX, it was ostensibly “easier” and targeted non-technical users in Excel. While it is “easy” for solving some simple problems, it can get ugly pretty quickly as the queries grow more complex. There are certainly some things I like about DAX… but there is no question that I would still use MDX for solving well over 80% of my reporting requirements. Unlike MDX, I’ve noticed that this DAX query language is *NOT* the envy of Microsoft’s competitors.

    IMO, it would be unfortunate for Microsoft to abandon MDX any more than they already have. And if they have bandwidth to invest in another query option, then it should be to run regular SQL relational queries. Lots of data providers can work with a flavor of SQL but PBI datasets seem to be an outlier in that regard.

  3. Does this apply to Power BI Import mode only? Is it still OK to use one semantic model for many Power BI reports, all querying it in Direct mode?

  4. Importing from another semantic model is an anti-pattern? So DirectQuery / composite models are the cool kids on the block, but import is bad?

      1. Why is DirectQuery to another semantic model OK, but not import? I lost this part.

      2. DirectQuery to another semantic model doesn’t copy the data. Importing data from another semantic model involves running DAX queries that return large amounts of data, which can be expensive.

  5. So, I’d be curious about your thoughts on this:

    Let’s say that we developed and curated a big Semantic Model, connecting multiple fact and dimension tables, adding a bunch of business rules – on the tables but also in the DAX measures – and finishing with data validation from end users.

    Now, this Semantic Model becomes the source of truth for, let’s say, a simple measure that creates a % KPI using SUMs from two different fact tables.

    Now, you’re saying that, instead of using the highly-curated Semantic Model to feed any other source with the Source of Truth KPI calculation, we should simply go back and develop this on the backend using X-number of window functions and X-number of LEFT JOINs? Or are you just proposing to not extract the most granular data from the fact tables?

  6. Thank you, Chris, as always, for a great article.

    Although it wasn’t your main point, opening the box on using semantic models as an upstream source of something else seems to beg some bigger questions. I’d appreciate another article or two on this topic.

    For one thing, some of the messaging we’re getting seems to imply that XMLA is indeed an appropriate way to get data “at scale” from AS. See this post for example:
    https://powerbi.microsoft.com/en-us/blog/executequeries-rest-api-versus-xmla-endpoints-at-scale/

    For another, querying data in bulk seems to have been the original implicit use-case for semantic link–being able to pull “golden” semantic model data backwards into your python workloads (i.e., up to and including AI things you might try to do).

    While not explicit, it seems fair to say that some of the messaging we’ve been getting suggests that we could, in fact, use semantic models in this way if we wanted to.

    Interested in your thoughts.

    1. I can say for sure that everyone on the CAT team and everyone on the Analysis Services team (which includes Kay, who wrote that article) agrees that bulk extracts of data from a semantic model are a bad idea. However, as I said in the blog post above, we know that people will do it anyway, so if you have to do it there are some things you can do – use the XMLA Endpoint, write DAX queries – to limit the impact.
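
       For example, one way to limit the impact is to break the extract up into smaller chunks by filtering in the DAX query and running one query per chunk. A rough sketch, assuming the table has a Transaction Date column to filter on:

       EVALUATE
       CALCULATETABLE (
           ADDCOLUMNS ( 'Property Transactions', "Count of Sales", [Count of Sales] ),
           // hypothetical date column: extract one month per query instead of the whole table
           'Property Transactions'[Transaction Date] >= DATE ( 2024, 1, 1 ),
           'Property Transactions'[Transaction Date] < DATE ( 2024, 2, 1 )
       )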

  7. Great article. Am I correct in assuming that using DAX for the bulk extract would only apply to tabular models? If we’re stuck running an MD SSAS cube, are we still required to use only MDX when extracting bulk data?

    Thanks!

      1. Thank you for this – I hadn’t quite realized that DAX could also be used. So then MDX is only needed for the actual creation of the SSAS MD model and measures within it?

  8. Great post. It might be worth mentioning the ‘Insert Table’ feature in Excel when connecting to a Power BI semantic model (https://powerbi.microsoft.com/en-us/blog/announcing-new-ways-to-create-connected-tables-in-excel-connected-to-power-bi/). This way you can select and filter fields much like in Power BI Desktop, which then adds a table fed by a DAX query. The only problem with this is that you can’t get back into the dialog if you want to change the query; you have to change the DAX query itself. If that were added, this would be THE way to extract data in the scenarios mentioned. Do you know if this is something which will be added in the (near) future?
