One of the most common questions I get asked is “How can I use Power BI in DirectQuery mode on top of a REST API?”. This seems like a reasonable thing to do but almost everyone who tries it will fail, and in this post I will explain why.
To answer this question we first of all have to review the two main ways of working with data in Power BI: Import mode and DirectQuery mode. In Import mode data is cached in Power BI’s own internal database and all the DAX queries that are generated by your reports are answered from there. In DirectQuery mode no data is stored inside Power BI; instead, when a report is run and DAX queries are fired off against your dataset, Power BI in turn generates queries against the data source to get the data needed. Most of the data sources that can be used with DirectQuery mode in Power BI are relational databases and so that means Power BI will generate SQL queries to get data from them, but Power BI can also generate queries in other languages too.
Let’s look at a simple example of how DirectQuery works. Consider the following table called FruitSales in a SQL Server database:
If I build a DirectQuery dataset containing only this table I can create the following Power BI report with a matrix and a slicer:
The SQL query generated by Power BI to get the data for the matrix is this:
SELECT TOP (1000001) [t0].[Colour],SUM( CAST([t0].[Sales] as BIGINT) ) AS [a0] FROM ( ( select [$Table].[Fruit] as [Fruit], [$Table].[Colour] as [Colour], [$Table].[Sales] as [Sales] from [dbo].[FruitSales] as [$Table] ) ) AS [t0] WHERE ( ([t0].[Colour] IN (N'Red',N'Green')) ) GROUP BY [t0].[Colour]
A few things are immediately obvious from this: the query has a WHERE clause to filter the data down to the two colours selected in the slicer and there’s a GROUP BY to get the sum of Sales by Colour. You can see how the results of the SQL query are going to match what is displayed in the report. In more more complex cases – especially when non-trivial DAX calculations are used – a single visual may generate multiple, more sophisticated SQL queries and some calculation work may also be done inside Power BI.
This example illustrates the kind of of operations that Power BI needs to push down to the data source in DirectQuery mode. The question is, then, does your REST API support filtering and aggregating data in the way Power BI wants? If not, then you won’t be able to build a DirectQuery dataset on top of it.
“But wait”, I hear some of you say, “why can’t Power BI get a table of data returned by my REST API and do the filtering and aggregation itself?”. I agree it would be lovely if it could but it can’t. And if it did, that would be something more like Import mode and not DirectQuery. “Can’t I just somehow refresh an Import mode dataset on demand then?”, you may then say, “I know it’s possible to refresh a dataset using Power Automate and we can now trigger a Power Automate flow from a report using the new Power Automate visual. What’s more, with Power BI Premium there’s no limit on the number of refreshes you can trigger via the API that Power Automate uses”. That’s certainly true, but there are a few problems with this approach:
- Refreshing an Import mode dataset can be relatively slow and expensive in terms of resources, and if you have hundreds of users refreshing a dataset every time they view a report you may end up with the dataset being permanently refreshing which again impacts performance.
- Commercial APIs often have limits on the number of times you can call them within a given time period and it’s likely you’d hit these rate limits if you let your users refresh a dataset any time they wanted.
- In Import mode there’s no easy way to take a selection made by an end user and pass it back to the API as a parameter (if you don’t need to pass parameters back to the API why not just create a regular Import dataset and refresh it on a schedule?). Dynamic M parameters only work in DirectQuery mode. I suppose you could capture a selection and, using the Power Automate visual again, pass it to a Flow that used a Power Automate custom connector to call the Power BI REST API and change an M parameter’s value in your dataset but that would be very tricky to set up. However…
- A dataset is something that is shared between all the users of your report. What happens if User A changes a parameter, refreshes the dataset, starts interacting with the report and then User B also starts viewing the report and changes the parameter to something different and refreshes again? In that case User A would see the numbers for User B’s selection in the report and it would be extremely confusing.
Let’s say your API does support all the rich operations Power BI needs in DirectQuery mode though. You can certainly build a Power BI custom connector that supports DirectQuery mode yourself. One way of doing this is to build your own ODBC provider on top of your API and then wrap it in a simple Power BI custom connector – which, as you can probably guess, is no easy task. There’s a sample of how to create a custom connector that supports DirectQuery on an ODBC provider here. Alternatively you can try not going down the ODBC route and putting all the logic in the custom connector but there’s no documentation on how to do this and it would be extremely difficult to do, so you’d need a lot of support from someone at Microsoft who is much better at coding in M than I am.
The good news is that some third-party vendors, for example CData and Progress (there may be others too) sell Power BI custom connectors that allow you to create DirectQuery datasets on top of REST APIs. These connectors are thin wrappers for ODBC providers and work by caching the data returned from the API within the provider and providing a SQL query interface on top of that. I haven’t tested these products so I can’t comment on how well they work or what their performance is like. You’d need to pay extra to use them, of course, and since they are custom connectors you’d need to have an on-premises data gateway to use them.
To conclude, the purpose of this post is not to explain why building a report on top of a REST API is a bad idea but to explain why it’s hard to do this in Power BI. This is, partly, the result of the way Power BI is designed: as a model-based tool rather than a report-based tool (Marco Russo wrote a great post on this subject) it needs access to all the data for a report via a richer interface than most APIs can provide. Indeed, my advice to anyone wanting to build a DirectQuery dataset on an API is that it is better to bypass the API and go to the data source that the API uses (which is often a relational database) instead, if you can, or to extract all the data that the API can provide and stage it locally. It might go against someone in your organisation’s grand plan for a service-oriented architecture but you’ll be a lot more successful with your reporting.