Chris Webb's BI Blog: Can I Build A Power BI DirectQuery Dataset On Top Of A REST API?

One of the most common questions I get asked is “How can I use Power BI in DirectQuery mode on top of a REST API?”. This seems like a reasonable thing to do but almost everyone who tries it will fail, and in this post I will explain why.

To answer this question we first of all have to review the two main ways of working with data in Power BI: Import mode and DirectQuery mode. In Import mode data is cached in Power BI’s own internal database and all the DAX queries that are generated by your reports are answered from there. In DirectQuery mode no data is stored inside Power BI; instead, when a report is run and DAX queries are fired off against your dataset, Power BI in turn generates queries against the data source to get the data needed. Most of the data sources that can be used with DirectQuery mode in Power BI are relational databases, which means Power BI will generate SQL queries to get data from them, but Power BI can generate queries in other languages too.

Let’s look at a simple example of how DirectQuery works. Consider the following table called FruitSales in a SQL Server database:

If I build a DirectQuery dataset containing only this table I can create the following Power BI report with a matrix and a slicer:

The SQL query generated by Power BI to get the data for the matrix is this:

SELECT 
TOP (1000001) [t0].[Colour],SUM(
CAST([t0].[Sales] as BIGINT)
)
 AS [a0]
FROM 
(
(
select [$Table].[Fruit] as [Fruit],
    [$Table].[Colour] as [Colour],
    [$Table].[Sales] as [Sales]
from [dbo].[FruitSales] as [$Table]
)
)
 AS [t0]
WHERE 
(
([t0].[Colour] IN (N'Red',N'Green'))
)

GROUP BY [t0].[Colour]

A few things are immediately obvious from this: the query has a WHERE clause to filter the data down to the two colours selected in the slicer and there’s a GROUP BY to get the sum of Sales by Colour. You can see how the results of the SQL query are going to match what is displayed in the report. In more complex cases – especially when non-trivial DAX calculations are used – a single visual may generate multiple, more sophisticated SQL queries and some calculation work may also be done inside Power BI.
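
To make the push-down concrete, here is a minimal Python sketch that runs a simplified version of the same filter-and-aggregate query. It is an illustration under stated assumptions, not what Power BI itself executes: it uses an in-memory SQLite database instead of SQL Server, the sample rows are invented because the contents of the FruitSales table are not reproduced in this text, and `query_sales_by_colour` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample rows: invented for illustration only.
SAMPLE_ROWS = [
    ("Apple", "Red", 10),
    ("Cherry", "Red", 5),
    ("Lime", "Green", 7),
    ("Banana", "Yellow", 3),
]


def query_sales_by_colour(colours):
    """Return {colour: total sales}; the filter and the sum both run in the database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE FruitSales (Fruit TEXT, Colour TEXT, Sales INTEGER)"
        )
        conn.executemany("INSERT INTO FruitSales VALUES (?, ?, ?)", SAMPLE_ROWS)
        # One placeholder per selected colour, mirroring the IN (...) list above.
        placeholders = ", ".join("?" for _ in colours)
        sql = (
            "SELECT Colour, SUM(Sales) AS a0 FROM FruitSales "
            f"WHERE Colour IN ({placeholders}) GROUP BY Colour"
        )
        return dict(conn.execute(sql, list(colours)).fetchall())
    finally:
        conn.close()


result = query_sales_by_colour(["Red", "Green"])
```

Only the aggregated rows for the selected colours come back to the caller; the Yellow row never leaves the database, which is the behaviour DirectQuery relies on.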

This example illustrates the kind of operations that Power BI needs to push down to the data source in DirectQuery mode. The question is, then, does your REST API support filtering and aggregating data in the way Power BI wants? If not, then you won’t be able to build a DirectQuery dataset on top of it.
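
One rough way to frame that question is as a checklist: list the operations a query needs and compare them with what the API can do on the server side. The Python sketch below does exactly that. Everything in it is an assumption made for illustration: the operation names, the `ApiCapabilities` type and the two example APIs are hypothetical and do not describe any real service or any Power BI interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCapabilities:
    """What a hypothetical REST API can do on the server side."""
    name: str
    server_side_operations: frozenset


# Operations the matrix query above relies on (illustrative names only).
MATRIX_QUERY_NEEDS = frozenset({"filter_in_list", "group_by", "sum"})


def missing_operations(api, needed=MATRIX_QUERY_NEEDS):
    """Return the operations the query needs that the API cannot perform."""
    return sorted(needed - api.server_side_operations)


# An API that can filter on a list of values but only returns raw rows.
list_only_api = ApiCapabilities("list-only", frozenset({"filter_in_list"}))
# An API that also groups and aggregates on the server.
rich_api = ApiCapabilities(
    "rich", frozenset({"filter_in_list", "group_by", "sum", "top_n"})
)

gaps_for_list_only = missing_operations(list_only_api)
gaps_for_rich = missing_operations(rich_api)
```

If the list of gaps is not empty, the API cannot answer this kind of query by itself, and that is the situation most REST APIs are in.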

“But wait”, I hear some of you say, “why can’t Power BI get a table of data returned by my REST API and do the filtering and aggregation itself?”. I agree it would be lovely if it could but it can’t. And if it did, that would be something more like Import mode and not DirectQuery. “Can’t I just somehow refresh an Import mode dataset on demand then?”, you may then say, “I know it’s possible to refresh a dataset using Power Automate and we can now trigger a Power Automate flow from a report using the new Power Automate visual. What’s more, with Power BI Premium there’s no limit on the number of refreshes you can trigger via the API that Power Automate uses”. That’s certainly true, but there are a few problems with this approach:

Refreshing an Import mode dataset can be relatively slow and expensive in terms of resources, and if you have hundreds of users refreshing a dataset every time they view a report you may end up with the dataset permanently in the middle of a refresh, which again impacts performance.
Commercial APIs often have limits on the number of times you can call them within a given time period and it’s likely you’d hit these rate limits if you let your users refresh a dataset any time they wanted.
In Import mode there’s no easy way to take a selection made by an end user and pass it back to the API as a parameter (if you don’t need to pass parameters back to the API why not just create a regular Import dataset and refresh it on a schedule?). Dynamic M parameters only work in DirectQuery mode. I suppose you could capture a selection and, using the Power Automate visual again, pass it to a Flow that used a Power Automate custom connector to call the Power BI REST API and change an M parameter’s value in your dataset but that would be very tricky to set up. However…
A dataset is something that is shared between all the users of your report. What happens if User A changes a parameter, refreshes the dataset, starts interacting with the report and then User B also starts viewing the report and changes the parameter to something different and refreshes again? In that case User A would see the numbers for User B’s selection in the report and it would be extremely confusing.
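
The last problem is easy to reproduce with a toy model. The Python sketch below is not Power BI code: `SharedImportDataset` and `fake_api` are hypothetical stand-ins, and the rows are invented. It only shows what happens when two users change a parameter on one shared cache.

```python
class SharedImportDataset:
    """Toy model of a single Import mode dataset shared by every report user."""

    def __init__(self, fetch_from_api):
        self._fetch = fetch_from_api
        self.parameter = None
        self.cached_rows = []

    def set_parameter_and_refresh(self, value):
        # There is only one cache, so a refresh replaces the data for everyone.
        self.parameter = value
        self.cached_rows = self._fetch(value)


def fake_api(colour):
    """Stand-in for a parameterised REST call; the data is invented."""
    data = {"Red": [("Apple", 10)], "Green": [("Lime", 7)]}
    return data.get(colour, [])


dataset = SharedImportDataset(fake_api)

dataset.set_parameter_and_refresh("Red")    # User A selects Red and refreshes
seen_by_a_before = list(dataset.cached_rows)

dataset.set_parameter_and_refresh("Green")  # User B selects Green and refreshes
seen_by_a_after = list(dataset.cached_rows)  # User A's report now reads B's data
```

User A asked for Red but ends up looking at the Green rows, because the parameter and the cached data belong to the dataset and not to the individual user.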

Let’s say your API does support all the rich operations Power BI needs in DirectQuery mode though. You can certainly build a Power BI custom connector that supports DirectQuery mode yourself. One way of doing this is to build your own ODBC provider on top of your API and then wrap it in a simple Power BI custom connector – which, as you can probably guess, is no easy task. There’s a sample of how to create a custom connector that supports DirectQuery on an ODBC provider here. Alternatively you can avoid the ODBC route and put all the logic in the custom connector itself, but there’s no documentation on how to do this and it would be extremely difficult to do, so you’d need a lot of support from someone at Microsoft who is much better at coding in M than I am.

The good news is that some third-party vendors, for example CData and Progress (there may be others too), sell Power BI custom connectors that allow you to create DirectQuery datasets on top of REST APIs. These connectors are thin wrappers for ODBC providers and work by caching the data returned from the API within the provider and providing a SQL query interface on top of that. I haven’t tested these products so I can’t comment on how well they work or what their performance is like. You’d need to pay extra to use them, of course, and since they are custom connectors you’d need to have an on-premises data gateway to use them.
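
The general idea of caching API results and putting a SQL interface on top can be sketched in a few lines of Python. This is only an illustration of the pattern and says nothing about how the CData or Progress products are actually implemented: `fake_rest_api` stands in for a real HTTP call, `CachingSqlFacade` is a hypothetical name, the rows are invented, and SQLite plays the part of the provider's cache.

```python
import sqlite3


def fake_rest_api():
    """Stand-in for an HTTP call that can only return every row (invented data)."""
    return [
        {"Fruit": "Apple", "Colour": "Red", "Sales": 10},
        {"Fruit": "Cherry", "Colour": "Red", "Sales": 5},
        {"Fruit": "Lime", "Colour": "Green", "Sales": 7},
        {"Fruit": "Banana", "Colour": "Yellow", "Sales": 3},
    ]


class CachingSqlFacade:
    """Copies the API's rows into a local cache once, then answers SQL from the cache."""

    def __init__(self, fetch_rows):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE FruitSales (Fruit TEXT, Colour TEXT, Sales INTEGER)"
        )
        rows = [(r["Fruit"], r["Colour"], r["Sales"]) for r in fetch_rows()]
        self._conn.executemany("INSERT INTO FruitSales VALUES (?, ?, ?)", rows)

    def query(self, sql, params=()):
        # Filtering and aggregation happen in the cache, not in the API.
        return self._conn.execute(sql, params).fetchall()


facade = CachingSqlFacade(fake_rest_api)
totals = facade.query(
    "SELECT Colour, SUM(Sales) FROM FruitSales "
    "WHERE Colour IN (?, ?) GROUP BY Colour ORDER BY Colour",
    ("Red", "Green"),
)
```

The API is still called for all of its data; the filtering and grouping that the API lacks are supplied by the cache, so how fresh the results are depends on how often that cache is reloaded.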

To conclude, the purpose of this post is not to explain why building a report on top of a REST API is a bad idea but to explain why it’s hard to do this in Power BI. This is, partly, the result of the way Power BI is designed: as a model-based tool rather than a report-based tool (Marco Russo wrote a great post on this subject) it needs access to all the data for a report via a richer interface than most APIs can provide. Indeed, my advice to anyone wanting to build a DirectQuery dataset on an API is that it is better to bypass the API and go to the data source that the API uses (which is often a relational database) instead, if you can, or to extract all the data that the API can provide and stage it locally. It might go against someone in your organisation’s grand plan for a service-oriented architecture but you’ll be a lot more successful with your reporting.

13 thoughts on “Can I Build A Power BI DirectQuery Dataset On Top Of A REST API?”

Stephen Maguire says:

April 25, 2021 at 8:54 pm

Great article. Such a shame that more APIs aren’t built to be ODATA compliant. Saves some hassle when hooking up Power BI.

Chris Webb says:

April 26, 2021 at 10:30 am

One last point… this blog post is all about regular Power BI reports. But what does Chris Finlan mention here…? https://youtu.be/HaaEHb0v4aE?t=1944

1. markive says:
  
  April 29, 2021 at 11:28 am
  
  Ahhh such a tease he says nothing!!! Nooooooooo
  
  1. Chris Webb says:
    
    April 29, 2021 at 11:31 am
    
    On the contrary, think about what he does say…
    
MARTIN KRATKY says:

April 28, 2021 at 2:21 am

Another option is to use products that automate REST API integration for a variety of sources (major ERP, accounting and CRM systems) into an Azure/SQL data warehouse: https://www.acterys.com

markive says:

April 29, 2021 at 11:30 am

I’ve tested this against both the Progress and CData ODBC connectors for REST. I can confirm it does work and it’s snappy enough if the dataset returned is small. I’m currently testing a larger dataset that returns a single column of 130,000 rows and it’s very slow (10-20s).

I’m currently working with the Progress team to analyse the performance issue and any workaround, and I’ll update this thread as necessary.

1. Chris Webb says:
  
  April 29, 2021 at 11:32 am
  
  Thanks – I would be very interested to see how it performs when multiple users are running queries too.
  
Aaron Burg says:

May 12, 2021 at 6:12 pm

Thanks for the great post, Chris. I am an engineer with Progress and work with our Autonomous REST Connector frequently to help people solve this challenge. If you or anyone else would like more details, please don’t hesitate to reach out. I’d be happy to share trial links and tutorials.

1. Aaron Burg says:
  
  May 12, 2021 at 8:28 pm
  
  I can be reached at aburg @ progress.com
  
Rune says:

December 9, 2021 at 1:41 pm

Great article Chris, explains a LOT!

I wish I had been given this information more clearly when I was weighing up the pros and cons of Power BI. If we had known this we would never have gone for a solution that used Power BI; we would have saved a lot of expense and time (it’s a long story) and gone for other, more suitable solutions. But unfortunately we didn’t, and we are stuck with a mediocre implementation on a framework that doesn’t suit us or our needs and costs us a lot of money because of in-between hacks.

Microsoft should REALLY emphasize this information alongside their information on available data sources so people know about these limitations more clearly when working out the best approach.

The CData/Progress custom connector / ODBC solutions are options, but they are way too expensive (and I haven’t figured out whether they even work with Power BI Embedded). In addition, the earlier comment about the performance of the ODBC drivers is worrying, since that amount of data would be small for our usage.
