Last year I blogged about how to use the Text.BetweenDelimiters() function to extract all the links from the href attributes in the source of a web page. The code was reasonably simple but there’s now an even easier way to solve the same problem using the new Html.Table() function. This function doesn’t seem to be documented online yet, but the built-in documentation for the function available in the Query Editor is up-to-date:
Miguel Escobar also has a great post showing how to use it and the new Web.BrowserContents function here.
Here’s an example M query that extracts all the links that start with the letters “http” from my company homepage:
To explain what’s going on here:
Web.BrowserContents returns the text of the html DOM for the web page
In the second step Html.Table takes that text and searches for all <a> elements whose href attribute starts with the letters “http”. I found this CSS selector here.
You can find out how to sign up for an Azure Maps account here; it isn’t free to use but you do get 250,000 free map renders per month (which should be more than enough for Power BI use) and any use over that is extremely cheap. Full details on pricing can be found here.
Here’s what the source data in my dataset looks like:
The only important column is the Center column, which contains the longitude of the crime location followed by a comma followed by the latitude of the crime location in a single text value.
With the data in this format you can call the Get Map Image API relatively easily in DAX using a measure something like this:
var BaseURL =
var SubscriptionKey =
"insert your key here"
var ApiVersion =
var Layer =
var Center =
"¢er=" & SELECTEDVALUE('London Crime'[Center])
var ZoomLevel =
var HeightWidth =
BaseURL & SubscriptionKey & ApiVersion &
Layer & Center & ZoomLevel & HeightWidth
You’ll need to paste your Azure Maps API key in on the line highlighted above and set the Data Category for the measure to Image URL. The maximum possible height of an image in a table or matrix in Power BI is, as far as I can see, 150 pixels so that’s why the code above requests an image that is 150×150. You may want to experiment with different zoom levels and layer types to see what looks best on your report.
You can view the sample report here and download a copy of the report (without my API key in) here.
I’ve just come across an interesting new project from Microsoft Research called Charticulator. It’s described as a tool for the “Interactive Construction of Bespoke Chart Layouts”, and you can use it for creating some very nice data visualisations like this:
So I had a play with it, was quite impressed, and then I watched the video on the home page and noticed that towards the end (watch from 3:55 onwards) there’s a demo of how the charts you create in it can be exported as Power BI custom visuals! No code required!
After that I was a lot more than quite impressed – this looks really useful. Unfortunately the feature is not released yet, although according to one of the creators it will be coming soon.
The excellent work that David Eldersveld has been doing recently on using SVG images in Power BI has generated a lot of interest; now that the August 2018 release of Power BI Desktop allows you to set data categories on measures – and specifically the Image URL data category – David has been able to apply his work to the creation of sparklines and small multiples in Power BI. He’s already published an example in the Quick Measures gallery here; you might also want to check out this blog post and quick measure by Tom Martens too.
All this got me thinking: what if there was a simple web service that could generate charts, one that didn’t need any authentication, could work with simple GET requests and was free to use? Well guess what, there is one created by Google: the Google Image Charts API. It turns out that it makes it super easy to create sparklines and small multiples in Power BI with very little DAX needed.
Here are two examples of what’s possible with some simple DAX in a measure:
Both examples use data from a table called Sales with the following contents:
The first example, the Sales Sparkline measure, has the following definition:
Sales Sparkline =
var BaseURL =
var ChartType =
var ChartScale =
var ChartSize =
var SeriesColour =
var ChartData =
CONCATENATEX(Sales, Sales[Sales], ",", Sales[Month])
BaseURL & ChartType & ChartScale &
ChartSize & SeriesColour & ChartData
All I’m doing is building up the URL to call the API from the parameters listed here. The second measure is basically the same as the first, but the chart type is driven by a selection in a slicer. It’s not bullet-proof code but it shows how easy it is to use the API.
You can see the report live here, and download the sample .pbix file here.
I’m not going to pretend that I’m a data visualisation expert and I know these examples are very basic; however, I think there are a lot of possibilities here for someone with better data visualisation skills than me.
Now for the bad news: the Google Image Charts API has been deprecated since 2012! Here’s the statement on the Google website:
While the dynamic and interactive Google Charts are actively maintained, we officially deprecated the static Google Image Charts way back in 2012. This gives us the right to turn it off without notice, although we have no plans to do so.
Also, there’s a fair usage policy too:
There’s no limit to the number of calls per day you can make to the Google Chart API. However, we reserve the right to block any use that we regard as abusive.
As a result I would not be confident about using it in production. If thousands of Power BI users started using the API it might prompt Google to block traffic from Power BI or turn off the API completely!
I do think Microsoft really needs to address the lack of native support for sparklines and small multiples in Power BI – the fact we still don’t have them is frankly embarrassing (even Power View had them, for crying out loud). Although one solution could be a DAX function that returned a chart image – basically taking David’s work and wrapping it up in a more user-friendly form – another approach would be for Microsoft to build on the Report Page Tooltips functionality. Since we can already design mini reports and make them appear as tooltips, why not make it possible for the same mini reports to appear in a cell in a table or matrix? You could then have measures that returned the name of a report page as a text value, and a new data category for them that tells Power BI to display the contents of the report page in the cell instead of the text. That way you could create small multiples from any visual or set of visuals.
UPDATE: thanks also to Tom Martens for reminding me to point out that this technique involves sending data to a third party, something which might cause legal problems for you.
While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation=”2.0”, used like this:
…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.
In fact first heard about this option in a comment on this blog post back in January, but as I was told by the dev team that it hadn’t been tested properly I didn’t blog about it. However as it is now set by default in the M code generated by the Power Query Editor, I guess it’s ready for production use. I’m told it should improve the performance of M queries that import data from Analysis Services – and I would be very interested to hear from anyone who tests this about how much of an improvement they see.
I’ve done a little bit of testing myself and I can see there are indeed some differences in the MDX generated. For example, I created an M query that connected to the Adventure Works DW database and returned all combinations of customer and day name where the Internet Sales Amount measure is greater than 100. In the old version of the MDX generation layer (ie without Implementation=”2.0” set) the following MDX query is generated:
The difference between the two is that the first query uses a subselect to do the filtering whereas the second does not; subselects in MDX are not necessarily bad, but I generally try to avoid using them unless I need to. There may be other differences in the way the MDX is generated in the new version but I haven’t had a chance to do any detailed testing.
My posts from two weeks ago (see here and here) on using Process Monitor to troubleshoot the performance of Power Query queries made me wonder about another question: how does the performance of reading data from CSV files compare to the performance of reading data from Excel files? I think most experienced Power Query users in either Power BI or Excel know that Excel data sources perform pretty badly but I had never done any proper tests. I’m not going to pretend that this post is a definitive answer to this question (and once again, I would be interested to hear your experiences) but hopefully it will be thought-provoking.
To start off, I took the 153.6MB CSV file used in my last few posts and built a simple query that applied a filter on one text column, then removed all but three columns. The query ran in 9 seconds and using the technique from my last post I was able to draw the following graph from a Process Monitor log file showing how Power Query reads data from the file:
Nothing very surprising there. Then I opened the same CSV file in Excel and saved the data as an xlsx file with one worksheet and no tables or named ranges; the resulting file was only 80.6MB. Finally I created a duplicate of the first query and pointed it to the Excel file instead. The resulting query ran in 59 seconds – around 6 times slower! Here’s a comparison with the performance of this query with the first query:
The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:
Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower
The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.
There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running
In any case you don’t need to study this too hard to understand that the performance of reading data from an xlsx format Excel file is terrible compared to the performance of reading data from a CSV. So, if you have a choice between these two formats, for example when downloading data, it seems fair to say that you should always choose CSV over xlsx.