Can You Use Power Query In Power BI Or Excel To Write Data To A Data Source?

The short answer: no.

The long answer:

It is, technically, possible to use Power Query (in either Power BI or Excel) to write data to a data source; for example, here’s an old post I wrote showing how to run an UPDATE statement to update data in a table in SQL Server. However, because the Power Query engine may evaluate a value or statement more than once when a query is executed, it is not safe or supported to use it to create, insert, update, delete or otherwise change data in a data source – it should only be used to read data. You may see some functions listed in the M function reference that suggest otherwise, but they aren’t supported either, and indeed may not work at all.
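
Just to illustrate the kind of thing I mean – and to be clear, this is exactly what you should not do – here’s a rough sketch using Value.NativeQuery(), with made-up server, database and table names. It may or may not run depending on your setup, and even if it does, the statement could be executed more than once per refresh:

// UNSUPPORTED: for illustration only – the engine may evaluate
// this query more than once, running the UPDATE multiple times
let
    Source = 
        Sql.Database(
            "myserver.database.windows.net", 
            "MyDatabase"),
    RunUpdate = 
        Value.NativeQuery(
            Source, 
            "UPDATE dbo.MyTable SET MyColumn = 1 WHERE MyKey = 42")
in
    RunUpdate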

Thanks to Matt Masson of the dev team for providing the official position on this.

A Function To Generate The M Code For A Table Type

This is going to sound obscure, and it is, but recently I’ve been using the #table() intrinsic function in M a lot – specifically the version that takes a table type as its first parameter (as I describe here). Because it’s a bit of a pain to write the M code for a table type by hand, I’ve written an M function that takes a table and returns the text of the M code needed to define that table’s type. Here it is:

(InputTable as table) as text =>
let
    //get the name, position and type name of each column in the table
    Source = 
        Table.Schema(InputTable),
    //sort the columns into their original order
    SortRows = 
        Table.Sort(
            Source,
            {{"Position", Order.Ascending}}),
    //keep only the column name and type name
    RemoveColumns = 
        Table.SelectColumns(
            SortRows,
            {"Name", "TypeName"}),
    //build a "name = type" expression for each column,
    //quoting the column name if necessary
    AddCustom = 
        Table.AddColumn(
            RemoveColumns, 
            "TypeNames", 
            each 
            Expression.Identifier([Name]) & " = " & [TypeName]),
    //combine the expressions into a single piece of text
    Output = 
        "[" & Text.Combine(AddCustom[TypeNames], ", ") & "]"
in
    Output

Nothing complex here, but now I’ve posted this I know that in the future I’ll be able to Google for it when I’m working onsite with a customer and I need it!

To give you an idea of how it works, take the table that is returned by the following M expression, which calls the public TripPin OData web service:

OData.Feed(
    "https://services.odata.org/TripPinRESTierService/Airports", 
    null, 
    [Implementation="2.0"])

[Image: the Airports table returned by the OData query]

Passing this table to the function above returns the following text, the M code for a record that lists the names of the columns in this table and their data types, suitable for use with #table:

[Name = Text.Type, IcaoCode = Text.Type, 
IataCode = Text.Type, Location = Record.Type]
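
And to show how this output can then be used with #table(), here’s a quick sketch that creates a one-row table with this type – the row values are just made up for illustration:

#table(
    type table [Name = Text.Type, IcaoCode = Text.Type, 
        IataCode = Text.Type, Location = Record.Type],
    //a single, made-up row matching the four columns
    {{"London Heathrow Airport", "EGLL", "LHR", [City = "London"]}})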

Using Power Query And Microsoft Flow To Automate The Creation Of CSV Files

I was super-excited when, a few days ago, the long-awaited integration of Power Query into Microsoft Flow was announced. I was then gutted when work commitments meant I couldn’t blog about it right away and Erik Svensen beat me to it with this excellent post showing how to push data from SQL Server via Power Query and Flow to a Power BI streaming dataset:

https://eriksvensen.wordpress.com/2018/09/25/powerquery-everywhere-now-in-microsoftflow-as-well/

Anyway, the integration of Power Query into Flow opens up so many interesting possibilities that I couldn’t resist writing about it myself today. For example: something I am asked about a lot is whether it is possible to use Power Query to automate the creation of CSV files. You can’t do this in a supported way with Power Query in Excel or Power BI (and yes, I know about all of the slightly hacky ways people have done it, like this), but it is absolutely something you can do with Power Query in Flow.

Here’s how:

[Image: overview of the Flow]

First of all I have a Schedule – Recurrence trigger, which allows you to run a flow on a schedule. In this case I’m setting the Flow to run once a day at 6am UK time:

[Image: the Recurrence trigger settings]

Next I have the all-important Transform data using Power Query action. At the moment you can only use Power Query with SQL Server data sources, and (at least when I tested it) it only seems to work with Azure SQL Database data sources and not on-premises SQL Server databases via a gateway – hopefully this gets fixed soon. Clicking the Edit Query button:

[Image: the Transform data using Power Query action]

…opens up the Power Query online editor, which will be familiar to anyone who has used Power Query in Excel or Power BI Desktop, and you build your query:

[Image: the Power Query online editor]

Next I have a Create CSV Table action to convert the results returned by the query to CSV format:

[Image: the Create CSV Table action]

And finally I take the CSV data and save it to a CSV file in OneDrive For Business with a Create File action:

[Image: the Create File action]

Run the Flow and the CSV file is created:

[Image: the CSV file created in OneDrive For Business]

And I can view the contents in Excel Online:

[Image: the contents of the CSV file in Excel Online]

The Power Query/Flow integration is still in Preview and I found a few things didn’t work reliably. For example, the first few times I ran my Flow I got errors saying that it couldn’t connect to the Azure SQL Database, even though it clearly could while I was designing the query; that error went away after a while. What’s more, it only works for SQL Server data sources right now, and I really hope it gets enabled for all the other data sources that Power Query can connect to, especially Excel. These are just teething troubles though, and it’s clear that this is going to be revolutionary for Power Query and Flow users alike!

I really need to improve my Flow skills now…

Converting Decimal Numbers To Hexadecimal In Power Query M

This is a very short post! A lot of people have blogged about how to convert numbers between different bases in M (see for example Maxim Zelensky’s very elegant solution for converting from binary to decimal), but today I noticed there was a very easy way to convert a decimal number to hexadecimal using the Number.ToText() function: you just need to use “x” in the second parameter. For example:

Number.ToText(12, "x") //returns c
Number.ToText(123, "x") //returns 7b
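
Since these are .NET format strings, a couple of variations are possible too – and, going the other way, you can (in a slightly hacky, probably unsupported way) get back to decimal by evaluating M’s hexadecimal literal syntax with Expression.Evaluate():

Number.ToText(255, "X")  //returns FF - uppercase hex
Number.ToText(12, "x4")  //returns 000c - padded to four digits
Expression.Evaluate("0x" & "7b")  //returns 123 - hex back to decimal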

I’m sure this will come in handy somewhere…

Using Html.Table() To Extract URLs From A Web Page In Power BI/Power Query M

Last year I blogged about how to use the Text.BetweenDelimiters() function to extract all the links from the href attributes in the source of a web page. The code was reasonably simple but there’s now an even easier way to solve the same problem using the new Html.Table() function. This function doesn’t seem to be documented online yet, but the built-in documentation for the function available in the Query Editor is up-to-date:

[Image: the built-in documentation for Html.Table in the Query Editor]

Miguel Escobar also has a great post showing how to use it and the new Web.BrowserContents function here.

Here’s an example M query that extracts all the links that start with the letters “http” from my company homepage:

let
    Source = 
        Web.BrowserContents("https://www.crossjoin.co.uk/"),
    Links = 
        Html.Table(
            Source, 
            {{
                "Link", 
                "a[href^=""http""]", 
                each [Attributes][href]}})
in
    Links

[Image: the table of links returned by the query]

To explain what’s going on here:

  • Web.BrowserContents returns the text of the HTML DOM for the web page
  • In the second step Html.Table takes that text and searches for all <a> elements whose href attribute starts with the letters “http”. I found this CSS selector here.
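
The same pattern works with other CSS selectors too. For example, here’s a similar (hypothetical, untested) query that gets the src attribute of every <img> element on a page:

let
    Source = 
        Web.BrowserContents("https://www.crossjoin.co.uk/"),
    Images = 
        Html.Table(
            Source, 
            //match every <img> element that has a src attribute
            {{
                "ImageUrl", 
                "img[src]", 
                each [Attributes][src]}})
in
    Images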

Comparing The Performance Of CSV And Excel Data Sources In Power Query

My posts from two weeks ago (see here and here) on using Process Monitor to troubleshoot the performance of Power Query queries made me wonder about another question: how does the performance of reading data from CSV files compare to the performance of reading data from Excel files? I think most experienced Power Query users in either Power BI or Excel know that Excel data sources perform pretty badly but I had never done any proper tests. I’m not going to pretend that this post is a definitive answer to this question (and once again, I would be interested to hear your experiences) but hopefully it will be thought-provoking.

To start off, I took the 153.6MB CSV file used in my last few posts and built a simple query that applied a filter on one text column, then removed all but three columns. The query ran in 9 seconds and using the technique from my last post I was able to draw the following graph from a Process Monitor log file showing how Power Query reads data from the file:

[Image: graph of the data read from the CSV file over time]

Nothing very surprising there. Then I opened the same CSV file in Excel and saved the data as an xlsx file with one worksheet and no tables or named ranges; the resulting file was only 80.6MB. Finally I created a duplicate of the first query and pointed it to the Excel file instead. The resulting query ran in 59 seconds – around six times slower! Here’s a comparison of the performance of this query with that of the first query:

[Image: comparison of the data read from the Excel file and the CSV file over time]

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower
  • The initial reads at offsets of around 80 million bytes suggest that data is first read from near the end of the Excel file. Maybe this is Power Query reading some file metadata? In any case, it seems to take around five seconds before Power Query starts to read the data needed by the query.
  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

In any case you don’t need to study this too hard to see that the performance of reading data from an xlsx format Excel file is terrible compared to the performance of reading data from a CSV file. So, if you have a choice between these two formats, for example when downloading data, it seems fair to say that you should always choose CSV over xlsx.
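
For reference, here’s a rough sketch of the kind of query I used in these tests – the file path and column names are made up, but the real queries did nothing more than filter on one text column and keep three columns:

let
    //the xlsx version was identical apart from the source steps,
    //which looked something like:
    //Source = Excel.Workbook(File.Contents("C:\Data\MyFile.xlsx"), null, true),
    //Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    Source = 
        Csv.Document(
            File.Contents("C:\Data\MyFile.csv"),
            [Delimiter=",", Encoding=65001]),
    PromotedHeaders = 
        Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    //filter on one text column
    FilteredRows = 
        Table.SelectRows(
            PromotedHeaders, 
            each [TextColumn] = "SomeValue"),
    //remove all but three columns
    RemovedColumns = 
        Table.SelectColumns(
            FilteredRows, 
            {"TextColumn", "Column2", "Column3"})
in
    RemovedColumns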

Using Process Monitor To Find Out How Much Data Power Query Reads From A File

This post is really just a quick follow-on from my post earlier this week on using Process Monitor to troubleshoot Power Query performance issues with file-based data sources, which I suggest you read before carrying on. I realised, after playing around with Process Monitor some more, that the ReadFile operation actually tells you how much data is being read from a file when a Power Query query is running. For example, here’s a sample of some of the ReadFile operations captured while running the unoptimised version of the query I talked about in my last post:

[Image: a sample of the ReadFile operations captured by Process Monitor]

Since Process Monitor can export captured events to a CSV file, it’s pretty easy to load the events into Power BI, filter the events down to only the ReadFile operations, parse the Detail column to extract the Offset values (which I’m sure you can work out how to do if you’re reading a post like this – though there’s a sketch at the end of this post), and then draw a graph showing how much data gets read from a file when a query is run. Here’s what the graph looks like for the unoptimised version of the query from my previous blog post, with relative time on the x axis and the amount of data read in bytes on the y axis:

[Image: graph of the amount of data read from the file, with relative time on the x axis and bytes read on the y axis]

In that post I noted that there were six reads of the file – and while that’s clear from the graph above, it’s also possible to see that the first read does not read the whole contents of the file while the next five do (the file is 149MB). So maybe I was right that there is one complete read of the file for each row in the output query? What is that first, partial read for, I wonder?
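
As promised, here’s a rough sketch of the kind of query that parses the Process Monitor CSV export – the log file path is made up, and it assumes the Detail column follows the usual “Offset: n, Length: n, …” format:

let
    //load the CSV file exported from Process Monitor
    Source = 
        Csv.Document(
            File.Contents("C:\Temp\ProcMonLog.csv"),
            [Delimiter=",", Encoding=65001]),
    PromotedHeaders = 
        Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    //keep only the ReadFile events
    ReadFileOnly = 
        Table.SelectRows(
            PromotedHeaders, 
            each [Operation] = "ReadFile"),
    //extract the Offset value from text like
    //"Offset: 1,048,576, Length: 32,768, ..."
    AddOffset = 
        Table.AddColumn(
            ReadFileOnly, 
            "Offset", 
            each Number.FromText(
                Text.Replace(
                    Text.BetweenDelimiters([Detail], "Offset: ", ", Length"), 
                    ",", 
                    "")), 
            Int64.Type)
in
    AddOffset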
