Rendering Text On A Map With Power Query And Power Map

They say a good magician never reveals his tricks. Luckily, I’m a rubbish magician and so (at the request of Jason Thomas and David Eldersveld) I can reveal the secret behind one of the magic tricks I used in my sessions at SQLBits and the PASS Summit last year: my mind-reading card trick. More specifically, this post will show you how I was able to find out which card my helper had picked from a deck and how I was able to make its name appear in Power Map. Alas the video of the hour-long SQLBits session isn’t available for some reason, but the video of the shortened version of the session I did at the BI Power Hour at the PASS Summit is available for purchase here along with other, much more useful content.

First of all, the mind-reading bit. I hate to admit it but it had nothing to do with this rather fetching hat – the Microsoft Brain Interface – modelled here by Matt Masson:

image

In fact I had some accomplices (Bob Duffy and Carmel Gunn at SQLBits, Matt Masson at PASS) who entered the name of the card into a text file stored in OneDrive when my helper showed it to the audience. I then used the technique Melissa Coates blogged about here to read the name from the text file using Power Query on my laptop.

Second, the harder problem of taking the name of the card and making it appear in Power Map. Again, I used the magic of Power Query. On an Excel worksheet I used 1s and blanks to represent each letter of the alphabet in its own 8×8 grid:

image

…and then I read this into Power Query, matched each 8×8 grid up to the letter it represented, split the name of the card into letters, put the two together and got a table containing a set of data points that, when plotted on a custom map in Power Map, could be read as text:

image
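The core of the M involved can be sketched fairly simply. Here’s a minimal, hypothetical version (the letter point lists below are invented for illustration – the real workbook builds them by reading the 1s-and-blanks grids from the worksheet) that splits a word into letters, looks up the points that draw each letter and offsets them horizontally by 8 units per letter position:

let
    //hypothetical letter definitions - in the real workbook these point
    //lists are built by reading the 1s-and-blanks grids from the worksheet
    LetterPoints = #table(
        {"Letter", "Points"},
        {
         {"A", {[X=3,Y=7],[X=2,Y=6],[X=4,Y=6],[X=2,Y=5],
                [X=3,Y=5],[X=4,Y=5],[X=2,Y=4],[X=4,Y=4]}},
         {"C", {[X=3,Y=7],[X=4,Y=7],[X=2,Y=6],[X=2,Y=5],
                [X=3,Y=4],[X=4,Y=4]}}
        }),
    Word = "CA",
    Letters = Text.ToList(Word),
    //shift each letter's points right by 8 units per position in the word
    PlotPoints = List.TransformMany(
        List.Positions(Letters),
        each LetterPoints{[Letter = Letters{_}]}[Points],
        (position, point) => [X = point[X] + (position * 8), Y = point[Y]]),
    Output = Table.FromRecords(PlotPoints)
in
    Output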

You can download a modified version of the workbook here which reads data from the worksheet instead of a text file and renders the text over a regular map in Power Map. Here’s an example of the output:

Sadly I’m even less of an artist than I am a magician, and I admit that the output could be prettier. David Eldersveld recently showed off a very impressive Power BI report that uses a scatter chart to display the Power BI logo (you can see the report live here):

image

I guess the logical next step would be to take my code, move it to Power BI, add a column with date values in it and use the play axis on the scatter chart to display animated text or pictures. But, to be honest, that might be too geeky a project even for me.

The Table.Schema() Function In Power BI/M

Yet another new M function for you this week: the Table.Schema() function, which returns information about the columns in a table. There’s some fairly detailed documentation about what it returns here; a simple demo is always a lot more helpful though, I think.

If you connect to the Adventure Works DW database in SQL Server and import the DimDate table, you’ll get an M query that returns the contents of that table (along with some extra columns that describe the relationships between that table and the others in the database):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

image

If you add an extra step to this query that calls the Table.Schema() function on the table returned by the dbo_DimDate step, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate)
in
    GetSchema 

…you get a table with one row for each column in the table returned by dbo_DimDate, and a lot of columns that give you information on each column such as its position in the table, its M data type, its data type in the original data source and so on:

image 

I can think of a lot of uses for this. Documenting a database is an obvious one; it would also serve as a much richer source of data when checking for changes in the structure of a data source, as I described here. Also, given how easy it is to tell whether two tables contain the same data in M, you could use this function to compare whether two tables have the same columns like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    TablesEquivalent = 
     (Table.Schema(dbo_DimDate)=Table.Schema(SomeOtherTable))
in
    TablesEquivalent

If you want more detail when doing comparisons you can do that with a little bit more M code, but a full treatment of that is probably a topic for another post.
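That said, here’s a hypothetical sketch of one such comparison: it lists the columns (by name and type) that appear in DimDate but not in a second table, again using SomeOtherTable as a placeholder name:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    SchemaA = Table.SelectColumns(
        Table.Schema(Source{[Schema="dbo",Item="DimDate"]}[Data]),
        {"Name", "TypeName"}),
    SchemaB = Table.SelectColumns(
        Table.Schema(Source{[Schema="dbo",Item="SomeOtherTable"]}[Data]),
        {"Name", "TypeName"}),
    //columns whose name/type combination appears in the first table
    //but not in the second
    OnlyInFirst = Table.NestedJoin(
        SchemaA, {"Name", "TypeName"},
        SchemaB, {"Name", "TypeName"},
        "Matches", JoinKind.LeftAnti),
    Output = Table.RemoveColumns(OnlyInFirst, {"Matches"})
in
    Output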

Two New Books: “The Definitive Guide To DAX” And “’M’ Is For Data Monkey”

I’m not going to pretend that this blog post is a properly impartial review – I know the authors of both of these books to varying degrees – but I thought it was worth writing a few words on two new books I’ve acquired recently, both of which are worthy additions to any Power BI enthusiast’s bookshelf or e-reader.

The Definitive Guide To DAX

Something I’ll never understand about my friends Marco Russo and Alberto Ferrari is their love of writing books – they generally have a new one out every year, sometimes two (personally I find writing books painful). Their latest publication is “The Definitive Guide To DAX” and it does indeed live up to its title. No-one outside the dev team comes close to Marco and Alberto’s knowledge of DAX, the language of Power Pivot, Power BI Desktop modelling and SSAS Tabular, and in this book they have documented everything that they know about it down to the smallest detail. Want to know what the KeepFilters() function does? Or the GenerateAll() function? How about all the new DAX functions and features in the latest versions of Power BI Desktop which will also appear in SSAS 2016 Tabular? They’re all here, and more. As such this is an essential purchase for anyone doing serious work on the Microsoft BI platform, although probably more as a reference than a book to read end-to-end. It’s fair to say there’s a certain amount of overlap between this and some of their previous books on Power Pivot and SSAS Tabular, but the language – and the community’s understanding of it – has evolved sufficiently to justify buying this book too.

[I received a free copy of this book for review]

Buy it here from Amazon UK | US

‘M’ Is For Data Monkey

As the author of the only other book on Power Query, I suppose I should really be keeping quiet about “’M’ Is For Data Monkey” in case you buy it instead of mine. However 18 months of UI changes and functionality improvements mean my book is now a bit out-of-date, and what’s more important is that Ken Puls and Miguel Escobar have had the advantage of a lot of real-world experience with Power Query that I didn’t have (indeed no-one had) when I was writing in early 2014. The book itself is not a formal introduction to the M language but a guide to what you can do with it in Power Query; while a lot of what’s here will be useful in Power BI this is definitely a Power Query book and the target audience is Excel Pros rather than BI Pros. The decision to focus on Excel Pros was a good one to make, in my opinion, because it plays to the authors’ strengths and means that the book has a very practical focus. A lot of the tips and tricks here are ones I’ve used successfully myself, and I don’t mind admitting that I learned one or two things from this book as well.

Buy it here from Amazon UK | US

Other Books Are Available…

There are a couple of other new books out that, although I haven’t seen them, will also be worth checking out. Rob Collie has just released “Power Pivot and Power BI”, essentially the second edition of “DAX Formulas For Power Pivot”; Matt Allington has just released “Learn To Write DAX”; both are going to be good choices for Excel users wanting a DAX tutorial. Finally, last week Teo Lachev announced on his blog that he has published the world’s first dedicated Power BI book. Teo is another author whose books I admire so I’m sure it will be excellent, although I’ll be interested to see how he handles the problem of writing about a product that changes so much so quickly.

Working With Compression In Power Query And Power BI Desktop

If you’re reading this post there’s one important question you’ll probably want to ask: is it possible to extract data from a zip file in Power Query/Power BI? The answer is, unfortunately, no (at least at the time of writing). As this answer from Tristan on the dev team explains, because there are so many flavours of zip file out there it’s an extremely difficult problem to solve – so it hasn’t been attempted yet. That said, there are two other mildly interesting things to learn about compression in Power Query/Power BI Desktop that I thought were worth blogging about…

The first is that Power Query/Power BI can work with gzip files. For example, given a gzip file that contains a single csv file, here’s an example M query showing how the Binary.Decompress() function can be used to extract the csv file from the gzip file and treat its contents as a table:

let
    Source = Binary.Decompress(
     File.Contents(
      "C:\Myfolder\CompressedData.csv.gz"),
       Compression.GZip),
    #"Imported CSV" = Csv.Document(Source,
     [Delimiter=",",Encoding=1252]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
    #"Changed Type" = Table.TransformColumnTypes(
     #"Promoted Headers",{
      {"Month", type text}, {" Sales", Int64.Type}
      })
in
    #"Changed Type"

The other is that you’ll see Binary.Decompress() used when you import an Excel workbook that contains a linked table into Power BI Desktop. For example, consider an Excel workbook that contains the following table:

image

If this table is imported into the Excel Data Model as a linked table, and you then save the workbook and try to import it into Power BI using File/Import/Excel Workbook Contents:

image

… you’ll see this message:

image

Click Start and you’ll get another message:

image

If you choose the Copy Data option, the data from the Excel table will be copied into Power BI. But where is it stored exactly? A look in the Query Editor at the query that returns the data shows that it’s embedded in the M code itself:

let
    Source = Table.FromRows(
     Json.Document(
      Binary.Decompress(
       Binary.FromText(
        "i45WMlTSUTJSitWJVjIGskzALFMgy0wpNhYA", 
        BinaryEncoding.Base64), 
       Compression.Deflate)), 
      {"A","B"}),
    #"Changed Type" = Table.TransformColumnTypes(
     Source,{{"A", Int64.Type}, {"B", Int64.Type}})
in
    #"Changed Type"

That big chunk of text in the middle of the Source step is the data from the Excel table stored as a compressed JSON document, and again Binary.Decompress() is used to extract this data.
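Incidentally, you can reproduce this round trip yourself with Binary.Compress(), the counterpart of Binary.Decompress(). Here’s a sketch – assuming the Json.FromValue() function is available in your version – that compresses a small table into Base64 text and then decompresses it again:

let
    //a small table to embed
    MyTable = #table({"A", "B"}, {{1, 2}, {3, 4}}),
    //serialize the rows as JSON, compress with Deflate,
    //then encode as Base64 text
    AsText = Binary.ToText(
        Binary.Compress(
            Json.FromValue(Table.ToRows(MyTable)),
            Compression.Deflate),
        BinaryEncoding.Base64),
    //...then reverse the whole process, as in the generated query above
    RoundTrip = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(AsText, BinaryEncoding.Base64),
                Compression.Deflate)),
        {"A", "B"})
in
    RoundTrip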

Avoiding Duplication Of Database Connection Information In Power BI

In a year’s time there will be a great opportunity for someone to present a session on “Power BI Development Best Practices” at the PASS Summit. Before then, we will all have to work out what those best practices actually are – probably the hard way. With that in mind, here’s a suggestion for one good practice that came out of a conversation at this year’s PASS Summit (thank you Prathy).

If you’re connecting to a data source like SQL Server in Power BI (or Power Query for that matter) you’re probably going to be importing multiple tables of data. For example, if I was importing data from my local SQL Server instance and the Adventure Works DW database, I might see something like this in the Navigator pane in Power BI:

image

Clicking the Load or Edit buttons would create five different queries to get data, one from each of the selected tables:

image

The problem here is that each query duplicates the connection information for the SQL Server database; for example the M code for the FactInternetSales query looks like this:

let
    Source = 
        Sql.Database("chriszbook", "adventure works dw"),
    dbo_FactInternetSales = 
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

That means that if you ever need to change the server or database that the queries point to – maybe because the server has migrated, or because you’re moving the reports from dev to test to production – then you have to edit each of these five queries. Which would be a pain.

Ideally Power BI would create a single connection that each of these queries could share, something like a data source in SSRS. In fact I can see that this has already been raised as a suggestion on the forum here and is under review; I’m sure some more votes would raise its profile. In the meantime, there are certainly a number of different ways you can avoid this kind of duplication by making your own changes to these queries.

One possible approach would be to create new Power BI queries that return the SQL Server instance name and the database name, and then have each of your main queries reference these queries. To do this you would need to:

1) Create a new query using the Blank Query option:

image

2) Call the query SQLServerInstanceName:

image

3) Open the Advanced Editor window by clicking the Advanced Editor button on the Home tab, delete all the automatically generated code in there and replace it with the name of the SQL Server instance that you want to connect to in double quotes (there’s an example of the finished query after step 5):

image

4) Repeat steps 1-3 to create a new query called SQLServerDatabaseName that returns the name of the database that you want to connect to. It might also be a good idea to create a new group to separate these new queries from the ones that load the data:

image

5) Edit the queries that actually return data so that instead of hard-coding the instance name and database name, they take these values from the queries you have just created. A Power BI query can return a value of any data type (not just a table), and the queries created in steps 1-4 return values of type text – the names of the SQL Server instance and database to connect to. These queries can now be used as variables in other queries, so after editing, the FactInternetSales query shown above would look like this:

let
    Source = 
        Sql.Database(SQLServerInstanceName, SQLServerDatabaseName),
    dbo_FactInternetSales = 
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

image
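Here’s that example: each of the two queries consists of nothing more than a single text literal. Using the instance and database names from earlier in this post, their entire M code would be:

//the whole of the SQLServerInstanceName query
"chriszbook"

//the whole of the SQLServerDatabaseName query
"adventure works dw"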

Now, if you ever need to change the connection you just need to change the values in these two queries rather than edit every single query that returns data.

There are other ways of solving this problem: for example you could have a query that returns the output of Sql.Database() (as used in the Source step in the FactInternetSales query shown above) and have all the other data load queries reference that. I like the approach I show here though because it makes it very easy to see the values for the SQL Server instance and database that are currently in use. If you’re being even more ambitious – maybe because you have many queries in many .pbix files that connect to the same database – you could even store connection information somewhere outside the .pbix file, maybe in another SQL Server database. But if you did that, you would then need to worry about the connection information for that database too…
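To make the first of those alternatives concrete, here’s a hypothetical sketch. One query – called AdventureWorksDW here, a name I’ve made up – returns the database itself:

let
    Source = Sql.Database(SQLServerInstanceName, SQLServerDatabaseName)
in
    Source

…and each data load query references it, so no connection details appear in the load queries at all:

let
    dbo_FactInternetSales =
        AdventureWorksDW{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales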

Loading English LiDAR Open Data Into Excel And Power Map Using Power Query

…or, if you want the clickbait title of this post, “Creating cool Minecraft-type landscapes in Power Map”…

I’m a big fan of Owen Boswarva’s UK open data blog, and yesterday this post on English LiDAR data released by the Environment Agency caught my eye:

http://mapgubbins.tumblr.com/post/131424021480/open-data-release-of-aerial-lidar-data-for

I won’t repeat what he says because there’s nothing I could add to it, but basically this data is a 3D scan of large parts of England at extremely detailed resolutions. You can download the data from this site:

http://environment.data.gov.uk/ds/survey#/download

My first thought after reading this post was that it should be possible to load this data into Excel (in general) and Power Map (in particular) using Power Query fairly easily. And indeed it was – so in this blog post I’ll show you how.

First of all, here’s a sample of the output. This is a Power Map custom map showing tile SU9090, which shows a section of the M40 motorway and its surroundings near High Wycombe:

image

Zooming in on an area just below the centre of the above screenshot you can see several distinctively-shaped office buildings, the M40 going across the screen and, on the top right-hand side, a Tesco superstore:

image

Here’s the same area shown in Bing Maps (http://binged.it/1RkakM7):

image

Here’s the obligatory video:

How did I do this? The first step is to load the data into the Excel Data Model using Power Query. I created a Power Query function called LoadLiDARData to do this, and you can get a sample workbook containing it from here. There isn’t much to say about the function – it’s not trivial, but not very interesting in itself. To use it, in Excel go to the Power Query tab on the ribbon and open the Workbook Queries pane. To load data from a single .asc file downloaded from the Environment Agency site, right-click on the function to invoke it and enter the filename (including the full path) of that file. The Power Query Editor window will open but you don’t need to do anything here except click the Close & Load button. I recommend you load the output to the Excel Data Model rather than a table in the worksheet because a single file contains a lot of data!
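For the curious, here’s a rough sketch of how a function like this might work; to be clear, this is not the actual LoadLiDARData code from the workbook, just an illustration of the idea. An .asc file starts with six header lines (ncols, nrows, xllcorner, yllcorner, cellsize and NODATA_value), followed by one line of space-separated height values per row of the grid:

let
    LoadLiDARDataSketch = (FileName as text) as table =>
        let
            AllLines = Lines.FromBinary(File.Contents(FileName)),
            //skip the six .asc header lines
            DataLines = List.Skip(AllLines, 6),
            //split each line into a list of numeric height values
            Rows = List.Transform(
                DataLines,
                each List.Transform(
                    List.Select(Text.Split(_, " "), each _ <> ""),
                    Number.FromText)),
            //one record per grid cell, with its row/column position
            Cells = List.TransformMany(
                List.Positions(Rows),
                each List.Positions(Rows{_}),
                (row, column) => [
                    RowNumber = row,
                    ColumnNumber = column,
                    Height = Rows{row}{column}])
        in
            Table.FromRecords(Cells)
in
    LoadLiDARDataSketch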

Next, go to the Insert tab on the Excel ribbon and click the Map button to open Power Map. Then click New Scene\New Custom Map to create a new custom map. You need to make sure that the Min values for the X and Y axes are set to 0, the Max value for X is set to the number of columns and the Max value for Y is set to the number of rows in the data you’re working with. Also, check the Flip Axis button for the Y axis.

image

Then, on the right-hand side of the screen check the ColumnNumber and RowNumber boxes and set them to be the X Coordinate and Y Coordinate respectively:

image

Click the Next button, then check Height to use it as the height of a bar graph:

image

Finally – and this is important – go to Settings and adjust the Thickness property appropriately. If you don’t do this you won’t be able to pick out any detail; 18% is a good starting point. Ticking the Lock current scale box is also a good idea, so that the bars don’t disappear when you zoom in.

image

This is no more than a proof-of-concept. I’ve only tested my code with a very limited amount of data, and there are a lot of obvious improvements that could be made: for a start, it would be good to convert the grid co-ordinates used to latitude and longitude so that this data can be used with the maps in Power Map. I could also imagine some really useful applications for this, such as combining a map like this with snowfall data and creating an animation showing where snow falls over a landscape during the winter.

Of course a dedicated tool will do a much better job of visualising this data than Excel, but if you have the right edition of Excel 2013 then you can get everything you need to do this for free, and in Excel 2016 Power Query and Power Map (as Get & Transform and 3D Maps) are available out-of-the-box. It also sounds like Power Map will be integrated with Power BI at some point too:

image

UPDATE: Here’s a video of another tour I created using LiDAR data, showing Waterloo Station, the Houses of Parliament and Westminster Abbey in central London.

Referencing Individual Cell Values From Tables In Power Query

[This blog post is relevant to Power Query in Excel 2010/2013, the Get & Transform section on the Data tab in Excel 2016, and the Get Data screen in Power BI Desktop. I’m going to use the term ‘Power Query’ in this post to refer to all of the previously mentioned functionality]

Sometimes, when you’re working with a table of data in Power Query, you want to be able to get the value from just one cell in that table. In this blog post I’ll show you how you can do this both in the UI and in M code, and talk through all of the more advanced options available in M. Incidentally this is a topic I covered in some detail in the M chapter of my Power Query book, but since that book is now somewhat out-of-date I thought it was worth covering again in a blog post.

Referencing Cell Values In The UI

Imagine your data source is an Excel table that looks like this:

image

If you import it into Power Query, and you want to get the value in the cell on the second row in ColumnB, then in the Query Editor window you just need to right-click on that cell and select Drill Down:

image

…and bingo, you have the value 5 returned:

image

Note that this is the value 5, and not the value 5 in a cell in a table – a Power Query query can return a value of any data type, and in this case it’s going to return a single integer value rather than a value of type table. If you load the output of this query into Excel you’ll still see it formatted as a table, but if you’re using the output of this query as an input for another query (for example, you might want to read a value from Excel and use that value as a filter in a SQL query) it’s much more convenient to have an integer value than a table with one column and one row.

Referencing Cell Values in M

In the screenshot above you can see the M code that the UI generates for the Drill Down, and you can probably guess how it works. Here’s a cleaned-up version of the query from the previous section for reference:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    ChangeDataTypes = Table.TransformColumnTypes(
                                        Source,
                                        {{"ColumnA", Int64.Type}, 
                                         {"ColumnB", Int64.Type}, 
                                         {"ColumnC", Int64.Type}}),
    GetMiddleCell = ChangeDataTypes{1}[ColumnB]
in
    GetMiddleCell

There are three steps here:

  • Source reads the data from the Excel table
  • ChangeDataTypes sets the data types for the three columns in the table to be Whole Number
  • GetMiddleCell returns the value from the middle cell of the table returned by the ChangeDataTypes step

M allows you to refer to individual values in tables by a system of co-ordinates, using the name of the column and the zero-based row number (tables in Power Query are always assumed to be sorted, making it possible to ask for the value on the nth row). So the expression

ChangeDataTypes{1}[ColumnB]

returns the value from the cell on the second row in the column called ColumnB of the table returned by ChangeDataTypes, which is 5. Similarly, the expression

ChangeDataTypes{0}[ColumnC]

returns the value 3, which is the value in the column ColumnC on the first row.

It’s also worth pointing out that the row and column reference can be in any order, so the expression

ChangeDataTypes{1}[ColumnB]

…returns the same value as

ChangeDataTypes[ColumnB]{1}

As you’ll see in a moment, the order in which the row and column references appear can be important.

Referring To Rows Or Columns That Don’t Exist

What happens if you write an expression that refers to a row and/or column that doesn’t exist? You get an error of course! So using our example query, the expressions

ChangeDataTypes{4}[ColumnB]

and

ChangeDataTypes{1}[ColumnD]

…will both return errors because there isn’t a fifth row in the table, and there isn’t a column called ColumnD.

image

However, instead of an error you can return a null value by using the ? operator after the reference. For example, the expression

ChangeDataTypes{1}[ColumnD]?

returns the value null instead of an error:

image

You have to be careful though! The expression

ChangeDataTypes{4}?[ColumnB]

still returns an error, not because there isn’t a fifth row but because the reference to the fifth row returns a null value and there is no column called ColumnB in this null value.

image

The solution here is to reverse the order of the references, like so:

ChangeDataTypes[ColumnB]{4}?

or even better use ? with both references:

ChangeDataTypes{4}?[ColumnB]?

image

Unfortunately using ? won’t stop you getting an error if you use a negative value in a row reference.
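To see all three safe-reference cases side by side, here’s a quick demo query based on the same table as before; the record it returns contains a null for each reference that would otherwise be an error:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    ChangeDataTypes = Table.TransformColumnTypes(
                                        Source,
                                        {{"ColumnA", Int64.Type}, 
                                         {"ColumnB", Int64.Type}, 
                                         {"ColumnC", Int64.Type}}),
    Demo = [
        //missing column: ? returns null instead of an error
        MissingColumn = ChangeDataTypes{1}[ColumnD]?,
        //missing row, with the column reference first: null
        MissingRow = ChangeDataTypes[ColumnB]{4}?,
        //? on both references handles either case
        BothMissing = ChangeDataTypes{4}?[ColumnB]?
    ]
in
    Demo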

The Effect Of Primary Keys

Did you know that a table in Power Query can have a primary key (ie a column or columns whose values uniquely identify each row) defined on it? No? I’m not surprised: it’s not at all obvious from the UI. However there are several scenarios where Power Query will define a primary key on a table, including:

  • When you import data from a table in a relational database like SQL Server, and that table has a primary key on it
  • When you use the Remove Duplicates button to remove all duplicate values from a column or columns, which behind the scenes uses the Table.Distinct() M function
  • When you use the Table.AddKey() M function on a table

The presence of a primary key affects how the Drill Down functionality works, and gives you another way of referencing individual cells.
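As an aside, here’s a minimal example of the third scenario in the list above – defining a primary key yourself with the Table.AddKey() M function on a made-up table:

let
    Source = #table({"ID", "Value"}, {{1, "a"}, {2, "b"}}),
    //the third argument specifies that this key is a primary key
    WithKey = Table.AddKey(Source, {"ID"}, true)
in
    WithKey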

Consider the following Excel table, basically the same as the table you’ve seen already but with a new column that uniquely identifies each row:

image

If you load the table into Power Query and then right-click on the column MyKeyColumn and select Remove Duplicates, you will have set this column as a primary key:

image

(By the way, you can use the Table.Keys() function to see what keys are defined on a table in Power Query, and there’s an example query showing how to use this in the sample workbook for this post).
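For example, a short query along these lines (a sketch, not necessarily identical to the one in the workbook) returns a list of key records, each showing which columns make up the key and whether it is a primary key:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    RemovedDuplicates = Table.Distinct(Source, {"MyKeyColumn"}),
    Keys = Table.Keys(RemovedDuplicates)
in
    Keys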

Having done this, if you use the Drill Down functionality to get the value from the second row in ColumnB once again, this time you’ll get a query something like this:

image

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    ChangedDataTypes = Table.TransformColumnTypes(
                                      Source,
                                      {{"MyKeyColumn", type text}, 
                                      {"ColumnA", Int64.Type}, 
                                      {"ColumnB", Int64.Type}, 
                                      {"ColumnC", Int64.Type}}),
    RemovedDuplicates = Table.Distinct(
                                     ChangedDataTypes, 
                                     {"MyKeyColumn"}),
    SecondRow = RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]
in
    SecondRow

The last step is the important one to look at. The row is no longer identified in the reference by its row number but by a value from the primary key column instead:

RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]

image

You can still use the previous row number-based notation, but when a table has a primary key column defined on it you can also use a value from the primary key column to identify a row.

A Last Warning About Performance

Being able to reference individual values like this is incredibly useful for certain types of query and calculation. However, bear in mind that there are often many different ways of solving the same problem and not all of them will perform as well as each other. One obvious use of the techniques I’ve shown in this post would be to write a previous period growth calculation, where you need to refer to a value in a previous row in a table – but my experience is that writing calculations using row and column references prevents query folding and leads to poor performance, and an alternative approach (maybe like the ones shown here and here involving joins – see the sketch below) often performs much better. There aren’t any general rules I can give you though; you just need to make sure you test thoroughly.
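For illustration, here’s a sketch of one common join-based pattern for getting the value from the previous row, using a made-up table: add a zero-based index column, then merge the table to itself so that each row matches the row whose index is one lower:

let
    //hypothetical sales table
    Source = #table(
        {"Month", "Sales"},
        {{"Jan", 100}, {"Feb", 120}, {"Mar", 90}}),
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    AddPrevIndex = Table.AddColumn(AddIndex, "PrevIndex", each [Index] - 1),
    //join the table to itself: each row matches the row whose Index
    //equals this row's PrevIndex
    Merged = Table.NestedJoin(
        AddPrevIndex, {"PrevIndex"},
        AddPrevIndex, {"Index"},
        "Previous", JoinKind.LeftOuter),
    WithPreviousSales = Table.ExpandTableColumn(
        Merged, "Previous", {"Sales"}, {"PreviousSales"}),
    CleanUp = Table.RemoveColumns(
        WithPreviousSales, {"Index", "PrevIndex"})
in
    CleanUp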

You can download the sample workbook for this post here.