The M Code Behind Power BI Parameters

For me the most exciting new feature in Power BI in a long while is the appearance of Query Parameters for data loading. We have been promised an official blog post explaining how they work (although they are very easy to use) and in fact Soheil Bakhshi has already written two very good, detailed posts on them here and here. What I want to do in this post, however, is look at the M code that is generated for them and see how it works.

Consider the following parameter built in Power BI Desktop that has, as its possible values, the names of all of the days of the week:

image

The first thing to notice is that parameters are shown as a special type of query in the Queries Pane, but they are still a query:

image

This means that you can open up the Advanced Editor and look at the M code for the query. Here’s the code for the query shown above:

[sourcecode language="text" padlinenumbers="true"]
"Monday"
meta
[
IsParameterQuery=true,
List={"Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"},
DefaultValue="Monday",
Type="Text",
IsParameterQueryRequired=true
]
[/sourcecode]

From this you can see that the value returned by the parameter query is just a single piece of text – it’s the value “Monday” that is set as the Current Value, that’s to say the value returned by the parameter itself. The interesting stuff is all in the metadata record associated with the value. I blogged about metadata here, so you may want to read that post before going any further; it’s pretty clear that the fields in the metadata record correspond to the values set in the UI. All of the fields in the metadata record can be edited in the Advanced Editor if you want.

When the parameter is used in another query it is referenced like any other query value. For example, if you load the DimDate table from the Adventure Works DW sample database and use the parameter above to filter the EnglishDayNameOfWeek column then the code generated in the UI looks like this:

[sourcecode language="text" highlight="10"]
let
Source =
Sql.Databases("localhost"),
#"Adventure Works DW" =
Source{[Name="Adventure Works DW"]}[Data],
dbo_DimDate =
#"Adventure Works DW"{[Schema="dbo",Item="DimDate"]}[Data],
#"Filtered Rows" =
Table.SelectRows(dbo_DimDate,
each [EnglishDayNameOfWeek] = Day)
in
#"Filtered Rows"
[/sourcecode]

The filtering takes place in the #"Filtered Rows" step and you can see where the name of the parameter – Day – is used in the Table.SelectRows() function to filter the EnglishDayNameOfWeek column. This is nothing new in terms of the language itself because you have always been able to return values of any data type from a query, not just tables, and you have always been able to reference queries in other queries like this – in fact you can see me write the same kind of code manually in this video. What is new is that there is now a UI to do this and there’s no need to write any code.
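If it helps to see the idea outside M, here’s a rough Python analogue (purely illustrative, not anything generated by Power BI, and with invented sample rows): the parameter is just a named value, and the query that uses it is just code that references that name in a filter.

```python
# The parameter's Current Value: the equivalent of the "Monday" parameter query
Day = "Monday"

# A stand-in for the DimDate table (invented sample rows)
dim_date = [
    {"DateKey": 1, "EnglishDayNameOfWeek": "Monday"},
    {"DateKey": 2, "EnglishDayNameOfWeek": "Tuesday"},
    {"DateKey": 3, "EnglishDayNameOfWeek": "Monday"},
]

# The equivalent of Table.SelectRows(dbo_DimDate,
#   each [EnglishDayNameOfWeek] = Day)
filtered = [row for row in dim_date if row["EnglishDayNameOfWeek"] == Day]
```

Change the value bound to Day and the filter result changes with it, which is exactly how the parameter behaves in the query above.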

Personally, I think the Power BI team have done a great job here in terms of usability and clearly a lot of thought has gone into this feature. It doesn’t do everything I would want yet though: the ability to bind the list of available values to the output of another query and the ability to select multiple parameter values at the same time are obvious missing features (and ones that would be needed to match the parameter functionality in SSRS). However I would not be surprised to see them appear in a future version of Power BI.

After seeing the code, I wondered whether I could edit the code in the parameter query to make it do more interesting things. For example, even if the UI doesn’t support data-driven lists of available values for a parameter, it looks as though it should be possible to replace the hard-coded list with a list of values returned by another query. Unfortunately this does not work: any changes I tried to the parameter query code were either ignored or removed completely. A bit of a disappointment but again, hopefully this will be possible in a future version.

Thoughts On SandDance And Power BI

After SandDance was announced at the Microsoft Data Insights Summit a few weeks ago I had a quick play with it, thought to myself that it looked like it would provide a few more cool data visualisation options, and then almost forgot about it. More recently I spent some time looking at SandDance in more detail and it got me thinking some more about what its uses are today and what its future might be. There has been a lot of hype surrounding SandDance but not a lot of clarity about where it is positioned in the Power BI story; to be honest I’m still not quite sure where it fits myself and I wouldn’t be surprised if Microsoft doesn’t know either, or at least is keeping its options open.

One thing that is worth pointing out is that it comes from Microsoft Research and is released through Microsoft Garage which is, and I quote, an “outlet for experimental projects”. This suggests that it isn’t a polished product but more of a work-in-progress or an experimental platform. This certainly matches my impressions of the tool and those of Ruth Pozuelo and Alon Brody, who have blogged about it already: in many respects it’s very sophisticated but in others it is quite limited. Will it ever become an ‘official’ product? Other tools have followed this path: you may remember Power Query was originally an experimental project called Data Explorer, released through a site called Azure Labs, a predecessor to the Microsoft Garage site, so it is possible.

Another aspect of the SandDance story that deserves discussion is whether it’s just another custom visualisation or something more. This post on the Power BI blog talks about it as though it’s the former and I guess you could see it just as a way of accessing a lot of new chart types (such as small multiples) for your reports. The charts it creates are certainly eye-catching, as are the animated transitions, and the importance of that – especially for sales demos – should not be underestimated.

image

However, it seems clear to me that SandDance is really an interactive visual data exploration tool, and indeed this is what the SandDance website suggests:
“SandDance is a web-based application that enables you to more easily explore, identify, and communicate insights about data.”
Microsoft doesn’t currently have any other products that compete in this sector: Power BI reports and dashboards are for publishing pre-defined, semi-static insights rather than true ad-hoc analysis, and while Excel PivotTables are great for starting with a blank sheet and exploring your data, they are certainly not visual; I don’t think Excel PivotCharts are a true visual exploration tool either, more of a visual representation of data in a PivotTable. Does Microsoft need a product in this area? I think it does if it wants to compete directly with Tableau, the gold standard in visual data exploration. Adding SandDance to Power BI makes Power BI a much more rounded product.

A third question is this: why is there a standalone version of SandDance and a Power BI custom visual? This blog post contains an interesting statement from Steven Drucker, principal researcher on the SandDance team:
“Using the Microsoft Garage as the release platform gives us the freedom to run experiments with the more accessible standalone version, and as we learn what you like and what works, we can add the right parts to the Power BI visual,”
This strongly suggests that the standalone version is really just a place for testing new functionality and that the Power BI custom visual is the main focus. Does this contradict the point I made above, and is it just the standalone version that is the ‘experimental’ tool? I’m not sure, because at the moment there don’t seem to be many differences in functionality between the two versions. We’ll have to see how things develop. This statement also suggests that if SandDance does grow up to be a real product, it will be as part of Power BI. This makes commercial sense – every new Microsoft BI product should be integrated with Power BI in my opinion. What’s more, many of SandDance’s current limitations (for example around loading and refreshing data) are solved by using the capabilities of the Power BI platform.

However I’m not sure integrating SandDance into Power BI as a custom visualisation, or rather only as a custom visualisation, is a good idea. At the moment the SandDance custom visualisation feels a bit awkward to use: it’s one tool embedded inside another with two inconsistent and often overlapping UIs. I would prefer to see it as a separate tool launched from the PowerBI.com portal, similar to how the original Power View is/was launched from SharePoint, a third way to interact with data stored in Power BI alongside regular Power BI reports and Excel reports. Users should be able to launch it in the same way as Analyze in Excel and use it to explore a data set directly without having to create a report first, and if they find something interesting they should be able to pin what they have created as a visual to a dashboard, or save it for use in a regular Power BI report. Doing this would require a lot more time and effort on the part of Microsoft than just building a custom visual, but at the moment there seems to be no shortage of resources available to the Power BI team. SandDance is undoubtedly a great first step but with some more investment from Microsoft it could be a much more important part of the Power BI story.

Dynamic Chart Titles In Power BI

UPDATE April 2019: It is now possible to use DAX expressions such as the ones described in this post directly in the Title property of a visual. See https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-april-2019-feature-summary/#dynamicTitles

As you probably know, charts (and lots of other visualisations) in Power BI have titles that can be set to any piece of static text. You can do this by selecting the chart, going to the Format tab in the Visualizations pane, then changing the properties in the Title section as shown below (full documentation here):

image

But what if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately the built-in Title Text property can’t be used to display dynamic values, but in this blog post I’ll show you how to solve this problem using DAX.

Here’s a simple example of a report that contains a dynamic chart title:

image

Using data from the Adventure Works DW database I’ve created a simple data model containing a Date dimension table called DimDate and a fact table called FactInternetSales; the DimDate table contains a field called EnglishDayNameOfWeek that contains the names of the days of the week, and the report contains a column chart that shows a Sales measure broken down by day of week. There’s also a slicer where the user can select one or more days, and at the top there’s a title that lists the day names selected in the slicer and displayed in the chart.

There are two parts to the solution. The first part is to create a measure that will return the text needed for the chart title, and this relies on the DAX ConcatenateX() function that I blogged about here. Here’s the DAX for the measure:

Title =

"Sales Amount for "
& CONCATENATEX (
VALUES ( 'DimDate'[EnglishDayNameOfWeek] ),
'DimDate'[EnglishDayNameOfWeek],
", "
)

Here, the Values() function is used to return a table containing all of the selected days of the week, and this is then passed to ConcatenateX() to get a text value containing a comma delimited list of the day names.

The second part of the solution deals with how to display the value returned by the measure. In the report above I used a Card visualisation, dropped the measure above into the Field area and then turned off the Category Label on the Format tab so that only the value returned by the measure, and not the name of the measure itself, is displayed:

image
image

And this is all you need to do to recreate the report above.

We can make this better though! Instead of a simple comma delimited list of day names it would be better if we could change the last comma in the list to an “and”:

image

Also, if all the day names were displayed, it would be good not to display a long list of day names but show some default text instead:

image

Here’s the DAX for a measure that does all this:

Title2 =

VAR SelectedDays =
VALUES ( 'DimDate'[EnglishDayNameOfWeek] )
VAR NumberOfSelectedDays =
COUNTROWS ( SelectedDays )
VAR NumberOfPossibleDays =
COUNTROWS ( ALL ( 'DimDate'[EnglishDayNameOfWeek] ) )
VAR AllButLastSelectedDay =
TOPN ( NumberOfSelectedDays - 1, SelectedDays )
VAR LastSelectedDay =
EXCEPT ( SelectedDays, AllButLastSelectedDay )
RETURN
"Sales Amount "
& IF (
NumberOfSelectedDays = NumberOfPossibleDays,
"By Day Of Week",
"For "
& IF (
NumberOfSelectedDays = 1,
"",
CONCATENATEX (
AllButLastSelectedDay,
'DimDate'[EnglishDayNameOfWeek],
", " )
& " And "
)
& LastSelectedDay
)

Using a series of DAX variables to make the code more readable, here’s what this measure does:

  • If the number of days selected is the same as the total number of possible days, return the title text “By Day Of Week”, otherwise
    • If two or more days have been selected, then return a comma delimited list containing all but the last selected day (I used TopN() to get that table of all but the last selected day) plus a trailing “ And “. If only one day has been selected, return an empty string. Then
    • Concatenate the last selected day to the text returned by the previous step. I’ve used the Except() function to find the day that was excluded by the TOPN() function in the previous step.
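If it’s easier to follow the branching outside DAX, here’s the same logic as a short Python sketch (purely illustrative; the function and its inputs are made up for this post):

```python
def make_title(selected_days, all_days):
    """Mimics the Title2 measure: default text when everything is
    selected, otherwise a comma-delimited list with a final 'And'."""
    if len(selected_days) == len(all_days):
        return "Sales Amount By Day Of Week"
    if len(selected_days) == 1:
        return "Sales Amount For " + selected_days[0]
    # All but the last day, comma-delimited, then " And " plus the last day
    return ("Sales Amount For "
            + ", ".join(selected_days[:-1])
            + " And " + selected_days[-1])

all_days = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
title = make_title(["Monday", "Tuesday", "Friday"], all_days)
# title is "Sales Amount For Monday, Tuesday And Friday"
```

The three branches correspond directly to the three bullets above: all days selected, one day selected, and two or more days selected.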

You can download a .pbix file containing all the code from this post here and I’ve published the report here.

Timing M Query Execution In Power Query And Power BI (Revisited)

Some time ago I wrote the following post on how to work out how long your M query takes to execute in Power Query:

http://blog.crossjoin.co.uk/2014/11/17/timing-power-query-queries/

While it’s still relevant for Power Query and now of course Power BI (and it also illustrates some interesting facts about query execution), recently I had an interesting discussion with Colin Banfield in the comments section of that post that led to us finding an easier way of measuring query execution times.

In M, the DateTime.LocalNow() function returns the current system date and time at the time the function is called – which means that you could call it multiple times in the same query and get different results. There’s also the DateTime.FixedLocalNow() function which also returns the system date and time; the difference between the two is, as the documentation says:

This value is fixed and will not change with successive calls, unlike DateTime.LocalNow, which may return different values over the course of execution of an expression.

The question here is, though, what time does DateTime.FixedLocalNow() actually return? I asked on the Power Query forum here and Ehren from the Power Query dev team revealed that it returns the system date and time at the point when the query begins.

This means that it can be used to simplify the original method I used to measure query execution times. Here’s a query that uses Function.InvokeAfter() to create a delay of 5 seconds during query execution and returns the difference between the values returned by DateTime.LocalNow() and DateTime.FixedLocalNow():

[sourcecode language="text" padlinenumbers="true"]
let
SomeFunction = ()=> DateTime.LocalNow(),
InvokeAfter5Secs =
Function.InvokeAfter(SomeFunction, #duration(0,0,0,5)),
Output =
Duration.ToText(InvokeAfter5Secs - DateTime.FixedLocalNow())
in
Output
[/sourcecode]

image

If you’re using this on a real query I strongly recommend you read my original post carefully and make sure that all of the steps in your query are actually executed, but this approach does make things a little bit easier.
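For comparison, the same pattern in Python (illustrative only): capture a fixed timestamp once at the start, as DateTime.FixedLocalNow() does, and subtract it from the current time once the work has finished.

```python
import time

def timed(fn):
    # Captured once at the start, like DateTime.FixedLocalNow()
    start = time.monotonic()
    result = fn()
    # Current time minus the fixed start, like
    # DateTime.LocalNow() - DateTime.FixedLocalNow()
    return result, time.monotonic() - start

result, elapsed = timed(lambda: sum(range(1000)))
```

The important point in both languages is the same: the start time is fixed when execution begins, so a single subtraction at the end gives you the total elapsed time.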

Thoughts On The Power BI Announcements At The MS Data Insights Summit

I’m not at the Microsoft Data Insights Summit, unfortunately, but I watched the live stream of the keynote today and was duly overwhelmed by the new features announced. I’m not going to summarise what was said because there’s already a post on the official Power BI blog that does that here:

https://powerbi.microsoft.com/en-us/blog//over-5-million-subscribers-are-embracing-power-bi-for-modern-business-intelligence/

…but I wanted to pick out two topics for more in-depth discussion.

Query Power BI Datasets from Excel

For me, by far, the biggest announcement made today was that by the end of this month we’ll be able to connect an Excel PivotTable up to a Power BI dataset and query it as if it was an Analysis Services cube or a local Power Pivot Data Model. In my opinion this is a true killer feature. For the last few years my customers have asked me when MS was going to release SSAS in the cloud and I’ve always replied that Power BI is SSAS in the cloud – it’s just tightly coupled with a front-end right now. That makes things simple in some ways but it also causes problems:

  • You have to build your data model and reports all in the same file, which is a pain because when you want to update one you need to deploy both
  • Other report authors who want to use your model for their reports need to get hold of your .pbix file and edit it
  • Everyone has to use Power BI to build reports on data stored in Power BI

This move solves these problems. It makes manageability much easier: you can build a Data Model, publish it, then you and other developers can build reports on it separately. What’s more, it means that all those users who prefer to build their reports in Excel, either because it’s the tool they are most comfortable with or because it’s the best option for the reports they want to build (Excel has a lot of great functionality that Power BI reports don’t have right now and will probably never have), can do so without having to copy the data into Excel.

Even better, as Amir hinted, this isn’t going to work with just Excel. If this works the way I assume it works you’ll be able to use all the other MDX-capable front-end tools out there, like SSRS or Pyramid, to query your Power BI data, meaning that you have even more choices for consuming data held in Power BI.

SandDance: A symbol of how MS is finally exploiting all of its assets

SandDance is undoubtedly a cool data visualisation tool and will be great for creating ‘wow’ demos. It’s also not, as the hype is suggesting, something new – I saw it on the Microsoft Research site at least two years ago. What is really important about SandDance is that it shows off another area that Microsoft has got right with Power BI: it has created a platform that other developers, both inside and outside Microsoft, can build on top of. In this case Microsoft Research had some eye-catching data visualisation technology but not something that made sense as a standalone tool (yes, I know it is actually available as a standalone tool but let’s face it, it wouldn’t make sense as a commercial product). Integrate this data visualisation technology with Power BI, though, and suddenly you’re part of a much richer product that is commercially viable. SandDance gets the chance to be used for real-world purposes by a lot more users; Power BI gets another great bit of functionality; it’s a win-win.

The ‘build a platform’ strategy is classic Microsoft. It’s the story of Windows. It has some disadvantages in that the different pieces might not always fit together as smoothly as they should (compare and contrast with Tableau, which could be seen as the Apple of the data visualisation world) but it gets useful, specialised features out there very quickly. Microsoft has a wealth of great technology scattered across its different divisions that, historically, it has struggled to bring together coherently, but it looks like Power BI is managing to buck the trend. The integration with Azure Stream Analytics and Azure SQL DW that exists today, the integration with Power Apps that was demoed in the keynote, the use of machine learning in Quick Insights and the integration with Azure ML that I also saw mentioned on Twitter today, are other examples of this happening.

Microsoft also has a strong partner community in the BI space that did good business building solutions on the SQL Server BI stack; Power BI v1 did not offer the same opportunities and was neglected by partners as a result, but Power BI today offers partners a lot more opportunities and so they are back out there building and selling on Microsoft’s behalf again. It’s happening in the Custom Visuals Gallery, it’s happening with the Power BI API and apps like Power BI Tiles, and it’s going to happen in a lot of other ways in the future too.

Dynamic Column Selection In Custom Columns In Power BI/Power Query/M

When you’re writing an M expression for a custom column in Power BI or Power Query it’s easy to reference the values in other columns. A slightly more difficult question, however, is how can you reference column values dynamically? For example, given the following input table:

image

How can you use the values in the “Column To Select” column to dynamically select values from either Column 1, Column 2 or Column 3? For example, on the first line of the table the “Column To Select” column contains the value 2, so the calculated column should contain the value from “Column 2”; on the second line of the table the “Column To Select” column contains the value 1, so the calculated column should contain the value from “Column 1” and so on:

image

There are a number of different ways to achieve this. You could, for instance, write a nested if or do some other kind of fancy conditional logic in M, but this could result in a lot of code. You could also unpivot all the columns except “Column To Select”, do some filtering, then pivot the data again but that would also be quite complicated. Probably the easiest way of doing this is with the Record.Field() function, which allows you to get the value of any given field in a record by passing the name of that field as a text value.

Here’s an example query that generates the table shown in the first screenshot above in its first step, sets some data types on the columns in the second step, then creates the custom column shown in the second screenshot in the final step:

[sourcecode language="text" padlinenumbers="true" highlight="18,19,20"]
let
Source = #table(
{"Column 1", "Column 2",
"Column 3", "Column To Select"},
{{10,20,30,2},
{30,40,50,1},
{50,60,70,3}}),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Column 1", Int64.Type},
{"Column 2", Int64.Type},
{"Column 3", Int64.Type},
{"Column To Select", Int64.Type}}),
#"Added Custom" = Table.AddColumn(
#"Changed Type",
"Selected Column Value",
each
Record.Field(
_,
"Column " & Text.From([Column To Select])))
in
#"Added Custom"
[/sourcecode]

Here’s the expression for the custom column isolated, in the form that you would use in the Add Custom Column dialog:

[sourcecode language="text"]
Record.Field(_, "Column " & Text.From([Column To Select]))
[/sourcecode]

image

Understanding how to use the Record.Field() function here leads us to an interesting side discussion about custom columns, functions, records and each expressions. The full M expression for the third step in the query is:

[sourcecode language="text"]
Table.AddColumn(
//the table returned by the previous step
#"Changed Type",
//the name of the new custom column
"Selected Column Value",
//a function to call for each row of the table
//returning the value to display in the custom column
each Record.Field(_, "Column " & Text.From([Column To Select])))
[/sourcecode]

The first two parameters of the Table.AddColumn() function are straightforward but if you read the documentation you’ll see that the third parameter requires a value of type function. This function is called for every row in the table and returns the values in the custom column. It doesn’t look like it but there is a new function being defined here: in M an each expression can be used to define a function that has no name and that takes one parameter whose name is _ (i.e. the underscore character). The Table.AddColumn() function passes this new function a value of type record representing all of the values in the current row of the table, and that means Record.Field() can access this record by using _. What’s more, when referring to fields in this record you don’t even have to say _[Column To Select], you can drop the underscore and just say [Column To Select], as in the code example above. All this means that the expression

[sourcecode language="text"]
each Record.Field(_, "Column " & Text.From([Column To Select]))
[/sourcecode]

is basically the same as

[sourcecode language="text"]
(_)=> Record.Field(_, "Column " & Text.From(_[Column To Select]))
[/sourcecode]

…which, if you know a bit of M, makes a lot more sense but for the average user is probably more intimidating.
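As a rough Python analogue (illustrative only), Record.Field() is just a dynamic key lookup on a row, and an each expression is just an anonymous one-argument function:

```python
# The rows from the example table, each represented as a record (dict)
rows = [
    {"Column 1": 10, "Column 2": 20, "Column 3": 30, "Column To Select": 2},
    {"Column 1": 30, "Column 2": 40, "Column 3": 50, "Column To Select": 1},
    {"Column 1": 50, "Column 2": 60, "Column 3": 70, "Column To Select": 3},
]

# The analogue of:
#   each Record.Field(_, "Column " & Text.From([Column To Select]))
selector = lambda row: row["Column " + str(row["Column To Select"])]

selected = [selector(row) for row in rows]  # [20, 30, 70]
```

Just as Table.AddColumn() calls the each function once per row, the list comprehension applies the lambda to each record and picks out the dynamically named field.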

A Quick Look At Azure Data Catalog

I’ve been meaning to look at Azure Data Catalog for a long time but somehow never got round to it until now. I’m not sure why – perhaps it’s because there’s so much other cool stuff happening in the world of Power BI and Azure, and Azure Data Catalog seems a bit, um, boring in comparison. Maybe it’s a bit too much like documentation and all that other stuff we instinctively try to avoid. Whatever the reason, now I have looked at it I’m impressed and I can see its uses, not just for Power BI but also for traditional corporate BI scenarios. It certainly deserves a lot more attention from the Microsoft BI community than it seems to be getting at the moment.

First of all, what is Azure Data Catalog anyway? There’s lots of good documentation, videos and a tutorial here:
https://azure.microsoft.com/en-us/documentation/services/data-catalog/

…but the short answer is that it’s a cloud-based service for cataloguing all the data sources you want to use for BI. It doesn’t store any of the data itself (except for a small amount for preview purposes, and you can turn that off), just the connection details; it also stores metadata (column names, data types etc.) as well as descriptions, documentation and contact details for people who own or know about the data. End users can search the catalogue for data and even, for some data source types like SSAS, click a button to open that data source directly in Excel or Power BI Desktop.

image

Is it any good? Well, I managed to get going with it very easily. As well as an Azure subscription it does require you to have Azure Active Directory set up though, which is a pain – some of my customers (mostly the ones that have bought into Office 365 and Power BI) have it already, but I know for others it would be a showstopper. After setting everything up I was able to register SQL Server, SSAS and other types of data source very quickly using both the desktop application and the browser interface. It’s certainly simple enough for a reasonably technical end user to use and the implementation is very solid. I do have a few minor quibbles (I found the search results interface slightly confusing at first but soon came to grips with it) and it’s clear that it’s still very much a work in progress (right now the browser interface supports registering more data types than the desktop app, for example) but nothing very significant.

There is an obvious need for Azure Data Catalog if you are going to use Power BI and have embraced the whole concept of self-service BI: it makes it much easier for your users to find and consume the various different data sources that you have available. However, as I said earlier, I think on-premises users of SSAS and SSRS could benefit from it too. It’s becoming increasingly common for me to see organisations with several SSAS cubes in production across various servers, each of them with a confusing list of measures and dimensions. Equally I’ve seen plenty of SSRS implementations with hundreds of reports, with nobody knowing what many of these reports do or even if they are used at all. Azure Data Catalog could provide a convenient central place to document and discover these cubes and reports. I guess it all depends on whether you can be bothered to do this documentation and annotation though, rather than whether you have a suitable tool for this purpose; I don’t think many organisations can be bothered, unfortunately.

One last thing: Azure Data Catalog is meant to include replacement functionality for the old Power BI for Office 365 Data Catalog but as yet it doesn’t allow you to publish or share Power Query queries. This is something that I and other Power Query/Power BI fans would love to see, especially if (unlike the old Data Catalog) it let you publish updates to your code without needing to manually update each Excel workbook/Power BI report that used it afterwards.

Working With CSV Files That Contain Rogue Line Breaks In Power Query And Power BI

CSV files often contain text fields, and text fields often contain line breaks that shouldn’t be there (especially if the text is entered by end users). For example, here are the contents of a simple CSV file that should contain three fields (Product, Comment and Sales) and six rows:

[sourcecode language="text" padlinenumbers="true"]
Product,Comment,Sales
Apples,This is a commment,10
Oranges,Another comment,20
Pears,A comment with a
line break in,30
Grapes,A normal comment,40
Bananas,A comment
with
two line breaks,50
Pineapples,A normal comment,60
[/sourcecode]

However, as you can see, the Comment field contains line breaks in some cases. If you try to load this file into Power Query/Power BI, you’ll see that it doesn’t handle things well:

image

The easiest way of solving this problem is to make sure you have double quotes around the text in the Comment field, so that Power Query/Power BI knows to ignore the line breaks:

[sourcecode language="text" padlinenumbers="true"]
Product,Comment,Sales
Apples,"This is a commment",10
Oranges,"Another comment",20
Pears,"A comment with a
line break in",30
Grapes,"A normal comment",40
Bananas,"A comment
with
two line breaks",50
Pineapples,"A normal comment",60
[/sourcecode]

image

[By the way, no text in the screenshot above has been truncated. The line breaks are still there in the Comment field; you can see the full text in the bottom pane of the Query Editor by clicking on a cell in the table.]

But what if you can’t fix the source data? I was asked this question the other week, and since I had been asked about it before and not come up with a good answer, I decided to spend some time researching the problem.

What I found was that it was relatively easy to write some M code that gave me the correct results, but very hard to write code that performed acceptably well on a large data set (I was testing on a CSV file containing almost half a million rows). Here’s the code for the function I ended up with:

[sourcecode language="text" wraplines="false" light="true"]
(FileName as text, DelimiterCharacter as text, DelimitersPerLine as number, UseFirstRowAsHeaders as logical) =>
let
    // Load file and buffer in memory
    BufferedBinary = Binary.Buffer(File.Contents(FileName)),
    // Convert to a single block of text
    Source = Text.FromBinary(BufferedBinary),
    // Find the position of all line breaks
    LineBreaks = List.Buffer(Text.PositionOf(Source, "#(cr,lf)", Occurrence.All)),
    // Get a list of numbers from 0 to the number of line breaks-1
    LinePositions = List.Positions(LineBreaks),
    // Count the number of line breaks
    TotalLines = List.Count(LineBreaks),
    // Split the text by line breaks
    FirstSplit = Text.Split(Source, "#(cr,lf)"),
    // On each line, count the number of column delimiter characters
    DelimiterCounts = List.Buffer(
        List.Transform(
            LinePositions,
            each List.Count(Text.PositionOf(FirstSplit{_},
                DelimiterCharacter,
                Occurrence.All)))),
    // Loop through each line and
    // a) Calculate a running total of the number of column delimiter characters on each line
    // b) Divide the running total by the expected number of delimiters per line,
    //    and round down to the nearest integer
    RunningSums = List.Buffer(
        List.Generate(
            ()=>[Counter=0, RunningTotal=0],
            each [Counter]<TotalLines,
            each [Counter=[Counter]+1, RunningTotal=[RunningTotal]+DelimiterCounts{[Counter]}],
            each Number.IntegerDivide([RunningTotal]+DelimiterCounts{[Counter]},DelimitersPerLine))),
    // If the result of the division calculated in the previous step is
    // the same as on the previous line, then the line break can be ignored
    // so return null, otherwise return the position of the line break
    FindFullLines = List.Transform(
        LinePositions,
        each if _=0 or RunningSums{_}<>RunningSums{_-1} then LineBreaks{_} else null),
    // Remove nulls from the list
    RemoveNulls = List.RemoveNulls(FindFullLines),
    // Split the original text by the positions of the line breaks left
    LineSplitterFunction = Splitter.SplitTextByPositions(List.Combine({{0},RemoveNulls})),
    ActualLines = LineSplitterFunction(Source),
    // Convert the result to a table
    ConvertToTable = Table.FromList(ActualLines, Splitter.SplitTextByDelimiter(DelimiterCharacter), null, null, ExtraValues.Ignore),
    // Use the first rows as headers if desired
    PromoteHeaders = if UseFirstRowAsHeaders then Table.PromoteHeaders(ConvertToTable) else ConvertToTable
in
    PromoteHeaders
[/sourcecode]

Assuming that you called this function LoadFile(), here’s an example of how to call it:

[sourcecode language=”text” padlinenumbers=”true”]
LoadFile(
    //File name
    "C:\MyFile.csv",
    //Column delimiter character – in this case a comma
    ",",
    //The expected number of commas to be found per line
    //ie one less than the number of columns
    2,
    //If true then first row contains header values
    true)
[/sourcecode]

The comments in the code should give you an idea of how it all works. A few other things to note:

  • If any of your lines contain extra columns, this approach won’t work
  • I’ve noticed a difference between how Power BI Desktop and Excel 2016 behave with this code: the current Excel 2016 Query Editor keeps the line breaks after splitting on them, whereas Power BI Desktop does not. The Excel output is easy to fix, though: just use the Clean transform to remove the line breaks.

I’m not really sure why it was so hard to come up with code that performed well, even after liberal use of List.Buffer(). It seemed like any time I referenced the original text (as returned by the Source step) in the List.Generate() function then performance died – so you’ll notice in the code above that I’m only referencing the list that contains the number of delimiters found per line.
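To make the core logic easier to follow outside M, here's a simplified Python sketch of the same idea. Note that this is not a translation of the M function above: it uses a greedy counter that resets per reconstructed row rather than a running total and integer division, which gives the same answer under the same caveat as the M code (no line may contain extra delimiters). The function name is mine, for illustration only.

```python
# A line break is only a real row boundary once we have seen the
# expected number of delimiters for the row; otherwise the break must
# be inside a field, so the fragments are merged back together.
def merge_broken_lines(text, delimiter=",", delimiters_per_line=2):
    merged, current, count = [], [], 0
    for line in text.split("\n"):
        current.append(line)
        count += line.count(delimiter)
        if count >= delimiters_per_line:
            # Full row assembled: rejoin any fragments
            merged.append("\n".join(current))
            current, count = [], 0
    if current:
        merged.append("\n".join(current))
    return merged

broken = "Product,Comment,Sales\nPears,A comment with a\nline break in,30"
merge_broken_lines(broken)
# ['Product,Comment,Sales', 'Pears,A comment with a\nline break in,30']
```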

Rendering Text On A Map With Power Query And Power Map

They say a good magician never reveals his tricks. Luckily, I’m a rubbish magician and so (at the request of Jason Thomas and David Eldersveld) I can reveal the secret behind one of the magic tricks I used in my sessions at SQLBits and the PASS Summit last year: my mind-reading card trick. More specifically, this post will show you how I was able to find out which card my helper had picked from a deck and how I was able to make its name appear in Power Map. Alas the video of the hour-long SQLBits session isn’t available for some reason, but the video of the shortened version of the session I did at the BI Power Hour at the PASS Summit is available for purchase here along with other, much more useful content.

First of all, the mind-reading bit. I hate to admit it but it had nothing to do with this rather fetching hat – the Microsoft Brain Interface – modelled here by Matt Masson:

image

In fact I had some accomplices (Bob Duffy and Carmel Gunn at SQLBits, Matt Masson at PASS) who entered the name of the card into a text file stored in OneDrive when my helper showed it to the audience. I then used the technique Melissa Coates blogged about here to read the name from the text file using Power Query on my laptop.

Second, the harder problem of taking the name of the card and making it appear in Power Map. Again, I used the magic of Power Query. On an Excel worksheet I used 1s and blanks to represent each letter of the alphabet in its own 8×8 grid:

image

…and then I read this into Power Query, matched each 8×8 grid up to the letter it represented, split the name of the card into letters, put the two together and got a table containing a set of data points that, when plotted on a custom map in Power Map, could be read as text:
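The core of that transformation is easy to sketch outside Power Query too. Here's an illustrative Python version of the idea: turn each "1" in a letter grid into an (x, y) data point, offsetting each letter horizontally so the word reads left to right when plotted. The 3×3 letter shapes below are made-up placeholders, not the 8×8 grids from my workbook, and the function name is hypothetical.

```python
# Simplified placeholder letter shapes: "1" marks a data point.
LETTERS = {
    "I": ["111",
          " 1 ",
          "111"],
    "O": ["111",
          "1 1",
          "111"],
}

def text_to_points(word, letter_width=4):
    points = []
    for i, ch in enumerate(word):
        for row, line in enumerate(LETTERS[ch]):
            for col, cell in enumerate(line):
                if cell == "1":
                    # Offset each letter horizontally; negate the row
                    # so the text is the right way up on a chart
                    points.append((i * letter_width + col, -row))
    return points

pts = text_to_points("IO")
# 7 points for the I plus 8 for the O
```

Plot those points on a scatter chart (or, in my case, a custom map in Power Map) and the word appears.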

image

You can download a modified version of the workbook here which reads data from the worksheet instead of a text file and renders the text over a regular map in Power Map. Here’s an example of the output:

Sadly I’m even less of an artist than I am a magician, and I admit that the output could be prettier. David Eldersveld recently showed off a very impressive Power BI report that uses a scatter chart to display the Power BI logo (you can see the report live here):

image

I guess the logical next step would be to take my code, move it to Power BI, add a column with date values in and use the play axis on the scatter chart to display animated text or pictures. But, to be honest, that might be too geeky a project even for me.

Analysing Power BI DMV Queries In Power BI Desktop

Experienced Analysis Services and Power Pivot developers will know all the interesting things you can learn from DMV queries. For example, in this blog post Kasper de Jonge shows how to use a DMV to find out which columns are using the most memory in a Power Pivot model; here’s an older post from me on using DMVs to show the dependencies between DAX calculations. There’s plenty of other interesting stuff about how data is stored and so on that you can find with a bit of digging, and there’s some reasonably up-to-date documentation on DMVs for SSAS here.

However, running DMV queries against a Power BI Desktop model (which of course runs a local version of the same engine that powers Analysis Services Tabular and Power Pivot), and more importantly doing something useful with the information they return, isn’t straightforward. You can run DMV queries from DAX Studio, but that only gives you the resulting table of data; you have to copy and paste it into another tool for any further analysis. Instead, you can use Power BI Desktop’s own functionality for connecting to Analysis Services to connect to its own local data model and run DMV queries.

If you’re connecting to an instance of Analysis Services in Power BI Desktop you need a server name and a database name, and the same goes when you’re connecting Power BI Desktop to itself. This episode of Adam Saxton’s excellent Guy In A Cube YouTube show details how to connect SQL Server Profiler to Power BI Desktop in order to run a trace, and the method he uses to find the connection details also works for our purpose here; it’s definitely worth a watch because it goes into a lot of detail. However it’s much easier to get the connection details from DAX Studio using the following steps (thanks to Darren Gosbell for pointing this out):

  1. Open Power BI Desktop and load data as normal.
  2. Open DAX Studio and choose the Power BI Designer data source option:

    image
  3. Look in the bottom right-hand corner of the screen and you’ll see the server name, including the all-important port number, that you have connected to. In this case it’s localhost:52981

    image
  4. Run the following DMV query in a DAX query window. This will give you the nasty GUID that is the name of the only database in the Power BI data model:

    [sourcecode language=”text” padlinenumbers=”true”]
    SELECT [CATALOG_NAME] FROM $SYSTEM.DBSCHEMA_CATALOGS
    [/sourcecode]

    image


  5. Now go back to Power BI Desktop, click the Get Data button and choose Analysis Services. In the connection dialog enter the server name and port number, and the database name, found in the previous steps, as well as your DMV query:

    image


Once you have done this, you can load the results of your DMV query into the data model and analyse the results just like any other data. Here’s an M query using the DMV from the blog post by Kasper referenced earlier that looks at the memory used by just one table in the model:

[sourcecode language=”text”]
let
    Source = AnalysisServices.Database(
        "localhost:52981",
        "1107a5df-3dd9-4c16-80b6-cf571c4e794f",
        [Query="SELECT dimension_name, attribute_name, DataType,
                (dictionary_size/1024) AS dictionary_size
                FROM $system.DISCOVER_STORAGE_TABLE_COLUMNS"]),
    #"Filtered Rows" = Table.SelectRows(Source,
        each ([dimension_name] = "FactInternetSales"))
in
    #"Filtered Rows"
[/sourcecode]

And here’s a bar chart built from that query showing clearly that the SalesOrderNumber column in the FactInternetSales table is very expensive:

image

Before you get too excited about this, there are two major problems you need to be aware of when using this technique:

  1. The port number used to connect to the local data model and the database name will change every time you open the Power BI Desktop file, so you will have to edit the connection information manually if you want to refresh the data after reopening.
  2. You won’t be able to make this connection refresh once you have published the file to PowerBI.com – so this will only work on the desktop.

That said, I think this is still very useful for development purposes. At some point I expect we’ll get a range of management reports in PowerBI.com that show similar data on published reports – just like we used to have in the old Office 365 Power BI.