Combining Data From Multiple Worksheets In The Same Excel Workbook Using Power BI

It’s very common that you need to combine data from multiple worksheets in the same Excel workbook when you’re using Power BI or Power Query/Get&Transform in Excel. Indeed a lot of people have blogged about how to solve this problem, but none of the solutions I’ve found on the internet work in more complex scenarios when the data on each sheet needs some kind of transformation before it can be combined. I was asked to explain how to do this recently while teaching a Power BI class, so in this blog post I’m going to walk through a worked example and point out a few issues that might trip up even experienced Power BI users.

First of all, the source data. Let’s say you have an Excel workbook with four worksheets: Q1, Q2, Q3 and Q4. On each worksheet is some sales data for the three months in each quarter; for example the Q1 worksheet looks like this:

image

…the Q2 worksheet looks like this:

image

…and so on. The required output for Power BI should be a table that looks like this:

image

Now most of the blog posts that describe this problem, such as Ken Puls’s post here, assume each worksheet has a table with the same column names on it. In that case you can just connect to the Excel workbook, get a table containing the contents of each worksheet (Miguel Escobar has a great post describing how to do this here) and then click the Expand/Aggregate button:

image

However in this particular case it doesn’t solve the problem, because we get this:

image

Aha, you may say, we have to transform the data before we can combine it and so we need to create a function and call it for every worksheet – the technique I’ve already blogged about here. And yes, that is basically what needs to happen, but the devil’s in the detail.

Here’s the solution, step-by-step:

Step 1: Get a table with all the worksheets listed

In Power BI connect to your Excel file as normal, then in the Navigator pane right-click on the name of the Excel workbook and select Edit rather than selecting any of the individual worksheets:

image

The result will be a table that looks something like this:

image

If you need to, filter out any rows that do not contain “Sheet” in the Kind column and also filter out any worksheets that you don’t want to combine data from.
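
At this point the M code behind the query might look something like this minimal sketch (the file path here is hypothetical):

[sourcecode language='text']
let
    // Connect to the workbook and get a table with one row per object in it
    Source = Excel.Workbook(File.Contents("C:\ExcelData.xlsx")),
    // Keep only the worksheets, not any tables or named ranges
    #"Filtered Rows" = Table.SelectRows(Source, each [Kind] = "Sheet")
in
    #"Filtered Rows"
[/sourcecode]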

Step 2: Create your template query

Duplicate the query above and call the new query Template.

Now, in the Template query, select one of the worksheets to use as the basis for the query whose logic will be applied to all the other worksheets, and filter the table so it only contains the row for that worksheet. In this case I’m using the worksheet called Q1:

image

Then – and this is important – remove all the other columns in the table except the Data column:

image

Doing this changes the M code generated for the next thing you’ll do; removing all these columns changes the way the row is referenced (see the section on “The effect of primary keys” in this post) and makes sure the name of the worksheet won’t be hard-coded anywhere.

After that click the Table link inside the cell, and you’ll see the contents of the worksheet:

image

There will probably be a Changed Type step in the query that sets the data types for each of the columns, and you will need to delete it:

image
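
Once the Changed Type step has been deleted, the M code for the Template query might look something like the sketch below (any filter on the Kind column is omitted for brevity). The important thing to notice is that the final step uses the positional reference {0}[Data] rather than a reference that hard-codes the worksheet name:

[sourcecode language='text']
let
    Source = Excel.Workbook(File.Contents("C:\ExcelData.xlsx")),
    // The filter down to a single worksheet
    #"Filtered Rows" = Table.SelectRows(Source, each [Name] = "Q1"),
    // Removing every other column is what makes the positional
    // row reference below possible
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows", {"Data"}),
    Data = #"Removed Other Columns"{0}[Data]
in
    Data
[/sourcecode]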

You can now perform any other transformations you need on this query, but you must avoid any transformations that generate M code referring to columns that aren’t present on all the other worksheets. Remember, these transformations will be applied to the other worksheets too, and they will fail if they refer to columns that don’t exist there. This is why you had to delete the Changed Type step earlier: it set data types on the January, February and March columns, which only exist on the Q1 worksheet. You’ll probably need to delete any other Changed Type steps that get created elsewhere in the query too. Open up the Advanced Editor and check the M code for the whole query just to be sure.

In this case all I need to do is unpivot the month columns by selecting the Product column and using the Unpivot Other Columns button on the Transform tab, and then renaming the columns appropriately:

image
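
At this point the whole Template query might look something like the following sketch; the Promoted Headers step is the one Power Query probably added for you when you drilled into the worksheet, and the column names come from my example data:

[sourcecode language='text']
let
    Source = Excel.Workbook(File.Contents("C:\ExcelData.xlsx")),
    #"Filtered Rows" = Table.SelectRows(Source, each [Name] = "Q1"),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows", {"Data"}),
    Data = #"Removed Other Columns"{0}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Data),
    // Unpivot every column except Product, then rename the new columns
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(
        #"Promoted Headers", {"Product"}, "Attribute", "Value"
    ),
    #"Renamed Columns" = Table.RenameColumns(
        #"Unpivoted Other Columns", {{"Attribute", "Month"}, {"Value", "Sales"}}
    )
in
    #"Renamed Columns"
[/sourcecode]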

Step 3: Create a function

Next you need to create a new parameter by clicking the Manage Parameters/New Parameter button, call the parameter Worksheet, set the data type to text and have it return the name of the worksheet you chose in the previous step:

image

Now, go back to the Template query, find the step called Filtered Rows towards the beginning where you filtered down to a single worksheet, and click the gear icon next to the step to edit it:

image

Then, edit the step so it uses the value returned by the parameter to filter by instead of the hard-coded value you entered earlier. To do this, click on the button shown below, select Parameter and then select the Worksheet parameter in the next dropdown box along:

image
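
After making this change, the M code for the step should look something like this, with the Worksheet parameter in place of the hard-coded worksheet name:

[sourcecode language='text']
#"Filtered Rows" = Table.SelectRows(Source, each [Name] = Worksheet)
[/sourcecode]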

Finally, go to the Queries pane on the left-hand side of the screen, right-click on the Template query and select Create Function…

image

You’ll be prompted to give the new function a name; call it GetData:

image

Step 4: Invoke the function and combine the data

Finally, go back to the original query – the one you duplicated at the beginning of step 2 to create the Template query. Then go to the Add Column tab on the ribbon, click the Invoke Custom Function button and invoke the GetData function, passing in the contents of the Name column to the function’s only parameter:

image
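
The M code generated for this step should look something like the following (assuming the previous step in the query was called Filtered Rows):

[sourcecode language='text']
#"Invoked Custom Function" = Table.AddColumn(#"Filtered Rows", "GetData", each GetData([Name]))
[/sourcecode]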

Last of all, click the Expand/Aggregate button on the new column and expand the nested tables:

image

After removing any unnecessary columns, you’ll see the data from all the worksheets combined into a single table as desired:

image

Don’t forget to set the data types on each column.

You can download the Excel workbook used in this post here and the sample Power BI Desktop file here.

How The New IsAvailableInMDX Property For Analysis Services Tabular Can Reduce Memory Usage And Speed Up Processing

Following on from my post the other week about the new RowsetSerializationLimit server property, I thought it would be a good idea to write about why the new IsAvailableInMDX property (announced in the same blog post) is so important. In fact, I would say that everyone using Analysis Services Tabular 2017 (CU7 or higher) or Azure Analysis Services should spend some time investigating it because the potential benefits in terms of reduced memory usage and faster processing times are significant, especially for larger models.

First of all, what does it actually do? As the blog post says, it allows you to stop attribute hierarchies from being built on columns when you don’t need them. But what are attribute hierarchies? They are structures that are used only when you are querying your Tabular model using MDX; Excel PivotTables, for example, generate MDX queries when they are connected to Analysis Services Tabular whereas Power BI always generates DAX queries. An attribute hierarchy allows a column on a table to be used on the rows or columns axis of an MDX query, and in Excel that means you will be able to drag that field onto the rows or columns area of a PivotTable. Attribute hierarchies are used by some DAX functionality too – for example the TreatAs() function (at least for now) needs them to be present to work. Frustratingly, the DAX functionality that does need attribute hierarchies is not documented.

To give you an example, consider a Tabular model that contains a table with three columns, Product, Customer and Sales, and a measure that sums up the values in the Sales column.

image
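
For reference, the measure might be defined something like this (the table name here is made up):

[sourcecode language='text']
Sales Amount = SUM ( 'MyTable'[Sales] )
[/sourcecode]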

I can query this Tabular model in Power BI, for example by creating a Matrix visualisation:

image

I can also get the same values out using an Excel PivotTable:

image

Now the reason I can create this PivotTable is that Analysis Services Tabular has created attribute hierarchies on the Customer and Product columns. However, the important thing to understand is that Analysis Services Tabular creates attribute hierarchies on every column on every table by default, including the Sales column. This allows me to create a PivotTable like this, with the distinct values from Sales on the rows of the PivotTable:

image

image

You’re probably thinking, why would I ever want to use Sales – a measure column – like this? And the answer is you probably wouldn’t, even though Tabular allows this by default. What’s more, building the attribute hierarchy for Sales makes processing slower and the resulting hierarchy uses memory, so all this comes at a cost. The IsAvailableInMDX property is therefore very useful because it allows you to stop attribute hierarchies from being built on columns like Sales where they serve no real purpose.

Unfortunately at the time of writing SSDT doesn’t allow you to set the IsAvailableInMDX property but the good news is that the latest versions of Tabular Editor do:

image

Setting IsAvailableInMDX to false for the Sales field has no impact at all in Power BI, so long as you are not using functionality like TreatAs() that needs it. In Excel, it just means that it is no longer possible to drag Sales onto rows or columns in a PivotTable – the Sales Amount measure still works:

image

As a result, there are two recommendations that can be made:

  • If you are not using any client tools that generate MDX queries (such as Excel) or you want to prevent your users from using them, and you can be sure that it does not affect any of your existing Power BI reports or DAX calculations, you can set IsAvailableInMDX to false on every column of every table
  • If you are using client tools that generate MDX you can still probably set IsAvailableInMDX to false on every measure column and not lose any important functionality

How much of an impact will doing this have on processing times and memory usage? It depends, but it could be a lot. The anecdotal evidence on Twitter is promising:

image

image

I did my own (not particularly scientific) test using a table with five million rows and ten columns, each of which contained the integers between one and five million. Here’s the M query to generate such a table without the need for an external data source:

[sourcecode language='text' padlinenumbers='true']
let
    Source =
        Table.FromColumns(
            List.Repeat(
                {{1..5000000}},
                10
            )
        ),
    #"Changed Type" =
        Table.TransformColumnTypes(
            Source,
            List.Transform(
                Table.ColumnNames(Source),
                each {_, Int64.Type}
            )
        )
in
    #"Changed Type"
[/sourcecode]

On my laptop, with IsAvailableInMDX set to true for all ten columns, a full process on this table took around 105 seconds and the table size reported by Vertipaq Analyzer was 381MB. After changing IsAvailableInMDX to false for all ten columns, the time for a full process went down to around 81 seconds and the table size was down to 191MB.

In summary, this is one of those seemingly obscure technical changes that turns out to be way more useful than you might think. If you test out setting IsAvailableInMDX on your Tabular model, please leave a comment letting me know what kind of impact it had!

[Thanks to Daniel Otykier for providing a lot of information for this post]

UPDATE 2022: the issue with TREATAS was fixed a long time ago, so you should be able to use all DAX functionality when IsAvailableInMDX is turned off.

OData Performance Improvements In The June 2018 Power BI Desktop Release

In the June 2018 release of Power BI Desktop there were a number of improvements made to the way the Power Query engine handles OData data sources. You can read about them here:

https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-june-2018-feature-summary/#oData

However, while testing them out, I noticed one important point that the announcement didn’t make: an existing Power Query query will only benefit from these changes if you make a small change to its M code, adding the Implementation="2.0" option to the OData.Feed() function.

Take the following M query, running on the UK Houses of Parliament OData API:

[sourcecode language='text' padlinenumbers='true']
let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[
            Name = "GovernmentOrganisation",
            Signature = "table"
        ]}[Data],
    #"Expanded GroupHasPosition" =
        Table.ExpandTableColumn(
            GovernmentOrganisation_table,
            "GroupHasPosition",
            {"PositionName"},
            {"PositionName"}
        ),
    #"Filtered Rows" =
        Table.SelectRows(
            #"Expanded GroupHasPosition",
            each [PositionName] = "Chancellor of the Exchequer"
        )
in
    #"Filtered Rows"
[/sourcecode]

It queries a table called GovernmentOrganisation, expands a column called GroupHasPosition and then filters on one of the expanded columns, PositionName, to only return the rows where PositionName contains the text “Chancellor of the Exchequer”.

image

Using Fiddler in the way I describe here, I can see that when this query runs the engine first tries to fold the filter on “Chancellor of the Exchequer” and then when this request returns an error, it defaults to a very slow approach that involves making multiple requests to the API:

image

However, if you change the code above so that the OData.Feed() function uses the Implementation="2.0" option like so:

[sourcecode language='text' padlinenumbers='true' highlight='2,3,4,5,6']
let
    Source = OData.Feed(
        "https://api.parliament.uk/odata",
        null,
        [Implementation = "2.0"]
    ),
    GovernmentOrganisation_table =
        Source{[
            Name = "GovernmentOrganisation",
            Signature = "table"
        ]}[Data],
    #"Expanded GroupHasPosition" =
        Table.ExpandTableColumn(
            GovernmentOrganisation_table,
            "GroupHasPosition",
            {"PositionName"},
            {"PositionName"}
        ),
    #"Filtered Rows" =
        Table.SelectRows(
            #"Expanded GroupHasPosition",
            each [PositionName] = "Chancellor of the Exchequer"
        )
in
    #"Filtered Rows"
[/sourcecode]

…then Fiddler shows a completely different request made:

image

In this case it does not try to fold the filter, but it at least does the expansion in a single request. The performance of the resulting query is a lot better.

It looks like all new code generated by the Power Query Editor uses the Implementation="2.0" option for OData.Feed() so it will get the benefits described in the post on the Power BI blog. Existing M code won’t have this option set though so you will need to update it appropriately. As always you should test thoroughly after making these changes to make sure you do indeed make your query run faster and get the same behaviour as you had before.

More Details On The RowsetSerializationLimit Server Property For Analysis Services

The recent blog post on the Analysis Services team blog about new memory options contained information about some very interesting functionality that has just been added. The first of the new features I decided to try was the RowsetSerializationLimit server property, which restricts the number of rows returned by a query and which is intended to stop rogue users running queries that return very large amounts of data. It certainly works but there are two things to point out about it that aren’t immediately obvious.

First of all, an example of it in action. In Azure Analysis Services, using a database that contains one very simple table with one column and 5000 rows, the following DAX query returns all 5000 rows from the table:

[sourcecode language='text'  padlinenumbers='true']
EVALUATE 'MyTable'
[/sourcecode]

image

To change the RowsetSerializationLimit server property, open SQL Server Management Studio, right-click on your server name in the Object Explorer pane and select Properties. Then go to the General page, check the Show Advanced (All) Properties box, and you’ll see the property listed under OLAP\Query\RowsetSerializationLimit:

image

The default value is -1, meaning no limit is placed on the number of rows returned by a rowset. Changing it to 100 and then running the above query results in the following error:

image

Executing the query …
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
The maximum number of rows (100) was reached.

A query that returns fewer rows, for example this one that returns just 99 rows

[sourcecode language='text' ]
EVALUATE TOPN(99,'MyTable')
[/sourcecode]

…is ok though.

The important thing to understand is that although this applies to both DAX and MDX queries, it only applies to queries that are returned as rowsets (a tabular format) and not as cellsets (a multidimensional format). Most client tools that generate MDX, including Excel, use cellsets so this property will not apply to them unfortunately.

For example, the following MDX query run from a SQL Server Management Studio MDX query window (which also returns a cellset) on the same database runs ok and returns 5000 rows:

[sourcecode language='text' ]
SELECT
{[Measures].[Number Of Rows]} ON 0,
[MyTable].[Column1].[Column1].MEMBERS ON 1
FROM
[Model]
[/sourcecode]


image

However the same query run from Power BI Desktop to import data from the same database:

image

…runs, but returns only 100 rows and then an error:

image

Something else to point out is that this applies to all queries that return rowsets, even Discover requests and DMV queries. As a result, setting this property to a very low value can cause problems in tools like SQL Server Management Studio: for example if you set the property to 10 and you had 11 tables in a database, you would see an error when you tried to expand the Tables node of a database in the Object Explorer pane!

Even though this property only applies to rowsets I think it’s still going to be very useful, especially in scenarios where Power BI Desktop users are importing vast amounts of data from Analysis Services and causing major problems on the server.

Line Breaks And Auto-Indent In The Power BI DAX Formula Bar

The other day I discovered something new (at least to me) while writing the DAX for a measure in Power BI Desktop: when you insert a new line in your DAX expression using SHIFT-ENTER, it also auto-indents the code. I asked a few people if this was new because I was sure I hadn’t seen it before, even though I always put line breaks in my code; of course Marco had seen it, and said he thought it had been around for a while. Anyway, Marco then commented that most people don’t even know you can put line breaks in DAX, so I thought I should write a short blog post about all this – after all, line breaks and indentation make your code much more readable.

Here’s what it looks like to write a DAX measure in Power BI Desktop using line breaks and auto-indent:

DAXIndent
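
The end result is code formatted something like this – a made-up measure with a line break after the opening bracket and after the comma:

[sourcecode language='text']
Sales Growth % =
DIVIDE (
    [Sales Amount] - [Sales Amount Last Year],
    [Sales Amount Last Year]
)
[/sourcecode]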

To summarise:

  • Press the SHIFT and ENTER keys on the keyboard simultaneously to insert a line break in your DAX code when you are working in the Power BI DAX formula bar
  • Do this after the opening bracket of each function and after the comma of each function argument and the auto-indent will kick in
  • Bonus tip: while typing DAX, if the dropdown list has the function, table, column or measure selected that you want to use, just hit the TAB key to auto-complete. If the dropdown list hasn’t selected what you are looking for and you don’t want to keep typing, use the up and down arrow keys on the keyboard to move up and down the list until you have selected what you want.

Of course there’s always Marco and Alberto’s excellent DAX Formatter service to format your DAX code but this is a lot more convenient.

How Much Does Azure Analysis Services Actually Cost?

It might seem strange to write a blog post on how much Azure Analysis Services costs when there is a page on the Microsoft site that very clearly lists all of the SKUs and how much they cost per hour or per month in any currency and Azure region you choose:

https://azure.microsoft.com/en-gb/pricing/details/analysis-services/

The prices you see here are somewhat misleading though. They’re not wrong, but they do not challenge the assumption that you will use Azure Analysis Services in the same way as an on-premises instance and therefore they allow you to make further, incorrect assumptions about cost. A lot of my customers look at the monthly cost and assume that’s what they will actually be paying monthly – and more often than not come to the conclusion that Azure Analysis Services is too expensive for them. In fact the situation is a lot more complex and Azure Analysis Services (especially at the enterprise level) might end up being cheaper than you think. Let me explain why.

The workload of any Analysis Services instance, whether on-premises or in Azure, varies a lot. For example:

  • It will be busy during office hours while users are running queries but much quieter at night when most users have gone home
  • Similarly, it will be busy during the week when most users are working and much quieter on weekends and public holidays
  • It will be busy at certain times, such as month-ends or Black Friday, when more reports need to be run
  • It will be much busier (often with both CPU and memory usage at their peak) when processing is taking place

When you are planning an on-premises deployment of Analysis Services you need to specify your hardware and licensing so as to be able to handle these periods of high usage, even if for most of the time usage is a lot lower. This means that enterprise-level deployments of Analysis Services can be expensive because you need servers with a large number of cores and a lot of RAM and you may also need to use network load-balancing to scale out over several servers.

On the other hand Azure Analysis Services is able to scale up and scale out on demand, and you only pay for what you use. Scaling up means moving to a higher performance level (ie a SKU) within a service tier, or even moving up a service tier. Scaling out means adding replicas of your existing Azure AS instance and database.

Broadly speaking you need to scale up in two scenarios:

  • To handle the need for more memory and more QPUs while you are processing
  • To handle increased data volumes, either as a result of new tables being added to the model or because the size of existing tables has increased over time

You need to scale out when:

  • You need to process during times when other users are running queries, to ensure that query performance is not affected
  • You need to handle an increased number of concurrent users running queries

Bill Anton has an excellent blog post covering this question in a lot more detail here:

http://byobi.com/2017/11/when-to-scale-up-or-scale-out-with-azure-analysis-services/

One other thing to point out is that if you pause an instance of Azure Analysis Services you pay absolutely nothing.

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this. First, there is a table with the hourly costs for S-level instances in GBP:

image

Then there is a table with one cell for every hour of every day of the week, containing the performance level required for that hour. In this example an S1 instance is required most of the time, except at midnight each night when processing takes place and an S2 is needed. For four hours on Sunday morning the instance is paused.

image

Next there is a similar table showing the number of scale-out replicas needed for each hour of each day. In this case scale-out is needed for four hours of the day, Monday to Friday, to handle a larger number of concurrent users:

image

Next, the data from these three tables is brought together to calculate the cost per hour for each day of the week. For any given hour this is the hourly rate of the performance level in use multiplied by one plus the number of query replicas (each replica is billed at the same rate as the instance itself), and zero for any hour when the instance is paused:

image

Finally, for a given month the actual cost per day can be calculated (which of course varies by the number of weekdays, weekends and maybe also public holidays), resulting in the true monthly cost:

image

In this example the monthly cost for January 2018 is £1287.56, which is only slightly more than the £1104.48 you would pay to have an S1 instance for a whole month. You can download a copy of the workbook here.

In summary, my point here is that pricing an implementation of Azure Analysis Services is complex because of its flexibility. I am not saying that Azure Analysis Services is cheap, or cheaper than using Analysis Services on a VM in Azure or on premises – that’s a subject for a completely separate discussion. Hopefully, though, this post gives you a better idea of how much you might pay if you do use Azure Analysis Services for a project.

UPDATE 10th June 2018: the number of query replicas available for scale-out is limited by Azure region, so choosing the right Azure region is very important if you do need to scale out. The documentation is here: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview#availability-by-region

Using Email Attachments As A Data Source In Power BI

Many of my customers get the source data for their Power BI reports – usually Excel or csv files – sent to them as an email attachment on a regular basis. When they get one of these emails what usually happens next is that they save the attachment to a folder so Power BI can connect to it. This is a tedious manual process and of course one that breaks down if they’re off sick or away on holiday. However if you are using Exchange or Exchange Online it is possible to eliminate this step and have Power BI connect directly to the attachment without downloading it. In this post I’ll show you how, using Exchange Online as an example.

First of all, the source data. Here’s an inbox with the usual junk in it; the most recent three emails are from the same person, with the same subject line, and each of them has an Excel file attachment called MailSalesDemo.xlsx:

image

The Excel files contain a simple table called SalesTable with some sales data in it:

image

If you’re going to do this properly it will be better to create a folder in your mailbox, and add a rule to make sure that all the emails with the data you need go to that folder, but I’m going to keep things simple and assume that the emails I need are in my inbox.

Next, you need to connect Power BI to Exchange. There are surprisingly few examples of how to do this, but this video is a good place to start:

https://www.youtube.com/watch?v=wvBR41V7_Yk

…and Ken Puls has a good post (it’s written for Excel/Power Query but it’s all relevant for Power BI) here:

https://www.excelguru.ca/blog/2014/01/09/using-powerquery-with-email/


image

In my case after I selected the Microsoft Exchange Online data source all I needed to do was enter my email address in the Mailbox Address dialog box, and then in the authentication dialog select Microsoft Account and sign in, to get access to my data.

Once you have connected the Navigator pane will appear; you should select Mail and then click the Edit button:

image


At this point the Power Query Editor screen will open and you will see a table containing one row for each email in your mailbox. You will now have to filter your emails so you only see the emails with the attachments containing data (don’t worry if there is more than one of these emails though). You will need to do something like this:

  • Filter by the Folder Path column so you only get the emails in the relevant folder
  • Expand the Sender column so you can see the name and email address of the sender of each email, and filter so that you only get emails from the relevant person (assuming that these emails always have the same sender)
  • Filter by the Subject column (assuming that these emails always have the same subject line)
  • Filter the Has Attachments column so you only get the emails with attachments – ie where the Has Attachments column contains a TRUE value

Once you’ve done this you should see a table that looks something like this (I have removed all non-relevant columns too to make things clearer):

image

The final step is to find the most recent email and get the attachment from it. To do this:

  • Sort the table in descending order by the DateTimeReceived column

image

  • Go to the Home tab on the ribbon, click the Keep Rows/Keep Top Rows button and enter the value 1

image

  • UPDATE: something changed since I first wrote this post, resulting in an extra step being necessary at this point. You need to delete all the columns in the table except the Attachments column now; if you don’t, the query will return the specific email that’s listed when you’re designing the query, not whichever email is the most recent. This means the next screenshot is out of date, because it shows the columns that you need to delete, but it’s not a big change.
  • Click on the value Table in the Attachments column of the one remaining row in the table:

image

  • This will display a table with one row for each attachment on the selected email. If there is more than one attachment (which is not the case in my example) filter the table of attachments, for example by the Name column, so that you only have the attachment you want to get data from. Click on the value Binary in the AttachmentContent column of the one row:

image

  • From this point on the experience should be very similar to the one you get when you connect directly to an Excel file: you should see a table containing all the worksheets, tables and named ranges in the Excel file attachment. Click on the Table value in the Data column for whatever worksheet/table/named range you need for your report and then use the Power Query Editor as normal, click Close and Apply and design your report. In my case I clicked on the Table link for the SalesTable table (there’s a sketch of the finished query after the next screenshot):

image
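
Putting all of these steps together, the finished query might look something like this M sketch – the mailbox address, folder path and subject line are all made up, and I’ve left out the filter on the Sender column for brevity:

[sourcecode language='text']
let
    // Connect to the mailbox (the address here is made up)
    Source = Exchange.Contents("me@example.com"),
    Mail = Source{[Name = "Mail"]}[Data],
    // Keep only the emails containing the data - the filter values are examples
    #"Filtered Rows" = Table.SelectRows(
        Mail,
        each [Folder Path] = "\Inbox\"
            and [Subject] = "Sales Data"
            and [Has Attachments] = true
    ),
    // Find the most recent email
    #"Sorted Rows" = Table.Sort(#"Filtered Rows", {{"DateTimeReceived", Order.Descending}}),
    #"Kept First Rows" = Table.FirstN(#"Sorted Rows", 1),
    // Remove every column except Attachments (see the update above)
    #"Removed Other Columns" = Table.SelectColumns(#"Kept First Rows", {"Attachments"}),
    // Drill into the attachment and treat its contents as an Excel workbook
    Attachments = #"Removed Other Columns"{0}[Attachments],
    AttachmentContent = Attachments{0}[AttachmentContent],
    Workbook = Excel.Workbook(AttachmentContent),
    SalesTable = Workbook{[Item = "SalesTable", Kind = "Table"]}[Data]
in
    SalesTable
[/sourcecode]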


Here’s the very basic report I built:

image

After you have published your report you will also need to set up scheduled refresh on the dataset, timed for a reasonable period after you expect the email containing the data to have arrived. There’s no need to use a Gateway if you are using Exchange Online, so in the Settings pane for the dataset in PowerBI.com choose Connect directly:

image

Under Data Source Credentials choose OAuth2 in the Authentication method dropdown box and then sign in:

image

Once you’ve done this, every time your dataset refreshes it will use the data from the attachment in the most recently-received email, and you have one less thing to remember to do every day. To be honest it’s a bit of a hack, but it works so long as the emails you receive always come from the same sender, with the same subject, same attachment and so on.

It’s a shame that Flow doesn’t have a “refresh Power BI dataset” action built-in (why doesn’t it??) because if it did, it would be very easy to refresh the dataset whenever an email with data was received. I know this is possible with Flow if you call the Power BI API but that involves a lot of technical skill to set up. Thinking about it, if you use Flow you could probably solve the problem in a completely different way by saving the email attachment to OneDrive every time an email arrived… but that needs a separate blog post.

A Few More Power BI Roadmap Details From The Dynamics 365 Release Notes

Back in March, with the publication of the Dynamics 365 Release Notes document (which you can download here, halfway down the page), we learned a lot about the roadmap of Power BI; my blog post here summarises the announcements. However, on Twitter Matthew Roche reminded me that it is a living document and indeed it turns out that it has been updated several times since March. So what has been added since then about the future of Power BI?

The change history section at the start helps identify what’s new in the document, but it’s not easy to tell what genuinely new Power BI features have been announced. There are plenty of changes to availability dates for sure. Here’s what little I’ve found in terms of new, interesting stuff (page numbers are for document version 18.1.2):

  • Filtering in the data view (p205) – finally! This is a really useful feature from Excel’s Power Pivot window that allows you to sort and filter the data that is shown in the data view after you have loaded data into tables in your dataset:

image

  • Various improvements to SAP BW and SAP Hana data connectors (p241) – I don’t think some of the details here have been officially announced yet, but I’m not an SAP person so I could be wrong.
  • Power BI custom connectors written in M will also work in Flow, PowerApps and Logic Apps (p261). This is something that Matt Masson talked about in his session at the Dublin Power BI conference too, but which I don’t think has been mentioned anywhere else. This makes custom connectors even more powerful!

Let me know if you find anything else! I’ll need to remember to check for changes to this document on a regular basis…

More Details On Creating Tables In Power BI/Power Query M Code Using #table()

About two years ago I wrote a blog post describing how the #table M function can be used to generate tables, but in that post I only covered the functionality I used regularly – namely using #table with a list of column names or a table type in the first parameter. However there are two other variations on #table that I have used recently that I thought were worth pointing out.

For example, if you need to generate a table with a set number of columns but you don’t care what the columns are called, you can use an integer in the first parameter to get a table with that number of columns. The following expression returns a table with four columns of data type Any called Column1, Column2, Column3 and Column4, and no rows:

[sourcecode language='text'  padlinenumbers='true']
#table(4,{})
[/sourcecode]

image

Also, if you have a list of lists, where you don’t know in advance how many items each nested list contains, and you want to use each nested list for the row values in a table, you can use a null value in the first parameter of #table. The following expression returns a table with four columns like the one above, but with two rows of integer values:

[sourcecode language='text' ]
#table(null, {{1,2,3,4},{2,3,4,5}})
[/sourcecode]

image

A New Approach To Handling SSRS Multi-Valued Parameters in DAX Queries

Back in 2012 I wrote a blog post explaining how to handle multi-value parameters in DAX queries in Reporting Services reports. The approach I took back then was basically:

  1. Generate a string containing a pipe-delimited list of all the parameter values that the user has selected (I did this in DAX too, but to be honest it’s better to use the SSRS Join function to do this outside the DAX query – see the example expression after this list)
  2. Use the DAX PathContains() function in a Filter() to check whether the value on the current row of the table being filtered appears in the pipe-delimited list
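
For example, the SSRS expression used to build the pipe-delimited list might look something like this (the parameter name is hypothetical):

[sourcecode language='text']
=Join(Parameters!SalesOrderNumber.Value, "|")
[/sourcecode]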

Here’s a deliberately simplified example of how this works based on Adventure Works DW data. The following query filters the FactInternetSales fact table and returns the rows for the Sales Order Numbers that are present in the OrderList variable:

[sourcecode language='text' padlinenumbers='true']
EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
RETURN
    FILTER (
        FactInternetSales,
        PATHCONTAINS ( OrderList, FactInternetSales[SalesOrderNumber] )
    )
[/sourcecode]

The trouble with this approach is that it can be very slow. Running a trace in DAX Studio for the query above reveals the problem:

image

The presence of CallbackDataID shows that the Storage Engine is calling the Formula Engine to handle the use of PathContains() in the filter, and this is often a cause of poor query performance. However back when I wrote the post the only alternative was, as Chris Koester points out here, to dynamically generate the entire DAX query as an SSRS expression and that is very painful to do.

The good news is that recent changes in DAX mean that there is another way to tackle this problem that can give much better performance. Here’s an example of this new approach:

[sourcecode language='text']
EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
VAR OrderCount =
    PATHLENGTH ( OrderList )
VAR NumberTable =
    GENERATESERIES ( 1, OrderCount, 1 )
VAR OrderTable =
    GENERATE (
        NumberTable,
        VAR CurrentKey = [Value]
        RETURN
            ROW ( "Key", PATHITEM ( OrderList, CurrentKey ) )
    )
VAR GetKeyColumn =
    SELECTCOLUMNS ( OrderTable, "Key", [Key] )
VAR FilterTable =
    TREATAS ( GetKeyColumn, FactInternetSales[SalesOrderNumber] )
RETURN
    CALCULATETABLE ( FactInternetSales, FilterTable )
[/sourcecode]

Broken down variable by variable, here’s how it works:

  1. OrderList is the pipe-delimited list of key values passed from SSRS
  2. OrderCount uses the PathLength() DAX function to find the number of parameter values in this list
  3. NumberTable uses the GenerateSeries() function to create a table of numbers with one row for each number between 1 and the number of parameter values in the list
  4. OrderTable uses the trick Marco describes here to iterate over NumberTable and, for each row, uses the PathItem() function to return one parameter value from the list
  5. GetKeyColumn uses the SelectColumns() DAX function to only return the column from OrderTable that contains the parameter values
  6. FilterTable uses the TreatAs() DAX function to take the table of values returned by GetKeyColumn and treat them as values in the FactInternetSales[SalesOrderNumber] column
  7. Finally, the query returns the contents of the FactInternetSales table filtered by the values in FilterTable using the CalculateTable() DAX function

There’s a lot of extra code here, and in some cases you may find that performance with smaller data volumes is worse as a result, but in this particular case the new approach is twice as fast as the old one. There’s certainly no CallBackDataID:

image