OData Performance Improvements In The June 2018 Power BI Desktop Release

In the June 2018 release of Power BI Desktop there were a number of improvements made to the way the Power Query engine handles OData data sources. You can read about them here:

https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-june-2018-feature-summary/#oData

However, while testing them out, I noticed one important point that the announcement didn’t make: an existing Power Query query will only benefit from these changes if you make a small change to its M code, adding the Implementation="2.0" option to the OData.Feed() function.

Take the following M query, running on the UK Houses of Parliament OData API:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
    Source{[
        Name="GovernmentOrganisation",
        Signature="table"
        ]}[Data],
    #"Expanded GroupHasPosition" = 
        Table.ExpandTableColumn(
            GovernmentOrganisation_table, 
            "GroupHasPosition", 
            {"PositionName"}, 
            {"PositionName"}),
    #"Filtered Rows" = 
        Table.SelectRows(
            #"Expanded GroupHasPosition", 
            each [PositionName] = "Chancellor of the Exchequer")
in
    #"Filtered Rows"

It queries a table called GovernmentOrganisation, expands a column called GroupHasPosition and then filters on one of the expanded columns, PositionName, to return only the rows where PositionName is “Chancellor of the Exchequer”.

image

Using Fiddler in the way I describe here, I can see that when this query runs the engine first tries to fold the filter on “Chancellor of the Exchequer” and then, when that request returns an error, falls back to a very slow approach that involves making multiple requests to the API:

image

However, if you change the code above so that the OData.Feed() function uses the Implementation="2.0" option like so:

let
    Source = OData.Feed(
        "https://api.parliament.uk/odata", 
        null, 
        [Implementation="2.0"]),
    GovernmentOrganisation_table =
    Source{[
        Name="GovernmentOrganisation",
        Signature="table"
        ]}[Data],
    #"Expanded GroupHasPosition" = 
        Table.ExpandTableColumn(
            GovernmentOrganisation_table, 
            "GroupHasPosition", 
            {"PositionName"}, 
            {"PositionName"}),
    #"Filtered Rows" = 
        Table.SelectRows(
            #"Expanded GroupHasPosition", 
            each [PositionName] = "Chancellor of the Exchequer")
in
    #"Filtered Rows"

…then Fiddler shows that a completely different request is made:

image

In this case the engine does not try to fold the filter, but at least it does the expansion in a single request. The performance of the resulting query is a lot better.

It looks like all new code generated by the Power Query Editor uses the Implementation="2.0" option for OData.Feed(), so it will get the benefits described in the post on the Power BI blog. Existing M code won’t have this option set though, so you will need to update it appropriately. As always, test thoroughly after making this change to make sure your query really does run faster and still returns the same results as before.

More Details On The RowsetSerializationLimit Server Property For Analysis Services

The recent post on the Analysis Services team blog about new memory options described some very interesting functionality that has just been added. The first of the new features I decided to try was the RowsetSerializationLimit server property, which restricts the number of rows returned by a query and is intended to stop rogue users running queries that return very large amounts of data. It certainly works, but there are two things to point out about it that aren’t immediately obvious.

First of all, an example of it in action. In Azure Analysis Services, using a database that contains one very simple table with one column and 5000 rows, the following DAX query returns all 5000 rows from the table:

EVALUATE 'MyTable'

image

To change the RowsetSerializationLimit server property, open SQL Server Management Studio, right-click on your server name in the Object Explorer pane and select Properties. Then go to the General page, check the Show Advanced (All) Properties box, and you’ll see the property listed under OLAP\Query\RowsetSerializationLimit:

image

The default value is -1, meaning no limit is placed on the number of rows returned by a rowset. Changing it to 100 and then running the above query results in the following error:

image

Executing the query …
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
The maximum number of rows (100) was reached.

A query that returns fewer than 100 rows, for example

EVALUATE TOPN(99,'MyTable')

…is ok though.

The important thing to understand is that although the limit applies to both DAX and MDX queries, it only applies to queries whose results are returned as rowsets (a tabular format) and not as cellsets (a multidimensional format). Most client tools that generate MDX, including Excel, use cellsets, so unfortunately this property will not apply to them.

For example, the following MDX query run from a SQL Server Management Studio MDX query window (which also returns a cellset) on the same database runs ok and returns 5000 rows:

SELECT
{[Measures].[Number Of Rows]} ON 0,
[MyTable].[Column1].[Column1].MEMBERS ON 1
FROM
[Model]


image

However, the same query run from Power BI Desktop to import data from the same database:

image

…runs, but returns only 100 rows and then an error:

image
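For reference, when you import data like this Power BI Desktop generates an M query that passes the MDX to the server using the AnalysisServices.Database() function. Here’s a minimal sketch of what that looks like – the server and database names are placeholders, and the exact code generated by the UI may differ:

let
    // Placeholder Azure AS server and database names
    Source = AnalysisServices.Database(
        "asazure://uksouth.asazure.windows.net/myserver",
        "MyDatabase",
        [Query = "SELECT {[Measures].[Number Of Rows]} ON 0, "
            & "[MyTable].[Column1].[Column1].MEMBERS ON 1 "
            & "FROM [Model]"])
in
    Source

Because the results of a query run this way come back as a rowset rather than a cellset, the RowsetSerializationLimit property applies to it.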

Something else to point out is that this applies to all queries that return rowsets, even Discover requests and DMV queries. As a result, setting this property to a very low value can cause problems in tools like SQL Server Management Studio: for example if you set the property to 10 and you had 11 tables in a database, you would see an error when you tried to expand the Tables node of a database in the Object Explorer pane!

Even though this property only applies to rowsets I think it’s still going to be very useful, especially in scenarios where Power BI Desktop users are importing vast amounts of data from Analysis Services and causing major problems on the server.

Line Breaks And Auto-Indent In The Power BI DAX Formula Bar

The other day I discovered something new (at least to me) while writing the DAX for a measure in Power BI Desktop: when you insert a new line in your DAX expression using SHIFT-ENTER, it also auto-indents the code. I asked a few people if this was new because I was sure I hadn’t seen it before, even though I always put line breaks in my code; of course Marco had seen it, and said he thought it had been around for a while. Anyway, Marco then commented that most people don’t even know you can put line breaks in DAX, and I thought to myself I should probably write a short blog post about all this, because line breaks and indentation make your code much more readable.

Here’s what it looks like to write a DAX measure in Power BI Desktop using line breaks and auto-indent:

DAXIndent
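For example, a measure written with a line break after each opening bracket and after each argument ends up indented something like this (the table, column and measure names below are purely for illustration):

Sales Amount Large Orders =
CALCULATE (
    // hypothetical table and column names
    SUM ( Sales[SalesAmount] ),
    FILTER (
        Sales,
        Sales[OrderQuantity] >= 10
    )
)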

To summarise:

  • Press the SHIFT and ENTER keys on the keyboard simultaneously to insert a line break in your DAX code when you are working in the Power BI DAX formula bar
  • Do this after the opening bracket of each function and after the comma following each function argument, and the auto-indent will kick in
  • Bonus tip: while typing DAX, if the dropdown list has the function, table, column or measure selected that you want to use, just hit the TAB key to auto-complete. If the dropdown list hasn’t selected what you are looking for and you don’t want to keep typing, use the up and down arrow keys on the keyboard to move up and down the list until you have selected what you want.

Of course there’s always Marco and Alberto’s excellent DAX Formatter service to format your DAX code but this is a lot more convenient.

How Much Does Azure Analysis Services Actually Cost?

It might seem strange to write a blog post on how much Azure Analysis Services costs when there is a page on the Microsoft site that very clearly lists all of the SKUs and how much they cost per hour or per month in any currency and Azure region you choose:

https://azure.microsoft.com/en-gb/pricing/details/analysis-services/

The prices you see here are somewhat misleading though. They’re not wrong, but they do nothing to challenge the assumption that you will use Azure Analysis Services in the same way as an on-premises instance, and therefore they allow you to make further, incorrect assumptions about cost. A lot of my customers look at the monthly cost and assume that’s what they will actually be paying each month – and more often than not come to the conclusion that Azure Analysis Services is too expensive for them. In fact the situation is a lot more complex, and Azure Analysis Services (especially at the enterprise level) might end up being cheaper than you think. Let me explain why.

The workload of any Analysis Services instance, whether on-premises or in Azure, varies a lot. For example:

  • It will be busy during office hours while users are running queries but much quieter at night when most users have gone home
  • Similarly, it will be busy during the week when most users are working and much quieter on weekends and public holidays
  • It will be busy at certain times, such as month-ends or Black Friday, when more reports need to be run
  • It will be much busier (often with both CPU and memory usage at their peak) when processing is taking place

When you are planning an on-premises deployment of Analysis Services you need to specify your hardware and licensing so as to be able to handle these periods of high usage, even if for most of the time usage is a lot lower. This means that enterprise-level deployments of Analysis Services can be expensive because you need servers with a large number of cores and a lot of RAM and you may also need to use network load-balancing to scale out over several servers.

On the other hand Azure Analysis Services is able to scale up and scale out on demand, and you only pay for what you use. Scaling up means moving to a higher performance level (ie a SKU) within a service tier, or even moving up a service tier. Scaling out means adding replicas of your existing Azure AS instance and database.

Broadly speaking you need to scale up in two scenarios:

  • To handle the need for more memory and more QPUs while you are processing
  • To handle increased data volumes, either as a result of new tables being added to the model or because the size of existing tables has increased over time

You need to scale out when:

  • You need to process during times when other users are running queries, to ensure that query performance is not affected
  • You need to handle an increased number of concurrent users running queries

Bill Anton has an excellent blog post covering this question in a lot more detail here:

http://byobi.com/2017/11/when-to-scale-up-or-scale-out-with-azure-analysis-services/

One other thing to point out is that if you pause an instance of Azure Analysis Services you pay absolutely nothing.

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this. First, there is a table with the hourly costs for S-level instances in GBP:

image

Then there is a table with one cell for every hour of every day of the week, showing the performance level required for that hour. In this example an S1 instance is required most of the time, except at midnight every night when processing takes place and an S2 is needed. For four hours on Sunday morning the instance is paused.

image

Next there is a similar table showing the number of scale-out replicas needed for each hour of each day. In this case scale-out is needed for four hours of the day, Monday to Friday, to handle a larger number of concurrent users:

image

Next, the data from these three tables is brought together to calculate the cost per hour for each day of the week:

image

Finally, for a given month the actual cost per day can be calculated (which of course varies by the number of weekdays, weekends and maybe also public holidays), resulting in the true monthly cost:

image

In this example the monthly cost for January 2018 is £1287.56, which is only slightly more than the £1104.48 you would pay to have an S1 instance for a whole month. You can download a copy of the workbook here.
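If you want to see the logic rather than the workbook itself, here’s a minimal Power Query M sketch of the same calculation for a single, abbreviated day. The hourly rates, SKUs and schedule below are all made-up placeholder values, not real Azure prices; the idea is simply that each hour costs the hourly rate of the SKU in use multiplied by one plus the number of query replicas (each replica being billed at the same rate as the instance itself), and a paused hour costs nothing:

let
    // Placeholder hourly rates in GBP - not real Azure AS prices
    HourlyRates = [S1 = 1.50, S2 = 3.00],
    // One record per hour (abbreviated to four hours here): the SKU in
    // use (null = paused) and the number of query replicas running
    Schedule = {
        [SKU = "S2", Replicas = 0],  // scaled up for processing
        [SKU = "S1", Replicas = 0],  // normal daytime load
        [SKU = "S1", Replicas = 1],  // scaled out for peak query load
        [SKU = null, Replicas = 0]   // paused, so costs nothing
    },
    // Cost per hour = SKU rate x (the instance itself + its replicas)
    HourlyCosts = List.Transform(
        Schedule,
        each if [SKU] = null
            then 0
            else Record.Field(HourlyRates, [SKU]) * (1 + [Replicas])),
    DailyCost = List.Sum(HourlyCosts)
in
    DailyCost

The real workbook does this for every hour of every day of the week and then multiplies out over the days in the month.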

In summary, my point here is that pricing an implementation of Azure Analysis Services is complex because of its flexibility. I am not saying that Azure Analysis Services is cheap, or cheaper than using Analysis Services on a VM in Azure or on premises – that’s a subject for a completely separate discussion. Hopefully, though, this post gives you a better idea of how much you might pay if you do use Azure Analysis Services for a project.

UPDATE 10th June 2018: the number of query replicas available for scale-out is limited per Azure region, so choosing the right Azure region is very important if you do need to scale out. The documentation is here: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview#availability-by-region

Using Email Attachments As A Data Source In Power BI

Many of my customers get the source data for their Power BI reports – usually Excel or csv files – sent to them as an email attachment on a regular basis. When they get one of these emails, what usually happens next is that they save the attachment to a folder so Power BI can connect to it. This is a tedious manual process and of course one that breaks down if they’re off sick or away on holiday. However, if you are using Exchange or Exchange Online it is possible to eliminate this step and have Power BI connect directly to the attachment without anyone having to save it to a folder first. In this post I’ll show you how, using Exchange Online as an example.

First of all, the source data. Here’s an inbox with the usual junk in it; the most recent three emails are from the same person, with the same subject line, and each of them has an Excel file attachment called MailSalesDemo.xlsx:

image

The Excel files contain a simple table called SalesTable with some sales data in it:

image

If you’re going to do this properly it will be better to create a folder in your mailbox, and add a rule to make sure that all the emails with the data you need go to that folder, but I’m going to keep things simple and assume that the emails I need are in my inbox.

Next, you need to connect Power BI to Exchange. There are surprisingly few examples of how to do this, but this video is a good place to start:

https://www.youtube.com/watch?v=wvBR41V7_Yk

…and Ken Puls has a good post (it’s written for Excel/Power Query but it’s all relevant for Power BI) here:

https://www.excelguru.ca/blog/2014/01/09/using-powerquery-with-email/


image

In my case, after I selected the Microsoft Exchange Online data source, all I needed to do was enter my email address in the Mailbox Address dialog box and then, in the authentication dialog, select Microsoft Account and sign in to get access to my data.

Once you have connected, the Navigator pane will appear; you should select Mail and then click the Edit button:

image


At this point the Power Query Editor screen will open and you will see a table containing one row for each email in your mailbox. You will now have to filter your emails so that you only see the emails whose attachments contain the data (don’t worry if there is more than one of these emails). You will need to do something like this:

  • Filter by the Folder Path column so you only get the emails in the relevant folder
  • Expand the Sender column so you can see the name and email address of the sender of each email, and filter so that you only get emails from the relevant person (assuming that these emails always have the same sender)
  • Filter by the Subject column (assuming that these emails always have the same subject line)
  • Filter the Has Attachments column so you only get the emails with attachments – ie where the Has Attachments column contains a TRUE value

Once you’ve done this you should see a table that looks something like this (I have also removed all non-relevant columns to make things clearer):

image

The final step is to find the most recent email and get the attachment from it. To do this:

  • Sort the table in descending order by the DateTimeReceived column

image

  • Go to the Home tab on the ribbon, click the Keep Rows/Keep Top Rows button and enter the value 1

image

  • Click on the value Table in the Attachments column of the one remaining row in the table:

image

  • This will display a table with one row for each attachment on the selected email. If there is more than one attachment (which is not the case in my example) filter the table of attachments, for example by the Name column, so that you only have the attachment you want to get data from. Click on the value Binary in the AttachmentContent column of the one row:

image

  • From this point on the experience should be very similar to the one you get when you connect directly to an Excel file: you should see a table containing all the worksheets, tables and named ranges in the Excel file attachment. Click on the Table value in the Data column for whatever worksheet/table/named range you need for your report, then use the Power Query Editor as normal, click Close & Apply and design your report. In my case I clicked on the Table link for the SalesTable table (the M code for all of these steps is sketched out below):

image
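Putting all of the steps above together, the M code for the finished query looks something like the sketch below. The mailbox address and subject line are placeholders, the attachment file name (MailSalesDemo.xlsx) and table name (SalesTable) are the ones from my example, and the column names used in the filters ([Subject], [HasAttachments], [DateTimeReceived] and so on) should be checked against what you actually see in the Power Query Editor:

let
    // Connect to the Exchange Online mailbox (placeholder address)
    Source = Exchange.Contents("myname@mycompany.com"),
    Mail = Source{[Name="Mail"]}[Data],
    // Keep only the emails that carry the data attachment
    FilteredMail = Table.SelectRows(
        Mail,
        each [Subject] = "Latest sales data" and [HasAttachments] = true),
    // Most recent email first, then keep just that one row
    SortedMail = Table.Sort(
        FilteredMail,
        {{"DateTimeReceived", Order.Descending}}),
    LatestEmail = Table.FirstN(SortedMail, 1),
    // Navigate to the attachments of that email and pick the right file
    Attachments = LatestEmail{0}[Attachments],
    MyAttachment = Table.SelectRows(
        Attachments,
        each [Name] = "MailSalesDemo.xlsx"){0}[AttachmentContent],
    // Treat the binary attachment content as an Excel workbook
    // and get the SalesTable table from it
    Workbook = Excel.Workbook(MyAttachment),
    SalesTable = Workbook{[Item="SalesTable",Kind="Table"]}[Data]
in
    SalesTable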


Here’s the very basic report I built:

image

After you have published your report you will also need to set up scheduled refresh on the dataset, so that it runs a reasonable amount of time after you expect to receive the email containing the data. There’s no need to use a Gateway if you are using Exchange Online, so in the Settings pane for the dataset in PowerBI.com choose Connect directly:

image

Under Data Source Credentials choose OAuth2 in the Authentication method dropdown box and then sign in:

image

Once you’ve done this, every time your dataset refreshes it will use the data from the attachment in the most recently-received email, and you have one less thing to remember to do every day. To be honest it’s a bit of a hack, but it works so long as the emails you receive always come from the same sender, with the same subject, the same attachment and so on.

It’s a shame that Flow doesn’t have a “refresh Power BI dataset” action built-in (why doesn’t it??) because if it did, it would be very easy to refresh the dataset whenever an email with data was received. I know this is possible with Flow if you call the Power BI API but that involves a lot of technical skill to set up. Thinking about it, if you use Flow you could probably solve the problem in a completely different way by saving the email attachment to OneDrive every time an email arrived… but that needs a separate blog post.
