How Much Does Azure Analysis Services Actually Cost?

It might seem strange to write a blog post on how much Azure Analysis Services costs when there is a page on the Microsoft site that very clearly lists all of the SKUs and how much they cost per hour or per month in any currency and Azure region you choose:

https://azure.microsoft.com/en-gb/pricing/details/analysis-services/

The prices you see here are somewhat misleading though. They’re not wrong, but they do not challenge the assumption that you will use Azure Analysis Services in the same way as an on-premises instance, and therefore they allow you to make further, incorrect assumptions about cost. A lot of my customers look at the monthly cost and assume that’s what they will actually be paying each month – and more often than not come to the conclusion that Azure Analysis Services is too expensive for them. In fact the situation is a lot more complex and Azure Analysis Services (especially at the enterprise level) might end up being cheaper than you think. Let me explain why.

The workload of any Analysis Services instance, whether on-premises or in Azure, varies a lot. For example:

  • It will be busy during office hours while users are running queries but much quieter at night when most users have gone home
  • Similarly, it will be busy during the week when most users are working and much quieter on weekends and public holidays
  • It will be busy at certain times, such as month-ends or Black Friday, when more reports need to be run
  • It will be much busier (often with both CPU and memory usage at their peak) when processing is taking place

When you are planning an on-premises deployment of Analysis Services you need to specify your hardware and licensing so as to be able to handle these periods of high usage, even if for most of the time usage is a lot lower. This means that enterprise-level deployments of Analysis Services can be expensive because you need servers with a large number of cores and a lot of RAM and you may also need to use network load-balancing to scale out over several servers.

On the other hand Azure Analysis Services is able to scale up and scale out on demand, and you only pay for what you use. Scaling up means moving to a higher performance level (ie a SKU) within a service tier, or even moving up a service tier. Scaling out means adding replicas of your existing Azure AS instance and database.

Broadly speaking you need to scale up in two scenarios:

  • To handle the need for more memory and more QPUs while you are processing
  • To handle increased data volumes, either as a result of new tables being added to the model or because the size of existing tables has increased over time

You need to scale out when:

  • You need to process during times when other users are running queries, to ensure that query performance is not affected
  • You need to handle an increased number of concurrent users running queries

Bill Anton has an excellent blog post covering this question in a lot more detail here:

http://byobi.com/2017/11/when-to-scale-up-or-scale-out-with-azure-analysis-services/

One other thing to point out is that if you pause an instance of Azure Analysis Services you pay absolutely nothing.

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this. First, there is a table with the hourly costs for S-level instances in GBP:

image

Then there is a table with one cell for every hour of every day of the week, showing the performance level required for that hour. In this example an S1 instance is required most of the time, except at midnight every night, when processing is taking place and an S2 is needed. For four hours on Sunday morning the instance is paused.

image

Next there is a similar table showing the number of scale-out replicas needed for each hour of each day. In this case scale-out is needed for four hours of the day, Monday to Friday, to handle a larger number of concurrent users:

image

Next, the data from these three tables is brought together to calculate the cost per hour for each day of the week:

image

Finally, for a given month the actual cost per day can be calculated (which of course varies by the number of weekdays, weekends and maybe also public holidays), resulting in the true monthly cost:

image

In this example the monthly cost for January 2018 is £1287.56, which is only slightly more than the £1104.48 you would pay to have an S1 instance for a whole month. You can download a copy of the workbook here.
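As an aside, if you prefer Power Query to Excel formulas, here’s a rough sketch of how the same kind of calculation could be done in M for a single day. This is not taken from the workbook and the prices are just placeholders (roughly the S1 monthly price above divided by 730 hours, and double that for S2), so check the pricing page for the real figures:

let
    // Hypothetical hourly prices in GBP - replace with real figures from the pricing page
    Prices = #table(
        type table [SKU = text, HourlyPrice = number],
        {{"S1", 1.51}, {"S2", 3.02}}
    ),
    // One row per hour of the day: the SKU running in that hour and the number of query replicas
    // (midnight runs on S2 for processing; 9am to 1pm has one scale-out replica)
    Schedule = #table(
        type table [Hour = number, SKU = text, Replicas = number],
        List.Transform(
            {0..23},
            each {_, if _ = 0 then "S2" else "S1", if _ >= 9 and _ <= 12 then 1 else 0}
        )
    ),
    // Look up the hourly price for each hour's SKU
    Merged = Table.NestedJoin(Schedule, {"SKU"}, Prices, {"SKU"}, "Price", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Price", {"HourlyPrice"}),
    // The instance itself plus each query replica is billed at the same hourly rate
    WithCost = Table.AddColumn(Expanded, "Cost", each [HourlyPrice] * (1 + [Replicas]), type number),
    DailyCost = List.Sum(WithCost[Cost])
in
    DailyCost

Repeating this for each day of the week, and then for the actual days in a given month, gives the kind of monthly figure shown above.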

In summary, my point here is that pricing an implementation of Azure Analysis Services is complex because of its flexibility. I am not saying that Azure Analysis Services is cheap, or cheaper than using Analysis Services on a VM in Azure or on premises – that’s a subject for a completely separate discussion. Hopefully, though, this post gives you a better idea of how much you might pay if you do use Azure Analysis Services for a project.

UPDATE 10th June 2018: the number of query replicas available for scale-out is limited by Azure region, so choosing the right Azure region is very important if you do need to scale out. The documentation is here: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview#availability-by-region

Using Email Attachments As A Data Source In Power BI

Many of my customers get the source data for their Power BI reports – usually Excel or csv files – sent to them as an email attachment on a regular basis. When they get one of these emails what usually happens next is that they save the attachment to a folder so Power BI can connect to it. This is a tedious manual process and of course one that breaks down if they’re off sick or away on holiday. However if you are using Exchange or Exchange Online it is possible to eliminate this step and have Power BI connect directly to the attachment, without anyone needing to save it to a folder first. In this post I’ll show you how, using Exchange Online as an example.

First of all, the source data. Here’s an inbox with the usual junk in it; the most recent three emails are from the same person, with the same subject line, and each of them has an Excel file attachment called MailSalesDemo.xlsx:

image

The Excel files contain a simple table called SalesTable with some sales data in it:

image

If you’re going to do this properly it will be better to create a folder in your mailbox, and add a rule to make sure that all the emails with the data you need go to that folder, but I’m going to keep things simple and assume that the emails I need are in my inbox.

Next, you need to connect Power BI to Exchange. There are surprisingly few examples of how to do this, but this video is a good place to start:

https://www.youtube.com/watch?v=wvBR41V7_Yk

…and Ken Puls has a good post (it’s written for Excel/Power Query but it’s all relevant for Power BI) here:

https://www.excelguru.ca/blog/2014/01/09/using-powerquery-with-email/

 

image

In my case, after I selected the Microsoft Exchange Online data source, all I needed to do was enter my email address in the Mailbox Address dialog box and then, in the authentication dialog, select Microsoft Account and sign in to get access to my data.

Once you have connected, the Navigator pane will appear; you should select Mail and then click the Edit button:

image

 

At this point the Power Query Editor screen will open and you will see a table containing one row for each email in your mailbox. You now have to filter this table so you only see the emails whose attachments contain your data (don’t worry if there is more than one of these emails though). You will need to do something like this:

  • Filter by the Folder Path column so you only get the emails in the relevant folder
  • Expand the Sender column so you can see the name and email address of the sender of each email, and filter so that you only get emails from the relevant person (assuming that these emails always come from the same sender)
  • Filter by the Subject column (assuming that these emails always have the same subject line)
  • Filter the Has Attachments column so you only get the emails with attachments – ie where the Has Attachments column contains a TRUE value

Once you’ve done this you should see a table that looks something like this (I have removed all non-relevant columns too to make things clearer):

image

The final step is to find the most recent email and get the attachment from it. To do this:

  • Sort the table in descending order by the DateTimeReceived column

image

  • Go to the Home tab on the ribbon, click the Keep Rows/Keep Top Rows button and enter the value 1

image

  • Click on the value Table in the Attachments column of the one remaining row in the table:

image

  • This will display a table with one row for each attachment on the selected email. If there is more than one attachment (which is not the case in my example) filter the table of attachments, for example by the Name column, so that you only have the attachment you want to get data from. Click on the value Binary in the AttachmentContent column of the one row:

image

  • From this point on the experience should be very similar to the one you get when you connect directly to an Excel file: you should see a table containing all the worksheets, tables and named ranges in the Excel file attachment. Click on the Table value in the Data column for whatever worksheet/table/named range you need for your report and then use the Power Query Editor as normal, click Close and Apply and design your report. In my case I clicked on the Table link for the SalesTable table; a sketch of what the complete query might look like follows the screenshot below:

image
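For reference, here’s a rough sketch of what the complete query built by the steps above might look like. The mailbox address, folder path and subject line are just placeholders for the values used in this example, and the code the Power Query Editor generates for you will look a little different:

let
    // Placeholder mailbox address - use your own
    Source = Exchange.Contents("me@example.com"),
    Mail = Source{[Name = "Mail"]}[Data],
    // Keep only the emails containing the data: right folder, right subject, has an attachment
    // (you could also expand and filter on the Sender column here)
    FilteredMail = Table.SelectRows(
        Mail,
        each [Folder Path] = "\Inbox\"
            and [Subject] = "Sales data"
            and [Has Attachments] = true
    ),
    // Sort so the most recently received email is first, then keep just that one row
    Sorted = Table.Sort(FilteredMail, {{"DateTimeReceived", Order.Descending}}),
    LatestEmail = Table.FirstN(Sorted, 1),
    // Drill into the attachments of that email and pick the Excel file by name
    Attachments = LatestEmail{0}[Attachments],
    AttachmentBinary =
        Table.SelectRows(Attachments, each [Name] = "MailSalesDemo.xlsx"){0}[AttachmentContent],
    // Treat the binary as an Excel workbook and return the SalesTable table from it
    Workbook = Excel.Workbook(AttachmentBinary),
    SalesTable = Workbook{[Item = "SalesTable", Kind = "Table"]}[Data]
in
    SalesTable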

 

Here’s the very basic report I built:

image

After you have published your report you will also need to set up scheduled refresh on the dataset, for a time reasonably soon after you expect to have received the email containing the data. There’s no need to use a Gateway if you are using Exchange Online, so in the Settings pane for the dataset in PowerBI.com choose Connect directly:

image

Under Data Source Credentials choose OAuth2 in the Authentication method dropdown box and then sign in:

image

Once you’ve done this, every time your dataset refreshes it will use the data from the attachment in the most recently received email, and you have one less thing to remember to do every day. To be honest it’s a bit of a hack, but it works so long as the emails you receive always come from the same sender, with the same subject, the same attachment and so on.

It’s a shame that Flow doesn’t have a “refresh Power BI dataset” action built-in (why doesn’t it??) because if it did, it would be very easy to refresh the dataset whenever an email with data was received. I know this is possible with Flow if you call the Power BI API but that involves a lot of technical skill to set up. Thinking about it, if you use Flow you could probably solve the problem in a completely different way by saving the email attachment to OneDrive every time an email arrived… but that needs a separate blog post.

A Few More Power BI Roadmap Details From The Dynamics 365 Release Notes

Back in March, with the publication of the Dynamics 365 Release Notes document (which you can download here, halfway down the page), we learned a lot about the roadmap of Power BI; my blog post here summarises the announcements. However, on Twitter Matthew Roche reminded me that it is a living document and indeed it turns out that it has been updated several times since March. So what has been added since then about the future of Power BI?

The change history section at the start helps identify what’s new in the document, but it’s not easy to tell what genuinely new Power BI features have been announced. There are plenty of changes to availability dates for sure. Here’s what little I’ve found in terms of new, interesting stuff (page numbers are for document version 18.1.2):

  • Filtering in the data view (p205) – finally! This is a really useful feature from Excel’s Power Pivot window that allows you to sort and filter the data that is shown in the data view after you have loaded data into tables in your dataset:

image

  • Various improvements to the SAP BW and SAP HANA data connectors (p241) – I don’t think some of the details here have been officially announced yet, but I’m not an SAP person so I could be wrong.
  • Power BI custom connectors written in M will also work in Flow, PowerApps and Logic Apps (p261). This is something that Matt Masson talked about in his session at the Dublin Power BI conference too, but which I don’t think has been mentioned anywhere else. This makes custom connectors even more powerful!

Let me know if you find anything else! I’ll need to remember to check for changes to this document on a regular basis…

More Details On Creating Tables In Power BI/Power Query M Code Using #table()

About two years ago I wrote a blog post describing how the #table M function can be used to generate tables, but in that post I only covered the functionality I used regularly – namely using #table with a list of column names or a table type in the first parameter. However there are two other variations on #table that I have used recently and that I thought were worth pointing out.

For example, if you need to generate a table with a set number of columns but you don’t care what the columns are called, you can use an integer in the first parameter to get a table with that number of columns. The following expression returns a table with four columns of data type Any called Column1, Column2, Column3 and Column4, and no rows:

#table(4,{})

image

Also, if you have a list of lists with an unknown number of items in it, and you want to use each nested list for the row values in a table, you can use a null value in the first parameter of #table. The following expression returns a table with four columns like the one above, but with two rows of integer values:

#table(null, {{1,2,3,4},{2,3,4,5}})

image
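For completeness, here’s a quick example of the two variations covered in that older post – passing a list of column names, or a full table type, in the first parameter. The column names and values below are made up purely for illustration:

// A table with two columns called Month and Sales, and two rows
#table({"Month", "Sales"}, {{"Jan", 100}, {"Feb", 150}})

// The same table, but with a data type set for each column via a table type
#table(
    type table [Month = text, Sales = number],
    {{"Jan", 100}, {"Feb", 150}}
)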

A New Approach To Handling SSRS Multi-Valued Parameters in DAX Queries

Back in 2012 I wrote a blog post explaining how to handle multi-value parameters in DAX queries in Reporting Services reports. The approach I took back then was basically:

  1. Generate a string containing a pipe-delimited list of all the parameter values that the user has selected (I did this in DAX too, but to be honest it’s better to use the SSRS Join function to do this outside the DAX query)
  2. Use the DAX PathContains() function in a Filter() to check whether the value on the current row of the table being filtered appears in the pipe-delimited list

Here’s a deliberately simplified example of how this works based on Adventure Works DW data. The following query filters the FactInternetSales fact table and returns the rows for the Sales Order Numbers that are present in the OrderList variable:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
RETURN
    FILTER (
        FactInternetSales,
        PATHCONTAINS(OrderList, FactInternetSales[SalesOrderNumber])
    )

The trouble with this approach is that it can be very slow. Running a trace in DAX Studio for the query above reveals the problem:

image

The presence of CallbackDataID shows that the Storage Engine is calling the Formula Engine to handle the use of PathContains() in the filter, and this is often a cause of poor query performance. However back when I wrote the post the only alternative was, as Chris Koester points out here, to dynamically generate the entire DAX query as an SSRS expression and that is very painful to do.

The good news is that recent changes in DAX mean that there is another way to tackle this problem that can give much better performance. Here’s an example of this new approach:

EVALUATE
VAR OrderList = "SO43713|SO43758|SO43784|SO43821"
VAR OrderCount =
    PATHLENGTH ( OrderList )
VAR NumberTable =
    GENERATESERIES ( 1, OrderCount, 1 )
VAR OrderTable =
    GENERATE (
        NumberTable,
        VAR CurrentKey = [Value]
        RETURN
            ROW ( "Key", PATHITEM ( OrderList, CurrentKey ) )
    )
VAR GetKeyColumn =
    SELECTCOLUMNS ( OrderTable, "Key", [Key] )
VAR FilterTable =
    TREATAS ( GetKeyColumn, FactInternetSales[SalesOrderNumber] )
RETURN
    CALCULATETABLE ( FactInternetSales, FilterTable )

Broken down variable by variable, here’s how it works:

  1. OrderList is the pipe-delimited list of key values passed from SSRS
  2. OrderCount uses the PathLength() DAX function to find the number of parameter values in this list
  3. NumberTable uses the GenerateSeries() function to create a table of numbers with one row for each number between 1 and the number of parameter values in the list
  4. OrderTable uses the trick Marco describes here to iterate over NumberTable and, for each row, uses the PathItem() function to return one parameter value from the list
  5. GetKeyColumn uses the SelectColumns() DAX function to only return the column from OrderTable that contains the parameter values
  6. FilterTable uses the TreatAs() DAX function to take the table of values returned by GetKeyColumn and treat them as values in the FactInternetSales[SalesOrderNumber] column
  7. Finally, the query returns the contents of the FactInternetSales table filtered by the values in FilterTable using the CalculateTable() DAX function

There’s a lot of extra code here, and in some cases you may find that performance with smaller data volumes is worse as a result, but in this particular case the new approach is twice as fast as the old one. There’s certainly no CallbackDataID:

image

How To Tell Whether Query Folding Is Taking Place When Importing Data From Analysis Services In Power BI And Excel Power Query

As a quick follow-on from last week’s post on how to detect whether query folding is taking place when importing from OData data sources, if you’re importing data from Analysis Services you have a similar problem: how do you know whether query folding is taking place? Ensuring that query folding takes place for as many of the steps in your query as possible – especially those that filter or otherwise reduce the amount of data returned – is very important for data refresh performance.

Although the Power Query engine generates MDX queries when importing from Analysis Services in the same way it generates SQL queries when it imports from a relational database, the View Native Query option doesn’t work for Analysis Services data sources. You can of course use a Profiler trace or xEvents to see the MDX, but for most users that will not be an option for security reasons. The UI does tell you in another way though. If query folding is taking place for a step, then the Cube Tools menu on the ribbon (with the Add Items and Collapse Columns buttons) will be available, and in the top right-hand corner of the table in the results area there will be a cube icon:

image

If query folding is not taking place for a step (even though it might be taking place for previous steps in the query) then the Cube Tools menu will not be visible, and the cube icon will be replaced by a table icon. For example, in the following screenshot an Index column has been added to the query shown above, so query folding is not taking place from this step on:

image

[Thanks to Jure Jaklic for pointing this out]

Troubleshooting Data Refresh Performance Issues With OData Data Sources In Power BI And Excel Using Fiddler

A lot of people have problems with the performance of OData data sources when loading data into Power BI and Excel. One possible cause of these problems is query folding not taking place – if this is the case then the Power Query engine will be requesting more data than is needed and applying any filters itself locally, rather than requesting filtered data from the data source. How do you know whether query folding is taking place or not though? The Power Query Editor UI doesn’t give you this information, unfortunately. Last week, at the Data and BI Summit in Dublin, Matt Masson demonstrated how to use Fiddler to check if query folding is taking place when loading data from an OData data source and he’s very kindly allowed me to write about what he showed here.

The following examples use the UK Parliament’s public OData API which is documented here. The M query below, generated using the Power Query Editor in Power BI Desktop, returns all the rows from the Government Organisation table from the API:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[Name = "GovernmentOrganisation", Signature = "table"]}[Data]
in
    GovernmentOrganisation_table

image

To monitor communication between Power BI Desktop and the OData API you will need to install Fiddler, a free tool from Telerik. You can download it here:

https://www.telerik.com/fiddler

You’ll probably also need to configure Fiddler to intercept https traffic, which you can find out how to do here. It’s a very powerful tool and I am by no means an expert in using it, but even a basic understanding of its features can be very useful for a Power BI developer.

With Fiddler running you can add a filter so that it only shows traffic to certain hosts; in this case I’m filtering to only show traffic to api.parliament.uk:

image

Refreshing the query shown above in the Query Editor results in the following activity being shown in Fiddler:

image

There are several calls to the service root URL, but the important call is to get the top 1000 rows from the GovernmentOrganisation table:

https://api.parliament.uk/odata/GovernmentOrganisation?$top=1000

Notice the use of the $top query option to restrict the number of rows returned – the first example of the Power Query engine pushing a filter back to the data source. This only happens when you refresh the query in the Power Query Editor so you can see a sample of the data; when you click the Close and Apply button and load the query into your dataset you’ll see that this $top filter is not applied and all the rows from the table are requested.

Altering the query to filter the rows down to where the GroupName column equals “Cabinet Office” like so:

image

Results in the following M query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[Name = "GovernmentOrganisation", Signature = "table"]}[Data],
    #"Filtered Rows" =
        Table.SelectRows(
            GovernmentOrganisation_table,
            each ([GroupName] = "Cabinet Office")
        )
in
    #"Filtered Rows"

Fiddler shows the following call that includes the filter:

image

https://api.parliament.uk/odata/GovernmentOrganisation?$filter=GroupName%20eq%20'Cabinet%20Office'&$top=1000

[The Power Query Editor does quite a lot of caching so you may need to click the Refresh button on the ribbon to make sure it actually does call the API]

It can be quite difficult to work out what’s going on in a call like this, so with the highlighted row in the screenshot above selected you can go to the Inspectors tab and then the WebForms sub-tab, and it will show the different query options used:

image

In this case you can see the $filter query option has been used to do the filter, so query folding has taken place in this case and the API is only returning the one row that the query returns:

image

However it is all too easy to do something in your query that prevents query folding from happening. For example if you add an index column to the table before filtering by GroupName, as shown in this query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[Name = "GovernmentOrganisation", Signature = "table"]}[Data],
    #"Added Index" =
        Table.AddIndexColumn(GovernmentOrganisation_table, "Index", 0, 1),
    #"Filtered Rows" =
        Table.SelectRows(#"Added Index", each ([GroupName] = "Cabinet Office"))
in
    #"Filtered Rows"

 

image

…then Fiddler shows that only the $top filter is being applied in the call to the API, not the $filter on GroupName, so query folding is no longer taking place for the filter step:

image

image

Not everything you do in your M queries can or will be folded back to an OData API, but in general you should aim to get query folding to take place on the steps in your query that reduce the amount of data returned by the API the most. Row filters like the one shown above are a prime example of the type of transformation that will need to be folded in order to get the best possible data refresh performance. You may need to experiment with reordering steps and applying transformations in different ways to get the Power Query engine to call the API in the way you want.
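For example, here’s a sketch (my own, not from Matt’s demo) of how the last query above could be rearranged so that the filter on GroupName comes before the index column is added, giving the Power Query engine a chance to fold the filter again:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[Name = "GovernmentOrganisation", Signature = "table"]}[Data],
    // Filter first, while this step can still be translated into an OData $filter...
    #"Filtered Rows" =
        Table.SelectRows(
            GovernmentOrganisation_table,
            each ([GroupName] = "Cabinet Office")
        ),
    // ...then add the index column locally, on the much smaller filtered table
    #"Added Index" =
        Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1)
in
    #"Added Index"

Checking Fiddler again after a change like this should show the $filter query option reappearing in the call to the API.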
