How Much Does Azure Analysis Services Actually Cost?

It might seem strange to write a blog post on how much Azure Analysis Services costs when there is a page on the Microsoft site that very clearly lists all of the SKUs and how much they cost per hour or per month in any currency and Azure region you choose:

https://azure.microsoft.com/en-gb/pricing/details/analysis-services/

The prices you see here are somewhat misleading, though. They’re not wrong, but they do not challenge the assumption that you will use Azure Analysis Services in the same way as an on-premises instance, and so they lead you to make further, incorrect assumptions about cost. A lot of my customers look at the monthly cost and assume that’s what they will actually be paying each month – and more often than not come to the conclusion that Azure Analysis Services is too expensive for them. In fact the situation is a lot more complex, and Azure Analysis Services (especially at the enterprise level) might end up being cheaper than you think. Let me explain why.

The workload of any Analysis Services instance, whether on-premises or in Azure, varies a lot. For example:

  • It will be busy during office hours while users are running queries but much quieter at night when most users have gone home
  • Similarly, it will be busy during the week when most users are working and much quieter on weekends and public holidays
  • It will be busy at certain times, such as month-ends or Black Friday, when more reports need to be run
  • It will be much busier (often with both CPU and memory usage at their peak) when processing is taking place

When you are planning an on-premises deployment of Analysis Services you need to specify your hardware and licensing so as to be able to handle these periods of high usage, even if for most of the time usage is a lot lower. This means that enterprise-level deployments of Analysis Services can be expensive because you need servers with a large number of cores and a lot of RAM and you may also need to use network load-balancing to scale out over several servers.

On the other hand Azure Analysis Services is able to scale up and scale out on demand, and you only pay for what you use. Scaling up means moving to a higher performance level (ie a SKU) within a service tier, or even moving up a service tier. Scaling out means adding replicas of your existing Azure AS instance and database.

Broadly speaking you need to scale up in two scenarios:

  • To handle the need for more memory and more QPUs while you are processing
  • To handle increased data volumes, either as a result of new tables being added to the model or because the size of existing tables has increased over time

You need to scale out when:

  • You need to process during times when other users are running queries, to ensure that query performance is not affected
  • You need to handle an increased number of concurrent users running queries

Bill Anton has an excellent blog post covering this question in a lot more detail here:

http://byobi.com/2017/11/when-to-scale-up-or-scale-out-with-azure-analysis-services/

One other thing to point out is that if you pause an instance of Azure Analysis Services you pay absolutely nothing.

What does this mean for the cost of Azure Analysis Services? Basically, if you’re taking advantage of these features you won’t pay one of the monthly prices quoted on the pricing page linked to at the top of this post. Instead you may do things like:

  • Scale up for one hour every day when you need to process your SSAS database, just to get the extra memory and QPUs needed, then scale down when processing has finished
  • Scale out only on certain days, or certain times of day, to handle increased numbers of users
  • Pause your instance when you are sure that no-one needs to run queries

How do you then calculate the likely cost? For my Azure Analysis Services precon at SQLBits a few months ago I built an Excel workbook that shows how to go about this. First, there is a table with the hourly costs for S-level instances in GBP:

image

Then there is a table with one cell for every hour of every day of the week, containing the performance level required for that hour. In this example an S1 instance is required most of the time, except at midnight every night, when processing takes place and an S2 is needed. For four hours on Sunday morning the instance is paused.

image

Next there is a similar table showing the number of scale-out replicas needed for each hour of each day. In this case scale-out is needed for four hours of the day, Monday to Friday, to handle a larger number of concurrent users:

image

Next, the data from these three tables is brought together to calculate the cost per hour for each day of the week:

image

Finally, for a given month the actual cost per day can be calculated (which of course varies by the number of weekdays, weekends and maybe also public holidays), resulting in the true monthly cost:

image

In this example the monthly cost for January 2018 is £1287.56, which is only slightly more than the £1104.48 you would pay to have an S1 instance for a whole month. You can download a copy of the workbook here.
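The same calculation can also be sketched in M if you prefer to keep it in Power Query rather than Excel. This is a minimal sketch under stated assumptions, not a copy of the workbook’s logic: the hourly prices are hypothetical placeholders rather than real Azure rates, there is a single 24-hour profile for weekdays and another for weekends instead of one per day of the week, and each scale-out replica is assumed to be billed at the same hourly rate as the main instance.

```m
let
    // Hypothetical hourly prices in GBP; replace them with the figures from the pricing page
    // Paused = 0: a paused instance costs nothing
    HourlyPrice = [S1 = 1.5, S2 = 3.0, Paused = 0],
    // Cost of one day, given the performance level and replica count for each of the 24 hours
    DailyCost = (levels as list, replicas as list) as number =>
        List.Sum(
            List.Transform(
                {0..23},
                // Assumption: each replica is billed at the same rate as the main instance
                (h) => Record.Field(HourlyPrice, levels{h}) * (1 + replicas{h}))),
    // Weekdays: S2 at midnight for processing, S1 otherwise, one replica from 09:00 to 12:59
    WeekdayCost = DailyCost(
        {"S2"} & List.Repeat({"S1"}, 23),
        List.Repeat({0}, 9) & List.Repeat({1}, 4) & List.Repeat({0}, 11)),
    // Weekends: S2 at midnight, S1 otherwise, no replicas
    WeekendCost = DailyCost(
        {"S2"} & List.Repeat({"S1"}, 23),
        List.Repeat({0}, 24)),
    // Every day in January 2018, classified as a weekday (Monday to Friday) or weekend day
    Days = List.Dates(#date(2018, 1, 1), 31, #duration(1, 0, 0, 0)),
    IsWeekday = (d as date) as logical => Date.DayOfWeek(d, Day.Monday) < 5,
    MonthlyCost = List.Sum(
        List.Transform(Days, (d) => if IsWeekday(d) then WeekdayCost else WeekendCost))
in
    MonthlyCost
```

With these placeholder prices a weekday costs 43.5 and a weekend day 37.5, so January 2018 (23 weekdays and 8 weekend days) comes to 1300.5; the numbers only illustrate the arithmetic. Public holidays could be handled by treating a list of specific dates as weekend days.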

In summary, my point here is that pricing an implementation of Azure Analysis Services is complex because of its flexibility. I am not saying that Azure Analysis Services is cheap, or cheaper than using Analysis Services on a VM in Azure or on premises – that’s a subject for a completely separate discussion. Hopefully, though, this post gives you a better idea of how much you might pay if you do use Azure Analysis Services for a project.

UPDATE 10th June 2018: the number of query replicas available for scale-out is limited by Azure region, so choosing the right Azure region is very important if you need to scale out. The documentation is here: https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-overview#availability-by-region

An In-Depth Look At The Csv.Document M Function

CSV files are one of the most commonly used data sources in Power BI and Power Query/Get&Transform, and yet the documentation for the Csv.Document M function is very limited and in some cases incorrect. In this rather long post I’ll show you as many of the capabilities of this useful function as I’ve been able to discover.

The Source parameter

The Csv.Document function returns a table, and the first (and only non-optional) parameter of this function is the source data in CSV format. Normally this is a binary value returned by the File.Contents function. For example, take this simple CSV file with no column headers and one row of data:

image

The following M code uses File.Contents to read the contents of the file, and then passes the contents to Csv.Document to be interpreted as a CSV file:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source)
in
    ToCSV

The output is this:

image

However, it is also possible to pass a text value to the first parameter of Csv.Document, for example:

let
    SourceText = "February,Oranges,2",
    ToCSV = Csv.Document(SourceText)
in
    ToCSV

The output of this query is:

image

In both of these examples I’m relying on the default behaviour of the Csv.Document function with regard to delimiters and other properties, which I’ll explain in more detail below.

Using a record in the second parameter

The second parameter of Csv.Document can be used in several different ways. In code generated by the Query Editor UI it usually takes the form of a record, and the different fields in the record specify how the function behaves in different scenarios. For example, if you connect to the CSV file shown above by selecting the Text/CSV source in the Query Editor UI, you’ll see the following window appear showing a preview of the data and three options:

image

This results in the following M query:

let
    Source = 
	Csv.Document(
		File.Contents("C:\CSVTests\SourceFile.csv"),
		[
		 Delimiter=",", 
		 Columns=3, 
		 Encoding=1252, 
		 QuoteStyle=QuoteStyle.None
		]),
    #"Changed Type" = 
		Table.TransformColumnTypes(
			Source,
			{
			{"Column1", type text}, 
			{"Column2", type text}, 
			{"Column3", Int64.Type}
			})
in
    #"Changed Type"

The query above shows the Csv.Document function with a record in its second parameter containing four fields: Delimiter, Columns, Encoding and QuoteStyle. There is also a fifth field that can be added to the record, CsvStyle, but this cannot be set anywhere in the UI.

The Data Type Detection option shown in the screenshot gives you three options for detecting the data types of the columns in your file: by default it looks at the first 200 rows in the dataset, but you can also ask it to look at the entire dataset (which may be slower) or not to detect data types at all, in which case it treats all columns as text. Note that the data types are not set in the Csv.Document function but in the #"Changed Type" step with the Table.TransformColumnTypes function; as we will see later, though, it is possible to set column names and data types in a single step with Csv.Document instead.

The Encoding field

The File Origin dropdown menu shown above corresponds to the Encoding field in the Csv.Document function. This integer value specifies the code page used to encode the contents of the file:

image

In the M query in the previous section the 1252 code page is set explicitly. The following M query sets the 1200 code page (UTF-16), which is incorrect for the CSV file shown above:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,[Encoding=1200])
in
    ToCSV

…with the following result:

image
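If a file was saved as UTF-8 rather than in the Windows 1252 code page, the usual fix is to set Encoding to 65001, the UTF-8 code page. A minimal sketch, assuming a UTF-8 copy of the file exists (the SourceFileUtf8.csv path is hypothetical):

```m
let
    // Hypothetical path to a UTF-8 encoded copy of the file
    Source = File.Contents("C:\CSVTests\SourceFileUtf8.csv"),
    // 65001 is the UTF-8 code page; TextEncoding.Utf8 evaluates to the same number
    ToCSV = Csv.Document(Source, [Delimiter = ",", Encoding = 65001])
in
    ToCSV
```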

The Delimiter field

The Delimiter dropdown allows you to specify the delimiter used to separate the columns in each row of data. There are a number of options available through the UI, including commas and tabs, and the Custom option allows you to enter your own delimiter:

image

If you specify a single-character delimiter here then the Delimiter field of the record in the second parameter of Csv.Document is set; the Custom and Fixed Width options shown here use a different form of the Csv.Document function, described below. If the Delimiter field is not set then a comma is used as the delimiter. If you want to use a special character such as a tab then you need to use an M escape sequence; for example, to use a tab character as the delimiter you need to use the text “#(tab)”, which returns a text value containing just a single tab character.

For example, the following query:

let
    Source = "123a456a789",
    ToCSV = Csv.Document(Source, [Delimiter="a"])
in
    ToCSV

Returns:

image

And this query:

let
    Source = "789#(tab)456#(tab)123",
    ToCSV = Csv.Document(Source, [Delimiter="#(tab)"])
in
    ToCSV

Returns:

image

The Columns field

The Columns field specifies the number of columns in the table returned by Csv.Document, regardless of how many columns are actually present in the source data. For example, the following query:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=3])
in
    ToCSV

…returns a table with three columns:

image

While the following query returns a table with four columns, even though only three columns are present in the data:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=4])
in
    ToCSV

image

And the following query returns a table with only two columns, discarding the third value present in the data:

let
    Source = "a,b,c",
    ToCSV = Csv.Document(Source, [Delimiter=",", Columns=2])
in
    ToCSV

image

The Columns field is not explicitly set by the user when you first connect to a CSV file via the UI, but the UI infers its value from the number of columns it finds in the CSV file. This can be a problem if the number of columns in the source data changes in the future because, as shown above, when the Columns field is set the table returned always has a fixed number of columns. As a result if the number of columns in the data source increases in the future you will find columns on the right-hand side of the table are not returned; similarly if the number of columns decreases you’ll see unwanted empty columns. Several people such as Prathy Kamasani have blogged about this problem and it may be better to delete the Columns field from the record, or not set the Columns field in the first place, in order to avoid it. If you do not set the Columns field then Csv.Document returns a table with the number of columns that are present in the first row of your source data.
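A quick way to check this behaviour is a sketch like the following, which uses a text value instead of a file and a record with no Columns field. Because the first row contains four values, the table returned should have four columns:

```m
let
    SourceText = "a,b,c,d",
    // No Columns field: the column count comes from the first row of the data
    ToCSV = Csv.Document(SourceText, [Delimiter = ","]),
    ColumnCount = Table.ColumnCount(ToCSV)
in
    ColumnCount
```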

The QuoteStyle field

The QuoteStyle field can take two possible values of type QuoteStyle: QuoteStyle.None and QuoteStyle.Csv. Here’s what the built-in documentation has to say about the QuoteStyle type:

image

While the value for QuoteStyle is set automatically when you connect to a file, if you edit a step in the Query Editor that uses Csv.Document you can change this value in the UI in the Line Breaks dropdown shown here:

image

As the screenshot above suggests, this field controls whether line breaks inside text values are respected. For both QuoteStyle.None and QuoteStyle.Csv, if you wrap a text value inside double quotes those double quotes are used to indicate the start and the end of the text value and are not shown in the output; if you want a double quote to appear, you have to double it up. However, if QuoteStyle.None is set then a line break always starts a new row, even if it appears inside double quotes; if QuoteStyle.Csv is set, then a line break inside double quotes is kept as part of the text value instead of starting a new row. Take the following CSV file for example:

image

The following M query, using QuoteStyle.None:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.None])
in
    ToCSV

…returns the following table with two rows in it:

image

Whereas the following M query, using QuoteStyle.Csv:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(Source,[QuoteStyle=QuoteStyle.Csv])
in
    ToCSV

…returns a table with just one row, and a line break present in the text value in the first column:

image
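If you don’t have a test file to hand, the same comparison can be sketched with a text value. In an M text literal a doubled double quote ("") stands for one double quote character and #(lf) for a line feed, so the source below is a quoted value containing a line break, followed by a second value. Based on the behaviour described above, the QuoteStyle.None version should return two rows and the QuoteStyle.Csv version one:

```m
let
    // The text "one<line feed>two",three
    SourceText = """one#(lf)two"",three",
    // The line break always starts a new row
    AsNone = Csv.Document(SourceText, [Delimiter = ",", QuoteStyle = QuoteStyle.None]),
    // The line break inside the double quotes stays part of the first value
    AsCsv = Csv.Document(SourceText, [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
    RowCounts = [None = Table.RowCount(AsNone), Csv = Table.RowCount(AsCsv)]
in
    RowCounts
```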

The CsvStyle field

The final field that can be used, CsvStyle, is also related to quotes. It can take one of two values of type CsvStyle: CsvStyle.QuoteAfterDelimiter and CsvStyle.QuoteAlways.

image

If the CsvStyle field is not set, the default is CsvStyle.QuoteAlways. Consider the following CSV file:

image

Notice that on the second line there is a space after the comma. The following M query:

let
    Source = 
	File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		[CsvStyle=CsvStyle.QuoteAlways])
in
    ToCSV

Returns this, because with CsvStyle.QuoteAlways the space after the comma is not treated as significant:

image

Whereas the following M query:

let
    Source = 
	File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		[CsvStyle=CsvStyle.QuoteAfterDelimiter])
in
    ToCSV

Returns the text “four” in double quotes on the second line, because with CsvStyle.QuoteAfterDelimiter a double quote only starts a quoted value when it immediately follows the delimiter, and here the space after the comma comes in between:

image

 

Using a list or a table type in the second parameter

If the first line of your CSV file contains column headers and you connect to the file using the Query Editor user interface, in most cases this will be detected and an extra step will be added to your query that uses Table.PromoteHeaders to use these values as the column headers. However if you don’t have column headers inside your CSV file, instead of a record it is also possible to supply a list of column names or even better a table type to define the columns present in your CSV file. When you do this Csv.Document has three other parameters that can be used to do some of the same things that are possible if you use a record in the second parameter – Delimiter, ExtraValues and Encoding – and they are described below.

For example, in the following CSV file there are three columns: Month, Product and Sales.

image

Using a list of text values containing these column names in the second parameter of Csv.Document, as in the following M query:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
    ToCSV

Returns the following table:

image

This has set the column names correctly but the data types of the three columns are set to text. What if I know that only the Month and Product columns contain text and the Sales column should be a number? Instead of a list of column names, using a table type allows you to set names and data types for each column:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source,
	type table 
		[#"Month"=text, #"Product"=text, #"Sales"=number])
in
    ToCSV

image

Notice how now the Sales column has its data type set to number.
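For comparison, when the first line of the data does contain column names, the step the Query Editor usually adds can be reproduced by hand with Table.PromoteHeaders. A minimal sketch, again using a text value in place of a file:

```m
let
    // Header row followed by one row of data
    SourceText = "Month,Product,Sales#(lf)January,Apples,1",
    ToCSV = Csv.Document(SourceText, [Delimiter = ","]),
    // Use the values in the first row as column names
    Promoted = Table.PromoteHeaders(ToCSV, [PromoteAllScalars = true]),
    // Csv.Document returns text values, so Sales still needs a numeric type
    Typed = Table.TransformColumnTypes(Promoted, {{"Sales", Int64.Type}})
in
    Typed
```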

The Delimiter parameter

If you have used a list of column names or a table type in the second parameter of Csv.Document, you can use the third parameter to control how each row of data is split up into columns. There are two ways you can do this.

First of all, you can pass any piece of text to the third parameter to specify a delimiter. Unlike the Delimiter field in the record described above, this can be a single character or multiple characters. For example, the following M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second"},"c")
in
    ToCSV

Returns:

image

And the following M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second"},"cd")
in
    ToCSV

Returns:

image

Instead of text, the Delimiter parameter can also take a list of integer values to allow you to handle fixed-width files. This functionality is available from the UI when you choose the Fixed Width option from the Delimiter dropdown box when you connect to a CSV file for the first time:

image

Each integer in the list represents the number of characters from the start of the row that marks the start of each column; as a result, each integer in the list has to be larger than the preceding integer. The values are 0-based, so 0 marks the start of a row. For example, the M query:

let
    Source = "abcdefg",
    ToCSV = Csv.Document(Source,{"first","second","third"},{0,3,5})
in
    ToCSV

Returns:

image
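Real fixed-width files usually pad shorter values with spaces. Here is a sketch with made-up data that splits two padded rows at positions 0, 5 and 11, trims the text columns in case the values keep their padding, and then sets a type for Sales:

```m
let
    // Two fixed-width rows: Month in characters 0-4, Product in 5-10, Sales from 11
    SourceText = "Jan  Apples10#(lf)Feb  Pears 20",
    ToCSV = Csv.Document(SourceText, {"Month", "Product", "Sales"}, {0, 5, 11}),
    // Remove any padding spaces left in the text columns
    Trimmed = Table.TransformColumns(ToCSV, {{"Month", Text.Trim}, {"Product", Text.Trim}}),
    Typed = Table.TransformColumnTypes(Trimmed, {{"Sales", Int64.Type}})
in
    Typed
```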

 

The ExtraValues parameter

The ExtraValues parameter allows you to handle scenarios where there are extra columns on the end of lines. This isn’t quite as useful as it sounds though: most of the time when the number of columns varies in a CSV file it’s because there are unquoted line breaks in text columns, in which case you should make sure your source data always wraps text in double quotes and use the QuoteStyle option described above, or if you can’t fix your data source, see this post.

The ExtraValues parameter can take one of three values of type ExtraValues: ExtraValues.List, ExtraValues.Ignore and ExtraValues.Error.

image

Consider the following CSV file with two extra columns on the second row:

image

The following query reads data from this file:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(Source,{"Month","Product","Sales"})
in
    ToCSV

As you can see from the screenshot below, because we have specified that there are three columns in the table, the error “There were more columns in the result than expected” is returned for each cell on the second line:

image

The same thing happens when ExtraValues.Error is explicitly specified in the fourth parameter, like so:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales"},
		",",
		ExtraValues.Error
	)
in
    ToCSV

If you set ExtraValues.Ignore instead, though:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales"},
		",",
		ExtraValues.Ignore
	)
in
    ToCSV

The extra columns are ignored and no errors are returned:

image

Setting ExtraValues.List allows you to capture any extra column values in a list; however, if you want to do this you will need to add an extra column to your table to hold these values. For example, notice in this query that four columns rather than three have been defined:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = 
	Csv.Document(
		Source,
		{"Month","Product","Sales","Extra Columns"},
		",",
		ExtraValues.List)
in
    ToCSV

The output looks like this:

image

On the first and third rows the Extra Columns column contains an empty list. On the second row, however, the Extra Columns column contains a list containing two values – the two values from the two extra columns on that line.
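Once the extra values are captured, the list column can be processed like any other. A sketch, using a made-up text value shaped like the file above, that adds a column counting the extra values on each row:

```m
let
    // The second row has two more values than the three data columns
    SourceText = "January,Apples,1#(lf)February,Oranges,2,x,y#(lf)March,Pears,3",
    ToCSV = Csv.Document(
        SourceText,
        {"Month", "Product", "Sales", "Extra Columns"},
        ",",
        ExtraValues.List),
    // Count the values captured in the list on each row
    WithCount = Table.AddColumn(
        ToCSV, "Extra Count", each List.Count([Extra Columns]), Int64.Type)
in
    WithCount
```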

The Encoding parameter

The Encoding parameter corresponds directly to the Encoding field used when you pass a record to the second parameter, as described above. The one difference is that it can take an integer or a value of type TextEncoding, although the TextEncoding data type only contains values for some of the more common code pages so the only reason to use it would be for readability:

image

As a result, the following two M queries:

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source, 
	{"Month","Product","Sales"},
	",",
	ExtraValues.Ignore,
	TextEncoding.Windows
	)
in
    ToCSV

let
    Source = File.Contents("C:\CSVTests\SourceFile.csv"),
    ToCSV = Csv.Document(
	Source, 
	{"Month","Product","Sales"},
	",",
	ExtraValues.Ignore,
	1252
	)
in
    ToCSV

…return the same result.

What about CsvStyle and QuoteStyle?

If you specify a list of column names or a table type in the second parameter of Csv.Document there’s no way to set CsvStyle or QuoteStyle – these options are only available when you use a record in the second parameter. The behaviour you get is the same as CsvStyle.QuoteAlways and QuoteStyle.Csv, so with the following source data:

image

This M query:

let
    Source = File.Contents("C:\CSVTests\SourceFileWithQuotes.csv"),
    ToCSV = Csv.Document(
		Source, 
		{"Month","Sales"},
		",",
		ExtraValues.Ignore,
		1252)
in
    ToCSV

returns:

image

Filtering Data Loaded Into A Workspace Database In Analysis Services Tabular 2017 And Azure Analysis Services

The first mistake that all new Analysis Services Tabular developers make is this one: they create a new project in SSDT, they connect to their source database, they select the tables they want to work with, they click Import, and they then realise that trying to load a fact table with several million rows of data into their Workspace Database (whether that’s a separate Workspace Database instance or an Integrated Workspace) is not a good idea when they either end up waiting for several hours or SSDT crashes because it has run out of memory. You of course need to filter your data down to a manageable size before you start developing in SSDT. Traditionally, this has been done at the database level, for example using views, but modern data sources in SSAS 2017 and Azure Analysis Services allow for a new approach using M.

Here’s a simple example of how to do this using the Adventure Works DW database. Imagine you are developing a Tabular model and you have just connected to the relational database, clicked on the FactInternetSales table and clicked Edit to open the Query Editor window before importing. You’ll see something like this:

image

…that’s to say there’ll be a single query visible in the Query Editor with the same name as your source table. The M code visible in the Advanced Editor will be something like this:

let
    Source = 
	#"SQL/localhost;Adventure Works DW",
    dbo_FactInternetSales = 
	Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

At this point the query is importing all of the data from this table, but the aim here is to:

  1. Filter the data down to a much smaller number of rows for the Workspace Database
  2. Load all the data in the table after the database has been deployed to the development server

To do this, stay in the Query Editor and create a new Parameter by going to the menu at the top of the Query Editor and clicking Query/Parameters/New Parameter, and creating a new parameter called FilterRows of type Decimal Number with a Current Value of 10:

image

The parameter will now show up as a new query in the Queries pane on the left of the screen:

image

Note that at the time of writing there is a bug in the Query Editor in SSDT which means that when you create a parameter, close the Query Editor, then reopen it, the parameter is no longer recognised as a parameter – it is shown as a regular query that returns a single value with some metadata attached. Hopefully this will be fixed soon, but it’s not a massive problem for this approach.

Anyway, with the parameter created you can now use the number that it returns to filter the rows in your table. You could, for example, decide to implement the following logic:

  • If the parameter returns 0, load all the data in the table
  • If the parameter returns a value larger than 0, interpret that as the number of rows to import from the table

Here’s the updated M code from the FactInternetSales query above to show how to do this:

let
    Source = 
	#"SQL/localhost;Adventure Works DW",
    dbo_FactInternetSales = 
	Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
    FilterLogic = 
	if 
		FilterRows<=0 
	then 
		dbo_FactInternetSales 
	else 
		Table.FirstN(dbo_FactInternetSales, FilterRows)
in
    FilterLogic 

The FactInternetSales query will now return just 10 rows because the FilterRows parameter returns the value of 10:

image

And yes, query folding does take place for this query.

You now have a filtered subset of rows for development purposes, so you can click the Import button and carry on with your development as usual. Only 10 rows of data will be imported into the Workspace Database:

image

What happens when you need to deploy to development though?

First, edit the FilterRows parameter so that it returns the value 0. To do this, in the Tabular Model Explorer window, right-click on the Expressions folder (parameters are classed as Expressions, ie queries whose output is not loaded into Analysis Services) and select Edit Expressions:

image

Once the bug I mentioned above has been fixed it should be easy to edit the value that the parameter returns in the Manage Parameters pane; for now you need to open the Advanced Editor window by clicking the button shown below on the toolbar, and then edit the value in the M code directly:

image
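For reference, the M for a parameter like FilterRows typically looks like the expression below in Power BI Desktop; what you see in SSDT may differ slightly, so treat this as a sketch. The value before meta is the one to edit, and the record after it is the metadata that marks the query as a parameter:

```m
// 0 = load every row; any positive number = load only that many rows
0 meta [IsParameterQuery = true, Type = "Number", IsParameterQueryRequired = true]
```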

Then close the Advanced Editor and click Import. Nothing will happen now – the data for FactInternetSales stays filtered until you manually trigger a refresh in SSDT – and you can deploy to your development server as usual. When you do this, all of the data will be loaded from the source table into your development database:

image

At this point you should go back to the Query Editor and edit the FilterRows parameter so that it returns its original value, so that you don’t accidentally load the full dataset next time you process the data in your Workspace Database.

It would be a pain to have to change the parameter value every time you wanted to deploy, however, and luckily you don’t have to do this if you use BISM Normalizer – a free tool that all serious SSAS Tabular developers should have installed. One of its many features is the ability to do partial deployments, and if you create a new Tabular Model Comparison (see here for detailed instructions on how to do this) it will show the differences between the project and the version of the database on your development server. One of the differences it will pick up is the difference between the value of the parameter in the project and in the development database, and you can opt to Skip updating the parameter value when you do a deployment from BISM Normalizer:

image

Upcoming Power BI and Azure SSAS training in the UK, Israel and the Netherlands

If you have some training budget to spare why not come along to one of the pre-conference seminars/training courses I have coming up in the next few months? Specifically:

  • Introduction to Azure Analysis Services – a pre-conference seminar on Thursday 22nd February at SQLBits 2018 in London. You’ll learn about what Azure Analysis Services is, why you should use it and how to build Tabular models for it. More details and registration here.
  • Power BI for analysts and developers – a three-day course covering the whole Power BI development lifecycle in Herzliya, Israel, on March 12th-14th. Full agenda and registration here.
  • Loading and transforming data in Power BI and Power Query – a pre-conference seminar on Friday 23rd March before the Dutch Power BI User Day in Utrecht. It will cover everything you need to know about the Query Editor in Power BI and Excel, going from the basics to more advanced topics such as parameters, functions and the M language. Full agenda here.

Incidentally, if you’re in London for SQLBits don’t forget to come to the London Power BI User Group meeting on Wednesday 21st February: I’m doing a panel discussion with various other Power BI folks. I’m also going to be speaking at the Dublin Data and BI Summit (run by the Power BI User Group) at the end of April, and at several other user groups later in the year; follow me on Twitter (I’m now @cwebb_bi) to find out the details when they’re available. I hope to see you at one of these events!

Setting SQL Server Connection String Properties In Power BI and SSAS Tabular Modern Data Sources

It may not be immediately obvious, but you cannot set your own connection string properties when connecting to SQL Server using the built-in SQL Server connector from either Power BI or a modern data source in Azure SSAS/SSAS Tabular 2017:

image

All you can do is configure the options that are available in the UI, which in the current version of SSDT looks like this:

image

…and which are documented in the reference for the Sql.Databases() M function here.

It turns out that the restriction on using your own connection string properties in the built-in SQL Server connector is a deliberate design decision on the part of the Power Query team because, behind the scenes, they use different providers in different circumstances to optimise performance, and because allowing arbitrary connection string properties might make maintaining backwards compatibility difficult in the future.

While your average Power BI user is unlikely to even notice this, for SSAS Tabular developers it could be a big problem: complete control over the connection string is often necessary in enterprise BI scenarios. What are the alternatives then? Well you can use the OLE DB and ODBC connectors instead:

image

Both of these connectors do allow you to set your own connection string properties. For example here’s the UI for a new ODBC connection in SSDT:

image 

The documentation for the Odbc.DataSource and OleDb.DataSource M functions has more detail on how these connectors can be used and how connection string properties can be set. Remember also that the OLE DB Provider for SQL Server was un-deprecated in October 2017.

However, apart from possible performance differences between the two (which you should test yourself – Henk van der Valk wrote a good post on this for SSAS MD and most of what he said is relevant for Tabular) there’s one less-than-obvious difference between these two options: the OLE DB connector does not appear to support query folding right now whereas the ODBC connector does. Of course this isn’t an issue if you’re writing your own SQL queries to import data, but if you do want to use M functions for partitioning (as I show here) you’re likely to get very poor performance with the OLE DB connector.
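As a sketch of what the ODBC route looks like in M, here is a connection that passes its connection string as a record, so you can add properties the SQL Server connector does not expose. The driver name, server, database and the ApplicationIntent property are example values only; use whichever driver is installed on your machine and the properties you actually need:

```m
let
    // Each field in the record becomes a key=value pair in the ODBC connection string
    Source = Odbc.DataSource(
        [
            Driver = "ODBC Driver 13 for SQL Server",
            Server = "localhost",
            Database = "Adventure Works DW",
            Trusted_Connection = "yes",
            // Example of a property with no equivalent in the SQL Server connector UI
            ApplicationIntent = "ReadOnly"
        ],
        [HierarchicalNavigation = true])
in
    Source
```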

Using Your Own SQL Queries For Tables With Modern Data Sources In SSAS 2016+ And Azure Analysis Services

While the integration of the Power Query engine into Analysis Services Tabular 2017 and Azure Analysis Services with modern data sources will certainly bring a lot of benefits, I think it’s fair to say that the implementation has not been entirely painless. One problem is that it is no longer obvious how to specify your own SQL query to populate a table or partition in your Tabular model – and while the Query Editor is great, there are a lot of cases where this is necessary. In this post I’ll show you how to do this.

If you’re used to using the Power Query UI in Excel or Power BI Desktop, you’ll notice that when you connect to a SQL Server database using the SQL Server connector in SSDT:

[Screenshot: choosing the SQL Server connector in SSDT]

…there is no option to enter your own SQL query when you do so:

[Screenshot: SQL Server connection dialog in SSDT, with no option to enter a SQL query]

This is deliberate. In Analysis Services, unlike Power BI and Excel, there is a distinction made between data sources and other M queries that return data from those data sources, one that makes a lot of sense in my opinion. While it is possible to enter your own SQL for other data source types, such as OLE DB connections, a data source object is really intended just to define a connection to a data source and not to define what data you want from that data source.

[You may also notice that there’s a “SQL statement” property on a SQL Server data source visible in the Visual Studio properties pane, but I don’t recommend you use it – it doesn’t seem to work well with the rest of the SSDT/Power Query UI]

To import a table or view in your database all you have to do is right-click on your data source and choose Import New Tables; my blog post from September last year describes how to do this, and how to use M functions for creating partitions.

To use your own SQL queries though you need to write some M code. First, import a table – any table, but preferably a small one – and get to the Query Editor UI. In this case I’ve imported the DimDate table from the Adventure Works DW database:

[Screenshot: the DimDate table in the Query Editor]

Next, select your query in the Queries pane on the left-hand side of the screen and open the Advanced Editor either by clicking on the relevant button in the toolbar (shown above) or by right-clicking on the query name in the Queries pane. You’ll see the following dialog:

[Screenshot: the Advanced Editor dialog]

The M code will be something like this:

let
    Source = #"SQL/localhost;Adventure Works DW",
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

In this example the Source step creates a reference to the data source you have already created, and the dbo_DimDate step gets the contents of the DimDate table from this data source.

You can modify this code to use your own SQL by using the Value.NativeQuery() function (which I have blogged about before):

let
    Source = #"SQL/localhost;Adventure Works DW",
    // Run a hand-written SQL query against the data source referenced by Source
    MyQuery =
        Value.NativeQuery(
            Source,
            "SELECT DISTINCT FiscalYear FROM DimDate"
        )
in
    MyQuery

Here what I’ve done is replaced the dbo_DimDate step in the previous query with a step called MyQuery that uses Value.NativeQuery() to run my own SQL.
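Value.NativeQuery() also takes an optional third argument containing parameter values. This can be tidier than concatenating values into the SQL text. The sketch below assumes the SQL Server connector in your version of SSAS/SSDT supports named parameters in this way. The column list and the value 2008 are arbitrary examples.

```m
let
    Source = #"SQL/localhost;Adventure Works DW",
    // Same technique as above, but the fiscal year is passed as a named parameter
    // (@Year in the SQL, Year in the record) instead of being hard-coded in the SQL
    MyQuery =
        Value.NativeQuery(
            Source,
            "SELECT DateKey, FullDateAlternateKey, FiscalYear FROM dbo.DimDate WHERE FiscalYear = @Year",
            [Year = 2008]
        )
in
    MyQuery
```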

[Screenshot: the modified query in the Query Editor]

Now all you need to do is click Import, and the output of the query is loaded into SSAS. It would be nice if there were UI support for using your own SQL queries when importing data in the future. Note that, as soon as you use this method, any other steps or queries further downstream will not be able to perform query folding. You should therefore do as much of your filtering and transformation in the SQL as possible; otherwise you may encounter performance problems.
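The following sketch contrasts the two approaches, using the same data source and DimDate table as above. The filter condition is an arbitrary example. Both queries return the same rows, but only the second one asks SQL Server to do the filtering.

```m
let
    Source = #"SQL/localhost;Adventure Works DW",
    // Less efficient: the native query returns every row of DimDate, and this
    // filter runs in the Power Query engine because it cannot fold back to SQL
    AllDates = Value.NativeQuery(Source, "SELECT * FROM dbo.DimDate"),
    FilteredLocally = Table.SelectRows(AllDates, each [FiscalYear] >= 2008),
    // Better: put the filter inside the SQL so SQL Server does the work
    FilteredInSql = Value.NativeQuery(
        Source,
        "SELECT * FROM dbo.DimDate WHERE FiscalYear >= 2008"
    )
in
    FilteredInSql
```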

The documentation describes a similar, but not identical, workflow for achieving the same result. Personally, I think it’s counter-intuitive that you should click on Expressions to create a Table object: Expressions are used for functions and other M code that is shared by the M queries used by Tables.

An alternative to doing all this is to go back to the old way of doing things and use a legacy data source rather than a modern data source in SSDT. You lose the ability to use the Query Editor and M if you do this, but in a lot of cases you probably won’t care. The 17.4 release of SSDT for Visual Studio 2015, released in December 2017, exposes an option that makes it easy to create legacy data sources again. In Visual Studio, go to the Tools menu and select Options, then in the Options dialog go to Analysis Services Tabular/Data Import and check “Enable legacy data sources”:

[Screenshot: the “Enable legacy data sources” option]

When you do this, you’ll notice two new options when you right-click on Data Sources in the Tabular Model Explorer pane: Import From Data Source (Legacy) and Existing Connections (Legacy).

[Screenshot: the legacy data source options on the Data Sources right-click menu]

This gives you access to the Table Import wizard that was available in previous versions of Analysis Services Tabular, which not only allows you to enter your own SQL but also creates a legacy data source that in turn makes it easy to use your own SQL when creating partitions.

In-Person Power BI And Azure Analysis Services Training In London

I’m pleased to announce two in-person training dates in London next year.

First of all, I’ll be running a three-day “Introduction to Power BI” course from January 29th to January 31st 2018. It’s suitable for BI pros, analysts and anyone who needs to use Power BI to build reports, and I’ll be covering data loading, data modelling, a bit of M, a lot of DAX, report design, publishing, security and administration. Full details and registration can be found here:

http://technitrain.com/coursedetail.php?c=84&trackingcode=CWB

Not long after, on February 22nd 2018, I’ll be teaching a full-day preconference seminar at SQLBits in London on Azure Analysis Services. The agenda can be found here:

http://sqlbits.com/information/event17/Introduction_to_Azure_Analysis_Services/trainingdetails.aspx

…but, to be honest, there’s likely to be a lot of cool new functionality released for Azure SSAS between now and then so that will all have to be fitted in too. SQLBits is, of course, the biggest SQL Server/Microsoft data platform conference in Europe, ridiculously good value for money and loads of fun. Pricing and registration details can be found here:

http://sqlbits.com/information/pricing

UPDATE: the SQLBits team wanted me to make a video to promote my precon, so I outsourced the job to my daughter. Here’s the result:
