While the documentation on how to import data from Azure Data Lake Gen2 Storage into Power BI is pretty detailed, the connector (which at the time of writing is in beta) that supports this functionality in the Power Query engine has some useful functionality that isn’t so obvious. If you look at the built-in documentation on the AzureStorage.DataLake M function in the Power Query Editor you’ll see there are a lot of options that aren’t in the documentation on the web yet:
These options are:
BlockSize: the number of bytes to read before waiting on the data consumer. The default value is 4MB.
RequestSize: the number of bytes to read in a single HTTP request to the server. The default value is 4MB.
ConcurrentRequests: The ConcurrentRequests option supports faster download of data by specifying the number of requests to be made in parallel, at the cost of memory utilization. The memory required is (ConcurrentRequest * RequestSize). The default value is 16.
HierarchicalNavigation: A logical (true/false) that controls whether the files are returned in a tree-like directory view or in a flat list. The default value is false.
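To get a feel for the memory formula quoted for ConcurrentRequests, here is the arithmetic with the default values. This is only a sketch in Python rather than M, and it assumes that "4MB" means 4 × 1024 × 1024 bytes, which the built-in documentation doesn't spell out; the function name is made up for illustration.

```python
MB = 1024 * 1024  # assumption: "MB" here means 1024 * 1024 bytes


def download_buffer_bytes(concurrent_requests: int = 16, request_size: int = 4 * MB) -> int:
    """Memory needed for parallel downloads: ConcurrentRequests * RequestSize."""
    return concurrent_requests * request_size


default_bytes = download_buffer_bytes()
print(default_bytes // MB, "MB")  # 64 MB with the default values
```

So raising ConcurrentRequests or RequestSize trades memory for download speed in direct proportion.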
All of these options deserve more detailed examination, but in this post I’m going to focus on the HierarchicalNavigation property.
Say you have the following set of files and folders in ADLSGen2: at the root level there’s a csv file called SimpleSales.csv and a folder called ParentFolder; inside ParentFolder there’s a folder called ChildFolder; and inside ChildFolder there’s another csv file called SimpleSales2.csv.
When you first connect in the Power Query Editor you’ll see a table that looks like this (there are some other columns but I’ve removed them to make the screenshot legible):
In this table there are two rows, one for each csv file, and a Folder Path column that shows where each file sits within the folder structure. Here’s the M code for this query:
If you alter this to use the HierarchicalNavigation option, like so:
…you’ll see a different table is returned by the query:
In this case the two rows show ParentFolder and SimpleSales.csv; if you click on the Table link in the first row of the Content column you can drill down to ChildFolder; if you click on the Table link for ChildFolder you’ll see SimpleSales2.csv:
If you have a large number of files and folders in ADLSGen2 this way of viewing them is likely to be much easier to work with, I think.
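The difference between the two views can be modelled outside Power Query. The sketch below is Python, not M, and says nothing about how the connector is implemented: it just takes the two example file paths as a flat list and nests them into folders, with a hypothetical build_tree helper.

```python
def build_tree(paths):
    """Turn a flat list of file paths into nested dicts (folders), with None marking files."""
    root = {}
    for path in paths:
        *folders, file_name = path.split("/")
        node = root
        for folder in folders:
            # Create the folder on first sight, then step into it
            node = node.setdefault(folder, {})
        node[file_name] = None
    return root


# Flat view: one entry per file, with the folder path carried alongside
flat_list = ["SimpleSales.csv", "ParentFolder/ChildFolder/SimpleSales2.csv"]

# Hierarchical view: the root only holds SimpleSales.csv and ParentFolder
tree = build_tree(flat_list)
print(tree)
```

In the nested form you have to drill down through ParentFolder and ChildFolder to reach SimpleSales2.csv, which mirrors clicking on the Table links.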
Just a quick post to let you know that an interview I recorded for the BIFocal Show podcast at the Microsoft Business Applications Summit in Atlanta a few months ago is now available for your listening pleasure:
John White and Jason Himmelstein, like less-furry versions of Paginated Report Bear, speak to all the top people in the world of Microsoft BI (they spoke to Marco Russo last week) so I highly recommend subscribing if you don’t do so already.
In my last post I showed lots of examples of how Power BI’s new custom format string feature can be used to format numbers. This post, looking at dates and times, will be a bit different for two reasons: there are a lot more useful examples of custom date and time formats built into Power BI Desktop, and some of the format placeholders listed in the VBA documentation aren’t supported in Power BI. As a result I’m going to concentrate on some useful formats that aren’t covered well by the examples and highlight a few things that aren’t possible right now.
Here’s the table containing the sample data, in the form of date/time values, for my examples:
[Note that the dates shown above are formatted in dd/mm/yyyy format]
The reason I’ve used date/time values for my examples is that they can be used to demonstrate formats for values of data type date and data type time, as well as data type date/time. As with my previous post I’m going to create a series of measures to show the effects of different format strings, each with the same DAX definition:
Eg1 = SELECTEDVALUE(Examples[DateAndTime])
With the default format of :
…applied, here’s what the output looks like in a Power BI report:
Let’s start by looking at date formats. The first thing to point out is that you can format a date/time so it only shows the date part and not the time and vice versa. For example, applying the custom format string:
…where dd is day number, mm is month number and yyyy is a four digit year, gives you:
If you’re American and want your months to come before your days you can simply swap the dd and the mm, for example with the format string:
You’re not forced to use a / as your separator in a date; in fact you can use any character. For example, the custom format:
If you don’t want a leading zero in front of your day or month number you can use a single d or m, and if you want a two-digit year you can use yy instead of yyyy. So, for example:
You can add full day names and month names using dddd and mmmm, so the format:
dddd dd mmmm yyyy
You can also get abbreviated day and month names using ddd and mmm, so:
ddd dd mmm yyyy
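If you know Python better than VBA-style format strings, the date placeholders above have rough strftime equivalents. The mapping below is my own illustration, not something Power BI exposes, and the day and month names Python returns depend on the locale (English by default).

```python
from datetime import datetime

# Hypothetical mapping from the format strings discussed above to Python strftime codes
STRFTIME_EQUIVALENTS = {
    "dd/mm/yyyy": "%d/%m/%Y",            # two-digit day, two-digit month, four-digit year
    "mm/dd/yyyy": "%m/%d/%Y",            # month before day
    "dddd dd mmmm yyyy": "%A %d %B %Y",  # full day and month names
    "ddd dd mmm yyyy": "%a %d %b %Y",    # abbreviated day and month names
}


def format_like(value, format_string):
    """Format a datetime using the strftime analogue of one of the format strings above."""
    return value.strftime(STRFTIME_EQUIVALENTS[format_string])


sample = datetime(2019, 11, 11, 15, 15, 15)
for fmt in STRFTIME_EQUIVALENTS:
    print(fmt, "->", format_like(sample, fmt))
```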
Last of all, for dates, if you want to make it clear that you’re not showing really, really, really old data you can put a g on the end of your date format like so:
…and you get the following:
Although the VBA documentation talks about showing day number of week, week number and quarter number, I haven’t found a way of making that work in Power BI (although I may have missed something and it may be possible).
Times are a bit more straightforward. The main placeholders are hh for hours, mm or nn for minutes and ss for seconds. So for example:
…both give you times formatted using the 24-hour clock as follows:
If you prefer to use the 12-hour clock you can add AM/PM onto the end of your format string, so:
The VBA documentation has several variants on AM/PM with slightly different outputs, but this is the only format that I could make work in Power BI.
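For comparison, here is the same 24-hour versus 12-hour distinction expressed with Python's strftime. It is an analogue only: the strftime codes are Python's, and the exact text Power BI produces (for example whether the hour has a leading zero) may differ.

```python
from datetime import datetime

sample = datetime(2019, 11, 11, 15, 15, 15)

# 24-hour clock: hours, minutes, seconds
twenty_four_hour = sample.strftime("%H:%M:%S")

# 12-hour clock: the same time with an AM/PM marker on the end
twelve_hour = sample.strftime("%I:%M:%S %p")

print(twenty_four_hour)
print(twelve_hour)
```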
The very last thing to mention is that, at the time of writing, although Power BI can store times with millisecond values there is no way to make milliseconds appear in a formatted time or date/time. For example in all the screenshots above there are two rows displaying 11/11/2019 15:15:15; they appear as different rows in the table because the millisecond values for each are different, but there’s no way of formatting these values to show that they are different. Hopefully this will be rectified soon; in the meantime you will need to store the millisecond part of any time or date/time separately in a different column in your dataset if you want to display it.
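The workaround of storing milliseconds in a separate column amounts to splitting each value in two before it is loaded. Here is a minimal sketch of that split in Python; the millisecond values are made up, and in practice you would do the equivalent in your data source or in Power Query.

```python
from datetime import datetime


def split_milliseconds(value):
    """Return (value truncated to whole seconds, millisecond part as an integer)."""
    return value.replace(microsecond=0), value.microsecond // 1000


# Two hypothetical values that only differ in their milliseconds
first = datetime(2019, 11, 11, 15, 15, 15, 123000)
second = datetime(2019, 11, 11, 15, 15, 15, 456000)

for value in (first, second):
    whole_seconds, milliseconds = split_milliseconds(value)
    print(whole_seconds, milliseconds)
```

The two whole-second parts are identical, so only the separate millisecond column tells the rows apart.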
You can download the example .pbix file for this post here.
Now that we can apply custom format strings to fields and measures in Power BI in the September 2019 release, I thought it would be useful to provide some examples of what’s possible with this very flexible new feature because the existing documentation for VBA isn’t easy to make sense of. In fact there’s so much to say I’m going to have to write a series of blog posts to cover everything! In this first post I’m going to look at formatting numbers.
First of all, here’s the source data I’m going to use for my examples:
I’m going to create a whole series of identical measures defined like this:
SalesEg1 = SUM('ExampleTable'[Sales])
…and apply different custom format strings to each one so you can compare the output in a Power BI table visual. For reference, here’s what a blank custom format gives you with this measure:
Let’s start with the basics of formatting numeric values. The first thing to point out is that custom format strings are built up using a series of placeholder characters that allow you to control things like thousands separators, the number of decimal places, whether digits are displayed in a placeholder and so on.
Setting the number of decimal places
As you can see in the screenshots above, two of the values have four decimal places but by default only two decimal places are shown. To always show three decimal places, use the following format string:
Here’s the result:
In this case the 0 is a placeholder for a digit that must always be displayed and the . is the decimal separator; three 0s after the . means you always get three decimal places for non-blank numeric values.
You may have noticed in the last screenshot that all numbers show three decimal places, even the value for Pears and the Total. If you don’t want the decimal places to appear – or indeed you don’t want a digit to appear in a particular place if it’s a zero – you can use a # character as a placeholder instead. The following format string:
…always shows a zero before the decimal separator, but will only show the decimal places if they aren’t zeroes:
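The difference between the 0 and # placeholders can be mimicked in Python. This is a sketch of the behaviour described above, not of how Power BI implements it; the function names are hypothetical, and dropping the decimal separator when no decimals remain is my own choice that Power BI may not share.

```python
def always_three_decimals(value):
    """Like three 0 placeholders after the separator: always show three decimal places."""
    return f"{value:.3f}"


def optional_decimals(value, max_places=3):
    """Like # placeholders after the separator: show decimal digits only when they are not trailing zeroes."""
    text = f"{value:.{max_places}f}".rstrip("0")
    # Assumption: also drop the separator when no decimal digits are left
    return text.rstrip(".")


for number in (10, 1.5, 0.125):
    print(always_three_decimals(number), optional_decimals(number))
```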
If you want to display a thousands separator in your numbers you can use a comma placeholder in your format string, like so:
If you have values that you want to display as percentages, you can use the % placeholder as follows:
Notice that two things have happened here:
A percentage sign has been added to the end of each value
The values appear to have been multiplied by 100. They actually haven’t, but the percentage format makes them look as though they have been. Any calculations that reference this measure will still get the unmultiplied value as you would expect.
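Python's own percentage formatting behaves in the same way, which makes the point easy to demonstrate: the scaling by 100 only happens in the displayed text. The value below is made up for illustration.

```python
stored_value = 0.25  # hypothetical measure value

# The % presentation type multiplies by 100 for display only
displayed = f"{stored_value:.0%}"

print(displayed)         # 25%
print(stored_value * 2)  # calculations still use the unmultiplied value: 0.5
```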
If you want currency symbols to appear in your format string you can just add them in either before or after the main part of your format string. For example to put a UK pound sign in a format string you can use the following:
Different formats for positive values, negative values and zeroes
If you need to format positive values, negative values and zeroes differently, you can add up to three different sections to your custom format string, separated by semicolons, as follows:
In this case notice how the positive values have one decimal place, the negative value has three decimal places and the zero has no decimal places. In Analysis Services Multidimensional it used to be possible to add a fourth section to format blanks/nulls, but that does not seem to work here unfortunately…
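The way a section is chosen can be sketched in a few lines of Python. The format string here is a hypothetical one picked to match the description above, the one- and two-section rules follow what the VBA documentation says (I'm assuming Power BI behaves the same way), and the naive split ignores escaped or quoted semicolons.

```python
def pick_section(format_string, value):
    """Choose which section of a multi-section format string applies to a value."""
    sections = format_string.split(";")
    if value < 0 and len(sections) >= 2:
        return sections[1]  # second section: negative values
    if value == 0 and len(sections) >= 3:
        return sections[2]  # third section: zeroes
    return sections[0]      # first section: positive values, and anything not covered above


example_format = "0.0;-0.000;0"  # hypothetical format string
for number in (12.5, -3, 0):
    print(number, "->", pick_section(example_format, number))
```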
Formatting negative values with parentheses
A common requirement in financial reporting is to format negative values with parentheses (round brackets) instead of a minus sign, and that’s possible with custom format strings. For example:
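As a point of comparison, this is what the same convention looks like when you have to code it by hand. It is a Python sketch with a hypothetical function name; a thousands separator and two decimal places are assumed just to make the output look like a financial report.

```python
def accounting_format(value, places=2):
    """Format a number with a thousands separator, wrapping negatives in parentheses."""
    text = f"{abs(value):,.{places}f}"
    return f"({text})" if value < 0 else text


for number in (1234.5, -1234.5, 0):
    print(accounting_format(number))
```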
You can escape individual characters in your format string by preceding them with a \ placeholder. Say you wanted a # to actually appear in your formatted output and not have it considered as a placeholder, you could use the following:
You can also include whole chunks of text by putting them in double quotes, like so:
That’s enough for today; tune in for my next post with even more examples!
You can download the sample pbix file for this post here.
In Shabnam Watson’s recent blog post on a bug she found when trying to create a Live connection from Power BI to Analysis Services she mentioned that the AutoSetDefaultInitialCatalog server property could be used to solve her problem. This piqued my interest because I’d seen this property but had no idea what it did exactly or why it was there. Luckily, now I work for Microsoft, it’s even easier for me to find out about things like this from the dev team and Akshai Mirchandani was able to help.
First of all, what does it do? The documentation on this property has just been added here, and this is what it says:
A Boolean property. When set to true, new client connections automatically default to the first catalog (database) the user has permissions to connect to.
When set to false, no initial catalog is specified. Clients must select a default catalog prior to running queries or discover operations against a database on the server. If no default catalog is specified, an error is returned. If Initial Catalog property is specified in the connection string, the default catalog will be applied from this property.
The default value for this property is true.
Let me illustrate what this means. Say you have an instance of Analysis Services (in this case it’s Tabular, but it could be Multidimensional) with two databases on it:
I’ve expanded the Roles tab for each database for reasons that will become clear later.
Next, let’s say you run a simple trace on this server looking at the Discover End and Session Initialize events:
…and while this trace is running, you open up SQL Server Management Studio and connect to the SSAS instance. Here’s what you see in Profiler:
Now, just to be clear, all I did was open up SQL Server Management Studio and connect to the instance. I did not open up a DAX query window or anything like that; all that happened was the list of databases on the instance was displayed in the Object Explorer pane.
The interesting thing to notice from the trace above is that when I did that there were five Session Initialize events and, even though the Database column in Profiler is blank, you can see from the list of role names in the TextData column that in each case a connection has been made to the Adventure Works Internet Sales database.
This is because when you open a connection to Analysis Services and do not set the Initial Catalog connection string property, you get a connection to the default database on the instance. Which database is the default? It’s just the first database that the user has permission to access on the instance, which is a bit random.
This happens at other times too. Let’s say you right click on the EmptyDB database and process it in SQL Server Management Studio:
Here’s what I see in Profiler:
In this case there are three connections to the default database, Adventure Works Internet Sales, when the database I am processing is EmptyDB!
Most of the time these unnecessary connections have no impact at all but sometimes they can cause problems such as the ones Shabnam describes in her blog post. For example:
It can cause performance problems, because there is an overhead to opening a connection – for example roles are evaluated when a connection is opened
Monitoring and auditing gets complicated because, as you can see from the traces above, there are a whole lot of connections to the default database taking place that you aren’t expecting
Most importantly, when a connection is opened a read-commit lock is acquired on that database and in a few rare cases this can cause deadlocks and other locking-related issues
This is why the AutoSetDefaultInitialCatalog server property was introduced. With this server property set to False, when you open a connection to SSAS with no Initial Catalog set, you get a connection with no database set. You can find this server property in SQL Server Management Studio by right-clicking on your instance name, selecting Properties to open the Analysis Server Properties dialog, going to the General tab and checking the Show Advanced (All) Properties box.
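Whatever this property is set to, a client can avoid relying on the default database by naming one in its connection string. The helper below is a hypothetical Python sketch that just assembles such a string: Data Source and Initial Catalog are the standard connection string property names, while localhost is a made-up server name.

```python
def build_connection_string(server, initial_catalog=None):
    """Assemble an Analysis Services connection string, optionally naming the database."""
    parts = [f"Data Source={server}"]
    if initial_catalog is not None:
        # Naming the database means the server never has to pick a default
        parts.append(f"Initial Catalog={initial_catalog}")
    return ";".join(parts)


# No Initial Catalog: the server decides which database you connect to (if any)
print(build_connection_string("localhost"))

# Initial Catalog set: the connection goes straight to EmptyDB
print(build_connection_string("localhost", "EmptyDB"))
```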
With AutoSetDefaultInitialCatalog set to False, here’s what Profiler shows when I rerun my original test of connecting to the instance from SQL Server Management Studio:
Note that there are now no Session Initialize events.
Here’s what opening up a new MDX query window in SQL Server Management Studio shows with AutoSetDefaultInitialCatalog set to False if you don’t explicitly set a database when you connect:
Note the empty database dropdown box on the toolbar and the “Error loading metadata: No cubes were found” error message shown in the Metadata pane.
So why didn’t the dev team set AutoSetDefaultInitialCatalog to False by default on new instances? The problem with doing this is that it is a potential breaking change that could cause errors in some client tools. I’m not aware of any specific cases where this might happen but if you did decide to change AutoSetDefaultInitialCatalog to False on your instance you would need to test thoroughly to make sure it didn’t break anything. My feeling is, though, it is probably a good idea to set AutoSetDefaultInitialCatalog to False on production servers and do the appropriate testing just in case those unnecessary connections are causing problems.
If you’re a Power BI fan there are three possible answers to the question “Did you know that the second edition of The Definitive Guide To DAX has just been published?”:
Answer #1: Yup, I’ve already got my copy!
If this is your answer there’s no need to read any further.
Answer #2: What’s “The Definitive Guide To DAX”?
If, on the other hand, you’re new to Power BI and this is what you’re thinking then I should explain that “The Definitive Guide To DAX” is a book by Marco Russo and Alberto Ferrari and is what its title suggests it is – the sum total of human knowledge about the DAX calculation and query language used by Power BI, written by the two people who know most about it outside the development team. Marco and Alberto are friends of mine but I don’t think anyone can accuse me of bias when I say that it’s a book that every Power BI developer needs to own, so go out and buy it! If you use Power BI you need to learn DAX and while this book may not be a simple step-by-step tutorial it has in it somewhere answers to just about every question you’ll ever ask about DAX – and, more importantly, the answers it has are as correct and as up-to-date as they possibly can be. I can tell you that it’s proved invaluable to me in my work at least twice in the last week alone.
Answer #3: Yes, I saw that but I already have the first edition – is it worth buying this one too?
This is a slightly more difficult question to answer, but I’m still going to recommend that you buy the second edition. As Marco says in his announcement blog post, a lot of the existing content has been updated and rewritten and a lot of new content has been added. If you care about following all the latest DAX best practices and you don’t want the new hire in your department to mock you because you’ve never heard of DAX Studio, you need to buy this new edition.
[Note: I didn’t get a free copy of this book for review (yet?) but I have an O’Reilly Online Learning account which means I could read it as soon as it was published]
PS I know someone needs to write the “Definitive Guide to M” but it’s not going to be me, at least not right now.