Hierarchical Navigation For Azure Data Lake Storage Gen2 In Power BI/Power Query

While the documentation on how to import data from Azure Data Lake Gen2 Storage into Power BI is pretty detailed, the connector (which at the time of writing is in beta) that supports this functionality in the Power Query engine has some useful functionality that isn’t so obvious. If you look at the built-in documentation on the AzureStorage.DataLake M function in the Power Query Editor you’ll see there are a lot of options that aren’t in the documentation on the web yet:

image

These options are:

  • BlockSize: the number of bytes to read before waiting on the data consumer. The default value is 4MB.
  • RequestSize: the number of bytes to read in a single HTTP request to the server. The default value is 4MB.
  • ConcurrentRequests: The ConcurrentRequests option supports faster download of data by specifying the number of requests to be made in parallel, at the cost of memory utilization. The memory required is (ConcurrentRequest * RequestSize). The default value is 16.
  • HierarchicalNavigation: A logical (true/false) that controls whether the files are returned in a tree-like directory view on in a flat list. The default value is true.

All of these options derserve more detailed examination, but in this post I’m going to focus on the HierarchicalNavigation property.

Say you have the following set of files and folders in ADLSGen2: at the root level there’s a csv file called SimpleSales.csv and a folder called ParentFolder; inside ParentFolder there’s a folder called ChildFolder; and inside ChildFolder there’s another csv file called SimpleSales2.csv.

image

image

image

When you connect first in the Power Query Editor you’ll see a table that looks like this (there are some other columns but I’ve removed them to make the screenshot legible):

image

In this table there are two rows, one for each csv file, and a Folder Path column that shows where each file sits within the folder structure. Here’s the M code for this query:

let
    Source = 
    AzureStorage.DataLake("https://x.dfs.core.windows.net/powerbi")
in
    Source

If you alter this to use the HierarchicalNavigation option, like so:

let
    Source = 
    AzureStorage.DataLake(
        "https://x.dfs.core.windows.net/powerbi",
        [HierarchicalNavigation=true]
        )
in
    Source

…you’ll see a different table is returned by the query:

image

In this case the two rows show ParentFolder and SimpleSales.csv; if you click on the Table link in the first row of the Content column you can drill down to ChildFolder; if you click on the Table link with ChildFolder you’ll see SimpleSales2.csv:

image

If you have a large number of files and folders in ADLSGen2 this way of viewing them is likely to be much easier to work with, I think.

%d bloggers like this: