Power BI Custom Data Connector For Language Detection, Key Phrase Extraction And Sentiment Analysis

I’m pleased to announce that I’ve published my first Power BI custom data connector on GitHub here:

https://github.com/cwebbbi/PowerBITextAnalytics

Basically, it acts as a wrapper for the Microsoft Cognitive Services Text Analytics API and  makes it extremely easy to do language detection, sentiment analysis and to extract key phrases from text when you are loading data into Power BI.

Full documentation for the Text Analytics API can be found here and there is more detailed documentation available for the Detect Language, Key Phrases and Sentiment APIs. You can learn more about Power BI custom data connectors here and here.

Note: you will need to sign up for the Text Analytics API and obtain an access key before you use this custom data connector. You’ll be prompted to enter the access key in Power BI the first time you use the custom data connector. A number of pricing tiers are available, including a free tier that allows for 5000 calls per month. The custom data connector batches requests so that you can send up to 1000 individual pieces of text per call to the API.

Why build a custom data connector for this? Well, first of all, text analysis in Power BI and Power Query is something I’ve been interested in for a long time (see here for example), and I know a lot of other people want to do this too. However, calling any API – and the Microsoft Cognitive Services APIs in particular – involves a lot of tricky M code that is beyond most Power BI users. I certainly didn’t find it easy to write this custom data connector! I know Gil Raviv has blogged about how to use the Sentiment analysis API this data connector calls in two posts (here and here) but he doesn’t handle all the limitations of the API, including the 1MB limit per request, in his examples – which just goes to show what a complex task this is. Wrapping up the code for calling the Text Analytics API in a custom data connector hides this complexity from the developer, makes the code a lot more portable, and the fact that the code is open source means the community can work together to fix bugs and add new features. I welcome any contributions that anyone wants to make and I know there are a lot of improvements that can be made. Certainly the documentation is a bit sparse right now and I’ll be adding to it over the next week or so.

This is not quite a traditional custom data connector in the sense that it doesn’t act as a data source in its own right – you have to pass data to it in order to get data back. It exposes three M functions:

  • TextAnalytics.DetectLanguage(inputtext as list, optional numberoflanguages as number) as table
    This function takes a list of text values and returns a table containing the input text and the language detected in each piece of text
  • TextAnalytics.KeyPhrases(inputtext as list, optional languages as list) as table
    This function takes a list of text values (and an optional list of language identifiers for each piece of text) and returns a table containing the input text and key phrases detected in each piece of text. More than one key phrase may be returned for each piece of text.
  • TextAnalytics.Sentiment(inputtext as list, optional languages as list) as table
    This function takes a list of text values (and an optional list of language identifiers for each piece of text) and returns a table containing the input text and a score representing the sentiment detected for each piece of text.

Here are a few simple examples of how to use these functions:

First, the TextAnalytics.DetectLanguage() function. This query:

let
    input = {"hello all", "bonjour", "guten tag"},
    result = TextAnalytics.DetectLanguage(input)
in
    result

Returns the following table:

image

For the TextAnalytics.KeyPhrases() function, the following query:

let
    input = 
        {
        "blue is my favourite colour", 
        "what time it is please?", 
        "twinkle, twinkle little star, how I wonder what you are"
        },
    result = TextAnalytics.KeyPhrases(input)
in
    result

Returns this table:

image

And for the TextAnalytics.Sentiment() function, the following query:

 let
     input = 
        {
        "this is great", 
        "this is terrible", 
        "this is so-so"
        },
     result = TextAnalytics.Sentiment(input)
in
    result

Returns this table:

image

Because the first parameter of each of these functions is a list, it’s super-easy to pass in columns of data from existing tables. For example, here’s the output of a query that gets the last ten comments from the comments RSS feed of this blog:

image

If this query is called Comments, the following single line of code is all that’s needed to call the TextAnalytics.Sentiment() function for the Comment Text column on this table:

TextAnalytics.Sentiment(Comments[Comment Text])

image

You can download a .pbix file containing several examples of how to call these functions, including all the examples above and many more, here.

I hope you enjoy using these functions, and if you have any questions, find any bugs or want to make suggestions for how they can be improved please let me know via the Issues page on GitHub. Finally, this is my first time using GitHub and if I’ve done something really dumb while publishing the code please let me know what I need to do to fix it!

Thoughts On Power Query/Common Data Service Integration

Yesterday there was a webinar on how Power Query is going to be used as the way to load data into the Microsoft Common Data Service. You can watch it online here (if you’re in a hurry, skip to 24 minutes in for the details on the Power Query integration):

I don’t have much to add to what’s in the webinar, but there are a few things that occurred to me:

  • This is Power Query in a browser. If they can build a web interface for Power Query for CDS, why not for Power BI? It would give us the full power of Power BI Desktop in the browser, on any platform (I know a few people have been asking for Power BI Desktop for Mac), with no tedious manual updating.
    image
  • In the demo at around the 54 minute mark, Miguel shows a screen where there are two Database Load options:
    image
    The “Only load new or modified rows for existing entities” options is… incremental load! This makes me wonder whether Power BI users who want incremental load should be using using the CDS as a staging area (a super-simple data warehouse…?) and then connecting Power BI to it?

I’ll be honest, I’ve not done anything with the CDS so I can’t really say how useful this new functionality will actually be – and I’ve heard mixed reports about the CDS, if I’m honest. Certainly, as someone (I suspect Meagan), mentions in a question, the only way Power BI can connect to the CDS right now is via DirectQuery and not Import, which seems pretty crazy. Still… I’m very curious and will be paying close attention to how it develops. More Power Query in the world can only be a good thing!

Creating Animated Reports In Power BI With The Drilldown Player Custom Visual

Last week I had the chance to do something I have not done before: build a Power BI report to be displayed on a big screen hanging on a wall. To make up for the loss of user interactivity, I used the new Drilldown Player custom visual to cycle through different selections and display a new slice of data every few seconds; Devin Knight’s blog post here has a great summary of how to use it. However I wasn’t happy about the look of the Drilldown Player visual in this particular report: the play/stop/pause buttons aren’t much use if you can’t click on them and the visual doesn’t show all of the values that it is cycling through. As a result I hid the visual behind another one and came up with a different way of displaying the currently-displayed selection.

Here’s a simple example of what I did. Imagine you have two identical tables called Table1 and Table2 loaded into your dataset that contain a list of the 24 hours in a day:

image

With no relationship between these tables in the dataset, you can display the 24Hour column from one in a table in your report and then use the Drilldown Player to cycle through the values in the 24Hour column in the other. At this point, because there’s no relationship between the tables, the Drilldown Player visual has no effect on the table. Next create a measure called Displayed as follows:

Displayed =
IF (
    SELECTEDVALUE ( 'Table1'[24Hour] ) =
    SELECTEDVALUE ( 'Table2'[24Hour] ),
     UNICHAR ( 8680 ),
     " "
)

…and add it to the table in the report. This measure uses my old favourite the Unichar() function to display an arrow against the row in the table that matches the currently selected hour in the Drilldown Player. The result is this:

CurrentSelection

This got me thinking about other fun stuff that I could do with this technique. After adding some more columns to my source data:

image

…I created the following measure:

Clock = UNICHAR(128335 + MAX('Table1'[Hour]))

This takes the hour selected by the Drilldown Player and displays the corresponding Unicode character for a clock face showing that hour. Here’s what the measure looks like when displayed in a card:

AnimatedClock

I also had a go at an animation showing the sun and moon rising and setting – I did this by displaying the Unicode characters as data labels in a scatter chart, then using colour to hide everything apart from the data labels – but by this stage I thought things were getting too silly…

SunAndMoon

Anyway, you can download the report with these animations in here, and view it online here. Have fun!

Configuring Power BI Gateway Data Sources For Files And Folders

Recently I’ve been building a lot of Power BI reports from csv and Excel files, and to make sure that scheduled refresh works I have been setting up data sources in an On Premises Data Gateway (what used to be called the Enterprise Gateway). I had assumed that if I was connecting to file-based data sources in my Power BI dataset then, in the gateway, I would need to set up one data source for each file that I’m connecting to – which is a bit of a pain. In fact it turns out that you can set up a gateway data source for the folder that the files are in instead.

Let me give you an example. Imagine that you have three Excel files in a folder called C:\Sales Data:

image

Now imagine that you have three queries in Power BI that get data from these three files:

image

Here’s an example of the M code for one of these queries:

let
    Source = 
        Excel.Workbook(
        File.Contents("C:\Sales Data\SalesData_1.xlsx")
        , null, true),
    SalesDataTable_Table = 
        Source{[Item="SalesDataTable",Kind="Table"]}[Data]
in
    SalesDataTable_Table

There’s nothing really to notice here except that the code uses File.Contents() to get the data from a single file – I’m not using Folder.Contents().

However, once the report has been published only one data source needs to be set up in the On Premises Data Gateway for it to refresh successfully, even though the report connects to three different files. Here’s a screenshot of the gateway data source I set up in the Power BI service:

image

Two things to point out:

  • The data source type is set to Folder
  • The full path property is set to the path of the folder that the files used by the report are in, ie C:\Sales Data

Setting up a single gateway data source for a folder is obviously a much better option than setting up multiple data sources for all the files in the folder. Did everyone else know this but me? I guess this is all related to the inheritance of data privacy settings that I blogged about here.

Data Privacy Settings In Power BI/Power Query, Part 5: The Inheritance Of Data Privacy Settings And The None Data Privacy Level

Something I didn’t understand at all when I started writing this series was how the “None” data privacy level worked. Now, however, the ever- helpful Curt Hagenlocher of the Power Query dev team has explained it to me and in this post I’ll demonstrate how it behaves and show how data privacy levels can be inherited from other data sources.

Let’s go back to the original example I used in part 1 of this series where I showed how data from an Excel workbook can be combined with data from SQL Server, and how the data privacy settings on each data source determine whether query folding takes place or not (I suggest you read that post before continuing to get some background). Now, imagine that the Excel workbook is in a folder called C:\Data Privacy Demo, and a query called FilterDay is used to get data from it:

let
    Source = 
	Excel.Workbook(
		File.Contents(
		"C:\Data Privacy Demo\FilterParameter.xlsx"
		)
	, null, true),
    FilterDay_Table = 
	Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
	),
    Output = 
	ChangedType{0}[#"Parameter"]
in
    Output

This query gets the name of a weekday from a table in the workbook, for example the text “Friday”:

image

When this query is referenced in a second query that uses the day name to filter the data in a table in SQL Server, like so:

let
    Source = Sql.Databases("localhost"),
    DB = Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate = DB{[Schema="dbo",Item="DimDate"]}[Data],
    RemovedColumns = Table.SelectColumns(dbo_DimDate,
        {"DateKey", "EnglishDayNameOfWeek"}),
    FilteredRows = Table.SelectRows(RemovedColumns, 
        each ([EnglishDayNameOfWeek] = FilterDay))
in
    FilteredRows

…and the query is run for the first time, then you will get prompted for credentials to access SQL Server and after that you’ll get prompted to set data privacy levels on both data sources used:

image

The dropdown boxes in the second column allow you to set the data privacy settings for each data source, but look at the data sources listed in the first column. There are two things to point out:

  • The data sources the two queries are accessing are the DimDate table in the Adventure Works DW database on localhost, and the file C:\Data Privacy Demo\FilterParameter.xlsx. However you’re not being prompted to set data privacy levels on those exact data sources, you’re being prompted to set data privacy levels on the localhost instance and the c:\ drive
  • The data source names are displayed in dropdown boxes, so there are other options to select here

Clicking each dropdown box is revealing:

image

image

For the SQL Server database you can set the data privacy level at two places: the localhost instance (the default), or the Adventure Works DW database on that instance. For the Excel workbook you get set the data privacy level at three places: the c:\ drive (the default), the folder c:\Data Privacy Demo that the Excel workbook is in, or the Excel workbook itself.

Let’s say you accept the defaults and set the data privacy settings to Public on localhost and the c:\ drive:

image

As you would expect after reading part 1 of this series, the query runs and query folding takes place:

image

image

Now, let’s say you copy the Excel file up to the root of the c:\ drive and rename it to filterparameter2.xlsx, then update the FilterDay query above to load data from this new Excel file instead:

let
    Source = 
	Excel.Workbook(
		File.Contents(
		"C:\FilterParameter2.xlsx"
		)
	, null, true),
    FilterDay_Table = 
	Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
	),
    Output = 
	ChangedType{0}[#"Parameter"]
in
    Output

 

At this point, when you click the Data Source Settings button and look at the permissions for the file c:\filterparameter2.xlsx you will see that the privacy level is set to None:

image

However, it behaves as if it has a data privacy level of Public: the second query that gets data from SQL Server runs successfully, query folding still takes place and you are not prompted to set a data privacy level for this data source. Why?

The “None” data privacy level means that no privacy level has been set for this exact data source. However, when this happens the engine checks to see if a data privacy level has been set for the folder that this file is in and then for all folders up to the root. In this case, since the data privacy level has been set to Public for the c:\ drive, all files in all folders on that drive that have a data privacy level set to None (like this one) will inherit the c:\ drive’s setting of Public:

image

The same goes for databases on a SQL Server instance: they can inherit the data privacy settings set for the instance. The same is also true for web services, where data privacy settings can be set for different parts of a URL; for example, here’s the list of options for a call to the https://data.gov.uk/api/3/action/package_search web service described in part 2 of this series:

image

The general rule is that the engine looks for permissions for the exact data source that it’s trying to access, and if none are set then it keeps looking for more general permissions until it runs out of places to look.

In my opinion, I don’t think the way the “None” privacy level and inheritance works is very clear right now – it makes sense now I’ve had it explained to me, but the UI does nothing to help you understand what’s going on. Luckily it sounds like the dev team are considering some changes to make it more transparent. I would like to see the fact that data privacy levels have been inherited for a data source, and where they have been inherited from, called out in the Edit Permissions dialog.

Data Privacy Settings In Power BI/Power Query, Part 3: The Formula.Firewall Error

In the first two parts of this series (see here and here) I showed how Power BI/Power Query/Excel Get & Transform’s data privacy settings can influence whether query folding takes place or even whether a query is able to run or not. In this post I’m going to talk about the situations where, whatever data privacy level you use, the query will not run at all and you get the infamous Formula.Firewall error.

I’ll admit I don’t understand this particular topic perfectly (I’m not sure anyone outside the Power Query dev team does) so what I will do is explain what I do know, demonstrate a few scenarios where the error occurs and show how to work around it.

Assume you have the two data sources described in my previous posts: an Excel workbook that contains just a single day name, and the DimDate table in SQL Server that can be filtered by the day name from Excel. Let’s also assume that both data sources have their data privacy levels set to Public. The following query, called FilterDay, loads the data from Excel and returns a text value containing the day name:

let
    Source = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx"), 
	null, true),
    FilterDay_Table = 
	Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
	),
    Output = 
	ChangedType{0}[#"Parameter"]
in
    Output

image

Now, look at the following query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="select DateKey, EnglishDayNameOfWeek 
		from DimDate"]),
    FilteredRows = 
	Table.SelectRows(Source, 
		each ([EnglishDayNameOfWeek] = FilterDay)
	)
in
    FilteredRows

It filters the contents of the DimDate table and only returns the rows where the EnglishDayNameOfWeek column matches the day name returned by the FilterDay query. Notice that there are two steps in the query, Source (which runs a SQL query) and FilteredRows (which does the filtering). Here’s the output:

image

As you can see from the screenshot, the query runs. In fact it runs whatever data privacy settings you have set on both the data sources, although it’s worth pointing out that if you use your own SQL in an M query (as I do in this case) this stops query folding in all subsequent steps, as described here.

Now take a look at the following version of the query:

let
    Source = 
	Table.SelectRows(
		Sql.Database(
			"localhost", 
			"adventure works dw",
			[Query="select DateKey, 
				EnglishDayNameOfWeek 
				from DimDate"]
		), 
		each ([EnglishDayNameOfWeek] = FilterDay)
	)
in
    Source

The important difference here is that there is now one step in this query instead of two: the query and the filtering take place in the same step. Even more importantly, regardless of the data privacy settings, the query fails with the error:

Formula.Firewall: Query ‘DimDate With Native Query Single Step Fails’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

image

The problem here is that the Power Query engine is not allowed to access two different data sources originating from different queries in the same step – as far as I understand it this is because it makes it too hard for the engine to work out whether a step connects to a data source or not, and so which data privacy rules should be applied.

At this point you might think that it’s straightforward to break your logic up into separate steps, as in the first example above. However there are some situations where it’s not so easy to work around the problem. For example, consider the following query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="
		 select DateKey, EnglishDayNameOfWeek 
		 from DimDate 
		 where 
		 EnglishDayNameOfWeek='" & FilterDay & "'" 
		]
	)
in
    Source

In this example I’m dynamically generating the SQL query that is being run and passing the name of the day to filter by into the WHERE clause. In the two previous examples the query that was run had no WHERE clause and the filtering on day name took place inside Power BI – in this case the filtering is happening inside the query, so in order to generate the WHERE clause I have to refer to the value that the FilterDay query returns in the same step. Therefore, this query also gives the same Formula.Firewall error seen above.

How can you work around this? Well, the following version of the query that attempts to reference FilterDay in a separate step doesn’t work either:

let
    DayAsStep = FilterDay,
    Source = 
	Sql.Database(
		"localhost", 
		"adventure works dw",
		[Query="
		 select DateKey, EnglishDayNameOfWeek 
		 from DimDate 
		 where 
		 EnglishDayNameOfWeek='" & DayAsStep & "'" 
		]
	)
in
    Source

 

Luckily, it turns out that if you use the Value.NativeQuery() function to run your query instead you can avoid the error. As I showed here, you can use this function to pass parameters to SQL queries. If you generate the record containing the parameters for the query as a separate step (called ParamRecord here), like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    ParamRecord = [FilterParameter=FilterDay],
    Query = Value.NativeQuery(
                Source, 
                "select DateKey, EnglishDayNameOfWeek 
		from DimDate 
		where 
		EnglishDayNameOfWeek=@FilterParameter",
                ParamRecord)
in
    Query

Then the query runs successfully.

There is another way to avoid the error. In all the examples above I have two queries: one to get data from Excel, one to get filtered data from SQL Server. If these two queries are combined into a single query, it doesn’t matter if data from different data sources is accessed in the same step. So, for example, unlike all of the queries above the following query does not reference any other queries; instead it gets the day name from the Excel workbook in the ExcelSource step and then runs the dynamic SQL query in the SQLSource step, and runs successfully:

let
    ExcelSource = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx")
	, null, true),
    FilterDay_Table = 
	ExcelSource{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(FilterDay_Table,
		{{"Parameter", type text}}),
    FilterDayStep = 
	ChangedType{0}[#"Parameter"],
    SQLSource = Sql.Database(
	"localhost", 
	"adventure works dw",
	[Query="
		select DateKey, EnglishDayNameOfWeek 
		from DimDate 
		where 
		EnglishDayNameOfWeek='" 
		& FilterDayStep & 
		"'" ])
in
    SQLSource

Clearly the M engine doesn’t get confused about accessing data from different sources in the same step if those data sources are created in the same query.

Of course you can avoid the Formula.Firewall error and make query folding happen as often as possible by turning off data privacy checks completely in the Options dialog. This will be the subject of the next post in this series.

White Paper On “Planning A Power BI Enterprise Deployment”

I’m pleased to announce that a white paper I co-authored with Melissa Coates on “Planning a Power BI enterprise deployment” has now been published. You can download it from the Power BI white papers site here: https://aka.ms/pbienterprisedeploy

Melissa has already blogged about the white paper here.

Topics covered include the different ways that Power BI can be deployed (as a self-service BI tool or as a corporate BI tool); licensing; preparing data for use in Power BI; choosing a data storage mode (import vs Live connections to SSAS vs DirectQuery); data refresh and the on-premises gateway; best practices for report development; collaboration and sharing (covering apps and content packs); options for consuming reports and data published to Power BI; and security, compliance and administration. If that sounds like a lot, it is: it’s 105 pages long!

It was a real pleasure working with Melissa on this, and I’d also like to thank Meagan Longoria for reviewing it. I’m also extremely grateful to Adam Wilson and an army of Microsoft employees for providing information, answering our questions and correcting our mistakes.