Colour Names Supported In Power BI Conditional Formatting

When I read about the new conditional formatting by values feature in the August 2018 release of Power BI Desktop, my first thought was to write a blog post on how DAX can be used to generate the hex values for colours – but then Daniil Maslyuk wrote an excellent post on that (and more) yesterday. I then got curious about what colour names are supported when you use a text name rather than a hex code, and Amanda Cofsky of the dev team told me that the official list is the one here:

https://www.w3schools.com/cssref/css_colors.asp

…although there are other names that might work, assuming they don’t contain punctuation. I loaded that list into Power BI and built the following report:

[Image: Power BI report listing the supported colour names]
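If you want to build a similar report yourself, here's a minimal M sketch for loading the colour list from that page – the assumption that the list is in the first table Web.Page finds is mine, and may break if the page layout changes:

let
    Source = Web.Page(Web.Contents("https://www.w3schools.com/cssref/css_colors.asp")),
    // Assumption: the colour name list is the first table on the page
    ColourNames = Source{0}[Data]
in
    ColourNames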

I’ve certainly learnt a few new colour names from this!

Analysis Services Import Performance Improvements In The August 2018 Release Of Power BI

While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. As with the OData improvements, it is an option called Implementation="2.0", used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

In fact, I first heard about this option in a comment on this blog post back in January, but as I was told by the dev team that it hadn't been tested properly I didn't blog about it. However, as it is now set by default in the M code generated by the Power Query Editor, I guess it's ready for production use. I'm told it should improve the performance of M queries that import data from Analysis Services – and I would be very interested to hear from anyone who tests this about how much of an improvement they see.

I've done a little bit of testing myself and I can see there are indeed some differences in the MDX generated. For example, I created an M query that connected to the Adventure Works DW database and returned all combinations of customer and day name where the Internet Sales Amount measure is greater than 100.
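For reference, here's a sketch of the kind of M query I mean – the server, database and cube names follow the standard Adventure Works multidimensional sample, and the navigation and filter steps are assumptions for illustration rather than my exact query:

let
    Source = AnalysisServices.Databases(
        "localhost",
        [TypedMeasureColumns = true]
    ),
    // Navigate to the Adventure Works DW database and the Adventure Works cube
    AWDatabase = Source{[Name = "Adventure Works DW"]}[Data],
    AWCubes = AWDatabase{[Id = "Adventure Works"]}[Data],
    AWCube = AWCubes{[Id = "Adventure Works"]}[Data],
    // Add the Day Name and Customer attributes plus the measure
    DayNames = Cube.AddAndExpandDimensionColumn(
        AWCube, "[Date]", {"[Date].[Day Name].[Day Name]"}, {"Day Name"}),
    Customers = Cube.AddAndExpandDimensionColumn(
        DayNames, "[Customer]", {"[Customer].[Customer].[Customer]"}, {"Customer"}),
    SalesAmount = Cube.AddMeasureColumn(
        Customers, "Internet Sales Amount", "[Measures].[Internet Sales Amount]"),
    // This step is what gets folded into the FILTER() you can see in the MDX below
    FilteredRows = Table.SelectRows(SalesAmount, each [Internet Sales Amount] > 100)
in
    FilteredRows

In the old version of the MDX generation layer (ie without Implementation="2.0" set) the following MDX query is generated: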

SELECT
    {
        [Measures].[Internet Sales Amount]
    }ON 0,
    SUBSET(
        NONEMPTY(
            CROSSJOIN(
                [Date].[Day Name].[Day Name].ALLMEMBERS,
                [Customer].[Customer].[Customer].ALLMEMBERS
            ),
            {
                [Measures].[Internet Sales Amount]
            }
        ),
        4096
    )
    PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM(
    SELECT
        FILTER(
            CROSSJOIN(
                [Date].[Day Name].[Day Name].ALLMEMBERS,
                [Customer].[Customer].[Customer].ALLMEMBERS
            ),
            (
                NOT((
                    ISEMPTY(
                        [Measures].[Internet Sales Amount]
                    )
                OR
                    ISEMPTY(
                        100
                    )
                ))
            AND
                ([Measures].[Internet Sales Amount] > 100)
            )
        )ON 0
    FROM [Adventure Works]
)CELL PROPERTIES VALUE


However, with the Implementation="2.0" option set, the following query is generated:

SELECT
    {
        [Measures].[Internet Sales Amount]
    }ON 0,
    SUBSET(
        NONEMPTY(
            FILTER(
                CROSSJOIN(
                    [Date].[Day Name].[Day Name].ALLMEMBERS,
                    [Customer].[Customer].[Customer].ALLMEMBERS
                ),
                (
                    NOT(ISEMPTY(
                        [Measures].[Internet Sales Amount]
                    ))
                AND
                    ([Measures].[Internet Sales Amount] > 100)
                )
            ),
            {
                [Measures].[Internet Sales Amount]
            }
        ),
        4096
    )
    PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM [Adventure Works] CELL PROPERTIES VALUE

The difference between the two is that the first query uses a subselect to do the filtering whereas the second does not; subselects in MDX are not necessarily bad, but I generally try to avoid using them unless I need to. There may be other differences in the way the MDX is generated in the new version but I haven’t had a chance to do any detailed testing.

Comparing The Performance Of CSV And Excel Data Sources In Power Query

My posts from two weeks ago (see here and here) on using Process Monitor to troubleshoot the performance of Power Query queries made me wonder about another question: how does the performance of reading data from CSV files compare to the performance of reading data from Excel files? I think most experienced Power Query users in either Power BI or Excel know that Excel data sources perform pretty badly, but I had never done any proper tests. I'm not going to pretend that this post is a definitive answer to this question (and once again, I would be interested to hear your experiences) but hopefully it will be thought-provoking.

To start off, I took the 153.6MB CSV file used in my last few posts and built a simple query that applied a filter on one text column, then removed all but three columns. The query ran in 9 seconds.
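Here's a minimal sketch of that kind of query – the file path, column names and filter value are all assumptions for illustration:

let
    Source = Csv.Document(
        File.Contents("C:\Data\SourceData.csv"),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Apply a filter on one text column...
    FilteredRows = Table.SelectRows(PromotedHeaders, each [Product] = "Apples"),
    // ...then remove all but three columns
    RemovedColumns = Table.SelectColumns(FilteredRows, {"Product", "Date", "Sales"})
in
    RemovedColumns

Using the technique from my last post, I was able to draw the following graph from a Process Monitor log file showing how Power Query reads data from the file: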

[Image: graph showing how Power Query reads data from the CSV file]

Nothing very surprising there. Then I opened the same CSV file in Excel and saved the data as an xlsx file with one worksheet and no tables or named ranges; the resulting file was only 80.6MB. Finally, I created a duplicate of the first query and pointed it at the Excel file instead. The resulting query ran in 59 seconds – around 6 times slower!
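Only the source steps needed to change – again, the file path and sheet name here are assumptions:

let
    Source = Excel.Workbook(File.Contents("C:\Data\SourceData.xlsx"), null, true),
    // Assumption: the data is on a single worksheet called Sheet1
    SheetData = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(SheetData, [PromoteAllScalars = true]),
    // From here on the query is identical to the CSV version above
    FilteredRows = Table.SelectRows(PromotedHeaders, each [Product] = "Apples"),
    RemovedColumns = Table.SelectColumns(FilteredRows, {"Product", "Date", "Sales"})
in
    RemovedColumns

Here's a comparison of the performance of this query with that of the first query: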

[Image: graph comparing the data read from the Excel file with the data read from the CSV file]

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower.
  • The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? (An xlsx file is a zip archive, and a zip archive's central directory – the index of its contents – is stored at the end of the file, which would explain reads from that position.) In any case, it seems as though it takes 5 seconds before it starts to read the data needed by the query.
  • There's a plateau between the 10 and 20 second mark where not much is happening; this didn't happen consistently and may have been connected to the fact that Process Monitor was running.

In any case you don’t need to study this too hard to understand that the performance of reading data from an xlsx format Excel file is terrible compared to the performance of reading data from a CSV. So, if you have a choice between these two formats, for example when downloading data, it seems fair to say that you should always choose CSV over xlsx.
