I see a lot of Power BI datasets in the course of my work, and as a result I see a lot of datasets that look like this:
What’s wrong with this picture? Look at the names:
- The tables and columns have the same names that they had in the data source, in this case a SQL Server database. Note the table name prefixes of “Dim” for dimensions and “Fact” for fact tables.
- The column and measure names either don’t have spaces or use underscores instead of spaces.
- What on earth does the measure name _PxSysF even mean?
Datasets like this seem to work perfectly well and are often built by professional BI developers but these names are a mess – and this can cause a lot of problems later on.
This is an issue I’ve been moaning about for years, but I wanted to blog about it again because it’s just as important today for Power BI as it was ten years ago for Analysis Services. My advice is to make naming a top priority when you’re building a dataset. If you have already published your dataset it will be difficult to change the names you’ve used – if you do so, you risk breaking reports and calculations that you and other people have built on it – so this is something that should be dealt with as early in the development process as possible.
In my opinion there are three things to consider when naming a table, column or measure:
- You should use human-readable names rather than any kind of technical naming convention, with spaces where you would expect to have spaces and all vowels present. For example, that means having names like [Sales Amount] rather than [Sales_Amount] or [SlsAmt]; similarly, prefixes like “Dim” and “Fact” might make sense to you but won’t mean anything to your users.
- You should use the correct business terminology, the terminology that your users will know and understand, rather than just make up some names that seem appropriate. Your users might not understand what [Total Sales Value] is if the generally accepted term is [Net Sales Amount].
- The names you use should be consistent across all datasets that contain the same data. That means that if you have a table called Sales in one dataset it should be called Sales in every other dataset that you build from the same data source, not Transactions, FactSales or something else.
This advice might be controversial to some people, especially those with a database background, but to me designing a dataset is more like designing a user interface rather than designing a database. Indeed the consequences of a dataset with no thought put into naming are similar to the consequences of a poor user interface:
- If your end users don’t understand what your report is trying to show, what’s the point of even building a report?
- Reuse of datasets is a good thing and ideally any dataset you build should be easy for other people to build reports from. If those other people don’t understand what the names of your tables, columns and measures mean they won’t be able to build new Power BI reports from your dataset using Live connections or use Analyze in Excel.
- It’s not just other people who are building reports from your dataset that you need to think about. If you want to use the new visual personalisation feature or Q&A then you’ll only be able to if your end users can understand the names you’ve used.
- Even if you’re the only person building datasets and reports in your organisation, you owe it to yourself to make your code as readable as possible and to save yourself the effort of having to rename columns and measures when you use them in a visual.
That’s enough ranting for now. Good naming is only one part of good data modelling but it’s something that’s too often neglected!