Power BI Dataset Refresh Scheduling Using Outlook And Power Automate

When the new “Refresh a dataset” action for Power Automate (formerly Flow) was released last year I couldn’t help thinking there must be something really cool you could do with it – I just didn’t know what. There are lots of great resources explaining the basics (see Jon Levesque’s video here for example) and Adam Saxton did a nice demo here using the “When an item is modified” trigger to show how to refresh a dataset when a value in a SQL Server table is updated, but that’s it. So I got thinking about the types of problem it could solve and the fun I could have with it…

While Power BI’s scheduled refresh functionality is great, it doesn’t give you as much flexibility as you might need. For example:

  • You might want to schedule your dataset to refresh only on weekdays, not weekends, and you might also want to cancel refresh on certain days like public holidays. You might also only want to refresh a dataset on a monthly basis or at less than the half-hourly granularity that the UI allows. Why? Perhaps because it’s important to minimise the load you put on your source systems; it’s also the case that for many cloud data sources the more data you read, the more you pay.
  • If you have a lot of datasets to refresh you might want to control which datasets refresh in parallel, again to reduce the load on your data sources and, if you’re using Premium, to reduce the load on your capacity. It’s hard to get an overview in the Power BI Portal of when all your refreshes are scheduled, or to manage what’s running when.

The ideal way to view when multiple events are scheduled is a calendar and we’ve got great calendar functionality in Outlook. What if you could schedule refresh of your datasets from a calendar in Outlook? It turns out to be easier than you might think! Here’s how.

The first thing I did was create a new calendar in Outlook called Power BI Refreshes:

[Screenshot: a new Outlook calendar called Power BI Refreshes]

In this calendar I created appointments (either recurring or one-off) for every dataset refresh:

[Screenshot: refresh appointments in the Power BI Refreshes calendar]

For each appointment, I entered the unique identifier of the dataset in the Title and the unique identifier of the workspace in the Location like so:

[Screenshot: an appointment with the dataset id as the Title and the workspace id as the Location]

You can find these unique identifiers by going to the Settings screen for your dataset in the Power BI Portal and looking at the URL:

[Screenshot: the dataset Settings page, with the workspace and dataset ids visible in the URL]

Cathrine Wilhelmsen has more details on finding these ids here.
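
For reference, the Settings page URL takes a form something like this, with the workspace and dataset ids appearing as GUIDs (the names in braces below are placeholders, not real values):

https://app.powerbi.com/groups/{workspace id}/settings/datasets/{dataset id}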

Last of all, I created a very simple Flow in Power Automate:

[Screenshot: the Flow, with the “When an upcoming event is starting soon” trigger feeding the “Refresh a dataset” action]

The “When an upcoming event is starting soon” trigger fires when each of the appointments on the Power BI Refreshes calendar is about to start. It then passes the Location and Subject from the event – which of course contain the ids of the workspace and the dataset to be refreshed – to the “Refresh a dataset” action, which performs the refresh.
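
For what it’s worth, the “Refresh a dataset” action is essentially a wrapper around the Power BI REST API’s Refresh Dataset In Group endpoint. If you ever needed to call that endpoint without the action, a minimal M sketch of the same call might look like the following, assuming you’ve already obtained an Azure AD access token by some other means (the token and GUIDs below are placeholders):

    let
        // Placeholder values - substitute your own token and ids
        token = "your-aad-access-token",
        workspaceId = "your-workspace-guid",
        datasetId = "your-dataset-guid",
        url = "https://api.powerbi.com/v1.0/myorg/groups/" & workspaceId
            & "/datasets/" & datasetId & "/refreshes",
        // Supplying the Content option makes Web.Contents issue a POST;
        // an empty JSON body is enough to request a refresh
        refresh = Web.Contents(url,
            [
                Headers = [
                    #"Authorization" = "Bearer " & token,
                    #"Content-Type" = "application/json"
                ],
                Content = Text.ToBinary("{}")
            ])
    in
        refresh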

This isn’t something I recommend putting into production but I think it’s very interesting as a proof-of-concept. I guess Logic Apps would be a more robust alternative to Power Automate, and I would want to be 100% sure that events fired when I was expecting them to fire, so some thorough testing would be needed. I’m not experienced enough with Power Automate/Logic Apps to know if I’m doing the right thing, to be honest. I also feel like using ids in the meeting title and location is a bit hacky and there must be a nicer way of handling this.

On the Power BI side, it’s worth remembering that when a refresh is kicked off in Power BI the actual refreshing only starts when the Power BI Service has the required resources available, and especially in Shared capacity this can involve a wait of several minutes. What’s more, the “Refresh a dataset” action does not know whether the refresh it kicks off succeeds or fails; I guess if you wanted to handle retries or notifications on failure then you would need to call the Power BI API to get the refresh history of a dataset – there’s no built-in action to do that, but it’s possible with a Power Automate custom connector.
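
As a rough illustration of the API call involved – shown here as M for readability, though in Power Automate it would be an HTTP request made by a custom connector – this hedged sketch reads the status of the most recent refresh. Again, the token and GUIDs are placeholders:

    let
        // Placeholder values - substitute your own token and ids
        token = "your-aad-access-token",
        workspaceId = "your-workspace-guid",
        datasetId = "your-dataset-guid",
        // $top=1 returns only the most recent refresh
        url = "https://api.powerbi.com/v1.0/myorg/groups/" & workspaceId
            & "/datasets/" & datasetId & "/refreshes?$top=1",
        history = Json.Document(Web.Contents(url,
            [Headers = [#"Authorization" = "Bearer " & token]])),
        // Each record in the "value" list has a status field whose
        // value is Completed, Failed or Unknown (still in progress)
        latestStatus = history[value]{0}[status]
    in
        latestStatus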

If you have any thoughts about this, ideas on how to make this better or if you do put something like this into production, let me know – I would love to hear from you!

Fifteenth Blog Birthday

Every year, on the anniversary of the first-ever post on this blog, I write a post summarising my thoughts on what’s happening in the world of Microsoft BI and what I’m up to professionally.

This year has seen bigger changes than most: in June I closed down my company, gave up being self-employed after thirteen years and took a job at Microsoft. I’m pleased to say that I don’t regret this decision at all and I’m really enjoying my new job. The work is more interesting (as a Power BI fanboy, what could be better than working for the Power BI team?), my colleagues are great, I’m travelling a lot less and as a result I’m feeling a lot more healthy and relaxed and I’m able to spend more time with my family. I am earning a bit less but overall there’s no question that going to Microsoft has been a change for the better. I’m not surprised that so many MVPs are doing the same thing these days: Microsoft is a great place to work right now.

The thing is that even after explaining my reasons I still get asked by people why I moved, as if I’m hiding something, because it seems the opposite of what most people hope to do – it’s a bit like the famously boring British Prime Minister of the 1990s, John Major, who as a child ran away from the circus to become an accountant. I guess there are a lot of people who dream about leaving corporate culture behind to be a freelancer. Now don’t get me wrong, I really enjoyed my previous life, but although freelancing has its benefits it’s not all great and there are many positives about working for a big company as well; I’m particularly happy that I don’t have to chase unpaid invoices any more, for example. One of these days I should really get round to writing about my experiences and how it’s possible to make a living as an independent Power BI consultant and trainer, to help anyone who’s interested in making the jump to self-employment…

One last thing to reflect on is what I’ve learned in my first six months at Microsoft. I don’t think I know Power BI in any more depth than before I joined – it turns out there is no secret store of inside information that you get access to when you join the team – and the PMs and devs that I used to bother with my questions answered them in just as much detail before I joined as they do now. It just goes to show how well we treat our MVPs and customers! I do think I have a much broader understanding of how Power BI works and its architecture, though, and I now have a longer list of PMs and devs who I can ask questions. I also have a much better appreciation for how difficult some features are to implement and how tricky it is to prioritise features; next time you get frustrated at the lack of a particular feature in Power BI you can be reassured by the thought that it’s almost certain that someone on the team already knows about the issue and is working on it. Blogging is easier in some ways and more difficult in others: as I said, I now have more access to interesting information but not all of it is bloggable or indeed useful to anyone outside Microsoft, and there are things that I would have blogged about in the past that I won’t write about now (this is a good example) because they describe things that aren’t really intended for use by customers, or practices that shouldn’t be encouraged. Finally, getting to work on some of the biggest, most complex Power BI implementations in the world is an eye-opener. Sure, I used to get to work with customers who were doing interesting things before, but the customers I work with now are at a different level, and from a technical point of view it’s really exciting.

So yes, life is good and I’m looking forward to 2020 and all the great things that are planned for Power BI next year. Thanks for reading!

Fourteenth Blog Birthday

Every year, on the anniversary of the first-ever post on this blog, I write a post summarising my thoughts on what’s happening in the world of Microsoft BI and what I’m up to professionally.

This year, rather than go on about how Power BI is taking over the world (which we all know already), I thought it might be interesting to consider how the focus of this blog – and by implication the focus of my work – has changed over the last few years by looking at the viewing statistics of some of my most popular posts.

As you probably know, for a long time the only product I cared about was Analysis Services Multidimensional and MDX: it was pretty much all I blogged about and the only thing I did consultancy and training on. The following graph shows how the number of hits on four of the most-viewed posts on this subject changed from 2014 to 2018: Aggregating the result of an MDX calculation using scoped assignments; Joining the results of two MDX queries together; Last Ever Non Empty – a new, fast MDX approach; and One Cube vs Multiple Cubes.

[Graph: hits on four popular SSAS Multidimensional/MDX posts, 2014 to 2018]

None of these posts are, in technical terms, out of date but the downward trend is the same for all of them. The decline in traffic is matched by the decline in demand for consultancy and training on SSAS MD and MDX. While I still spend around 20% of my time doing SSAS MD and MDX consultancy, I do very little training on them at all these days – I guess because no-one is building new solutions using SSAS MD, although there are still a large number of SSAS MD solutions in production that need maintenance and improvement. I expect the launch of SSAS MD in the cloud as part of Power BI Premium will lead to a spike in the amount of work I do on it as I help my customers migrate, but that will only be short-lived.

In contrast, look at the trend for four of my most-popular Power Query/M related posts: Referencing individual cell values from tables in Power Query; Working with web services in Power Query; Creating tables in Power BI/Power Query M code using #table(); and Web.Contents(), M functions and dataset refresh errors in Power BI. These are not necessarily new posts (the earliest dates from 2014) but again they are all still technically relevant, and the steep increase in the number of hits they have received over the last few years is clear:

[Graph: rising hits on four popular Power Query/M posts]

Power Query and M is a bit of a niche topic, though; right now my most popular posts are on general Power BI data modelling and DAX – a topic I don’t actually blog about all that often, but which I nevertheless spend a lot of consultancy and training time on. The following graph shows the trends for the posts Comments and descriptions in DAX; Creating current day, week, month and year reports in Power BI using bi-directional cross-filtering and M; Dynamic chart titles in Power BI; and (although I’ve never really understood the popularity of this one) Using DateDiff() to calculate time intervals in DAX.

[Graph: hits on four popular Power BI data modelling and DAX posts]

Perhaps I should blog about this more? The reason I don’t is twofold: first, there are a lot of people out there such as Marco and Alberto who specialise in DAX, have covered all the obvious angles and do a much better job than I ever could; second, my philosophy has always been to blog about what I’m interested in and excited about, and frankly I have always enjoyed Power Query and M more than DAX.

One last graph is needed for context, showing the most popular posts from the three graphs above next to each other. The following graph shows how Aggregating the result of an MDX calculation using scoped assignments, Working with web services in Power Query and Dynamic chart titles in Power BI compare against each other:

[Graph: the most popular MDX, Power Query and Power BI posts compared]

It goes to show how the “Dynamic chart titles” post is now much more popular than the “Aggregating the result of an MDX calculation” post was, even at the peak of its popularity. I guess Power BI is a safe bet for my future.

Thirteenth Blog Birthday

Every year, on the anniversary of the first-ever post on this blog, I write a post summarising my thoughts on what’s happening in the world of Microsoft BI and what I’m up to professionally. Here I am thirteen years and 1232 posts on from where I started…

First of all, an announcement. For several years now I’ve owned and operated two companies: Crossjoin Consulting, which I use for my own SSAS and Power BI consultancy and private training activities, and from which I make most of my income; and the much smaller Technitrain, which I use for running public training courses in London taught by me and various other notable SQL Server and Microsoft BI trainers. After some deliberation I’ve decided to shut down Technitrain, stop running public courses taught by other people, and focus completely on delivering my own consultancy and training through Crossjoin. There are various reasons for doing this but mostly it’s because organising and marketing public courses is very time-consuming and I’d like to free up that time for other purposes. There’s one final public Power BI course on the Technitrain site and after that all public courses that I teach will appear on the Crossjoin site. I’d like to say thank you to everyone who has taught for Technitrain or who has attended a course over the years.

What do I want to do with the time this frees up? There’s so much Power BI work out there I don’t think I’ll have any trouble keeping myself busy, but I’d really like to build more Power BI custom data connectors. I’ve built a few already for my customers and published another one I built for fun here; when this functionality comes out of preview (which should be soon I hope) and gets added to SSAS and Excel I think there will be a lot of SaaS providers who will want custom data connectors built for their platforms.

One other side-effect of this change is that I have changed my Twitter username to @cwebb_bi. If you already follow me on Twitter then you won’t need to do anything except remember to use the new username when you mention or DM me. If you don’t already follow me, please do so!

Thinking about Microsoft BI as a whole, it’s been another massively successful year for Power BI. Its popularity has now extended beyond small/medium businesses attracted by the price and dyed-in-the-wool Microsoft shops and I have seen several cases where it has replaced Tableau and Qlik based on its features alone. Another few years of this kind of growth and Microsoft will have the kind of domination in BI that it has enjoyed with Windows and Office.

I’ve also been pleased to see how Azure Analysis Services has grown this year too. I particularly like how the dev team have focussed on adding new features that take advantage of the cloud to easily solve problems that are much harder to solve on-premises – the new auto scale-out feature is a great example of this. It will be interesting to see if we get a Multidimensional version of Azure Analysis Services in 2018 – if we do it will be a vote of confidence in a platform whose users are wondering whether it will ever see any investment again.

Finally, thinking about my other great enthusiasm – Power Query – it seems like things have gone a bit quiet recently. Apart from custom data connectors there hasn’t been much in the way of new functionality added in 2017. I suppose it’s a mature platform now and to be honest I struggle to think how it could be improved; parameters need some work for sure and there are a lot of people complaining about performance in Excel, which can be awful if you’re working with lots of complex queries. Maybe also the web-based interface seen in the CDS preview will appear in other places?

Anyway, time to sign off and get back to enjoying a few more days of holiday before work starts again. Thanks for reading and I hope you all have a great 2018!

PS For those of you looking for video training on Power BI, DAX, SSAS and MDX you can get 12% off all subscriptions at Project Botticelli by clicking here and using the discount code CHRIS12BYE2017

Exploring The New SSRS 2017 API In Power BI

One of the new features in Reporting Services 2017 is the new REST API. The announcement is here:

https://blogs.msdn.microsoft.com/sqlrsteamblog/2017/10/02/sql-server-2017-reporting-services-now-generally-available/

And the online documentation for the API is here:

https://app.swaggerhub.com/apis/microsoft-rs/SSRS/2.0

Interestingly, the new API seems to be OData compliant – which means you can browse it in Power BI/Get&Transform/Power Query and build your own reports from it. For example in Power BI Desktop I can browse the API of the SSRS instance installed on my local machine by entering the following URL:

http://localhost/reports/api/v2.0

…into a new OData feed connection:

[Screenshots: connecting to the SSRS REST API as an OData feed in Power BI Desktop and browsing its entity sets]
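
The M code behind this connection is very simple; a minimal sketch looks something like the following, though the “Reports” entity set name is an assumption so check what your instance actually exposes:

    let
        // Connect to the SSRS 2017 REST API as an OData feed
        Source = OData.Feed("http://localhost/reports/api/v2.0"),
        // Each entity set (reports, datasets, data sources,
        // subscriptions and so on) surfaces as a table
        Reports = Source{[Name = "Reports", Signature = "table"]}[Data]
    in
        Reports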

This means you can build Power BI reports on all aspects of your SSRS reports (reports on reports – how meta is that?), datasets, data sources, subscriptions and so on. I guess this will be useful for any Power BI fans who also have to maintain and monitor a large number of SSRS reports.

However, the most interesting (to me) function isn’t exposed when you browse the API in this way – it’s the /DataSets({Id})/Model.GetData function. This function returns the data from an SSRS dataset. It isn’t possible to call this function direct from M code in Power BI or Excel because it involves making a POST request to a web service and that’s not something that Power BI or Excel support. However it is possible to call this function from a Power BI custom data extension – I built a quick PoC to prove that it works. This means that it would be possible to build a custom data extension that connects to SSRS and that allows a user to import data from any SSRS dataset. Why do this? Well, it would turn SSRS into a kind of centralised repository for data, with the same data being shared with SSRS reports and Power BI (and eventually Excel, when Excel supports custom data extensions). SSRS dataset caching would also come in handy here, allowing you to do things like run an expensive SQL query once, cache it in SSRS, then share the cached results with multiple reports both in SSRS and Power BI. Would this really be useful? Hmm, I’m not sure, but I thought I’d post the idea here to see what you all think…
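
To make the idea a little more concrete, here’s the rough shape that such a custom data extension could take. This is only a sketch: the section name, function signature and request body are my assumptions rather than a documented contract, and real connector code would need proper authentication and error handling:

    // Sketch of a Power Query custom connector section file (.pq)
    section SSRSDataSet;

    [DataSource.Kind = "SSRSDataSet"]
    shared SSRSDataSet.GetData = (serverUrl as text, dataSetId as text) =>
        let
            // Model.GetData has to be called with a POST, which is why
            // this lives in a custom connector rather than a plain query
            url = serverUrl & "/api/v2.0/DataSets(" & dataSetId
                & ")/Model.GetData",
            response = Web.Contents(url,
                [
                    Headers = [#"Content-Type" = "application/json"],
                    // The body schema is an assumption - empty here
                    Content = Text.ToBinary("{}")
                ]),
            result = Json.Document(response)
        in
            result;

    // Data source kind definition using Windows authentication
    SSRSDataSet = [
        Authentication = [Windows = []]
    ];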

Twelfth Blog Birthday

Today is the 12th anniversary of the first post on this blog, and as in previous years I’m going to use this as an opportunity to sum up my thoughts on what’s been going on in my corner of the Microsoft BI world in the last twelve months.

Power BI

I think it’s fair to say that 2016 was the year that Power BI became the big commercial success that many of us hoped it would be. After the achingly slow uptake of Power Pivot and the failure of the original Office 365 Power BI it’s great to see Microsoft BI with a hit on its hands. Many of my existing customers have started using it alongside the rest of the SQL Server BI stack, especially SSAS, because it’s much easier to build reports and share them via the browser or mobile devices than with SSRS or Excel. I’ve also started working with a new type of customer, one that I’ve never worked with before: small and medium organisations (including many not-for-profits) who have Office 365 but no existing BI solution, the kind of organisation that does not have the money or resources for a SQL Server BI solution or indeed any other kind of traditional BI solution. This, I believe, is where the real opportunity for Power BI lies and where the majority of the new growth will come from.

Apart from my own customers, there’s plenty of other evidence for the success of Power BI. The energy of the Power BI community, on forums and at user groups, is amazing – and once again, the people that I meet at user groups are completely different to the crowd you get at a normal SQL Server user group. The analysts love it too: for example, Microsoft is now in the Leaders section of the Gartner Magic Quadrant. There’s also the fact that competitors like Tableau have started attacking Power BI in their marketing, so I guess they must consider it a major threat.

Why has it been such a success? The underlying technology is great, but then again the technology was always great. The pace of change is incredible and it’s good to see Microsoft throwing its vast resources behind a product with some potential, rather than another Zune or Windows phone. There’s still some catching up to do but at this rate any existing gaps will have been closed by the end of 2017. The willingness to listen to customer feedback and act on it is refreshing. The Excel/Power Query/Power Pivot and SSAS crossover adds an angle that the competition doesn’t have. Finally, the licensing is almost perfect: it’s simple (compared to the thousands of SKUs that Microsoft usually comes up with) and cheap/free, although organisations with thousands of users who all need Pro subscriptions find the costs escalate rapidly; I’d like to see special deals for large numbers of users, and some recognition that many users who need to see Pro-level reports don’t need to create reports using these features. I know Microsoft has already heard this from a lot of people, though, and has taken it on board.

Probably the only criticism that I can make that Microsoft doesn’t seem to be actively addressing is the fact that the data visualisation functionality is relatively weak. If you know what you’re doing and you have the patience, you can create good-looking reports. For people like me who have minimal artistic talent and limited patience the experience of building reports can be frustrating. There are some features like small multiples that I can’t believe are still not implemented in the core product, and nothing to help users to follow good data visualisation practice. R visuals and custom visuals help fill the gap (I was quite impressed by this one by Business Solution Group, for example, which isn’t available in the gallery) but really Microsoft need to put some more thought into this area.

Analysis Services

There’s been a lot of good news in the world of Analysis Services this year too. SSAS Tabular 2016 dealt with a lot of the shortcomings that dogged it in 2012 and 2014: a much faster and less buggy development experience; many-to-many relationships supported using bi-directional cross filtering; and powerful new DAX functions and features like variables. SSAS Tabular v.next promises even more great new features such as the integration of the M language. These changes and the fact it’s now available in Standard Edition mean that Tabular should be the default choice over Multidimensional for almost all new SSAS projects.

Sadly, it looks like the neglect of Multidimensional will continue for the foreseeable future. I stopped being angry about this a long time ago and I understand that Microsoft need to concentrate their resources on SSAS Tabular and Power BI, but a lot of Multidimensional customers are now wondering where they stand. Either Microsoft needs to show some commitment to Multidimensional by adding new features – it wouldn’t take much to make a difference – or add features to Tabular that make it possible for more Multidimensional users to migrate over to it, for example equivalents to Scope statements or calculated members on non-Measures dimensions.

Last of all, Azure SSAS opens up a lot of exciting new possibilities for both on-prem SSAS users as well as Power BI users. Kasper does a great job of summing them up here and I won’t repeat what he has to say; once again I’m seeing a lot of interest from my customers and I’m sure I’ll be helping a few to migrate to the cloud very soon. The pricing seems a bit excessive at the moment, even when you take into account the ability to pause servers, and I hope it changes before RTM. Also it’s SSAS Tabular only at this stage but support for Multidimensional is by far the top-voted request on the feedback forum, with more than five times as many votes as the next highest request, so maybe this will be Microsoft’s opportunity to show some love to the Multidimensional world?

If I Could Have New Features In SSAS Multidimensional, What Would They Be?

Indulge me for a moment, please. Let’s imagine that somewhere in Microsoft, someone is planning for SQL Server v.next and is considering investing in new features for SSAS Multidimensional (don’t laugh – I wouldn’t be writing this post if I didn’t think it was a possibility). What features should they be?

Before I answer that question, it’s worth pointing out that despite what you might think there has been some investment in SSAS Multidimensional over the last few years. This post lists what was new in SSAS 2012 Multidimensional; since then support for DAX queries and NUMA has been added and, umm, the new Divide() function. This must have been a lot of work for someone – but why does it get overlooked? One reason: none of these changes have made much difference to the ordinary SSAS Multidimensional developer’s life. DAX query support is great if you’re one of the few people that uses the SharePoint version of Power View; shockingly, it still doesn’t work in Excel 2013 Power View yet (though I guess it will be the way the new Power BI connects to on-prem Multidimensional). NUMA support is great if you work for an investment bank and have vast amounts of data and a high-spec server, but that’s only about 0.1% of the installed base.

So from this we can learn that the main consideration when choosing new features to implement should be that they should be relevant to the majority of SSAS Multidimensional developers, otherwise they’ll be ignored and MS may as well have not bothered doing anything. To that we can add these other considerations:

  • These features should provide compelling reasons to upgrade from earlier versions of SSAS to the new version
  • While some features should be available in all editions, there should also be some features that encourage customers to upgrade from Standard Edition to Enterprise Edition
  • There are limited resources (time and developers) available and Power Pivot/SSAS Tabular will be the priority, so only a few features can be delivered.
  • Features that are only there to support Power BI don’t count

With all of that borne in mind, here’s what I would choose to implement based on what I see as a consultant and from the popularity of particular topics on my blog.

Last-Ever Non Empty

One of the most popular posts I’ve ever written – by a gigantic margin – is this one on the last-ever non-empty problem. Given that so many people seem to come up against this, and that the MDX solution is complex and still doesn’t perform brilliantly, I think it should be built into the engine as a new semi-additive aggregation type. Since semi-additive measures are Enterprise Edition only, this would be my sole Enterprise Edition feature.

MDX Calculation Parallelism

Ever since I’ve been working with SSAS, people have always asked why the Formula Engine has been single-threaded. I understand why the SSAS dev team have ignored this question and instead concentrated on tuning specific scenarios: doing parallelism properly would be extremely difficult given the way MDX calculations can be layered over each other, and in plenty of cases it could lead to worse performance, not better. However I’m not asking for a ‘proper’ implementation of parallelism. I just want something dumb: a boolean property that you can set on a calculation that tells the Formula Engine to do this calculation on a separate thread. If it makes performance better then great; if not, then don’t set it. My guess is that even a crude implementation like this could make a gigantic difference to performance on many calculation-heavy cubes.

Drillthrough

Drillthrough is one of those features that almost everyone wants to use, but for some reason has been left in a semi-broken state ever since 2005. Here’s what needs to change:

  • It should work with calculated members. I don’t expect SSAS to magically understand how to work out which rows to display for any given MDX calculation, but I would like a way of specifying in MDX what those rows should be.
  • Those stupid, ugly column names – SSDT should let us specify readable column names and let us have complete control over the order they appear in.
  • Excel should allow drillthrough on multiselect filters.

‘Between’ Relationships

This might seem a bit of a strange choice, and I suspect it may not be easy to implement, but another problem that I come across a lot in my consultancy is the ‘events-in-progress’ problem. I’ve blogged about solving it in MDX and DAX, as have many others. I would love to see a new ‘between’ dimension/measure group relationship type to solve this. In fact, competing OLAP vendor iccube already implemented this and you can see how it works on that platform here and here. My feeling is that this would open up a massive number of modelling opportunities, almost as many as many-to-many relationships.

And that’s it, four features that I think could make SSAS Multidimensional v.next a must-have upgrade. I’m not so naive as to believe that any or all of these will be implemented, or even that we’ll get any new features at all, but who knows? If you have any other suggestions, please leave a comment.

Thoughts On Office Sway And BI

When I first saw the announcement about Office Sway last week, I thought – well, you can probably guess what I thought. Does it have any potential for BI? After all, the Sway team are clearly targeting business users (as well as hipster designers and schoolchildren): look at the Northwest Aquarium and Smith Fashion Expansion samples, and notice that they contain tables, charts and infographics. What’s more, data storytelling is currently a very hot concept and Sway is clearly all about telling stories. Wouldn’t it be cool if you could have interactive PivotTables, PivotCharts and Power View reports from your Power BI site embedded in a Sway? It would be a much more engaging way of presenting data than yet another PowerPoint deck.

I have no idea whether any integration between Sway and Power BI is actually planned (I have learned not to get my hopes up about this type of thing), but even if it isn’t maybe someone at Microsoft will read this post and think about the possibilities… And isn’t this kind of collaboration between different teams supposedly one of the advantages Microsoft has over its competitors in the BI space?

Office Sway introductory video

PS I want a pink octopus costume just like the one that girl in the video has

This Is My 1000th Blog Post

Just a few months away from the tenth anniversary of my first post here, I’ve reached the milestone that is my 1000th blog post. If you’ve been with me since back then, thanks for reading! I have no idea how I managed to write so much – it’s an average of around two posts per week, which I certainly haven’t managed recently – but I suspect that the answer lies in the fact that I posted a lot of rubbish here in the early years that I’m embarrassed by now.

I can remember the day when I decided to start this blog quite well. It was just after Christmas so the office was quiet and I didn’t have much work to do; blogging was the cool new thing back in late 2004 and having discovered that Mosha had started a blog I thought it was something I should be doing too, so as not to be left behind. Microsoft had just launched its own blogging platform so I signed myself up. I didn’t think I would stick at it this long…

At first I thought I would just use it for writing up solutions to common Analysis Services and MDX problems, so that I didn’t have to keep repeating myself when I was answering questions on the microsoft.public.sqlserver.olap newsgroup. I kept going, though, for a lot of other reasons:

  • To remember what I’ve learned. If I didn’t write this stuff down I would forget it, and trust me, I’m always googling for old posts here. This also explains why there is very little overall structure or purpose to what I write about. Technical books need to cover a topic very methodically: start at the basics, explain all the concepts and functionality, not miss anything out, and so on. Here, if I learn something interesting and useful while at work, or helping someone on a forum, or while playing around with a new tool, I just need to write that one thing down and not worry about whether it fits into some greater plan.
  • I also find that the act of writing up a problem or topic for a post helps me understand it better. To be able to explain a technical concept you first have to be sure you understand it properly yourself, and writing for other people forces you to do that.
  • To pass on Microsoft BI-related news. I work with these tools every day and so it’s natural that I want to find out what new toys I’ll have to play with in the future. I find this stuff interesting and fun, and it seems like there are several thousand other people around the world who also want to know what’s going on (even if we might not want to admit this publicly). I like airing my opinions too: sometimes Microsoft does things I agree with, sometimes it does things I think are crazy, and since my career and business is wholly dependent on Microsoft BI I think the occasional bit of public feedback is healthy and allowable. Brent Ozar sums up my feelings on this subject perfectly here. I’ve got in trouble once or twice for things I’ve written, but I’ve never regretted writing any of my posts.
  • It’s marketing for my consultancy and training. I have to make a living somehow, and if I didn’t blog then it would be much harder to find customers – I think my blog is much more valuable in this respect than writing books or speaking at conferences or user groups. I don’t want to sound cynical, though, and I don’t see this blog as something that is purely commercial. I love to share and it just so happens that sharing my knowledge is also good for business. Some two years after starting this blog, just after I resigned from my permie job to become a self-employed consultant, one of my soon-to-be ex-colleagues said to me “You know, you’ll have to stop blogging now: why would anyone hire you if they can read everything you know on your blog for free?”. I didn’t have a good answer for him at the time but I soon found that if someone finds the answer to a problem on my blog, they are much more likely to think about hiring me when they have a problem they can’t solve. What’s more, I firmly believe that the way that people in the SQL Server community share knowledge publicly, even when they are aware that this knowledge could be used by their competitors, means that the community as a whole is stronger, SQL Server is more successful, and we all benefit more commercially than if we had not shared in the first place.
  • I enjoy writing so I’m quite happy to spend my spare time writing blog posts. There’s no way I could have forced myself to write a thousand posts if I didn’t enjoy doing it. I also travel a lot for work, so that results in a lot of time spent in airports and hotel rooms with nothing better to do. To make another comparison with writing tech books: a tech book has to be objective, impartial, polished, structured, sober and impersonal, whereas a blog is (or at least in my opinion should be) personal, subjective, haphazard, rough-edged and sometimes controversial. This makes blogging less of an effort and more of a pleasure.
  • Finally, I admit it, I get a kick out of knowing that when I write something there are people out there who want to read it.

Will I make it to my 2000th post? I have no idea, but I probably will if Microsoft are still making BI tools and I’m still using them.

The Ethics Of Big Data

Some time ago I received a review copy of a book called “Ethics Of Big Data” from O’Reilly; I didn’t get round to writing a review of it here for a number of reasons but, despite its flaws (for example its brevity and limited scope), it’s worth reading. It deals with the ethics of data collection and data analysis from a purely corporate point of view: if organisations do not think carefully about what they are doing then

“Damage to your brand and customer relationships, privacy violations, running afoul of emerging legislation, and the possibility of unintentionally damaging reputations are all potential risks”

All of which is true, although I think what irked me about the book when I read it was that it did not tackle the wider and (to my mind) more important question of the social impact of new data technologies and their application. After all, this is what you and I do for a living – and I know that I haven’t spent nearly enough time thinking these issues through.

What prompted me to think about this again was a post by Adam Curtis which argues that the way that governments and corporations are using data is stifling us on a number of levels from the personal to the political:

“What Amazon and many other companies began to do in the late 1990s was build up a giant world of the past on their computer servers. A historical universe that is constantly mined to find new ways of giving back to you today what you liked yesterday – with variations.

Interestingly, one of the first people to criticise these kind of “recommender systems” for their unintended effect on society was Patti Maes who had invented RINGO. She said that the inevitable effect is to narrow and simplify your experience – leading people to get stuck in a static, ever-narrowing version of themselves.

Stuck in the endless you-loop.”

Once our tastes and opinions have been reduced to those of the cluster the k-means algorithm has placed us in, we have become homogenised and easier to sell to, slaves to our past behaviour. Worse, the things we have in common with the people in other clusters become harder to see. Maybe all of this is inevitable, but if there is going to be an informed debate on this then shouldn’t we, as the people who actually implement these systems, take part in it?
