As I said, I don’t have the time to enter the Connected Systems Developer competition that I blogged about the other week, but that hasn’t stopped me thinking about what I might build if I did enter. The following idea came to me at around 2am this morning when I was desperately trying to get my 22-month-old daughter to go back to sleep, and having nothing better to do this afternoon I thought I’d bounce it off anyone reading my blog. So comments are invited – even if they are just to say that it’s a rubbish idea and/or someone’s thought of it before and/or it’ll never work. I have after all categorised this post under ‘Random Thoughts’!
Business Case:
In the modern office everyone does a lot of web surfing; some of it might even be business-related. And whenever we see something interesting we typically copy the link into a mail, add a few words of explanation and send it on to a few people who might also want to have a look. I send at least two or three such emails a day. For the typically lazy web surfer, though, this process is a bit of a hassle so we only bother to do it when we think the link is really interesting and (because we don’t want to get a reputation as the office spammer) we only send it to a small number of people we know who we think are going to find it interesting too. It’s my contention that it would be cool if we could share more of these links with more people.
So, we need to solve three problems in our quest to share the interesting links we find during our daily surfing:
1) It needs to be easier to share the link once we’ve found it.
2) We don’t want to send stuff on to people who aren’t going to be interested, and we want to receive only the links that we’re going to be interested in.
3) We’d like to be able to share links with people who we don’t actually know well enough to contact directly.
Of course there are plenty of existing ways that people share links, such as newsgroups, email discussion lists and blogs, but they typically address only the third of the above problems fully, the second only partially and the first not very well at all. For instance, anyone reading my blog is presumably doing so because they’re interested in Microsoft’s BI tools, and they’re going to be interested in any links to webcasts, articles etc. that I post up, but if they’re like me they subscribe to upwards of a hundred RSS feeds – and that’s only on subjects they’re really interested in – so we still have the proverbial information overload. The same goes for email discussion lists and newsgroups. And in all these cases, in order to share information you have to open an email, write a blog post etc., all of which requires effort.
Let me give you an idea of the kind of scenario I want to tackle. This morning I was reading this story on the Register, and followed a link on a whim to this page, a set of pictures of Cybermen with funny captions. It brought a smile to my face but I didn’t send it on to anyone else because a) it didn’t seem worth the bother, and b) I didn’t know whether any of the people I usually send stuff on to were at least mildly into Dr Who in the way I am. I’m not going to blog about it because it’s not relevant to BI, and I don’t subscribe to any Dr Who blogs, discussion lists or newsgroups because I’m not that much of a Dr Who fan, so no-one else is going to see it. Which is a shame.
Functional Spec:
Anyway, enough waffle about the theory. The solution I’m thinking of would consist of something like the following:
- An IE toolbar with only two controls on it: a button saying "This is a cool page" which you hit whenever you find an interesting link (regardless of whether you’ve found it yourself or it has been recommended to you by the system), and a textbox which allows you to add a short commentary on the contents of the page if you want. Whenever you hit the button, it sends the current URL in your browser plus any comments to a web service which…
- …Puts the information in a queue on a server. There’s an app which gradually works its way through every link submitted, retrieves the page, strips the text from the HTML, does some funky text mining on this and the comments you’ve submitted, and classifies the page. This classification is then used by…
- …Another server app which looks at your tastes (based on pages you’ve submitted in the past and perhaps other users whose links you’ve said you’re interested in seeing) and then, using some more data mining, gives you a short list of recently submitted links that you might be interested in, along with the comments of the people who have recommended them. This could come in the form of a web page, a customised RSS feed or a regular email newsletter.
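To make the first two steps a bit more concrete, here is a very rough Python sketch of the submit-then-process flow. Everything in it is an assumption made for illustration: the names `submit`, `process_next`, `strip_html` and `term_counts` are hypothetical, the in-memory `deque` stands in for a real server-side queue, and simple word counting stands in for the "funky text mining", which would really be done by a proper mining model.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "and", "for", "that", "this", "with"}

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def strip_html(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def term_counts(text):
    """Crude stand-in for text mining: count words of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

# Hypothetical in-memory queue; the real thing would live on the server.
submissions = deque()

def submit(user, url, comment=""):
    """What the toolbar button would send to the web service."""
    submissions.append({"user": user, "url": url, "comment": comment})

def process_next(fetch):
    """Take the oldest submission, fetch the page and attach its term counts.

    `fetch` is any function mapping a URL to an HTML string, so the sketch
    can be tried without network access.
    """
    item = submissions.popleft()
    text = strip_html(fetch(item["url"])) + " " + item["comment"]
    item["terms"] = term_counts(text)
    return item
```

Passing the page fetcher in as a parameter is just a convenience for trying the idea out offline; a real worker would download the page itself.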
So, in practice, let’s imagine it working as follows. Chris, Jon and Colin all work in a large corporation, in the same team doing the same kind of BI stuff. During his morning surfing, Chris submits 5-10 links; one, on a new feature of MDX, gets recommended automatically to Jon and Colin because everyone in the team works with MDX and has submitted MDX-related pages in the past. Another, containing pictures of Cybermen with amusing comments, gets recommended only to Colin and only appears about halfway down his list because he’s a bit of a sci-fi fan and has submitted a few sci-fi links in the past. Meanwhile, David, who works in a different team and doesn’t know Chris, Jon or Colin, finds a cool article on C-Omega and submits it, so it gets recommended to the rest of his team; they all in turn click their buttons and so it eventually appears at the top of Jon’s list (because he’s really into coding) and somewhere down the list for Chris (because he’s not so into coding, but this is a really cool article nonetheless).
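One simple way to picture the matching in that scenario is to treat each user’s taste as the sum of the word counts of everything they’ve submitted, and to rank candidate pages by cosine similarity against that profile. This is only a sketch under that assumption: cosine similarity is swapped in here as a stand-in for whatever the data mining would actually do, and the function names and the made-up term counts are purely illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count mappings (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(pages):
    """A user's taste: the combined term counts of the pages they have submitted."""
    profile = Counter()
    for terms in pages:
        profile.update(terms)
    return profile

def recommend(profile, candidates, top_n=5):
    """Return the URLs most similar to the profile, best first, dropping zero matches."""
    scored = [(cosine(profile, terms), url) for url, terms in candidates.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_n]]

# Made-up histories: Jon likes MDX and coding, Colin likes MDX and a bit of sci-fi.
jon = build_profile([Counter(mdx=4, cube=2), Counter(code=3, mdx=1)])
colin = build_profile([Counter(mdx=3, cube=1), Counter(cybermen=2, scifi=2)])

candidates = {
    "mdx-feature": Counter(mdx=5, cube=1),
    "cybermen-pics": Counter(cybermen=3, captions=1),
}

print(recommend(jon, candidates))    # only the MDX page shares any terms with Jon
print(recommend(colin, candidates))  # MDX page first, Cybermen further down
```

With these invented numbers Jon only sees the MDX page, while Colin gets both with the Cybermen page lower down, which is roughly the behaviour described above.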
The larger the number of users with similar taste, the better it should work – more links submitted plus more people voting on the same links, and so the mining models can get to know people’s tastes much more quickly. I could imagine it doing well as an intranet app at a large tech company. It would probably need to give more priority to newer links (people want the latest stuff, and you don’t want old but popular links clogging up your recommendations) and maybe have some way of removing links you’ve already seen from your list of recommendations. One other extra feature that occurred to me was that the app could also generate a report showing the users who submitted the most interesting links, so as to generate a bit of rivalry and encourage future usage.
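The recency, already-seen and leaderboard ideas could be sketched along the following lines. The scoring rule (votes multiplied by an exponential freshness factor), the three-day half-life and all the field names are assumptions chosen for illustration, not anything the competition or the tools dictate.

```python
from collections import Counter

HALF_LIFE_DAYS = 3.0  # assumed: a link loses half its weight every three days

def freshness(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: 1.0 for a brand-new link, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life)

def rank_links(links, seen, now_day):
    """Drop links the user has already seen, then sort by votes times freshness."""
    unseen = [link for link in links if link["url"] not in seen]
    return sorted(
        unseen,
        key=lambda link: link["votes"] * freshness(now_day - link["day"]),
        reverse=True,
    )

def leaderboard(links):
    """Total button clicks received per submitter, highest first."""
    totals = Counter()
    for link in links:
        totals[link["submitter"]] += link["votes"]
    return totals.most_common()

# Made-up data: "day" is the day number on which the link was submitted.
links = [
    {"url": "old-popular", "votes": 10, "day": 0, "submitter": "chris"},
    {"url": "new", "votes": 3, "day": 9, "submitter": "david"},
    {"url": "seen", "votes": 8, "day": 9, "submitter": "david"},
]

ranked = rank_links(links, seen={"seen"}, now_day=9)
print([link["url"] for link in ranked])
print(leaderboard(links))
```

Here the nine-day-old link with ten votes drops below a fresh link with three, which is the "old but popular links clogging up your recommendations" problem being pushed down the list.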
The key to it all though is the fact that all you need to do to submit a page is click a button in IE – the absolute minimum effort possible – and the fact that the job of the mining model is clear – recommend a page which will make you click your button in turn.
Technology:
It should be fairly straightforward to build the toolbar and the web service. Qualification for the competition comes with the use of SQL Server 2005 for storing all the data, Integration Services (SQLIS) to do the processing, Analysis Services (AS) to do the data mining, and Reporting Services (RS) to do the web-based reports, the daily email and even the RSS feed (maybe as a custom rendering extension?). I’ll admit that I don’t know enough about data mining to know whether that bit will really work, but hey, it might.
OK, enough fantasising. If anyone does implement this and enters the competition, please can I have a share of the winnings?