Sunday, September 16

The Digg Oracle - Data Mining On The Client

Brian Shaler noticed that almost a year ago, Digg removed the “search your own Diggs” feature, to the dismay of thousands of Digg users. To explain why the feature had not yet returned, Digg said the hardware and software needed to support it would be very complicated and expensive.

Brian decided to re-implement the feature himself using the Digg API, and the result is The Digg Oracle:

Because the dataset is relatively small and user-specific, performing tasks like searching/filtering and sorting can easily be done on the client, using Google Gears. The tool downloads the selected user’s entire voting history, indexes the stories in the local DB, then does all the sorting/searching without connecting to Digg’s servers.
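To make the download, index, then query-locally flow concrete, here is a minimal sketch. It is not Brian's code: the Digg Oracle runs in the browser against the Gears database, whereas this uses Python's built-in sqlite3 module as a stand-in for a local SQL store. The sample stories, the table layout, and the helper names `build_local_index` and `search` are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample of one user's voting history. A real tool would
# download this from the Digg API; these rows are invented.
SAMPLE_DIGGS = [
    {"story_id": 1, "title": "Google Gears brings offline support to web apps",
     "diggs": 1520, "dugg_at": 1189000000},
    {"story_id": 2, "title": "Kevin Rose announces new Digg features",
     "diggs": 3100, "dugg_at": 1189100000},
    {"story_id": 3, "title": "Building a local search index with Gears",
     "diggs": 480, "dugg_at": 1189200000},
    {"story_id": 4, "title": "Ten tips for faster JavaScript",
     "diggs": 920, "dugg_at": 1189300000},
]

# Columns the caller may sort by (whitelisted, since ORDER BY cannot be
# passed as a bound parameter).
SORTABLE = {"diggs", "dugg_at", "title"}


def build_local_index(stories):
    """Load the downloaded stories into a local, in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stories ("
        " story_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " diggs INTEGER NOT NULL,"
        " dugg_at INTEGER NOT NULL)"
    )
    db.executemany(
        "INSERT INTO stories VALUES (:story_id, :title, :diggs, :dugg_at)",
        stories,
    )
    db.commit()
    return db


def search(db, term, order_by="diggs"):
    """Return matching titles, sorted descending, with no network access."""
    if order_by not in SORTABLE:
        raise ValueError(f"cannot sort by {order_by!r}")
    rows = db.execute(
        f"SELECT title FROM stories WHERE title LIKE ? ORDER BY {order_by} DESC",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


db = build_local_index(SAMPLE_DIGGS)
gears_hits = search(db, "gears")
```

Once the one-time download is done, every search and sort is just a query against the local table, which is why the server never needs to be involved again.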

Here we see an initial query, with the application starting to download the user's usage data:

 

[Image: Digg Oracle Loading]

 

When the data is loaded, searching and filtering the data is extremely fast, even if you use Kevin Rose as your sample :) This is a great non-offline example of using the database and workerpool components.

Source | The Digg Oracle
