Christian Heilmann

Posts Tagged ‘mashup’

How I build my data.gov.uk mashup – UK-House-Prices.com

Thursday, January 21st, 2010

UK-House-Prices.com is a web site to see how the prices in a certain area changed over the years using a data set released by the UK government as part of the data.gov.uk initiative.

Here’s a screencast showing the app:

The first step was to get the right data. I was lucky enough to be invited to the initial “hack day” and pre-release of the data and looked around for something to mash up. Initially I wanted to do something with environmental data but I found a lot of it to be very old. Therefore I just did a search for “2009” at data.gov.uk and found that the house prices data from 1996 to now in England and Wales is available. The plan was set. This was it:

  • I wanted to build an interface to show this information that was very fast, very portable and show a nice map of the area next to the numbers.
  • I wanted to build this as a web app and as an application for the Yahoo homepage (as I needed to build one as a demo anyways)
  • Traffic and speed was the most important issue – as this might get huge.

Cleaning and converting data

I got the spreadsheet and was confronted with my old nemesis: Excel. After saving the sheet as CSV and spending some fun time regular expressions and split() I had the data in a cleaner, and more usable version (JSON, specifically). One fun part is that when there was no data available for a certain area the field was either “..”, “n/a” or just empty. Something to work around. The numbers were also formatted like 100,312 which is nice on the eye but needs un-doing when you want to sort them outside Excel.

Once I had the list of locations and their numbers I wanted to turn them into geographical locations to display maps of the area. For this I used Yahoo Placemaker, especially the YQL table (see an example for Rugby in the YQL console). This is the script I ran over the list of locations:


$out = ‘’;
for($i=0;$i $select = preg_replace(‘/,.*/’,’‘,$lines[$i]);
$select = preg_replace(‘/ UA/’,’‘,$select);
$url = ‘http://query.yahooapis.com/v1/public/yql?q=select%20match.place.woeId%2Cmatch.place.centroid%20from%20geo.placemaker%20where%20documentContent%20%3D%20%22’.urlencode($select.’,uk’).’%22%20AND%20documentType%3D%22text%2Fplain%22%20and%20appid%20%3D%20%22%22%20limit%201&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys’;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
echo ‘{“place”:”’.$select.’”,’;
echo ‘”w”:”’.$data->query->results->matches->match->place->woeId.’”,’;
echo ‘”lat”:”’.$data->query->results->matches->match->place->centroid->latitude.’”,’;
echo ‘”lon”:”’.$data->query->results->matches->match->place->centroid->longitude.’”’.”},n”;
;

}

That was that – I had a data set I can work with.

Adding more information

The next thing I wanted to add was some more information about the area which meant using maps. As both Yahoo and Google maps have static map versions but are rate limited I wondered if there is a free version of that. And there is. Openstreetmap was the answer, especially the somewhat unofficial API I found with Google. To play safe, I wrote a script that gets the images and I cache it on my server to avoid killing this API.

I also wanted to show currently available houses in the area in case you are looking to buy. For this the natural choice for me was to use Nestoria as they also have an open YQL table (see the Nestoria table in the YQL console). So I used YQL and sorted the results by date:

select * from nestoria.search where place_name="Rugby" | sort(field='updated_in_days')

Using this I can get offers in the area live:

$url = ‘http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20nestoria.search%20where%20place_name%3D%22’.urlencode($city).’%22%20|%20sort%28field%3D%27updated_in_days%27%29&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&diagnostics=false’;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
if($data->query->results){
$i=0;
$results = array_slice($data->query->results->listings,0,5);
if(sizeof($results)>0){
echo ‘

Current property listings (powered by Nestoria)

    ‘;
    foreach($results as $r){
    echo ‘
  • lister_url).’”>‘.($r->title).’‘;
    echo ‘

    Price: ‘.($r->price_formatted).’, Type of property: ‘.ucfirst($r->property_type).’, Updated: ‘.($r->updated_in_days_formatted).’ (‘.($r->updated_in_days).’ days)

    ‘;
    echo ‘

    Listed at: ‘.($r->datasource_name).’ by ‘.($r->lister_name).’.

    ‘;
    echo ‘
  • ‘;
    }

    echo ‘

‘;
}

}

Finding a charting solution

Adding interactive charts was the next step. I had a few issues with that:

  • While Google charts are full of win, they are rate-limited and I didn’t want to pull images. As the app was also meant to become a Yahoo application every image would have to be run through Caja for safety reasons which slowed it down.
  • Canvas and Flash solutions like YUI charts or Raphael were also not possible because of the performance of the YAP app.

So I wrote my own pure CSS bar charts to work around that issue.

Building the API

I put all these solutions together and built a small API that will give me the search results with three parameters: the location as an id and the start and end of the time range.

http://uk-house-prices.com/graphs.php?loc=1&start=10&end=20

Building the interface

To build the interface, I went all-out YUI. I took the YUI grids builder to create the main layout, the AutoComplete demo, the dual slider demo and the button and put them all together. Add an Ajax call to the form, and you are done. OK, I admit, there was quite a bit of cleaning up to be done :)

Notice that I am using progressive enhancement all the way. Without JavaScript you get dropdowns:

UK House Prices - without JavaScript by  you.

That’s it

The next thing I had to do is move the app over to the Yahoo Application Platform which was easy as I based it on an API - but this is another blog post :)

Building a (re)search interface for Yahoo, Bing and Google with YQL

Wednesday, December 9th, 2009

If you do a lot of research using web searches can be frustrating. Different search engines have different results, you need to open things in tabs and in general it can be pretty time consuming to find what you need.

To make this a bit easier I thought it’d be cool to have an interface that searches Yahoo, Google and Bing at the same time and thus I built GooHooBi:

As explained in the screen cast the thing under the hood of GooHooBi is YQL. Instead of fussing about with all the different search APIs all I did was to play with the YQL console and put together the following YQL statement:

select * from query.multi where queries=’
select Title,Description,Url,DisplayUrl
from microsoft.bing.web(20) where query=”cat”;
select title,clickurl,abstract,dispurl
from search.web(20) where query=”cat”;
select titleNoFormatting,url,content,visibleUrl
from google.search(20) where q=”cat”

The query.multi table in YQL allows you to list a few queries which will be executed one after the other on the YQL server. The queries themselves I put together by using the different tables in the console and only selecting what I really need from each of them.

You can try this query in the YQL console and you can see the JSON output.

The rest is pretty easy. Cut this up into a parameterized string and do a cURL call:

$query = filter_input(INPUT_GET, ‘search’, FILTER_SANITIZE_SPECIAL_CHARS);

$queries[] = ‘select Title,Description,Url,DisplayUrl ‘.
‘from microsoft.bing.web(20) where query=”’.$query.’”’;
$queries[] = ‘select title,clickurl,abstract,dispurl ‘.
‘from search.web(20) where query = “’.$query.’”’;
$queries[] = ‘select titleNoFormatting,url,content,visibleUrl ‘.
‘from google.search(20) where q=”’.$query.’”’;
$url = “select * from query.multi where queries=’”.join($queries,’;’).”’”;
$api = ‘http://query.yahooapis.com/v1/public/yql?q=’.
urlencode($url).’&format=json&env=store’.
‘%3A%2F%2Fdatatables.org%2Falltableswithkeys&diagnostics=false’;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $api);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);

Then loop over the results and assemble the HTML output.

You can check the source code of GooHooBi.

In addition to this, here’s a half hour live coding screencast how to build something similar:

Building a search mashup with YQL using Google, Yahoo and Bing – live :) from Christian Heilmann on Vimeo.

The source of the code built in this screencast is also on GitHub.

Newsmap – using Placemaker to add geo location to a news feed

Friday, May 22nd, 2009

I am right now very excited about the new Placemaker beta – a location extraction web service released at Where2.0. Using Placemaker you can find all the geographical locations in a feed or a text or a web url and you get them back as an array of places.

As a demo I took the Yahoo News feed and ran it through Placemaker. The resulting places are plotted on a map and the map moves from location to location when you hover over the news items.

The result is online at http://isithackday.com/hacks/placemaker/map.php

Yahoo News Map by  you.

Getting the data from the data feed and running it through placemaker is very straight forward. I explained the basic principle in this blog post on the Yahoo Developer Network blog. The only thing to think about is to define the input and output types correctly:


$key = ‘PASTE YOUR API KEY HERE’;
$apiendpoint = ‘http://wherein.yahooapis.com/v1/document’;
$url = ‘http://rss.news.yahoo.com/rss/topstories’;
$inputType = ‘text/rss’;
$outputType = ‘rss’;
$post = ‘appid=’.$key.’&documentURL=’.$url.
‘&documentType=’.$inputType.’&outputType=’.$outputType;
$ch = curl_init($apiendpoint);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = curl_exec($ch);
?>

If you look at the source of this example you will find that Placemaker injected contentlocation elements in the feed itself:


xmlns:georss=”http://www.georss.org/georss” xmlns:cl=”http://wherein.yahooapis.com/v1/cle”
xmlns:xml=”http://www.w3.org/XML/1998/namespace” xml:lang=”en”>

2514815

38.8913
-77.0337


23424793

21.511
-77.8068


23424977

48.8907
-116.982


55843872
Guantanamo, CU]]>
19.9445
-75.1541


You’ll also notice that the elements are namespaced and the names of the locations in CDATA blocks, both things I hate with a passion. Not because they don’t make sense, but because simplexml can be drag to make understand them.

What I wanted to do with this data was twofold: create a JSON array of geo locations to plot on a map and a display of the news content. This is the PHP that does that:


$places = simplexml_load_string($results, ‘SimpleXMLElement’,
LIBXML_NOCDATA);
// if there are elements found
if($places->channel->item){
// start a JSON array
$output .= ‘[‘;
// start the HTML output
$html = ‘‘;
?>

The result of this can be seen here http://isithackday.com/hacks/placemaker/map-2.php.

The JavaScript to show the map is pretty straight forward and more or less the demo example of the maps API:


// will be called with the array assembled in PHP
function placeonmap(o){
// if there are locations
if(o.length > 0){
// create a new geopoints array to hold all locations
// this is needed to determine the original zoom
// level of the map
var geopoints = [];
// add map with controls
var map = new YMap(document.getElementById(‘map’));
map.addZoomLong();
map.addPanControl();
// loop over locations
for(var i=0;i // define a new geopoint and store it in the array
var point = new YGeoPoint(o[i].lat,o[i].lon);
geopoints.push(point);
// create a new marker and give it the unique
// id defined in the PHP. Pop up the title of
// the news item and the name of the location when the
// user hovers over the marker
var newMarker = new YMarker(point,o[i].id);
newMarker.addAutoExpand(o[i].title + ‘(‘+o[i].name+’)’);
map.addOverlay(newMarker);
}

// define best zoom level and show map
var zac = map.getBestZoomAndCenter(geopoints);
map.drawZoomAndCenter(zac.YGeoPoint,zac.zoomLevel);
}

// add a mouseover handler to the list of results
YAHOO.util.Event.on(‘news’,’mouseover’,function(e){
// remove the “news” text of the ID of the current target
// as we named the list items news0 to news19
var id = YAHOO.util.Event.getTarget(e).id.replace(‘news’,’‘);
// if there is still something left we have one of the news
// items
if(id!==’‘){
// get the first marker with the ID we defined in the loop.
var marker = map.getMarkerObject(‘m’+id+’x0’);
// if there is one, pan the map there and show the message
// attached to it.
if(marker){
map.panToLatLon(marker.YGeoPoint);
marker.openAutoExpand();
}

}
});
}

// call placeonmap with the JSON array
placeonmap();

That’s pretty much it. I am sure it can be refined, but it is amazing how easy it is to get geo information into any text with Placemaker.

News Mixer – my first attempt at using the Guardian’s open platform content API

Tuesday, March 10th, 2009

I am a very happy bunny at the moment. First of all because there is more yummy data on the web to play with as The Guardian just released a brand new API to access their archives and secondly as I was invited to play with it before it was public. The announce of the API was today and I’ve spent a few hours yesterday in my hotel room before checking out to build news mixer

News Mixer - web news and images enhanced by Guardian content

The API is simple enough to use and once you got your developer key you can search for content and request the more detailed data using a content ID. The next problem to tackle was what to build.

Access of data and tags is easy

I love that we turned the web from yet another information channel into a read/write web and that user generated content allows us to get information from everybody and not just from dedicated journalists. I also love that you can tag information and make it easier to find that way. Lastly I love that with products like BOSS you can now get access to information of search engines and use that in your own sites.

Relevancy of tags?

The tagging bit has me a bit annoyed though. While a few years ago when the idea was still fresh people tagged like mad and with high quality keywords this seemed to be on the decline a bit and as faster connections allow us to upload more and more data in bulk people stopped tagging sensibly and rely more on automated tags like geolocation or exif data in images.

Mixing user tags and professional categories

I wanted to show a news site that allows you to find keywords that match your search term that make sense and used two different APIs for that. BOSS allows you to search for news items and images and the BOSS web search also offers keyterms for certain web sites. These keyterms are to a degree user generated as this is what people entered into Yahoo to find the sites. I then used the new Guardian Data API to pull relevant articles and as these are professionally tagged by journalists this makes for more relevant keywords. Putting the two together means a good mix of professional and up-to-date information.

The outcome is News Mixer and you can download the source code to play with it yourself.

It was amazingly straight forward to build, the only snags I hit were the following:

  • Whilst BOSS provides keyterms for web searches, it does not do so for news searches. Therefore I used YQL to get the keyterms of each of the urls returned by news search in a nested loop. This is a bit hacky and I would love for that to change.
  • The Guardian API returns articles by relevancy and not by date. You can specify though that you want articles before or after a certain date, which is why all I had to do is get the current date and go back one month from that.
  • The content body of the Guardian API does not provide any paragraph or list information. This is very annoying as it results in terrible display (a massive chunk of text). I’ve worked around the issue by splitting the content at full stops and then injecting paragraphs after every third of them but that is just guesswork and not real structure of text.

In any case I am happy to have such a cool new archive of information to play with and we’re working on open table definitions for YQL to make it easy for you to get to the goodies the Guardian offers us.

TTMMHTM: Good job news, extending browsers, TweetEffect and converting Wikipedia to JSON

Monday, January 19th, 2009

Things that made me happy this morning (and I tell you about in the afternoon):