Christian Heilmann

Posts Tagged ‘curl’

cURL – your “view source” of the web

Friday, December 18th, 2009

What follows here is a quick introduction to the magic of cURL. This was inspired by the comment of Bruce Lawson on my 24 ways article:

Seems very cool and will help me with a small Xmas project. Unfortunately, you lost me at “Do the curl call”. Care to explain what’s happening there?

What is cURL?

OK, here goes. cURL is your “view source” tool for the web. In essence it is a program that allows you to make HTTP requests from the command line or different language implementations.

The cURL homepage has all the information about it but here is where it gets interesting.

If you are on a Mac or on Linux, you are in luck – for you already have cURL. If you are operating system challenged, you can download cURL in different packages.

On aforementioned systems you can simply go to the terminal and do your first cURL thing, load a web site and see the source. To do this, simply enter

curl "http://icant.co.uk"

And hit enter – you will get the source of icant.co.uk (that is the rendered source, like a browser would get it – not the PHP source code of course):

showing with curl

If you want the code in a file you can add a > filename.html at the end:

curl "http://icant.co.uk" > myicantcouk.html

Downloading with curl

( The speed will vary of course – this is the Yahoo UK pipe :) )

That is basically what cURL does – it allows you to do any HTTP request from the command line. This includes simple things like loading a document, but also allows for clever stuff like submitting forms, setting cookies, authenticating over HTTP, uploading files, faking the referer and user agent, setting the content type and following redirects. In short, anything you can do with a browser.

I could explain all of that here, but this is tedious as it is well explained (if not nicely presented) on the cURL homepage.

How is that useful for me?

Now, where this becomes really cool is when you use it inside another language that you use to build web sites. PHP is my weapon of choice for a few reasons:

  • It is easy to learn for anybody who knows HTML and JavaScript
  • It comes with almost every web hosting package

The latter is also where the problem is. As a lot of people write terribly shoddy PHP the web is full of insecure web sites. This is why a lot of hosters disallow some of the useful things PHP comes with. For example you can load and display a file from the web with readfile():

<?php
  readfile('http://project64.c64.org/misc/assembler.txt');
?>

Actually, as this is a text file, it needs the right header:

<?php
  header('content-type: text/plain');
  readfile('http://project64.c64.org/misc/assembler.txt');
?>

You will find, however, that a lot of hosters will not allow you to read files from other servers with readfile(), fopen() or include(). Mine for example:

readfile not allowed

And this is where cURL comes in:

<?php
header('content-type:text/plain');
// define the URL to load
$url = 'http://project64.c64.org/misc/assembler.txt';
// start cURL
$ch = curl_init(); 
// tell cURL what the URL is
curl_setopt($ch, CURLOPT_URL, $url); 
// tell cURL that you want the data back from that URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
// run cURL
$output = curl_exec($ch); 
// end the cURL call (this also cleans up memory so it is 
// important)
curl_close($ch);
// display the output
echo $output;
?>

As you can see, the options are where things get interesting, and the ones you can set are legion.
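To give a taste of those options, here is a minimal sketch that bundles a few common ones (following redirects, timeouts, a user agent) and checks for errors. The helper names default_curl_options() and fetch_url() and the user agent string are made up for this example; the CURLOPT_* constants are the ones PHP's cURL extension provides.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of cURL.
function default_curl_options($url)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // hand the body back as a string
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,    // but not forever
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 10,   // seconds for the whole request
        CURLOPT_USERAGENT      => 'my-little-script/1.0', // made-up name
    );
}

function fetch_url($url, array $extra = array())
{
    $ch = curl_init();
    // "+" keeps the integer keys; entries in $extra win over the defaults
    curl_setopt_array($ch, $extra + default_curl_options($url));
    $output = curl_exec($ch);
    if ($output === false) {
        // curl_exec() returns false on failure; ask cURL what went wrong
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL failed: ' . $error);
    }
    curl_close($ch);
    return $output;
}
```

With that in place, the example above would shrink to echo fetch_url($url); and a faked referer would be one extra entry such as array(CURLOPT_REFERER => 'http://example.com/').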

So, instead of just including or loading a file, you can now alter the output in any way you want. Say you want for example to get some Twitter stuff without using the API. This will get the profile badge from my Twitter homepage:

<?php
$url = 'http://twitter.com/codepo8';
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
$output = preg_replace('/.*(<div id="profile"[^>]+>)/msi','$1',$output);
$output = preg_replace('/<hr.>.*/msi','',$output);
echo $output;
?>

Notice that the HTML of Twitter has a table as the stats, where a list would have done the trick. Let’s rectify that:

<?php
$url = 'http://twitter.com/codepo8';
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
$output = preg_replace('/.*(<div id="profile"[^>]+>)/msi','$1',$output);
$output = preg_replace('/<hr.>.*/msi','',$output);
$output = preg_replace('/<\/?table>/','',$output);
$output = preg_replace('/<(\/?)tr>/','<$1ul>',$output);
$output = preg_replace('/<(\/?)td>/','<$1li>',$output);
echo $output;
?>
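The table-to-list conversion is easy to try without any network call. The sketch below runs the three replacements on a made-up snippet of HTML (the sample markup is an assumption, not Twitter's real source) and assumes patterns where the optional closing slash is escaped.

```php
<?php
// Hypothetical sample markup standing in for the scraped profile HTML.
$html = '<table><tr><td>10 following</td><td>20 followers</td></tr></table>';

// drop the opening and closing table tags
$html = preg_replace('/<\/?table>/', '', $html);
// turn <tr> and </tr> into <ul> and </ul>; $1 carries the optional slash
$html = preg_replace('/<(\/?)tr>/', '<$1ul>', $html);
// turn <td> and </td> into <li> and </li>
$html = preg_replace('/<(\/?)td>/', '<$1li>', $html);

echo $html;
// prints <ul><li>10 following</li><li>20 followers</li></ul>
```

Keep in mind that this only works for tags without attributes; anything fancier is a job for a real HTML parser.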

Scraping stuff off the web is but one thing you can do with cURL. Most of the time what you will be doing is calling web services.

Say you want to search the web for donkeys, you can do that with Yahoo BOSS:

<?php
$search = 'donkeys';
$appid = 'appid=TX6b4XHV34EnPXW0sYEr51hP1pn5O8KAGs'.
         '.LQSXer1Z7RmmVrZouz5SvyXkWsVk-';
$url = 'http://boss.yahooapis.com/ysearch/web/v1/'.
       $search.'?format=xml&'.$appid;
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
$data = simplexml_load_string($output);
foreach($data->resultset_web->result as $r){
  echo "<h3><a href=\"{$r->clickurl}\">{$r->title}</a></h3>";
  echo "<p>{$r->abstract} <span>({$r->url})</span></p>";
}
?>
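The search term above is a single harmless word. As soon as it contains spaces or ampersands it needs encoding before it goes into the URL. A minimal sketch, assuming a hypothetical helper boss_search_url() and a placeholder application id:

```php
<?php
// Sketch only: boss_search_url() is a made-up helper; the endpoint is the
// one used in the example above and YOUR_APP_ID is a placeholder.
function boss_search_url($search, $appid)
{
    // the search term is part of the path, so it gets rawurlencode();
    // the parameters go through http_build_query()
    return 'http://boss.yahooapis.com/ysearch/web/v1/'
         . rawurlencode($search)
         . '?' . http_build_query(array('format' => 'xml', 'appid' => $appid));
}

echo boss_search_url('donkeys & mules', 'YOUR_APP_ID');
// prints http://boss.yahooapis.com/ysearch/web/v1/donkeys%20%26%20mules?format=xml&appid=YOUR_APP_ID
```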

You can also do that for APIs that need POST or authentication. For example, you can use Placemaker to find locations in a text:

<?php
$content = 'Hey, I live in London, England and on Monday '.
           'I fly to Nuremberg via Zurich,Switzerland (sadly enough).';
$key = 'C8meDB7V34EYPVngbIRigCC5caaIMO2scfS2t'.
       '.HVsLK56BQfuQOopavckAaIjJ8-';
define('POSTURL',  'http://wherein.yahooapis.com/v1/document');
define('POSTVARS', 'appid='.$key.'&documentContent='.
                    urlencode($content).
                   '&documentType=text/plain&outputType=xml');
// start cURL with the URL to post to
$ch = curl_init(POSTURL);
// send a POST request with the variables as the body
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, POSTVARS);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  
$x = curl_exec($ch);
curl_close($ch);
$places = simplexml_load_string($x, 'SimpleXMLElement',
                                LIBXML_NOCDATA);    
echo "<p>$content</p>";
echo "<ul>";
foreach($places->document->placeDetails as $p){
  $now = $p->place;
  echo "<li>{$now->name}, {$now->type} ";
  echo "({$now->centroid->latitude},{$now->centroid->longitude})</li>";
}
echo "</ul>";
?>
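Gluing the POST variables together by hand works, but it is easy to forget an urlencode(). A small sketch of the same body built with http_build_query(), which encodes every value for you; placemaker_post_fields() is a hypothetical helper name and KEY a placeholder:

```php
<?php
// Sketch only: placemaker_post_fields() is a made-up helper. The field
// names are the ones used in the POSTVARS string above.
function placemaker_post_fields($appid, $content)
{
    return http_build_query(array(
        'appid'           => $appid,
        'documentContent' => $content,
        'documentType'    => 'text/plain',
        'outputType'      => 'xml',
    ));
}

// the resulting string can be handed to CURLOPT_POSTFIELDS as is
$fields = placemaker_post_fields('KEY', 'London, England & Zurich');

// parse_str() decodes it again, which shows nothing got lost on the way
parse_str($fields, $roundtrip);
echo $roundtrip['documentContent'];
// prints London, England & Zurich
```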

Why is all that necessary? I can do that with jQuery and Ajax!

Yes, you can, but can your users? Also, can you afford to have a page that is not indexed by search engines? Can you be sure that none of the other JavaScript on the page will cause an error and take all of your functionality with it?

By sticking to your server to do the hard work, you can rely on things working. If you use web resources in JavaScript, you are first of all hoping that the user’s computer and browser understand what you want, and you also open yourself to all kinds of dangerous injections. JavaScript is not secure – every script executed in your page has the same rights. If you load third party content with JavaScript and you don’t filter it very cleverly, the maintainers of the third party code can inject malicious code that will allow them to steal information from your server and log in as your users or as you.
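On the server, the simplest filter is to treat third party content as plain text before it reaches your page. A minimal sketch, assuming a hypothetical helper safe_text() and a made-up malicious string:

```php
<?php
// Sketch only: safe_text() is a made-up helper around htmlspecialchars().
function safe_text($thirdParty)
{
    // turns <, >, & and both kinds of quotes into harmless entities
    return htmlspecialchars($thirdParty, ENT_QUOTES, 'UTF-8');
}

echo '<p>' . safe_text('<script>alert("gotcha")</script>') . '</p>';
// prints <p>&lt;script&gt;alert(&quot;gotcha&quot;)&lt;/script&gt;</p>
```

This is the blunt approach: it keeps no markup at all. If you want to let some HTML through, you need a proper whitelist filter, which is beyond this sketch.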

And why the C64 thing?

Well, the lads behind cURL actually used to do demos on C64 (as did I). Just look at the difference:

horizon 1990

haxx.se 2000

Using YQL to load and convert RSS feeds really, really fast.

Tuesday, December 8th, 2009

My esteemed colleague Stoyan Stefanov is currently running an advent calendar (blog post a day) on performance. Today I have a guest slot on his blog showing how you can use YQL to retrieve five RSS feeds much faster than with any other technology.

Retrieving five RSS feeds speed comparison.

As stated at the end of the article, you could use a YQL open table with embedded JavaScript to move all of the hard conversion work to the YQL server, too.

This table does exactly that. The speed of the retrieval slows down a bit with this (as YQL needs to do another request to pull the table definition):

Retrieving five RSS feeds and converting them on the server with YQL execute

However, using this table to retrieve multiple feeds as HTML is dead easy:

$data = array(
  'http://code.flickr.com/blog/feed/rss/',
  'http://feeds.delicious.com/v2/rss/codepo8?count=15',
  'http://www.stevesouders.com/blog/feed/rss',
  'http://www.yqlblog.net/blog/feed/',
  'http://www.quirksmode.org/blog/index.xml'
);
$url = 'http://query.yahooapis.com/v1/public/yql?q=';
$query = "use 'http://github.com/codepo8/yql-rss-speed-comparison/raw/master/rss.multi.list.xml' as m;select * from m where feeds=\"'".implode("','",$data)."'\" and html='true' and compact='true'";
$url .= urlencode($query).'&format=xml&diagnostics=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);
// cut away the XML around the returned HTML; the first pattern was cut off
// in this listing and is assumed here to mirror the second one
$content = preg_replace('/.*<div/','<div',$content);
$content = preg_replace('/div>.*/','div>',$content);
echo $content;

To use the open table you simply need to give it the list of RSS feeds as the feeds parameter:

use 'http://github.com/codepo8/yql-rss-speed-comparison/raw/master/rss.multi.list.xml' as m;
select * from m where feeds="
'http://code.flickr.com/blog/feed/rss/',
'http://feeds.delicious.com/v2/rss/codepo8?count=15',
'http://www.stevesouders.com/blog/feed/rss',
'http://www.yqlblog.net/blog/feed/',
'http://www.quirksmode.org/blog/index.xml'
" and html='true' and compact='true'

Try it out in the YQL console.

The html parameter defines if you want to get HTML back from the table. Take it out to get a list of feeds instead.

See the results as HTML (with an HTML parameter) or as feeds (without the HTML parameter).

The compact parameter defines if you want to get descriptions back for each entry or not.

See the results as HTML (without descriptions) or as HTML with descriptions.
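If you switch these parameters a lot, it can help to assemble the YQL statement in one place. A minimal sketch with two hypothetical helpers, rss_table_query() and yql_url(); it assumes that simply leaving out html or compact turns the respective feature off, as the descriptions above suggest.

```php
<?php
// Sketch only: both helper names are made up; the table URL and the
// endpoint are the ones used in the example further up.
function rss_table_query(array $feeds, $html = true, $compact = true)
{
    $table = 'http://github.com/codepo8/yql-rss-speed-comparison/raw/master/rss.multi.list.xml';
    // the feeds parameter is one string of quoted, comma separated URLs
    $query = "use '" . $table . "' as m;"
           . "select * from m where feeds=\"'" . implode("','", $feeds) . "'\"";
    if ($html) {
        $query .= " and html='true'";
    }
    if ($compact) {
        $query .= " and compact='true'";
    }
    return $query;
}

function yql_url($query)
{
    return 'http://query.yahooapis.com/v1/public/yql?q='
         . urlencode($query) . '&format=xml&diagnostics=false';
}

// HTML with descriptions: html on, compact off
$url = yql_url(rss_table_query(
    array('http://www.yqlblog.net/blog/feed/'), true, false
));
```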

By using the JSON-P-X output format (xml with callback) you could easily use this in JavaScript:



If you want to compare for yourself, get the source code of all the examples from GitHub.

TTMMHTM: Guardian getting enabled by design, interview, open hack day, bash magic, and XSS filters

Wednesday, March 18th, 2009

Things that made me happy this morning:

Geekmeet Stockholm – Performance and Play

Friday, December 5th, 2008

I am just getting ready for my second day in Stockholm, Sweden to go to the bwin offices and talk about professional web development and the change of JavaScript. The professional thing is going to be interesting as I am still feeling the beers of the GeekMeet yesterday night.

Geekmeet Sweden

Talking of GeekMeet, except for the interesting choice of advertising keywords showing up when you look for it, it was a roaring success and if I had a hat it’d be off to the organizers at Creuna and Robert Nyman for pulling this out of their hats (ok, you killed that metaphor, now let it die in peace).

Over 150 geeks came to drink beer and eat pizza and had to wait for those by listening to my drivel about website performance and ethical hacking. Some seemed to have been inspired by it, so that’s good I guess.

What I have to say to the credit of the Swedish audience is that they have a great sense of humour and are very happy to get distracted by unexpected slides and side-stories. It was great fun presenting and chatting to people afterward.

Credit must also go to Robert Nyman for not only being a masterly “one, two” announcer but also finding a very nice way to introduce me – playing hangman with my name using all the emails and messages I sent him over the years telling him off for doing things wrong. Thanks for making me sound like a picky bastard, but I understand that it came from the heart. I also explained that my connection with PPK started the same way – but with him being the picky one.

My first presentation revolved around things you can do to speed up your web sites, unashamedly based on the work done by Steve Souders, Nicole Sullivan, Stoyan Stefanov, Ed Eliot and Stuart Colville. You can get the slides on slideshare:

[slideshare id=819648&doc=shiftinggears-1228434047613720-9&w=425]

The second presentation was (re)introducing the concept of ethical hacking and an invitation for people to see the web as their playground using cURL and GreaseMonkey to remix and improve it:

Playing With The Web

My second talk at geekmeet sweden talking about the tools you can use to hack and remix the web.

All in all I had a wonderful time and I was impressed how easy it was for me to deliver all of this in such a short amount of time (I just gave seven presentations and two interviews in three days in two countries, having written the presentations in airports and on flights in between).

Sweden rocks! Now I am off to check out the ice bar in the hotel and tomorrow it is back to England.

Show the world your Twitter type (using PHP and Google Charts)

Sunday, November 23rd, 2008

I just had a bit of fun with Twitter and the Google charts API. You can now add an image to your blog, web site or wherever and show a picture of what kind of a twitter user you are. All you need to do is embed an image and give it the right source:

For example my user name is codepo8, which would be:

And the resulting image is:

For John Hicks for example it is:

And the resulting image is:

How it is done and how to “change stuff”

You can download the source code and have a play with this (I hope this will not spike my traffic :) so it might go offline if that is the case). There’s really not much magic to this:

First I get the user name and filter out nasties:


$user = isset($_GET['user']) ? $_GET['user'] : '';
// only letters, digits, underscore, dash, dollar and dot may pass
$isjs = '/^[a-zA-Z0-9_.$-]+$/';
if(preg_match($isjs,$user)){

Then I set the content type to show the image and use cURL to get the information from the user’s twitter page.

header('Content-type:image/png');
$info = array();
$cont = get('http://twitter.com/'.$user);

I get the information using regular expressions and put them in an associative array:

preg_match_all('/([^>]+)/msi',$cont,$follow);
$info['follower'] = convert($follow[1][0]);
preg_match_all('/([^>]+)/msi',$cont,$follower);
$info['followed'] = convert($follower[1][0]);
preg_match_all('/([^>]+)/msi',$cont,$updates);
$info['updater'] = convert($updates[1][0]);

The convert function removes the comma punctuation added by twitter and makes sure the values are integers.

I then need to determine which of the three values is the highest and define a scaling factor as the Google API only allows values up to 100. I then check what the type of the user is by getting the right array key and change the values for displaying.


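// the highest of the three values becomes 100 on the chart
$max = max($info);
$convert = 100 / $max;
foreach($info as $k=>$f){
  // the key of the highest value defines the type of user
  if($f == $max){
    $type = $k;
  }

  $disp[$k] = $f * $convert;
}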
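The same scaling can be wrapped in a function that is easy to test with made-up numbers. This is a sketch rather than the code the script uses: scale_for_chart() is a hypothetical helper, it guards against a user with nothing but zeros, and on a tie it picks the first key instead of the last.

```php
<?php
// Sketch only: scale_for_chart() is a made-up helper and expects a
// non-empty array of integers.
function scale_for_chart(array $info)
{
    $max = max($info);
    // avoid a division by zero for accounts without any activity
    $factor = $max > 0 ? 100 / $max : 0;
    $disp = array();
    foreach ($info as $k => $f) {
        $disp[$k] = round($f * $factor, 1);
    }
    // array_search() returns the first key that holds the maximum
    $type = array_search($max, $info);
    return array($type, $disp);
}

list($type, $disp) = scale_for_chart(array(
    'follower' => 200,
    'followed' => 50,
    'updater'  => 100
));
// $type is 'follower'; $disp holds 100, 25 and 50
```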


I check the type and assemble the display string accordingly:

if($type == 'updater'){
  $t = ' is an ';
}

if($type == 'follower'){
  $t = ' is a ';
}

if($type == 'followed'){
  $t = ' is being ';
}

$title = $user . $t . $type;


I assemble the labels array and the values array and add all up to the correct Google charts API url. I use cURL to get the image and echo it out.

$out = array();
foreach($info as $k=>$i){
  $out[] = $k.'+('.$i.')';
}

// labels are separated by pipes, values by commas
$labels = implode('|', $out);
$values = implode(',', $disp);
$img = get('http://chart.apis.google.com/chart?cht=p3&chco=336699&'.
           'chtt='.urlencode($title).'&chd=t:'.$values.
           '&chs=350x100&chl='.$labels);
echo $img;
}


The rest are the cURL and convert helper functions.

function get($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$feed = curl_exec($ch);
curl_close($ch);
return $feed;
}

function convert($x){
$x = str_replace(',','',$x);
$x = (int)$x;
return $x;
}

You like?

Faster version (one cURL, instead of two)

Instead of setting the PNG header and echoing out the image you can also just set a location header at the end and redirect the URL request to Google’s servers. I guess they have more bandwidth. :)
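A minimal sketch of that redirect variant, assuming a hypothetical helper chart_url() that assembles the same chart URL as the code above (the example title, label and value are made up):

```php
<?php
// Sketch only: chart_url() is a made-up helper; the chart parameters are
// the ones used in the script above.
function chart_url($title, array $labels, array $values)
{
    return 'http://chart.apis.google.com/chart?cht=p3&chco=336699'
         . '&chtt=' . urlencode($title)
         . '&chd=t:' . implode(',', $values)
         . '&chs=350x100'
         // encode each label on its own, the pipes separate them
         . '&chl=' . implode('|', array_map('urlencode', $labels));
}

$target = chart_url('codepo8 is a follower', array('follower (200)'), array(100));
// $target is now http://chart.apis.google.com/chart?cht=p3&chco=336699&chtt=codepo8+is+a+follower&chd=t:100&chs=350x100&chl=follower+%28200%29

// in the script this would replace the PNG header and the echo:
// header('Location: ' . $target);
// exit;
```

The browser then fetches the image straight from Google, so your server only does one cURL request (the one to Twitter) instead of two.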