Browser benchmarks are gamed – so why not make them a game?
Monday, October 1st, 2012 at 10:35 pm
Tomorrow the Core Mobile Web Platform Community Group is meeting at the Mozilla space in London to discuss the future of browser benchmarking. Sadly I won’t be able to attend, as I am at Create the Web London on the same day and flying to Amsterdam for Fronteers later. However, I think this is a good opportunity to share some thoughts on the topic, which my colleague Jet Villegas will also bring up for me tomorrow morning.
Here is what I am worried about: browser benchmarks are very hot, but fail to deliver data that will help us make the web the main development platform.
Benchmarks are becoming marketing material
My main concern is that browser benchmarks as a whole are a very academic and “close to the metal” exercise. Creating and wiping an empty canvas or creating and destroying thousands of objects gives you results, but it doesn’t mean that real product needs are met by optimising for these use cases. Jet will talk about some issues that actually produce false positives unless you test in a scenario where someone is really browsing and using the browser.
Even worse is that the press is hungry for browser news and loves it when big corporates shoot at each other. That’s why a lot of benchmarks are flawed from the very start to give a better result for one browser or platform or another. They’ve become a marketing tool, rather than helping us build better products with the web. Above all though, they are very, very boring.
Making benchmarking fun again
Let’s wind back the clock a bit to 1997. You might not have heard of this, but back then Final Reality was all the rage. It was a product of the demo scene (closely related to the much acclaimed Second Reality) and a very cool thing to watch back then. You can check the YouTube video to see what it looked like.
This pushed the limits of the video card, the sound system and the CPU, and the amazing thing was that it was a benchmark. After the demo ran all the way through you got a report on how well your hardware did in the test. These reports were not only bragging rights amongst overclockers; admins also used them to find out in 10 minutes whether hardware worked, something that would have taken ages with conventional test methods.
So why don’t we do something similar now?
The benchmarking game
How about having a game instead of an automated script? This already happens in some games that are built by browser makers.
However, this aims too high, as sadly there is a lot of hardware out there that still chokes on WebGL and not everybody wants to play a 3D shooter (I can’t be bothered, to be fair).
So how about this: a platformer or 2D shooter that gets incrementally more technically challenging for the platform the longer you play, offers extra levels to browsers and environments that support certain technologies, and offers simpler ones to others.
Imagine a game that tests performance and reports it back after each level, running on Facebook and being promoted in the Android (and Apple, yeah, a boy can dream) stores. People could play the game without being any the wiser that they are actually helping us get real information from users on a large variety of devices, on real (and flawed) connections, and on browsers that are not 100% allocated to doing one task but have other tabs open and junk in their caches and RAM.
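To make the idea of levels that depend on what the browser supports a bit more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Capabilities` flags, the level names and the sprite counts are invented, and a real game would fill in the flags by feature-detecting in the browser.

```typescript
// Hypothetical capability flags; a real game would fill these in by
// feature-detecting in the browser at start-up.
interface Capabilities {
  canvas2d: boolean;
  webAudio: boolean;
  webgl: boolean;
}

interface Level {
  name: string;
  spriteCount: number; // rough stand-in for how heavy the level is
  requires: (keyof Capabilities)[];
}

// Made-up levels, ordered from lightest to heaviest.
const LEVELS: Level[] = [
  { name: "warm-up", spriteCount: 50, requires: ["canvas2d"] },
  { name: "crowded", spriteCount: 500, requires: ["canvas2d"] },
  { name: "noisy", spriteCount: 500, requires: ["canvas2d", "webAudio"] },
  { name: "bonus-3d", spriteCount: 2000, requires: ["canvas2d", "webgl"] },
];

// Keep only the levels whose required features are all available.
function playableLevels(caps: Capabilities, levels: Level[] = LEVELS): Level[] {
  return levels.filter((level) =>
    level.requires.every((feature) => caps[feature])
  );
}

// A device without Web Audio or WebGL only gets the plain canvas levels.
const onOldPhone = playableLevels({ canvas2d: true, webAudio: false, webgl: false });
console.log(onOldPhone.map((level) => level.name));
```

The point of the ordering is that every player starts on the same light levels, so results stay comparable, while better equipped browsers simply get further.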
My other colleague Rob Hawkes is currently testing a lot of HTML5 games and comparing the performance of different browsers on different mobile operating systems with them. This is great, and a lot of work. I found that a lot of demos, including game demos, have a developer mode that shows the FPS and the general performance. Wouldn’t it be great to have a database of this data instead of just seeing it on the screen for tweaking while we develop? There are systems like Scoreloop that centralise the scores of games, so why not the performance? This could be a whole new market in the HTML5 space.
Apps could of course benefit from that, too. Taking a well-used piece of software and adding performance reporting for, say, scroll lists would give us a lot of good information from our users rather than data built and reported in a lab environment. We could do Benchpress instead of WordPress?
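As a rough sketch of what a game could collect instead of only painting the FPS on screen, the function below turns the frame timestamps of one level into a small report. The `LevelReport` shape and the name `summariseLevel` are assumptions of mine, not an existing API. In a browser the timestamps could come from `requestAnimationFrame` callbacks, and the report would then be posted to whatever collection service you run.

```typescript
// Hypothetical report shape; the fields are illustrative only.
interface LevelReport {
  level: string;
  frames: number;         // number of frame intervals measured
  averageFps: number;
  slowestFrameMs: number; // longest gap between two frames
}

// Summarise one level from the timestamps (in milliseconds) at which
// each frame was drawn.
function summariseLevel(level: string, frameTimestampsMs: number[]): LevelReport {
  if (frameTimestampsMs.length < 2) {
    // Not enough data to measure a single frame interval.
    return { level, frames: 0, averageFps: 0, slowestFrameMs: 0 };
  }

  let slowestFrameMs = 0;
  for (let i = 1; i < frameTimestampsMs.length; i++) {
    const delta = frameTimestampsMs[i] - frameTimestampsMs[i - 1];
    if (delta > slowestFrameMs) slowestFrameMs = delta;
  }

  const frames = frameTimestampsMs.length - 1;
  const totalMs = frameTimestampsMs[frameTimestampsMs.length - 1] - frameTimestampsMs[0];
  const averageFps = totalMs > 0 ? (frames * 1000) / totalMs : 0;

  return { level, frames, averageFps, slowestFrameMs };
}

// Example: four frame intervals, one of them noticeably slow.
const report = summariseLevel("level-1", [0, 20, 40, 60, 100]);
console.log(JSON.stringify(report));
```

Reporting the slowest frame next to the average matters because an average can look fine while the game still stutters.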
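For something like a scroll list, one simple number an app could report is the share of frames that took longer than the frame budget while the user was scrolling. This is only a sketch under my own assumptions: `jankRatio` is a made-up name, and the default budget of 1000 / 60 milliseconds assumes a 60 Hz display.

```typescript
// Share of frames (0 to 1) that took longer than the budget.
// budgetMs defaults to one frame at 60 Hz, which is an assumption
// about the display, not something the app can take for granted.
function jankRatio(frameDurationsMs: number[], budgetMs: number = 1000 / 60): number {
  if (frameDurationsMs.length === 0) return 0;
  let overBudget = 0;
  for (const duration of frameDurationsMs) {
    if (duration > budgetMs) overBudget++;
  }
  return overBudget / frameDurationsMs.length;
}

// Frame durations recorded during one scroll gesture.
console.log(jankRatio([10, 16, 30, 50]));
```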