Christian Heilmann

Posts Tagged ‘HTML’

Lynx would not be impressed – on semantics and HTML

Wednesday, November 16th, 2011

Lately there has been a lot of discussion about markup, and especially about the new HTML5 elements. There was a big hoo-hah when Hixie and the WHATWG wanted to remove the time element from the HTML spec, Divya stirred lots of emotions with her “Our Pointless Pursuit Of Semantic Value” and of course Jeremy posted his views on the subject, too, in a counterpoint article “Pursuing semantic value”.

Maybe there wasn’t a counterpoint, maybe there was. Frankly, I was too busy to read the lot. It also doesn’t matter that much, as I get more and more the feeling that we really need to think about the web as it was and how it will be. The lack of understanding of the value of semantic markup to me is just a symptom of a change that is happening.

Tales of yesteryear

A lot of the debate about semantic value and using the correct HTML is kept alive by people who have been around for a long time and have seen browsers fail in more ways than we care to remember. Valid markup and sensible structure were our only chance to reach maintainability and make sense of the things around us. This was especially important in the long long ago. I remember using Lynx to surf the web.

Lynx showing twitter.com

I also kept Lynx in my arsenal for a bit longer: I used it to “see” what search engines and assistive technologies see. The former was correct at the time (not any longer – Google does index Flash and JavaScript and actually follows ill-conceived links using hashbangs).

The latter was wrong even back then. A lot of the debate around using proper HTML5 right now tries to back itself up with the argument that “assistive technology like screen readers need it”. Nah, not quite the case yet.

Build it quickly, make it work

I’ve mentioned in a few talks that when people invoke the good old days when markup mattered and people cared, they are talking nonsense. These days never existed – when we started with web development we struggled to make things work. We used tables for layout, NBSPs for whitespace, lots of BR elements for vertical whitespace and more evil things. We then used spacer GIFs for padding and margin and only started to care when CSS came out and got browser support. The reason was not that we wanted to write cleaner HTML. The reason was that we wanted to make things work, as all we got was a design to build, not a description of how to structure the document or what to build. When you start from the look of a web product, semantics are already on the endangered list.

Write less, achieve more

This is the mantra of now. The big success of jQuery is based on it. JavaScript standards were too complex and too verbose to write and change code quickly. So the jQuery crowd analysed what people did the most – changes to the DOM and adding and removing classes (and later Ajax) – and made it damn easy and short to do. No need to write code that doesn’t do much.
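To see what that means in practice, here is a rough before-and-after sketch (nothing official, just an illustration):

// classic DOM scripting: find all links in the menu and add a class
var links = document.getElementById('menu').getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
  links[i].className += ' active';
}

// jQuery: the same thing in one line
$('#menu a').addClass('active');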

The same happens over and over again. Less and Sass make the prefix hell and repetition for different browsers in CSS easy to maintain, and client-side templating languages, browser-internal templating and client-side MVC make HTML the outcome of computation and programming logic rather than a starting point.
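A quick sketch of what that looks like in Sass (the mixin name is mine, the pattern is the point):

// define the vendor repetition once...
@mixin rounded($radius) {
  -webkit-border-radius: $radius;
  -moz-border-radius: $radius;
  border-radius: $radius;
}

// ...and re-use it everywhere
.panel { @include rounded(5px); }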

If you can’t see it, why do it?

A lot of what the fans of HTML and semantics get excited about is not visible. Whenever a new HTML element got support and had a visual representation in the browser, using it was a no-brainer. People used it immediately. In most cases they used it wrongly, but they used it (I’ve seen FIELDSET and LEGEND used around images because it looks pretty, and of course indentation done with BLOCKQUOTE).

A lot of the semantically rich elements don’t show up at all. Blockquote’s cite attribute was meant to give a quote meaning by telling us where it is from. ACRONYM and ABBR were supposed to tell people what a TLA means – heck, we don’t even do that in meetings and press releases, so why bother adding information that browsers don’t show the users?
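For example, none of the following renders any differently in a browser by default – the meaning is there, the user never sees it:

<!-- the cite attribute and the abbr title carry meaning,
     but browsers don't surface either to the user -->
<blockquote cite="http://example.com/the-source-of-the-quote">
  <p>The quote itself is all the reader ever gets to see.</p>
</blockquote>
<p><abbr title="Three Letter Acronym">TLA</abbr></p>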

This is also a big issue for Microformats. If browsers made an address draggable to your address book or created voting buttons for VoteLinks, if a browser automatically detected events and gave you a simple interface to add them to your calendar, they’d be a no-brainer to use. As it is, we have a few success stories to tell, and a lot of work to do.
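As a reminder of what that work looks like, here is a minimal hCalendar event (a sketch using the classic class names – all the data is in the markup, the browser just ignores it):

<div class="vevent">
  <a class="url summary" href="http://example.com/fancyconf">FancyConf</a>
  <span class="dtstart" title="2011-12-01">December 1st, 2011</span>,
  <span class="location">London</span>
</div>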

The big book of ARIA

It gets really frustrating when we talk about accessibility. Making a web document available to people with various abilities should be easy when we keep things simple and follow human logic.

It should be, but it isn’t. And by keeping things very simple we can reach more people, but we could also deprive a large group of great interfaces. Whenever we do crazy things in the browser and the talk comes to making them accessible, the way out is the mythical ARIA.

If you dive into ARIA you will realise very quickly that it is a lot of work, full of hard-to-understand concepts and above all a lot of code to write. Instead of having accessibility as an integral part of HTML5, we have to deal with two parallel standards: one to achieve things quickly and move the web from documents to applications, and another to keep it available for everyone out there. This is not a good place to be in. Accessibility happens when you embrace it from the very start. There is no magic bullet layer at the end of the process that makes things work.
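To give you a taste, compare a native checkbox with a do-it-yourself one (a rough sketch – a real implementation needs even more):

<!-- the native control: state, semantics and keyboard support for free -->
<input type="checkbox" id="news"> <label for="news">Send me news</label>

<!-- the DIY version: role, state and key handling are all your job -->
<div role="checkbox" aria-checked="false" tabindex="0" id="fakebox">Send me news</div>
<script>
document.getElementById('fakebox').onkeydown = function(ev){
  // even toggling with the space bar has to be re-implemented
  if(ev.keyCode === 32){
    var checked = this.getAttribute('aria-checked') === 'true';
    this.setAttribute('aria-checked', checked ? 'false' : 'true');
    ev.preventDefault();
  }
};
</script>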

So what about HTML and semantics?

You know what? There is no solution for all. The reason is not that technology moved on or people don’t care about users or that our standards are holding us back or anything like that. The reason is that “write once, deploy anywhere” is simply bullshit. The one thing that made the web work so far and become an amazing market to work in is flexibility. We all enjoy that you can reach a seemingly similar experience for our end users in many different ways. So why are we banging on about one side of the development range or another?

How about this:

  • If you write a document by hand, use all the semantics you can add in. This is your handwriting, your code is your poetry and people learn from looking at what you did.
  • If you need to write a hard-core application and every byte is a prisoner, try to play nice with the semantics but follow your end goal of delivering speed. Make sure to tell people, though, that your code is the end result of conversions and optimisations and not for humans to look at.
  • Regardless of what you build – when you can use new technology – use it.
  • Remember that the web is not your browser and computer – add fallbacks for other browsers when using bleeding edge technology (a small sketch of what I mean follows this list). When the others catch up you won’t have to alter your code!
  • The main focus of markup and web code that is not optimised for edge-case apps is to make it easy for people to maintain it. If people can see in the HTML what is going on – win. If whatever only works with JS is generated by JS – even better.
  • More markup is not a crime when it is markup that adds value. Arguments that STRONG is worse than B because it means more code and slower-loading pages are irrelevant in times of gzipping on the server.
  • We can only escape the chicken and egg problem of new HTML when we use it. Right now, if you ask for support in browsers for new elements the answer from most vendors is that nobody uses them so why bother. And when you ask people why they don’t use them they tell you because browsers don’t support them. One of us has to start changing that.
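The fallback point from the list above in practice – test for the feature instead of locking people out (a minimal sketch):

// use the shiny new thing when it is there...
if(window.localStorage){
  localStorage.setItem('visited', 'yes');
} else {
  // ...and fall back to the old way when it isn't
  document.cookie = 'visited=yes';
}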

Bigger fish to fry

Personally I am concentrating more on the things that really worry me about the web these days, and it might be interesting to list them.

  • Death of longevity – I always loved the fact that I can find something on the web and go back to it. This is not the case any longer. A lot of my old bookmarks are dead, my tweets go into data nirvana after a certain number and I cannot access them any longer, and code you write for companies will be totally different shortly after your departure. This is not the web I want. The web is a great mix of entertainment and archive, and the “real time web” really messes with this.
  • High fidelity web sites – I remember when Flash made our 500MHz machines flare up. Nowadays almost every cool new site I try out does that to my dual-core MacBook. I can see pages in the very near future telling me that my video card is not good enough to enjoy them. This is the reason I never played games on a PC. Let’s use cool, new and flashy in a sensible manner instead.
  • Identity – we are spreading ourselves thin on the web right now and leave a lot of outdated and erroneous profiles of ourselves. Who you are on the web is becoming a very strange concept and some of the work I do right now tries to bring that back into an easier to maintain fashion.
  • The open web – today the US debates whether it is a good idea to censor the internet like China, Syria and other countries do. This scares me. I started on the web when it was less regulated and much less commercial than radio or TV. Let’s not give up that freedom.
  • The maker web – the web is ubiquitous, we use it as a part of our day-to-day work and play. Lately I find though that the creative part of the web is dying and people are consuming it rather than using and enriching it. This, again, scares and annoys me. We should not become virtual couch potatoes.

Semantics are like wonderful prose. You use them to deliver an enjoyable product. People are not celebrated for writing books. They are celebrated for what they filled them with. If we keep putting things on the web that have structure and get better on more sophisticated display products we are building for the future. If we point fingers at others doing it wrong we waste our time.


Lynx photo by Jimmy Tohill #40 in the National Geographic photo contest 2011

Getting rusty – we need new best practices for a different development world

Monday, August 15th, 2011

Here’s the good news: we, those who promoted open web standards, have won! The web of today uses less and less closed technology and plugins and HTML, CSS and JavaScript are the tools used to create a lot of great web experiences.

For example, nearly nobody uses Flash for a simple image gallery any more, and more and more companies advertise themselves to us and to the world as “supporting web standards” and “using open technology”.

Of course, there is a lot of lip service going on and – as Bruce Lawson put it – we get HTML5, hollow demos and forgetting the basics. Bruce points out that a lot of HTML5 demos don’t have any semantic markup, don’t even create working links, and show a lot of the traits we saw in Flash tunnels in the nineties and at the beginning of the millennium.

On the other side of this argument, a few people keep telling me they are working on blog posts on “why semantic markup and JavaScript fallbacks are not important any longer”.

I think there is a happy middle ground to be found, and it mostly means that we need to understand the following: what you do as a web developer very much depends on the medium our work is consumed in.

How we use the internet has changed over the years and if we don’t want to be seen as enemies of progress we need to alter our best practices and give them a bit more flexibility when it comes to applying them.

Much like a web site looking and working the same in every browser means catering to the lowest common denominator and not using our platforms smartly, having one stack of technologies used in a certain fashion limits us in reaching people who are just starting on the web and simply want to get some work done.

Are our best practices really rooted in reality?

Altering best practices? How is that possible? Well, for starters, I think a lot of what we preach is cargo cult rather than based on real happenings. A lot of the things we tell people “are absolutely necessary to make something work on the web” are not needed, and many an excited explanation of the usefulness of semantic markup is actually not based on facts. A lot of what we do as best practices is done for us, not for the end users or the technology we use. But more on that later.

Looking back (not in anger)

How did we get to where we are now, where newly built showcase sites violate the simplest concepts like providing alternative text for an image or using structured HTML rather than a few empty DIVs?

In order to learn how we got into this perceived mess it is important to understand what we did in the past. A lot of talks and books and posts paint a picture of the brave new world of web standards brazenly cutting a path through the jungle of closed technology towards a bright future. This is far from what really happened. If we are honest, a lot of what we did was hacking around to make things work and then trying to find a way to make what we did sustainable. And that last step is what brought us semantics. When I hear praise of POSH – plain old semantic HTML – as the way we built things in the past and a skill we forgot over time, I have to snigger. We did nothing of the sort – at least not in production.

Humble beginnings – HTML and CGI

In the long time ago, HTML was used for presentation, behaviour and structure

In the beginning there were no plugins and there was no JavaScript. We had HTML and images, and the biggest mistake people made even then was showing text as images without any alternative text. This meant that text-only browsers (which were still in use) and those on slow connections got the short end of the stick. Interaction was defined as clicking on links and submitting forms.

What we had already started to do was speed things up by using frames. For example, we kept a “sticky navigation” and only reloaded content pages without any menus. This was the start of breaking basic browser functionality like bookmarking for the sake of performance. I remember working around that by using cookies to store the state of the page and re-writing the frameset accordingly on subsequent visits. That only fixed it for the current user – sending a link out to others was still not possible. But the pages loaded much faster.

Layout was achieved in HTML – with horizontal lines, PRE elements, lots of NBSPs and tables. What mattered most was making the thing look right across all the browsers, not what the HTML really was.

What we tell people instead though is that these were simpler times where the HTML and its semantic value really mattered. I remember it differently.

DHTML days (1)

When JavaScript got supported we started to go properly nuts. Whole menus were written out with document.write() and we used popup windows with frames written dynamically inside them (for example for image galleries):

JavaScript allowed for richer interaction - and more mistakes

We even started checking which browser was in use and – in the more sensible cases – rendered different experiences. In the lesser thought-out solutions we just told people that “This site needs Internet Explorer 4 to work”.

We also started hiding and showing content with JavaScript. Sometimes we wrote it out with JS and gave text browsers (or those in companies where people turned JS off by default by means of a proxy) either no content at all or far too much to take in, without caring much about structure.

When SEO started to matter we also used the NOSCRIPT tag to provide fallback text and links – most of the time laden with keywords instead of meaning.

DHTML days (2)

When CSS got supported things really took off – we could not only create dynamic things and show and hide them but really go to town moving, rotating, animating and stacking them. And that we did. DHTML library sites had hundreds of effect menus and image sliders and rotating buttons and whatnot:

JavaScript and CSS gave us the chance to build a lot of shiny things

Most of the behaviour was done with JavaScript, but we also started to play with CSS and :hover pseudo selectors to build “CSS only multi level dropdown menus” and other things that couldn’t be used without a mouse.

This was the high time of DHTML, and the first line of almost any script checked for IE with document.all or for Netscape with document.layers. The speed of computers also forced us into all kinds of dangerous hacks and tricks (dangerous as the hacks tended not to get documented, which made maintenance very hard) to make things look smooth.

The gospel (according to Zeldman)

When the book of Zeldman came out, this was the message: let’s stop trying to fix things for browsers and jumping through hoops to make our products work in an environment that is not to be trusted, and rely on the standards instead. The main tool for this message was the separation of technologies:

In order to bring sanity to the web development world we claimed that HTML is structure, CSS is presentation and JS is behaviour

HTML is the structure, CSS is for the visual look and feel and JavaScript is for behaviour. If we keep all of these things separated, then we have a good web product that is easy to maintain, works for everyone and is clean to extend and work with.

That was the idea, and we took it further by coining the term Unobtrusive JavaScript (I remember having lots of fun writing this course) and subsequently DOM Scripting (with the DOM Scripting task force of the WaSP driving a lot of it, and Jeremy Keith’s and my books giving good examples of how to use it).

The state today

Nowadays it seems we have gone full circle back to the world of mixing and matching the concerns and layers of development:

Today, it seems, all layers of separation are mixed and matched again

With great power comes great responsibility, and right now I get the feeling that the latter is very low on our radar, as it is far too much fun playing with the cool new things we have at our disposal. We have mobile phones with incredibly fast processors, we have supersonic JavaScript engines capable of 3D animation, and we have hardware-accelerated CSS animations. This makes it hard to get excited about semantic values.

Almost all the technologies in the stack get washed out in this new web technology world and separation becomes much harder. What good is a CANVAS without any scripting? Should we animate in CSS or in JavaScript? Is animation behaviour or presentation? Should elements needed solely for visual effects be in the HTML or generated with JavaScript or with :after and :before in CSS?

We do a lot more on the client these days than we did in the past. It is time we gave the client more credit in our best practices. Yes, old browsers are not likely to go away anytime soon (and this is sometimes on purpose, with Microsoft for example not offering IE upgrades to Windows XP – and soon Vista – users).

Separation of concerns vs. the image of web developers

In its meaning and approach the separation first explained by Zeldman is still an incredibly good idea – the thinking behind the separation of the different technologies is great. Some companies very much embraced the concept in their training; Yahoo, for example, even goes a step further in making it understandable by calling it “separation of concerns” rather than layers of development.

This subtle difference also shows partly why this great idea was not always implemented in real life products: you need an understanding of how the different technologies work and how to write them properly. In essence, you want a team of experts in different subject matters working together to build a kick-ass product.

In reality, though, web development is still seen as something any developer with a bit of training could do – or, when you hire a dedicated web development team, they are considered experts across the board and are not allowed to concentrate on semantics, CSS or JavaScript.

This is the main reason why final web products out there do not have clear separation. In most cases, the developers are aware that they could have done a better job but they got forced to rush it or work with a technology they did not care much for. If you ever had to debug and optimise CSS written by Java developers you’ll know what I mean.

Web standards showcases and attrition

Whenever we praised a new product as using web standards the right way, it was not a big product. It was almost never the result of an enterprise framework or CMS. And – in a lot of cases – it was actually built to make a point about using web standards, not to streamline the process of building a web product.

Take the site that was most likely the main cause for the breakthrough of CSS in the view of the community: the CSS Zen Garden. The garden was a simple XHTML document, semantically correct but already with a lot of IDs and classes as handles to apply CSS rules to. Its job was to show that by separating look and feel you can redesign a web site easily and make it look (and later on react to the user) totally differently from one case to another.

This went incredibly well, until we got too excited about the possibilities of image replacement. Later submissions to the garden had large parts of the text in background images, which was ironic, as the original argument was that all the content should be in the HTML.

In the real world, however, we never had a fixed HTML document to play with – we had CMSes creating web pages for us and everything was in flux. You can’t control the number of menu elements, you can’t control the amount of text, and you will not be able to “simply add a class to an element” to give it some extra functionality. It is time we understood that we can inspire with showcase sites and presentations, but we really don’t help the people developing web sites and fighting the notion that “everyone can do frontend, it is not hard code”.

Nobody wants to hear about the depth and composition of the sea when what we do is riding jet skis

Right now, “best practice web development” talks, presentations and tutorials are incredibly self-referential. We speak to the same people about the same subjects and claim that people use semantics, whilst out there the web is struggling to survive against walled-garden development and native mobile development for a very small part of the world-wide market.

People happily say they “only build for WebKit” as this is “the best and fastest and most stable browser”. People are OK with seeing a showcase site completely and utterly fail when you don’t have the right browser and the right OS.

We start to recede into our respective specialist areas and speak at specialist conferences. A lot that is taught at design conferences is the total opposite of what you hear at performance conferences. We build abstraction layer above abstraction layer to work around browser issues and release dozens of “miracle” libraries and scripts at conferences without even caring if anyone will ever use them.

Speed is still the main thing we talk about. How to shave 20 milliseconds off a script loader? How to make an animation run at 50fps instead of 30fps?

Best practices for a new market of developers

My favourite example was attending the Google IO accessibility talk. For about an hour we got taught how to turn an element into a button and keep it accessible. Not once was it mentioned why we didn’t just use a BUTTON element for the job. There was lots of great information in that talk – but none of it needed, as we were simulating something the browser readily gives us with JS and CSS.
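For the record, this is the comparison that never came up in that talk (a sketch – and the simulated version is still not complete):

<!-- one element, accessible out of the box -->
<button type="button" id="send">Send</button>

<!-- the simulated version needs a role, focusability and key handling -->
<span role="button" tabindex="0" id="send2">Send</span>
<script>
function doSend(){ /* ... */ }
var fake = document.getElementById('send2');
fake.onclick = doSend;
fake.onkeydown = function(ev){
  // a real BUTTON reacts to Enter and Space natively - here we rebuild that
  if(ev.keyCode === 13 || ev.keyCode === 32){
    doSend();
    ev.preventDefault();
  }
};
</script>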

The new generation of developers we have right now is very excited about technology. We, the educators and explainers of “best practices”, are tainted by years and years of being let down by browsers. jQuery and other environments propagated “write less, achieve more” as the main road to success. Most of what we tell people is to “add this and that to give things meaning”, and when they ask us “why?” we have to come up with lies, as – for example – not a single browser cares about the outline algorithm right now.

The only real benefit of using web standards

Using web standards means first and foremost one thing: delivering a clean, professional job. You don’t write clean markup for the browser, and you don’t write it for the end users. You write it for the person who takes over the job from you. Much like you should use good grammar in a CV and not write it in crayon, you cannot expect respect from the people maintaining your code when you leave a mess that “works”.

And this is what we need to make new developers understand. It is about pride in delivering a clean job, not about using the newest technology and chasing the shiny. For ourselves, we have to understand that the only ones who really care about our beloved standards and separation of concerns are us – as we think about maintainability, not quick deployment and continuous iteration of code. The web is not code – the web is a medium where we use a mix of technologies fit for the purpose at hand to deliver a great experience for the end users.

Slideshare embeds without Flash

Friday, November 12th, 2010

I’ve said it a few times before, but I love Slideshare. For a professional speaker like me it is a great way to share my decks and get feedback from people while allowing them to re-use the material. The thing some people complained about is that the embed is Flash-based, and as we all know Flash makes kittens cry and ninjas visible, so we can’t have that.

Don’t fret though as there is a way out. Say you have a presentation on Slideshare at http://www.slideshare.net/cheilmann/reasons-to-be-cheerful-fronteers-2010:
Reasons to be cheerful - Fronteers 2010 (embedded Slideshare deck)

Simply add a /mobile/ before the user name to see the mobile version which is images with a bit of HTML:

Slideshare Mobile (screenshot)

You could just slap this in an iframe but the chrome of the mobile version can be a bit overwhelming. No worries – the open web can fix that. Looking at the source code, you find a JSON object with all the info:
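It looks something like this (trimmed down, with invented values – baseSlideUrl and totalSlides are the fields that matter):

{
  "author": "cheilmann",
  "title": "Reasons to be cheerful - Fronteers 2010",
  "totalSlides": 50,
  "baseSlideUrl": "http://cdn.example.com/reasons-to-be-cheerful"
}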

The interesting parts here are the baseSlideUrl and the totalSlides. To get the different images, just add -slide-{n}.jpg to the baseSlideUrl, with {n} being the number of the slide.
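Assembling the images is then a simple loop (a sketch, assuming the JSON has been parsed into a variable called deck and there is a container with the ID slides):

var out = '';
for(var i = 1; i <= deck.totalSlides; i++){
  out += '<img src="' + deck.baseSlideUrl + '-slide-' + i +
         '.jpg" alt="Slide ' + i + '">';
}
document.getElementById('slides').innerHTML = out;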

Putting this together, adding some styling and a dash of YUI3 for functionality I can now present you with the embeddable HTML version:

Go to http://icant.co.uk/slidesharehtml and simply enter the URL of the slides to convert them. The source code of the converter is on GitHub so you can host it yourself.

See the flow in the following screencast:

I love open web technologies and clever converters, don’t you?

Why I don’t write my slides in HTML

Tuesday, November 2nd, 2010

At the Front-Trends 2010 conference Tantek Çelik spent the last few minutes of his HTML5 talk praising HTML as a great format for presentations and urged people, for the good of the open web, to use HTML slide systems instead of Flash or PDF. Other presenters right now write awesome CSS3-driven slide shows and build their own scripts to show their presentations. I could, but I don’t – and here’s why.

Presentations are not web documents

I am all for the open web (heck, I just took a job evangelising it) but I don’t write my slides in HTML and I really don’t consider it a good format for something like a presentation. Here are my reasons:

  • A presentation is not what you see on the screen – many speakers have notes that are shown in presenter view in Keynote or Powerpoint and not shown to the audience but on a different screen. There are probably ways to do that with CSS and media queries but I have yet to find a slide system using web standards that supports this requirement. If you just read your slides you might as well not be on stage.
  • Adding images should also allow you to edit them – I find myself dragging photos into Keynote and cropping and resizing them all the time. This can be done with CSS and JavaScript but I have yet to see a slide system have that functionality.
  • Presentations need to scale to different resolutions – I’ve encountered anything from 800×600 to 1280×1024. A slide package resizes my fonts and keeps them the way I intended them – HTML doesn’t do that yet. Again, I am sure that with SVG, Canvas and clever trickery this can be done, but please tell me about a system that has considered that.
  • Presentations need to be a single, printable file – presentations get mailed around and printed out for people who like to edit or read on paper. Using a PDF I can do that. Printing out is needed, for example, when you have a conference with sign translation. As sign translators do not translate word by word but sentence by meaning, it is important that they know what is coming. Unless HTML slide systems also support good print styles this is not really possible (a rough sketch of that part follows this list).
  • HTML slides can’t be embedded and resized – using Slideshare, people can embed my slides in their blog posts or articles and watch them in context. You can put HTML slides in an iframe, but they won’t resize – instead you get massive scrollbars.
  • Slides might need to be synced with audio to make sense – I normally record my talks in addition to offering the Slideshare embed. I also used to do slidecasts, but the editor on Slideshare is not good enough yet. This is something we could write for HTML slides – a syncing tool with audio that automatically moves ahead in the deck.
  • Slides need to work offline – many a conference doesn’t have working wireless and people want to read the slides on the train. If you use third party fonts or images hosted elsewhere or you link to live demos this is very frustrating. You can use offline storage for that though.
  • Slides should work without your computer and your browser – many hand-rolled slide decks expect the presenters settings, operating system or nightly build of a certain browser and are not written with progressive enhancement as they are for personal use. When people try to watch them on their own computer and cannot see the effects or demos explained this actually is bad advertising for open web technologies.
  • A slide deck has a fixed layout and fonts – differences in browser rendering or elastic design effects are not welcome in a slide deck – so why choose a technology that excels at exactly that?
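The print styles part mentioned in the list could look something like this (a sketch of the idea, not a full slide system):

/* notes are invisible to the audience on screen... */
.notes { display: none; }

/* ...but survive in the printout the sign translators get */
@media print {
  .notes { display: block; font-style: italic; }
}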

Presentations are more than a document on the web and unless I can do the things above as effectively and easily in HTML as I can do them in Keynote, I won’t switch.

Arguments for HTML slide decks

The main argument – beyond “doing the right thing for the web”, which Tantek mentioned – was that your slides as a PDF or Flash movie just can’t be found on the web. This is not true – Google happily indexes PDF and Flash, and furthermore Slideshare creates a transcript of your slides as an ordered list for SEO reasons.

The other argument which is more to the point is that HTML documents are easy to edit, re-use and update. Collaborating on a slide deck in Keynote and Powerpoint can lead to annoying inconsistencies across operating systems and software versions.

My hybrid approach

Personally, I use a hybrid approach to the issue. I write my presentation as notes and then create a slide deck from those. I explained the lot (and the above arguments against HTML slides) in the presentations chapter of the Developer Evangelism handbook:


When I write a new slide deck I start with a text editor. I write the story of my presentation and I follow the same rules as for writing online articles. That way I make sure of a few things:

  • I know the content and the extent of what I want to cover – which also allows me to keep to the time limit when presenting.
  • I have the information in a highly portable format for people to read afterwards – by converting it to HTML later on or blogging these notes.
  • I already know all the links that I want to show and can create easy-to-find versions of them – for example by bookmarking them in Delicious.
  • I don’t get carried away with visuals and effects – which is a big danger when you play with good presentation software.

Yes, this is duplicating work, but I think it is worth it – after all Slideshare is a community for slide decks – you already have a captive audience rather than hoping Googlebot comes around and considers you better than another resource on the same subject.

The Table of Contents script – my old nemesis

Wednesday, January 6th, 2010

One thing I like about – let me rephrase that – one of the amazingly few things that I like about Microsoft Word is that you can generate a Table of Contents from a document. Word would go through the headings and create a nested TOC from them for you:

Adding a TOC to a Word Document

Microsoft Word generated Table of Contents.

Now, I always like to do that for documents I write in HTML, too, but maintaining them by hand is a pain. I normally write my document outline first:

<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

I then collect those, copy and paste them, and use search and replace to turn all the hn elements into links and the IDs into fragment identifiers:

<li><a href="#cute">Cute things on the Interwebs</a></li>
<li><a href="#rabbits">Rabbits</a></li>
<li><a href="#puppies">Puppies</a></li>
<li><a href="#labs">Labradors</a></li>
<li><a href="#alsatians">Alsatians</a></li>
<li><a href="#corgies">Corgies</a></li>
<li><a href="#retrievers">Retrievers</a></li>
<li><a href="#kittens">Kittens</a></li>
<li><a href="#gerbils">Gerbils</a></li>
<li><a href="#ducklings">Ducklings</a></li>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Then I need to look at the weight and order of the headings and add the nesting of the TOC list accordingly.

<ul>
  <li><a href="#cute">Cute things on the Interwebs</a>
    <ul>
      <li><a href="#rabbits">Rabbits</a></li>
      <li><a href="#puppies">Puppies</a>
        <ul>
          <li><a href="#labs">Labradors</a></li>
          <li><a href="#alsatians">Alsatians</a></li>
          <li><a href="#corgies">Corgies</a></li>
          <li><a href="#retrievers">Retrievers</a></li>
        </ul>
      </li>
      <li><a href="#kittens">Kittens</a></li>
      <li><a href="#gerbils">Gerbils</a></li>
      <li><a href="#ducklings">Ducklings</a></li>
    </ul>
  </li>
</ul>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Now, wouldn’t it be nice to have that done automatically for me? Doing that with JavaScript and the DOM is actually a much trickier problem than it looks at first sight (I always love to ask this as an interview question or in DOM scripting workshops).

Here are some of the issues to consider:

  • You can easily get elements with getElementsByTagName() but you can’t do a getElementsByTagName('h*') sadly enough.
  • Headings in XHTML and HTML 4 do not have the elements they apply to as child elements (XHTML2 proposed that and HTML5 has it to a degree – Bruce Lawson wrote a nice post about this and there’s also a pretty nifty HTML5 outliner available).
  • You can do a getElementsByTagName() for each of the heading levels and then concatenate a collection of all of them. However, that does not give you their order in the source of the document.
  • To this end, PPK wrote an infamous TOC script used on his site a long time ago, using his getElementsByTagNames() function, which relies on things not every browser supports. Therefore it doesn’t quite do the job either. He also “cheats” at the assembly of the TOC list, as he adds classes to indent the items visually rather than really nesting lists.
  • It seems that the only way to achieve this for all the browsers using the DOM is painful: do a getElementsByTagName('*') and walk the whole DOM tree, comparing nodeName and getting the headings that way.
  • Another solution I thought of reads the innerHTML of the document body and then uses regular expressions to match the headings.
  • As you cannot assume that every heading has an ID, we need to add missing ones ourselves.

So here are some solutions to that problem:

Using the DOM:

(function(){
  var headings = [];
  var herxp = /h\d/i; // matches the node names H1 to H6
  var count = 0;
  var oldweight; // weight of the previous heading (undefined on the first run)
  var elms = document.getElementsByTagName('*');
  // walk the whole DOM, collect the headings in source order and
  // add an ID to those that don't have one yet
  for(var i=0,j=elms.length;i<j;i++){
    var cur = elms[i];
    var id = cur.id;
    if(herxp.test(cur.nodeName)){
      if(cur.id===''){
        id = 'head'+count;
        cur.id = id;
        count++;
      }
      headings.push(cur);
    }
  }
  // assemble the nested list by comparing each heading's weight
  // (the digit in its node name) with that of its neighbours -
  // note this only copes with one level of difference at a time
  var out = '<ul>';
  for(i=0,j=headings.length;i<j;i++){
    var weight = headings[i].nodeName.substr(1,1);
    if(weight > oldweight){
      out += '<ul>'; // one level deeper: open a nested list
    }
    out += '<li><a href="#'+headings[i].id+'">'+
           headings[i].innerHTML+'</a>';
    if(headings[i+1]){
      var nextweight = headings[i+1].nodeName.substr(1,1);
      if(weight > nextweight){
        out+='</li></ul></li>'; // one level back up: close the nested list
      }
      if(weight == nextweight){
        out+='</li>'; // same level: close the list item
      }
    }
    oldweight = weight;
  }
  out += '</li></ul>';
  document.getElementById('toc').innerHTML = out;
})();

You can see the DOM solution in action here. The problem with it is that it can become very slow on large documents and in MSIE6.

The regular expressions solution

To work around the need to traverse the whole DOM, I thought it might be a good idea to use regular expressions on the innerHTML of the DOM and write it back once I added the IDs and assembled the TOC:

(function(){
  var bd = document.body,
      x = bd.innerHTML,
      // grab all the headings including their content
      headings = x.match(/<h\d[^>]*>[\S\s]*?<\/h\d>$/mg),
      r1 = />/,            // end of the opening tag (for injecting an ID)
      r2 = /<(\/)?h(\d)/g, // heading tags (for turning them into anchors)
      toc = document.createElement('div'),
      out = '<ul>',
      i = 0,
      j = headings.length,
      cur = '',
      weight = 0,
      nextweight = 0,
      oldweight = 2,
      container = bd;
  for(i=0;i<j;i++){
    // inject an ID into headings that don't have one
    if(headings[i].indexOf('id=')==-1){
      cur = headings[i].replace(r1,' id="h'+i+'">');
      x = x.replace(headings[i],cur);
    } else {
      cur = headings[i];
    }
    // the weight is the digit in the tag: <h2 ... gives "2"
    weight = cur.substr(2,1);
    if(i<j-1){
      nextweight = headings[i+1].substr(2,1);
    }
    // turn the heading into a link pointing to its ID
    var a = cur.replace(r2,'<$1a');
    a = a.replace('id="','href="#');
    if(weight>oldweight){ out+='<ul>'; }             // deeper: open a nested list
    out+='<li>'+a;
    if(nextweight<weight){ out+='</li></ul></li>'; } // back up: close it again
    if(nextweight==weight){ out+='</li>'; }          // same level: close the item
    oldweight = weight;
  }
  bd.innerHTML = x;
  toc.innerHTML = out +'</li></ul>';
  container = document.getElementById('toc') || bd;
  container.appendChild(toc);
})();

You can see the regular expressions solution in action here. The problem with it is that reading innerHTML and then writing it back might be expensive (this needs testing), and if you have event handling attached to elements it might leak memory, as my colleague Matt Jones pointed out (again, this needs testing). Ara Pehlivanian also mentioned that a mix of both approaches might be better – match the headings but don’t write back the innerHTML; instead use the DOM to add the IDs.

Libraries to the rescue – a YUI3 example

Talking to another colleague – Dav Glass – about the TOC problem he pointed out that the YUI3 selector engine happily takes a list of elements and returns them in the right order. This makes things very easy:

<script type="text/javascript" src="http://yui.yahooapis.com/3.0.0/build/yui/yui-min.js"></script>
<script>
YUI({combine: true, timeout: 10000}).use("node", function(Y) {
  var nodes = Y.all('h1,h2,h3,h4,h5,h6');
  var out = '<ul>';
  var weight = 0,nextweight = 0,oldweight;
  nodes.each(function(o,k){
    var id = o.get('id');
    if(id === ''){
      id = 'head' + k;
      o.set('id',id);
    };
    weight = o.get('nodeName').substr(1,1);
    if(weight > oldweight){ out+='<ul>'; }
    out+='<li><a href="#'+o.get('id')+'">'+o.get('innerHTML')+'</a>';
    if(nodes.item(k+1)){
      nextweight = nodes.item(k+1).get('nodeName').substr(1,1);
      if(weight > nextweight){ out+='</li></ul></li>'; }
      if(weight == nextweight){ out+='</li>'; }
    }
    oldweight = weight;
  });
  out+='</li></ul>';
  Y.one('#toc').set('innerHTML',out);
});</script>

There is probably a cleaner way to assemble the TOC list.
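One cleaner option – and this is just a sketch – is to keep track of the current depth and open or close lists as needed, instead of peeking at the next heading:

function assembleTOC(headings){
  // headings: heading elements in source order, all of them with an ID
  var out = '', level = 0, weight;
  for(var i = 0; i < headings.length; i++){
    weight = parseInt(headings[i].nodeName.substring(1), 10);
    // clamp jumps (say, h1 straight to h3) to one level to keep the list valid
    if(weight > level + 1){ weight = level + 1; }
    if(weight > level){
      out += '<ul>'; // one level deeper: open a nested list
      level++;
    } else {
      // same level or back up: close what needs closing
      out += '</li>';
      while(level > weight){ out += '</ul></li>'; level--; }
    }
    out += '<li><a href="#' + headings[i].id + '">' +
           headings[i].innerHTML + '</a>';
  }
  // close everything that is still open
  out += '</li>';
  while(level > 1){ out += '</ul></li>'; level--; }
  return out + '</ul>';
}

Feed it the headings array collected in the DOM solution above and the neighbour-peeking goes away.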

Performance considerations

There is more to life than simply increasing its speed. – Gandhi

Some of the code above can be very slow, so whenever we talk about performance and JavaScript it is important to consider the context of the implementation: a table of contents script would normally be used on a text-heavy, but simple, document. There is no point in measuring and judging these scripts by running them over Gmail or the Yahoo homepage. That said, faster and less memory-consuming is always better, but I am always a bit sceptical about performance tests that consider edge cases rather than the one the solution was meant to be applied to.

Moving server side

The other thing I am getting more and more sceptical about is client-side solutions for things that actually also make sense on the server. Therefore I thought I could take the regular expressions approach above and move it server side.

The first version is a PHP script you can loop an HTML document through. You can see the outcome of tocit.php here:

<?php
$file = $_GET['file'];
// only allow simple file names - no paths, no funny business
if(preg_match('/^[a-z0-9\-_\.]+$/i',$file)){
  $content = file_get_contents($file);
  // grab all headings together with their weights (the digit in the tag name)
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
    // inject an ID into headings that don't have one
    if(strstr($h,'id')===false){
      $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
      $content = str_replace($h,$x,$content);
      $h = $x;
    }
    // turn the heading into a link pointing to its ID
    $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
    $link = str_replace('id="','href="#',$link);
    if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
      $out.='<ul>'; // deeper: open a nested list
    }
    $out .= '<li>'.$link;
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1]<$headlines[1][$k]){
      $out.='</li></ul></li>'; // back up: close the nested list
    }
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1] == $headlines[1][$k]){
      $out.='</li>'; // same level: close the list item
    }
  }
  $out.='</li></ul>';
  echo str_replace('<div id="toc"></div>',$out,$content);
}else{
  die('only files like text.html please!');
}
?>

This is nice, but instead of having another file to loop through, we can also use the output buffer of PHP:

<?php
// the same logic as above, wrapped as a callback for PHP's output buffer
function tocit($content){
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
    if(strstr($h,'id')===false){
      $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
      $content = str_replace($h,$x,$content);
      $h = $x;
    }
    $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
    $link = str_replace('id="','href="#',$link);
    if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
      $out.='<ul>';
    }
    $out .= '<li>'.$link;
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1]<$headlines[1][$k]){
      $out.='</li></ul></li>';
    }
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1] == $headlines[1][$k]){
      $out.='</li>';
    }
  }
  $out.='</li></ul>';
  return str_replace('<div id="toc"></div>',$out,$content);
}
ob_start("tocit");
?>
[... the document ...]
<?php ob_end_flush();?>

The server side solutions have a few benefits: they always work (no JavaScript required), and you can also cache the result for a while if needed. I am sure the PHP can be sped up, though.
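Caching could be as simple as the following sketch wrapped around the output buffer version (the cache folder, file naming and timeout are invented for the example):

<?php
// serve a cached copy when it is fresh enough (an hour here)
$cache = 'cache/'.md5($_SERVER['REQUEST_URI']).'.html';
if(file_exists($cache) && time() - filemtime($cache) < 3600){
  readfile($cache);
  exit;
}
ob_start();
// ... output the document here ...
$html = tocit(ob_get_clean()); // run the TOC logic from above
file_put_contents($cache, $html);
echo $html;
?>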

See all the solutions and get the source code

I showed you mine, now show me yours!

All of these solutions are pretty much rough and ready. How do you think they can be improved? How about doing a version for a different library? Go ahead, fork the project on GitHub and show me what you can do.