In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata.
Some interesting things I picked up from the study are:
- A whole slew of people are specifying the xml:lang attribute, which will have absolutely no effect (no HTML processor will look at that attribute; it’s an XML attribute).
- Of the top twenty most-used attributes on body, fourteen are purely presentational.
- The br element is a simple one, yet used on so many pages that it is the 8th most-used element. It is used more than the p element. There are very few legitimate semantic places to use this element (addresses and poems are the canonical examples), which means that most uses are probably presentational.
- In our data sample there were twice as many pages that used the table element but didn’t use the td element
- The script element was used on roughly half the pages we checked.
Google Code: Web Authoring Statistics
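
For a rough sense of how per-page figures like the ones above could be tallied, here is a minimal sketch using Python's standard `html.parser`. It is not the method the study used (the source does not describe its tooling). The helper names `page_profile` and `tally_pages` and the three sample pages are hypothetical, invented for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records which elements and (element, attribute) pairs occur in one page."""

    def __init__(self):
        super().__init__()
        self.elements = set()
        self.attributes = set()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lowercased.
        self.elements.add(tag)
        for name, _value in attrs:
            self.attributes.add((tag, name))


def page_profile(html):
    """Return (elements, attributes) used at least once in a single page."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.elements, collector.attributes


def tally_pages(pages):
    """Count how many pages use each element and each (element, attribute) pair."""
    element_pages = Counter()
    attribute_pages = Counter()
    for html in pages:
        elements, attributes = page_profile(html)
        element_pages.update(elements)
        attribute_pages.update(attributes)
    return element_pages, attribute_pages


# Hypothetical sample pages; not data from the study.
SAMPLE_PAGES = [
    '<html xml:lang="en"><body bgcolor="white"><p>Hi<br>there</p></body></html>',
    '<html><body><table><tr><th>Only headers</th></tr></table><br></body></html>',
    '<html><body><script>var x = 1;</script>'
    '<table><tr><td>1</td></tr></table></body></html>',
]

element_pages, attribute_pages = tally_pages(SAMPLE_PAGES)

# Pages that contain a table element but no td element.
table_without_td = sum(
    1
    for elements, _ in map(page_profile, SAMPLE_PAGES)
    if "table" in elements and "td" not in elements
)

print(element_pages.most_common(5))
print(table_without_td)
```

Counting pages rather than raw occurrences matches the way the findings are phrased ("used on roughly half the pages"); swapping the sets for counters inside `TagCollector` would give occurrence counts instead.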