Google Code: Web Authoring Statistics

In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata.

Some interesting things I picked up from the study are:

  • A whole slew of people are specifying the xml:lang attribute, which will have absolutely no effect (no HTML processor will look at that attribute; it’s an XML attribute).
  • Of the top twenty most-used attributes on body, fourteen are purely presentational.
  • The br element is a simple one, yet used on so many pages that it is the 8th most-used element. It is used more than the p element. There are very few legitimate semantic places to use this element (addresses and poems are the canonical examples), which means that most uses are probably presentational.
  • In our data sample there were twice as many pages that used the table element but didn’t use the td element.
  • The script element was used on roughly half the pages we checked.
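
For anyone curious how numbers like these could be gathered, here is a minimal sketch using Python's standard html.parser module. It counts how many pages use each element and each element/attribute pair, and how many pages have a table element but no td. The names PageTags, survey and SAMPLE_PAGES are made up for illustration, and the three sample pages are invented; this is not how Google ran its analysis.

```python
from collections import Counter
from html.parser import HTMLParser


class PageTags(HTMLParser):
    """Record which elements and (element, attribute) pairs one page uses."""

    def __init__(self):
        super().__init__()
        self.elements = set()
        self.attributes = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        self.elements.add(tag)
        for name, _value in attrs:
            self.attributes.add((tag, name))


def survey(pages):
    """Return per-page usage counts for a list of HTML strings."""
    pages_using_element = Counter()
    pages_using_attribute = Counter()
    table_without_td = 0
    for html in pages:
        parser = PageTags()
        parser.feed(html)
        parser.close()
        # Each page contributes at most 1 to any given count.
        pages_using_element.update(parser.elements)
        pages_using_attribute.update(parser.attributes)
        if "table" in parser.elements and "td" not in parser.elements:
            table_without_td += 1
    return pages_using_element, pages_using_attribute, table_without_td


# Hypothetical sample; a real survey would read crawled documents instead.
SAMPLE_PAGES = [
    '<html lang="en" xml:lang="en"><body bgcolor="#fff"><p>Hi<br>there</p></body></html>',
    "<html><body><table><tr><th>Only headers</th></tr></table></body></html>",
    "<html><body><table><tr><td>cell</td></tr></table><script>var x = 1;</script></body></html>",
]

elements, attributes, table_without_td = survey(SAMPLE_PAGES)
print(elements.most_common(5))
print("pages with table but no td:", table_without_td)
```

Counting pages rather than raw occurrences keeps one br-heavy page from skewing the ranking, which seems closer to the "used on N pages" figures quoted above.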


Google Book Search

Backlash and confusion have dogged Google’s plans to scan millions of books, and, while continuing with its efforts, it has opted for a name change from Google Print to Google Book Search.

Full Story

“We don’t think that this new name will change what some folks think about this program,”

“But we do believe it will help a lot of people understand better what we’re doing.”

Do you believe them? I sure as hell don’t. Why does Gator… er, I mean Claria spring to mind?