Saturday, May 31, 2008

Parsing Bug Double Take

I use SiteMeter's free web counter to monitor traffic statistics for my site. The log gives some basic information on the browser and platform the visitor is using. This can usually be determined from the User-Agent HTTP header submitted by the browser. I was a bit shocked to see this for the operating system field:



I know that there is a small but vocal community of OS/2 users out there, and that IBM has been trying to prod them gently into moving on, but I was highly skeptical that someone might be browsing my site with such an ancient operating system. Then I looked at the browser string.



Apparently SiteMeter parsed the rendering engine version information, found "OS/2" in the string and concluded that must be the OS, even though the platform in parentheses is X11/Linux.
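
To make the failure mode concrete, here is a rough Python sketch of how a substring-based detector could trip over a string like this, and how checking the parenthesized platform section first avoids it. I have no idea how SiteMeter actually implements its parser, so everything here is an assumption: the token table, the function names, and the sample string (which is invented for illustration and is not the one from my log).

```python
import re

# Hypothetical token table, checked in order. A real tracker would need
# a much longer list; this one only covers the cases discussed here.
OS_TOKENS = [
    ("OS/2", "OS/2"),
    ("Windows", "Windows"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]


def naive_os(ua):
    """Return the first known OS token found anywhere in the string."""
    for token, name in OS_TOKENS:
        if token in ua:
            return name
    return "Unknown"


def platform_first_os(ua):
    """Look inside the first parenthesized section before anything else."""
    match = re.search(r"\(([^)]*)\)", ua)
    if match:
        found = naive_os(match.group(1))
        if found != "Unknown":
            return found
    # Fall back to scanning the whole string if the platform section
    # is missing or contains nothing we recognize.
    return naive_os(ua)


# Invented example string: the platform section says X11/Linux, but a
# later product token happens to contain "OS/2".
SAMPLE_UA = "Mozilla/5.0 (X11; U; Linux i686; en-US) ExampleEngine-OS/2.0"

print(naive_os(SAMPLE_UA))           # fooled by the trailing token
print(platform_first_os(SAMPLE_UA))  # trusts the platform section
```

Even the second version is still a heuristic: it simply trusts the first parenthesized group, which is a convention browsers tend to follow, not a guarantee.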

Admittedly, it must be tough to write heuristic algorithms that can interpret these strings, given that there isn't a formal standard for them. Heck, even Microsoft's Internet Explorer 7 still uses "Mozilla" in its identifying string "for historical reasons" that date back over a decade.