otsukare Thoughts after a day of work

Magic Strings and User Agent Sniffing

User agent sniffing is bad most of the time. It creates a lot of issues, and it relies on the idea that a Web site should work only for the browsers of a few vendors. But user agent sniffing becomes really unacceptable when the site outright excludes specific browsers based on their user agent string. Even more so when we realize that once the user agent string has been spoofed, it is possible to access and use the content of the Web site.

Let’s take an example from last week. The exact domain name is not important, so let’s call it http://bad.example.com/. It always starts with one or more bug reports from Opera users saying: “I’m a customer of the company Bad Inc. and I’m not able to access the Web site with my browser.” Then we check whether it’s a bug in Opera or an issue with the Web site. curl is a wonderful tool to quickly test what’s going on between the browser and the server.

So let’s start. We check with Firefox, Safari and Opera and look at what is working and what is not. The combination is not always the same. In this case it was working with Safari and Firefox and not working with Opera. Let’s switch to the command line. The -I option in curl sends an HTTP HEAD request, and -s keeps curl quiet about progress.

% curl -sI http://bad.example.com/
HTTP/1.1 404 Not Found

Without the -A option, curl sends its own default user agent string, and the server answers 404. That means that the server is clearly doing “whitelist” user agent sniffing: it allows only what it knows and blocks the rest. Let’s try with a WebKit user agent string.

% curl -sI -A "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; de-de) AppleWebKit/534.15+ (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4" http://bad.example.com/
HTTP/1.1 200 OK

It is working. And with Opera?

% curl -sI -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.00" http://bad.example.com/
HTTP/1.1 404 Not Found
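
Typing these commands one by one gets tedious once there are more than a few strings to try. Here is a minimal sketch of a helper, assuming a POSIX shell. The function names ua_status and ua_compare and the CURL variable are my own inventions for illustration, not part of curl; only the -s, -I and -A options come from curl itself.

```shell
#!/bin/sh
# Print only the status line the server returns for one User-Agent string.
# CURL is a hypothetical override hook (it defaults to the real curl) so the
# function can also be tried without network access.
ua_status() {
    url=$1
    ua=$2
    # -s: silent, -I: HEAD request, -A: set the User-Agent header.
    # tr strips the carriage return that ends each HTTP header line.
    "${CURL:-curl}" -sI -A "$ua" "$url" | head -n 1 | tr -d '\r'
}

# Run the same check for every User-Agent string given after the URL,
# printing the status line, a tab, then the string that was sent.
ua_compare() {
    url=$1
    shift
    for ua in "$@"; do
        printf '%s\t%s\n' "$(ua_status "$url" "$ua")" "$ua"
    done
}
```

Something like `ua_compare http://bad.example.com/ "Mozilla" "Gecko" "Mozilla Gecko"` would then print one status line per string, which makes it easier to see which token flips the answer.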

Opera gets a 404. At this point the issue is clear: the next step is to contact the site and ask them to change their server-side user agent sniffing to include Opera or, even better, everyone else. But I was wondering what exactly triggered the user agent sniffing. For example, I tried reducing the user agent string to Mozilla only.

% curl -sI -A "Mozilla" http://bad.example.com/
HTTP/1.1 404 Not Found

Not working, so that’s not it. What about Gecko?

% curl -sI -A "Gecko" http://bad.example.com/
HTTP/1.1 404 Not Found

Not working either… hmmm… ok one more try.

% curl -sI -A "Mozilla Gecko" http://bad.example.com/
HTTP/1.1 200 OK

Bingo! But what about IE? It doesn’t have Gecko in its user agent string. After some trial and error I got:

% curl -sI -A "Mozilla MSIE 6" http://bad.example.com/
HTTP/1.1 200 OK

It works with MSIE n, where n >= 6. Put a 5 in there and it stops working. I thought: OK, that’s interesting, what about adding these strings to the Opera one?

% curl -sI -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.00 Mozilla Gecko" http://bad.example.com/
HTTP/1.1 404 Not Found

Hmmm. Oh! One more try, this time with Mozilla at the very start.

% curl -sI -A "Mozilla Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.00 Gecko" http://bad.example.com/
HTTP/1.1 200 OK

Bingo! The site is working. Big smile, and then head banging on the table on realizing how dumb it is. What did I say? Ah yes: do not use user agent sniffing if you do not know what you are doing.
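
Putting the results above together, the check seems to behave roughly as follows: the string has to start with Mozilla and then contain either Gecko or MSIE n with n >= 6. This is only my guess reconstructed from the responses. I have not seen the server code, and the function name guess_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical reconstruction of the sniffing rule, inferred only from the
# curl results in this post. The real server code is unknown.
# Prints the status the server would presumably give for a User-Agent string.
guess_status() {
    ua=$1
    case $ua in
        Mozilla*) ;;                          # must start with "Mozilla"
        *) echo "404 Not Found"; return ;;
    esac
    case $ua in
        *Gecko*) echo "200 OK"; return ;;     # "Gecko" anywhere is enough
    esac
    case $ua in
        *"MSIE "[0-9]*)
            version=${ua#*"MSIE "}            # text after the first "MSIE "
            version=${version%%[!0-9]*}       # keep only the leading digits
            if [ "${version:-0}" -ge 6 ]; then
                echo "200 OK"; return
            fi ;;
    esac
    echo "404 Not Found"
}
```

Under that guess, the Opera string fails simply because it starts with Opera/9.80 instead of Mozilla, and gluing Mozilla in front with Gecko anywhere behind is enough to get in. The sketch only reproduces the responses seen above; the actual rule on the server may well differ.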