otsukare Thoughts after a day of work

meta refresh and HTTP

When inspecting Web Compatibility issues related to UA detection, I go through a small routine to isolate the source of the problem. It often starts in the shell with curl or httpie, where I test the bare domain name as a person would likely type it. Today, I was checking an issue with the Chinese Web site baidu, which "serves a static HTML site to Firefox for Android and a desktop site to Firefox OS," according to the bug:

→ http -v GET http://baidu.com/

It generates the following HTTP request (note that the User-Agent is HTTPie by default):

GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, compress
Host: baidu.com
User-Agent: HTTPie/0.7.2

The response from the server was surprising…

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=86400
Connection: Keep-Alive
Content-Length: 81
Content-Type: text/html
Date: Fri, 25 Oct 2013 18:18:45 GMT
ETag: "51-4b4c7d90"
Expires: Sat, 26 Oct 2013 18:18:45 GMT
Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
Server: Apache

<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>

200 OK means "I have what you want. Here's the content." But the response payload was minimal: only a meta element that refreshes the page after 0 seconds and loads the URI http://www.baidu.com/.
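To follow that kind of redirect, a client that is not a browser has to read the body and interpret the meta element itself. Here is a rough sketch of that work. The function name `parseMetaRefresh` is hypothetical, and a regular expression stands in for a real HTML parser, which is only reasonable for a payload this small.

```javascript
// Hypothetical helper: extract the delay and target URL of a meta refresh.
// Regexes are a shortcut here; a real client should use an HTML parser.
function parseMetaRefresh(html) {
  // Find a <meta> element whose http-equiv attribute is "refresh".
  const meta = /<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i.exec(html);
  if (!meta) return null;

  // Pull out the content attribute, e.g. "0;url=http://www.baidu.com/".
  const content = /content\s*=\s*["']([^"']*)["']/i.exec(meta[0]);
  if (!content) return null;

  // Split "<seconds>;url=<target>" into its two parts.
  const parts = /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i.exec(content[1]);
  if (!parts) return null;

  return { delay: Number(parts[1]), url: parts[2] ? parts[2].trim() : null };
}

// The body returned by http://baidu.com/ in the response above.
const body =
  '<html>\n<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">\n</html>\n';

console.log(parseMetaRefresh(body));
```

A command-line client such as curl or HTTPie does not do this by itself, which is why the request above stops at the 200 OK.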

HTTP already has a tool for doing this: 301 Moved Permanently.

HTTP/1.1 301 Moved Permanently
Location: http://www.baidu.com/

And that's all; you do not need anything else. If you think that the redirection might be blocked by some mechanism and you want a human to be able to decide to follow the new location, you can add an HTML payload.

HTTP/1.1 301 Moved Permanently
Location: http://www.baidu.com/
Content-Length: 89
Content-Type: text/html

<!doctype html><html><title>baidu</title><a href="http://www.baidu.com">Baidu</a></html>

Note: I'm leaving out cache information because it's a permanent redirect. I do wonder, though, what bots usually do with and without the date and ETag headers, and with the redirect information.
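For illustration, the 301 response with an HTML fallback could be produced on the server side roughly like this. It is only a sketch for Node.js: `buildRedirect` and `handle` are hypothetical names, the port in the closing comment is arbitrary, and nothing here reflects how baidu's servers are actually configured.

```javascript
// Hypothetical helper: build a 301 response with a small HTML fallback.
// Assumes `location` and `label` are trusted constants (no HTML escaping here).
function buildRedirect(location, label) {
  // The link is only for clients (or people) that do not follow Location.
  const body =
    `<!doctype html><html><title>${label}</title>` +
    `<a href="${location}">${label}</a></html>\n`;
  return {
    status: 301,
    headers: {
      "Location": location,
      "Content-Type": "text/html",
      // Content-Length counts bytes, not characters.
      "Content-Length": Buffer.byteLength(body),
    },
    body,
  };
}

// Request handler in the shape expected by Node's http.createServer().
function handle(req, res) {
  const r = buildRedirect("http://www.baidu.com/", "baidu");
  res.writeHead(r.status, r.headers);
  res.end(r.body);
}

// To try it locally (port 8080 is an arbitrary choice):
//   require("http").createServer(handle).listen(8080);
```

Computing Content-Length from the body avoids a mismatch whenever the fallback markup changes.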

The final issue is a bit more complicated, as baidu is serving at least three types of content depending on the User-Agent: a low-tech mobile version, an enhanced mobile version, and a desktop version. This is not really related to this post, but it is worth noting for people who think there are only a few browsers on earth. One of the baidu scripts has a pretty interesting piece of code:

var w = /se /gi.test(navigator.userAgent);
var o = /AppleWebKit/gi.test(navigator.userAgent) && /theworld/gi.test(navigator.userAgent);
var k = /theworld/gi.test(navigator.userAgent);
var p = /360se/gi.test(navigator.userAgent);
var a = /360chrome/gi.test(navigator.userAgent);
var f = /greenbrowser/gi.test(navigator.userAgent);
var t = /qqbrowser/gi.test(navigator.userAgent);
var m = /tencenttraveler/gi.test(navigator.userAgent);
var j = /maxthon/gi.test(navigator.userAgent);
var u = /krbrowser/gi.test(navigator.userAgent);
var l = /BIDUBrowser/gi.test(navigator.userAgent) && (typeof window.external.GetVersion != "undefined");
var b = false;

That's a good thing to remember: the market is always more diverse than you think. The diversity is local, the access is global. Anyone with any tool might access your Web site.
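To get a feel for how many browsers those tests cover, the same checks can be written as a table of patterns. This is only a sketch: `patterns` and `matchBrowsers` are hypothetical names, the `se ` test and the `window.external` check are left out because they need more context or a real browser, and the sample User-Agent string is made up for the example. The `g` flag is dropped on purpose, because a regular expression object that is reused with `g` keeps its `lastIndex` between calls to `test()`.

```javascript
// Hypothetical table: the patterns come from the baidu tests quoted above.
// No "g" flag: these objects are reused, and "g" would make test() stateful.
const patterns = {
  theworld: /theworld/i,
  "360se": /360se/i,
  "360chrome": /360chrome/i,
  greenbrowser: /greenbrowser/i,
  qqbrowser: /qqbrowser/i,
  tencenttraveler: /tencenttraveler/i,
  maxthon: /maxthon/i,
  krbrowser: /krbrowser/i,
  bidubrowser: /BIDUBrowser/i,
};

// Return the names of every pattern that matches the given User-Agent string.
function matchBrowsers(userAgent) {
  return Object.keys(patterns).filter((name) => patterns[name].test(userAgent));
}

// Made-up User-Agent string, for illustration only.
const sampleUA =
  "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0 Safari/537.1 360SE";

console.log(matchBrowsers(sampleUA));
```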

Otsukare!