Update 30 March 2011: The issue with the Starbucks Web site has been fixed. Other sites still exhibit the issue. A fix has been proposed by Rohan Singh.
I have always been experimenting with XHTML. When properly served as
application/xhtml+xml, it becomes a powerful tool to check the quality
of your markup. The browser will throw an error if the code is not
well-formed. It doesn’t solve accessibility or semantics issues, but
it helps a bit in constraining the quality of your code. On the other
hand, it introduces scripting issues if you decide to change
the Content-Type at the end of the chain.
Accept and Content-Type in HTTP
A browser (client) and a server exchange messages on how to handle a
specific piece of information identified by a URI. So when a client
requests http://www.opera.com, it sends along some HTTP headers. One
of these headers specifies the formats the browser is able to
process and the order in which it would prefer to receive them. This
header is Accept:. Each browser has a slightly different Accept: header.
Browser: Accept header
- Opera Desktop 11.0: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
- Opera Mobile Emulator: text/html, application/xml;q=0.9, application/xhtml+xml, multipart/mixed, application/vnd.wap.multipart.mixed, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
- Firefox 4: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- Safari 5: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
- curl: */*
- IE9: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Table of Accept headers for some HTTP clients
Usually the server replies with the most appropriate format matching
what the client is requesting. So if the client asks for
application/xhtml+xml and the server has a representation of this
resource in that format, it will send it along with the appropriate
Content-Type, in this case application/xhtml+xml.
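To make the preference order concrete, here is a minimal Python sketch of how an Accept header can be read. The function name parse_accept is hypothetical, and the sketch makes simplifying assumptions: it only looks at the q parameter (which defaults to 1 when absent), it treats a malformed q as 0, and it ignores the rule that more specific media ranges take precedence over wildcards.

```python
def parse_accept(header):
    """Return (media_range, q) pairs, highest q first; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not wanted"
        entries.append((media_range, q))
    # sorted() is stable, so equal q values stay in the order the client sent them
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Opera Desktop 11.0 header, copied from the table above
OPERA_ACCEPT = (
    "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
    "image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
)

if __name__ == "__main__":
    for media_range, q in parse_accept(OPERA_ACCEPT):
        print(f"{q:.1f}  {media_range}")
```

With Opera’s header, text/html and application/xhtml+xml both come out at q=1.0, ahead of application/xml at 0.9 and the */* catch-all at 0.1.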
The issue
Some high-profile Web sites, all running Microsoft IIS, are
behaving strangely. Let’s use curl, as we did previously, and its
ability to send specific User-Agent: and Accept: HTTP headers.
We will run our tests on the Starbucks Web site.
Curl Accept: */*
A very simple one: curl by default sends Accept: */*. It means “send
me anything you could have for this URI and I will do my best to
understand it”.
curl -sI http://www.starbucks.com
The server returns Content-Type: application/xhtml+xml. Fair enough,
we said we were accepting anything.
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=40pk0kfikyv2lnr3t10yckjl; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 16:58:38 GMT
Curl Accept: text/html
Let’s be more precise. We will tell the server that we want text/html
only.
curl -sI -H "Accept: text/html" http://www.starbucks.com/
Neat! The server returns the right Content-Type: text/html. Everything
is good! This server is behaving quite well so far.
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xm0t31ebosa1symnb5xn4gkm; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:03:35 GMT
Opera 11.01
I will be using Opera 11.01 with these two parameters:
User-Agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
We can use Opera Dragonfly to check the headers, or use curl to mimic Opera.
curl -sI -H "Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/
We then receive Content-Type: application/xhtml+xml from the server.
As I said before, that is OK, because we said it was one of the formats we
accept.
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/xhtml+xml; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xdtworhtqlte5mxur2zeulay; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:09:43 GMT
The big issue is that the server is sending Opera a file which is
obviously not well-formed XHTML. Because the server declared it as
Content-Type: application/xhtml+xml, the XML parser fails on the
non-well-formed XML (XHTML). That is a major usability issue for Opera
users. Let’s continue our testing a bit.
Firefox 4
This time we will be using Firefox 4.
curl -sI -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/
Hmm, amazing: this time the server answers with
Content-Type: text/html. This is starting to look fishy. Note that it is
still a valid answer from the server. The client said it could receive
both. The server decided to send HTML.
HTTP/1.1 200 OK
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:06:43 GMT
Stricter Firefox 4 with only “application/xhtml+xml”
This time we will ask the server to send only application/xhtml+xml.
curl -sI -H "Accept: application/xhtml+xml" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/
Surprise! This time the server’s answer is wrong. It sent back
Content-Type: text/html, which does not respect the HTTP
contract. Fishy: is the server doing user agent sniffing?
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=q41qfc3adwjsidjgtevpxnnk; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:24:48 GMT
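The mismatch in this last exchange can also be checked mechanically. Below is a minimal Python sketch, not a complete implementation of HTTP content negotiation: is_acceptable is a hypothetical helper that compares a response Content-Type against the request’s Accept header. It assumes well-formed q values, ignores media type parameters such as charset, and lets the most specific matching media range decide.

```python
def is_acceptable(content_type, accept_header):
    """True if content_type matches a media range of accept_header with q > 0."""
    media_type = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    main_type = media_type.split("/")[0]
    best = None  # (specificity, q) of the most specific matching range so far
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        q = 1.0  # q defaults to 1 when absent
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # assumption: q values are well-formed numbers
        if media_range == media_type:
            specificity = 2  # exact match, e.g. text/html
        elif media_range == main_type + "/*":
            specificity = 1  # subtype wildcard, e.g. text/*
        elif media_range == "*/*":
            specificity = 0  # full wildcard
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best is not None and best[1] > 0


if __name__ == "__main__":
    # The exchange just above: only application/xhtml+xml was requested.
    print(is_acceptable("text/html; charset=utf-8", "application/xhtml+xml"))
    # The very first exchange: curl accepted anything.
    print(is_acceptable("application/xhtml+xml; charset=utf-8", "*/*"))
```

The first call reports False, since text/html matches nothing the client listed; the second reports True, since */* covers any type.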
Opera 11 with only “text/html”
Let’s check our intuition about user agent sniffing. We send the Opera
user agent and require text/html.
curl -sI -H "Accept: text/html" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/
The server sends Opera… Content-Type: application/xhtml+xml. This
time we can be sure: the server does user agent sniffing.
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=vfzwujvflotbibe4qet0foxb; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:27:51 GMT
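The reasoning behind that conclusion can be written down as a small Python sketch. It takes the six exchanges recorded above as plain data (nothing is fetched from the network) and looks for an Accept value that produced different Content-Type answers depending on the user agent. The function name sniffing_evidence and the short user agent labels are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

OPERA_ACCEPT = (
    "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
    "image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
)
FIREFOX_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# (user agent label, Accept sent, Content-Type received), charset omitted
OBSERVATIONS = [
    ("curl", "*/*", "application/xhtml+xml"),
    ("curl", "text/html", "text/html"),
    ("opera", OPERA_ACCEPT, "application/xhtml+xml"),
    ("firefox", FIREFOX_ACCEPT, "text/html"),
    ("firefox", "application/xhtml+xml", "text/html"),
    ("opera", "text/html", "application/xhtml+xml"),
]


def sniffing_evidence(observations):
    """Return {accept: {user_agent: content_type}} where user agents disagree."""
    by_accept = defaultdict(dict)
    for user_agent, accept, content_type in observations:
        by_accept[accept][user_agent] = content_type
    # An identical Accept answered with different types means the answer
    # depends on something else in the request, here the User-Agent.
    return {
        accept: results
        for accept, results in by_accept.items()
        if len(set(results.values())) > 1
    }


if __name__ == "__main__":
    for accept, results in sniffing_evidence(OBSERVATIONS).items():
        print(accept, results)
```

Only Accept: text/html shows up in the result: curl received text/html while Opera received application/xhtml+xml for the same request header.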
Why is this an issue?
Opera gets a different content type than other browsers, which could be fine if the server were sending well-formed XML. Opera is right to fail on this non-well-formed content. But unfortunately, because nobody cared to test that the markup was well-formed, Opera looks as if it were wrong compared to other browsers. This is why I call it wrong to be right. The consequences are terrible: Opera users can’t access these sites, and they are penalized because we do the right thing.
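Testing for well-formedness takes very little. As a rough illustration, the Python sketch below feeds markup to the standard library’s XML parser, which, like a browser’s XML parser, refuses input that is not well-formed. The helper name well_formedness_error and both sample documents are made up for this example; they are not the markup served by any of the sites discussed here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None


# Hypothetical samples: the second one leaves <br> unclosed, which HTML
# parsers tolerate but XML parsers must reject.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>ok</title></head>'
    "<body><p>Hello<br/></p></body></html>"
)
BAD = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>oops</title></head>'
    "<body><p>Hello<br></p></body></html>"
)

if __name__ == "__main__":
    print("good:", well_formedness_error(GOOD))
    print("bad: ", well_formedness_error(BAD))
```

Served as text/html, the second document would be repaired silently by the HTML parser; served as application/xhtml+xml, it stops the XML parser, which is the failure Opera users see.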
What should we do?
- We try to contact the owners of these Web sites (starbucks, spanair, phenomblue, leisurepro, mcafee, teavana, etc.).
- We try to identify the library in charge of the user-agent sniffing; we have a lead on an unsupported library called MDBF.
- We could PATCH for these specific Web sites, but then we lose the benefit of pressuring the owners to fix their sites, because they will have the impression everything is working.
Really, there is no perfect solution, but in the end Opera users do not have freedom of choice.