otsukare Thoughts after a day of work

Wrong To Be Right - application/xhtml+xml

Update 30 March 2011: The issue with the Starbucks Web site has been fixed. Other sites still exhibit the issue. A fix has been proposed by Rohan Singh.

I have always been experimenting with XHTML. When properly served as application/xhtml+xml, it becomes a powerful tool to check the quality of your markup: the browser throws an error if the code is not well-formed. It doesn’t solve accessibility or semantics issues, but it helps a bit to constrain the quality of your code. On the other hand, it introduces scripting issues if you decide to change the Content-Type at the end of the chain.

Accept and Content-Type in HTTP

A browser (client) and a server exchange messages on how to handle a specific piece of information identified by a URI. So when a client requests http://www.opera.com, it sends along some HTTP headers. One of these headers lists the formats the browser is able to process and, through optional q values, how strongly it prefers each of them. This header is Accept:. Each browser has a slightly different Accept: header.

Browser | Accept header
Opera Desktop 11.0 | text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Opera Mobile Emulator | text/html, application/xml;q=0.9, application/xhtml+xml, multipart/mixed, application/vnd.wap.multipart.mixed, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Firefox 4 | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Safari 5 | application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
curl | */*
IE9 | image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*

Table of Accept headers for some HTTP clients

Usually the server replies with the most appropriate format matching what the client is requesting. So if the client asks for application/xhtml+xml and the server has a representation of the resource in this format, it will send it along with the appropriate Content-Type, in this case application/xhtml+xml.
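To make the negotiation concrete, here is a minimal sketch in Python of how a server could pick a representation from an Accept: header. The function names (parse_accept, quality, negotiate) are mine, and this is not how IIS or any particular server does it. It ignores media type parameters other than q, and it assumes that ties are broken by the server's own order of preference.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, in header order."""
    ranges = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without a q parameter has quality 1
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not wanted"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Quality the client gives to media_type; the most specific range wins."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, available):
    """Return the available type the client prefers, or None if none is acceptable.

    `available` is listed in the server's own order of preference, which
    only matters when the client rates two types equally.
    """
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With Safari 5's header from the table, negotiate(...) picks application/xhtml+xml whatever the server prefers, because text/html only has q=0.9. With Opera's or Firefox's header, both types have quality 1, so the choice is left to the server.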

The issue

Some high-profile Web sites, all running on Microsoft IIS, are behaving strangely. Let’s use curl, as we did previously, and its ability to send specific User-Agent: and Accept: HTTP headers. We will run our tests on the Starbucks Web site.

Curl Accept: */*

A very simple one: by default, curl sends Accept: */*. It means “send me anything you have for this URI and I will do my best to understand it”.

curl -sI http://www.starbucks.com

The server returns a Content-Type: application/xhtml+xml. Fair enough, we said we were accepting anything.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=40pk0kfikyv2lnr3t10yckjl; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 16:58:38 GMT
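If you want to compare many of these responses without reading them by eye, the output of curl -sI can be parsed with Python's standard library. This is only a sketch: parse_response_head is a name I made up, and it assumes a single response head (no redirect chain) like the one above.

```python
from email.parser import Parser

# The response head shown above, as printed by `curl -sI`.
RAW_HEAD = """HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=40pk0kfikyv2lnr3t10yckjl; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 16:58:38 GMT
"""


def parse_response_head(raw):
    """Return (status_line, headers) for one HTTP response head."""
    # The status line is not a header, so split it off first.
    status_line, _, header_block = raw.strip().partition("\n")
    # HTTP headers use the same "Name: value" syntax as mail headers.
    headers = Parser().parsestr(header_block, headersonly=True)
    return status_line.strip(), headers


status, headers = parse_response_head(RAW_HEAD)
media_type = headers.get_content_type()   # media type without parameters
charset = headers.get_content_charset()   # value of the charset parameter
cookies = headers.get_all("Set-Cookie")   # repeated headers come back as a list
```

Here media_type holds the bare application/xhtml+xml, which is the only part we compare in the rest of the tests.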

Curl Accept: text/html

Let’s be more precise. We will tell the server that we want text/html only.

curl -sI -H "Accept: text/html" http://www.starbucks.com/

Neat! The server returns the right Content-Type: text/html. Everything is good! This server is behaving quite well so far.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xm0t31ebosa1symnb5xn4gkm; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:03:35 GMT

Opera 11.01

I will be using Opera 11.01 with these two parameters:

  • User-Agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01
  • Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

We can use Opera Dragonfly to check the headers, or use curl to impersonate Opera.

curl -sI -H "Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/

We then receive Content-Type: application/xhtml+xml from the server. As I said before, that is OK, because we said it was one of the formats we accept.

HTTP/1.1 200 OK 
Cache-Control: private
Content-Type: application/xhtml+xml; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=xdtworhtqlte5mxur2zeulay; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:09:43 GMT

The big issue is that the server is sending Opera a file which is obviously not well-formed XHTML. Because the server labeled it Content-Type: application/xhtml+xml, Opera hands it to its XML parser, which fails on markup that is not well-formed. That is a major usability issue for Opera users. Let’s continue our testing a bit.
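The failure is easy to reproduce outside a browser. The sketch below uses Python's xml.etree.ElementTree, which is of course not Opera's parser, but any conforming XML parser has to reject the same input. The two sample documents are invented for the illustration, they are not the Starbucks markup, and well_formedness_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented samples: the second one uses an HTML-style unclosed <br>.
WELL_FORMED = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>ok</title></head>'
    "<body><p>Hello<br/></p></body></html>"
)
TAG_SOUP = (
    "<html><head><title>oops</title></head>"
    "<body><p>Hello<br></p></body></html>"
)


def well_formedness_error(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None


first = well_formedness_error(WELL_FORMED)   # parses without complaint
second = well_formedness_error(TAG_SOUP)     # rejected: <br> is never closed
```

An HTML parser would recover silently from the second document. An XML parser is not allowed to, and that is exactly what the user sees when the page is labeled application/xhtml+xml.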

Firefox 4

This time we will be using Firefox 4.

curl -sI -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/

Hmm. Amazingly, this time the server answers with Content-Type: text/html. This is starting to look fishy. Note that it is still a valid answer from the server: the client said it could receive both, and the server decided to send HTML.

HTTP/1.1 200 OK
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:06:43 GMT

Stricter Firefox 4 with only “application/xhtml+xml”

This time we will ask the server to send only application/xhtml+xml.

curl -sI -H "Accept: application/xhtml+xml" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b11) Gecko/20100101 Firefox/4.0b11" http://www.starbucks.com/

Surprise! This time the server’s answer is wrong. It sent back Content-Type: text/html, without respecting the HTTP contract. Fishy: is the server doing user-agent sniffing?

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: text/html; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=q41qfc3adwjsidjgtevpxnnk; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:24:48 GMT
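The check we are doing by eye here, "is the Content-Type I received covered by the Accept: header I sent?", can be sketched in a few lines of Python. The name is_acceptable is mine. The sketch treats the wildcards in media ranges as shell-style patterns, lets the most specific matching range decide, and treats q=0 as "not acceptable". It ignores any other media type parameter.

```python
from fnmatch import fnmatchcase


def is_acceptable(content_type, accept):
    """True if the Content-Type received is covered by the Accept header sent."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    best = None  # (specificity, q) of the most specific matching range
    for item in accept.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_range = fields[0].lower()
        if not media_range or not fnmatchcase(media_type, media_range):
            continue
        q = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not wanted"
        # "*/*" scores 0, "text/*" scores 1, an exact type scores 2.
        specificity = 2 - media_range.count("*")
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best is not None and best[1] > 0
```

Applied to the exchange above, is_acceptable("text/html; charset=utf-8", "application/xhtml+xml") is False, while the earlier Firefox 4 test with its full Accept: header gives True.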

Opera 11 with only “text/html”

Let’s verify our intuition about user-agent sniffing. We send the Opera user agent and ask for text/html only.

curl -sI -H "Accept: text/html" -A "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.6; U; fr) Presto/2.7.62 Version/11.01" http://www.starbucks.com/

The server sends to Opera… Content-Type: application/xhtml+xml. This time we can be sure: the server does user-agent sniffing.

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 41954
Content-Type: application/xhtml+xml; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
p3p: CP="CAO PSA OUR"
Set-Cookie: ASP.NET_SessionId=vfzwujvflotbibe4qet0foxb; path=/; HttpOnly
Set-Cookie: skin=; path=/
X-Powered-By: ASP.NET
Date: Thu, 03 Mar 2011 17:27:51 GMT
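The reasoning behind that conclusion is a controlled comparison: keep the Accept: header fixed, change only the User-Agent:, and see whether the Content-Type changes. Here is a small Python sketch of it, fed with the six results recorded in this post. The short labels for the user agents and the function name are mine.

```python
from collections import defaultdict

OPERA_ACCEPT = (
    "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
    "image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
)
FIREFOX_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# (user agent, Accept header sent, Content-Type received), as recorded above.
OBSERVATIONS = [
    ("curl", "*/*", "application/xhtml+xml"),
    ("curl", "text/html", "text/html"),
    ("Opera 11.01", OPERA_ACCEPT, "application/xhtml+xml"),
    ("Firefox 4", FIREFOX_ACCEPT, "text/html"),
    ("Firefox 4", "application/xhtml+xml", "text/html"),
    ("Opera 11.01", "text/html", "application/xhtml+xml"),
]


def accepts_answered_differently(observations):
    """Accept headers that got different Content-Types for different user agents."""
    seen = defaultdict(dict)  # Accept header -> {user agent: Content-Type}
    for user_agent, accept, content_type in observations:
        seen[accept][user_agent] = content_type
    return {
        accept: by_agent
        for accept, by_agent in seen.items()
        if len(set(by_agent.values())) > 1
    }


suspicious = accepts_answered_differently(OBSERVATIONS)
```

The only header in suspicious is Accept: text/html, which curl and Opera both sent with different results. Since nothing else differed between the two requests, the User-Agent: is what the server is looking at.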

Why is this an issue?

Opera gets a different Content-Type than other browsers, which would be fine if the server were sending well-formed XML. Opera is right to fail on this content, which is not well-formed. Unfortunately, because nobody bothered to test that the markup was well-formed, Opera looks as if it were wrong compared to other browsers. That is why I call it “wrong to be right”. The consequences are terrible: Opera users can’t access these sites, and they are penalized because we do the right thing.

What should we do?

  • We try to contact the owners of these Web sites (starbucks, spanair, phenomblue, leisurepro, mcafee, teavana, etc.)
  • We try to identify the library in charge of the user-agent sniffing. We have a lead: an unsupported library called MDBF.
  • We could PATCH for these specific Web sites, but then we lose the benefit of pressuring the owners to fix their sites, because they will have the impression that everything is working.

Really there is no perfect solution, but in the end, the Opera users do not have the freedom of choice.