otsukare Thoughts after a day of work

curl and User-Agent

Anthony (Mozilla) asked if we could make a curl request with an empty User-Agent HTTP header. Interesting!

So we tried to come up with solutions. First of all, a simple request. With the common -I option, we only receive the answer from the server, not what has been sent by the client.

 curl -I http://www.opera.com/

HTTP/1.1 200 OK
Date: Wed, 11 Apr 2012 17:15:30 GMT
Server: Apache
Content-Type: text/html; charset=utf-8
Set-Cookie: language=none; path=/; domain=www.opera.com; expires=Thu, 12-Jan-2012 17:15:30 GMT
Set-Cookie: language=en; path=/; domain=.opera.com; expires=Sat, 09-Apr-2022 17:15:30 GMT
Vary: Accept-Encoding

If we want access to the HTTP headers sent by the client, we have two choices: the trace-ascii mode and the verbose mode. Let's see.

 curl -I --trace-ascii - http://www.opera.com/

== Info: About to connect() to www.opera.com port 80 (#0)
== Info:   Trying 195.189.143.147... == Info: connected
== Info: Connected to www.opera.com (195.189.143.147) port 80 (#0)
=> Send header, 148 bytes (0x94)
0000: HEAD / HTTP/1.1
0011: User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.2
0051: 1.4 OpenSSL/0.9.8r zlib/1.2.5
0070: Host: www.opera.com
0085: Accept: */*
0092: 
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
HTTP/1.1 200 OK
<= Recv header, 37 bytes (0x25)
0000: Date: Wed, 11 Apr 2012 17:23:45 GMT
Date: Wed, 11 Apr 2012 17:23:45 GMT
<= Recv header, 16 bytes (0x10)
0000: Server: Apache
Server: Apache
<= Recv header, 40 bytes (0x28)
0000: Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
<= Recv header, 96 bytes (0x60)
0000: Set-Cookie: language=none; path=/; domain=www.opera.com; expires
0040: =Thu, 12-Jan-2012 17:23:45 GMT
Set-Cookie: language=none; path=/; domain=www.opera.com; expires=Thu, 12-Jan-2012 17:23:45 GMT
<= Recv header, 91 bytes (0x5b)
0000: Set-Cookie: language=en; path=/; domain=.opera.com; expires=Sat,
0040:  09-Apr-2022 17:23:45 GMT
Set-Cookie: language=en; path=/; domain=.opera.com; expires=Sat, 09-Apr-2022 17:23:45 GMT
<= Recv header, 23 bytes (0x17)
0000: Vary: Accept-Encoding
Vary: Accept-Encoding

<= Recv header, 2 bytes (0x2)
0000: 
== Info: Connection #0 to host www.opera.com left intact
== Info: Closing connection #0

A trick for the trace is to use "-" after --trace-ascii: this displays the result in the terminal instead of writing it to a file. Let's try with verbose.

 curl -I -v http://www.opera.com/

* About to connect() to www.opera.com port 80 (#0)
*   Trying 195.189.143.147... connected
* Connected to www.opera.com (195.189.143.147) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.opera.com
> Accept: */*
> 
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Wed, 11 Apr 2012 17:28:05 GMT
Date: Wed, 11 Apr 2012 17:28:05 GMT
< Server: Apache
Server: Apache
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Set-Cookie: language=none; path=/; domain=www.opera.com; expires=Thu, 12-Jan-2012 17:28:05 GMT
Set-Cookie: language=none; path=/; domain=www.opera.com; expires=Thu, 12-Jan-2012 17:28:05 GMT
< Set-Cookie: language=en; path=/; domain=.opera.com; expires=Sat, 09-Apr-2022 17:28:05 GMT
Set-Cookie: language=en; path=/; domain=.opera.com; expires=Sat, 09-Apr-2022 17:28:05 GMT
< Vary: Accept-Encoding
Vary: Accept-Encoding

< 
* Connection #0 to host www.opera.com left intact
* Closing connection #0

Both display the information we need, but maybe a tad too much. We just want to see the effect on the user agent.

→ curl -sI --trace-ascii - http://www.opera.com/ | grep "User-Agent:"

0011: User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.2
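The trace output wraps every 64 bytes, which is why the grep above only shows the beginning of the string. One possible alternative, sketched here, is to filter the verbose output instead, where each sent header stays on one line prefixed with "> ". The function name sent_user_agent is made up for this illustration, and the input is canned text copied from the -v run above so that the sketch runs offline.

```shell
# Keep only the request header lines ("> " prefix) that name User-Agent.
# sent_user_agent is a hypothetical helper, not part of curl.
sent_user_agent() {
    grep -i '^> User-Agent:'
}

# Canned verbose output copied from the -v run, so no network is needed.
sent_user_agent <<'EOF'
* Connected to www.opera.com (195.189.143.147) port 80 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.opera.com
> Accept: */*
< HTTP/1.1 200 OK
EOF

# Against a live server, verbose output goes to stderr, hence the 2>&1:
#   curl -sI -v http://www.opera.com/ 2>&1 | sent_user_agent
```

The trade-off is that -v writes to stderr, so the redirection is needed before the pipe, while --trace-ascii - writes straight to stdout.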

OK, let's go and try to modify it now. There is an option to set a specific user agent with curl: -A "User-Agent-String". Does it work?

→ curl -sI -A "Poney and Rainbow" --trace-ascii - http://www.opera.com/ | grep "User-Agent:"

0011: User-Agent: Poney and Rainbow

OK. Anthony was asking how to remove it. Hmm… Let's try an empty string.

→ curl -sI -A "" --trace-ascii - http://www.opera.com/ | grep "User-Agent:"

Working! Excellent. grep prints nothing, so the header is no longer sent. But what if… you want to send the User-Agent: header with an empty string?

→ curl -sI -H "User-Agent:" --trace-ascii - http://www.opera.com/ | grep "User-Agent:"

This is not working: it doesn't send the header at all. Let's try to send a space character instead.

 curl -sI -A " " --trace-ascii - http://www.opera.com/ | grep "User-Agent:" | cat -e
0011: User-Agent:  $

This is (kind of) working, but it sends exactly User-Agent: followed by two spaces. The cat -e is here to add a "$" at the end of each line, so it is easier to see the trailing spaces. They should not be significant, at least for servers implementing the HTTP specification correctly.
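To make the role of cat -e concrete, here is a small offline sketch. It feeds two hand-written header lines (not real curl output) through cat -e so the difference between an empty value and a value made of spaces becomes visible.

```shell
# Without cat -e, these two lines look identical in a terminal.
# With it, the "$" marks where each line really ends.
printf 'User-Agent:\n'   | cat -e   # prints: User-Agent:$
printf 'User-Agent:  \n' | cat -e   # prints: User-Agent:  $
```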

A field value MAY be preceded by optional whitespace (OWS); a single SP is preferred. The field value does not include any leading or trailing white space: OWS occurring before the first non-whitespace octet of the field value or after the last non-whitespace octet of the field value is ignored and SHOULD be removed before further processing (as this does not change the meaning of the header field).
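As a rough illustration of that rule, the sketch below strips the optional whitespace the way a conforming recipient might. The helper name field_value is hypothetical, and the sed expressions are only an approximation: they treat spaces and tabs as OWS and do not handle a trailing carriage return.

```shell
# Print the value of a raw "Name: value" header line with leading and
# trailing OWS (spaces and tabs) removed. field_value is a made-up helper.
field_value() {
    printf '%s\n' "$1" |
        sed -e 's/^[^:]*://' -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//'
}

# The header sent with -A " " carries two spaces after the colon;
# once OWS is removed, nothing is left.
printf '[%s]\n' "$(field_value 'User-Agent:  ')"                  # prints: []
printf '[%s]\n' "$(field_value 'User-Agent: Poney and Rainbow')"  # prints: [Poney and Rainbow]
```

Under that reading, the two spaces are equivalent to an empty User-Agent value for a server that follows the specification.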

Working together!

And in case you are wondering, I took care of mentioning Mozilla for Anthony's affiliation. It is because many people think there is a browser war, where I see collaborative work and love.

Update

Mike Taylor noted that since version 7.23.0, it is possible to send empty headers with curl. My own version is "curl 7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5". To send an empty header, you need to end the header name with a semicolon only, instead of a colon.

curl -sI -H "User-Agent;" --trace-ascii - http://www.opera.com/ | grep "User-Agent:"

will first return the current User-Agent of curl, followed by an empty user agent. So I guess it now depends on the server processing only the last one.
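Since the semicolon syntax depends on the curl version, a quick check before relying on it may help. This is only a sketch: the function name supports_empty_header is made up, and the 7.23.0 threshold is simply the one reported by Mike Taylor above.

```shell
# Succeed when the version string in $1 (e.g. "7.21.4") is at least 7.23.0.
# supports_empty_header is a hypothetical helper, not a curl feature.
supports_empty_header() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split "7.21.4" into 7, 21 and 4
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 23 ]; }
}

for v in 7.21.4 7.23.0; do
    if supports_empty_header "$v"; then
        echo "$v: semicolon syntax should be available"
    else
        echo "$v: too old for the semicolon syntax"
    fi
done

# With a real installation, the version is the second word of the first line:
#   supports_empty_header "$(curl --version | awk 'NR==1 {print $2}')"
```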