otsukare Thoughts after a day of work

http on the command line for User-Agent sniffing

So we have a list of Web sites in Spain which do not work properly with Firefox OS. We need

  1. to analyze each Web site's issue
  2. to find the contact information
  3. to contact them

You can help!

Extracting the list from Bugzilla

This part is easy: at the bottom of a Bugzilla search results page, there is a link to export the bugs as a CSV list. Let's fetch it with httpie.

http -b GET 'https://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&field0-0-0=short_desc&field0-0-1=status_whiteboard&list_id=7530168&query_format=advanced&type0-0-0=substring&type0-0-1=substring&value0-0-0=country-es%5D&value0-0-1=country-es%5D&ctype=csv&human=1' > /tmp/urilist.txt

Let's look at the content of the file.

cat /tmp/urilist.txt 
"Bug ID","Product","Component","Assignee","Status","Resolution","Summary","Changed"
827678,"Tech Evangelism","Mobile","nobody","NEW","---","marca.com doesn't recognize B2G UA as mobile","2013-07-31 12:02:28"
828383,"Tech Evangelism","Mobile","nobody","NEW","---","as.com doesn't recognize B2G UA as mobile","2013-07-12 07:54:34"
828386,"Tech Evangelism","Mobile","nobody","NEW","---","ebay.es doesn't recognize B2G UA as mobile","2013-07-18 07:04:43"
[…]

We do not want the header line, which is the only line starting with a " character.

cat /tmp/urilist.txt | grep -v '^"'
827678,"Tech Evangelism","Mobile","nobody","NEW","---","marca.com doesn't recognize B2G UA as mobile","2013-07-31 12:02:28"
828383,"Tech Evangelism","Mobile","nobody","NEW","---","as.com doesn't recognize B2G UA as mobile","2013-07-12 07:54:34"
828386,"Tech Evangelism","Mobile","nobody","NEW","---","ebay.es doesn't recognize B2G UA as mobile","2013-07-18 07:04:43"
[…]

We want the 7th column, which holds the summary. The field separator is a comma.

cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'
"marca.com doesn't recognize B2G UA as mobile"
"as.com doesn't recognize B2G UA as mobile"
"ebay.es doesn't recognize B2G UA as mobile"
[…]
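Splitting on every comma only works because none of the fields before the summary contains a comma of its own. A quick sanity check, sketched here with a hypothetical helper named check_fields and assuming the 8 columns shown above, is to flag every line whose naive comma split does not give 8 fields:

```shell
# check_fields is a hypothetical helper name. It assumes the 8-column
# Bugzilla CSV shown above and reports every line where a naive comma
# split yields a different number of fields, i.e. where a quoted field
# itself contains a comma.
check_fields() {
  awk -F ',' 'NF != 8 { print "line " NR ": " NF " fields" }'
}

# Usage: check_fields < /tmp/urilist.txt
```

An empty output means the '{print $7}' step sees exactly the summary. A comma inside the summary, after the domain name, would still leave the domain at the start of $7, so a flagged line is a reason to look, not a guaranteed failure.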

Then we extract the domain name from the summary and build a URI which can be requested.

cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'  | sed -e 's,"\(.*\) doesn.*,http://\1/,' 
http://marca.com/
http://as.com/
http://ebay.es/
[…]
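The grep, awk and sed steps can also be folded into a single awk program. This is only a sketch: csv_to_uris is a hypothetical name, and it assumes the same CSV layout as above (a header line starting with a " character, the summary in column 7, summaries of the form "domain doesn't ...").

```shell
# csv_to_uris is a hypothetical helper name. It reads the Bugzilla CSV on
# stdin and prints one http:// URI per bug, assuming the summary sits in
# column 7 and starts with the domain name followed by " doesn...".
csv_to_uris() {
  awk -F ',' '
    /^"/ { next }                    # skip the header line, like grep -v
    {
      summary = $7
      gsub(/"/, "", summary)         # remove the surrounding quotes
      sub(/ doesn.*/, "", summary)   # keep only the leading domain name
      print "http://" summary "/"
    }'
}

# Usage: csv_to_uris < /tmp/urilist.txt
```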

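We can now send the list to httpie and look at the headers each site returns. Note that -J replstr is an option of BSD xargs (as on Mac OS X); with GNU xargs, use -I domain instead.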

cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'  | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH GET domain User-Agent:'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, compress
Host: marca.com
User-Agent: Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0

HTTP/1.1 301 Moved Permanently
Connection: keep-alive
Content-Length: 184
Content-Type: text/html
Date: Thu, 08 Aug 2013 13:15:52 GMT
Location: http://www.marca.com/
Server: nginx/1.2.7

[…]
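Another way to send the same requests, without depending on a particular xargs implementation, is a plain while loop. A minimal sketch, where probe_headers is a hypothetical name and the User-Agent string is passed as its first argument. Unlike xargs, a while loop shares its standard input with the commands it runs, and httpie reads redirected standard input as a request body, hence the </dev/null.

```shell
# probe_headers is a hypothetical helper name. It reads one URI per line on
# stdin and prints the request and response headers for each one, sending
# the User-Agent string given as first argument.
probe_headers() {
  ua=$1
  while IFS= read -r uri; do
    # </dev/null keeps httpie from reading the rest of the URI list
    # as a request body.
    http --print hH --pretty=none GET "$uri" User-Agent:"$ua" </dev/null
  done
}
```

It slots in where the xargs stage was: end the pipeline with probe_headers 'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'.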

That's it! We get the request and response headers for each site.

OK, a very last one: if we just want a quick summary for each site, we filter the output.

cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'  | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH --pretty=none GET domain User-Agent:'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0' | grep -i "^\(host\|http\|location\)"
Host: marca.com
HTTP/1.1 301 Moved Permanently
Location: http://www.marca.com/
Host: as.com
HTTP/1.1 200 OK
Host: ebay.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ebay.es
Host: infojobs.net
HTTP/1.1 301 Moved Permanently
Location: http://www.infojobs.net/
Host: elconfidencial.com
HTTP/1.1 301 Moved Permanently
Location: http://www.elconfidencial.com/
Host: antena3.com
HTTP/1.1 301 Moved Permanently
Location: http://www.antena3.com
Host: ingdirect.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ingdirect.es/
Host: fotocasa.es
HTTP/1.1 200 OK
Host: orange.es
HTTP/1.1 301 Moved Permanently
Location: http://www.orange.es/
Host: paginasamarillas.es
HTTP/1.1 301 Moved Permanently
Location: http://www.paginasamarillas.es/
Host: loteriasyapuestas.es
HTTP/1.1 302 Found
Location: http://loteriasyapuestas.es/es
Host: bbva.es
HTTP/1.1 200 OK
Host: publico.es
HTTP/1.1 301 Moved Permanently
Location: http://www.publico.es//
Host: rincondelvago.com
HTTP/1.1 301 Moved Permanently
Location: http://www.rincondelvago.com/
Host: enfemenino.com
HTTP/1.1 301 Moved Permanently
Location: http://www.enfemenino.com/
Host: movil.bankinter.es
HTTP/1.1 302 Moved Temporarily
Location: https://movil.bankinter.es/
Host: wwwhatsnew.com
HTTP/1.1 200 OK

http: error: Request timed out (30s).
Host: comunio.es
HTTP/1.1 302 Found
Location: http://www.comunio.es/
Host: softonic.com
HTTP/1.1 301 Moved Permanently
Location: http://www.softonic.com/
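The \(host\|http\|location\) alternation relies on \|, a GNU extension to basic regular expressions that other grep implementations are not required to support. An extended regular expression is the portable spelling; keep_summary is a hypothetical name for that filter:

```shell
# keep_summary is a hypothetical helper name. It keeps the status line and
# the Host and Location headers, case-insensitively, using a POSIX extended
# regular expression instead of the GNU "\|" extension.
keep_summary() {
  grep -iE '^(host|http|location)'
}
```

It replaces the final grep -i stage of the pipeline and keeps the same lines.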

We only keep the status line, the Host header, and the Location header if there is one. That way we know whether the browser is redirected based on the User-Agent string. Let's compare with the Firefox for Android User-Agent string 'Mozilla/5.0 (Android; Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'. The only difference is the Android keyword.

cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'  | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH --pretty=none GET domain User-Agent:'Mozilla/5.0 (Android; Mobile; rv:18.0) Gecko/18.0 Firefox/18.0' | grep -i "^\(host\|http\|location\)"
Host: marca.com
HTTP/1.1 301 Moved Permanently
Location: http://www.marca.com/
Host: as.com
HTTP/1.1 200 OK
Host: ebay.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ebay.es
Host: infojobs.net
HTTP/1.1 301 Moved Permanently
Location: http://www.infojobs.net/
Host: elconfidencial.com
HTTP/1.1 301 Moved Permanently
Location: http://www.elconfidencial.com/
Host: antena3.com
HTTP/1.1 301 Moved Permanently
Location: http://www.antena3.com
Host: ingdirect.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ingdirect.es/
Host: fotocasa.es
HTTP/1.1 302 Found
Location: http://m.fotocasa.es/
Host: orange.es
HTTP/1.1 301 Moved Permanently
Location: http://www.orange.es/
Host: paginasamarillas.es
HTTP/1.0 302 Found
Location: http://m.paginasamarillas.es
Host: loteriasyapuestas.es
HTTP/1.1 302 Found
Location: http://m.loteriasyapuestas.es
Host: bbva.es
HTTP/1.1 200 OK
Host: publico.es
HTTP/1.1 301 Moved Permanently
Location: http://m.publico.es
Host: rincondelvago.com
HTTP/1.1 301 Moved Permanently
Location: http://www.rincondelvago.com/
Host: enfemenino.com
HTTP/1.1 301 Moved Permanently
Location: http://m.enfemenino.com/
Host: movil.bankinter.es
HTTP/1.1 302 Moved Temporarily
Location: https://movil.bankinter.es/
Host: wwwhatsnew.com
HTTP/1.1 200 OK

http: error: Request timed out (30s).
Host: comunio.es
HTTP/1.1 302 Found
Location: http://www.comunio.es/
Host: softonic.com
HTTP/1.1 301 Moved Permanently
Location: http://www.softonic.com/

Results

As a first approximation, we can see that when the word Android is in the string, fotocasa.es, paginasamarillas.es, loteriasyapuestas.es, publico.es and enfemenino.com redirect to a mobile site, which they don't do for Firefox OS. They need to be contacted.
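Comparing the two lists by eye gets tedious as they grow. A sketch of automating it, assuming each run's filtered output has been saved to a file (the file names and the differing_hosts helper are hypothetical): it pairs every Host line with the status code and Location that follow it, then prints the hosts whose answers differ between the two files.

```shell
# differing_hosts is a hypothetical helper name. Each argument is a file
# holding the filtered summary (Host, status and Location lines) for one
# User-Agent; the function prints the hosts whose answers differ.
differing_hosts() {
  awk '
    # Save the status code and Location collected for the current host.
    function flush() {
      if (host != "") answer[file, host] = code " " loc
      host = ""
    }
    FNR == 1 { flush(); file++ }        # a new input file begins
    tolower($1) == "host:" {
      flush()
      host = $2; code = "?"; loc = "(none)"
      if (!(host in seen)) { seen[host] = 1; order[++n] = host }
    }
    /^HTTP\// { code = $2 }             # status code, e.g. 301
    tolower($1) == "location:" { loc = $2 }
    END {
      flush()
      for (i = 1; i <= n; i++) {
        h = order[i]
        if (answer[1, h] != answer[2, h])
          print h ": " answer[1, h] " vs " answer[2, h]
      }
    }' "$1" "$2"
}
```

For example, append > /tmp/fxos.txt and > /tmp/android.txt to the two pipelines above, then run differing_hosts /tmp/fxos.txt /tmp/android.txt. On the two summaries above, it should report the same five sites.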

Otsukare!