So we have a list of Web sites in Spain which do not work properly with Firefox OS. We need
- to analyze each Web site's issue
- to find the contact information
- to contact them
Extracting the list from Bugzilla
This part is easy: at the bottom of a Bugzilla search results page there is a link which offers to export the bugs as a CSV list. Let's fetch it with httpie.
http -b GET 'https://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&field0-0-0=short_desc&field0-0-1=status_whiteboard&list_id=7530168&query_format=advanced&type0-0-0=substring&type0-0-1=substring&value0-0-0=country-es%5D&value0-0-1=country-es%5D&ctype=csv&human=1' > /tmp/urilist.txt
- We could pipe to the next command. We do not expect the list to change every moment, so it is better to save it in a file (here /tmp/urilist.txt) to avoid hammering the server with many similar requests.
- The -b option tells http to print only the body of the HTTP response.
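The "save once, reuse many times" idea can be sketched as a small shell function. This is only an illustration: the name fetch_buglist and its two arguments are made up for this example, and it assumes httpie is installed as http.

```shell
# fetch_buglist URL FILE (hypothetical helper, not part of httpie):
# download URL into FILE unless FILE already exists and is non-empty.
fetch_buglist() {
  url=$1
  file=$2
  if [ -s "$file" ]; then
    # A previous run already saved the list: do not hit the server again.
    echo "using cached $file" >&2
    return 0
  fi
  # -b prints only the body of the HTTP response.
  http -b GET "$url" > "$file"
}

# Usage: fetch_buglist "$url" "$file"
```

Deleting the file is then the way to force a fresh download.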
Let's look at the content of the file.
cat /tmp/urilist.txt
"Bug ID","Product","Component","Assignee","Status","Resolution","Summary","Changed"
827678,"Tech Evangelism","Mobile","nobody","NEW","---","marca.com doesn't recognize B2G UA as mobile","2013-07-31 12:02:28"
828383,"Tech Evangelism","Mobile","nobody","NEW","---","as.com doesn't recognize B2G UA as mobile","2013-07-12 07:54:34"
828386,"Tech Evangelism","Mobile","nobody","NEW","---","ebay.es doesn't recognize B2G UA as mobile","2013-07-18 07:04:43"
[…]
We do not want the first line, which starts with a " character.
cat /tmp/urilist.txt | grep -v '^"'
827678,"Tech Evangelism","Mobile","nobody","NEW","---","marca.com doesn't recognize B2G UA as mobile","2013-07-31 12:02:28"
828383,"Tech Evangelism","Mobile","nobody","NEW","---","as.com doesn't recognize B2G UA as mobile","2013-07-12 07:54:34"
828386,"Tech Evangelism","Mobile","nobody","NEW","---","ebay.es doesn't recognize B2G UA as mobile","2013-07-18 07:04:43"
[…]
We want to extract the 7th column, which holds the summary. The separator is a comma character.
cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}'
"marca.com doesn't recognize B2G UA as mobile"
"as.com doesn't recognize B2G UA as mobile"
"ebay.es doesn't recognize B2G UA as mobile"
[…]
- The -F ',' option specifies the field separator. The default is whitespace.
- Note: This is not supposed to be the best way. In our search query we should have requested the URL field of Bugzilla, but that data is not yet very reliable in Bugzilla. I will use this first pass to change that.
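Splitting on a bare comma breaks as soon as a summary itself contains a comma. A possible alternative, assuming every field except the bug ID is wrapped in double quotes as in the sample above, is to split on the three characters "," instead. The function name summary_column is invented for this sketch.

```shell
# summary_column (hypothetical helper): read the whole CSV on stdin and
# print the Summary field of each bug.
summary_column() {
  # NR > 1 skips the header line.
  # With "," as the separator, the fields are:
  #   $1 = id,"Product   $2 = Component   $3 = Assignee   $4 = Status
  #   $5 = Resolution    $6 = Summary     $7 = Changed"
  awk -F '","' 'NR > 1 { print $6 }'
}

# Usage: summary_column < /tmp/urilist.txt
```

The summary comes out without its surrounding quotes, so a following sed pattern would have to drop its leading " as well.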
We want to extract the domain name from the summary and create a URI which can be requested.
cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}' | sed -e 's,"\(.*\) doesn.*,http://\1/,'
http://marca.com/
http://as.com/
http://ebay.es/
[…]
- The pattern matching is simple.
- We use a comma as the delimiter in the sed command instead of the usual slash (s///). You can use any character. It is a good idea to pick a character which is unlikely to appear in the strings, and URIs contain a lot of slashes.
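One limit of this simple pattern: a summary which does not contain " doesn" passes through sed unchanged and would later be handed to http as if it were a URI. A small variation, sketched here with an invented function name, prints only the lines where the substitution succeeded.

```shell
# summaries_to_uris (hypothetical helper): read quoted summaries on stdin
# and print one URI per summary of the form "<domain> doesn't ...".
summaries_to_uris() {
  # -n turns off the default output; the trailing p prints a line only
  # when the substitution matched. The comma is the s command delimiter.
  sed -n -e 's,"\(.*\) doesn.*,http://\1/,p'
}

# Usage: ... | awk -F ',' '{print $7}' | summaries_to_uris
```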
We can now take the list and send it to httpie for introspection.
cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}' | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH GET domain User-Agent:'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, compress
Host: marca.com
User-Agent: Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0
HTTP/1.1 301 Moved Permanently
Connection: keep-alive
Content-Length: 184
Content-Type: text/html
Date: Thu, 08 Aug 2013 13:15:52 GMT
Location: http://www.marca.com/
Server: nginx/1.2.7
[…]
That's it! We are getting the list of answers.
- xargs sends the arguments one by one to http, thanks to -n1.
- Usually xargs takes the result from the previous pipe and adds it at the end of the command, except if you create a placeholder string with -J. We used domain, hence -J domain.
- Then comes the httpie command. We use http --print hH with GET, which makes a real GET request but displays only the request and response headers. The reason is that many sites unfortunately have a bogus implementation of HTTP HEAD.
- We use the domain placeholder where the URI has to go.
- Finally we put the User-Agent string we want to test with, 'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'. By changing the user agent string we can determine if there are differences.
It is a first approximation. Nothing beats a test on the device itself.
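A portability note with a small sketch: -J belongs to BSD xargs (the one shipped with Mac OS X). The placeholder option defined by POSIX, and also understood by GNU xargs, is -I, which already runs the command once per input line so -n1 is not needed. The sketch below uses echo to preview the commands instead of running them; the function name preview_requests and the UA variable are made up for this example.

```shell
# User-Agent string to test with (the Firefox OS one used in this post).
UA='Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'

# preview_requests (hypothetical helper): read one URI per line on stdin
# and print the http command which would be run for each of them.
preview_requests() {
  # -I domain replaces the word "domain" with the current input line.
  xargs -I domain echo http --print hH GET domain "User-Agent:$UA"
}

# Usage: printf '%s\n' http://marca.com/ http://as.com/ | preview_requests
```

Removing the echo turns the preview into the real requests.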
OK, a very last one, for when we just want a quick summary of the sites.
cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}' | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH --pretty=none GET domain User-Agent:'Mozilla/5.0 (Mobile; rv:18.0) Gecko/18.0 Firefox/18.0' | grep -i "^\(host\|http\|location\)"
Host: marca.com
HTTP/1.1 301 Moved Permanently
Location: http://www.marca.com/
Host: as.com
HTTP/1.1 200 OK
Host: ebay.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ebay.es
Host: infojobs.net
HTTP/1.1 301 Moved Permanently
Location: http://www.infojobs.net/
Host: elconfidencial.com
HTTP/1.1 301 Moved Permanently
Location: http://www.elconfidencial.com/
Host: antena3.com
HTTP/1.1 301 Moved Permanently
Location: http://www.antena3.com
Host: ingdirect.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ingdirect.es/
Host: fotocasa.es
HTTP/1.1 200 OK
Host: orange.es
HTTP/1.1 301 Moved Permanently
Location: http://www.orange.es/
Host: paginasamarillas.es
HTTP/1.1 301 Moved Permanently
Location: http://www.paginasamarillas.es/
Host: loteriasyapuestas.es
HTTP/1.1 302 Found
Location: http://loteriasyapuestas.es/es
Host: bbva.es
HTTP/1.1 200 OK
Host: publico.es
HTTP/1.1 301 Moved Permanently
Location: http://www.publico.es//
Host: rincondelvago.com
HTTP/1.1 301 Moved Permanently
Location: http://www.rincondelvago.com/
Host: enfemenino.com
HTTP/1.1 301 Moved Permanently
Location: http://www.enfemenino.com/
Host: movil.bankinter.es
HTTP/1.1 302 Moved Temporarily
Location: https://movil.bankinter.es/
Host: wwwhatsnew.com
HTTP/1.1 200 OK
http: error: Request timed out (30s).
Host: comunio.es
HTTP/1.1 302 Found
Location: http://www.comunio.es/
Host: softonic.com
HTTP/1.1 301 Moved Permanently
Location: http://www.softonic.com/
We just grep the lines with the status code, the host information and the location returned, if any. That way we know whether the browser is being redirected based on the user agent string. Let's compare with the Firefox for Android User-Agent string 'Mozilla/5.0 (Android; Mobile; rv:18.0) Gecko/18.0 Firefox/18.0'. The only difference is the keyword Android.
cat /tmp/urilist.txt | grep -v '^"' | awk -F ',' '{print $7}' | sed -e 's,"\(.*\) doesn.*,http://\1/,' | xargs -J domain -n1 http --print hH --pretty=none GET domain User-Agent:'Mozilla/5.0 (Android; Mobile; rv:18.0) Gecko/18.0 Firefox/18.0' | grep -i "^\(host\|http\|location\)"
Host: marca.com
HTTP/1.1 301 Moved Permanently
Location: http://www.marca.com/
Host: as.com
HTTP/1.1 200 OK
Host: ebay.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ebay.es
Host: infojobs.net
HTTP/1.1 301 Moved Permanently
Location: http://www.infojobs.net/
Host: elconfidencial.com
HTTP/1.1 301 Moved Permanently
Location: http://www.elconfidencial.com/
Host: antena3.com
HTTP/1.1 301 Moved Permanently
Location: http://www.antena3.com
Host: ingdirect.es
HTTP/1.1 301 Moved Permanently
Location: http://www.ingdirect.es/
Host: fotocasa.es
HTTP/1.1 302 Found
Location: http://m.fotocasa.es/
Host: orange.es
HTTP/1.1 301 Moved Permanently
Location: http://www.orange.es/
Host: paginasamarillas.es
HTTP/1.0 302 Found
Location: http://m.paginasamarillas.es
Host: loteriasyapuestas.es
HTTP/1.1 302 Found
Location: http://m.loteriasyapuestas.es
Host: bbva.es
HTTP/1.1 200 OK
Host: publico.es
HTTP/1.1 301 Moved Permanently
Location: http://m.publico.es
Host: rincondelvago.com
HTTP/1.1 301 Moved Permanently
Location: http://www.rincondelvago.com/
Host: enfemenino.com
HTTP/1.1 301 Moved Permanently
Location: http://m.enfemenino.com/
Host: movil.bankinter.es
HTTP/1.1 302 Moved Temporarily
Location: https://movil.bankinter.es/
Host: wwwhatsnew.com
HTTP/1.1 200 OK
http: error: Request timed out (30s).
Host: comunio.es
HTTP/1.1 302 Found
Location: http://www.comunio.es/
Host: softonic.com
HTTP/1.1 301 Moved Permanently
Location: http://www.softonic.com/
Results
As a first approximation, we can see that when the word Android is in the User-Agent string, fotocasa.es, paginasamarillas.es, loteriasyapuestas.es, publico.es and enfemenino.com redirect to a mobile site, which they do not do for Firefox OS. They need to be contacted.
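Comparing the two summaries by eye works for twenty sites, but it could also be scripted. The following is a rough sketch, assuming each summary was saved to a file (for example by adding > somefile at the end of each pipeline). The function name compare_redirects is invented here; it prints the host, the Location seen in the first file and the Location seen in the second file whenever they differ, with - standing for "no Location header".

```shell
# compare_redirects FILE_A FILE_B (hypothetical helper): list the hosts
# whose Location header differs between two saved summaries.
compare_redirects() {
  awk '
    # Store the Location of the host currently being read.
    function save() {
      if (host == "") return
      if (second) { b[host] = loc } else { a[host] = loc; order[++n] = host }
      host = ""
    }
    { sub(/\r$/, "") }                          # drop a trailing CR, if any
    FNR == 1 { save(); second = (NR != FNR) }   # a new file starts here
    tolower($1) == "host:"     { save(); host = $2; loc = "-" }
    tolower($1) == "location:" { loc = $2 }
    END {
      save()
      for (i = 1; i <= n; i++) {
        h = order[i]
        if ((h in b) && a[h] != b[h]) print h, a[h], b[h]
      }
    }
  ' "$1" "$2"
}

# Usage: compare_redirects firefoxos.txt android.txt
```

It only looks at the first redirect of each site, so it remains the same kind of first approximation as the manual comparison.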
Otsukare!