otsukare Thoughts after a day of work

Monitoring Web page differences over the long term with screenshots

Hallvord has created a very nice system for testing regressions automatically. The results are displayed on Are We Compatible Yet?. I had explained in a blog post how to add or fix tests there.

Yesterday I was reading about another type of large-scale testing: checking for link rot on the UK Web. They share some of the challenges we face when deciding whether a Web page has changed in a meaningful way. It's a very interesting read. At one point they try to determine if two resources are similar or dissimilar. Here is what they say:

The easy case is when the content is exactly the same – we can just record that the resources are identical at the binary level. If not, we extract whatever text we can from the archived and live URLs, and compare them to see how much the text has changed. To do this, we compute a fingerprint from the text contained in each resource, and then compare those to determine how similar the resources are. This technique has been used for many years in computer forensics applications, such as helping to identify ‘bad’ software, and here we adapt the approach in order to find similar web pages.

Specifically, we generate ssdeep ‘fuzzy hash’ fingerprints, and compare them in order to determine the degree of overlap in the textual content of the items. If the algorithm is able to find any similarity at all, we record the result as ‘SIMILAR’. Otherwise, we record that the items are ‘DISSIMILAR’.
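Out of curiosity, here is a minimal sketch of what that could look like in Python, assuming the python-ssdeep binding is installed; the two strings are just placeholders for the text extracted from the archived and live pages:

import ssdeep  # python-ssdeep binding to the ssdeep fuzzy hashing library

# Placeholders for the text extracted from the archived and live copies.
archived_text = 'Text extracted from the archived copy of the page.'
live_text = 'Text extracted from the live copy of the page.'

# Fuzzy hash each text, then compare the two fingerprints.
# compare() returns a score from 0 (no overlap) to 100 (identical).
score = ssdeep.compare(ssdeep.hash(archived_text), ssdeep.hash(live_text))

# Following their rule: any similarity at all counts as SIMILAR.
print('SIMILAR' if score > 0 else 'DISSIMILAR')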

I remember a talk I gave in the past about quality, where I explained how to use Selenium to take screenshots at different stages of development and check whether the rendering stayed the same. One way to evaluate the differences is to compare the images, starting by taking a screenshot:

from selenium import webdriver

# Open Firefox, load the page and save a screenshot of it.
browser = webdriver.Firefox()
browser.get('http://www.mozilla.org/')
browser.save_screenshot('moz-home.png')
browser.close()

And then, if you take multiple screenshots of the same page, you can compare two images, such as the reference image and the new screenshot:

import difflib

def diff_ratio(screen_ref, screen_new):
    # Compare the raw contents of the two screenshots.
    s = difflib.SequenceMatcher(None, screen_ref, screen_new)
    return s.quick_ratio()

SequenceMatcher is a tool for comparing pairs of sequences of any type, and quick_ratio() returns, relatively quickly, an upper bound on ratio(), which is a measure of the sequences' similarity (a float in [0, 1]). Here is an example of the type of results:

import difflib
a = 'Life is a long slow river.'
b = 'Life is among slow rivers.'
s = difflib.SequenceMatcher(None, a, b)
s.quick_ratio()
# returns 0.9230769230769231
s2 = difflib.SequenceMatcher(None, a, a)
s2.quick_ratio()
# returns 1.0

So if the images are quite similar, it will return a number close to 1.
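For instance, here is a minimal sketch (with placeholder file names) of feeding two screenshot files to the function above, comparing their raw bytes:

# diff_ratio() is the helper defined above; the file names are placeholders.
with open('moz-home-reference.png', 'rb') as ref, open('moz-home.png', 'rb') as new:
    print(diff_ratio(ref.read(), new.read()))

Comparing the encoded PNG bytes rather than the decoded pixels is crude, but it is a cheap first check.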

And For Web Compatibility Issues?

Most of our use cases are Web sites not sending the right version of their content, such as desktop content instead of the mobile version. So I was wondering if we could have a very quick check which would involve less human checking during our surveys of sites for certain countries.

One possible challenge (among maybe many others) is Web sites relying heavily on advertisements and serving different images on every load. In this case, the screenshots would differ even if the site sent the same version.

Another one is press Web sites, which change the content of their home page quite often.

Maybe it's worth testing. Maybe we would get an acceptable ratio.
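If we tried, the survey check could look roughly like the sketch below, where the site list, the reference screenshot file names, and the 0.9 threshold are all placeholders I made up and would need tuning:

import difflib
from selenium import webdriver

# Hypothetical pairs of (site URL, stored reference screenshot).
SITES = [
    ('http://www.example.com/', 'example-reference.png'),
]
THRESHOLD = 0.9  # arbitrary cut-off, to be tuned against real surveys

def diff_ratio(screen_ref, screen_new):
    # Same helper as above: similarity of the raw screenshot contents.
    s = difflib.SequenceMatcher(None, screen_ref, screen_new)
    return s.quick_ratio()

def survey(sites):
    browser = webdriver.Firefox()
    needs_human_check = []
    for url, reference_png in sites:
        browser.get(url)
        browser.save_screenshot('current.png')
        with open(reference_png, 'rb') as ref, open('current.png', 'rb') as new:
            ratio = diff_ratio(ref.read(), new.read())
        if ratio < THRESHOLD:
            # Below the threshold, the page looks different enough to
            # deserve a human look.
            needs_human_check.append((url, ratio))
    browser.quit()
    return needs_human_check

print(survey(SITES))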

Addendum

I had forgotten, but Hallvord did that already for quickly selecting the screenshots which were worth testing. He added:

I suppose we could explore stuff like hashing all the CSS code included in a page and compare hashes to find different styling. Although my next project is going to be using Compatipede 2, finding all the elements in a DOM that are styled with -webkit- properties or values, then generate XPath identifiers or JS to locate the same element and see if it has equivalent styles when the page loads in a different rendering engine.
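Just to make the first idea more concrete, here is a rough sketch of what hashing the CSS included in a page could look like with Selenium. It only hashes inline style elements and the URLs of linked stylesheets, so the linked files themselves would still need to be fetched and hashed to fully compare styling:

import hashlib
from selenium import webdriver

def css_hash(url):
    # Collect the text of inline <style> elements plus the href of every
    # linked stylesheet, then hash the concatenation.
    browser = webdriver.Firefox()
    browser.get(url)
    css_text = browser.execute_script("""
        var parts = [];
        var styles = document.querySelectorAll('style');
        for (var i = 0; i < styles.length; i++) parts.push(styles[i].textContent);
        var links = document.querySelectorAll('link[rel="stylesheet"]');
        for (var i = 0; i < links.length; i++) parts.push(links[i].href);
        return parts.join('\\n');
    """)
    browser.quit()
    return hashlib.sha256(css_text.encode('utf-8')).hexdigest()

fingerprint = css_hash('http://www.mozilla.org/')
print(fingerprint)
# Comparing this fingerprint with one taken later, or with a different
# user agent, tells us whether the page's styling references changed.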

Otsukare.