otsukare Thoughts after a day of work

Webcompat issues and the bots!

Some ideas and contexts around auto-discovering webcompat issues.

Graffiti of a robot on a wall.

Recently Brian Grinstead asked me:

Are you familiar with this?

To which I answered: yes, since 2018. It reminded me of the challenges, so it's probably worth doing a bit of history on identifying webcompat issues. The objectives are often:

  1. How to massively test websites and their different renderings across browsers?
  2. How to reduce human time spent on manually testing the site?
  3. Can we discover the type of issues?

I have been doing webcompat work since October 2010 (when I started working at Opera Software with the amazing Opera devrel team). There's no perfect technique, but there are a couple of things you can try.

Screenshots Comparison

We often associate webcompat issues with sites which do not look the same in two different browsers. It's a simplistic approximation, but it can help with some types of webcompat issues.

With a simple list of URLs and the WebDriver API, it is possible to load websites in Gecko, WebKit and Blink and take a screenshot for each of them. This makes it very easy to test the top 1000 websites in a specific locale, and to quickly spot, visually, the screenshots which differ.
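A minimal sketch of that capture loop, assuming Selenium-style WebDriver objects (the `get()`, `set_window_size()`, `get_screenshot_as_png()` and `quit()` methods). The function name `capture_screenshots`, the default 1280×800 viewport and the shape of the result are illustrative choices, not part of any existing tool.

```python
def capture_screenshots(urls, driver_factories, width=1280, height=800):
    """Load every URL in every browser and collect PNG screenshots.

    driver_factories maps a browser name (e.g. "gecko") to a zero-argument
    callable returning a WebDriver-like object. Returns {url: {browser: png}}.
    """
    shots = {url: {} for url in urls}
    for browser, make_driver in driver_factories.items():
        driver = make_driver()
        try:
            # Same viewport everywhere, otherwise every screenshot differs.
            driver.set_window_size(width, height)
            for url in urls:
                driver.get(url)
                shots[url][browser] = driver.get_screenshot_as_png()
        finally:
            driver.quit()  # always release the browser session
    return shots
```

With Selenium installed and the matching drivers available, something like `capture_screenshots(urls, {"gecko": webdriver.Firefox, "blink": webdriver.Chrome, "webkit": webdriver.Safari})` would fill the table (`webdriver.Safari` only runs on macOS). Passing factories instead of live drivers keeps one browser session per engine and makes the loop easy to exercise with fake drivers.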

But we said we wanted to be more effective, so let's use a bit of maths. Let s1 and s2 be the screenshots we want to compare; we can then use a simple library like difflib in Python to compute the similarity of the images.

import difflib

def diff_ratio(s1, s2):
    # Similarity of two screenshots (e.g. raw PNG bytes), between 0.0 and 1.0.
    s = difflib.SequenceMatcher(None, s1, s2)
    return s.quick_ratio()

Then it becomes easy to decide which diff_ratio is acceptable for the series of tests we run. Once a threshold is fixed, this will identify the sites with potential issues. It will not identify the type of issues. It will not provide a diagnosis.
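Applied to a batch of sites, the threshold step could look like the following sketch. The `flag_suspicious` helper and its 0.9 default threshold are hypothetical; the right value depends on the sites and on the screenshot sizes.

```python
import difflib


def diff_ratio(s1, s2):
    # quick_ratio() is a fast upper bound on SequenceMatcher.ratio().
    return difflib.SequenceMatcher(None, s1, s2).quick_ratio()


def flag_suspicious(screenshots, threshold=0.9):
    """Return the URLs whose two screenshots are less similar than threshold.

    screenshots maps url -> (s1, s2), e.g. PNG bytes from two browsers.
    """
    return sorted(
        url
        for url, (s1, s2) in screenshots.items()
        if diff_ratio(s1, s2) < threshold
    )
```

Note that `quick_ratio()` only counts how many elements (here, bytes) the two inputs share, regardless of their order, so on compressed PNG bytes it is a rough pre-filter at best; decoding the images and comparing pixels would be a stricter, and slower, alternative.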

And the method has some limitations which are interesting to understand if we want to be effective in pre-filtering the issues.

Some Limitations Around Screenshots Comparison

The screenshots might be different, but that doesn't necessarily mean there is a webcompat issue. Here are some cases:

Quick Summary About autowebcompat

autowebcompat, which Brian was mentioning, is a nice project from Marco Castelluccio that attempts to auto-detect web compatibility issues. Basically, the code tries to learn whether a similar set of interactions in two different browsers produces the same end result in the screenshots. The silver lining is that if there is a difference, there's probably something worth understanding better. The project used the issues already reported on webcompat.com. In that sense it is already biased, since those issues had already been identified as showing a difference. But it makes it possible to train a model to learn what creates a webcompat issue.

Training A Bot To Identify Valid Issues

Recently, Ksenia (Mozilla Webcompat team) adjusted BugBug to make it work on GitHub. It helped the webcompat team move away from the old ML classifier to the BugBug infrastructure.

It goes through the reported issues and closes the ones which have features similar to previous invalid bugs. Invalid here means not a webcompat issue: some sites are broken in all browsers, and that doesn't make it a webcompat issue.
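BugBug relies on machine learning models trained on past issues. The sketch below is not BugBug's model: it is a deliberately simple stand-in (word overlap measured with Jaccard similarity) that only illustrates the idea of comparing a new report with previously invalid ones. `looks_invalid` and its 0.5 threshold are hypothetical.

```python
import re


def tokens(text):
    # Lowercased alphanumeric words, as a set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Size of the intersection over size of the union of two sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def looks_invalid(new_issue, invalid_issues, threshold=0.5):
    """True if new_issue is close enough to at least one known invalid issue."""
    new = tokens(new_issue)
    return any(jaccard(new, tokens(old)) >= threshold for old in invalid_issues)
```

A real classifier would weigh many more features (labels, URLs, reporter details) and learn them from data rather than use a fixed threshold.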

Compatipede, Another Project For Auto Webcompat

Compatipede is a project which predates autowebcompat (started in October 2013!) with the intent to identify more parameters and extend the scope of tests.

This was quite interesting as it was trying to explore the unseen issues and avoid the pitfalls of screenshots.

It also had a modular architecture, providing a system of custom plugins to run probes on the payloads sent by the website.
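The idea of probes as plugins can be sketched with a small registry: each probe is a function that receives a fetched payload and reports whether it found something. This is not Compatipede's actual plugin API; `Payload`, `probe`, `run_probes` and the two example probes are hypothetical, and header names are assumed to be lowercased.

```python
from dataclasses import dataclass


@dataclass
class Payload:
    """What a site sent back for one URL (hypothetical structure)."""
    url: str
    headers: dict  # header names assumed lowercased
    body: str


# Registry of probes: name -> function(Payload) -> bool
PROBES = {}


def probe(name):
    """Decorator registering a probe function under `name`."""
    def register(func):
        PROBES[name] = func
        return func
    return register


@probe("vary-user-agent")
def vary_on_user_agent(payload):
    # A response that varies on User-Agent may serve different content per browser.
    return "user-agent" in payload.headers.get("vary", "").lower()


@probe("webkit-prefixed-css")
def webkit_prefixed_css(payload):
    # Vendor-prefixed CSS is a classic source of rendering differences.
    return "-webkit-" in payload.body


def run_probes(payload):
    """Run every registered probe and return the names of those that fired."""
    return sorted(name for name, check in PROBES.items() if check(payload))
```

New probes only need the decorator, so the core loop never changes when the scope of tests grows.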

SiteCompTester

In the same spirit as Compatipede, SiteCompTester was an extension which made it possible to target some types of issues: it would surface bugs associated with a specific list of known issues. This makes it easier to diagnose a website.

Template Extraction Mining

The variability of content may be reduced by using a mechanism such as templatemaker. This is a clever little tool which extracts the common features of a series of texts into a template.

So, for a news website, we could imagine running templatemaker with one browser for a couple of days to extract its templates, and doing the same in parallel with another browser. Then we would compare the templates instead of comparing two single renderings of the website. That would probably give a better understanding of the variability of certain features. This could be applied to markup, to JavaScript, and to HTTP headers.
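templatemaker has its own algorithm; as a stand-in, the sketch below uses Python's difflib to approximate the idea at the token level: tokens shared by all samples are kept, and any region that varies becomes a hole. `extract_template`, `render` and `differences` are hypothetical helpers, and the tokenizer is a crude split into tags and words.

```python
import difflib
import re

HOLE = None  # marks a region that varied between samples


def tokenize(text):
    # Crude tokenizer: markup tags, or runs of non-space, non-"<" characters.
    return re.findall(r"<[^>]+>|[^<\s]+", text)


def merge(template, tokens):
    """Keep the tokens shared by template and tokens; turn differences into HOLE."""
    matcher = difflib.SequenceMatcher(None, template, tokens, autojunk=False)
    merged = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(template[i1:i2])
        elif not merged or merged[-1] is not HOLE:
            merged.append(HOLE)  # collapse adjacent differences into one hole
    return merged


def extract_template(samples):
    """Fold several samples (e.g. daily captures of one page) into one template."""
    template = tokenize(samples[0])
    for sample in samples[1:]:
        template = merge(template, tokenize(sample))
    return template


def render(template):
    return " ".join("{*}" if tok is HOLE else tok for tok in template)


def differences(template_a, template_b):
    """List the non-equal regions between two templates (e.g. two browsers)."""
    matcher = difflib.SequenceMatcher(None, template_a, template_b, autojunk=False)
    return [
        (tag, render(template_a[i1:i2]), render(template_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For example, folding "<b>this and that</b>" and "<b>alpha and omega</b>" gives the template "<b> {*} and {*} </b>": the daily news text disappears into holes, and only the stable structure is left to compare across browsers.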

Webcompat Auto-Detection Caveats

The issue with auto-detecting webcompat issues is that we don't know what is broken before someone experiences it in real life. The level of interaction it requires is really delicate.

And this is why the people working on triage and diagnosis in the Mozilla webcompat team are top-notch:

  * Oana and Raul triage the issues, which most users report with poor descriptions.
  * Ksenia, Dennis and Thomas relentlessly dig through minified, obfuscated code to decipher what is breaking on the current site.

Auto-Discovery Of Webcompat

The auto-discovery may work in very specific use cases, when we know what we are trying to identify as an issue. Let's say we have already identified a pattern in one bug and we want to understand to what extent this bug affects other websites. Then a framework going through the sites and searching for this pattern might reveal potential webcompat issues.
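A targeted survey of that kind can start as simply as the sketch below, which assumes the page sources have already been fetched and takes a regular expression as the pattern (for example, one for user-agent sniffing). `survey` and its return shape are hypothetical.

```python
import re


def survey(pages, pattern):
    """Return (matching_urls, share) for pages whose source matches pattern.

    pages maps url -> page source (HTML, JavaScript, CSS) already fetched.
    """
    regex = re.compile(pattern)
    hits = sorted(url for url, source in pages.items() if regex.search(source))
    share = len(hits) / len(pages) if pages else 0.0
    return hits, share
```

A call such as `survey(pages, r"navigator\.userAgent")` would list the sites reading the user agent string and the fraction of the sample they represent. A match only marks a candidate; someone still has to check whether the site actually breaks.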

Targeted surveys are the key to understanding the priority of some issues.

Otsukare!