otsukare Thoughts after a day of work

Schools Of Thoughts In Web Standards

Last night, I had the pleasure of reading Daniel Stenberg's blog post about URL Standards. It led me to the discussion happening on the WHATWG URL spec under the title "It's not immediately clear that "URL syntax" and "URL parser" conflict". As you can expect, the debate is inflammatory on both sides, borderline hypocritical on some occasions, and full of arguments I have seen again and again in the 20 years I have followed discussions around Web development.

This post does not pretend to be the right way to talk about it. It's more a collection of impressions I had when reading the thread, with my baggage of ex-W3C staff, Web agency work, and ex-Opera and now-Mozilla Web Compatibility work.

"Le chat a bon dos". French expression to basically say we are in the blaming game in that thread. Maybe not that useful.

What is happening?

There are different schools of thought for Web specifications:

  1. Standards defining a syntax considered ideal, leaving implementations free to recover with their own strategy when the input is broken.
  2. Standards defining how to recover from all the possible ways the input can be mixed up. The intent is often to recover from a previous, stricter syntax, but in the end it just ends up defining and expanding what is possible.
  3. Standards defining a different policy for parsing and for producing, with certain nuances in between (kind of Postel's law).

I'm swaying between these three schools all the time. I don't like number 2 at all, but for the sake of survival it is sometimes necessary. My preferred way is 3: a clear, strict syntax for producing content, and a recovery technique for parsing. And when possible I would prefer a sanitizer version of Postel's law.

What did Postel say, by the way?

RFC 760

The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol there is the possibility of differing interpretations. In general, an implementation should be conservative in its sending behavior, and liberal in its receiving behavior. That is, it should be careful to send well-formed datagrams, but should accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear).

Then in RFC 1122, section 1.2.2, the Robustness Principle:

At every layer of the protocols, there is a general rule whose application can lead to enormous benefits in robustness and interoperability [IP:1]:

"Be liberal in what you accept, and conservative in what you send"

Software should be written to deal with every conceivable error, no matter how unlikely; sooner or later a packet will come in with that particular combination of errors and attributes, and unless the software is prepared, chaos can ensue. In general, it is best to assume that the network is filled with malevolent entities that will send in packets designed to have the worst possible effect. This assumption will lead to suitable protective design, although the most serious problems in the Internet have been caused by unenvisaged mechanisms triggered by low-probability events; mere human malice would never have taken so devious a course!

Adaptability to change must be designed into all levels of Internet host software. As a simple example, consider a protocol specification that contains an enumeration of values for a particular header field -- e.g., a type field, a port number, or an error code; this enumeration must be assumed to be incomplete. Thus, if a protocol specification defines four possible error codes, the software must not break when a fifth code shows up. An undefined code might be logged (see below), but it must not cause a failure.

The second part of the principle is almost as important: software on other hosts may contain deficiencies that make it unwise to exploit legal but obscure protocol features. It is unwise to stray far from the obvious and simple, lest untoward effects result elsewhere. A corollary of this is "watch out for misbehaving hosts"; host software should be prepared, not just to survive other misbehaving hosts, but also to cooperate to limit the amount of disruption such hosts can cause to the shared communication facility.
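The enumeration example in the RFC 1122 excerpt above can be sketched in a few lines of Python. This is only an illustration: the four error codes, the `ErrorCode` enumeration and the `describe_error` function are hypothetical names made up for this sketch, not anything defined by the RFC.

```python
import logging
from enum import IntEnum

logger = logging.getLogger("protocol")


class ErrorCode(IntEnum):
    # Four hypothetical codes standing in for "the four possible
    # error codes" a specification might define.
    TIMEOUT = 1
    UNREACHABLE = 2
    REFUSED = 3
    MALFORMED = 4


def describe_error(code: int) -> str:
    """Return the name of a known code, or "UNKNOWN" for anything else."""
    try:
        return ErrorCode(code).name
    except ValueError:
        # The enumeration is assumed to be incomplete: log the
        # undefined code, but do not fail because of it.
        logger.warning("undefined error code %d", code)
        return "UNKNOWN"
```

With this sketch, `describe_error(3)` gives the known name, while a fifth code such as `describe_error(5)` is logged and reported as unknown instead of raising.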

The important point in the discussion of Postel's law is that he is talking about software behavior, not specifications. The new school of thought for Web standards is to create specifications which are "software-driven", not "syntax-driven". And that's why you can read entrenched debates about the technology.

My sanitizer version of Postel's law would be something along these lines:

  1. Be liberal in what you accept
  2. Be conservative in what you send
  3. Make conservative what you accepted (aka fixing it)
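As a rough sketch, the three rules above could look like this for an http URL with a sloppy number of slashes after the scheme. The function name `sanitize_http_url` and the regular expression are assumptions made for this illustration. It is not the WHATWG URL parser and only handles the slash run after `http:` or `https:`.

```python
import re

# Hypothetical lenient pattern: an http or https scheme, a colon, then any
# run of slashes (possibly none), followed by something that is not a slash.
_LENIENT_HTTP = re.compile(r"^(https?):/*(?=[^/])", re.IGNORECASE)


def sanitize_http_url(raw: str) -> str:
    """Accept a sloppy http(s) URL and return it with exactly two slashes."""
    candidate = raw.strip()
    match = _LENIENT_HTTP.match(candidate)
    if match is None:
        # No clear path for fixing it: refuse instead of guessing.
        raise ValueError(f"cannot sanitize {raw!r}")
    scheme = match.group(1).lower()
    rest = candidate[match.end():]
    # Rule 3: make conservative what was accepted.
    return f"{scheme}://{rest}"
```

Here `sanitize_http_url("http://////example.com/path")` returns `http://example.com/path`, while an input made only of `http:` and slashes is rejected, because there is nothing left to recover.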

Basically, when you receive something broken and there is a clear path for fixing it, do it. Normalize it. In the debated case, about accepting http://////, it would be