Google Code Issue 157: Add "escape invisible characters" option by gsnedders · Pull Request #38 · html5lib/html5lib-python
My preference would be something based on unicodedata that blacklists General Category C*. That has problems, though: you'll end up blacklisting different sets of characters depending on the Python version and the Unicode version; generating that set is expensive, so it should likely be precomputed at dist build-time; and it likely needs to be represented as a segment tree rather than a set of roughly a million characters, out of concern for memory consumption.
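As a rough illustration of the range-based idea, here is a minimal Python 3 sketch. It uses a sorted list of inclusive ranges with binary search rather than a true segment tree, and it builds the ranges at import time rather than at dist build-time. The names `build_category_c_ranges`, `C_RANGES`, `C_STARTS` and `is_category_c` are hypothetical and not part of html5lib.

```python
import sys
import unicodedata
from bisect import bisect_right


def build_category_c_ranges():
    """Return sorted, non-overlapping inclusive (start, end) ranges of
    code points whose General Category starts with "C".

    The result depends on the Unicode version bundled with the running
    Python (see unicodedata.unidata_version).
    """
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        is_c = unicodedata.category(chr(cp)).startswith("C")
        if is_c and start is None:
            start = cp                      # a new run of C* code points begins
        elif not is_c and start is not None:
            ranges.append((start, cp - 1))  # the run ended on the previous code point
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


# Hypothetical module-level tables; a real build would likely precompute
# these once and ship them instead of scanning on every import.
C_RANGES = build_category_c_ranges()
C_STARTS = [start for start, _ in C_RANGES]


def is_category_c(cp):
    """Return True if code point `cp` falls inside one of the C* ranges."""
    i = bisect_right(C_STARTS, cp) - 1
    return i >= 0 and cp <= C_RANGES[i][1]
```

Lookups are then a binary search over the range starts, and the table holds one tuple per contiguous run instead of one entry per character.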
We also need to be careful on narrow Python builds and make sure we don't escape the two halves of a surrogate pair individually, as \uD800\uDC00 needs to end up unchanged.
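To make the surrogate-pair concern concrete, here is a hedged sketch that treats the input as a sequence of UTF-16-style code units, as a narrow build would. It combines each high/low surrogate pair into a single code point before deciding whether to escape. The function names and the `\uXXXX` / `\UXXXXXXXX` output format are assumptions for illustration, not what the patch emits.

```python
import unicodedata


def is_invisible(cp):
    """Hypothetical predicate: escape anything in General Category C*."""
    return unicodedata.category(chr(cp)).startswith("C")


def escape_invisible(text, should_escape=is_invisible):
    """Escape code points selected by `should_escape`, treating a high
    surrogate followed by a low surrogate as one code point."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        unit = ord(text[i])
        # High surrogate followed by a low surrogate: decode the pair.
        if 0xD800 <= unit <= 0xDBFF and i + 1 < n:
            low = ord(text[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                if should_escape(cp):
                    out.append("\\U%08X" % cp)
                else:
                    out.append(text[i:i + 2])  # keep the pair untouched
                i += 2
                continue
        # Ordinary code unit, or a lone surrogate.
        out.append("\\u%04X" % unit if should_escape(unit) else text[i])
        i += 1
    return "".join(out)
```

With this, the pair \uD800\uDC00 (U+10000, a letter) passes through unchanged, while a lone \uD800 is escaped because on its own it is category Cs.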
It's also notable that AFAICT the original reason for this patch no longer holds true (the CSS testsuite build system is basically a historical artefact now and hasn't used an html5lib fork with this for years), though as #197 shows, other people do care.