Implement latin_1 in Rust by fanninpm · Pull Request #3046 · RustPython/RustPython
Based on the PyPy implementation. The encoding function needs some work with respect to error handling. EDIT: Now based on @coolreader18's ascii codec.
Requesting @coolreader18 as a potential reviewer, as they wrote the architecture for the encodings module and the error handling function(s).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea, so far this looks good! That is the correct way to add a new encoding; if that's what you were looking for. I'm not sure if unicode_encode_ucs1 is reusable for other encodings, since anything >ucs1 would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.
I'm not sure if
unicode_encode_ucs1is reusable for other encodings, since anything>ucs1would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.
unicode_encode_ucs1 is applicable to latin_1 and ascii, so that's why I put that function outside the module.
That is the correct way to add a new encoding; if that's what you were looking for.
While I was implementing this, I came across something that stymied me. In PyPy, the unicode_encode_ucs1 function references an error handler that is not yet implemented here. How would that error handler be implemented in Rust?
Originally posted this in #3118 but it makes more sense here
re: the ucs1 helper function, I think it makes more sense to abstract over utf8/ascii than ascii/latin1, since we use a utf8 str type (so we can just validate + extend_from_slice for utf8/ascii) whereas CPython has UCS1 as a buffer (so they can just validate + memcpy for ascii/latin1). Finishing up #3118 made me remember why I did that; in our implementation of codecs, ascii and utf8 just naturally have a lot of shared code while the alternative is true in CPython.
fanninpm
marked this pull request as ready for review
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. @coolreader18 could you review this again?