◐ Shell
clean mode source ↗

Implement latin_1 in Rust by fanninpm · Pull Request #3046 · RustPython/RustPython

@fanninpm

Based on the PyPy implementation. The encoding function needs some work with respect to error handling. EDIT: Now based on @coolreader18's ascii codec.

Requesting @coolreader18 as a potential reviewer, as they wrote the architecture for the encodings module and the error handling function(s).

coolreader18

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yea, so far this looks good! That is the correct way to add a new encoding; if that's what you were looking for. I'm not sure if unicode_encode_ucs1 is reusable for other encodings, since anything >ucs1 would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.

@fanninpm

I'm not sure if unicode_encode_ucs1 is reusable for other encodings, since anything >ucs1 would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.

unicode_encode_ucs1 is applicable to latin_1 and ascii, so that's why I put that function outside the module.

That is the correct way to add a new encoding; if that's what you were looking for.

While I was implementing this, I came across something that stymied me. In PyPy, the unicode_encode_ucs1 function references an error handler that is not yet implemented here. How would that error handler be implemented in Rust?

@coolreader18

Originally posted this in #3118 but it makes more sense here

re: the ucs1 helper function, I think it makes more sense to abstract over utf8/ascii than ascii/latin1, since we use a utf8 str type (so we can just validate + extend_from_slice for utf8/ascii) whereas CPython has UCS1 as a buffer (so they can just validate + memcpy for ascii/latin1). Finishing up #3118 made me remember why I did that; in our implementation of codecs, ascii and utf8 just naturally have a lot of shared code while the alternative is true in CPython.

This implementation is patterned off of the ascii codec.

@fanninpm fanninpm marked this pull request as ready for review

September 24, 2021 01:36

coolreader18

youknowone

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. @coolreader18 could you review this again?