Issue 4387: binascii b2a functions accept strings (unicode) as data
Created on 2008-11-22 00:41 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

| File name | Uploaded |
|---|---|
| reqbytes.diff | loewis, 2008-11-30 13:48 |
Messages (7)

msg76226 - Author: Terry J. Reedy (terry.reedy)
Date: 2008-11-22 00:41
Binascii b2a_xxx functions accept 'binary data' and return ascii-encoded bytes. The corresponding a2b_xxx functions turn the ascii-encoded bytes back to 'binary data' (bytes). If the binary data is bytes, these should be inverses of each other.

Somewhat surprisingly to me (because the corresponding base64 module functions raise "TypeError: expected bytes, not str") 3.0 strings (unicode) are accepted as 'binary data', though they will not 'round-trip'.

Ascii chars almost do:
>>> a='aaaa'
>>> c=b.b2a_base64(a)
>>> c
b'YWFhYQ==\n'
>>> d=b.a2b_base64(c)
>>> d
b'aaaa'

But general unicode chars generate nonsense:
>>> a='\u1000'
>>> c=b.b2a_base64(a)
>>> c
b'4YCA\n'
>>> d=b.a2b_base64(c)
>>> d
b'\xe1\x80\x80'

I also tried b2a_uu. Is this a bug?
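The round-trip property described in this report can be checked with a short sketch. It assumes a current Python 3 interpreter, where the fix from this issue is in place, so the b2a_* functions (and base64.b64encode, which calls binascii.b2a_base64) reject str. The helper names `check_roundtrip` and `rejects_str` are hypothetical, written only for this illustration.

```python
import base64
import binascii

def check_roundtrip(data: bytes) -> None:
    """Hypothetical helper: each a2b_* function should invert its b2a_* partner on bytes."""
    pairs = [
        (binascii.b2a_base64, binascii.a2b_base64),
        (binascii.b2a_uu, binascii.a2b_uu),  # b2a_uu accepts at most 45 bytes per call
        (binascii.b2a_hex, binascii.a2b_hex),
    ]
    for encode, decode in pairs:
        assert decode(encode(data)) == data, encode.__name__

def rejects_str(func, text: str) -> bool:
    """Hypothetical helper: True if func raises TypeError when given a str."""
    try:
        func(text)
    except TypeError:
        return True
    return False

# Bytes round-trip cleanly, including the UTF-8 bytes of a non-ASCII character.
check_roundtrip(b"aaaa")
check_roundtrip("\u1000".encode("utf-8"))

# After the fix, str is no longer accepted as "binary data".
print(rejects_str(binascii.b2a_base64, "aaaa"))  # True
print(rejects_str(binascii.b2a_uu, "aaaa"))      # True
print(rejects_str(base64.b64encode, "aaaa"))     # True
```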

msg76233 - Author: Georg Brandl (georg.brandl)
Date: 2008-11-22 08:16
I vote yes.

msg76628 - Author: Antoine Pitrou (pitrou)
Date: 2008-11-29 22:48
It's not /exactly/ nonsense; it seems to assume a UTF-8 encoding pass is
necessary:
>>> b'\xe1\x80\x80'.decode('utf8') == '\u1000'
True
IMO, while accepting unicode strings instead of bytes for the a2b_xx
functions is understandable (because in practice only ASCII characters
are allowed), it is not acceptable for b2a_xx functions to accept
unicode strings instead of bytes.
In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64('YWFh\n')` (that is, b'aaa'), but
`binascii.b2a_base64('aaa')` should raise a TypeError rather than
applying a UTF-8 encoding pass before doing the actual b2a encoding.
I think this must be fixed before 3.0 final, and is therefore a release
blocker.
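Pitrou's reading can be made concrete: what the 3.0 prereleases did implicitly is equivalent to an explicit `str.encode("utf-8")` before the b2a step. A minimal sketch, assuming a current Python 3 where the caller has to do that encoding; `legacy_b2a_base64` is a hypothetical name that mimics the pre-fix behaviour and is not a real binascii API.

```python
import binascii

def legacy_b2a_base64(data):
    """Hypothetical emulation of the pre-fix behaviour: a str was silently UTF-8 encoded."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return binascii.b2a_base64(data)

text = "\u1000"
encoded = legacy_b2a_base64(text)
print(encoded)                          # b'4YCA\n'

decoded = binascii.a2b_base64(encoded)
print(decoded)                          # b'\xe1\x80\x80'

# The "nonsense" is just the UTF-8 form; decoding with the same codec restores the text.
print(decoded.decode("utf-8") == text)  # True
```

The original text only comes back if the reader knows which codec was applied, which is the argument for making b2a_* insist on bytes and leaving the choice of encoding to the caller.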

msg76629 - Author: Antoine Pitrou (pitrou)
Date: 2008-11-29 22:49
Hmm, I obviously meant:
[...] In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64(b'YWFh\n')` (that is, b'aaa') [...]
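The asymmetry proposed here matches what later Python 3 releases settled on: the binascii documentation records that, since Python 3.3, the a2b_* functions accept str containing only ASCII characters, while the b2a_* functions accept only bytes-like objects. A sketch assuming Python 3.3 or later:

```python
import binascii

# a2b_*: an ASCII-only str is treated like the equivalent bytes.
print(binascii.a2b_base64("YWFh\n") == binascii.a2b_base64(b"YWFh\n"))  # True
print(binascii.a2b_base64("YWFh\n"))                                    # b'aaa'

# a2b_*: a str containing non-ASCII characters is rejected with ValueError.
try:
    binascii.a2b_base64("YWFh\u00e9")
except ValueError as exc:
    print("ValueError:", exc)

# b2a_*: str is rejected outright, so the caller must choose an encoding.
try:
    binascii.b2a_base64("aaa")
except TypeError as exc:
    print("TypeError:", exc)

print(binascii.b2a_base64("aaa".encode("ascii")))  # b'YWFh\n'
```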

msg76639 - Author: Martin v. Löwis (loewis)
Date: 2008-11-30 13:48
Here is a patch that fixes the problem. Notice that this affects the email API; base64mime.body_encode now also requires bytes (whereas quoprimime remains unchanged). There are probably more functions that still incorrectly accept strings, e.g. zlib.crc32.
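Both follow-ups mentioned here can be observed in a current interpreter. A minimal sketch, assuming a modern Python 3 where email.base64mime.body_encode hands its input to binascii.b2a_base64 (so it needs bytes and returns base64 text as str), email.quoprimime.body_encode still takes str, and zlib.crc32 also rejects str; the sketch does not establish when the zlib change happened. The helper `raises_type_error` is hypothetical.

```python
import binascii
import zlib
from email import base64mime, quoprimime

def raises_type_error(func, *args) -> bool:
    """Hypothetical helper: True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False

# base64mime.body_encode needs bytes and returns base64 text as str.
print(repr(base64mime.body_encode(b"aaa")))                # 'YWFh\n'
print(raises_type_error(base64mime.body_encode, "aaa"))    # True

# quoprimime.body_encode still operates on str.
print(repr(quoprimime.body_encode("aaa")))                 # 'aaa'

# zlib.crc32 rejects str as well; on bytes it agrees with binascii.crc32.
print(raises_type_error(zlib.crc32, "aaa"))                # True
print(zlib.crc32(b"aaa") == binascii.crc32(b"aaa"))        # True
```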

msg76662 - Author: Barry A. Warsaw (barry)
Date: 2008-11-30 20:14
Martin, the patch looks okay to me. I vote for applying it.

msg76724 - Author: Martin v. Löwis (loewis)
Date: 2008-12-02 06:00
Committed as r67472.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:41 | admin | set | github: 48637 |
| 2008-12-02 06:00:34 | loewis | set | status: open -> closed; messages: + msg76724 |
| 2008-11-30 20:14:38 | barry | set | nosy: + barry; resolution: accepted; messages: + msg76662 |
| 2008-11-30 13:58:56 | loewis | set | keywords: + needs review |
| 2008-11-30 13:48:30 | loewis | set | files: + reqbytes.diff; nosy: + loewis; messages: + msg76639; keywords: + patch |
| 2008-11-29 22:49:42 | pitrou | set | messages: + msg76629 |
| 2008-11-29 22:48:03 | pitrou | set | priority: release blocker; nosy: + pitrou; messages: + msg76628 |
| 2008-11-22 08:16:40 | georg.brandl | set | nosy: + georg.brandl; messages: + msg76233 |
| 2008-11-22 00:41:18 | terry.reedy | create | |
