◐ Shell
clean mode source ↗

Handling of BOM (byte order mark) in source files

Feature

PEP 263 specifies that

To aid with platforms such as Windows, which add Unicode BOM marks to the beginning of Unicode files, the UTF-8 signature \xef\xbb\xbf will be interpreted as ‘utf-8’ encoding as well (even if no magic encoding comment is given).

However, RustPython currently fails on files that include a BOM.

$ printf '\xEF\xBB\xBF' > f.py
$ cargo run f.py
SyntaxError: Got unexpected token  at line 1 column 1

^

Python Documentation

Not familiar with CPython source code, but some pointers:

https://github.com/python/cpython/blob/3.11/Lib/tokenize.py#L299 (detect_encoding)
https://github.com/python/cpython/blob/3.11/Lib/test/test_importlib/source/test_source_encoding.py