Handling of BOM (byte order mark) in source files
Feature
PEP 263 specifies that:
To aid with platforms such as Windows, which add Unicode BOM marks to the beginning of Unicode files, the UTF-8 signature \xef\xbb\xbf will be interpreted as ‘utf-8’ encoding as well (even if no magic encoding comment is given).
However, RustPython currently fails on files that include a BOM.
$ printf '\xEF\xBB\xBF' > f.py
$ cargo run f.py
SyntaxError: Got unexpected token at line 1 column 1
^
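For comparison, here is a minimal sketch of the behavior I would expect, run against CPython rather than RustPython. It feeds the same bytes as `f.py` to the built-in `compile()`, which accepts a bytes source and applies the PEP 263 rules. The second input (`x = 1` after the BOM) and the `namespace` dict are only illustrative additions, not part of the reproduction above.

```python
import codecs

# Same three bytes that the printf command writes to f.py.
bom_only = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

# Illustrative variant: a BOM followed by one real statement.
bom_with_code = codecs.BOM_UTF8 + b"x = 1\n"

# compile() accepts bytes and handles the UTF-8 signature itself,
# so neither call is expected to raise SyntaxError on CPython.
compile(bom_only, "f.py", "exec")

namespace = {}
exec(compile(bom_with_code, "f.py", "exec"), namespace)
print(namespace["x"])
```

If RustPython handled the BOM the same way, both `compile()` calls should succeed there as well.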
Python Documentation
I'm not familiar with the CPython source code, but here are some pointers:
https://github.com/python/cpython/blob/3.11/Lib/tokenize.py#L299 (detect_encoding)
https://github.com/python/cpython/blob/3.11/Lib/test/test_importlib/source/test_source_encoding.py
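As a rough illustration of what `detect_encoding` in the first link does, the sketch below calls it from Python on a BOM-prefixed source. The sample source bytes are made up for the example. It also exercises the rule from PEP 263 that a BOM combined with a non-UTF-8 encoding comment is an error. This only shows CPython's observable behavior; how RustPython should implement it is a separate question.

```python
import io
import tokenize

# Made-up sample: a UTF-8 BOM followed by one line of code.
source = b"\xef\xbb\xbfprint('hi')\n"

# detect_encoding takes a readline callable that yields bytes.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)     # the BOM selects the 'utf-8-sig' codec
print(first_lines)  # the lines read so far, with the BOM stripped

# A BOM plus a conflicting encoding comment is rejected.
conflicting = b"\xef\xbb\xbf# -*- coding: latin-1 -*-\n"
try:
    tokenize.detect_encoding(io.BytesIO(conflicting).readline)
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True
print(conflict_rejected)
```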