Issue 21996: gettarinfo method does not handle files without text string names

It looks like when you pass a “fileobj” argument to “gettarinfo”, it assumes that “fileobj.name” is a text string. A file opened with a bytes path or an integer file descriptor breaks that assumption:

>>> import tarfile
>>> with tarfile.open("/dev/null", "w") as tar, open("/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
... 
<TarInfo 'bin/sh' at 0x7f13cc937f20>
>>> with tarfile.open("/dev/null", "w") as tar, open(b"/bin/sh", "rb") as file: tar.gettarinfo(fileobj=file)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1767, in gettarinfo
    arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object
>>> with tarfile.open("/dev/null", "w") as tar, open(0, "rb", closefd=False) as file: tar.gettarinfo(fileobj=file)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/disk/home/proj/python/cpython/Lib/tarfile.py", line 1766, in gettarinfo
    drv, arcname = os.path.splitdrive(arcname)
  File "Lib/posixpath.py", line 133, in splitdrive
    return p[:0], p
TypeError: 'int' object is not subscriptable

In my case, my code always sets the final TarInfo.name attribute later on, so the initial name does not matter. Perhaps the documentation should at least say that “fileobj.name” must be a text (str) file name, not bytes or an integer file descriptor, unless “arcname” is also given. My workaround was to pass a dummy arcname argument, roughly like this:

# Explicit dummy arcname so gettarinfo() does not fall back to the bytes file name
tarinfo = self.tar.gettarinfo(fileobj=file, arcname="")
# . . .
tarinfo.name = "{}/{}".format(self.pkgname, name)
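
For anyone who wants to try the workaround outside my code, here is a minimal self-contained sketch. The helper names add_open_file and demo, the temporary file, and the member name "pkg/payload.txt" are made up for illustration. It passes the desired member name directly as arcname rather than using a dummy value and overwriting TarInfo.name afterwards, which should be equivalent for simple relative names.

```python
import io
import os
import tarfile
import tempfile


def add_open_file(tar, fileobj, member_name):
    """Add an already-open binary file to `tar` under `member_name`.

    Hypothetical helper: because arcname is given explicitly, the member
    name is not derived from fileobj.name, so that name may be bytes or
    an integer file descriptor.
    """
    tarinfo = tar.gettarinfo(fileobj=fileobj, arcname=member_name)
    tar.addfile(tarinfo, fileobj)
    return tarinfo


def demo():
    """Archive a file opened via a bytes path; return the member names."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "payload.txt")
        with open(path, "wb") as f:
            f.write(b"hello")

        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode="w") as tar:
            # os.fsencode() yields a bytes path, so file.name is bytes here.
            with open(os.fsencode(path), "rb") as file:
                add_open_file(tar, file, "pkg/payload.txt")

        # Reopen the in-memory archive to check what was written.
        buffer.seek(0)
        with tarfile.open(fileobj=buffer, mode="r") as tar:
            return tar.getnames()
```

Calling demo() writes the archive to an in-memory buffer, so nothing is left on disk once the temporary directory is cleaned up.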