Issue 44116: The _csv module can't be garbage-collected after _csv.register_dialect is called
Issue44116
Created on 2021-05-12 13:34 by petr.viktorin, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| gc.diff | erlendaasland, 2021-05-12 14:51 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 26068 | closed | petr.viktorin, 2021-05-12 13:59 | |
| PR 26074 | merged | erlendaasland, 2021-05-12 15:51 | |
| PR 26081 | merged | miss-islington, 2021-05-12 18:19 | |
| Messages (16) | |||
|---|---|---|---|
| msg393508 - (view) | Author: Petr Viktorin (petr.viktorin) * ![]() |
Date: 2021-05-12 13:34 | |
After `_csv.register_dialect` is called, the csv module is alive even after it's removed from sys.modules. It should be garbage-collected.
(It's not that big a deal: unloading _csv isn't something users should do. But it might be hiding a deeper issue.)
The following reproducer (for a debug build of Python) shows an increasing number of refcounts. (Importing `csv` is the easiest way to call _csv._register_dialect with a proper argument):
import sys
import gc
for i in range(10):
import csv
del sys.modules['_csv']
del sys.modules['csv']
del csv
gc.collect()
print(sys.gettotalrefcount())
|
|||
| msg393512 - (view) | Author: Petr Viktorin (petr.viktorin) * ![]() |
Date: 2021-05-12 13:58 | |
Hm, a similar thing apparently happens with urllib.request. |
|||
| msg393518 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 14:42 | |
Could it be that the _csv heap types are not garbage collected? Ref. bpo-42972. |
|||
| msg393520 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 14:50 | |
Adding GC to _csv types:
$ cat
import sys
import gc
for i in range(10):
import csv
del sys.modules['_csv']
del sys.modules['csv']
del csv
gc.collect()
print(sys.gettotalrefcount())
$ ./python.exe bug.py
73164
73164
73166
73166
73166
73166
73166
73166
73166
73166
|
|||
| msg393521 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 14:52 | |
I created a quick-and-dirty patch. I can clean it up and make it into a PR if you want. |
|||
| msg393522 - (view) | Author: Petr Viktorin (petr.viktorin) * ![]() |
Date: 2021-05-12 15:07 | |
*facepalm* Yes, that's it. If you have the time now, could you send he PR? |
|||
| msg393523 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 15:10 | |
Sure, I’ll do it after dinner :) |
|||
| msg393525 - (view) | Author: Petr Viktorin (petr.viktorin) * ![]() |
Date: 2021-05-12 15:46 | |
The urllib.request one was caused by _hashlib, see GH-26072. |
|||
| msg393526 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 15:56 | |
I don't understand this. After applying PR-26074, test_csv now leaks memory/refs: $ ./python.exe -m test -R : test_csv 0:00:00 load avg: 1.18 Run tests sequentially 0:00:00 load avg: 1.18 [1/1] test_csv beginning 9 repetitions 123456789 ......... test_csv leaked [3928, 3924, 3924, 3924] references, sum=15700 test_csv leaked [1666, 1664, 1664, 1664] memory blocks, sum=6658 test_csv failed == Tests result: FAILURE == 1 test failed: test_csv Total duration: 4.9 sec Tests result: FAILURE Any ideas? |
|||
| msg393527 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 15:59 | |
Also, for some reason two first iterations of the reproducer prints 2 less ref counts. |
|||
| msg393529 - (view) | Author: Petr Viktorin (petr.viktorin) * ![]() |
Date: 2021-05-12 16:18 | |
Changes to _csv.Error should not be necessary, there everything is handled by the superclass. |
|||
| msg393530 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 16:19 | |
> Changes to _csv.Error should not be necessary, there everything is handled by the superclass. Got it; thanks. |
|||
| msg393531 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 16:21 | |
Yeah, that helped a lot: test_csv leaked [487, 487, 487, 487] memory blocks, sum=1948 Thanks! :) |
|||
| msg393543 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2021-05-12 18:56 | |
New changeset ba260acb22f3a6de434dc7a159ddb94a6a8c9b7c by Miss Islington (bot) in branch '3.10': bpo-44116: Add GC support to _csv heap types (GH-26074) (GH-26081) https://github.com/python/cpython/commit/ba260acb22f3a6de434dc7a159ddb94a6a8c9b7c |
|||
| msg393545 - (view) | Author: Erlend E. Aasland (erlendaasland) * ![]() |
Date: 2021-05-12 19:41 | |
Pablo, as mentioned in bpo-42972, this is an issue with all heap allocated types; it is not _csv specific. I know that work with heap types have been halted by the SC, as you've pointed out a couple of times already, but shouldn't this heap type issue be fixed before 3.10 is released? There's plenty of time in the beta phase to do this. |
|||
| msg393546 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2021-05-12 19:51 | |
> I know that work with heap types have been halted by the SC, as you've pointed out a couple of times already, but shouldn't this heap type issue be fixed before 3.10 is released? It should absolutely be fixed for all types that have been already converted. The halting is for more types. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:45 | admin | set | github: 88282 |
| 2021-05-12 20:31:24 | corona10 | set | nosy:
+ corona10 |
| 2021-05-12 19:51:11 | pablogsal | set | messages: + msg393546 |
| 2021-05-12 19:41:47 | erlendaasland | set | messages: + msg393545 |
| 2021-05-12 19:32:53 | pablogsal | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2021-05-12 18:56:28 | pablogsal | set | nosy:
+ pablogsal messages: + msg393543 |
| 2021-05-12 18:19:07 | miss-islington | set | nosy:
+ miss-islington pull_requests:
+ pull_request24720 |
| 2021-05-12 16:21:43 | erlendaasland | set | messages: + msg393531 |
| 2021-05-12 16:19:47 | erlendaasland | set | messages: + msg393530 |
| 2021-05-12 16:18:54 | petr.viktorin | set | messages:
+ msg393529 stage: patch review -> (no value) |
| 2021-05-12 15:59:51 | erlendaasland | set | messages: + msg393527 |
| 2021-05-12 15:56:13 | erlendaasland | set | messages: + msg393526 |
| 2021-05-12 15:51:51 | erlendaasland | set | stage: patch review pull_requests: + pull_request24714 |
| 2021-05-12 15:46:32 | petr.viktorin | set | messages:
+ msg393525 stage: patch review -> (no value) |
| 2021-05-12 15:10:39 | erlendaasland | set | messages: + msg393523 |
| 2021-05-12 15:07:51 | petr.viktorin | set | messages: + msg393522 |
| 2021-05-12 14:52:29 | erlendaasland | set | messages: + msg393521 |
| 2021-05-12 14:51:26 | erlendaasland | set | files: + gc.diff |
| 2021-05-12 14:50:55 | erlendaasland | set | messages: + msg393520 |
| 2021-05-12 14:42:05 | erlendaasland | set | nosy:
+ erlendaasland messages: + msg393518 |
| 2021-05-12 13:59:23 | petr.viktorin | set | keywords:
+ patch stage: patch review pull_requests: + pull_request24709 |
| 2021-05-12 13:58:53 | petr.viktorin | set | messages: + msg393512 |
| 2021-05-12 13:34:11 | petr.viktorin | create | |

