Issue 4738: Patch to make zlib-objects better support threads
Created on 2008-12-24 16:29 by ebfe, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| zlib_threads-3.diff | ebfe, 2009-01-02 10:38 | |||
| zlibtest2.py | ebfe, 2009-01-02 10:39 | |||
| Messages (15) | |||
|---|---|---|---|
| msg78266 - (view) | Author: Lukas Lueg (ebfe) | Date: 2008-12-24 16:29 | |
My application needs to pack and unpack workunits all day long and does this using multiple threading.Threads. I've noticed that the zlib module seems to use only one thread at a time when using [de]compressobj(). As the comment in the sourcefile zlibmodule.c already says the module uses a global lock to protect different threads from accessing the object. While the c-functions release the GIL while waiting for the global lock, only one thread at a time can use zlib. My app ends up using only one CPU to compress/decompress it's workunits... The patch (svn diff to ) attached here fixes this problem by extending the compressobj-structure by an additional member to create object-specific locks and removes the global lock. The lock protects each compressobj individually and allows multiple python threads to use zlib in parallel, utilizing all available CPUs. |
|||
| msg78282 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2008-12-25 23:14 | |
A call to PyThread_free_lock(this->lock) in Comp_dealloc() and Decomp_dealloc(). Comp_dealloc() and Decomp_dealloc() code may also be factorized (write a common function to free unused_data, unconsumed_tail and self). |
|||
| msg78287 - (view) | Author: Lukas Lueg (ebfe) | Date: 2008-12-26 00:42 | |
new svn diff |
|||
| msg78289 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2008-12-26 01:30 | |
Ok, the patch looks fine and I like finer locks ;-) |
|||
| msg78318 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2008-12-26 23:34 | |
New comments about the last patch: - GIL is not released for adler() or crc32() whereas these functions may be slow for long strings: just add Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS before / after adler(...) and crc32(...) - (As ENTER_HASHLIB, issue #4751) I think that Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS are useless in ENTER_ZLIB - You might add explicit self to ENTER/LEAVE_ZLIB because the macros are now dependent of self (and not the whole module) => ENTER_ZLIB(self) and LEAVE_ZLIB(self) Are deflateCopy() and inflateCopy() slow enough to release the GIL? |
|||
| msg78326 - (view) | Author: Lukas Lueg (ebfe) | Date: 2008-12-27 01:15 | |
new svn diff attached - GIL is now released for adler32 and crc32 if the buffer is larger than 5kb (we don't want to risk burning cpu cycles by GIL-stuff) - adler32 got it's param by s# but now does s* - why s# anyway? - ENTER_ZLIB no longer gives away the GIL. It's dangerous and useless as there is no pressure on the object's lock. - deflateCopy() and inflateCopy() are not worth the trouble.u |
|||
| msg78329 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2008-12-27 01:38 | |
Comments on zlib_threads-2.diff: - the indentation is strange: don't mix spaces and tabs! - I prefer ";" after a call to a macro: "ENTER_ZLIB(self);" instead of "ENTER_ZLIB(self)". It makes vim happy (auto indent code correctly) and it works for ENTER_ZLIB and LEAVER_ZLIB. - ENTER_ZLIB and LEAVER_ZLIB prototype is wrong if WITH_THREAD is not defined - oh yeah, s* is needed to protect the buffer with a lock - why 5kb? is it a random value? I prefer power of two, like 4096 bytes :-) |
|||
| msg78331 - (view) | Author: Lukas Lueg (ebfe) | Date: 2008-12-27 02:02 | |
new svn diff attached
the indentation in this file is not my fault, it has tabs all over it...
The 5kb limits protects from the overhead of releasing the GIL. With
very small buffers the overall runtime in my benchmark tends to double.
I set it based on my testing and it remains being arbitrary to a certain
degree. Set the limit to 1 and try 1.000.000 times b'abc'...
May I also suggest to change the zlib module not to accept s* but y*:
- Internally zlib operates on bytes, characters don't mean a thing in
zlib-land.
- We rely on s* performing the encoding into default for us. This
behaviour is hidden from the programmer and somewhat violates the rule
of least surprise.
- type(zlib.decompress(zlib.compress('abc'))) == bytes
- Changing from s* to y* forces the programmer to use .encode() on his
strings (e.g. zlib.compress('abc'.encode()) which very clearly shows
what's happening.
|
|||
| msg78350 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2008-12-27 10:38 | |
> May I also suggest to change the zlib module not to accept s* but y* You are probably right, but this would also break the API and can't be done lightheartedly. You can open a new bug entry about this though. |
|||
| msg78359 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2008-12-27 13:01 | |
I opened a separate issue for the unicode problem: #4757. |
|||
| msg78669 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-01-01 00:07 | |
Same comment about potential deadlocks as in #4751. |
|||
| msg78771 - (view) | Author: Lukas Lueg (ebfe) | Date: 2009-01-02 10:38 | |
Here is a small test-script with concurrent access to a single compressosbj. The original patch will immediately deadlock. The patch attached releases the GIL before trying to get the zlib-lock. This allows the other thread to release the zlib-lock but comes at the cost of one additional GIL lock/unlock. |
|||
| msg78772 - (view) | Author: Lukas Lueg (ebfe) | Date: 2009-01-02 10:39 | |
test-script |
|||
| msg78849 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-01-02 17:35 | |
Checked in r68165. Thanks! |
|||
| msg78850 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2009-01-02 17:49 | |
@pitrou: You added "Also, the GIL is now released when computing the CRC of a large buffer." in the NEWS. The GIL released for crc32 but also for adler32. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:43 | admin | set | github: 48988 |
| 2009-01-02 17:49:52 | vstinner | set | messages: + msg78850 |
| 2009-01-02 17:35:18 | pitrou | set | status: open -> closed resolution: fixed messages: + msg78849 |
| 2009-01-02 10:39:36 | ebfe | set | files:
+ zlibtest2.py messages: + msg78772 |
| 2009-01-02 10:38:17 | ebfe | set | files:
+ zlib_threads-3.diff messages: + msg78771 |
| 2009-01-02 10:33:46 | ebfe | set | files: - zlib_threads-2.diff |
| 2009-01-01 00:07:07 | pitrou | set | messages: + msg78669 |
| 2008-12-27 13:01:01 | vstinner | set | messages: + msg78359 |
| 2008-12-27 10:38:39 | pitrou | set | messages: + msg78350 |
| 2008-12-27 02:03:02 | ebfe | set | files: - zlib_threads-2.diff |
| 2008-12-27 02:02:45 | ebfe | set | files:
+ zlib_threads-2.diff messages: + msg78331 |
| 2008-12-27 01:38:06 | vstinner | set | messages: + msg78329 |
| 2008-12-27 01:15:30 | ebfe | set | files: - zlib_threads.diff |
| 2008-12-27 01:15:22 | ebfe | set | files:
+ zlib_threads-2.diff messages: + msg78326 |
| 2008-12-26 23:34:43 | vstinner | set | messages: + msg78318 |
| 2008-12-26 22:28:57 | pitrou | set | priority: normal nosy: + pitrou stage: patch review versions: - Python 2.6, Python 2.5, Python 3.0 |
| 2008-12-26 01:30:24 | vstinner | set | messages: + msg78289 |
| 2008-12-26 01:29:19 | vstinner | set | files: - zlib_threads.diff |
| 2008-12-26 00:42:35 | ebfe | set | files:
+ zlib_threads.diff messages: + msg78287 |
| 2008-12-25 23:14:30 | vstinner | set | nosy:
+ vstinner messages: + msg78282 |
| 2008-12-24 16:29:30 | ebfe | create | |
