memoryview for raw bytes elements · msgpack/msgpack-python · Discussion #550
Hi,
Somewhat related to #547: some weeks ago I spent a bit of time investigating the deserialisation side of msgpack, specifically in the context of serialising xarrays or, more generally, numpy arrays. At first I expected a zero-copy mechanism to be available when calling unpackb with a buffer, but I quickly realised that during deserialisation raw bytes are always copied into a new bytes object, which is what users get back.
I experimented locally with a new bytes_memoryview option for unpackb to get zero-copy numpy array deserialisation, and it works as expected: the numpy arrays point directly at the memory underlying the input to unpackb, so deserialisation has near-zero cost. This also requires a change to msgpack-numpy, but OTOH we could use our own default/object_hook functions and not depend on msgpack-numpy.
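To make the copy-versus-view distinction concrete, here is a minimal sketch using only the standard library. It does not call msgpack or the proposed bytes_memoryview option; instead it hand-builds a single bin 8 element (marker byte 0xc4, a one-byte length, then the payload) and extracts the payload both ways. The helper names build_bin8, payload_as_copy and payload_as_view are hypothetical and exist only for this illustration.

```python
import struct


def build_bin8(payload: bytes) -> bytearray:
    """Hand-build a msgpack bin 8 element: 0xc4, a one-byte length, then the payload."""
    if len(payload) > 255:
        raise ValueError("bin 8 can hold at most 255 bytes")
    return bytearray(b"\xc4" + bytes([len(payload)]) + payload)


def payload_as_copy(message) -> bytes:
    """Conceptually what happens today: the payload is copied into a new bytes object."""
    length = message[1]
    return bytes(message[2:2 + length])


def payload_as_view(message) -> memoryview:
    """Conceptually what a memoryview option would return: a window into the input."""
    length = message[1]
    return memoryview(message)[2:2 + length]


# Three float64 values in native byte order, wrapped in a bin 8 element.
message = build_bin8(struct.pack("=3d", 1.0, 2.0, 3.0))

copied = payload_as_copy(message)
view = payload_as_view(message)
doubles = view.cast("d")  # reinterpret the raw bytes as float64 without copying

# Overwrite the first double in the underlying input buffer.
struct.pack_into("=d", message, 2, 42.0)

first_from_copy = struct.unpack_from("=d", copied)[0]  # still 1.0: the copy is independent
first_from_view = doubles[0]                           # 42.0: the view shares the input's memory
```

The flip side of sharing memory is that the view is only valid while the input buffer is alive and unmodified, which is one reason such behaviour would make sense as an opt-in rather than the default.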
Questions:
- Given this speedup, would there be general willingness to accept a patch to allow for this use case?
- Unpacker wouldn't benefit from this, or at least not without heavy refactoring, is that correct? My current diff contains a small change for Unpacker, but in retrospect I realise I don't need it, and it would probably just confuse people.
See below for the benchmark code, results in Python 3.10 and 3.11 (noisy, this is my work laptop), and the current diff.
    import functools
    import timeit

    import matplotlib.pyplot
    import msgpack
    import msgpack_numpy as mnp
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import xarray as xa

    mser = lambda x: msgpack.packb(x, default=mnp.encode)
    packer = msgpack.Packer(default=mnp.encode, autoreset=False)

    def mser_packer(x):
        packer.reset()
        packer.pack(x)
        return packer

    munser = lambda x: msgpack.unpackb(x, object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)
    munser_packer = lambda x: msgpack.unpackb(x.getbuffer(), object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)

    all_benchmarks = {
        "write_only": {
            'msgpack.packb': [mser],
            'msgpack.Packer': [mser_packer],
        },
        "write_read": {
            'msgpack.packb + msgpack.unpackb': [mser, munser],
            'msgpack.Packer + msgpack.unpackb(Packer.getbuffer())': [mser_packer, munser_packer],
        }
    }

    def benchmark_for_sizes(functions, nitems):
        results = []
        for nitem in nitems:
            arr = np.random.rand(nitem)
            xarr = xa.Dataset({"x": arr, "y": arr + 30})
            size = xarr.nbytes
            print(f"  {nitem=}, {size=}")
            timer = timeit.Timer(
                'functools.reduce(lambda v, f: f(v), functions, xarr.to_dict("array"))',
                setup="import functools",
                globals=locals(),
            )
            n_executions, total_duration = timer.autorange()
            duration = total_duration / n_executions
            results.append((size, duration))
        return results

    def run_benchmarks(nitems, benchmarks):
        results = {}
        for name, functions in benchmarks.items():
            print(f"Benchmarking {name}")
            results[name] = benchmark_for_sizes(functions, nitems)
        flat_results = list(
            (name, size, duration)
            for name, values in results.items()
            for size, duration in values
        )
        df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
        df['Size [MB]'] = df['Size'] / 1024 / 1024
        df['Speed [MB/s]'] = df['Size'] / 1024 / 1024 / df["Duration"]
        return df

    def run_all():
        global bytes_memoryview
        sns.set_theme()
        nitems = list(range(10000000, 2000000, -120000))
        dfs = []
        for use_memoryview in (False, True):
            bytes_memoryview = use_memoryview
            for group, benchmarks in all_benchmarks.items():
                df = run_benchmarks(nitems, benchmarks)
                df["Group"] = group
                df["bytes_memoryview"] = bytes_memoryview
                dfs.append(df)
        df = pd.concat(dfs)
        sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]', kind='line',
                    hue='Benchmark', col='Group', row="bytes_memoryview")
        matplotlib.pyplot.savefig('benchmark_results.png')

    if __name__ == '__main__':
        run_all()
After producing these plots, I realised that the bottom-left panel isn't relevant: the write-only benchmarks never call unpackb, so bytes_memoryview has no effect there and the panel is mostly a copy of the top-left one. Hopefully it doesn't distract too much from the results being shown.
Current diff: rtobar@76b2888

