memoryview for raw bytes elements · msgpack/msgpack-python · Discussion #550

Hi,

This is somewhat related to #547. Some weeks ago I spent a bit of time investigating the deserialisation side of msgpack, specifically in the context of serialising xarrays, or more generally numpy arrays. At first I expected a zero-copy mechanism to be available when calling unpackb with a buffer, but I quickly realised that during deserialisation raw bytes are always copied into a new bytes object, which is what users get back.
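To make the copy concrete, here is a minimal stdlib-only sketch of the difference between a bytes copy and a memoryview window onto the same buffer. It does not use msgpack at all; the 4-byte header and the payload layout are made up purely for illustration.

```python
import struct

# Hypothetical layout, for illustration only: 4 header bytes + 3 raw float64 values.
payload = bytearray(b"HDR!" + struct.pack("=3d", 1.0, 2.0, 3.0))

# A memoryview slice is a window onto `payload`: no bytes are copied.
raw = memoryview(payload)[4:]

# Converting to bytes allocates a new object and copies the data,
# which is analogous to what a deserialiser does when it returns bytes.
copied = bytes(raw)

# Reinterpret the window as native-endian doubles, still without copying.
view = raw.cast("d")

# Overwrite the first double in place in the underlying buffer.
struct.pack_into("=d", payload, 4, 42.0)

print(view[0])                            # reflects the change: same memory
print(struct.unpack("=3d", copied)[0])    # unchanged: independent copy
```

np.frombuffer accepts any object that exposes the buffer protocol, so handing it a memoryview like `raw` would give an array backed by the same memory in the same way.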

I experimented locally with a new bytes_memoryview option for unpackb to get zero-copy deserialisation of numpy arrays, and it works as expected: the resulting numpy arrays point directly at the memory underlying the input to unpackb, so deserialisation has near-zero cost. This also requires a change to msgpack-numpy, but OTOH we could use our own default/object_hook functions and not depend on msgpack-numpy.
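As a rough idea of what "our own default/object_hook functions" could look like, here is a sketch. The names encode_array and decode_array and the "__ndarray__" marker key are hypothetical, this is not how msgpack-numpy encodes arrays, and the memoryview passed to the decoder below is only a stand-in for what an unpackb with the proposed option might hand to the hook.

```python
import numpy as np

def encode_array(obj):
    """Candidate `default=` hook (hypothetical): ndarray -> plain msgpack types."""
    if isinstance(obj, np.ndarray):
        arr = np.ascontiguousarray(obj)
        return {
            "__ndarray__": True,          # made-up marker key
            "dtype": arr.dtype.str,
            "shape": list(arr.shape),
            "data": arr.tobytes(),        # the packing side still copies here
        }
    raise TypeError(f"cannot serialise {type(obj)!r}")

def decode_array(obj):
    """Candidate `object_hook=` hook (hypothetical): rebuild without copying `data`."""
    if obj.get("__ndarray__"):
        flat = np.frombuffer(obj["data"], dtype=np.dtype(obj["dtype"]))
        return flat.reshape(obj["shape"])  # reshape of a contiguous array is a view
    return obj

# Simulate the round trip without msgpack: put the raw data in a buffer and
# give the decoder a memoryview onto it instead of a bytes copy.
original = np.arange(6, dtype="<f8").reshape(2, 3)
encoded = encode_array(original)
backing = bytearray(encoded["data"])
encoded["data"] = memoryview(backing)
restored = decode_array(encoded)

print(np.array_equal(restored, original))
print(np.shares_memory(restored, np.frombuffer(backing, dtype="<f8")))
```

With msgpack itself these would be wired in as msgpack.packb(x, default=encode_array) and msgpack.unpackb(buf, object_hook=decode_array). With a released msgpack the hook receives a bytes object, so np.frombuffer avoids a second copy but the array is read-only and the first copy has already happened.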

Questions:

  • Given this speedup, in general would there be willingness to accept a patch to allow for this use case?
  • Unpacker wouldn't benefit from this, or at least not without heavy refactoring, is that correct? My current diff contains a small change for Unpacker, but in retrospect I realise I don't need it, and it would probably just confuse people.

See below for the benchmark code, results in Python 3.10 and 3.11 (noisy, this is my work laptop), and the current diff.

import msgpack
import msgpack_numpy as mnp
import numpy as np
import xarray as xa
import timeit
import functools
import seaborn as sns
import pandas as pd
import matplotlib.pyplot

mser = lambda x: msgpack.packb(x, default=mnp.encode)
packer = msgpack.Packer(default=mnp.encode, autoreset=False)
def mser_packer(x):
    packer.reset()
    packer.pack(x)
    return packer
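# bytes_memoryview is the new unpackb option from the patch (not in released msgpack).
# The global of the same name is set by run_all() before each benchmark run.
munser = lambda x: msgpack.unpackb(x, object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)
munser_packer = lambda x: msgpack.unpackb(x.getbuffer(), object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)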

all_benchmarks = {
    "write_only": {
        'msgpack.packb': [mser],
        'msgpack.Packer': [mser_packer],
    },
    "write_read": {
        'msgpack.packb + msgpack.unpackb': [mser, munser],
        'msgpack.Packer + msgpack.unpackb(Packer.getbuffer())': [mser_packer, munser_packer],
    }
}

def benchmark_for_sizes(functions, nitems):
    results = []
    for nitem in nitems:
        arr = np.random.rand(nitem)
        xarr = xa.Dataset({"x": arr, "y": arr + 30})
        size = xarr.nbytes
        print(f"  {nitem=}, {size=}")
        timer = timeit.Timer('functools.reduce(lambda v, f: f(v), functions, xarr.to_dict("array"))', setup="import functools", globals=locals())
        n_executions, total_duration = timer.autorange()
        duration = total_duration / n_executions
        results.append((size, duration))
    return results


def run_benchmarks(nitems, benchmarks):
    results = {}
    for name, functions in benchmarks.items():
        print(f"Benchmarking {name}")
        results[name] = benchmark_for_sizes(functions, nitems)
    flat_results = list((name, size, duration) for name, values in results.items() for size, duration in values)
    df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
    df['Size [MB]'] = df['Size'] / 1024 / 1024
    df['Speed [MB/s]'] = df['Size'] / 1024 / 1024 / df["Duration"]
    return df

def run_all():
    global bytes_memoryview
    sns.set_theme()
    nitems = list(range(10000000, 2000000, -120000))
    dfs = []
    for use_memoryview in (False, True):
        bytes_memoryview = use_memoryview
        for group, benchmarks in all_benchmarks.items():
            df = run_benchmarks(nitems, benchmarks)
            df["Group"] = group
            df["bytes_memoryview"] = bytes_memoryview
            dfs.append(df)

    df = pd.concat(dfs)
    sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]', kind='line', hue='Benchmark', col='Group', row="bytes_memoryview")
    matplotlib.pyplot.savefig('benchmark_results.png')

if __name__ == '__main__':
    run_all()

After producing these plots, I realised that the bottom-left panel isn't relevant: the write-only benchmarks never call unpackb, so it's mostly a copy of the top-left panel. Hopefully it doesn't distract too much from the results being shown.

Results in Python 3.10
benchmark_results-310

Results in Python 3.11
benchmark_results-311

Current diff: rtobar@76b2888