faster JSObjectSpace (JS runtime retain / release) by sliemeobn · Pull Request #676 · swiftwasm/JavaScriptKit
I added a tiny benchmark to measure the JSObjectSpace in isolation and made a few changes to speed things up.
results (two example runs)
JSObjectSpaceOriginal retain: 31.53ms release: 23.57ms mixed(1k): 12.32ms mixed(10k): 15.83ms mixed(50k): 21.63ms
JSObjectSpace (current) retain: 10.68ms release: 14.31ms mixed(1k): 7.20ms mixed(10k): 6.37ms mixed(50k): 9.19ms
JSObjectSpaceOriginal retain: 27.85ms release: 24.78ms mixed(1k): 12.44ms mixed(10k): 15.76ms mixed(50k): 22.55ms
JSObjectSpace (current) retain: 10.96ms release: 13.62ms mixed(1k): 6.18ms mixed(10k): 6.32ms mixed(50k): 9.63ms
I included the benchmark script (AI slop) but I can remove it if we want a clean commit.
Also, please note that there is a behavior difference: refs are not no longer "globally unique" - ie: they will be reused once freed. a "use after free" currently causes an exception - after this PR it could now theoretically go undetected.
ok, I just ran this in the ElementaryUI performance benchmark, the results are ... let's say "subtle"
current:
Summary
───────
01_run1k 151.81 ms
02_replace1k 298.86 ms
03_update10th1k_x16 84.39 ms
04_select1k 12.84 ms
05_swap1k 27.64 ms
06_remove-one-1k 19.76 ms
08_create1k-after1k_x2 164.12 ms
09_clear1k_x8 50.1 ms
21_ready-memory 1.03 MB
22_run-memory 2.56 MB
25_run-clear-memory 3.09 MB
with new JSObjectSpace:
Summary
───────
01_run1k 147.78 ms
02_replace1k 293.08 ms
03_update10th1k_x16 82.4 ms
04_select1k 12.72 ms
05_swap1k 27.76 ms
06_remove-one-1k 19.44 ms
08_create1k-after1k_x2 168.13 ms
09_clear1k_x8 47.01 ms
21_ready-memory 1.03 MB
22_run-memory 2.01 MB
25_run-clear-memory 2.68 MB
sliemeobn
changed the title
faster JSObjectSpace (JS runtime retain / release)
WIP: faster JSObjectSpace (JS runtime retain / release)
JSObjectSpaceOriginal retain: 76.50ms release: 80.87ms mixed(1k): 38.98ms mixed(10k): 48.31ms mixed(50k): 74.57ms
JSObjectSpace_v2 (ref++, single map) retain: 45.96ms release: 88.34ms mixed(1k): 124.55ms mixed(10k): 42.04ms mixed(50k): 48.61ms
JSObjectSpace_v3 (ref++, all maps) retain: 101.86ms release: 107.52ms mixed(1k): 40.93ms mixed(10k): 52.84ms mixed(50k): 89.15ms
JSObjectSpace (current) retain: 47.55ms release: 38.01ms mixed(1k): 22.95ms mixed(10k): 22.64ms mixed(50k): 32.93ms
---
JSObjectSpaceOriginal retain: 116.74ms release: 128.48ms mixed(1k): 42.58ms mixed(10k): 51.84ms mixed(50k): 77.80ms
JSObjectSpace_v2 (ref++, single map) retain: 57.26ms release: 102.22ms mixed(1k): 116.85ms mixed(10k): 44.35ms mixed(50k): 53.19ms
JSObjectSpace_v3 (ref++, all maps) retain: 101.29ms release: 180.11ms mixed(1k): 43.42ms mixed(10k): 49.50ms mixed(50k): 97.06ms
JSObjectSpace (current) retain: 55.56ms release: 46.38ms mixed(1k): 22.96ms mixed(10k): 25.23ms mixed(50k): 35.36ms
@kateinoigakukun please advise if you still want to go ahead with the JSObjectSpace (current) implementation even though we lose the "use after free" runtime detection, and about what to do with the benchmark code.
@MaxDesiatov @kateinoigakukun
ok, I think a have a good compromise
the version currently used here (v5) does the following
- use 10 bits of the reference to track a "generation"
- reuses slots (22 bits) for dense arrays
- increases the "generation" for each ref on last release
- basically use after free / stale ref detection (unless freak accident)
- still ~2x faster then the original
limitations:
- this version never trims the allocated slots back down (ie: the array memory or the max used objects at any given time sticks around)
- limits the count of concurrently retained objects to 4.194.304
- generations wrap after 1.024
JSObjectSpaceOriginal retain: 91.22ms release: 65.35ms mixed(1k): 41.16ms mixed(10k): 47.76ms mixed(50k): 70.56ms
JSObjectSpace_v1 (reused refs, single map) retain: 50.58ms release: 34.77ms mixed(1k): 18.58ms mixed(10k): 20.70ms mixed(50k): 44.11ms
JSObjectSpace_v2 (ref++, single map) retain: 44.70ms release: 82.40ms mixed(1k): 127.82ms mixed(10k): 42.00ms mixed(50k): 47.61ms
JSObjectSpace_v3 (ref++, all maps) retain: 69.19ms release: 81.49ms mixed(1k): 45.38ms mixed(10k): 48.28ms mixed(50k): 108.62ms
JSObjectSpace_v4 (gen-tagged refs, single map) retain: 48.92ms release: 39.08ms mixed(1k): 22.85ms mixed(10k): 20.34ms mixed(50k): 31.49ms
JSObjectSpace_v5 (gen-tagged refs, typed state) retain: 44.63ms release: 32.05ms mixed(1k): 22.03ms mixed(10k): 27.12ms mixed(50k): 34.31ms
JSObjectSpace (current) retain: 44.84ms release: 33.33ms mixed(1k): 20.34ms mixed(10k): 20.66ms mixed(50k): 30.03ms
JSObjectSpaceOriginal retain: 84.76ms release: 82.04ms mixed(1k): 40.88ms mixed(10k): 44.15ms mixed(50k): 63.75ms
JSObjectSpace_v1 (reused refs, single map) retain: 53.21ms release: 32.64ms mixed(1k): 18.76ms mixed(10k): 19.02ms mixed(50k): 24.46ms
JSObjectSpace_v2 (ref++, single map) retain: 42.13ms release: 72.61ms mixed(1k): 119.19ms mixed(10k): 35.57ms mixed(50k): 47.40ms
JSObjectSpace_v3 (ref++, all maps) retain: 65.45ms release: 93.43ms mixed(1k): 45.75ms mixed(10k): 54.23ms mixed(50k): 84.92ms
JSObjectSpace_v4 (gen-tagged refs, single map) retain: 46.16ms release: 37.13ms mixed(1k): 19.50ms mixed(10k): 19.50ms mixed(50k): 49.24ms
JSObjectSpace_v5 (gen-tagged refs, typed state) retain: 43.59ms release: 49.00ms mixed(1k): 18.92ms mixed(10k): 21.46ms mixed(50k): 35.35ms
JSObjectSpace (current) retain: 47.27ms release: 48.48ms mixed(1k): 19.23ms mixed(10k): 19.28ms mixed(50k): 27.84ms
and, just for the record: using a typed array (Uint32Array) with an initial size and manual growing does not really move the needle.
unfortunately, all this micro-optimizing does not fully materialize in the brower... : /
here is a comparison with the current ElementaryUI benchmark script
original version:
Summary
───────
01_run1k 416.74 ms
02_replace1k 636.72 ms
03_update10th1k_x16 250.94 ms
04_select1k 50.67 ms
05_swap1k 88.94 ms
06_remove-one-1k 69.74 ms
08_create1k-after1k_x2 473.19 ms
09_clear1k_x8 178.77 ms
21_ready-memory 1.03 MB
22_run-memory 2.56 MB
25_run-clear-memory 3.09 MB
current branch version:
Summary
───────
01_run1k 367.53 ms
02_replace1k 622.95 ms
03_update10th1k_x16 244.43 ms
04_select1k 48.72 ms
05_swap1k 87.9 ms
06_remove-one-1k 61.54 ms
08_create1k-after1k_x2 400.66 ms
09_clear1k_x8 124.82 ms
21_ready-memory 1.03 MB
22_run-memory 2.01 MB
25_run-clear-memory 2.68 MB
now, why this does not slash the retain times in half in the browser I don't know yet (maybe the JS engine in there applies different optimizations)?
here is me thinking maybe we should stop improving this old nonsense and "just":
a) implement unretained string passing, and
b) get rid of this JS-side ref table and use an externref-based system...
@MaxDesiatov not really, no. but I haven't been very scientific with it.
I ran the benchmark in bun a few times, things looked roughly the same as in v8 (relative to each other).
the ElementaryUI suite runs in chromium, but a few manual tests in Safari showed very similar behavior.
sliemeobn
changed the title
WIP: faster JSObjectSpace (JS runtime retain / release)
faster JSObjectSpace (JS runtime retain / release)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a time to review this but it looks good to me!
This change makes the maximum number of JSObjects that can live at the same time limited under (1 << 22) - 1 (around 4M), but it should be an acceptable constraint practically
LGTM, one concern from our side - we have couple of operations that traverse graphs and collect all matching nodes into an array of live JS object references. We're testing with graphs with few millions nodes, which means a single walk could hold millions of concurrent JSObjectSpace references. With SLOT_BITS = 22, the limit is ~4.19M concurrent objects. AFAIU, when you exceed it, retain() throws a RangeError and the app crashes with no recovery path?
Could we consider bumping SLOT_BITS to 24 (16M objects, 256 generations) or even 26 (64M objects, 64 generations)? The performance characteristics should be identical - it's just changing integer constants for the masks and shifts. The generation counter is a nice safety net but even 64 generations would catch the vast majority of stale ref bugs in practice 🙏🏻
@krodak thanks for the review, I bumped the bitmask to 24 bits