Push 42 (84/144 → 85/144 emit methods, 59%): bundles SLOT-specialization
conversion + bridge-guard fix per supervisor 01:39:54Z + theologian
01:40:13Z + gatekeeper 01:39:47Z delegation-approach concurrence.
================================================================
PART 1: BRIDGE GUARD FIX (delegation, not port)
================================================================
hir_c_api.cpp: replace broken findTypeByVersionTagWalk with 1-line
delegation to jit::hir::findTypeByVersionTag (builder.cpp:3829).
The prior in-file walk stripped 6 invariants from the C++ source +
missed the static-builtin indirection (getTypeSubclasses helper).
testkeeper caught it as SIGSEGV during cinderjit.force_compile(attr_probe)
post-warmup at 00:41Z (gate-binary contamination investigation).
Delegation chosen over verbatim port because invariants cannot be
stripped from a delegation: there is no copy. Future C++ retrofits
propagate automatically (e.g., D-1775642552 collision diagnostic when
implemented). Full BRIDGE SPEC TEMPLATE below.
----------------------------------------------------------------
BRIDGE SPEC TEMPLATE (theologian 01:39:09Z, archived as W7 reference)
----------------------------------------------------------------
Bridge: hir_find_type_by_version_tag
Purpose: Walk PyType subclass tree from PyBaseObject_Type, return type
matching version_tag (or NULL). Used by LOAD_ATTR_SLOT to
resolve slot_type from inline-cache version.
C++ source: builder.cpp:3786-3833 (getTypeSubclasses +
findTypeByVersionTagImpl + findTypeByVersionTag)
PRIOR DECISIONS (scribe 01:37:08Z):
D-1775580885: pymalloc reentrancy via Python __subclasses__ callback
during HIR build. Bridge MUST NOT invoke Python during
compile. (Inherited via delegation: C++ uses PyDict_Next
+ PyWeakref_GetObject, no Python call.)
D-1775636801, D-1775636859: tp_subclasses for static builtin types
(TPFLAGS_STATIC_BUILTIN: object, int, str) is a 1-based
index into PyInterpreterState array, NOT a dict pointer.
Direct tp_subclasses access = dead code on PyBaseObject_Type
(the walk root). MUST use _PyStaticType_GetState. (Inherited
via delegation: C++ getTypeSubclasses helper handles this.)
D-1775637417: Walk MUST be zero-allocation. A prior-session walk crashed
on ARM64 because _PyType_GetSubclasses() allocates a temp list via
PyList_New. Current C++ uses PyDict_Next directly (no temp alloc).
(Inherited via delegation; ARM64 pre-flight at 01:43Z
confirms the fix is holding under RelWithDebInfo.)
D-1775642552: Version-tag collision is an UNRESOLVED hypothesis class.
If two types share a version_tag, the walk may return the wrong
type → under auto-compile the receiver matches the guard →
LoadField reads at the wrong slot_offset → corruption.
Diagnostic logging is a separate
W-tracked workstream, not bridge-impl scope. (Inherited.)
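The collision hazard in D-1775642552 can be illustrated with a toy model.
This is a sketch only: ToyType, first_type_with_tag and types_sharing_tag
are hypothetical names, not part of the JIT, and instance storage is
modeled as a flat list indexed by slot offset.

```python
from dataclasses import dataclass, field


@dataclass
class ToyType:
    """Hypothetical stand-in for a type: a version tag plus a slot layout."""
    name: str
    version_tag: int
    slot_offsets: dict = field(default_factory=dict)  # attr name -> offset


def first_type_with_tag(types, tag):
    """First-match lookup: returns whichever type with this tag comes first."""
    if tag == 0:  # 0 is the unspecialized sentinel, never a valid match
        return None
    for t in types:
        if t.version_tag == tag:
            return t
    return None


def types_sharing_tag(types, tag):
    """Hypothetical diagnostic: every type that carries this tag."""
    if tag == 0:
        return []
    return [t for t in types if t.version_tag == tag]


# Two types with the same tag but different slot layouts.
pt = ToyType("Pt", 7, {"x": 0, "y": 1})
other = ToyType("Other", 7, {"y": 0, "x": 1})
registry = [other, pt]

pt_instance_storage = [10, 20]  # a Pt instance: x=10, y=20
resolved = first_type_with_tag(registry, 7)
# The lookup resolved to Other, so "x" is read from Other's offset.
value_read_for_x = pt_instance_storage[resolved.slot_offsets["x"]]
```

In this model the read of "x" lands on the slot that holds y. A diagnostic
along the lines of types_sharing_tag would flag the ambiguity before any
offset is used.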
INVARIANTS PRESERVED (via delegation, no copy — strongest preservation):
1. Depth limit 50 (builder.cpp:3799) — prevents stack overflow on
cyclic / very-deep subclass graphs.
2. version == 0 short-circuit (builder.cpp:3830) — sentinel handling
for the unspecialized version-tag value.
3. PyDict_Check before PyDict_Next (builder.cpp:3810) — tp_subclasses
may be NULL or non-dict for some types; PyDict_Next on non-dict is UB.
4. PyType_Check on weakref-resolved subtype (builder.cpp:3817) — weakref
may resolve to non-type if the subclass was gc'd or list contaminated.
5. PyWeakref_GetObject (checked function form, NULL-safe) instead of the
PyWeakref_GET_OBJECT macro (builder.cpp:3816): the macro assumes its
argument is a weakref; it yields Py_None for a dead referent but is UB
if the argument isn't a weakref.
6. _PyStaticType_GetState indirection for TPFLAGS_STATIC_BUILTIN
(builder.cpp:3786-3793) — PyBaseObject_Type IS static-builtin; without
this, walk dies at root.
7. PyDict_Next direct iteration — zero pymalloc allocation in walk body
(NOT _PyType_GetSubclasses which allocates via PyList_New). Required
for ARM64 safety per D-1775637417.
Falsifier:
- cinderjit.force_compile(attr_probe-with-warmup) on x86_64 completes
without SIGSEGV. (Verified compile-clean; gate result follows.)
- Same on ARM64 RelWithDebInfo per testkeeper pre-flight 01:43:22Z PASS:
'JIT_ENABLE=1 ./python /tmp/g1_6_force.py → FORCE_COMPILE_OK ✓'
'JIT_ENABLE=1 ./python /tmp/g1_5_auto_compile_capture.py → AUTO_COMPILE_OK ✓'
W7 — full C port of findTypeByVersionTag (post-Tier-5, MEDIUM, queued):
Owner: theologian design (this spec) + generalist implement
Trigger: when builder.cpp deletion approaches AND findTypeByVersionTag
chain is the next C++ surface to convert
Acceptance: this 7-invariant spec satisfied + ARM64 pydebug pre-flight
PASS (W8 must complete first) + post-port differential
JIT_DCHECK PASS
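For readers without builder.cpp open, the walk invariants in the spec above
can be modeled in a few lines. This is a minimal sketch, not the C++ code:
ToyType and its subclasses dict are hypothetical stand-ins for PyTypeObject
and tp_subclasses, the static-builtin indirection (invariant 6) is not
modeled, and Python cannot express the zero-allocation requirement
(invariant 7).

```python
import weakref

MAX_DEPTH = 50  # depth limit from invariant 1


class ToyType:
    """Hypothetical stand-in for a type with weakly-held subclasses."""

    def __init__(self, name, version_tag):
        self.name = name
        self.version_tag = version_tag
        self.subclasses = {}  # key -> weakref, loosely like tp_subclasses

    def add_subclass(self, sub):
        self.subclasses[id(sub)] = weakref.ref(sub)
        return sub


def _walk(node, tag, depth):
    if depth > MAX_DEPTH:                     # invariant 1: bounded recursion
        return None
    if node.version_tag == tag:
        return node
    subs = node.subclasses
    if not isinstance(subs, dict):            # invariant 3: dict check first
        return None
    for ref in subs.values():
        if not isinstance(ref, weakref.ref):  # invariant 5: checked deref
            continue
        sub = ref()                           # None if the referent is dead
        if not isinstance(sub, ToyType):      # invariant 4: must be a type
            continue
        found = _walk(sub, tag, depth + 1)
        if found is not None:
            return found
    return None


def find_type_by_version_tag(root, tag):
    if tag == 0:                              # invariant 2: sentinel
        return None
    return _walk(root, tag, 0)
```

The model makes the failure modes easy to probe: a non-weakref entry in the
dict is skipped rather than dereferenced, and a match deeper than the limit
is reported as not found rather than recursing without bound.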
================================================================
PART 2: LOAD_ATTR_SLOT WIRING
================================================================
builder_emit_c.c: hir_builder_emit_load_attr_slot_c, ~42 lines.
builder.cpp: emitLoadAttr LOAD_ATTR_SLOT case shrinks to delegating stub
(46 lines deleted, 6 lines added).
----------------------------------------------------------------
PER-SPECIALIZATION RUBRIC (theologian 00:24:00Z, 8 common + 4 SLOT)
----------------------------------------------------------------
Common items (cited per gatekeeper review):
1. Source structural diff: C body in builder_emit_c.c mirrors C++ logic
line-for-line (cache lookup → type guard → load slot → check NULL).
C++ stub in builder.cpp passes (tc, func, builder, receiver, code,
name_idx, instr_idx); C body returns 1 on success / 0 to fall back.
2. Bridge usage audit:
- hir_builder_get_attr_cache(builder, instr_idx, &type_version, &slot_offset)
[hir_c_api.cpp:2653, reads _PyAttrCache via Preloader's code]
- hir_find_type_by_version_tag(type_version) [delegation, see Part 1]
- hir_func_add_reference(func, slot_type) [hir_c_api.cpp:168, roots
the type pointer through GC for compiled-code lifetime]
- hir_type_from_pytype(slot_type, 1) [exact-type HirType construction]
- hir_c_create_guard_type_reg(receiver, type, receiver) [no-FS variant
matches C++ tc.emitGuardType 3-arg form at builder.cpp:3964]
- hir_func_alloc_register, hir_c_create_load_field_reg,
hir_c_create_check_field_reg, hir_c_set_guilty_reg
All bridges exist; ordering matches C++.
3. Reference annotation: receiver is BORROWED (read from stack),
guarded result is the same register (in-place narrow), LoadField
result is owned-or-NULL (TOptObject = OBJECT|NULLPTR), CheckField
raises AttributeError on NULL via guilty_reg=receiver. Matches C++.
4. FrameState capture: GuardType uses no-FS variant (matches C++ which
also calls 3-arg tc.emitGuardType for SLOT). CheckField uses FS-aware
variant (hir_c_create_check_field_reg with &tc->frame). Snapshot is
not explicitly emitted; the existing tc.frame supplies state for
CheckField deopt.
5. Wiring exercise: existing emitCond wiring at gate_phoenix.sh:347
exercises Point class with .x/.y/.z dict-attrs (not __slots__). For
__slots__ specifically, the existing test_phoenix_jit_loadattr_golden
harness (Lib/test/test_phoenix_jit_loadattr_golden.py) exercises the
exact attr_probe pattern (Pt with __slots__) — gate_phoenix.sh runs
this in PHOENIX_MODULES.
6. Golden diff exact match (HARD GATE): docs/golden/loadattr_hir.txt
captured at C++ baseline (push 35 aa430e6). After this push, the
SLOT block of the golden must remain byte-identical. testkeeper to
re-run test_phoenix_jit_loadattr_golden and post the diff result —
if non-empty, HALT push 42.
7. Py_REF_DEBUG delta: this is the FIRST LoadAttr conversion; baseline
from C++ = +1772 (per supervisor 00:00:17Z). After SLOT lands, run
pydebug Python with sys.gettotalrefcount() before/after the LoadAttr
test suite, post delta. Must be within 5% of +1772 baseline.
8. ARM64 commit-match + 4 stash markers: per gatekeeper item #11,
testkeeper's gate excerpt must show BOTH BINARY_MATCH (clean) lines
+ commit-match + stash markers. Standard discipline.
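The tolerance check in item 7 can be sketched as follows. Assumptions:
"within 5%" is read as |delta - baseline| <= 5% of baseline, and the helper
names are hypothetical. sys.gettotalrefcount exists only on Py_REF_DEBUG
(pydebug) builds, so the measurement helper returns None elsewhere.

```python
import sys

BASELINE_DELTA = 1772  # C++ baseline quoted in item 7
TOLERANCE = 0.05       # "within 5%" from item 7


def within_tolerance(delta, baseline=BASELINE_DELTA, tolerance=TOLERANCE):
    """True if delta is within tolerance * baseline of the baseline."""
    return abs(delta - baseline) <= abs(baseline) * tolerance


def measure_refcount_delta(workload):
    """Total-refcount delta across workload(), or None on non-debug builds."""
    get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:
        return None
    before = get_total()
    workload()
    return get_total() - before
```

Under that reading the accepted window is roughly 1684 to 1860. A delta
outside it would fail item 7.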
SLOT-specific items:
S1. Slot offset comes from _PyAttrCache->index via hir_builder_get_attr_cache
bridge (NOT hand-coded). The cache is the canonical CPython source for
LOAD_ATTR_SLOT specialization data. Matches C++ behavior at builder.cpp:3946.
S2. LoadField uses TOptObject (OBJECT|NULLPTR) for the slot output type,
matching C++ tc.emitLoadField(attr, receiver, "slot", slot_offset, TOptObject)
at builder.cpp:3970. Constructed via hir_type_union(HIR_TYPE_OBJECT,
HIR_TYPE_NULLPTR) — semantic equivalent of C++ TOptObject.
S3. Subclass check: C version returns 0 (fallback to generic LoadAttr) if
slot_type has subclasses (PyDict_GET_SIZE(tp_subclasses) > 0). Matches
C++ guard at builder.cpp:3956 — without the subclass check, GuardType
against a parent type would fail when receiver is a subclass instance,
causing unnecessary deopt.
S4. dk_version Guard (per theologian rubric): for SLOT specifically, the
GuardType against slot_type SERVES the version-check role. CPython's
_PyAttrCache->version is checked at adaptive-specializer time to set
LOAD_ATTR_SLOT; the JIT GuardType then ensures the runtime receiver
type matches what was cached. This satisfies S4 in spirit (the
type-version guard precedes LoadField), even though the constant
materialization is not done via a separate Guard<dk_version> instruction.
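The control flow described by the rubric (cache lookup → type resolution →
subclass check → GuardType → LoadField → CheckField) can be modeled as
below. This is a sketch under stated assumptions: the function name, the
dict-based type records and the instruction strings are illustrative and are
not the real HIR printer output. Falling back when the cache is missing or
the type is not found is assumed by analogy with the documented return
convention (1 on success, 0 to fall back).

```python
def emit_load_attr_slot(cache, types_by_tag, receiver="v0", out="v1"):
    """Model of the SLOT emit path.

    cache: (type_version, slot_offset) or None if no cache entry.
    types_by_tag: version tag -> {"name": str, "has_subclasses": bool}.
    Returns (1, instructions) on success, (0, []) to fall back.
    """
    if cache is None:
        return 0, []
    type_version, slot_offset = cache  # S1: offset comes from the cache
    if type_version == 0:              # unspecialized sentinel
        return 0, []
    slot_type = types_by_tag.get(type_version)
    if slot_type is None:
        return 0, []
    if slot_type["has_subclasses"]:    # S3: exact guard would deopt
        return 0, []
    instructions = [
        # Guard narrows the receiver in place (same register in and out).
        "%s = GuardType<%sExact> %s" % (receiver, slot_type["name"], receiver),
        # S2: slot load may yield NULL, so the output type is optional.
        "%s = LoadField<slot@%d, OptObject> %s" % (out, slot_offset, receiver),
        # NULL slot raises AttributeError, blamed on the receiver.
        "%s = CheckField<guilty=%s> %s" % (out, receiver, out),
    ]
    return 1, instructions


types_by_tag = {
    7: {"name": "Pt", "has_subclasses": False},
    8: {"name": "Base", "has_subclasses": True},
}
ok, hir = emit_load_attr_slot((7, 16), types_by_tag)
```

The ordering is the point of the model: the guard is emitted before the
load, and every fallback happens before any instruction is emitted, so a 0
return leaves nothing to undo.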
================================================================
VERIFICATION (compile-clean pre-commit)
================================================================
cmake --build Python/jit_build/build --target phoenix_jit:
[ 96%] Built target jit
[100%] Built target phoenix_jit
0 errors. Pre-existing warnings only.
Pre-flight ARM64 (testkeeper 01:43:22Z, RelWithDebInfo PASS at HEAD ae8224e):
- JIT_ENABLE=1 ./python /tmp/g1_6_force.py → FORCE_COMPILE_OK ✓
- JIT_ENABLE=1 ./python /tmp/g1_5_auto_compile_capture.py → AUTO_COMPILE_OK ✓
- D-1775637417 fix is holding for production C++ findTypeByVersionTag
on ARM64. Delegation inherits this safety.
Pydebug ARM64 deferred to W8 per supervisor 01:45:32Z (pre-existing
libfmt.a linker issue, separate from SLOT scope).
Diff stat: 3 files changed, 56 insertions(+), 62 deletions(-).
builder.cpp: -46 +6 (LOAD_ATTR_SLOT case → delegating stub)
builder_emit_c.c: +42 (hir_builder_emit_load_attr_slot_c)
hir_c_api.cpp: -16 +6 (broken walk → delegation)