◐ Shell
clean mode source ↗

Dev-X25874 - Overview

Pinned Loading

  1. LoRA fine-tuned a 120B LLM to classify LLVM compiler pass interactions. 5.4% → 75.0% accuracy.

    Python

  2. GPU-resident persistent kernel with CUDA Dynamic Parallelism. Zero CPU intervention, lock-free task queue.

    Cuda

  3. Hybrid KDA+MLA attention architecture with 75% memory reduction and 6x faster long-context inference.

    Python

  4. JAX-native reward modelling toolkit for RL fine-tuning of LLMs. Composable rewards, distributed via pjit.

    Python

  5. Quantisation-native LLM inference engine for 1-bit and ternary models. All matrix ops run via XNOR+POPCNT, never dequantized.

    Rust

  6. High-performance CUDA kernel library — matrix ops, fused attention, parallel reductions. 3+ TFLOPS on Ampere.

    Cuda