GitHub - sign-language-processing/segmentation: Sign language pose segmentation model on both the sentence and sign level
Pose segmentation model for sign language — signs and sentences — using CNN + Transformer with RoPE.
Usage
# Install pip install git+https://github.com/sign-language-processing/segmentation # Acquire a MediaPipe Holistic pose file and the corresponding video wget -O example.pose https://datasets.sigma-sign-language.com/poses/holistic/dgs_corpus/1413451-11105600-11163240_a.pose wget -O example.mp4 https://www.sign-lang.uni-hamburg.de/meinedgs/videos/1413451-11105600-11163240/1413451-11105600-11163240_1a1.mp4 # Run the model (video is linked into the ELAN file so it plays in ELAN) pose_to_segments --pose example.pose --elan output.eaf --video example.mp4
The model reads a .pose file and writes an ELAN (.eaf) annotation file with SIGN and SENTENCE tiers.
from pose_format import Pose from sign_language_segmentation.bin import segment_pose with open("example.pose", "rb") as f: pose = Pose.read(f) eaf, tiers = segment_pose(pose) # tiers["SIGN"] and tiers["SENTENCE"] are lists of {"start": int, "end": int} frame dicts
Server
# Build and run the inference server docker build -t segmentation-serve . docker run -p 8080:8080 -e PORT=8080 segmentation-serve # Segment a pose file (input/output are file paths or gs:// URIs) curl -X POST http://localhost:8080 \ -H "Content-Type: application/json" \ -d '{"input": "/path/to/input.pose", "output": "/path/to/output.eaf"}' # Health check curl http://localhost:8080/health
Training
Prerequisites
Requires the DGS Corpus and MediaPipe Holistic poses (internal datasets).
Docker (recommended)
# Build the training image docker build -f Dockerfile.train -t segmentation-train . # Train docker run --rm --gpus all \ -v /path/to/dgs-corpus:/data/dgs-corpus:ro \ -v /path/to/mediapipe-poses:/data/poses:ro \ -v $(pwd)/models:/app/models \ segmentation-train \ python -m sign_language_segmentation.train \ --corpus /data/dgs-corpus \ --poses /data/poses \ --hidden_dim 384 --encoder_depth 4 --attn_nhead 8 \ --batch_size 8 --num_frames 1024 \ --dice_loss_weight 1.5 \ --epochs 500 --patience 100 # Evaluate on dev split docker run --rm --gpus all \ -v /path/to/dgs-corpus:/data/dgs-corpus:ro \ -v /path/to/mediapipe-poses:/data/poses:ro \ -v $(pwd)/models:/app/models \ segmentation-train \ python -m sign_language_segmentation.evaluate \ --checkpoint /app/models/<run_name>/best.ckpt \ --corpus /data/dgs-corpus \ --poses /data/poses \ --split dev
Best hyperparameters and architecture details: dist/2026/README.md.
Local (development)
conda create --name segmentation python=3.12 -y
conda activate segmentation
pip install ".[dev]"
python -m sign_language_segmentation.train --corpus /path/to/dgs-corpus --poses /path/to/posesArchitecture
CNN-medium-attn + RoPE (2026):
- Stage 1: Two-stage UNet CNN — spatial compression over joints, then temporal context
- Stage 2: N-layer pre-norm Transformer with Rotary Position Embedding (RoPE)
- Two output heads: sign (gloss) BIO and phrase (sentence) BIO
See dist/2026/README.md for what worked, what didn't, and key bug fixes.
2023 Version (v2023)
Exact code for the paper Linguistically Motivated Sign Language Segmentation.
Citation
@inproceedings{moryossef-etal-2023-linguistically, title = "Linguistically Motivated Sign Language Segmentation", author = {Moryossef, Amit and Jiang, Zifan and M{\"u}ller, Mathias and Ebling, Sarah and Goldberg, Yoav}, editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023", month = dec, year = "2023", address = "Singapore", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-emnlp.846", doi = "10.18653/v1/2023.findings-emnlp.846", pages = "12703--12724", }