Kernels · quantization · a runtime trust layer · no API key

Run the VLA on the robot,
not just in the demo.

A vision-language-action model folds laundry in the lab. Put it on the actual robot and it stalls, and not because it can't do the task. It can't do it fast enough. The capability is there. What's left is engineering: get it inside a latency budget, then keep it safe once it's running.


The deploy gap (control rate, Hz)

End-to-end VLA today: 3–5 Hz

A robot arm, to move smoothly: 50–100 Hz

A 10–30× gap. This repo closes it with the levers that actually pay off at batch 1, then adds a supervisor so the fast policy is also one you can leave running.
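To make the gap concrete, here is a small sketch that turns the control rates quoted above into per-action time budgets. Only the four rates come from this page; the function and variable names are illustrative.

```python
def period_ms(rate_hz: float) -> float:
    """Time available per control step, in milliseconds."""
    return 1000.0 / rate_hz

# Rates quoted on this page.
vla_today_hz = (3.0, 5.0)      # end-to-end VLA today
arm_target_hz = (50.0, 100.0)  # what a robot arm needs to move smoothly

# What the VLA takes per action today: roughly 333 ms down to 200 ms.
vla_period_ms = [period_ms(r) for r in vla_today_hz]

# What the arm allows per action: 20 ms down to 10 ms.
arm_budget_ms = [period_ms(r) for r in arm_target_hz]

# Smallest and largest ratio between the two ranges.
gap_min = arm_target_hz[0] / vla_today_hz[1]  # 50 / 5
gap_max = arm_target_hz[1] / vla_today_hz[0]  # 100 / 3

print(f"VLA today: {vla_period_ms[0]:.0f} to {vla_period_ms[1]:.0f} ms per action")
print(f"Arm budget: {arm_budget_ms[0]:.0f} to {arm_budget_ms[1]:.0f} ms per action")
print(f"Gap: {gap_min:.0f}x to {gap_max:.0f}x")
```

The ratio works out to roughly 10× at the low end and just over 33× at the high end, in line with the rounded 10–30× figure.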

5.9× CUDA-graph speedup: measured on a T4, beats torch.compile

0.089 ms / action, best case: bf16 + graph + action-chunking

4 experiments on low-bit: the win and the negative, same rigor

0 API keys, 0 GPU needed: the console runs the real code, free

Deploy-compiler

Set a budget. It picks the best config off the real-L4 frontier, live.

Deployment budget

Minimize

Max latency: 12.4 ms/action

Max footprint: 51 MB

Max action error (fidelity): rMSE ≤ 0.05

Max staleness: 49 steps

Action-chunking runs many actions per sampler call: cheaper per action, but the last one is more stale. This knob sets how stale you'll allow.
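The trade-off can be sketched in a few lines. This assumes staleness is counted in control steps since the observation that produced the chunk, so the last action of a K-action chunk is K - 1 steps stale; the repo may define it differently. The 4 ms sampler cost in the usage lines is a made-up number, not a measurement.

```python
from dataclasses import dataclass

@dataclass
class ChunkStats:
    ms_per_action: float       # amortized sampler cost per emitted action
    max_staleness_steps: int   # age of the last action in the chunk

def chunk_stats(sampler_call_ms: float, chunk_size: int) -> ChunkStats:
    """Amortized cost and worst-case staleness for one sampler call.

    Assumption: one sampler call emits `chunk_size` actions, executed one
    per control step, so action i (0-based) is i steps stale when it runs.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return ChunkStats(
        ms_per_action=sampler_call_ms / chunk_size,
        max_staleness_steps=chunk_size - 1,
    )

def largest_chunk(max_staleness_steps: int) -> int:
    """Largest chunk size that keeps the last action within the staleness cap."""
    if max_staleness_steps < 0:
        raise ValueError("max_staleness_steps must be >= 0")
    return max_staleness_steps + 1

# Hypothetical 4 ms sampler call, chunk of 8 actions.
stats = chunk_stats(sampler_call_ms=4.0, chunk_size=8)
print(stats.ms_per_action, stats.max_staleness_steps)
```

Under that definition, a staleness cap of 49 steps would permit chunks of up to 50 actions.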

Real-L4 frontier · 27 configs (legend: low error to high error)
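A minimal sketch of budget-driven selection: filter the frontier to the configs that satisfy every limit, then take the one that minimizes the chosen objective. The `Config` and `Budget` types, the `pick` function, and every config name and number in `demo_frontier` are hypothetical placeholders, not the measured frontier and not necessarily how the repo implements it. Only the four budget values are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Config:
    name: str
    latency_ms: float      # per action
    footprint_mb: float
    rmse: float            # action error relative to a reference policy
    staleness_steps: int

@dataclass(frozen=True)
class Budget:
    max_latency_ms: float
    max_footprint_mb: float
    max_rmse: float
    max_staleness_steps: int

def fits(c: Config, b: Budget) -> bool:
    """True if the config satisfies every limit in the budget."""
    return (
        c.latency_ms <= b.max_latency_ms
        and c.footprint_mb <= b.max_footprint_mb
        and c.rmse <= b.max_rmse
        and c.staleness_steps <= b.max_staleness_steps
    )

def pick(configs: Sequence[Config], budget: Budget,
         minimize: str = "latency_ms") -> Optional[Config]:
    """Best feasible config by the chosen objective, or None if nothing fits."""
    feasible = [c for c in configs if fits(c, budget)]
    if not feasible:
        return None
    return min(feasible, key=lambda c: getattr(c, minimize))

# Invented frontier entries, for illustration only.
demo_frontier = [
    Config("fp32-eager", 20.0, 120.0, 0.00, 0),
    Config("bf16-graph", 3.0, 48.0, 0.01, 0),
    Config("bf16-graph-chunk8", 0.6, 48.0, 0.01, 7),
    Config("int4-graph-chunk16", 0.5, 20.0, 0.08, 15),
]

# Budget values as shown in the panel above.
budget = Budget(max_latency_ms=12.4, max_footprint_mb=51.0,
                max_rmse=0.05, max_staleness_steps=49)

best = pick(demo_frontier, budget, minimize="latency_ms")
print(best.name if best else "no config fits this budget")
```

Returning `None` rather than the nearest miss keeps the failure explicit: a budget nothing satisfies should be loosened by the user, not silently violated.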

Safety supervisor

The runtime trust layer. It vets every action before it reaches a motor, live, on this server.

Measured, not asserted. On real DROID robot actions plus labelled faults, the drift detector scores AUC 0.99; tuned to a 1% false-alarm budget, it catches 91% of faults. The eval is in the repo.
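"Tuned to a false-alarm budget" can be read as: choose the alarm threshold from the scores of normal actions so that at most the budgeted fraction of them would alarm, then measure how many faults exceed it. The sketch below illustrates that procedure on synthetic scores. It is an assumption about the general technique, not the repo's eval code, and it does not reproduce the 0.99 AUC or 91% figures.

```python
import math
from typing import Sequence

def threshold_for_false_alarm(normal_scores: Sequence[float],
                              max_false_alarm_rate: float) -> float:
    """Lowest observed score t such that the share of normal scores
    strictly above t is at most `max_false_alarm_rate`."""
    if not normal_scores:
        raise ValueError("need at least one normal score")
    if not 0.0 <= max_false_alarm_rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(normal_scores)
    n = len(ordered)
    # How many normal samples are allowed to raise an alarm.
    # The small epsilon guards against float error in rate * n.
    allowed = min(math.floor(max_false_alarm_rate * n + 1e-9), n - 1)
    return ordered[n - allowed - 1]

def rate_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Synthetic scores, purely illustrative.
normal = [i / 100 for i in range(100)]   # 0.00 .. 0.99
faults = [0.5, 0.985, 1.5, 2.0]

t = threshold_for_false_alarm(normal, 0.01)
false_alarm_rate = rate_above(normal, t)   # held to the 1% budget
detection_rate = rate_above(faults, t)     # share of faults caught
print(t, false_alarm_rate, detection_rate)
```

Fixing the false-alarm rate first and reporting the detection rate second keeps the comparison honest: a detector can always catch more faults by alarming more often.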

Send it an action to vet

The policy is calibrated on a normal posture. Pick what to throw at it.

Pick an action and hit "Vet this action".

The verdict, the action actually sent, and the running governance log show up here.
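As a rough picture of what vetting could look like, here is a toy supervisor that scores each action against calibration statistics, returns a verdict plus the action it would actually send, and appends to a running log. Everything in it is a hypothetical stand-in: the `Supervisor` class, the z-score check, the `max_z` threshold, and the hold-last-safe fallback are assumptions for illustration, not the repo's supervisor.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

@dataclass
class Supervisor:
    mean: Sequence[float]   # per-dimension mean from calibration (hypothetical)
    std: Sequence[float]    # per-dimension std from calibration (hypothetical)
    max_z: float = 4.0      # hypothetical drift threshold
    last_safe: List[float] = field(default_factory=list)
    log: List[Tuple[str, float]] = field(default_factory=list)

    def vet(self, action: Sequence[float]) -> Tuple[str, List[float]]:
        """Return (verdict, action_sent) and record the decision in the log."""
        # Largest per-dimension deviation from the calibrated posture.
        z = max(abs(a - m) / s for a, m, s in zip(action, self.mean, self.std))
        if z <= self.max_z:
            verdict, sent = "pass", list(action)
            self.last_safe = sent
        else:
            verdict = "block"
            # Fall back to the last accepted action, or the calibrated mean.
            sent = list(self.last_safe) if self.last_safe else list(self.mean)
        self.log.append((verdict, z))
        return verdict, sent

sup = Supervisor(mean=[0.0, 0.0], std=[1.0, 1.0])
first = sup.vet([0.5, -0.5])    # within the calibrated range
second = sup.vet([10.0, 0.0])   # far outside it
print(first, second, sup.log)
```

The point of the shape is that the motor only ever receives the second element of the returned pair, never the raw policy output.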

Efficiency gets it onto the robot. The supervisor lets it stay.

Everything here is measured and reproducible, on the hardware robots actually carry. The code, the four-experiment low-bit study, and the full write-up are public.