embodied-efficiency deploy console for VLAs
The deploy-compiler picks from a real L4 Pareto frontier (latencies measured, not modelled). The supervisor below runs the actual governance code and vets each action live.
Kernels · quantization · a runtime trust layer · no API key

Run the VLA on the robot, not just in the demo.

A vision-language-action (VLA) model folds laundry in the lab. Put it on the actual robot and it stalls, not because it can't do the task, but because it can't do it fast enough. The capability is there. What's left is engineering: get it inside a latency budget, then keep it safe once it's running.

The deploy gap (control rate)
End-to-end VLA today: 3–5 Hz
Robot arm, for smooth motion: 50–100 Hz

A gap of roughly 10–30×. This repo closes it with the levers that actually pay off at batch 1, then adds a supervisor so the fast policy is also one you can leave running.
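The gap is plain arithmetic on the control rates above, and it is easier to reason about as a per-action time budget. The sketch below (the helper names `budget_ms` and `required_speedup` are illustrative, not from the repo) converts each rate into milliseconds per action and computes the extreme ratios: 10× (5 Hz up to 50 Hz) and about 33× (3 Hz up to 100 Hz).

```python
def budget_ms(rate_hz: float) -> float:
    """Wall-clock time available per action at a given control rate."""
    return 1000.0 / rate_hz


def required_speedup(current_hz: float, target_hz: float) -> float:
    """How many times faster the policy must act to reach the target rate."""
    return target_hz / current_hz


if __name__ == "__main__":
    for hz in (3, 5, 50, 100):
        print(f"{hz:>3} Hz -> {budget_ms(hz):6.1f} ms per action")
    # Smallest gap: fastest VLA (5 Hz) vs. the low end of the arm range (50 Hz).
    # Largest gap: slowest VLA (3 Hz) vs. the high end of the arm range (100 Hz).
    print(f"speedup needed: {required_speedup(5, 50):.0f}x to {required_speedup(3, 100):.0f}x")
```

Put differently: a policy that spends 200–333 ms per action has to get down to 10–20 ms.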

5.9× CUDA-graph speedup, measured on a T4 (beats torch.compile)
0.089 ms per action, best case: bf16 + CUDA graph + action-chunking
4 low-bit experiments: the win and the negative result, with the same rigor
0 API keys, 0 GPUs needed: the console runs the real code, for free

Deploy-compiler

Set a budget. It picks the best config off the real-L4 frontier, live.

Deployment budget

12.4 ms/action · 51 MB · rMSE ≤ 0.05 · 49 steps

Action-chunking emits several actions per sampler call: each action is cheaper, but the last one in a chunk is more stale. This knob sets how much staleness you'll allow.
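The trade-off is easy to see with a toy model. The sketch below assumes (hypothetically; the console's own staleness definition may differ) that one sampler call taking `call_ms` returns a chunk of H actions executed back to back, so the per-action cost is `call_ms / H` and the last action runs H − 1 control steps after the first. `ChunkPlan`, `plan_chunk` and the 10 ms call time are illustrative names and numbers, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ChunkPlan:
    chunk_size: int       # actions produced per sampler call
    ms_per_action: float  # sampler latency amortized over the chunk
    stale_steps: int      # control steps the last action lags behind the first


def plan_chunk(call_ms: float, max_stale_steps: int) -> ChunkPlan:
    """Largest chunk whose last action is at most `max_stale_steps` stale.

    Hypothetical model: one sampler call taking `call_ms` yields H actions that
    are executed back to back, so the last one runs H - 1 steps after the first.
    """
    if call_ms <= 0 or max_stale_steps < 0:
        raise ValueError("call_ms must be > 0 and max_stale_steps >= 0")
    h = max_stale_steps + 1
    return ChunkPlan(chunk_size=h, ms_per_action=call_ms / h, stale_steps=h - 1)


if __name__ == "__main__":
    for stale in (0, 7, 49):
        p = plan_chunk(call_ms=10.0, max_stale_steps=stale)
        print(f"stale<={stale:>2}: chunk={p.chunk_size:>2}, {p.ms_per_action:.2f} ms/action")
```

Under this model, allowing 49 steps of staleness means chunks of 50, which divides a 10 ms call down to 0.2 ms per action. The cost is that the 50th action is based on an observation 49 steps old.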

Figure: real-L4 frontier · 27 configs, shaded from low error to high error
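Picking "the best config off the frontier" comes down to two steps: drop dominated configs, then choose among those that fit the budget. The sketch below is a hypothetical version, not the repo's compiler. It assumes each config has per-action latency, memory and rMSE, and that "best" means fastest within budget (ties go to lower error). The config names and numbers in `EXAMPLE_CONFIGS` are made-up placeholders, not the real-L4 measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    latency_ms: float  # per action
    memory_mb: float
    rmse: float        # action error relative to the reference policy


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.memory_mb <= b.memory_mb and a.rmse <= b.rmse
    better = a.latency_ms < b.latency_ms or a.memory_mb < b.memory_mb or a.rmse < b.rmse
    return no_worse and better


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep only configs that no other config dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def pick(configs: list[Config], max_latency_ms: float, max_memory_mb: float,
         max_rmse: float) -> Config | None:
    """Fastest frontier config inside the budget (ties -> lower error); None if nothing fits."""
    feasible = [c for c in pareto_frontier(configs)
                if c.latency_ms <= max_latency_ms
                and c.memory_mb <= max_memory_mb
                and c.rmse <= max_rmse]
    return min(feasible, key=lambda c: (c.latency_ms, c.rmse), default=None)


# Made-up placeholder numbers for illustration only; NOT the real-L4 measurements.
EXAMPLE_CONFIGS = [
    Config("fp32-eager", 40.0, 200.0, 0.00),
    Config("bf16-eager", 20.0, 120.0, 0.01),  # dominated by bf16-graph
    Config("bf16-graph", 8.0, 100.0, 0.01),
    Config("int8-graph", 9.0, 50.0, 0.03),
    Config("int4-graph", 5.0, 40.0, 0.08),
]

if __name__ == "__main__":
    best = pick(EXAMPLE_CONFIGS, max_latency_ms=12.4, max_memory_mb=51, max_rmse=0.05)
    print(best)
```

With the budget shown above (12.4 ms/action, 51 MB, rMSE ≤ 0.05), only the placeholder `int8-graph` fits. Loosening the memory cap lets the faster `bf16-graph` win. Whether to minimize latency or error within the budget is a design choice; swapping the `key` in `pick` changes it.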

Safety supervisor

The runtime trust layer. It vets every action before it reaches a motor, live, on this server.

Measured, not asserted. On real DROID robot actions plus labelled faults, the drift detector scores an AUC of 0.99; tuned to a 1% false-alarm budget, it catches 91% of faults. The eval is in the repo.
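These two numbers measure different things. AUC is threshold-free: it is the probability that a fault scores higher than a normal action. The "1% false-alarm budget" figure fixes a threshold using normal data only, then counts how many labelled faults clear it. The sketch below shows the usual way to compute both. It assumes only that the detector emits one scalar drift score per action, with higher meaning more anomalous; the detector itself and the helper names are hypothetical.

```python
import math


def threshold_for_false_alarm(normal_scores, budget=0.01):
    """Threshold such that at most `budget` of normal scores lie strictly above it."""
    if not normal_scores or not 0 <= budget < 1:
        raise ValueError("need normal scores and 0 <= budget < 1")
    s = sorted(normal_scores)
    k = math.floor(budget * len(s))  # normal samples allowed to raise an alarm
    return s[len(s) - k - 1]


def catch_rate(fault_scores, threshold):
    """Fraction of labelled faults whose score raises an alarm (score > threshold)."""
    return sum(x > threshold for x in fault_scores) / len(fault_scores)


def auc(normal_scores, fault_scores):
    """Threshold-free ROC AUC: P(fault score > normal score), ties count half."""
    wins = 0.0
    for f in fault_scores:
        for n in normal_scores:
            wins += 1.0 if f > n else 0.5 if f == n else 0.0
    return wins / (len(fault_scores) * len(normal_scores))
```

The pairwise AUC loop is O(n·m), which is fine for a sketch. For large evals, a sort-based rank computation gives the same value faster.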

Send it an action to vet

The policy is calibrated on a normal posture. Pick what to throw at it.

Pick an action and hit "Vet this action."

The verdict, the action actually sent, and the running governance log show up here.
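The supervisor's output has three parts: a verdict, the action actually sent, and a log entry. A minimal sketch of that shape is below. It calibrates a per-joint mean and standard deviation on normal postures, scores each incoming action by its largest per-joint z-score, and then passes it, clamps it back toward the calibrated range, or rejects it and holds the last safe action. The class, the z-score rule and the `clamp_z`/`reject_z` thresholds are all hypothetical stand-ins, not the repo's governance code or its drift detector.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    """Hypothetical per-joint z-score gate; not the repo's governance code."""
    mean: list[float]
    std: list[float]
    clamp_z: float = 3.0   # above this, pull the action back inside mean +/- clamp_z * std
    reject_z: float = 6.0  # above this, refuse the action and hold the last safe one
    log: list[dict] = field(default_factory=list)
    last_safe: list[float] | None = None

    def __post_init__(self) -> None:
        if self.last_safe is None:
            self.last_safe = list(self.mean)  # before any action, "safe" = calibrated posture

    @classmethod
    def calibrate(cls, normal_actions: list[list[float]], **kw) -> Supervisor:
        cols = list(zip(*normal_actions))  # one tuple of samples per joint
        mean = [statistics.fmean(c) for c in cols]
        std = [max(statistics.pstdev(c), 1e-6) for c in cols]  # floor avoids divide-by-zero
        return cls(mean=mean, std=std, **kw)

    def vet(self, action: list[float]) -> tuple[str, list[float]]:
        z = [(a - m) / s for a, m, s in zip(action, self.mean, self.std)]
        score = max(abs(v) for v in z)
        if score <= self.clamp_z:
            verdict, sent = "PASS", list(action)
        elif score <= self.reject_z:
            verdict = "CLAMP"
            sent = [min(max(a, m - self.clamp_z * s), m + self.clamp_z * s)
                    for a, m, s in zip(action, self.mean, self.std)]
        else:
            verdict, sent = "REJECT", list(self.last_safe)
        if verdict != "REJECT":
            self.last_safe = sent
        self.log.append({"step": len(self.log), "verdict": verdict,
                         "score": round(score, 3), "sent": sent})
        return verdict, sent
```

Holding the last safe action on rejection, rather than snapping back to the calibrated mean, avoids commanding a large jump at the moment something already looks wrong. Which fallback is right depends on the robot.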

Efficiency gets it onto the robot. The supervisor lets it stay.

Everything here is measured and reproducible on the hardware robots actually carry. The code, the four-experiment low-bit study, and the full write-up are public.