Today, we are releasing embedl/Cosmos-Reason2-2B-NVFP4A16, a new Blackwell-optimized variant of Cosmos Reason 2 that specifically targets the Jetson AGX Thor. With both W4A16 and NVFP4 variants available, we now support the full NVIDIA Jetson family.
This release marks the first public availability of NVFP4 quantization for Cosmos Reason 2, while maintaining the original model's reasoning quality.
Why NVFP4, and Why Now?
Cosmos Reason 2 is a multimodal reasoning VLM designed for physical AI workloads: text, image, and video inputs producing structured textual reasoning. These workloads are increasingly moving to the edge, where memory bandwidth and latency matter more than raw datacenter throughput.
With Blackwell, NVIDIA introduced NVFP4 tensor core acceleration, enabling efficient 4-bit floating-point weight execution. NVFP4A16 (FP4 weights, FP16 activations) significantly reduces model weight memory while leveraging native Blackwell hardware paths.
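The memory savings are easy to estimate. Here is an illustrative back-of-envelope calculation (our assumption for the overhead term: NVFP4 groups weights into 16-element blocks that each carry an FP8 scale, roughly 0.5 extra bits per weight; real footprints also include activations and the KV cache):

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30

params = 2e9  # rough parameter count for a 2B model

# FP16: 16 bits per weight.
fp16_gib = weight_memory_gib(params, 16)
# NVFP4: 4-bit weights plus ~0.5 bits/weight of per-block FP8 scales
# (assumed overhead for this sketch).
nvfp4_gib = weight_memory_gib(params, 4.5)

print(f"FP16 weights:  {fp16_gib:.2f} GiB")
print(f"NVFP4 weights: {nvfp4_gib:.2f} GiB")
```

For a 2B-parameter model this works out to roughly 3.7 GiB of weights at FP16 versus about 1 GiB at NVFP4, which is what makes the format attractive on memory-bandwidth-constrained edge devices.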
Important to note:
- NVFP4 acceleration is Blackwell-only. Platforms prior to Blackwell (Ampere, Ada, Hopper) do not have native NVFP4 tensor core support.
- Backend support still matters. NVFP4 performance depends on inference stack maturity (e.g., vLLM version and kernel implementations).
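To give a concrete sense of what the format does, here is a minimal quantize-dequantize sketch of NVFP4-style block quantization: FP4 (E2M1) values with a shared scale per 16-element block. This is a simplification for illustration only, not the hardware implementation; in particular, real NVFP4 stores each block's scale in FP8 (E4M3) together with a per-tensor scale, while this sketch keeps the scale in full precision.

```python
import numpy as np

# The eight non-negative magnitudes representable in FP4 E2M1.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def nvfp4_roundtrip(w, block: int = 16) -> np.ndarray:
    """Quantize-dequantize a 1-D array with per-block scaling.

    Simplified sketch: the per-block scale stays in full precision here,
    whereas real NVFP4 stores it in FP8 (E4M3) plus a per-tensor scale.
    """
    w = np.asarray(w, dtype=np.float64)
    out = np.empty_like(w)
    for i in range(0, len(w), block):
        chunk = w[i:i + block]
        # Scale so the largest magnitude in the block maps to the grid max.
        scale = max(np.abs(chunk).max() / FP4_GRID[-1], 1e-12)
        # Snap each |value| / scale to the nearest grid point, keep the sign.
        idx = np.abs(np.abs(chunk)[:, None] / scale - FP4_GRID).argmin(axis=1)
        out[i:i + block] = np.sign(chunk) * FP4_GRID[idx] * scale
    return out
```

Because each block of 16 weights gets its own scale, the quantization error tracks the local dynamic range of the weights rather than the whole tensor's, which is a large part of why NVFP4 preserves accuracy better than naive per-tensor 4-bit schemes.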
If you're not deploying on Blackwell hardware, our W4A16 variant (https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16) remains the recommended option.
Benchmarked on Jetson AGX Thor
We benchmarked NVFP4A16 on Jetson AGX Thor across text, image, and video workloads.
On the official Physical AI Bench Reason Task evaluation (https://huggingface.co/spaces/shi-labs/physical-ai-bench-leaderboard), NVFP4A16 slightly outperforms W4A16, and its overall score remains close to the FP16 baseline.
For edge robotics and real-time systems, on-device latency and throughput are often the critical metrics. You can find our detailed benchmarks in the model card (https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16).
What’s Next: W4A16-Edge2
Our next release will be a new mixed-precision variant, embedl/Cosmos-Reason2-2B-W4A16-Edge2.
The goal:
- Preserve the reasoning accuracy of the original bfloat16 model
- Match the latency of the fully quantized W4A16 variant
- Balance weight precision where it matters most
This upcoming release targets the sweet spot between accuracy and responsiveness for robotics and real-time systems.
Try It Today
You can try the new NVFP4A16 model here:
https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16
The model card includes a Jetson vLLM container example to get started quickly on Thor.
If you’re deploying Cosmos Reason 2 on Blackwell edge platforms, NVFP4A16 is now the most efficient option in the lineup while remaining robust on accuracy.
More edge-optimized variants are coming soon.