Today, we are releasing embedl/Cosmos-Reason2-2B-NVFP4A16, a new Blackwell-optimized variant of Cosmos Reason 2 that specifically targets the Jetson AGX Thor. With both W4A16 and NVFP4 variants available, we now support the full NVIDIA Jetson family.
This release marks the first public availability of NVFP4 quantization for Cosmos Reason 2, while maintaining the original model's reasoning quality.
Why NVFP4, and Why Now?
Cosmos Reason 2 is a multimodal reasoning VLM designed for physical AI workloads: text, image, and video inputs producing structured textual reasoning. These workloads are increasingly moving to the edge, where memory bandwidth and latency matter more than raw datacenter throughput.
With Blackwell, NVIDIA introduced NVFP4 tensor core acceleration, enabling efficient 4-bit floating-point weight execution. NVFP4A16 (FP4 weights, FP16 activations) significantly reduces model weight memory while leveraging native Blackwell hardware paths.
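The memory savings are easy to estimate. Here is an illustrative back-of-envelope calculation (our assumption for the overhead term: NVFP4 groups weights into 16-element blocks that each carry an FP8 scale, roughly 0.5 extra bits per weight; real footprints also include activations and the KV cache):

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30

params = 2e9  # rough parameter count for a 2B model

# FP16: 16 bits per weight.
fp16_gib = weight_memory_gib(params, 16)
# NVFP4: 4-bit weights plus ~0.5 bits/weight of per-block FP8 scales
# (assumed overhead for this sketch).
nvfp4_gib = weight_memory_gib(params, 4.5)

print(f"FP16 weights:  {fp16_gib:.2f} GiB")
print(f"NVFP4 weights: {nvfp4_gib:.2f} GiB")
```

For a 2B-parameter model this works out to roughly 3.7 GiB of weights at FP16 versus about 1 GiB at NVFP4, which is what makes the format attractive on memory-bandwidth-constrained edge devices.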
Important to note:
- NVFP4 acceleration is Blackwell-only. Platforms prior to Blackwell (Ampere, Ada, Hopper) do not have native NVFP4 tensor core support.
- Backend support still matters. NVFP4 performance depends on inference stack maturity (e.g., vLLM version and kernel implementations).
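To give a concrete sense of what the format does, here is a minimal quantize-dequantize sketch of NVFP4-style block quantization: FP4 (E2M1) values with a shared scale per 16-element block. This is a simplification for illustration only, not the hardware implementation; in particular, real NVFP4 stores each block's scale in FP8 (E4M3) together with a per-tensor scale, while this sketch keeps the scale in full precision.

```python
import numpy as np

# The eight non-negative magnitudes representable in FP4 E2M1.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def nvfp4_roundtrip(w, block: int = 16) -> np.ndarray:
    """Quantize-dequantize a 1-D array with per-block scaling.

    Simplified sketch: the per-block scale stays in full precision here,
    whereas real NVFP4 stores it in FP8 (E4M3) plus a per-tensor scale.
    """
    w = np.asarray(w, dtype=np.float64)
    out = np.empty_like(w)
    for i in range(0, len(w), block):
        chunk = w[i:i + block]
        # Scale so the largest magnitude in the block maps to the grid max.
        scale = max(np.abs(chunk).max() / FP4_GRID[-1], 1e-12)
        # Snap each |value| / scale to the nearest grid point, keep the sign.
        idx = np.abs(np.abs(chunk)[:, None] / scale - FP4_GRID).argmin(axis=1)
        out[i:i + block] = np.sign(chunk) * FP4_GRID[idx] * scale
    return out
```

Because each block of 16 weights gets its own scale, the quantization error tracks the local dynamic range of the weights rather than the whole tensor's, which is a large part of why NVFP4 preserves accuracy better than naive per-tensor 4-bit schemes.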
If you're not deploying on Blackwell hardware, our W4A16 variant (https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16) remains the recommended option.
Benchmarked on Jetson AGX Thor
We benchmarked NVFP4A16 on Jetson AGX Thor across text, image, and video workloads.
On the official Physical AI Bench Reason Task evaluation (https://huggingface.co/spaces/shi-labs/physical-ai-bench-leaderboard), NVFP4A16 slightly outperforms W4A16, and its overall score remains close to the FP16 baseline.
For edge robotics and real-time systems, on-device latency and throughput are often the critical metrics. You can find our detailed benchmarks in the model card (https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16).
What’s Next: W4A16-Edge2
Our next release will be a new mixed-precision variant, embedl/Cosmos-Reason2-2B-W4A16-Edge2.
The goal:
- Preserve the reasoning accuracy of the original bfloat16 model
- Match the latency of the fully quantized W4A16 variant
- Balance weight precision where it matters most
This upcoming release targets the sweet spot between accuracy and responsiveness for robotics and real-time systems.
Try It Today
You can try the new NVFP4A16 model here:
https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16
The model card includes a Jetson vLLM container example to get started quickly on Thor.
If you’re deploying Cosmos Reason 2 on Blackwell edge platforms, NVFP4A16 is now the most efficient option in the lineup while remaining robust on accuracy.
More edge-optimized variants are coming soon.