Qwen 3.5 Optimized with FlashHead

Qwen 3.5 is a new generation of large language models designed for high-quality reasoning and multimodal tasks. FlashHea...

May 20, 2026 10:10:10 AM
Faster Multi-Modal Reasoning with FlashHead Triton Kernel

FlashHead is built to reduce the cost of the LM head during inference. This update makes that path faster. The change is...

May 12, 2026 9:21:31 AM
Edge AI Application of the Month

What the Challenge Is: The Edge AI Application of the Month is a hands-on challenge designed to push developers into buil...

May 7, 2026 3:14:41 PM
Introducing hfviewer

The Hugging Face ecosystem already has model cards, spaces, checkpoints, benchmarks, and demos. What it has still been m...

May 4, 2026 12:01:13 PM
The power of randomness: projections and rotations

"Nothing is more practical than a good theory," as Vladimir Vapnik liked to say when developing his theory of SVMs. This...

Apr 30, 2026 10:17:28 AM
FlashHead for vLLM, made simple

Running FlashHead from Embedl with vLLM shouldn’t require any specialized imports or setup procedures. We are excited to...

Apr 20, 2026 7:51:31 AM
Lightning-Fast Multimodal Edge Inference with Under 8GB RAM

Running advanced multi-modal reasoning models on edge hardware has traditionally required large GPUs and tens of gigabyt...

Mar 10, 2026 7:46:19 PM
Cosmos Reason 2 Without the Quantization Trade-Off

We have just released embedl/Cosmos-Reason2-2B-W4A16-Edge2, a new mixed-precision variant of Cosmos Reason 2 that recove...

Feb 27, 2026 12:50:25 PM