Embedl's Blog on Deep Learning
Qwen 3.5 Optimized with FlashHead
Qwen 3.5 is a new generation of large language models designed for high-quality reasoning and multimodal tasks. FlashHea...
Faster Multi-Modal Reasoning with FlashHead Triton Kernel
FlashHead is built to reduce the cost of the LM head during inference. This update makes that path faster. The change is...
Edge AI Application of the Month
What the Challenge Is: The Edge AI Application of the Month is a hands-on challenge designed to push developers into buil...
Introducing hfviewer
The Hugging Face ecosystem already has model cards, spaces, checkpoints, benchmarks, and demos. What it has still been m...
The power of randomness: projections and rotations
"Nothing is more practical than a good theory," as Vladimir Vapnik liked to say when developing his theory of SVMs. This...
FlashHead for vLLM, made simple
Running FlashHead from Embedl with vLLM shouldn’t require any specialized imports or setup procedures. We are excited to...
Lightning-Fast Multimodal Edge Inference with Under 8GB RAM
Running advanced multi-modal reasoning models on edge hardware has traditionally required large GPUs and tens of gigabyt...
Cosmos Reason 2 Without the Quantization Trade-Off
We have just released embedl/Cosmos-Reason2-2B-W4A16-Edge2, a new mixed-precision variant of Cosmos Reason 2 that recove...