Another demo of the iPhone 17 Pro’s on-device LLM performance. This time with Ling mini 2.0 by @TheInclusionAI, a 16B MoE model with 1.4B active parameters, running at ~120 tok/s. Thanks to @awnihannun for the MLX DWQ 2-bit quants.