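The top open-source LLMs have some interesting differences in architecture and training methods.
I read through all of the papers in depth to break them down in this video (also my debut on the YC YouTube channel 😅)
Check it out and let me know what you think!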

August 29, 2025
OpenAI recently released its first open-weights model since GPT-2, entering a field led by DeepSeek and Alibaba's Qwen.
Ankit (@GuptaAnkitV) breaks down these top OSS models, including what sets them apart under the hood: mixture-of-experts, long-context training, and post-training techniques that shape reasoning and alignment—and how different design choices lead to surprisingly similar performance.
00:00 – OpenAI OSS Launch
01:00 – Comparing Open Source LLM Architectures
01:46 – GPT OSS Overview
02:37 – Under The Hood of GPT OSS
03:25 – Qwen-3 Architecture
04:17 – Qwen-3 Training
05:12 – Qwen-3 Post-Training
06:08 – Qwen-3 Reasoning & RL Innovations
06:52 – DeepSeek V3 Overview
07:40 – DeepSeek V3.1 Updates
08:39 – Attention Mechanism (MLA)
09:39 – Comparing Model Sizes
10:35 – Long Context Strategies
11:25 – Reflections on Methods
12:00 – Takeaways