Tag: LLM

All the articles with the tag "LLM".

From Online Softmax to FlashAttention

1 Feb, 2026

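Starting from the numerically stable Safe Softmax, this post derives the recurrence behind Online Softmax and shows how FlashAttention builds on it to fuse attention into a single-pass, IO-aware algorithm.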

No More GRPO Anxiety! Can Offline Reinforcement Learning Build the Strongest Math Base Model? A Technical Deep Dive into PCL-Reasoner-V1.5

30 Dec, 2025

PCL-Reasoner-V1.5 applies offline reinforcement learning to Qwen2.5-32B to reach SOTA results on AIME 2024/2025. This post examines Offline RL as an alternative technical path to GRPO.