Paper Notes: FlashAttention
FlashAttention eliminates the O(N²) memory bottleneck of standard attention by tiling computation in SRAM with an online softmax trick, achieving exact results with no …
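The core of the online softmax trick can be sketched in plain Python: process the scores one tile at a time while carrying a running max, a running normalizer, and a running weighted sum, so the full score row is never held at once. This is a minimal illustrative sketch, not FlashAttention's actual kernel; the function name and tile size are assumptions for illustration.

```python
import math

def online_softmax_weighted_sum(scores, values, tile=4):
    """Softmax-weighted sum of `values`, computed tile by tile.

    Mirrors the online-softmax idea: keep a running max `m`, running
    normalizer `l`, and running accumulator `acc`, rescaling the old
    partial results whenever a new tile raises the max. The result is
    exact, yet no tile-spanning intermediate is ever materialized.
    (Hypothetical helper, not the real FlashAttention kernel.)
    """
    m = float("-inf")  # running max of scores seen so far
    l = 0.0            # running softmax normalizer
    acc = 0.0          # running weighted sum
    for start in range(0, len(scores), tile):
        s = scores[start:start + tile]
        v = values[start:start + tile]
        m_new = max(m, max(s))
        scale = math.exp(m - m_new)  # rescale old partials to new max
        l *= scale
        acc *= scale
        for si, vi in zip(s, v):
            p = math.exp(si - m_new)
            l += p
            acc += p * vi
        m = m_new
    return acc / l
```

Running it on a small input matches the ordinary two-pass softmax computation, which is the "exact results" property the paper emphasizes.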
PagedAttention applies OS virtual-memory paging concepts to KV-cache management, letting LLM inference allocate cache memory block by block on demand and achieve near-zero memory waste.
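The paging idea can be sketched as a block table per sequence, mapping logical cache positions to fixed-size physical blocks allocated from a free list, so at most one block per sequence is partially filled. A minimal sketch under assumed names (this is not vLLM's actual API; KV data storage is elided to a comment):

```python
class PagedKVCache:
    """Hypothetical sketch of PagedAttention-style KV-cache bookkeeping.

    Physical memory is split into fixed-size blocks; each sequence owns
    a block table (logical block index -> physical block id). Blocks are
    allocated on demand and returned to the free list when the sequence
    finishes, so waste is bounded by one partial block per sequence.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # free physical block ids
        self.tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}  # seq_id -> number of tokens cached

    def append(self, seq_id, kv_entry):
        """Reserve a slot for one token's KV entry; return (block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:      # current block full (or none yet)
            table.append(self.free.pop())  # allocate a fresh physical block
        block, offset = table[-1], n % self.block_size
        # a real implementation would write kv_entry into block storage here
        self.lengths[seq_id] = n + 1
        return block, offset

    def release(self, seq_id):
        """Return all of a finished sequence's blocks to the free list."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

With block_size 2, caching three tokens for one sequence uses exactly two blocks (one of them half full), and releasing the sequence restores the full free list.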