Join our Paper Club with Letta/UC Berkeley - Sleep-time Compute: Beyond Inference Scaling at Test-time

Paper Club is back! Join us on November 18th as we continue our fall series exploring Memory.
This session dives into Sleep-time Compute: Beyond Inference Scaling at Test-time, a new paradigm that rethinks how large language models can reason efficiently.
Instead of spending massive compute at test time, the authors propose “sleep-time compute”: letting models think offline about a context before any query is presented. By anticipating likely user queries and precomputing useful quantities, models can reach the same accuracy with about 5× less test-time compute on Stateful GSM-Symbolic and Stateful AIME, and scaling sleep-time compute further raises accuracy by up to 13% and 18% on those two benchmarks, respectively.
They also introduce Multi-Query GSM-Symbolic, which pairs each context with multiple related queries. Amortizing sleep-time compute across those queries cuts the average cost per query by 2.5×. Their results point to a compelling future where LLMs balance latency, cost, and reasoning depth through smart, pre-emptive computation, much like how the human brain consolidates memories during sleep. 💤
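To make the amortization idea concrete, here is a minimal sketch of a simplified cost model. It is not the paper's method or code: the function name, the cost units, and the numbers are all made up for illustration. It only shows why one offline pass over a context gets cheaper per query as more queries reuse it.

```python
def average_cost_per_query(sleep_cost: float,
                           test_cost_per_query: float,
                           num_queries: int) -> float:
    """Average cost per query when one sleep-time pass is shared.

    Hypothetical cost model (an assumption, not taken from the paper):
    the offline pass over a context is paid once, and every query on
    that context pays its own, smaller test-time cost.
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    return sleep_cost / num_queries + test_cost_per_query


# Illustrative numbers only, in arbitrary cost units.
for n in (1, 5, 10):
    print(n, average_cost_per_query(100.0, 10.0, n))
```

With a single query, the offline pass is pure overhead. As more related queries arrive for the same context, its share of each query's cost shrinks toward zero.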
☝️ Please Register Above for this Live Virtual Meeting with the Researchers! ☝️
Read the paper here - Sleep-time Compute: Beyond Inference Scaling at Test-time
Meet the Researchers:
Charles Packer is the co-founder & CEO at Letta (building an OS for LLMs) and former Ph.D. researcher at UC Berkeley’s BAIR and Sky Computing Lab. His work bridges large language models, memory, and agentic systems — exploring how AI can reason, plan, learn, and remember more like humans.
Kevin Lin is a researcher and member of technical staff at Letta, building agentic systems for large language models. He completed his B.Sc. in Computer Science at Columbia University and his Ph.D. in Electrical Engineering and Computer Sciences at UC Berkeley, where his work focused on reasoning and learning in LLM-based systems.
Read more about their work at Letta here
What is Paper Club?
Paper Club is a virtual event series brought to you by the Human Feedback Foundation x Mozilla AI in collaboration with AI Tinkerers, featuring authors of cutting-edge AI and machine learning papers. These online meetups allow attendees to hear about groundbreaking research directly from the authors, participate in live Q&A sessions, and engage in discussions. Open to all, Paper Club offers a regular opportunity to learn and interact with leaders in the rapidly evolving field of artificial intelligence.
Paper Club Organizers
This Paper Club is produced by the Human Feedback Foundation, a Linux Foundation AI & Data nonprofit advancing a human-centric future for AI, in collaboration with Mozilla AI and AI Tinkerers Global.
