Apple Report on Large Reasoning Models and Problem Complexity

REFERENCE: Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar. “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity.” Apple Machine Learning Research, June 2025.

URL: https://machinelearning.apple.com/research/illusion-of-thinking

ABSTRACT: “Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities.”
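
To make the abstract’s notion of a “controllable puzzle environment” concrete, below is a minimal Python sketch built around Tower of Hanoi, one of the puzzles the paper studies. The TowerOfHanoi class and optimal_moves helper are illustrative assumptions, not the authors’ actual evaluation harness; the point is that a single parameter, the disk count N, scales the minimum solution length (2^N − 1 moves) while the move-legality rules stay fixed, so a model’s emitted move sequence can be checked exactly at every step rather than only at the final answer.

    # Minimal sketch of a controllable puzzle environment, assuming a
    # Tower of Hanoi setup; names and interface are illustrative.

    class TowerOfHanoi:
        """Tower of Hanoi with n disks; n is the single complexity knob.

        The optimal solution length grows as 2**n - 1, so compositional
        complexity increases precisely while the logical structure
        (the move-legality rules) stays constant.
        """

        def __init__(self, n_disks: int):
            self.n = n_disks
            # Pegs 0..2; peg 0 starts with all disks, largest at the bottom.
            self.pegs = [list(range(n_disks, 0, -1)), [], []]

        def is_legal(self, src: int, dst: int) -> bool:
            # Legal iff src is nonempty and its top disk is smaller than
            # the top disk of dst (or dst is empty).
            if not self.pegs[src]:
                return False
            return not self.pegs[dst] or self.pegs[src][-1] < self.pegs[dst][-1]

        def apply(self, src: int, dst: int) -> None:
            assert self.is_legal(src, dst), f"illegal move {src}->{dst}"
            self.pegs[dst].append(self.pegs[src].pop())

        def solved(self) -> bool:
            return len(self.pegs[2]) == self.n

    def optimal_moves(n: int, src: int = 0, aux: int = 1, dst: int = 2):
        """Yield the 2**n - 1 moves of the classical recursive solution."""
        if n == 0:
            return
        yield from optimal_moves(n - 1, src, dst, aux)
        yield (src, dst)
        yield from optimal_moves(n - 1, aux, src, dst)

    # A model's move sequence can be validated step by step; here the
    # known-optimal sequence is replayed as a sanity check.
    env = TowerOfHanoi(4)
    for move in optimal_moves(4):
        env.apply(*move)
    assert env.solved()

Because the validator scores each intermediate move, this kind of environment supports the paper’s analysis of reasoning traces, not just final answers, and it sidesteps benchmark data contamination since instances are generated on demand.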

RESPONSES

  • “Is More Thinking Always Better? Apple Unmasks the Illusion of AI Reasoning” - Artificial Intelligence on Medium

  • “What If… Apple’s Recent “Illusion of Thinking” Paper Misses the Deeper Meaning of Intelligence?” - Artificial Intelligence on Medium

  • “The Sequence Research #663: The Illusion of Thinking, Inside the Most Controversial AI Paper of Recent Weeks” - TheSequence

  • “"The Illusion of Thinking" – Thoughts on This Important Paper” - Hacker News

  • “Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” - Huggingface Daily Papers

  • “AI ‘The Illusion of Thinking’” - Deric's MindBlog

  • “The Illusion of Thinking: A Reality Check on AI Reasoning” - Artificial Intelligence

  • “The Illusion of "The Illusion of Thinking"" - Data Science

  • “[R] (Anthropic) Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” - Machine Learning

  • “Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” - Lobsters

  • “The Illusion of Thinking - Paper Walkthrough” - Computer Science: Theory and Application

  • “The Illusion of Thinking: Why Apple’s Findings Hold True for AI Code Reviews” - The Practical Developer

  • “The Illusion of Thinking: Apple’s Groundbreaking Study on Why Reasoning AI Models Break Down” - Generative AI - Medium

  • “Apple's 'The Illusion of Thinking' is shocking - but here's what it missed” - ZDNet | All About Microsoft RSS
