Anthropic Response to Apple Report on Large Reasoning Models
REFERENCE: Opus, C. and Lawsen, A. "Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." arXiv:2506.09250 [cs.AI], June 10, 2025.
URL: https://arxiv.org/html/2506.09250v1
ABSTRACT: “Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.”
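The abstract's token-limit argument is straightforward to check: an optimal Tower of Hanoi solution requires 2^N − 1 moves, so an exhaustive move list grows exponentially while a program that generates the list stays constant-size, which is the substance of the "generating functions instead of exhaustive move lists" control. Below is a minimal sketch of that arithmetic and of a standard recursive move generator; the per-move token cost and the 64k output limit are illustrative assumptions for this sketch, not figures taken from either paper.

```python
# Illustrative sketch (not code from the comment paper): why exhaustive
# Tower of Hanoi move lists can exceed output-token budgets, and the kind
# of compact generating program the authors requested instead.

def hanoi_moves(n: int, src: str = "A", dst: str = "C", aux: str = "B"):
    """Yield the optimal move sequence for an n-disk Tower of Hanoi."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, aux, dst)  # clear n-1 disks onto aux
    yield (src, dst)                              # move the largest disk
    yield from hanoi_moves(n - 1, aux, dst, src)  # restack n-1 disks on dst

TOKENS_PER_MOVE = 10  # hypothetical average; real cost depends on the tokenizer

for n in (10, 12, 15):
    moves = 2**n - 1  # optimal solution length
    print(f"N={n}: {moves} moves, ~{moves * TOKENS_PER_MOVE} output tokens")
# N=15 already implies ~327,670 output tokens under this assumption, far past
# a typical 64k output cap, while the ~10-line generator above is tiny.

# Sanity check: the generator's output length matches 2^n - 1.
assert sum(1 for _ in hanoi_moves(10)) == 2**10 - 1
```

The River Crossing objection rests on a known combinatorial fact rather than code: the jealous-husbands / missionaries-and-cannibals family of puzzles with boat capacity 3 has no solution for six or more pairs, so the N > 5 instances the benchmark scores as model failures are unsolvable by construction.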
RESPONSES
When Your Joke Paper Goes Viral: A Cautionary Tale. URL: https://lawsen.substack.com/p/when-your-joke-paper-goes-viral
AI Reasoning Debate: Landmark Study Challenges Apple's Claims of "Cognitive Collapse" in Large Models. URL: https://www.ctol.digital/news/study-challenges-apple-ai-reasoning-limitations/
Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models? Opus, Lawsen think so. URL: https://chatgptiseatingtheworld.com/2025/06/13/research-did-apple-researchers-overstate-the-illusion-of-thinking-in-reasoning-models-opus-lawsen-think-so/
Expert debunks Apple study claiming AI models can't really think. URL: https://www.perplexity.ai/page/expert-debunks-apple-study-cla-TBCVTq6kQ5m40URmoEIPlw
Anthropic* fires back – AI reasoning works, Apple’s reasoning doesn’t. URL: https://www.rcrwireless.com/20250616/ai-ml/anthropic-apple-ai-reasoning
Apple vs. Anthropic: The AI Reasoning Showdown. URL: https://www.prompt.security/blog/apple-vs-anthropic-the-ai-reasoning-showdown
Anthropic Fires Back at Apple’s Research Paper. URL: https://generativeai.pub/anthropic-fires-back-at-apples-research-paper-2ace57bd80e8
New paper pushes back on Apple’s LLM ‘reasoning collapse’ study. URL: https://9to5mac.com/2025/06/13/new-paper-pushes-back-on-apples-llm-reasoning-collapse-study/