GSM-Plus Guide
GSM-Plus is an adversarial evaluation benchmark designed to test the robustness of mathematical reasoning in Large Language Models (LLMs). It extends the popular GSM8K dataset, which consists of high-quality grade school math word problems.
Purpose and Key Features
Each original question is transformed into eight variations across five specific reasoning perspectives.
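One way to picture how a benchmark built from original questions and their variations could be scored is to compare accuracy on the originals with accuracy on the perturbed versions. The sketch below is illustrative only: the record fields (`seed_id`, `variant`, `correct`), the variant labels, and both function names are hypothetical and are not taken from the GSM-Plus release or its official evaluation code.

```python
from collections import defaultdict


def accuracy_by_variant(records):
    """Return {variant_label: accuracy} for a list of graded answers."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["variant"]] += 1
        hits[rec["variant"]] += int(rec["correct"])
    return {label: hits[label] / totals[label] for label in totals}


def robustness_drop(records, baseline="original"):
    """Accuracy on baseline questions minus accuracy on all other variants."""
    base = [r for r in records if r["variant"] == baseline]
    perturbed = [r for r in records if r["variant"] != baseline]
    if not base or not perturbed:
        raise ValueError("need both baseline and perturbed records")
    base_acc = sum(r["correct"] for r in base) / len(base)
    pert_acc = sum(r["correct"] for r in perturbed) / len(perturbed)
    return base_acc - pert_acc


# Toy graded results (hypothetical): two seed questions, two variations each.
records = [
    {"seed_id": 0, "variant": "original", "correct": True},
    {"seed_id": 0, "variant": "variation_1", "correct": True},
    {"seed_id": 0, "variant": "variation_2", "correct": False},
    {"seed_id": 1, "variant": "original", "correct": True},
    {"seed_id": 1, "variant": "variation_1", "correct": False},
    {"seed_id": 1, "variant": "variation_2", "correct": False},
]

per_variant = accuracy_by_variant(records)
drop = robustness_drop(records)
```

A large positive `drop` would suggest that a model solves the original questions but fails once they are perturbed, which is the kind of brittleness a robustness benchmark is meant to expose.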