GSM-Plus Guide

GSM-Plus is an adversarial evaluation benchmark designed to test how robust the mathematical reasoning of Large Language Models (LLMs) is. It extends the popular GSM8K dataset, which consists of high-quality grade school math word problems.

Purpose and Key Features

Each original question is transformed into eight variations across five specific reasoning perspectives: