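Logical reasoning
Mathematics education
Artificial intelligence
Coherence (philosophical gambling strategy)
Completeness (order theory)
Test (biology)
Generative grammar
Rationality
Computer science
Comprehension
Generative model
Mathematics
Logical connective
Mainstream
Inductive reasoning
Algebraic number
Deductive reasoning
Raven's Progressive Matrices
Mathematical problem
Natural language processing
Qualitative reasoning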
Authors
Shasha Song, Chenyu Meng, Zezhong Yang
Identifier
DOI:10.56557/jogress/2026/v20i110342
Abstract
Generative artificial intelligence has become an important auxiliary tool for teaching and learning mathematics in middle school. However, the academic community still lacks a systematic evaluation of how well Chinese large language models solve middle school mathematics problems. This study selects five mainstream generative AI models (Tencent Yuanbao, DeepSeek, Doubao, Kimi, and Wenxin Yiyan) as research subjects and uses 18 middle school mathematics problems covering three modules (algebra, geometry, and probability) as test samples. The models were compared along four dimensions: problem-solving efficiency, result accuracy, completeness of problem-solving thinking, and logical rigor. Completeness of problem-solving thinking was evaluated by whether a solution included the full problem-solving steps, the reasoning process, and the necessary verification procedures, while logical rigor was assessed by the coherence of the problem-solving steps and the soundness of the grounds for each inference. Across the five models, overall problem-solving accuracy ranged from 61.11% to 77.78%, completeness of problem-solving thinking from 77.78% to 88.89%, and logical rigor from 83.33% to 94.44%. The Chinese generative AI models performed strongly on algebra and probability problems but poorly on geometry problems, owing to issues such as inaccurate image recognition and incomplete comprehension of the questions. The five models differed significantly in problem-solving capability: Doubao and Tencent Yuanbao delivered well-balanced overall performance with detailed problem-solving processes, whereas each of the other models had its own shortcomings.
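The reported percentages are consistent with whole-number counts out of the 18 test problems. The sketch below illustrates that arithmetic only. The abstract does not give raw counts, so the counts used here (11, 14, 15, 16, 17) are assumptions inferred from the percentages, and `percent` is a hypothetical helper name, not something taken from the study.

```python
TOTAL_PROBLEMS = 18  # number of test problems stated in the abstract


def percent(count, total=TOTAL_PROBLEMS):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumed (low, high) counts per dimension, inferred from the reported ranges.
# They are not reported in the abstract itself.
assumed_counts = {
    "accuracy": (11, 14),
    "completeness of problem-solving thinking": (14, 16),
    "logical rigor": (15, 17),
}

for dimension, (low, high) in assumed_counts.items():
    print(f"{dimension}: {percent(low)}% to {percent(high)}%")
```

Under this assumption, a gap of one problem corresponds to a change of about 5.56 percentage points, so the differences between models on any single dimension amount to only a few problems.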