Aquatic's Blog

Large Language Models on the Chessboard - A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills

Symposium on Large Language Models (LLM 2023) @ IJCAI 2023, 21 Aug 2023

Mu-Tien Kuo, Chih-Chung Hsueh, Richard Tzong-Han Tsai

While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI, in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining both the legality and quality of moves, we assess ChatGPT’s understanding of the chessboard, adherence to chess rules, and strategic decision-making abilities. Our evaluation identifies limitations within ChatGPT’s attention mechanism that affect its formal language comprehension and uncovers the model’s underdeveloped self-regulation abilities.

NLP, Complex Reasoning, Large Language Models

Details

Automated Assessment of Fidelity and Interpretability - An Evaluation Framework for Large Language Models’ Explanations

Proceedings of the AAAI Conference on Artificial Intelligence, 24 March 2024

Mu-Tien Kuo, Chih-Chung Hsueh, Richard Tzong-Han Tsai

As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations.

NLP, Large Language Models, Explainability, Fidelity, Faithfulness, Interpretability

Details

Aquatic's Little Corner

Aquatic

Coding Enthusiast at NTHU IPTH

Skills

Data Analysis

Linux

Self-Hosting

Vim

Education

Chingshin Academy 靜心高中

Graduated high school at 18

NTHU IPTH 清大不分系, Class of 2028

Currently studying at National Tsing Hua University

Publications

Large Language Models on the Chessboard - A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills

Automated Assessment of Fidelity and Interpretability - An Evaluation Framework for Large Language Models’ Explanations

Chingshin Academy 靜心高中 (2021-2024): Graduated high school at 18
NTHU IPTH 清大不分系, Class of 2028 (2024-): Currently studying at National Tsing Hua University