Authors
Hang Su, Jiaming Cao, Zhuoya Li, Sai Tian, Yuanchang Deng
Abstract
OBJECTIVE: This study aims to establish an interpretable framework for analyzing the root causes of autonomous vehicle (AV) crashes by leveraging unstructured crash narratives. It addresses critical gaps in existing research, including fragmented causal attribution and limited use of textual data for mechanistic insights. METHOD: We propose an integrated framework that combines a Large Language Model (LLM) with Chain-of-Thought (CoT) reasoning to analyze the causal mechanisms of AV crashes from original crash narratives. First, a sentence-level resampling method is used to oversample the labeled data. Second, an instruction-tuned LLM extracts structured Crash Causality Frames (CCFs), quintuples encoding Movement, Impact, Damage, Effect, and Location, from 931 California DMV crash reports. Third, a system-theoretic taxonomy maps CCF elements to 64 causal indicators across five domains. Finally, CoT reasoning generates stepwise natural-language explanations to enhance interpretability. RESULTS: The optimized LLaMA-70B + LoRA model achieved 86.64% accuracy in CCF extraction, while Data_sCR resampling further improved metrics to 97.93%. The analysis revealed five dominant causation patterns: Pattern 1 (30.5%, pure CV anomalies), Pattern 2 (51.9%, AV-CV interaction failures), and Patterns 3-5 (17.7%, integrating human, environment, and infrastructure factors). Critical cross-domain couplings were identified (A1 and B2), with rear-end collisions (82.06%) predominating in Pattern 2 scenarios. Moreover, the CoT module generates auditable, step-by-step causal chains to enhance interpretability. Under a practical balance between reliability and computational cost, the accuracy of the generated CoT causal chains reaches 91.04%. CONCLUSION: V2X, (2) Developing context-aware sensor fusion for adverse environments, and (3) Implementing standardized tester training protocols for takeover scenarios.
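To make the CCF quintuple concrete, the following is a minimal Python sketch of how one frame might be represented and validated. Only the five element names (Movement, Impact, Damage, Effect, Location) come from the abstract. The class name `CrashCausalityFrame`, the helper `parse_ccf`, the JSON output format, and the example values are all hypothetical and are not the authors' implementation.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CrashCausalityFrame:
    """One CCF quintuple; the five field names follow the abstract."""
    movement: str
    impact: str
    damage: str
    effect: str
    location: str


def parse_ccf(raw: str) -> CrashCausalityFrame:
    """Parse a JSON object into a CCF.

    Assumption: the extraction model returns one JSON object per crash
    narrative, keyed by the lowercase element names. The abstract does
    not specify the actual output format.
    """
    data = json.loads(raw)
    names = [f.name for f in fields(CrashCausalityFrame)]
    missing = [n for n in names if n not in data]
    if missing:
        # Reject incomplete frames rather than guessing the missing elements.
        raise ValueError(f"missing CCF elements: {missing}")
    return CrashCausalityFrame(**{n: str(data[n]) for n in names})


# Invented example values for illustration; not taken from any DMV report.
example = parse_ccf(json.dumps({
    "movement": "AV stopped at a red light",
    "impact": "rear-ended by a conventional vehicle",
    "damage": "minor damage to rear bumper",
    "effect": "no injuries reported",
    "location": "signalized intersection",
}))
```

Rejecting incomplete frames is a design choice made here for the sketch, so that a downstream mapping to causal indicators always receives all five elements.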