TY - GEN
T1 - Addressing Hallucination in Causal Q&A
T2 - Joint Workshop of the 9th Financial Technology and Natural Language Processing, FinNLP 2025, the 6th Financial Narrative Processing, FNP 2025, and the 1st Workshop on Large Language Models for Finance and Legal, LLMFinLegal 2025, co-located with the 31st International Conference on Computational Linguistics, COLING 2025
AU - Niess, Georg
AU - Razouk, Houssam
AU - Mandic, Stasa
AU - Kern, Roman
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - This paper presents our approach and findings for participating in the FinCausal 2025 competition (Moreno-Sandoval et al., 2025), which addresses causal question answering derived from financial documents, specifically English and Spanish annual reports. We investigate the effectiveness of generative models, such as Llama, in contrast to common extractive methods like BERT-based token classification. While prompt optimization and few-shot learning offered some improvements, they were insufficient to consistently outperform extractive methods in FinCausal, as their outputs suffered from hallucinations. In contrast, fine-tuning generative models was shown to be essential for minimizing hallucinations and achieving superior performance. Using our fine-tuned multilingual model for both tasks, we outperform our extractive and monolingual approaches, achieving top results for Spanish and second-best for English in the competition. Our findings indicate that fine-tuned large language models are well-suited for causal Q&A from complex financial narratives, offering robust multilingual capabilities and effectively mitigating hallucinations.
UR - https://www.scopus.com/pages/publications/85217784588
M3 - Conference paper
AN - SCOPUS:85217784588
T3 - Proceedings - International Conference on Computational Linguistics, COLING
SP - 253
EP - 258
BT - Joint Workshop of the 9th Financial Technology and Natural Language Processing, FinNLP 2025, the 6th Financial Narrative Processing, FNP 2025, and the 1st Workshop on Large Language Models for Finance and Legal, LLMFinLegal 2025
A2 - Chen, Chung-Chi
A2 - Moreno-Sandoval, Antonio
A2 - Huang, Jimin
A2 - Xie, Qianqian
A2 - Ananiadou, Sophia
A2 - Chen, Hsin-Hsi
PB - Association for Computational Linguistics (ACL)
Y2 - 19 January 2025 through 20 January 2025
ER -