TY - GEN
T1 - FLEX: Fault Localization and Explanation Using Open-Source Large Language Models in Powertrain Systems
AU - Muehlburger, Herbert
AU - Wotawa, Franz
N1 - Publisher Copyright:
© Herbert Muehlburger and Franz Wotawa.
PY - 2024/11/26
Y1 - 2024/11/26
N2 - Cyber-physical systems (CPS) are critical to modern infrastructure but are vulnerable to faults and anomalies that threaten their operational safety. In this work, we evaluate the use of open-source Large Language Models (LLMs), such as Mistral 7B, Llama3.1:8b-instruct-fp16, and others, to detect anomalies in two distinct datasets: battery management and powertrain systems. Our methodology utilises retrieval-augmented generation (RAG) techniques, incorporating a novel two-step process in which LLMs first infer operational rules from normal behaviour before applying these rules for fault detection. During the experiments, we found that the original prompt design yielded strong results for the battery dataset but required modification for the powertrain dataset to improve performance. The adjusted prompt, which emphasises rule inference, significantly improved anomaly detection for the powertrain dataset. Experimental results show that models such as Mistral 7B achieved F1-scores of up to 0.99, while Llama3.1:8b-instruct-fp16 and Gemma 2 reached perfect F1-scores of 1.0 in complex scenarios. These findings demonstrate the impact of effective prompt design and rule inference on LLM-based fault detection for CPS, contributing to increased operational resilience.
AB - Cyber-physical systems (CPS) are critical to modern infrastructure but are vulnerable to faults and anomalies that threaten their operational safety. In this work, we evaluate the use of open-source Large Language Models (LLMs), such as Mistral 7B, Llama3.1:8b-instruct-fp16, and others, to detect anomalies in two distinct datasets: battery management and powertrain systems. Our methodology utilises retrieval-augmented generation (RAG) techniques, incorporating a novel two-step process in which LLMs first infer operational rules from normal behaviour before applying these rules for fault detection. During the experiments, we found that the original prompt design yielded strong results for the battery dataset but required modification for the powertrain dataset to improve performance. The adjusted prompt, which emphasises rule inference, significantly improved anomaly detection for the powertrain dataset. Experimental results show that models such as Mistral 7B achieved F1-scores of up to 0.99, while Llama3.1:8b-instruct-fp16 and Gemma 2 reached perfect F1-scores of 1.0 in complex scenarios. These findings demonstrate the impact of effective prompt design and rule inference on LLM-based fault detection for CPS, contributing to increased operational resilience.
KW - Anomaly detection
KW - Fault detection
KW - large language models
KW - open-source LLMs
KW - powertrain systems
UR - http://www.scopus.com/inward/record.url?scp=85211914562&partnerID=8YFLogxK
U2 - 10.4230/OASIcs.DX.2024.25
DO - 10.4230/OASIcs.DX.2024.25
M3 - Conference paper
AN - SCOPUS:85211914562
T3 - OpenAccess Series in Informatics
BT - 35th International Conference on Principles of Diagnosis and Resilient Systems, DX 2024
A2 - Pill, Ingo
A2 - Natan, Avraham
A2 - Wotawa, Franz
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik
T2 - 35th International Conference on Principles of Diagnosis and Resilient Systems, DX 2024
Y2 - 4 November 2024 through 7 November 2024
ER -