Abstract
Since the advent of Large Language Models (LLMs) a few years ago, they have not only reached the mainstream but have become a commodity. Their application areas expand steadily, driven by sophisticated model architectures and enormous training corpora. However, accessible chatbot user interfaces and human-like responses may cause a tendency to overestimate their abilities. This study contributes to demonstrating the strengths and weaknesses of LLMs. In this work, we bridge methods from sub-symbolic and symbolic AI. In particular, we evaluate the capabilities of LLMs to convert textual requirements documents into their logical representation, enabling analysis and reasoning. This task demonstrates a use case close to industry, as requirements analysis is key in requirements and systems engineering. Our experiments evaluate the popular model family used in OpenAI's ChatGPT: GPT-3.5 and GPT-4. The underlying goal of testing for the correct abstraction of meaning is not trivial, as the relationship between input and output semantics is not directly measurable. Thus, it is necessary to approximate translation correctness through quantifiable criteria. Most notably, we defined consistency-based metrics for the plausibility and stability of translations. Our experiments give insights into syntactic validity, semantic plausibility, stability of translations, and parameter configurations for LLM translations. We use real-world requirements and test the LLMs' performance out of the box and after pre-training. Experimentally, we demonstrated the strong relation between ChatGPT parameters and the stability of translations. Finally, we showed that even the best model configurations produced syntactically faulty (5%) or semantically implausible (7%) output and were not stable in their results.
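The abstract mentions consistency-based metrics for the plausibility and stability of translations without defining them here. As a rough illustration only (the function `stability_score`, the whitespace normalization, and the dummy translator below are assumptions for this sketch, not the paper's published implementation), a stability score could be approximated as the fraction of repeated translations of the same requirement that reproduce the most frequent normalized output:

```python
from collections import Counter
from typing import Callable, List


def stability_score(translate: Callable[[str], str], requirement: str, runs: int = 10) -> float:
    """Hypothetical stability metric: fraction of repeated translations
    that agree with the most frequent (modal) normalized output."""
    outputs: List[str] = []
    for _ in range(runs):
        raw = translate(requirement)           # e.g. an LLM call returning a logical formula
        outputs.append(" ".join(raw.split()))  # naive normalization: collapse whitespace
    _, modal_count = Counter(outputs).most_common(1)[0]
    return modal_count / runs


if __name__ == "__main__":
    # Dummy translator standing in for an actual LLM request.
    dummy = lambda req: "always_available(system) -> response_time_le(2, seconds)"
    print(stability_score(dummy, "The system shall respond within 2 seconds.", runs=5))  # 1.0
```

In the setting described by the abstract, the `translate` callable would presumably wrap a ChatGPT request with a fixed parameter configuration, so that disagreement across runs indicates instability of the translation.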
Original language | English |
---|---|
Title | Proceedings - 2024 IEEE 24th International Conference on Software Quality, Reliability and Security, QRS 2024 |
Publisher | IEEE |
Pages | 238-249 |
Number of pages | 12 |
ISBN (electronic) | 9798350365634 |
DOIs | |
Publication status | Published - 26 Sept 2024 |
Event | 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024 - Cambridge, United Kingdom Duration: 1 July 2024 → 5 July 2024 |
Publication series
Name | IEEE International Conference on Software Quality, Reliability and Security, QRS |
---|---|
ISSN (Print) | 2693-9177 |
Conference
Conference | 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024 |
---|---|
Country/Territory | United Kingdom |
City | Cambridge |
Period | 1/07/24 → 5/07/24 |
ASJC Scopus subject areas
- Software
- Safety, Risk, Reliability and Quality
- Artificial intelligence
Fields of Expertise
- Information, Communication & Computing
Projects
- 1 Finished
- CD-Labor für Methoden zur Qualitätssicherung von autonomen Cyber-Physikalischen Systemen
Wotawa, F. (Participant (Co-Investigator))
1/10/17 → 30/09/24
Project: Research project
Activities
- 1 Talk at conference or symposium
- Evaluating OpenAI Large Language Models for Generating Logical Abstractions of Technical Requirements Documents
Perko, A. (Speaker)
3 July 2024
Activity: Talk or presentation › Talk at conference or symposium › Science to science