Abstract
Causal domain knowledge is commonly documented using natural language either in unstructured or semi-structured forms. This study aims to increase the usability of causal domain knowledge in industrial documents by transforming the information into a more structured format. The paper presents our work on developing automated methods for causal information extraction from real-world industrial documents in the semiconductor manufacturing industry, including presentation slides and FMEA (Failure Mode and Effects Analysis) documents. Specifically, we evaluate two types of causal information extraction methods: single-stage sequence tagging (SST) and multi-stage sequence tagging (MST). The presented case study showcases that the proposed MST methods for extracting causal information from industrial documents are suitable for practical applications, especially for semi-structured documents such as FMEAs, with a 93% F1 score. Additionally, the study shows that extracting causal information from presentation slides is more challenging. The study highlights the importance of choosing a language model that is more aligned with the domain and in-domain pre-training.
Original language | English |
---|---|
Article number | 2573 |
Journal | Applied Sciences |
Volume | 15 |
Issue number | 5 |
Early online date | 27 Feb 2025 |
DOIs | |
Publication status | Published - Mar 2025 |
Keywords
- causal information extraction
- causal relation extraction
- FMEA
- natural language processing
- presentation slides
- semiconductor manufacturing
ASJC Scopus subject areas
- General Materials Science
- Instrumentation
- General Engineering
- Process Chemistry and Technology
- Computer Science Applications
- Fluid Flow and Transfer Processes