Abstract
This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec2 models to classify word-level prominence. The resulting detector was then used to automatically annotate prosodic prominence in a large corpus. Based on those annotations, we trained novel prominence-aware ASR systems that simultaneously transcribe words and their prominence levels. Integrating prominence information did not change recognition performance relative to our baseline ASR system, and the prominence-aware systems reached a prominence detection accuracy of 85.53% on utterances where the recognized word sequence was correct. This paper shows that transformer-based models can effectively encode prosodic information and represents a novel contribution to prosody-enhanced ASR, with potential applications in linguistic research and prosody-informed dialogue systems.
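The accuracy figure above is conditioned on utterances whose recognized word sequence is correct. The following is a minimal sketch of how such a conditional metric could be computed. It is not the authors' evaluation code: the `(word, prominence_level)` token representation, the function name `prominence_accuracy`, the word-level scoring, and the example words are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each token is (word, prominence level).
Token = Tuple[str, int]


def prominence_accuracy(
    refs: List[List[Token]], hyps: List[List[Token]]
) -> Optional[float]:
    """Word-level prominence accuracy, restricted to utterances whose
    hypothesized word sequence exactly matches the reference words."""
    correct = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_words = [word for word, _ in ref]
        hyp_words = [word for word, _ in hyp]
        if ref_words != hyp_words:
            # Skip utterances with any word error (assumed filtering rule).
            continue
        for (_, ref_level), (_, hyp_level) in zip(ref, hyp):
            total += 1
            correct += int(ref_level == hyp_level)
    # Return None when no utterance qualifies, to avoid dividing by zero.
    return correct / total if total else None


# Illustrative data only; words and levels are invented for the example.
refs = [[("das", 0), ("ist", 0), ("gut", 2)], [("ja", 1)]]
hyps = [[("das", 0), ("ist", 1), ("gut", 2)], [("na", 1)]]
score = prominence_accuracy(refs, hyps)
```

In this toy example the second utterance is excluded because its word sequence differs from the reference, so the score is computed over the three words of the first utterance only.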
| Original language | English |
|---|---|
| Publisher | arXiv |
| Number of pages | 5 |
| DOIs | |
| Publication status | Published - 12 Sept 2025 |
Fields of Expertise
- Information, Communication & Computing
Projects
- 1 Finished
- FWF - Spontansprache - Cross-layer language models for conversational speech
Schuppler, B. (Consortium manager resp. coordinator with external organisations) & Schuppler, B. (Project manager on research unit)
1/11/19 → 31/10/24
Project: Research project
Research output
- 1 Paper
- Using word-level features for prosodic prominence detection in conversational speech
Linke, J., Kubin, G. & Schuppler, B., 2023, p. 3101-3105.
Research output: Contribution to conference › Paper › peer-review
Open Access