Prominence-aware automatic speech recognition for conversational speech

Research output: Working paper › Preprint

Abstract

This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec2 models to classify word-level prominence. The detector was then used to automatically annotate prosodic prominence in a large corpus. Based on those annotations, we trained novel prominence-aware ASR systems that simultaneously transcribe words and their prominence levels. Integrating prominence information did not change performance compared to our baseline ASR system, and the prominence-aware systems reached a prominence detection accuracy of 85.53% on utterances where the recognized word sequence was correct. This paper shows that transformer-based models can effectively encode prosodic information and represents a novel contribution to prosody-enhanced ASR, with potential applications for linguistic research and prosody-informed dialogue systems.
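As an illustration of the evaluation described in the abstract, the following minimal sketch computes prominence accuracy only over utterances whose recognized word sequence matches the reference. It is not the authors' implementation. The `word|level` token format, the `parse` and `prominence_accuracy` helpers, and the choice to count accuracy per word are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (word, prominence level).
Token = Tuple[str, int]


def parse(tagged: str, sep: str = "|") -> List[Token]:
    """Split a string such as 'ja|1 genau|0' into (word, level) pairs."""
    tokens = []
    for item in tagged.split():
        word, level = item.rsplit(sep, 1)
        tokens.append((word, int(level)))
    return tokens


def prominence_accuracy(refs: List[str], hyps: List[str]) -> float:
    """Word-level prominence accuracy, restricted to utterances whose
    hypothesized word sequence equals the reference word sequence."""
    correct = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = parse(ref), parse(hyp)
        # Skip utterances with any word error (assumed filtering rule).
        if [w for w, _ in r] != [w for w, _ in h]:
            continue
        total += len(r)
        correct += sum(rl == hl for (_, rl), (_, hl) in zip(r, h))
    return correct / total if total else float("nan")


# Toy data, invented for illustration only.
refs = ["das|0 ist|0 gut|1", "ja|1 genau|0"]
hyps = ["das|0 ist|1 gut|1", "ja|1 genug|0"]
score = prominence_accuracy(refs, hyps)
```

In this toy data the second utterance contains a word error and is excluded, so the score is computed over the three words of the first utterance only.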
Original language: English
Publisher: arXiv
Number of pages: 5
Publication status: Published - 12 Sept 2025

Fields of Expertise

  • Information, Communication & Computing
