Abstract
Despite the rapid advancement of automatic speech recognition (ASR) systems, spontaneous conversations still pose a major challenge, and even more so for low-resourced languages, dialects, and non-dominant varieties. Moreover, the lively turn changes of conversational speech produce short utterances, which have been found to be error-prone for transformer-based ASR systems and therefore require more context. The question thus arises which type of context is useful: more from the same speaker, providing acoustically relevant context, or more from the conversation, mixing utterances from both speakers and providing semantically relevant context. Comparing seven ASR systems on conversational Austrian German, we find the best performance with a minimum of 20 s of context, independent of whether it comes from the same or from the other speaker. Systems fine-tuned with data from the same variety and speaking style require less context and perform better overall than zero-shot systems.
| Translated title of the contribution | Ist Kontext alles was zählt? Ressourcenarme Spracherkennung für Konversationssprache profitiert von Kontext, der vom selben oder vom anderen Sprecher stammt |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Publisher | ISCA, International Speech Communication Association |
| Pages | 3199 - 3203 |
| Number of pages | 5 |
| Publication status | Published - 2025 |
| Event | Interspeech 2025, Rotterdam, Netherlands. Duration: 17 Aug 2025 → 21 Aug 2025. https://www.interspeech2025.org/home |
Publication series
| Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
|---|---|
| ISSN (Print) | 2308-457X |
Conference
| Conference | Interspeech 2025 |
|---|---|
| Country/Territory | Netherlands |
| City | Rotterdam |
| Period | 17/08/25 → 21/08/25 |
| Internet address | https://www.interspeech2025.org/home |
Keywords
- turn-taking
- context
- automatic speech recognition
- conversational speech
ASJC Scopus subject areas
- Software
- Signal Processing
- Language and Linguistics
- Modelling and Simulation
- Human-Computer Interaction
Fields of Expertise
- Information, Communication & Computing
Activities
- Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker
  Linke, J. (Contributor), Winkler, J. A. (Contributor) & Schuppler, B. (Speaker)
  20 Aug 2025. Activity: Talk or presentation › Talk at conference or symposium › Science to science
- What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures
  Linke, J., Geiger, B., Kubin, G. & Schuppler, B., Mar 2025, In: Computer Speech and Language. 90, 101738. Research output: Contribution to journal › Article › peer-review. Open Access
- Towards Improving ASR Outputs of Spontaneous Speech with LLMs
  Karner, M., Linke, J., Kroell, M., Schuppler, B. & Geiger, B., 2024, Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024). Association for Computational Linguistics (ACL), p. 339-348 10 p. Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review. Open Access
- Using Kaldi for Automatic Speech Recognition of Conversational Austrian German
  Linke, J., Wepner, S., Kubin, G. & Schuppler, B., 2023. Research output: Working paper › Preprint