Context is all you need? Low-resource conversational ASR profits from context, coming from the same or from the other speaker

Research output: Chapter in Book/Report/Conference proceeding; Conference paper; peer-reviewed

Abstract

Despite the rapid advancement of automatic speech recognition (ASR) systems, spontaneous conversations still pose a major challenge, which is even more of an obstacle for low-resourced languages, dialects or non-dominant varieties. What is more, lively turn-changes in conversational speech cause short utterances that have been found to be error prone for transformer-based ASR systems, requiring larger context. The question thus arises which type of context is useful: rather more from the same speaker, providing acoustically relevant context, or more from the conversation - mixing utterances from both speakers - providing semantically relevant context. Comparing seven ASR systems on conversational Austrian German, we find the best performance with a minimum of 20s of context, independent of whether it was from the same or from the other speaker. Systems fine-tuned with data from the same variety and speaking style require less context and perform overall better than zero-shot systems.
Translated title of the contribution: Ist Kontext alles was zählt? Ressourcenarme Spracherkennung für Konversationssprache profitiert von Kontext, der vom selben oder vom anderen Sprecher stammt
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: ISCA, International Speech Communication Association
Pages: 3199 - 3203
Number of pages: 5
Publication status: Published - 2025
Event: Interspeech 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 - 21 Aug 2025
https://www.interspeech2025.org/home

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: Interspeech 2025
Country/Territory: Netherlands
City: Rotterdam
Period: 17/08/25 - 21/08/25

Keywords

  • turn-taking
  • context
  • automatic speech recognition
  • conversational speech

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modelling and Simulation
  • Human-Computer Interaction

Fields of Expertise

  • Information, Communication & Computing
