Speechcake: Version Control for Speech Corpora

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

While the audio recordings of a corpus represent the ground truth, transcriptions are subject to human error in the case of manual annotation, and to revision as the technology underpinning automated annotation methods improves. To facilitate the dynamic extension of speech corpora, we introduce Speechcake, a tool for centralized version control of speech corpora that enables the automatic check-in and merging of annotations. It considers the typical workflows of phoneticians, linguists, and speech technologists, and enables the development of dynamic, collaborative, and perpetually improving speech corpora.

Original language: English
Title of host publication: 20th Conference on Natural Language Processing, KONVENS 2024 - Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 303-308
Number of pages: 6
Publication status: Published - 2024
Event: 20th Conference on Natural Language Processing, KONVENS 2024 - Vienna, Austria
Duration: 10 Sept 2024 – 13 Sept 2024

Conference

Conference: 20th Conference on Natural Language Processing, KONVENS 2024
Country/Territory: Austria
City: Vienna
Period: 10/09/24 – 13/09/24

ASJC Scopus subject areas

  • Software
