The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources

Author(s): KOPF, Maria; SCHULDER, Marc; HANKE, Thomas
Year: 2022
Publisher: ELRA
Code type: Copyright
Medium: Digital


Linguistics, Linguistics » Sign Language Transcription Systems


One of the challenges that sign language researchers face is the identification of suitable language datasets, particularly for cross-lingual studies. There is no single source of information on what sign language corpora and lexical resources exist or how they compare. Instead, they have to be found through extensive literature review or word-of-mouth. The amount of information available on individual datasets can also vary widely and may be distributed across different publications, data repositories and (potentially defunct) project websites. This article introduces the Sign Language Dataset Compendium, an extensive overview of linguistic resources for sign languages. It covers existing corpora and lexical resources, as well as commonly used data collection tasks. Special attention is paid to covering resources for many different languages from around the globe. All information is provided in a standardised format to make entries comparable, but kept flexible enough to allow for differences in content. The compendium is intended as a growing resource that will be updated regularly.

In Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources.