Annotating dependency syntax for treebank development

Description

This tutorial will focus on dependency syntax annotation with a view to developing treebanks for NLP applications. It seeks to foster human resource development through discussions and activities aimed at building skills in treebank annotation and curation. The tutorial comprises two parts. The first part will start with a very brief introduction to dependency syntax and its potential for NLP. Next, we will introduce the main tenets of the Universal Dependencies (UD) framework and present the UD tagsets for morphosyntactic annotation. Subsequently, we will provide an overview of available treebanks and state-of-the-art dependency parsers. In the second part of the tutorial, attendees will take part in a hands-on session in which they will use an online parser and revise its output on an online platform for dependency syntax annotation. Finally, we will explore the validation of dependency syntax annotations as a step towards building resources that can be contributed to the UD repository and shared with the NLP community.
While a special focus will be given to the computational processing of Portuguese, all delegates pursuing research on NLP are welcome to attend, regardless of the languages they are proficient in or are currently working on. Attendees will be able to choose the language(s) they would like to explore during the hands-on session.

Bibliography

de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J. and Manning, C. D. 2014. Universal Stanford Dependencies: A Cross-Linguistic Typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), pp. 4585–4592.
Duran, M.S. 2021. Manual de Anotação de PoS tags: Orientações para anotação de etiquetas morfossintáticas em Língua Portuguesa, seguindo as diretrizes da abordagem Universal Dependencies (UD) [PoS tag annotation manual: guidelines for annotating morphosyntactic tags in Portuguese following the Universal Dependencies (UD) approach]. ICMC Technical Report 434. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo.
Duran, M.S. 2022. Manual de Anotação de Relações de Dependência – Versão Revisada e Estendida: Orientações para anotação de relações de dependência sintática em Língua Portuguesa, seguindo as diretrizes da abordagem Universal Dependencies (UD) [Dependency relation annotation manual, revised and extended version: guidelines for annotating syntactic dependency relations in Portuguese following the Universal Dependencies (UD) approach]. ICMC Technical Report 440. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo.
Pagano, A., Duran, M.S. and Pardo, T.A.S. 2023. Enhanced dependencies para o português brasileiro [Enhanced dependencies for Brazilian Portuguese]. In Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival, pp. 461–470, Belo Horizonte, Brazil. Association for Computational Linguistics.
Pagano, A., Rassi, A. and Pagano, A.C. 2023. A ordem e a função das palavras em uma sentença [Word order and function in a sentence]. In Caseli, H.M. and Nunes, M.G.V. (eds.) Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português [Natural Language Processing: Concepts, Techniques and Applications in Portuguese]. BPLN. Available at: https://brasileiraspln.com/livro-pln.
Universal Dependencies annotation guidelines. Online documentation. Available at: https://universaldependencies.org/guidelines.html.

Syllabus

Part 1
A brief history of dependency syntax: theoretical and methodological assumptions
The Universal Dependencies framework for annotation
Guidelines and tagsets
Dependency parsers
Annotation tools

Coffee break

Part 2
Hands-on activity
Sample text pre-processing and parsing
Annotation
Validation of annotation output

Tutorial Instructors

Adriana S. Pagano – Universidade Federal de Minas Gerais (UFMG) – apagano@ufmg.br

Dates

  • Workshop day: 12 March 2024