Universal Dependencies are hard to parse – or are they?

  • Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In the paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments where we systematically modify individual annotation decisions taken in the UD scheme and show that this results in an increased accuracy for most, but not for all languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and evaluate the effect this has on parsing accuracy.
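The two treebank properties named in the abstract, dependency length and arc direction entropy, can be illustrated with a minimal sketch. The definitions here are assumptions chosen for illustration and may differ from those used in the paper: dependency length is taken as the absolute distance between a token and its head, and arc direction entropy as the Shannon entropy of the head direction (left or right) per relation label, weighted by label frequency. The function names and the toy tree are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Each token is (position, head_position, relation); head 0 marks the root.
# Hypothetical toy tree for "the dog barks", not taken from any treebank.
TOY_TREE = [(1, 2, "det"), (2, 3, "nsubj"), (3, 0, "root")]


def mean_dependency_length(tokens):
    """Average absolute head-dependent distance, ignoring the root arc."""
    lengths = [abs(head - pos) for pos, head, _ in tokens if head != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


def arc_direction_entropy(tokens):
    """Frequency-weighted entropy (in bits) of head direction per relation.

    Assumed definition: for each relation label, compute the entropy of the
    left/right distribution of its heads, then average over labels weighted
    by how often each label occurs.
    """
    counts = defaultdict(Counter)
    for pos, head, rel in tokens:
        if head == 0:
            continue  # the root has no direction
        direction = "head_right" if head > pos else "head_left"
        counts[rel][direction] += 1

    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for c in counts.values():
        n = sum(c.values())
        h = -sum((k / n) * math.log2(k / n) for k in c.values())
        entropy += (n / total) * h
    return entropy


if __name__ == "__main__":
    print(mean_dependency_length(TOY_TREE))
    print(arc_direction_entropy(TOY_TREE))
```

Under these assumed definitions, a scheme in which a relation's head appears consistently on one side yields low entropy, while a relation whose heads are split evenly between left and right contributes a full bit.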

Metadata
Author:Ines Rehbein, Julius Steen, Bich-Ngoc Do, Anette Frank
URN:urn:nbn:de:bsz:mh39-80232
URL:http://aclweb.org/anthology/W17-6525
ISBN:978-91-7685-467-9
Parent Title (English):Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017). September 18-20, 2017 Università di Pisa Istituto di Linguistica Computazionale “A. Zampolli”, CNR Pisa
Series (Serial Number):Linköping Electronic Conference Proceedings (139)
Publisher:Linköping University Electronic Press
Place of publication:Linköping, Sweden
Editor:Simonetta Montemagni, Joakim Nivre
Document Type:Conference Proceeding
Language:English
Year of first Publication:2017
Date of Publication (online):2018/10/02
Publicationstate:Published version
Reviewstate:Peer review
GND Keyword:Annotation; Parser; Syntax; Universal grammar
First Page:218
Last Page:228
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Computational linguistics
Program areas:Digital Linguistics
Licence (English):Creative Commons - Attribution 4.0 International