How to Identify Speech When Translating Unpunctuated Poetry

A large proportion of (post-)modern poetry contains little or no punctuation. In this contribution, we investigate how well punctuation information can be recovered for (post-)modern poetry from the text and speech of free verse poems. We use the world's largest corpus of spoken (post-)modern poetry, provided by our partner lyrikline, which contains an audio recording of each poem as spoken by its original author and features translations for many of the poems. We address two tasks: identifying lines that contain a phrase break in the middle of the poetic line, which may already be helpful for philological analysis, and identifying the position of the break within the line. As our target class, we select those poetic lines that contain one or more punctuation characters that typically indicate a phrase break in poetry (.,;:!?/) somewhere in the middle of the line rather than only at its end. We train a neural network (a bidirectional recurrent neural network (RNN) based on gated recurrent units (GRU) with attention) that combines audio and textual features to identify the punctuation, with the goal of applying it to reconstruct punctuation in a corpus of unpunctuated poems. Our results clearly indicate that speech is helpful for recovering the constituency structure of (post-)modern poetry that is partially obscured by missing punctuation.
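The target-class selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the decision to ignore punctuation at the very start of a line (as well as at the end) is an assumption about what "somewhere in the middle" means. Only the punctuation set (.,;:!?/) is taken from the abstract.

```python
import string
from typing import List

# Punctuation characters the abstract lists as typical phrase-break markers.
BREAK_CHARS = ".,;:!?/"

# Characters ignored at either edge of a line (assumption: leading and
# trailing punctuation or whitespace does not count as "mid-line").
_EDGE = BREAK_CHARS + string.whitespace


def mid_line_breaks(line: str) -> List[int]:
    """Return the character offsets of break punctuation inside the line.

    A mark counts only if there is other text both before and after it.
    """
    start = len(line) - len(line.lstrip(_EDGE))  # first non-edge character
    end = len(line.rstrip(_EDGE))                # one past the last non-edge character
    return [i for i in range(start, end) if line[i] in BREAK_CHARS]


def has_mid_line_break(line: str) -> bool:
    """Binary label: does the line belong to the target class?"""
    return bool(mid_line_breaks(line))


def strip_break_chars(line: str) -> str:
    """Delete all break punctuation, simulating an unpunctuated line."""
    return line.translate(str.maketrans("", "", BREAK_CHARS))


if __name__ == "__main__":
    # Invented example lines, not taken from the corpus.
    for text in ["the river stops, then turns", "the river stops.", "no marks here"]:
        print(repr(text), has_mid_line_break(text), mid_line_breaks(text))
```

In such a setup, the labels and offsets computed from the punctuated text would serve as training targets, while a classifier would see only the output of `strip_break_chars` together with the audio features.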

Metadata
Author: Timo Baumann, Burkhard Meyer-Sickendiek, Hussein Hussein
URN: urn:nbn:de:gbv:18-228-7-2587
URL / DOI: http://www.essv.de/paper.php?id=452
Parent Title (English): Proceedings of Elektronische Sprachsignalverarbeitung: Tagungsband der 31. Konferenz, Magdeburg, 4.-6. März 2020
Publisher: Förderverein Elektronische Sprachsignalverarbeitung e.V.
Place of publication: Magdeburg, Germany
Document Type: conference proceeding (article)
Language: English
Year of first Publication: 2020
Release Date: 2022/04/09
First Page: 165
Last Page: 172
Institutes: Fakultät Informatik und Mathematik
Review status: peer-reviewed
Publication: External publications
Research focus: Information and Communication
Licence (German): No licence; German copyright law applies: § 53 UrhG