Deciphering Undersegmented Ancient Scripts Using Phonetic Prior
Abstract
Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship.
Cite
ISO 690
LUO, Jiaming, Frederik HARTMANN, Enrico SANTUS, Regina BARZILAY, Yuan CAO, 2021. Deciphering Undersegmented Ancient Scripts Using Phonetic Prior. In: Transactions of the Association for Computational Linguistics. MIT Press. 2021, 9, pp. 69-81. eISSN 2307-387X. Available under: doi: 10.1162/tacl_a_00354
BibTeX
@article{Luo2021Decip-57257,
  year={2021},
  doi={10.1162/tacl_a_00354},
  title={Deciphering Undersegmented Ancient Scripts Using Phonetic Prior},
  volume={9},
  journal={Transactions of the Association for Computational Linguistics},
  pages={69--81},
  author={Luo, Jiaming and Hartmann, Frederik and Santus, Enrico and Barzilay, Regina and Cao, Yuan}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/57257">
    <dc:contributor>Luo, Jiaming</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/57257/1/Luo_2-74mf4c553nsp9.pdf"/>
    <dc:creator>Cao, Yuan</dc:creator>
    <dcterms:issued>2021</dcterms:issued>
    <dc:language>eng</dc:language>
    <dc:contributor>Hartmann, Frederik</dc:contributor>
    <dcterms:title>Deciphering Undersegmented Ancient Scripts Using Phonetic Prior</dcterms:title>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:abstract xml:lang="eng">Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship.</dcterms:abstract>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/57257/1/Luo_2-74mf4c553nsp9.pdf"/>
    <dc:contributor>Barzilay, Regina</dc:contributor>
    <dc:contributor>Cao, Yuan</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2022-04-11T14:42:07Z</dc:date>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/57257"/>
    <dc:creator>Barzilay, Regina</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:contributor>Santus, Enrico</dc:contributor>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:creator>Hartmann, Frederik</dc:creator>
    <dc:creator>Santus, Enrico</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2022-04-11T14:42:07Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Luo, Jiaming</dc:creator>
  </rdf:Description>
</rdf:RDF>