Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter : Machine Learning Approach

Files
Metzler_2-1rphbc597ilgk9.pdf (1.22 MB, 62 downloads)
Date
2022
Authors
Metzler, Hannah
Baginski, Hubert
Niederkrotenthaler, Thomas
Garcia, David
Open Access Publication
Open Access Gold
Publication type
Journal article
Publication status
Published
Published in
Journal of Medical Internet Research. International Committee of Medical Journal Editors. 2022, 24(8), e34705. ISSN 1439-4456. eISSN 1438-8871. Available under: doi: 10.2196/34705
Abstract

Background: Research has repeatedly shown that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic and large-scale investigations are lacking. Moreover, the growing importance of social media, particularly among young adults, calls for studies on the effects of the content posted on these platforms.
Objective: This study applies natural language processing and machine learning methods to classify large quantities of social media data according to characteristics identified as potentially harmful or beneficial in media effects research on suicide and prevention.
Methods: We manually labeled 3202 English tweets using a novel annotation scheme that classifies suicide-related tweets into 12 categories. Based on these categories, we trained a benchmark of machine learning models for a multiclass and a binary classification task. As models, we included a majority classifier, an approach based on word frequency (term frequency-inverse document frequency with a linear support vector machine), and 2 state-of-the-art deep learning models (Bidirectional Encoder Representations from Transformers [BERT] and XLNet). The first task classified posts into 6 main content categories, which are particularly relevant for suicide prevention based on previous evidence. These included personal stories of either suicidal ideation and attempts or coping and recovery, calls for action intending to spread either problem awareness or prevention-related information, reporting of suicide cases, and other tweets irrelevant to these 5 categories. The second classification task was binary and separated posts in the 11 categories referring to actual suicide from posts in the off-topic category, which use suicide-related terms in another meaning or context.
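As a concrete illustration of the word-frequency baseline described in the Methods (TF-IDF features with a linear support vector machine), the following minimal Python sketch uses scikit-learn; the example tweets, category names, and settings are placeholders for illustration only and are not the authors' code or data.

# Minimal sketch of a TF-IDF + linear SVM baseline for tweet categories.
# All tweets and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "I survived my attempt and things slowly got better",    # coping and recovery
    "Please share this suicide prevention helpline",          # prevention-related information
    "Local news report another suicide case",                 # reporting of a suicide case
    "Missing that deadline was social suicide",               # off-topic use of the term
]
labels = ["coping", "prevention_info", "case_report", "off_topic"]

# TF-IDF turns each tweet into a sparse word-frequency vector;
# the linear SVM then learns one linear decision boundary per category.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
baseline.fit(tweets, labels)

print(baseline.predict(["Here is where to find help and prevention resources"]))

A pipeline like this only sees which words occur and how often; it has no notion of word order or context, which is the limitation the deep learning models in the benchmark are meant to address.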
Results: In both tasks, the performance of the 2 deep learning models was very similar and better than that of the majority or the word frequency classifier. BERT and XLNet reached accuracy scores above 73% on average across the 6 main categories in the test set and F1-scores between 0.69 and 0.85 for all but the suicidal ideation and attempts category (F1=0.55). In the binary classification task, they correctly labeled around 88% of the tweets as about suicide versus off-topic, with BERT achieving F1-scores of 0.93 and 0.74, respectively. These classification performances were similar to human performance in most cases and were comparable with state-of-the-art models on similar tasks.
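The accuracy and per-class F1-scores reported above can be computed with standard metrics; the sketch below uses scikit-learn on placeholder predictions, not the study's test set.

# Minimal sketch: accuracy plus one F1-score per class for the binary task.
# y_true and y_pred are hypothetical labels for illustration only.
from sklearn.metrics import accuracy_score, f1_score, classification_report

y_true = ["about_suicide", "about_suicide", "off_topic", "about_suicide", "off_topic"]
y_pred = ["about_suicide", "off_topic",     "off_topic", "about_suicide", "about_suicide"]

# Accuracy: share of tweets assigned to the correct class.
print("accuracy:", accuracy_score(y_true, y_pred))

# Per-class F1 (harmonic mean of precision and recall), reported separately
# for the "about suicide" and "off-topic" classes in the paper.
print("per-class F1:", f1_score(y_true, y_pred, average=None,
                                labels=["about_suicide", "off_topic"]))

# classification_report prints precision, recall, and F1 for each class at once.
print(classification_report(y_true, y_pred))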
Conclusions: The achieved performance scores highlight machine learning as a useful tool for media effects research on suicide. The clear advantage of BERT and XLNet suggests that there is crucial information about meaning in the context of words beyond mere word frequencies in tweets about suicide. By making data labeling more efficient, this work has enabled large-scale investigations on harmful and protective associations of social media content with suicide rates and help-seeking behavior.
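For readers unfamiliar with the contextual models mentioned in the conclusions, the following sketch outlines how a BERT classifier can be fine-tuned for a 6-category task with the Hugging Face transformers library; the checkpoint name, hyperparameters, and toy examples are assumptions for illustration, not the authors' implementation.

# Minimal sketch (not the authors' code) of fine-tuning BERT for 6 categories.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"    # assumed checkpoint; an XLNet checkpoint could be swapped in
NUM_CLASSES = 6                # the 6 main content categories

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=NUM_CLASSES)

# Placeholder training examples: (tweet text, category id 0..5).
train_data = [("Sharing my story of recovery after a difficult time", 1),
              ("Please help spread awareness about crisis helplines", 3)]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                         # a few epochs is typical for fine-tuning
    for text, label in train_data:
        batch = tokenizer(text, truncation=True, return_tensors="pt")
        out = model(**batch, labels=torch.tensor([label]))
        out.loss.backward()                    # gradients flow through contextual token representations
        optimizer.step()
        optimizer.zero_grad()

# Predict the category of a new tweet.
model.eval()
with torch.no_grad():
    enc = tokenizer("New tweet text here", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1).item()
print("predicted category id:", pred)

Unlike the TF-IDF baseline, the transformer encodes each word in the context of the surrounding words, which is the property the conclusions point to when explaining the performance gap.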

Subject (DDC)
320 Political science
Keywords
suicide prevention; Twitter; social media; machine learning; deep learning
Cite
ISO 690: METZLER, Hannah, Hubert BAGINSKI, Thomas NIEDERKROTENTHALER, David GARCIA, 2022. Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter : Machine Learning Approach. In: Journal of Medical Internet Research. International Committee of Medical Journal Editors. 2022, 24(8), e34705. ISSN 1439-4456. eISSN 1438-8871. Available under: doi: 10.2196/34705
BibTex
@article{Metzler2022Detec-59967,
  year={2022},
  doi={10.2196/34705},
  title={Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter : Machine Learning Approach},
  number={8},
  volume={24},
  issn={1439-4456},
  journal={Journal of Medical Internet Research},
  author={Metzler, Hannah and Baginski, Hubert and Niederkrotenthaler, Thomas and Garcia, David},
  note={Article Number: e34705}
}
University bibliography
Yes
Peer reviewed
Yes