Extending the OLAP Technology for Social Media Analysis

Files
Rehman_0-290919.pdf (Size: 9.44 MB, Downloads: 5421)
Date
2015
Open Access publication
Open Access Green
Publication type
Dissertation
Publication status
Published
Abstract

Contemporary decision support and information systems have been fundamental to the smooth operation and growth of successful businesses across the globe for over two decades. Data warehousing and OLAP are at the core of these systems and have been instrumental in comprehensive data analysis in diverse domains such as manufacturing, retail, financial services, transportation, telecommunications, utilities, healthcare, education, research and government. With the emergence of new data problems and domains such as spatial, sequence and multimedia data, data warehouse systems (their underlying technology, methods and techniques) have been extended to provide the same standard of performance they are known for.
A relatively new problem domain is that of social media, which has shaped the most recent years of the 21st century. The revolution that social media has brought about has affected almost all walks of life. The ever-expanding Internet and cheap hand-held electronic devices have contributed to the popularity of social media and have added millions of users to these websites. Social media have been playing an important role in politics, disasters, sports, entertainment, health, education, government and business. These websites exist by virtue of their users and their activity. The user-generated content on these sites amounts to huge volumes, is generated at a high pace, and attracts considerable research and commercial interest.
The aim of this thesis is to extend the OLAP framework for social media analysis and to provide an enabling environment for social business intelligence. Data warehouses and OLAP operate on strictly structured data objects and the pre-established relationships among these objects in order to provide multidimensional analysis efficiently, whereas data originating from social media is semi-structured or unstructured and exhibits a degree of dynamism. In this thesis, we bridge the gap between OLAP and social media by enabling the former to operate on the latter, proposing a set of methods that range from modeling to storing and querying user-generated data from social media.
We survey the data models of social media and propose corresponding transformations in the multidimensional data modeling landscape. Specifically, we obtain a multidimensional view of the data originating from social media based on its metadata. The underlying dataset is enriched using methods from Natural Language Processing, Text Mining and Data Mining. These methods include language detection, sentiment analysis, named entity recognition, topic extraction and classical data mining algorithms such as classification and clustering. The outcomes of these methods include objects such as facts, dimensions, dimensional hierarchies, hierarchy levels and cubes. We resort to X-DFM (Extended Dimensional Fact) Modeling, as it supports the modeling of newly discovered and dynamic data elements in the dimensional landscape. Dimensional modeling is based on the principle of "static" dimensions and "changing" facts; social media, however, pose the challenge of even "changing dimensions". We investigate proposals in the literature on storing, maintaining and querying such dynamic dimensions. Our recommendations are based on slowly changing dimensions (SCD), and we argue for the applicability of this approach with the help of examples. We further propose a three-layered business intelligence framework that obtains data from social media and stores it in the data warehouse along with the enterprise business data. The user-generated content from social media undergoes semantic enrichment and is then modeled in accordance with OLAP standards. Having social media data and enterprise data in this format makes provision for social-medium-specific analysis, cross-media analysis and business analysis with respect to social media, e.g., Social OLAP and Social CRM.
Taming user-generated data from social media and integrating it into the OLAP environment allows for multidimensional analysis of social media and business from useful and newly discovered perspectives. To the best of our knowledge, other relevant works each focus on a smaller, targeted problem, while our work addresses multiple problems and applications. However, we do not claim that it covers all aspects of this complex problem, and we recognize that doing so is infeasible within a single PhD.

Subject (DDC)
004 Computer Science
Keywords
Social Media, Data Warehouse, OLAP, Database
Cite
ISO 690
REHMAN, Nafees Ur, 2015. Extending the OLAP Technology for Social Media Analysis [Dissertation]. Konstanz: University of Konstanz
BibTex
@phdthesis{Rehman2015Exten-31013,
  year={2015},
  title={Extending the OLAP Technology for Social Media Analysis},
  author={Rehman, Nafees Ur},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/31013">
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-21T11:03:55Z</dcterms:available>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:issued>2015</dcterms:issued>
    <dc:creator>Rehman, Nafees Ur</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/31013/3/Rehman_0-290919.pdf"/>
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Rehman, Nafees Ur</dc:contributor>
    <dcterms:title>Extending the OLAP Technology for Social Media Analysis</dcterms:title>
    <dcterms:abstract xml:lang="eng">Contemporary decision support and information systems have been fundamental to the smooth operation and growth of successful businesses across the globe for over two decades now. Data warehousing and OLAP are at the core of these systems and have been instrumental in encyclopedic data analysis in multifarious domains like manufacturing industry, retail sector,  financial services, transportation, telecommunications, utilities, healthcare, education, research and government. With the emergence of new data problems and domains e.g., spatial, sequence and multimedia data etc., data warehouse systems  the underlying technology, methods and techniques  have been extended to provide the same standard performance they are known for.&lt;br /&gt;A relatively new problem domain is that of social media that has shaped the last couple of years of the 21st century. The revolution social media has brought about, has impacted almost all walks of life. The ever expanding Internet and cheap hand-held electronic devices have contributed to the popularity of social media and have added millions of users to these web sites. Social media have been playing an important role in politics, disasters, sports, entertainment, health, education, government and business domain. These websites exist by the virtue of users and their activity. The user-generated content on these sites amounts to huge volumes and is generated at high pace and attracts research and commercial interests of many.&lt;br /&gt;The aim of this thesis is to extend the OLAP framework for social media analysis and to provide enabling environment for social business intelligence. Data warehouses and OLAP operate on strictly structured data objects and the pre-established relationships among these objects in order to provide multidimensional analysis e fficiently. While data originating from social media is semi-structured and unstructured and exhibit a degree of dynamism. 
In this thesis, we bridge the gap between OLAP and social media by enabling the former to operate and deal with the latter by proposing a set of methods from modeling, to storing and querying user-generated data on the social media.&lt;br /&gt;We survey the data models of the social media and propose the corresponding transformations in the multidimensional data modeling landscape. Specifically, we obtained the multidimensional view of the data originating from social media based on the metadata. The underlying dataset is enriched by using numerous methods from Natural Language Processing, Text Mining and Data Mining. These methods include language detection, sentiment analysis, named entity recognition, topic extraction and the classical data mining algorithms like classification and clustering. The outcome of these methods include objects like facts, dimensions, dimensional hierarchies, hierarchy levels and cubes. We resorted to the X-DFM (Extended Dimensional Fact) Modeling as it supports data modeling of the newly discovered and dynamic data elements in the dimensionality landscape. Dimensionality modeling is based on the  static  dimensions and  changing  facts principle, however, social media pose the challenge of even  changing dimension . We investigate proposals in the literature on storing, maintaining and querying such dynamic dimensions. Our recommendations are based on slowly changing dimensions (SCD) and argue it's applicability with the help of examples. We further propose a three layered business intelligence framework that obtains data from social media and stores it in the data warehouse along with the enterprise business data. The user-generated content from social media undergoes semantic enrichment and is then modeled in accordance with the OLAP standards. 
Having social media data and enterprise data in this format, makes provisions for social-medium specific analysis, cross-media analysis and business analysis with respect to the social media, e.g., Social OLAP, Social CRM etc.&lt;br /&gt;Taming user-generated data from social media and integrating it into the OLAP environment allows for multidimensional analysis of social media and business from useful and newly discovered perspectives. To the best of our knowledge, other relevant works only focus on a smaller and targeted problem, while our work focuses on multiple problems and applications. However, we do not claim that it covered all aspects of this complex problem and understand the fact that it is unworkable in a single PhD.</dcterms:abstract>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-21T11:03:55Z</dc:date>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/31013"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/31013/3/Rehman_0-290919.pdf"/>
    <dc:language>eng</dc:language>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
  </rdf:Description>
</rdf:RDF>
Date of dissertation examination
March 16, 2015
Thesis note
Konstanz, Univ., Diss., 2015