Evolutionary Tree-Structured Storage : Concepts, Interfaces, and Applications

Kramis, Marc


Files

Kramis_276959.pdf (size: 5.48 MB)

Date

2014

Open Access publication

Open Access Green

Collections

Computer and Information Science

Publication type

Dissertation

Publication status

Published

Abstract

Life is subject to constant evolution. So is our data, be it in research, business, or personal information management. From a natural, evolutionary perspective, our data evolves through a sequence of fine-grained modifications resulting in myriad states, each describing our data at a given point in time. From a technical, anti-evolutionary perspective, mainly driven by technological and financial limitations, we treat the modifications as transient commands and store only the latest state of our data.

It is surprising that the current approach is to ignore the natural evolution and to willfully forget about the sequence of modifications and therefore the past state. Sticking to this approach causes all kinds of confusion, complexity, and performance issues. Confusion, because we still somehow want to retrieve past state but are not sure how. Complexity, because we must repeatedly work around our own obsolete approaches. Performance issues, because confusion times complexity hurts. It is not surprising, however, that intelligence agencies notoriously try to collect, store, and analyze what the broad public willfully forgets.

Significantly faster and cheaper random-access storage is the key driver for a paradigm shift towards remembering the sequence of modifications. We claim that (1) faster storage allows us to handle finer-grained modifications efficiently and cleverly, and (2) mandatory versioning elegantly exposes past state, radically simplifies applications, and lays a solid foundation for backing up, distributing, and scaling our data. Using the example of tree-structured XML, this work shows that the characteristics and advantages of the evolutionary approach have been recognized and consistently implemented, which on its own is an important achievement.

We present the concepts of our evolutionary tree-structured storage TreeTank and of the general-purpose SlidingSnapshot to prove that (3) formerly modification-averse tree encodings can be maintained with logarithmic update complexity, (4) linear read scalability beyond memory limits is still guaranteed while logarithmic update characteristics are maintained, (5) secure copy-on-write semantics can be extended from the file level to the much finer-grained node level, (6) versioned node-level access is predictable and even real-time-capable, and (7) node-level snapshots are as space-efficient as, or even more space-efficient than, page-level or file-level snapshots. In the course of our work, we inspired the Java-based iSCSI implementation jSCSI, which proved that (8) block access from a high-level language is fast, and we established the Java benchmark framework PERFIDIX as well as the block touch visualization tool VISIDEFIX.
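
The node-level copy-on-write idea behind claims (3), (5) and (7) can be illustrated with path copying in a persistent tree: a commit copies only the nodes on the path from the root to each modified leaf, and every revision keeps its own root while sharing all untouched nodes. The following is a minimal, generic sketch under that assumption. It is not TreeTank's or SlidingSnapshot's actual page or node layout, and `VersionedStore`, `commit`, `read`, `distinct_nodes` and the fan-out `BRANCHING` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Iterable

BRANCHING = 4  # hypothetical fan-out of every inner node


@dataclass(frozen=True)
class Node:
    # Inner nodes hold child Nodes; leaf nodes (level 0) hold stored values.
    children: tuple


def _empty(level: int) -> Node:
    # Immutable nodes may safely be shared, so one empty child serves all slots.
    if level == 0:
        return Node((None,) * BRANCHING)
    return Node((_empty(level - 1),) * BRANCHING)


def _set(node: Node, level: int, index: int, value: Any) -> Node:
    # Path copying: only the nodes on the root-to-leaf path of `index` are copied.
    slot = (index // BRANCHING ** level) % BRANCHING
    children = list(node.children)
    if level == 0:
        children[slot] = value
    else:
        children[slot] = _set(node.children[slot], level - 1, index, value)
    return Node(tuple(children))


def _get(node: Node, level: int, index: int) -> Any:
    while level > 0:
        node = node.children[(index // BRANCHING ** level) % BRANCHING]
        level -= 1
    return node.children[index % BRANCHING]


class VersionedStore:
    """Hypothetical store: every commit yields a new, immutable revision root."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth
        self.capacity = BRANCHING ** (depth + 1)
        self.revisions = [_empty(depth)]  # revision 0 is the empty store

    def commit(self, updates: dict) -> int:
        root = self.revisions[-1]
        for index, value in updates.items():
            if not 0 <= index < self.capacity:
                raise IndexError(index)
            root = _set(root, self.depth, index, value)
        self.revisions.append(root)
        return len(self.revisions) - 1

    def read(self, revision: int, index: int) -> Any:
        return _get(self.revisions[revision], self.depth, index)


def distinct_nodes(roots: Iterable[Node]) -> int:
    # Count physically distinct nodes reachable from the given revision roots.
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(c for c in node.children if isinstance(c, Node))
    return len(seen)
```

With `depth=3` the store holds 256 values in 85 nodes. Committing a single change creates only `depth + 1 = 4` new nodes, and the remaining 81 nodes are shared. Both revisions together therefore occupy 89 distinct nodes rather than the 170 a full copy would need, and the old revision remains readable unchanged.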

We extend REST, the cornerstone interface of the web, with the ability to access the full version and modification history of a resource, and call the result (9) Temporal REST. This interface will not only encourage application developers to make use of our evolutionary approach but also foster interactive and collaborative applications, because, according to our claim (10), such applications are less complex to write and perform so well that users can interactively work with large-scale data.
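
To make claim (9) more concrete, the sketch below shows one conceivable way a server could resolve temporal GET requests against an append-only revision history. The query parameters `revision`, `timestamp` and `history`, as well as the names `Resource` and `temporal_get`, are hypothetical. They are not the Temporal REST interface defined in the thesis. No HTTP server is involved, and status codes are returned as plain integers.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlsplit


@dataclass
class Resource:
    # Append-only history: revision r was created at timestamps[r].
    timestamps: list = field(default_factory=list)
    bodies: list = field(default_factory=list)

    def put(self, timestamp: float, body: str) -> int:
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self.timestamps.append(timestamp)
        self.bodies.append(body)
        return len(self.bodies) - 1

    def at_revision(self, revision: int) -> str:
        if not 0 <= revision < len(self.bodies):
            raise LookupError(f"no revision {revision}")
        return self.bodies[revision]

    def at_time(self, timestamp: float) -> str:
        # Latest revision created at or before the requested point in time.
        i = bisect_right(self.timestamps, timestamp) - 1
        if i < 0:
            raise LookupError("resource did not exist at that time")
        return self.bodies[i]


def temporal_get(store: dict, url: str) -> tuple:
    """Resolve a GET request to (status, body); parameter names are hypothetical."""
    parts = urlsplit(url)
    resource = store.get(parts.path)
    if resource is None or not resource.bodies:
        return 404, "not found"
    query = parse_qs(parts.query, keep_blank_values=True)
    try:
        if "history" in query:
            return 200, "\n".join(
                f"{rev} {ts}" for rev, ts in enumerate(resource.timestamps))
        if "revision" in query:
            return 200, resource.at_revision(int(query["revision"][0]))
        if "timestamp" in query:
            return 200, resource.at_time(float(query["timestamp"][0]))
    except LookupError as exc:
        return 404, str(exc)
    except ValueError:
        return 400, "malformed temporal parameter"
    return 200, resource.bodies[-1]  # plain GET: latest revision
```

A plain GET without temporal parameters still returns the latest revision, so existing clients keep working. Past states are only exposed when a client asks for them explicitly.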

Finally, we provide an outlook on how evolutionary (full-text) indices, applications, and schemas can leverage our contributions and how special-purpose hardware can speed up our tree-structured storage while using far less energy. In particular, our suggested approach to schema handling and evolution has the potential to radically simplify ORM-based software development.

Abstract in another language (German)

Not only life but also our data is subject to constant evolution, be it in research, industry, or private life. From a natural, evolutionary perspective, our data evolves through an incessant series of fine-grained modifications that produce countless versions. From a technical, anti-evolutionary perspective, which arose largely from technological and financial limitations, we regard the modifications as merely transient and mostly store only the latest version of our data.

Unfortunately, clinging to the current approach of ignoring natural evolution and deliberately forgetting past versions leads to confusion, complexity, and performance losses. Confusion, because we nevertheless repeatedly need to access past versions. Complexity, because we must repeatedly work around the shortcomings of our approach. Performance losses, because confusion paired with complexity promises little success. Interestingly, intelligence agencies notoriously try to collect, store, and analyze precisely the data that the broad public deliberately discards.

Ever faster and cheaper storage is the main driver of a shift towards no longer forgetting past modifications. We maintain that faster storage enables (1) more efficient handling of fine-grained modifications and (2) more elegant access to past versions, simplifies applications, and lays a solid foundation for backing up and distributing our data. Using the example of XML, this work shows that the properties and advantages of the evolutionary approach have been recognized and consistently implemented, a fact that is an important achievement in its own right.

Using our evolutionary tree-structured storage TreeTank and the general-purpose SlidingSnapshot, we prove that (3) formerly modification-averse encodings of tree-structured data can be updated in logarithmic time, (4) linear scalability of read accesses remains guaranteed while updates retain logarithmic cost, (5) secure copy-on-write can be applied not only at the file level but at the much finer-grained node level, (6) versioned node-level access is predictable and real-time-capable, and (7) node-level snapshots require at most as much space as, and often less space than, file-level or page-level snapshots. We also initiated the development of a Java-based iSCSI implementation called jSCSI, which showed that (8) high-level languages enable fast access to block-oriented storage, and we established the Java benchmark framework PERFIDIX as well as the tool VISIDEFIX for visualizing block accesses.

We extend REST, the core interface of the Internet, (9) with access to the full version and modification history of a resource. Temporal REST will boost interactive applications because, thanks to our interface extension, they are (10) less complex and perform so well that users can work interactively with large amounts of data.

Finally, we show how future evolutionary (full-text) indices, applications, and schemas can benefit from our contributions and how specialized, energy-efficient hardware can accelerate our tree-structured storage. In particular, our proposal for working on and evolving schemas has the potential to radically simplify ORM-based software development.

Subject (DDC)

004 Computer science

Keywords

Data storage, Version, Storage, Security, Database, XML database, Versioning

Cite

ISO 690

KRAMIS, Marc, 2014. Evolutionary Tree-Structured Storage : Concepts, Interfaces, and Applications [Dissertation]. Konstanz: University of Konstanz

BibTeX

@phdthesis{Kramis2014Evolu-27695,
  year={2014},
  title={Evolutionary Tree-Structured Storage : Concepts, Interfaces, and Applications},
  author={Kramis, Marc},
  address={Konstanz},
  school={Universität Konstanz}
}

RDF

<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/27695">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-05-05T05:48:47Z</dc:date>
    <dcterms:title>Evolutionary Tree-Structured Storage : Concepts, Interfaces, and Applications</dcterms:title>
    <dc:language>eng</dc:language>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Kramis, Marc</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Kramis, Marc</dc:contributor>
    <dcterms:abstract xml:lang="eng">Life is subject to constant evolution. So is our data, be it in research, business, or personal information management. From a natural, evolutionary perspective, our data evolves through a sequence of fine-grained modifications resulting in myriad states, each describing our data at a given point in time. From a technical, anti-evolutionary perspective, mainly driven by technological and financial limitations, we treat the modifications as transient commands and store only the latest state of our data.

It is surprising that the current approach is to ignore the natural evolution and to willfully forget about the sequence of modifications and therefore the past state. Sticking to this approach causes all kinds of confusion, complexity, and performance issues. Confusion, because we still somehow want to retrieve past state but are not sure how. Complexity, because we must repeatedly work around our own obsolete approaches. Performance issues, because confusion times complexity hurts. It is not surprising, however, that intelligence agencies notoriously try to collect, store, and analyze what the broad public willfully forgets.

Significantly faster and cheaper random-access storage is the key driver for a paradigm shift towards remembering the sequence of modifications. We claim that (1) faster storage allows us to handle finer-grained modifications efficiently and cleverly, and (2) mandatory versioning elegantly exposes past state, radically simplifies applications, and lays a solid foundation for backing up, distributing, and scaling our data. Using the example of tree-structured XML, this work shows that the characteristics and advantages of the evolutionary approach have been recognized and consistently implemented, which on its own is an important achievement.

We present the concepts of our evolutionary tree-structured storage TreeTank and of the general-purpose SlidingSnapshot to prove that (3) formerly modification-averse tree encodings can be maintained with logarithmic update complexity, (4) linear read scalability beyond memory limits is still guaranteed while logarithmic update characteristics are maintained, (5) secure copy-on-write semantics can be extended from the file level to the much finer-grained node level, (6) versioned node-level access is predictable and even real-time-capable, and (7) node-level snapshots are as space-efficient as, or even more space-efficient than, page-level or file-level snapshots. In the course of our work, we inspired the Java-based iSCSI implementation jSCSI, which proved that (8) block access from a high-level language is fast, and we established the Java benchmark framework PERFIDIX as well as the block touch visualization tool VISIDEFIX.

We extend REST, the cornerstone interface of the web, with the ability to access the full version and modification history of a resource, and call the result (9) Temporal REST. This interface will not only encourage application developers to make use of our evolutionary approach but also foster interactive and collaborative applications, because, according to our claim (10), such applications are less complex to write and perform so well that users can interactively work with large-scale data.

Finally, we provide an outlook on how evolutionary (full-text) indices, applications, and schemas can leverage our contributions and how special-purpose hardware can speed up our tree-structured storage while using far less energy. In particular, our suggested approach to schema handling and evolution has the potential to radically simplify ORM-based software development.</dcterms:abstract>
    <dcterms:issued>2014</dcterms:issued>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/27695/1/Kramis_276959.pdf"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/27695/1/Kramis_276959.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-05-05T05:48:47Z</dcterms:available>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/27695"/>
  </rdf:Description>
</rdf:RDF>

Date of the doctoral examination

April 22, 2014

University bibliography

No