Using Palladio network links to model multicore architecture memory hierarchies

Gruber, Philipp

Please use this identifier to cite or link to this resource: http://dx.doi.org/10.18419/opus-10676

Author(s):	Gruber, Philipp
Title:	Using Palladio network links to model multicore architecture memory hierarchies
Issue Date:	2019
Document Type:	Thesis (Bachelor)
Pages:	ix, 63
URI:	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-106933 http://elib.uni-stuttgart.de/handle/11682/10693 http://dx.doi.org/10.18419/opus-10676
Abstract:	This thesis investigates how well Palladio can predict the performance of combined software/hardware systems. Palladio simulations are accurate for systems running on single-core processors, but experiments have shown that the predictions are not accurate for multicore systems. Parallelizing programs is complex. Moreover, a parallelized program executed on four cores is not automatically four times faster than its single-core counterpart. The reasons lie partly on the software side (e.g. Amdahl's law) and partly on the hardware side (e.g. memory bandwidth). Memory bandwidth refers to the capacity limit of the memory bus, the bus connecting the CPU to memory. In theory, memory bandwidth becomes a more important factor as the degree of parallelization increases, because more cores move more data across that bus in a shorter time. As a consequence, an overloaded memory bus becomes a bottleneck and leaves CPUs idle, since they have to wait for data to be loaded from memory. Palladio does not take such multicore effects into account. The goal of this thesis was to find out whether Palladio can model memory bandwidth with the existing elements of its component model and, if so, whether this modeling leads to more accurate predictions. Based on an experiment with a matrix multiplication, our work showed that this is possible, but our approach does not reach 100% accuracy. The achieved accuracy of approximately 90% on average indicates that further factors contribute to the non-linear speedup of multicore processors. Examples are the synchronization of shared memory and contention for these shared resources. In addition, our experiments did not provide enough confidence to determine to what extent memory bandwidth was a bottleneck in our specific use case. This should be the target of future work.
This thesis explores Palladio's ability to predict the performance of software/hardware systems. Palladio's predictions/simulations have proven accurate for single-core processors, but experiments have shown that this does not hold for multicore processors. Parallelizing programs is complex. In addition, a parallelized program executed on four cores is not automatically four times faster than one running on a single core. The causes lie not only at the code/software level but also in hardware factors such as memory bandwidth. Memory bandwidth here means the capacity limit of the bus from the CPU core to memory, which becomes the bottleneck: the core has to wait until it can load or store values. In theory, this factor grows as more data flows over the bus in a shorter time, that is, with a higher degree of parallelization. The current realization of multicore systems in Palladio does not take this into account. The goal of this thesis was to find out whether Palladio can represent memory bandwidth with its existing means/elements and whether doing so leads to higher accuracy. The result of an experiment with a matrix multiplication showed that this approach is both possible and leads to greater accuracy. Nevertheless, our experiment did not achieve an accuracy close to 100%, but only about 90% on average. This suggests that, besides memory bandwidth, other factors play a role in the non-linear speedup of programs, for example the synchronization of shared memory or the competition for shared resources.
Likewise, there is no certainty that, and in what form, the memory bus was actually fully utilized and became the bottleneck in our use case. This would have to be taken into account and examined more closely in further research.
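As an illustration of why a four-core run is not automatically four times faster, the following sketch combines Amdahl's law with a simple cap imposed by the memory bus. It is not the model from the thesis and does not use Palladio. The function names, the parameters `per_core_demand_gbs` and `bus_bandwidth_gbs`, and the assumption that the bus can keep at most `bus_bandwidth_gbs / per_core_demand_gbs` cores busy are hypothetical simplifications chosen for this example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law (no hardware contention)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def bandwidth_limited_speedup(
    parallel_fraction: float,
    cores: int,
    per_core_demand_gbs: float,
    bus_bandwidth_gbs: float,
) -> float:
    """Amdahl speedup with a memory-bus cap on the parallel part.

    Assumption (illustrative only): every core requests data at the same
    constant rate, and the shared bus can keep at most
    bus_bandwidth_gbs / per_core_demand_gbs cores busy at once.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if per_core_demand_gbs <= 0 or bus_bandwidth_gbs < per_core_demand_gbs:
        raise ValueError("the bus must be able to feed at least one core")
    # Cores beyond this number only wait for the bus and add no throughput.
    effective_cores = min(float(cores), bus_bandwidth_gbs / per_core_demand_gbs)
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / effective_cores)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        ideal = amdahl_speedup(0.9, n)
        capped = bandwidth_limited_speedup(0.9, n, 5.0, 10.0)
        print(f"{n} cores: ideal {ideal:.2f}x, bus-limited {capped:.2f}x")
```

With a parallel fraction of 0.9 on four cores, the ideal speedup is about 3.08. It drops to about 1.82 in this toy model once the bus can feed only two cores. Effects such as synchronization and cache contention, which the abstract names as further factors, are deliberately left out.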
Appears in Collections:	05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this resource:

File	Description	Size	Format
ThesisFinal.pdf		912.18 kB	Adobe PDF

All resources in this repository are protected by copyright.

Universität Stuttgart

OPUS - Online Publikationen der Universität Stuttgart