Learning Outcomes
Cognitive: Understanding of the fundamental issues of power consumption in CMOS technology. Understanding of dynamic and speculative instruction execution. Instruction dependencies and processor performance. Understanding of the memory hierarchy of modern processors. Memory and cache level optimization techniques. Principles of shared memory multicore architectures. Understanding of coherency and consistency problems.
Skills: Design of hardware-level optimization techniques that increase instruction-level parallelism (ILP) in modern processors. Cache optimizations at the hardware, software, and compiler levels. Design of shared memory multicore processors. Programming models and techniques for taking advantage of other types of parallelism, such as helper threads, thread-level speculation (via speculative precomputation and/or run-ahead execution), and transactional memories.
Course Content (Syllabus)
Basic principles of the 5-stage pipeline (single-cycle and multi-cycle pipeline architectures). Analysis of Moore's law and Dennard's scaling law. Principles of power consumption in CMOS technology. The transition from unicore to multicore architectures. Dynamic and speculative instruction execution. Tomasulo algorithm. Static and dynamic branch prediction techniques. Two-level branch predictors (m, n). Dynamic register renaming. Predication technique. Case studies: Core Duo and Itanium (Intel). Hardware-level cache optimizations. Victim caches, pseudo-associative caches, elbow caches. Hardware-software optimizations (replacement strategies, prefetching). Analysis of the usage of the trace cache in hyperthreading architectures. Compiler-level cache optimizations (loop transformations). Instruction and data prefetching techniques at the hardware, compiler, and software levels. Multicore architectures. The transition to multicores (ILP wall + power wall + memory wall = multicores). SISD, SIMD, MISD, MIMD architectures. Shared memory architectures. The cache coherency problem. Directory-based and snooping/broadcast protocols. False sharing elimination techniques. Categories and types of multithreaded architectures. The CUDA GPGPU programming model. Memory ordering and memory consistency models (sequential, relaxed, weak consistency models). Memory synchronization through atomic load/store instructions. Other types of parallelism, such as helper threads, thread-level speculation (via speculative precomputation and/or run-ahead execution), and transactional memories.
Keywords
Computer Architecture, Central Processing Unit, multicore processors, Software-Hardware co-design, non-von Neumann architecture
Course Bibliography (Eudoxus)
- Computer Architecture (Αρχιτεκτονική Υπολογιστών). John L. Hennessy, David A. Patterson. 4th edition, 2011. ISBN: 978-960-418-326-5. Publisher: Α. ΤΖΙΟΛΑ & ΥΙΟΙ Α.Ε. Eudoxus book code: 18548925.
- Computer Architecture (Αρχιτεκτονική Υπολογιστών). Dimitrios V. Nikolos (Δημήτριος Β. Νικολός). 2nd edition, 2012. ISBN: 978-960-93-4168-4. Eudoxus book code: 22713808.