Parallel architectures: the bare metal
An introductory course
Andy Pimentel
andy@wins.uva.nl
Literature
Outline
Processors in parallel systems
High performance processors
Modern RISC(y) processors
Modern RISC(y) processors (cont'd)
Very Long Instruction Word (VLIW) processors
VLIW (cont'd)
Vector processors
Vector processors (cont'd)
Caching
Cache implementations
Cache implementations (cont'd)
Cache implementations (cont'd)
Cache strategies
Parallel systems
Parallel systems (cont'd)
MIMD vs SIMD
Interconnection networks
Network properties
Network properties (cont'd)
Direct connection networks
Direct connection networks (cont'd)
Direct connection networks (cont'd)
Direct connection networks (cont'd)
Direct connection networks (cont'd)
Indirect connection networks
Busses
Multistage networks
Multistage networks (cont'd)
Multistage networks (cont'd)
Crossbar switches
Indirect connection networks (cont'd)
Direct vs Indirect networks
Direct vs Indirect networks (cont'd)
Packet switching
Store & forward vs Wormhole
Wormhole routing
Tree saturation
Routing techniques
Routing deadlocks
Routing techniques (cont'd)
Deterministic routing: X-Y
Deterministic routing (cont'd)
Adaptive routing
Deadlock avoidance
West-first routing example
Deadlock avoidance (cont'd)
Virtual channels
Complex communication support
An example: multicast support
Distributed memory MIMDs: multicomputers
Distributed memory MIMDs (cont'd)
Distributed memory MIMDs (cont'd)
VSM and SVM
VSM and SVM (cont'd)
Real multicomputers: the IBM SP2
The IBM SP2 (cont'd)
Real multicomputers: the Parsytec CC
The Parsytec CC (cont'd)
Shared memory MIMDs: multiprocessors
Cache coherency in shared memory machines
Cache coherency (cont'd)
Cache coherency (cont'd)
Cache coherency (cont'd)
Cache coherency (cont'd)
Uniform Memory Access (UMA)
UMA architectures (cont'd)
UMA architectures (cont'd)
Message combining
Non-Uniform Memory Access (NUMA)
NUMA (cont'd)
Disadvantages of CC-NUMA
Cache Only Memory Architecture (COMA)
COMA (cont'd)
COMA versus CC-NUMA
COMA versus CC-NUMA (cont'd)
Simple COMA (S-COMA)
S-COMA (cont'd)
Reactive NUMA (R-NUMA)
Cache coherency revisited
Directory based protocols
Directory based protocols (cont'd)
Cache coherency (cont'd)
Cache coherency (cont'd)
Cache coherency (cont'd)
Scalable Coherent Interface (SCI)
Synchronization
Synchronization (cont'd)
Synchronization (cont'd)
Synchronization (cont'd)
Disk storage considerations
RAID
RAID (cont'd)
Real multiprocessors: SGI Origin 2000
The SGI Origin 2000 (cont'd)
Real multiprocessors: Cray T3D
The Cray T3D (cont'd)
The Cray T3D (cont'd)
Real multiprocessors: Tera MTA
The Tera MTA (cont'd)
The Tera MTA (cont'd)
The Tera MTA (cont'd)
Supercomputers
The NEC SX4
SIMD architectures
The MasPar 2
Performance evaluation
Modelling of architectures
Analytical modelling
Modelling of architectures (cont'd)
Simulation
Simulation (cont'd)
Simulating a single processor
Instruction level simulation
Direct execution
Direct execution: an example
Trace-driven simulation
Trace-driven simulation (cont'd)
Trace-driven simulation (cont'd)
Trace reduction
Trace shifting
Execution-driven simulation
Execution-driven simulation (cont'd)
Andy Pimentel
1999-05-27