A Scalable Instruction Queue Design for Exploiting Parallelism
To maximize the performance of wide-issue superscalar out-of-order microprocessors, the issue stage must be able to extract as much instruction-level parallelism (ILP) as possible from the dynamic instruction stream. This dissertation will examine several approaches to increasing available ILP while minimizing the impact on cycle time, including a novel instruction queue design and simultaneous multithreading.
In this work, I describe and evaluate a scalable instruction queue (IQ) design, the Segmented Instruction Queue, that eliminates the correspondence between IQ size and cycle time. The Segmented IQ can be used as a component of a clustered architecture, another approach to reducing cycle-time penalties in wide-issue machines. The dependence tracking mechanism used by the Segmented IQ can also be applied to the problem of instruction placement in clustered architectures.
By changing the mix of instructions present in the IQ, simultaneous multithreading (SMT) can also be used to increase the amount of available ILP. Under SMT, partitioning schemes are needed to distribute resources among threads; however, some of these schemes, clustered architectures in particular, can significantly reduce SMT workload performance. If an SMT machine is to use a clustered microarchitecture, the choice of instruction placement policy must be carefully evaluated to avoid performance degradation. I will present data that characterizes the performance of SMT workloads in clustered architectures using both conventional instruction queues and segmented instruction queues.
Individually, these mechanisms represent viable approaches to increasing available ILP. This work shows that they can also be combined to form a more effective approach to increasing processor utilization and performance.