Compile-time reordering of machine-level instructions has been very successful at achieving large increases in the performance of programs on machines offering fine-grained parallelism. However, because of the interdependences between instruction scheduling and register allocation, it is not clear which of these two phases of the compiler should run first to generate the most efficient final code. In this paper, we describe our investigation into slight modifications to key phases of a successful global register allocator to create a scheduler-sensitive register allocator, which is then followed by an "off-the-shelf" instruction scheduler. Our experimental studies reveal that this approach achieves speedups comparable to, and increasingly better than, previous cooperative approaches as the number of available registers increases, without the complexities of the previous approaches.