One of the most interesting research questions for the microarchitecture community is how to use the growing number of transistors available on a single chip effectively while avoiding the wire-delay problem. The time a signal needs to reach the opposite edge of a chip is now becoming longer than one clock cycle. As a result, simply scaling up superscalar architectures yields diminishing performance gains. One viable way to use all of the available chip resources efficiently, while hiding wire delay as much as possible, is to parallelize resource usage through clustering and decoupling of resources. A number of tiled and clustered architectures have recently been proposed, which indicates strong interest in this field from both academia and industry.