The number of cores per chip keeps increasing in order to improve performance while keeping power consumption under control. According to semiconductor roadmaps, future computing systems will reach the scale of 1 Tera devices in a single package. Firstly, such Tera-device systems will expose a large amount of parallelism that cannot be easily and efficiently exploited by current applications and programming models. Secondly, the reliability of Tera-device systems will become a critical issue. Finally, the design of such systems needs to be simplified. TERAFLUX aims to provide a framework based on dataflow concepts that could offer a solution to all three of these challenges. Here we briefly present our ideas on the architectural support for the TERAFLUX execution model.