Decoupled Threaded Architecture (DTA) is designed to exploit Thread-Level Parallelism (TLP) by using a sea of simple cores grouped into clusters, providing a scalable solution that copes with wire delay. Our goals are i) to provide an aggressive mechanism for decoupling memory accesses that derive from both simple and complex data structures; and ii) to implement non-blocking execution of threads. Here we illustrate some of the concepts related to our research in implementing DTA.