Nowadays, the increasing number of cores benefits many workloads, but programming limitations still prevent applications from exploiting the full performance of these systems. A Data-Flow execution model can take advantage of the full parallelism offered by multicore systems. In such a model, the execution is decomposed into fine-grained threads, named Data-Flow Threads (DF-Threads), each of which executes only when its inputs are available. Execution overhead and power consumption are lowered thanks to reduced data push-pull and a lighter thread-management burden. In a preliminary phase, we explored different solutions through the COTSon simulator, which provided full-system simulation and key metrics such as OS impact. We compared DF-Threads against standard parallel programming models such as OpenMPI, Cilk++ and DSM. To further improve performance and power consumption, we investigated a hybrid execution model that relies on both Field Programmable Gate Arrays (FPGAs) and General Purpose Processors (GPPs): GPP cores allow us to support a large set of applications, while FPGAs are known for their reconfigurability and power efficiency.
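To make the data-flow firing rule concrete, the following minimal Python sketch models each DF-Thread as a task with a synchronization count equal to its number of inputs. Delivering an input decrements the count, and the thread becomes ready to run only when the count reaches zero. The class names `DFThread` and `Scheduler`, the `write` method, and the single-queue scheduler are hypothetical illustrations chosen for this sketch. They are not the DF-Threads API or the runtime evaluated in COTSon.

```python
from collections import deque


class DFThread:
    """Hypothetical data-flow thread: fires only after all inputs arrive."""

    def __init__(self, name, func, num_inputs, targets=()):
        self.name = name
        self.func = func
        self.inputs = [None] * num_inputs
        self.sync_count = num_inputs      # number of inputs still missing
        self.targets = list(targets)      # (consumer thread, input slot) pairs


class Scheduler:
    """Hypothetical single-queue scheduler for DFThread objects."""

    def __init__(self):
        self.ready = deque()
        self.results = {}

    def spawn(self, thread):
        # A thread with no inputs is immediately ready.
        if thread.sync_count == 0:
            self.ready.append(thread)
        return thread

    def write(self, thread, slot, value):
        # Deliver one input; the thread becomes ready when its count hits zero.
        thread.inputs[slot] = value
        thread.sync_count -= 1
        if thread.sync_count == 0:
            self.ready.append(thread)

    def run(self):
        # Execute ready threads and forward each result to its consumers.
        order = []
        while self.ready:
            t = self.ready.popleft()
            result = t.func(*t.inputs)
            self.results[t.name] = result
            order.append(t.name)
            for consumer, slot in t.targets:
                self.write(consumer, slot, result)
        return order


# Usage: compute (1 + 2) * (3 + 4) as a small data-flow graph.
sched = Scheduler()
mul = sched.spawn(DFThread("mul", lambda x, y: x * y, 2))
add1 = sched.spawn(DFThread("add1", lambda a, b: a + b, 2, targets=[(mul, 0)]))
add2 = sched.spawn(DFThread("add2", lambda c, d: c + d, 2, targets=[(mul, 1)]))
sched.write(add1, 0, 1)
sched.write(add1, 1, 2)
sched.write(add2, 0, 3)
sched.write(add2, 1, 4)
order = sched.run()
print(order, sched.results["mul"])  # ['add1', 'add2', 'mul'] 21
```

In this sketch, `mul` never runs until both additions have written their results, which illustrates why no explicit locks or polling are needed: readiness is driven entirely by data arrival.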