It has been shown that in many instances the run-time overhead of detection and allocation of dynamic parallelism in a program can easily offset the performance gain. Therefore, to improve performance and reduce run-time overhead, it would be logical to devise an allocation scheme which detects dynamic parallelism at compile time. However, the task of accurate estimation of the run-time parallelism is a stumbling block to this direction. As a compromise, we propose an allocation policy which: i) detects dynamic parallelism for loop constructs, and ii) allocates loop iterations to the estimated hardware resources in a staggered fashion using a set of heuristic rules. This paper introduces the proposed staggered distribution scheme and addresses its simulation and performance improvement.