TY - GEN
T1 - Dynamic thread partition algorithm based on sharing data on CMP
AU - Zhou, Deng
AU - Tian, Ye
AU - Shen, Hong
PY - 2011
Y1 - 2011
N2 - On multi-core processors that share the same cache, data shared among threads running on different cores may not benefit from non-uniform cache access, because it is difficult to decide which core should serve as the local position of a data block when each cache block is assigned as local to one of the cores. Studies have found that the cost of long-latency accesses can be reduced by a proper thread partitioning/allocation algorithm [1]. However, existing work pays little attention to thread partitioning algorithms that reduce this cost. In this paper, we present a dynamic thread partitioning algorithm based on data sharing among threads on cache-shared multi-core processors. The algorithm makes a best effort to minimize the number of shared blocks accessed by threads on different cores, and achieves a performance improvement over existing work. Experiments on 4 cores with more than 100 threads show that our algorithm reduces the interaction between threads belonging to different cores by 30% to 50% compared with previously known solutions.
AB - On multi-core processors that share the same cache, data shared among threads running on different cores may not benefit from non-uniform cache access, because it is difficult to decide which core should serve as the local position of a data block when each cache block is assigned as local to one of the cores. Studies have found that the cost of long-latency accesses can be reduced by a proper thread partitioning/allocation algorithm [1]. However, existing work pays little attention to thread partitioning algorithms that reduce this cost. In this paper, we present a dynamic thread partitioning algorithm based on data sharing among threads on cache-shared multi-core processors. The algorithm makes a best effort to minimize the number of shared blocks accessed by threads on different cores, and achieves a performance improvement over existing work. Experiments on 4 cores with more than 100 threads show that our algorithm reduces the interaction between threads belonging to different cores by 30% to 50% compared with previously known solutions.
KW - Data sharing
KW - Multicore thread
KW - On-chip latency
KW - Thread partition
UR - http://www.scopus.com/inward/record.url?scp=84856674715&partnerID=8YFLogxK
U2 - 10.1109/PDCAT.2011.36
DO - 10.1109/PDCAT.2011.36
M3 - Conference contribution
AN - SCOPUS:84856674715
SN - 9780769545646
T3 - Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
SP - 122
EP - 127
BT - Proceedings - 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
T2 - 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
Y2 - 20 October 2011 through 22 October 2011
ER -