Xianglan Piao, Channoh Kim, Younghwan Oh, Huiying Li, Jincheon Kim, Hanjun Kim, and Jae W Lee
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - Poster (PPoPP Poster), February 2015.
sharing between CPU and GPU for data-parallel workloads. Unlike
conventional heterogeneous parallel programming environments, which
run a single kernel on only one device, jAWS accelerates kernel
execution by exploiting both devices to realize the full performance
potential of heterogeneous multicores. jAWS employs an efficient work
partitioning algorithm
that finds an optimal work distribution between the two devices
without requiring offline profiling. The jAWS runtime provides
shared arrays for multiple parallel contexts, hence eliminating
extra copy overhead for input and output data. Our preliminary
evaluation with both CPU-friendly and GPU-friendly benchmarks
demonstrates that jAWS provides good load balancing and efficient
data communication between parallel contexts, significantly
outperforming the best single-device execution.