We present a high-performance hybrid CPU-GPU implementation that reduces the end-to-end execution time of Fourier-based stitching of 2D optical microscopy images to less than 1 minute. The implementation exploits coarse-grained parallelism and organizes the computation into a pipeline architecture that spans the available CPU and GPU resources and overlaps computation with data movement. It achieves a nearly 10x performance improvement over a simple approach to GPU-based acceleration. It stitches a 59 x 42 grid of images in 43 s. It also scales with the number of available GPUs and processes the same workload in 26 s on a system with two GPUs. For comparison, an optimized single-threaded reference implementation takes nearly 10 min for the same workload, while ImageJ/Fiji takes more than 3.5 hours.
Walid Keyrouz; Bertrand C. Stivalet; Timothy J. Blattner; Shujia Zhou; Joe Chalfoun; Mary C. Brady; Sixth Workshop on General Purpose Processing Using GPUs; Houston, TX, March 16-20, 2013.
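The pipeline organization described in the abstract can be illustrated with a minimal sketch: stages connected by bounded queues, each running in its own thread, so that one tile is being read while another is being processed. This is not the authors' implementation. The names `read_tile`, `transform_tile`, and `run_pipeline` are hypothetical, the stage bodies are placeholders for disk I/O and the per-tile Fourier transform, and plain Python threads only show the structure rather than the CPU-GPU overlap reported above.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the tile stream


def feed(n_tiles, outbox):
    """Push tile indices into the pipeline, then signal completion."""
    for index in range(n_tiles):
        outbox.put(index)
    outbox.put(SENTINEL)


def stage(worker, inbox, outbox):
    """Apply `worker` to each item from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(worker(item))


def read_tile(index):
    # Hypothetical stand-in for reading one image tile from disk.
    return (index, [float(index)] * 4)


def transform_tile(tile):
    # Hypothetical stand-in for the per-tile transform (an FFT in the paper).
    index, pixels = tile
    return (index, [2.0 * p for p in pixels])


def run_pipeline(n_tiles, queue_depth=2):
    """Run feed -> read -> transform and collect results in arrival order."""
    # Bounded queues limit how many tiles are in flight at once (assumed depth).
    q_index = queue.Queue(maxsize=queue_depth)
    q_tile = queue.Queue(maxsize=queue_depth)
    q_done = queue.Queue(maxsize=queue_depth)

    threads = [
        threading.Thread(target=feed, args=(n_tiles, q_index)),
        threading.Thread(target=stage, args=(read_tile, q_index, q_tile)),
        threading.Thread(target=stage, args=(transform_tile, q_tile, q_done)),
    ]
    for t in threads:
        t.start()

    # The calling thread acts as the final stage and drains the last queue.
    results = []
    while True:
        item = q_done.get()
        if item is SENTINEL:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for index, pixels in run_pipeline(5):
        print(index, pixels)
```

Because each stage is a single thread reading from a FIFO queue, results arrive in tile order. The bounded queue depth is the design choice worth noting: it caps memory use for in-flight tiles, which matters when a downstream stage holds data in limited device memory.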