High accuracy digital image correlation powered by GPU-based parallel computing L Zhang, T Wang, Z Jiang, Q Kemao, Y Liu, Z Liu, L Tang, S Dong Optics and Lasers in Engineering 69, 7-12, 2015 | 105 | 2015 |
Matrix engines for high performance computing: A paragon of performance or grasping at straws? J Domke, E Vatai, A Drozd, P Chen, Y Oyama, L Zhang, S Salaria, ... 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 33 | 2021 |
Heterogeneous parallel computing accelerated iterative subpixel digital image correlation JW Huang, LQ Zhang, ZY Jiang, SB Dong, W Chen, YP Liu, ZJ Liu, ... Science China Technological Sciences 61, 74-85, 2018 | 27 | 2018 |
A study of single and multi-device synchronization methods in Nvidia GPUs L Zhang, M Wahib, H Zhang, S Matsuoka 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020 | 21 | 2020 |
Scaling distributed deep learning workloads beyond the memory capacity with KARMA M Wahib, H Zhang, TT Nguyen, A Drozd, J Domke, L Zhang, R Takano, ... SC20: International Conference for High Performance Computing, Networking …, 2020 | 20 | 2020 |
PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead L Zhang, C Liu, S Dong Genes 10 (11), 886, 2019 | 11 | 2019 |
Understanding the overheads of launching CUDA kernels L Zhang, M Wahib, S Matsuoka ICPP19, 5-8, 2019 | 11 | 2019 |
At the locus of performance: A case study in enhancing cpus with copious 3d-stacked cache J Domke, E Vatai, B Gerofi, Y Kodama, M Wahib, A Podobas, S Mittal, ... arXiv preprint arXiv:2204.02235, 2022 | 5 | 2022 |
Persistent Kernels for Iterative Memory-bound GPU Applications L Zhang, M Wahib, P Chen, J Meng, X Wang, S Matsuoka arXiv preprint arXiv:2204.02064, 2022 | 4 | 2022 |
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 37th International Conference on Supercomputing, 167-179, 2023 | 3 | 2023 |
Revisiting Temporal Blocking Stencil Optimizations L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 37th International Conference on Supercomputing, 251-263, 2023 | 2 | 2023 |
At the locus of performance: Quantifying the effects of copious 3D-stacked cache on HPC workloads J Domke, E Vatai, B Gerofi, Y Kodama, M Wahib, A Podobas, S Mittal, ... ACM Transactions on Architecture and Code Optimization 20 (4), 1-26, 2023 | 1 | 2023 |
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt) L Zhang, M Wahib, P Chen, J Meng, X Wang, T Endo, S Matsuoka Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 34-35, 2023 | | 2023 |
A Study of Synchronization Methods in Modern GPUs L Zhang, M Wahib, H Zhang, S Matsuoka | | 2019 |
Breaking the limitation of GPU Memory for Deep Learning Workloads H Zhang, M Wahib, L Zhang, Y Tsuji, S Matsuoka | | 2019 |
GPU Accelerated High Accuracy Digital Volume Correlation T Wang, L Zhang, Z Jiang, K Qian International Digital Imaging Correlation Society: Proceedings of the First …, 2017 | | 2017 |