Phi-3 technical report: A highly capable language model locally on your phone M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ... arXiv preprint arXiv:2404.14219, 2024 | 539 | 2024 |
NeuGraph: Parallel deep neural network computation on large graphs L Ma, Z Yang, Y Miao, J Xue, M Wu, L Zhou, Y Dai 2019 USENIX Annual Technical Conference (USENIX ATC 19), 443-458, 2019 | 284 | 2019 |
Retentive network: A successor to transformer for large language models Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue, J Wang, F Wei arXiv preprint arXiv:2307.08621, 2023 | 257 | 2023 |
GraM: scaling graph computation to the trillions M Wu, F Yang, J Xue, W Xiao, Y Miao, L Wei, H Lin, Y Dai, L Zhou Proceedings of the Sixth ACM Symposium on Cloud Computing, 408-421, 2015 | 157 | 2015 |
Rammer: Enabling holistic deep learning compiler optimizations with rTasks L Ma, Z Xie, Z Yang, J Xue, Y Miao, W Cui, W Hu, F Yang, L Zhang, ... 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2020 | 152 | 2020 |
The era of 1-bit LLMs: All large language models are in 1.58 bits S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ... arXiv preprint arXiv:2402.17764, 2024 | 136 | 2024 |
VoteTrust: Leveraging Friend Invitation Graph to Defend against Social Network Sybils J Xue, Z Yang, X Yang, X Wang, L Chen, Y Dai The 32nd IEEE International Conference on Computer Communications (INFOCOM'2013), 2013 | 90* | 2013 |
Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication L Ma, Z Yang, H Chen, J Xue, Y Dai 2017 USENIX Annual Technical Conference (USENIX ATC 17), 195-207, 2017 | 84 | 2017 |
Tux²: Distributed Graph Computation for Machine Learning W Xiao, J Xue, Y Miao, Z Li, C Chen, M Wu, W Li, L Zhou 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2017 | 80 | 2017 |
Seraph: an efficient, low-cost system for concurrent graph processing J Xue, Z Yang, Z Qu, S Hou, Y Dai Proceedings of the 23rd international symposium on High-performance parallel …, 2014 | 78 | 2014 |
ROLLER: Fast and efficient tensor compilation for deep learning H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang, J Xue, L Ma, Y Xia, W Cui, ... 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 71 | 2022 |
VoteTrust: Leveraging friend invitation graph to defend against social network sybils Z Yang, J Xue, X Yang, X Wang, Y Dai IEEE Transactions on dependable and secure computing 13 (4), 488-501, 2015 | 71 | 2015 |
Fast distributed deep learning over RDMA J Xue, Y Miao, C Chen, M Wu, L Zhang, L Zhou Proceedings of the Fourteenth EuroSys Conference 2019, 1-14, 2019 | 65 | 2019 |
Towards efficient large-scale graph neural network computing L Ma, Z Yang, Y Miao, J Xue, M Wu, L Zhou, Y Dai arXiv preprint arXiv:1810.08403, 2018 | 35 | 2018 |
FlexMoE: Scaling large-scale sparse pre-trained model training via dynamic device placement X Nie, X Miao, Z Wang, Z Yang, J Xue, L Ma, G Cao, B Cui Proceedings of the ACM on Management of Data 1 (1), 1-19, 2023 | 33 | 2023 |
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui arXiv preprint arXiv:2112.14397, 2021 | 30 | 2021 |
Processing concurrent graph analytics with decoupled computation model J Xue, Z Yang, S Hou, Y Dai IEEE Transactions on Computers 66 (5), 876-890, 2016 | 28 | 2016 |
A topology construct and control model with small-world and scale-free concepts for heterogeneous sensor networks L Liu, X Qi, J Xue, M Xie International Journal of Distributed Sensor Networks 10 (3), 374251, 2014 | 28 | 2014 |
Dense-to-sparse gate for mixture-of-experts X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui | 27 | 2021 |
Welder: Scheduling deep learning memory access via tile-graph Y Shi, Z Yang, J Xue, L Ma, Y Xia, Z Miao, Y Guo, F Yang, L Zhou 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 21 | 2023 |