Joshua Hursey
Title
Cited by
Year
The design and implementation of checkpoint/restart process fault tolerance for Open MPI
J Hursey, JM Squyres, TI Mattox, A Lumsdaine
2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007
259 2007
Why it’s worth the hassle: The value of in-situ studies when designing ubicomp.
Y Rogers, K Connelly, L Tedesco, W Hazlewood, A Kurtz, RE Hall, ...
Springer, 336, 2007
258 2007
An Evaluation of User-Level Failure Mitigation Support in MPI.
W Bland, A Bouteiller, T Herault, J Hursey, G Bosilca, JJ Dongarra
EuroMPI 12, 193-203, 2012
140 2012
Interconnect agnostic checkpoint/restart in Open MPI
J Hursey, TI Mattox, A Lumsdaine
Proceedings of the 18th ACM international symposium on High Performance …, 2009
85 2009
PMIx: Process management for exascale environments
RH Castain, J Hursey, A Bouteiller, D Solt
Parallel Computing 79, 9-29, 2018
79 2018
Run-through stabilization: An MPI proposal for process fault tolerance
J Hursey, RL Graham, G Bronevetsky, D Buntinas, H Pritchard, DG Solt
Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011
69 2011
An evaluation of user-level failure mitigation support in MPI
W Bland, A Bouteiller, T Herault, J Hursey, G Bosilca, JJ Dongarra
Computing 95, 1171-1184, 2013
51 2013
A log-scaling fault tolerant agreement algorithm for a fault tolerant MPI
J Hursey, T Naughton, G Vallee, RL Graham
Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011
43 2011
Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems
J Hursey
Indiana University, 2010
41 2010
Locality-aware parallel process mapping for multi-core HPC systems
J Hursey, JM Squyres, T Dontje
2011 IEEE international conference on cluster computing, 527-531, 2011
37 2011
A checkpoint and restart service specification for Open MPI
J Hursey, JM Squyres, A Lumsdaine
Indiana University, Bloomington, Indiana, USA, Tech. Rep. TR635, 2006
33 2006
Netloc: Towards a comprehensive view of the HPC system topology
B Goglin, J Hursey, JM Squyres
2014 43rd International Conference on Parallel Processing Workshops, 216-225, 2014
29 2014
Building a fault tolerant MPI application: A ring communication example
J Hursey, RL Graham
2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011
28 2011
A composable runtime recovery policy framework supporting resilient HPC applications
J Hursey, A Lumsdaine
Indiana University, Bloomington, Indiana, USA, Tech. Rep. TR686, 2010
18 2010
Preserving collective performance across process failure for a fault tolerant MPI
J Hursey, RL Graham
2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011
17 2011
Checkpoint/restart-enabled parallel debugging
J Hursey, C January, M O’Connor, PH Hargrove, D Lecomber, ...
Recent Advances in the Message Passing Interface: 17th European MPI Users …, 2010
16 2010
An extensible framework for distributed testing of mpi implementations
J Hursey, E Mallove, JM Squyres, A Lumsdaine
Lecture Notes in Computer Science 4757, 64, 2007
15 2007
A performance analysis and optimization of PMIx-based HPC software stacks
AY Polyakov, BI Karasev, J Hursey, J Ladd, M Brinskii, E Shipunova
Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019
14 2019
Advancing application process affinity experimentation: Open MPI's LAMA-based affinity interface
J Hursey, JM Squyres
Proceedings of the 20th European MPI Users' Group Meeting, 163-168, 2013
14 2013
Representing unit test data for large scale software development
JA Cottam, J Hursey, A Lumsdaine
Proceedings of the 4th ACM symposium on Software visualization, 57-66, 2008
13 2008