1205 W. Clark Street, Urbana, IL 61801
Office 4017 NCSA
Welcome to my website
I am a PhD student in the Computer Science Department of the University of Illinois at Urbana-Champaign.
Here you will find information about the projects I am involved in, my publications, and the open source tools I have developed. Feel free to check out my work, and if you have any questions or comments, or would like to work together, don't hesitate to contact me.
My research interests are parallel and distributed computing on HPC systems or hybrid environments, specifically resilience and fault tolerance using data mining and machine learning. My work focuses on analysing log files generated by large-scale systems by extracting the normal and faulty behaviour of the system. In particular, I am building a propagation model for faults and failures that characterizes the events generated by these systems and that can then be used to predict the future state of the system or to perform root cause analysis for diagnosis.
For the last two years, my work has been funded by the National Center for Supercomputing Applications (NCSA), where I am part of the reliability and fault tolerance team for the Blue Waters project, in the context of the Joint Laboratory for Petascale Computing.
Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing - Mohamed Slim Bouguerra, Ana Gainaru, Franck Cappello, Leonardo Bautista Gomez, Naoya Maruyama, Satoshi Matsuoka - IPDPS 2013 (acceptance rate of 21%), Boston, USA
Fault prediction under the microscope: A closer look into HPC systems - Ana Gainaru, Franck Cappello, Marc Snir, William Kramer - Supercomputing 2012 (acceptance rate of 21%), Salt Lake City, USA
Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems - Ana Gainaru, Franck Cappello, William Kramer - IPDPS 2012 (acceptance rate of 21%), Shanghai, China
Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems - Talk given at the eighth workshop of the Joint Laboratory for Petascale Computing, November 19-21, 2012, Argonne National Laboratory
Fault prediction under the microscope: A closer look into HPC systems - A few online journals have mentioned my paper from SC2012.
Preventive fault tolerance techniques for HPC systems - Talk given at the Tokyo Institute of Technology during my internship, July 6th, 2012, Tokyo, Japan.