1205 W. Clark Street, Urbana, IL 61801
Office 4017 NCSA and 4103 SC

Research

Welcome to my website!

I am currently a PhD student in the Computer Science Department of the University of Illinois at Urbana-Champaign.

On this website, you will find information about the projects I am involved in, my publications, and the open-source tools I have developed. Feel free to check out my work, and if you have any questions or comments, or would like to work with me, don't hesitate to contact me.

My research interests are parallel and distributed computing on HPC systems, with a special focus on resilience and fault tolerance. My PhD research aims to determine to what extent online failure prediction is feasible at petascale and exascale, and what challenges stand in the way of an effective fault prevention mechanism for current and future HPC systems.
My quest has given me the opportunity to play with (and sometimes crash) many large systems during my research assistantships and internships. Here is a summary:
  • May-August 2013: Internship at Argonne National Laboratory (worked on Mira and Intrepid)
  • June-September 2012: Internship at Tokyo Institute of Technology (worked with Tsubame2.0 logs)
  • August 2011-present: Research assistant at the National Center for Supercomputing Applications (I am part of the reliability and fault tolerance team for the Blue Waters project)
  • January 2010-August 2011: Visiting scholar at the National Center for Supercomputing Applications (worked on Mercury)
Since 2010, my work has been done in the context of the Joint Laboratory for Petascale Computing.

Latest publications
Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing - Mohamed Slim Bouguerra, Ana Gainaru, Franck Cappello, Leonardo Bautista Gomez, Naoya Maruyama, Satoshi Matsuoka - IPDPS 2013, Boston, USA

Fault prediction under the microscope: A closer look into HPC systems - Ana Gainaru, Franck Cappello, Marc Snir, William Kramer - Supercomputing 2012, Salt Lake City, USA

Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems - Ana Gainaru, Franck Cappello, William Kramer - IPDPS 2012, Shanghai, China

Latest news
CodingIllini (a team composed of NCSA staff and UIUC CS graduate students) won the Intel Parallel Universe Coding Competition 2014
Fault prediction under the microscope: A closer look into HPC systems - A few online journals mention my paper from SC2012:
Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems - Interview given to NCSA about my paper at SC11