HPC Architect
Mellanox Technologies

Peer-reviewed publications
Reducing Waste in Large Scale Systems Through Introspective Analysis - Leonardo Bautista-Gomez, Ana Gainaru, Swann Perarnau, Devesh Tiwari, Saurabh Gupta, Franck Cappello, Christian Engelmann, and Marc Snir - IPDPS 2016, Chicago, IL, USA

Scheduling the I/O of HPC applications under congestion - Ana Gainaru, Guillaume Aupy, Anne Benoit, Yves Robert, Franck Cappello, Marc Snir - IPDPS 2015, Hyderabad, India

Failure prediction for HPC systems and applications: current situation and open issues - Ana Gainaru, Franck Cappello, Marc Snir, William Kramer - The International Journal of High Performance Computing, Volume 27, Issue 3, Pages 272-281, August 2013

Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing - Mohamed Slim Bouguerra, Ana Gainaru, Franck Cappello, Leonardo Bautista-Gomez, Naoya Maruyama, Satoshi Matsuoka - IPDPS 2013, Boston, USA

Fault prediction under the microscope: A closer look into HPC systems - Ana Gainaru, Franck Cappello, Marc Snir, William Kramer - Supercomputing 2012, Salt Lake City, USA

Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems - Ana Gainaru, Franck Cappello, William Kramer - IPDPS 2012, Shanghai, China

Real Time Analysis and Event Prediction Engine - Joshi Fullop, Ana Gainaru, Joel Plutchak - Cray User Group (CUG) 2012, Stuttgart, Germany

Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems - Eric Heien, Derrick Kondo, Ana Gainaru, Dan LaPine, Bill Kramer, Franck Cappello - Supercomputing 2011, Seattle, USA

Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems - Ana Gainaru, Franck Cappello, Joshi Fullop, Stefan Trausan-Matu, Bill Kramer - Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques 2011, Cascais, Portugal

Event log mining tool for large scale HPC systems - Ana Gainaru, Franck Cappello, Stefan Trausan-Matu, Bill Kramer - Euro-Par 2011, Bordeaux, France

Master/Undergrad publications
Framework for mapping data mining applications on GPUs - Ana Gainaru, Emil Slusanschi - ISPDC, 6-8 July 2011, Cluj, Romania

Mapping Data Mining Algorithms on a GPU Architecture: A Study - Ana Gainaru, Emil Slusanschi, Stefan Trausan-Matu - ISMIS, 28-30 June 2011, Warsaw, Poland

A Study On Lexical Chain Identification And Word Sense Disambiguation - Stefan Dumitrescu, Ana Gainaru, Stefan Trausan-Matu - UPB Scientific Bulletin, Series C, Vol. 73, Iss. 4, 2011

Toolkit for automatic analysis of chat conversations - Ana Gainaru, Stefan Daniel Dumitrescu, Stefan Trausan-Matu - The 8th International Conference on Communications, 10-12 June 2010, Bucharest, Romania

A Realistic Mobility Model Based on Social Networks for the Simulation of VANETs - Ana Gainaru, Ciprian Dobre, Valentin Cristea - VTC2009-Spring, The 69th IEEE Vehicular Technology Conference, 26-29 April 2009, Barcelona

Workshops and invited talks
Panel presentations: Fault Tolerance/Resilience at Petascale/Exascale: Is it Really Critical? Are Solutions Necessarily Disruptive? - Supercomputing 2013. Moderator: Franck Cappello
Panelists: Marc Snir, Bronis De Supinski, Al Geist, John Daly, Ana Gainaru, Satoshi Matsuoka

Since the third workshop of the Joint Laboratory for Petascale Computing, I have given a talk at every edition of the workshop:
  • Dealing with prediction unfriendly failures: the road to specialized predictors - November 2014, Chicago
  • The road to failure prediction on Blue Waters: latest details and future directions - June 2014, Sophia Antipolis, France
  • Topology and behaviour aware failure prediction for Blue Waters - November 2013, NCSA, University of Illinois
  • Challenges in predicting failures on the Blue Waters system - June 12-14, 2013, Lyon
  • Coupling failure prediction, proactive and preventive checkpoint for current production HPC systems - November 19-21, 2012, Argonne National Laboratory
  • A detailed analysis of fault prediction results and impact for HPC systems - June 13-15, 2012, Rennes
  • Signal Analysis for Modeling the Normal and Faulty Behavior of Large-scale HPC Systems - November 21-23, 2011, Urbana
  • Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems - June 27-29, 2011, Grenoble
  • Framework for Event Log Analysis in HPC - November 22-24, 2010, Urbana
  • Event log classification tool for large-scale systems - June 21-24, 2010, Bordeaux

Scheduling the I/O of HPC applications under congestion - Invited talk at the 9th Scheduling for Large Scale Systems Workshop, Lyon, July 2014

The magic behind failure prediction in the Petascale era - The 3rd Annual Greater Chicago Area System Research Workshop (GCASR), Chicago, May 2014

Log Analysis Framework for the Blue Waters system - Demonstration at the NCSA booth at Supercomputing Conference 2010, New Orleans

Mining event log patterns in HPC systems - Ana Gainaru, Franck Cappello, Bill Kramer - Resilience Summit 2010, Workshop on Resilience for Exascale HPC, October 13, 2010, Santa Fe

Position paper
Challenges addressed: Resilience through failure avoidance: New detectors of failure precursors and improved prediction workflow - Franck Cappello, Ana Gainaru - Position paper, Operating Systems and Runtime Software for Exascale Systems, 2012

Program Committee
Program committee member for: SC 2016, EuroMPI 2016, CCGrid 2016, IPDPS 2014, FTXS 2015/2013/2012, HPC-SYNASC 2012, HPCe 2011

Reviewer for the following conferences: HPDC 2013, IPDPS 2013, CCGrid 2013, HPDC 2012, CCGrid 2012, HPCE 2011, VTC2011-Fall, VTC2011-Spring, VTC2010-Spring

Reviewer for the following journals:
International Journal of High Performance Computing
Journal of Parallel and Distributed Computing
IEEE Transactions on Parallel and Distributed Systems

