Committee:
Dr. Karsten Schwan, Advisor, College of Computing
Dr. Matthew Wolf, College of Computing
Dr. Scott Klasky, Oak Ridge National Laboratory
Dr. Rich Vuduc, College of Computing
Dr. Ron A. Oldfield, Sandia National Laboratories
Abstract:
The exponential growth of data produced by scientific simulations on leadership-class HPC machines has exposed the importance of the I/O bottleneck, which can cripple scientific progress in many domains of national interest.
The size of the data has also exposed the difficulties of current methods of information extraction, which rely on post-processing-based data exploration. Computational capability is growing faster than I/O bandwidth, and this has created a large imbalance. The imbalance is a bottleneck that limits our ability to exploit the performance of current-generation machines, and it will play an even greater role in limiting the efficient utilization of next-generation systems.
In this thesis I present a significant shift in how data management and I/O are handled on these high-end computing systems. In particular, I present the Data Service abstraction, which treats data management and information extraction as an integral part of the data generation and output process. A data service is a combination of coupled plugins that operate on output data both to extract information from it and to prepare it for further analysis. I also address some of the fundamental requirements for creating dynamic functional I/O pipelines: the ability to extend the output from a stream of bytes to a self-describing structure, the overhead that data movement and processing impose on application performance, and the management of the resources available to the data service. To address these challenges, I use available technologies such as RDMA and structured serialization, along with new abstractions such as data staging resources. The thesis also presents the utility of these data services for real applications in the materials and fusion domains and evaluates their functionality for those domains.
Hasan Abbasi
http://www.cc.gatech.edu/~habbasi/