The last decade has seen significant growth in High-Performance Computing (HPC) clusters with multi-/many-core processors, accelerators, and high-performance interconnects (such as InfiniBand, Omni-Path, iWARP, and RoCE). Many of the world's supercomputers are currently being built from commodity HPC clusters. The Network-Based Computing Laboratory at OSU/CSE is actively engaged in designing software libraries (HPC, Big Data, Deep Learning, and Cloud) for such supercomputers.
In this talk, we will focus on the open-source MVAPICH software project. This library supports the Message Passing Interface (MPI) standard, the programming model most commonly used to develop parallel applications on supercomputers. Over the last 15 years, the MVAPICH MPI library from OSU/CSE has been powering many of the world's top supercomputers (including the currently #2-ranked system). The MVAPICH team is a partner of the upcoming Frontera supercomputer, the most powerful supercomputer for open-science research in the USA. We will discuss the basics of the MPI standard and its features. Next, we will present the challenges in designing high-performance and scalable MPI libraries for modern supercomputers with multi-core processors, GPUs, and high-performance interconnects.
The talk will be followed by an open Q&A session with several members of the Network-Based Computing Laboratory. The session will conclude with a tour of the Laboratory, which houses multiple high-end clusters with thousands of cores.
DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 450 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 2,925 organizations worldwide (in 86 countries). More than 490,000 downloads of this software have taken place from the project’s site. This software is powering several InfiniBand clusters (including the 2nd, 12th, 15th, and 24th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop, and Memcached, together with the OSU HiBD benchmarks from his group, are also publicly available. These libraries are currently being used by more than 290 organizations in 34 countries. More than 27,600 downloads of these libraries have taken place. High-performance and scalable versions of the Caffe and TensorFlow frameworks are available from https://hidl.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.