Performance Tuning a Windows HPC Cluster for Parallel Applications
This white paper discusses the factors that can affect the performance of your Windows HPC Server 2008 cluster, with a focus on the tools and methods for measuring and tuning that performance.
The paper’s goal is to help you choose the interconnect hardware for your cluster and to explain how to tune that interconnect and the software stack for your application and needs. It focuses on the specifics of tuning and configuring Windows HPC Server 2008, and it should help you identify the kind of high-performance computing (HPC) cluster you have and determine where to concentrate your performance-tuning efforts and resources.
This paper does not cover specific models, processors, or brands of computers to buy. It does, however, make specific recommendations about what to consider in your purchasing decisions, the criteria for performance testing tools and methods, and specific configurations for general execution cases.
Topics covered in the paper include general performance tuning tools and methods (NetworkDirect versus TCP/IP, GigE versus specialty networking, InfiniBand and the OpenFabrics Alliance, memory and CPU constraints, and large clusters); tuning for messaging-intensive applications (heavy messaging, latency-sensitive messaging, setting the Message Passing Interface (MPI) network, performance tips for message-intensive applications, Microsoft Message Passing Interface (MS-MPI) shared memory, the simple multipurpose daemon, and typical microbenchmark network performance); tuning for embarrassingly parallel applications; performance tuning and measurement tools (using the built-in diagnostics in the administration console, using mpipingpong, mpipingpong examples, and using NetworkDirect PingPong); and performance measurement and tuning procedures.
Two appendices in the white paper cover performance counters for Windows HPC Server 2008, including recommended counters to monitor when assessing cluster performance, an example of collecting data from multiple compute nodes, and an example of how to use HTML to monitor performance.



