Monitoring Performance in Large Scale Computing Clouds with Passive Benchmarking

Publication Type: Conference Paper
Year of Publication: 2017
Authors: Nieke, C., and W.-T. Balke
Conference Name: 10th IEEE International Conference on Cloud Computing (CLOUD 2017)
Date Published: 06/2017
Conference Location: Honolulu, Hawaii, USA

Providers of computing services such as data science clouds must maintain large hardware infrastructures, often with thousands of nodes. Because these infrastructures are built from commodity hardware, they tend to be heterogeneous, and individual nodes differ significantly in performance. These differences must be understood for accounting, strategic planning, and identifying problems and bottlenecks. The method of choice today is active benchmarking, but active benchmarks disturb normal operations and are too expensive to run continuously. We therefore design a passive benchmarking technique that computes expressive and accurate performance metrics from the monitoring logs of actual workloads. We demonstrate the quality and performance benefits of our passive benchmark on a practical workload in one of the world's largest scientific computing infrastructures, the CERN Computing Center. Our approach achieves better prediction quality than current active benchmarking while avoiding the cost of downtime.
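The abstract does not describe how the passive metrics are computed. The following is only a minimal sketch of the general idea of deriving per-node performance scores from logs of jobs that ran anyway, with no extra benchmark load. It is not the method from the paper. The record fields (`node`, `job_type`, `events`, `cpu_seconds`), the function name `passive_node_scores`, and the median-normalization scheme are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class JobRecord:
    """One finished job from a hypothetical monitoring log (assumed schema)."""
    node: str           # host that executed the job
    job_type: str       # workload class; only jobs of the same class are compared
    events: int         # units of work the job processed
    cpu_seconds: float  # CPU time the job consumed

    @property
    def throughput(self) -> float:
        return self.events / self.cpu_seconds


def passive_node_scores(records: Iterable[JobRecord]) -> Dict[str, float]:
    """Hypothetical passive benchmark: score nodes from ordinary job logs.

    Each job's throughput is divided by the median throughput of its job
    type, so heavy and light workloads become comparable. A node's score is
    the median of these ratios (1.0 means a typical node for this workload).
    """
    valid = [r for r in records if r.cpu_seconds > 0]

    # Reference throughput per job type, taken from the whole cluster.
    by_type: Dict[str, List[float]] = defaultdict(list)
    for r in valid:
        by_type[r.job_type].append(r.throughput)
    type_median = {t: median(v) for t, v in by_type.items()}

    # Normalized throughput of every job, grouped by the node that ran it.
    by_node: Dict[str, List[float]] = defaultdict(list)
    for r in valid:
        ref = type_median[r.job_type]
        if ref > 0:  # skip job types with no measurable work
            by_node[r.node].append(r.throughput / ref)

    return {node: median(ratios) for node, ratios in by_node.items()}


if __name__ == "__main__":
    logs = [
        JobRecord("node-a", "reco", 1000, 100.0),  # 10 events/s
        JobRecord("node-b", "reco", 1000, 50.0),   # 20 events/s
        JobRecord("node-a", "sim", 200, 100.0),    # 2 events/s
        JobRecord("node-b", "sim", 200, 50.0),     # 4 events/s
    ]
    scores = passive_node_scores(logs)
    print({n: round(s, 3) for n, s in scores.items()})
    # prints {'node-a': 0.667, 'node-b': 1.333}
```

Using the median instead of the mean is a deliberate choice in this sketch. It keeps a few anomalous jobs, such as ones stalled on I/O, from distorting a node's score. Any real passive benchmark would also have to account for factors this toy ignores, including concurrent load and job mix.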
