Monitoring Performance in Large Scale Computing Clouds with Passive Benchmarking

Publication Type: Conference Paper
Year of Publication: 2017
Authors: Nieke, C., and W.-T. Balke
Conference Name: 10th IEEE International Conference on Cloud Computing (CLOUD)
Date Published: 06/2017
Conference Location: Honolulu, Hawaii, USA
Abstract

Providers of computing services such as data science clouds need to maintain large hardware infrastructures, often with thousands of nodes. Using commodity hardware leads to heterogeneous setups in which individual nodes differ significantly in performance; these differences must be understood to support accounting and strategic planning and to identify problems and bottlenecks. Today’s method of choice is active benchmarking, but active benchmarks disturb normal operations and are too expensive to run continuously. They also struggle to remain representative of an ever-changing workload. We therefore design a passive benchmarking technique, which computes expressive and accurate performance metrics based on actual workloads. We prove the quality and performance benefits of our passive benchmark on a practical workload in one of the world’s largest scientific computing infrastructures, the CERN Computing Center. Our approach allows continuous benchmarking of the running system without the cost of downtime, while achieving prediction quality comparable to the state-of-the-art approach of active benchmarking.
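To make the general idea of passive benchmarking concrete, here is a minimal Python sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes only that the runtime of each completed job is logged together with the node that ran it and a workload class. The `Job` record, its fields, and `passive_node_scores` are hypothetical names. Each node is scored by how fast it ran real jobs relative to the median runtime of the same job type across all nodes, so no extra benchmark load is placed on the system.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import exp, log
from statistics import median


@dataclass
class Job:
    node: str         # hypothetical: host that executed the job
    job_type: str     # hypothetical: workload class (e.g. same analysis binary)
    runtime_s: float  # observed runtime in seconds, must be > 0


def passive_node_scores(jobs: list[Job]) -> dict[str, float]:
    """Illustrative sketch only: score each node from jobs it already ran.

    1.0 means typical speed, > 1.0 faster and < 1.0 slower than the median
    node for the same job types.
    """
    # Reference runtime per job type, derived from the observed workload itself.
    by_type: dict[str, list[float]] = defaultdict(list)
    for job in jobs:
        by_type[job.job_type].append(job.runtime_s)
    reference = {t: median(rs) for t, rs in by_type.items()}

    # Log speed ratio per job; averaging in log space gives a geometric mean,
    # so "twice as fast" and "twice as slow" weigh symmetrically.
    log_ratios: dict[str, list[float]] = defaultdict(list)
    for job in jobs:
        log_ratios[job.node].append(log(reference[job.job_type] / job.runtime_s))

    return {node: exp(sum(lr) / len(lr)) for node, lr in log_ratios.items()}


if __name__ == "__main__":
    observed = [
        Job("a", "sim", 100.0), Job("b", "sim", 200.0), Job("c", "sim", 100.0),
        Job("a", "reco", 50.0), Job("b", "reco", 100.0),
    ]
    for node, score in sorted(passive_node_scores(observed).items()):
        print(f"{node}: {score:.3f}")
```

Because the reference runtimes come from the workload itself, the scores track whatever jobs the cluster actually runs. The trade-off is that the scores are only comparable between nodes that have run overlapping job types.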
