Authors
Charles Reiss, Alexey Tumanov, Gregory R. Ganger, Randy H. Katz, Michael A. Kozuch
Publication date
2012/4
Description
With the emergence of large, heterogeneous, shared computing clusters, their efficient use by mixed distributed workloads and tenants remains an important challenge. Unfortunately, little data has been available about such workloads and clusters. This paper analyzes a recent Google release of scheduler request and utilization data from a large (12,500+ machine) general-purpose compute cluster over 29 days. We characterize cluster resource requests, their distribution, and the actual resource utilization. Unlike previous scheduler traces we are aware of, this one includes diverse workloads, from large web services to large CPU-intensive batch programs, and permits comparison of actual resource utilization with the user-supplied resource estimates available to the cluster resource scheduler. We observe under-utilization despite over-commitment of resources, difficulty scheduling high-priority tasks that specify constraints, and a lack of dynamic adjustment of user allocation requests despite the apparent availability of this feature in the scheduler.
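The core comparison the description mentions, contrasting user-supplied resource requests with actual measured utilization, can be sketched as follows. This is a minimal illustration only: the task names, field layout, and numbers are invented, and the real trace uses a different schema (normalized units, per-interval usage records).

```python
# Illustrative sketch: compare requested resources with measured usage,
# the kind of analysis the paper performs on the Google cluster trace.
# All task names and numbers below are hypothetical, not trace data.

tasks = [
    # (task_id, requested_cpu_cores, mean_used_cpu_cores)
    ("web-frontend-1", 4.0, 1.2),
    ("batch-job-7", 8.0, 7.5),
    ("web-frontend-2", 2.0, 0.4),
]

def utilization_ratio(requested: float, used: float) -> float:
    """Fraction of the requested allocation actually consumed."""
    return used / requested

for task_id, req, used in tasks:
    ratio = utilization_ratio(req, used)
    print(f"{task_id}: requested {req} cores, used {used} ({ratio:.0%})")

# Aggregate view: total usage vs. total requests across the sample.
# A low overall ratio indicates under-utilization even if the scheduler
# has over-committed the machines relative to their capacity.
total_req = sum(req for _, req, _ in tasks)
total_used = sum(used for _, _, used in tasks)
print(f"overall utilization: {total_used / total_req:.0%}")
```

In the paper's setting, the interesting cases are exactly the gaps this ratio exposes: requests that far exceed measured usage (wasted reservation) and the cluster-wide shortfall between committed and consumed resources.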
Acknowledgements: The authors wish to gratefully acknowledge Google for their public release of this cluster trace dataset. This research is supported in part by Intel as part of the Intel Science and Technology Center for Cloud Computing (ISTC-CC), by gifts from Google, SAP, Amazon Web Services, APC, Blue Goji, Cloudera, EMC, Emulex, Ericsson, Facebook, Fusion-IO, General Electric, Hewlett Packard, Hitachi, Huawei, IBM, Intel, MarkLogic, Microsoft, NEC Labs, NetApp, Oracle, Panasas, Quanta, Riverbed, Samsung, Seagate, Splunk, STEC, Symantec, VMware, by an NSERC Fellowship, and by …
Scholar articles
C Reiss, A Tumanov, GR Ganger, RH Katz, MA Kozuch - Intel Science and Technology Center for Cloud …, 2012