Maui Cluster Scheduler, the precursor to Moab HPC Suite, is an open source job scheduler for clusters and supercomputers. It is an optimized, configurable tool capable of supporting an array of scheduling policies, dynamic priorities, extensive reservations, and fairshare capabilities. It is currently in use at hundreds of government, academic, and commercial sites throughout the world. All of the capabilities found in Maui are also found in Moab, while Moab has added features including basic trigger support, graphical administration tools, and a Web-based user portal, among many others. To learn more about the features available in Maui and Moab, see the Maui and Moab comparison brief, which covers the additional capabilities and value Moab offers.
Maui Scheduler is a batch system for cluster systems. It gives site administrators extensive control over which jobs are considered eligible for scheduling, how the jobs are prioritized, and where the jobs are run. Maui supports advance reservations, QoS levels, backfill, and allocation management.
Its scheduling scheme is based on advanced wall-time reservations with backfill. The main difference from other common batch queue schedulers (e.g. NQS, DQS) is that Maui allows jobs to overtake a job with higher priority only if it does not delay the start of the prioritized job (i.e. backfill).
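The backfill rule above can be illustrated with a small sketch. It assumes a single resource type (processors) and trusts the requested wall-time limits; the names Job, earliest_start, and can_backfill are hypothetical and are not taken from Maui's source. It is a simplified illustration of the idea, not Maui's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int      # processors requested
    walltime: int   # requested wall-time limit

def earliest_start(job, total_procs, running, now=0):
    """Earliest time `job` fits, given (procs, end_time) pairs of running jobs."""
    free = total_procs - sum(p for p, _ in running)
    if job.procs <= free:
        return now
    # Release processors in order of job completion until the job fits.
    for procs, end in sorted(running, key=lambda r: r[1]):
        free += procs
        if job.procs <= free:
            return end
    raise ValueError("job needs more processors than the system has")

def can_backfill(candidate, top_job, total_procs, running, now=0):
    """True if `candidate` may start now without delaying `top_job`."""
    free_now = total_procs - sum(p for p, _ in running)
    if candidate.procs > free_now:
        return False  # does not even fit right now
    start = earliest_start(top_job, total_procs, running, now)
    if now + candidate.walltime <= start:
        return True   # finishes before the reserved start time
    # The candidate would still be running at `start`:
    # allow it only if the top job still fits beside it.
    free_at_start = total_procs - sum(p for p, end in running if end > start)
    return candidate.procs + top_job.procs <= free_at_start

running = [(6, 100)]           # 6 of 8 processors busy until t=100
big = Job("big", 8, 50)        # highest priority, must wait until t=100
can_backfill(Job("small", 2, 60), big, 8, running)   # True: done by t=60
can_backfill(Job("long", 2, 200), big, 8, running)   # False: would delay "big"
```

The check depends entirely on the wall-time limits users request, which is why the scheme is described as wall-time reservations with backfill.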
Opportunistic Scheduling versus Advance Reservation
In theory, unless resources can be reserved in advance, any job that needs more than a single piece of allocatable resource runs the risk of starvation[1,2]. In practice, because the workload is limited, this is a danger only for jobs that require extensive resources. Administrators often intervene by periodically putting small, less demanding jobs on hold until the large, starving job has started. This is a rather intrusive operation, since it also affects jobs that are not involved in the starvation.
An advance reservation scheme, as in the Maui scheduler, makes it possible to allocate resources in the future. (Compare scheduling a meeting with several participants. It is almost impossible unless there is a calendar available.)
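The calendar analogy can be made concrete with a minimal sketch. It assumes reservations are stored as (start, end, procs) tuples over half-open intervals [start, end) on a single pool of processors; peak_usage and find_slot are hypothetical helper names, and the code illustrates the idea rather than Maui's internal data structures.

```python
def peak_usage(reservations, t0, t1):
    """Peak number of processors reserved at any instant in [t0, t1)."""
    # Usage can only rise at a reservation start, so it is enough to
    # sample t0 and every start time that falls inside the window.
    points = [t0] + [s for s, _, _ in reservations if t0 < s < t1]
    return max(sum(p for s, e, p in reservations if s <= t < e) for t in points)

def find_slot(reservations, total_procs, procs, duration, now=0):
    """Earliest start time at which `procs` processors are free for `duration`."""
    # The earliest feasible start is either `now` or the end of a reservation.
    candidates = sorted({now} | {e for _, e, _ in reservations if e > now})
    for t in candidates:
        if peak_usage(reservations, t, t + duration) + procs <= total_procs:
            return t
    return None  # the request exceeds the size of the machine

calendar = [(0, 100, 6), (100, 150, 8)]   # existing reservations on 8 processors
start = find_slot(calendar, 8, 4, 30)     # 150: first time 4 processors are free
calendar.append((start, start + 30, 4))   # book the slot in the calendar
```

Once the slot is booked, later requests are planned around it, so the large job cannot be starved by a stream of smaller ones.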
The starvation can also be resolved by using a preemptive job scheduler. Unfortunately not all computer systems can handle this. Moreover, if jobs can be preempted it is more difficult to predict when the job will finish.
Queues versus Quality of Service
When an idle job becomes eligible to run, it is assigned a priority. This priority is used to sort the jobs before the scheduler selects a job to start.
Many batch systems use queues to divide and classify the workload. Each queue is assigned a priority, and sometimes each job is assigned a second priority that orders the jobs within the queue. This classification scheme is often too coarse. To take into account all the parameters that make up a batch job policy, you may end up with more queues than jobs.
In Maui, "queues" have lost their importance in classification and priority calculations. Instead a Quality-of-Service (QoS) attribute can be used to classify the jobs. However, QoS is not a hierarchical scheme. It is merely a method of setting the parameters of a job when it enters the scheduler. All jobs eligible to run remain in one common idle-queue and their priorities are compared with all others.
Job State
Jobs in Maui can be in one of three major states:
There is a limit on the number of jobs a group/user can have in the Queued state. This prevents users from acquiring more queue time than they deserve by submitting a large number of jobs.
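The effect of such a limit can be sketched as follows. The function name, the per-user granularity, and the choice to admit jobs in submission order are assumptions made for illustration; the sketch does not say which state Maui assigns to the jobs that are left out.

```python
from collections import Counter

def queued_jobs(submitted, max_queued_per_user):
    """Return the jobs admitted to the Queued state, in submission order.

    `submitted` is a list of (user, job_id) pairs. Jobs beyond a user's
    limit are simply left out of the result in this sketch.
    """
    counts = Counter()
    admitted = []
    for user, job_id in submitted:
        if counts[user] < max_queued_per_user:
            counts[user] += 1
            admitted.append((user, job_id))
    return admitted

submitted = [("alice", 1), ("alice", 2), ("bob", 3), ("alice", 4)]
queued_jobs(submitted, 2)   # [("alice", 1), ("alice", 2), ("bob", 3)]
```

Because only admitted jobs accumulate queue time, submitting many jobs at once gives a user no advantage over someone who submits them one by one.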
Job Priority
Maui offers numerous factors in the expression used to calculate job priority, so that a site can meet its goals of fairness and utilization. Each factor is weighted according to its importance, and the weighted sum is the total priority of the job. The most important factors are described below, together with the importance they have in the current configuration of Maui on Ingvar.
The resource factor consists of several terms that describe the resources required to run the job: number of processors, amount of memory, free disk space, and swap size. Depending on what type of job is favored, jobs can be pushed to the front of the queue. Experience shows that favoring large jobs often improves system utilization.
Ingvar: Fairly low rating. A high utilization is desired but the fairness between users should not be affected.
The queue-time factor is based on the time the job has been eligible to run. It often has a very low weight in the priority calculation; the expansion factor is more important.
Ingvar: Low rating. A fall-back.
The expansion factor or XFactor is calculated using the equation:
XFactor = (Queue_Time + Job_Time_Limit) / Job_Time_Limit
This relates the time limit the user requests to the total of queueing time and expected run time. A job with a short time limit increases its priority more quickly than a long job, pushing it toward the front of the queue.
Ingvar: The most important factor after QoS. It expresses the general job scheduling policy.
If the expansion factor is not enough to meet the scheduling goals, there is a Target factor that increases exponentially as the actual queue time approaches the target queue time.
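A short worked example shows why short jobs gain priority faster. The function below is a direct transcription of the equation; the name xfactor and the choice of hours as the unit are illustrative assumptions.

```python
def xfactor(queue_time, time_limit):
    """Expansion factor: (queue time + time limit) / time limit."""
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    return (queue_time + time_limit) / time_limit

# Two jobs that have both waited 2 hours in the queue:
xfactor(2, 1)    # 3.0 for a job with a 1-hour limit
xfactor(2, 10)   # 1.2 for a job with a 10-hour limit
```

Every job starts at an expansion factor of 1 when its queue time is zero, and the factor grows at a rate inversely proportional to the requested time limit.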
Ingvar: Not used ...yet.
The fair share value is based on historical usage. It is divided into the user, group, and account associated with the job. Fair Share is a provocative factor. Although the intention is good, the effect of this factor is hard to understand and hard to weight so that it actually achieves fairness[3].
Ingvar: Excluded from any priority calculation.
The QoS factor is a fixed number used to offset jobs with high quality-of-service.
Ingvar: Three QoS levels exist: Normal, Bonus, and Disabled. Normal has a QoS factor ten times higher than Bonus, always pushing Bonus jobs to the back of the queue. Another feature in Maui prohibits Disabled jobs from making any reservations.
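The weighted sum described in this section can be sketched as below. All weight and factor values are hypothetical numbers chosen only to mirror the relative ratings given for Ingvar (QoS dominant, then the expansion factor, a fairly low resource weight, a low queue-time weight, fair share excluded); they are not the site's real configuration, and the factor names are illustrative.

```python
def priority(factors, weights):
    """Total priority: the sum of each factor value times its weight.

    Factors without a weight (here: fair share) contribute nothing.
    """
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())

# Hypothetical weights reflecting the ordering described for Ingvar.
weights = {"qos": 1000.0, "xfactor": 100.0, "resource": 10.0, "queue_time": 1.0}

# Normal QoS carries a factor ten times that of Bonus.
normal_job = {"qos": 10, "xfactor": 1.5, "resource": 4, "queue_time": 30}
bonus_job = {"qos": 1, "xfactor": 3.0, "resource": 4, "queue_time": 120, "fairshare": 5}

priority(normal_job, weights)   # 10220.0
priority(bonus_job, weights)    # 1460.0
```

With weights of this shape, the Bonus job stays behind the Normal job even though it has waited longer and has a higher expansion factor, which matches the behavior described for the QoS factor.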
There are two types of reservations in Maui:
Every scheduling cycle, after the job priorities have been calculated, Maui examines the jobs in the queued state and schedules advance reservations.
There are also standing reservations. These are user reservations that are scheduled automatically and repeatedly.
Maui is an advanced job scheduler for use on clusters and supercomputers. It is a highly optimized and configurable tool capable of supporting a large array of scheduling policies, dynamic priorities, extensive reservations, and fairshare. It is currently in use at hundreds of leading government, academic, and commercial sites throughout the world. It improves the manageability and efficiency of machines ranging from clusters of a few processors to multi-teraflop supercomputers.
Maui is a community project* and may be downloaded, modified, and distributed. It has been made possible by the support of Cluster Resources, Inc and the contributions of many individuals and sites including the U.S. Department of Energy, PNNL, the Center for High Performance Computing at the University of Utah (CHPC), Ohio Supercomputing Center (OSC), University of Southern California (USC), SDSC, MHPCC, BYU, NCSA, and many others.
Maui extends the capabilities of base resource management systems by adding the following features:
- Extensive job priority policies and configurations
- Multi-resource admin and job advance reservation support
- Metascheduling interface
- QOS support including service targets and resource and function access control
- Extensive fairness policies
- Multi-attribute fairshare
- Configurable node allocation policies
- Multiple configurable backfill policies
- Detailed system diagnostic support
- Allocation manager support and interface
- Extensive resource utilization tracking and statistics
- Non-intrusive 'Test' modes
- Advanced built-in HPC simulator for analyzing workload, resource, and policy changes
Maui interfaces with numerous resource management systems supporting any of the following scheduling APIs:
PBS Scheduling API - TORQUE, OpenPBS and PBSPro
Loadleveler Scheduling API - Loadleveler (IBM)
SGE Scheduling API - Sun Grid Engine (Sun)*
BProc Scheduling API - BProc (Scyld)**
SSS XML Scheduling API*
LSF Scheduling API - LSF (Platform)
Wiki FlatText Scheduling API (Wiki)
*partial support or under development
**supported under Clubmask
Maui is currently supported on all known variants of Linux, AIX, OSF/Tru-64, Solaris, HP-UX, IRIX, FreeBSD, and other UNIX platforms.
The Maui scheduler is mature, fully documented, and supported. It continues to be aggressively developed and has a very active and growing user community. Its legacy of pushing the scheduling envelope continues as we promise to deliver the best possible scheduler that the supporting systems software will allow.
