SGE Resource Quota Sets (RQS)
Managing Resource Quotas in Grid Engine
By Sinisa Veseli
It is often the case that cluster administrators must impose limits on the use of certain resources.
A good example would be preventing a particular user (or a set of users) from utilizing an entire
queue (or cluster) at any point. If you've ever tried doing something like that for Grid Engine
(SGE), then you know that it is not immediately obvious how to impose limits on resource usage.
SGE has a concept of "resource quota sets" (RQS), which can be used to limit maximum resource consumption
by any job. The relevant qconf command line switches for manipulating resource quota sets are "-srqs"
and "-srqsl" (show), "-arqs" (add), "-mrqs" (modify) and "-drqs" (delete).
Each RQS must have the following parameters: name, description, enabled and limit. RQS name cannot
have spaces, but its description can be an arbitrary string. The boolean "enabled" flag specifies
whether the RQS is enabled or not, while the "limit" field denotes resource quota rule that consists
of an optional name, filters for a specific job request and the resource quota limit. Note that
one can have multiple "limit" fields associated with a given RQS. For example, the following RQS
prevents user "ahogger" from occupying more than one job slot in general, and it also prevents the
same user from running jobs in the headnodes.q queue:
$ qconf -srqs ahogger_job_limit
{
name ahogger_job_limit
description "limit ahogger jobs"
enabled TRUE
limit users ahogger to slots=1
limit users {ahogger} queues {headnodes.q} to slots=0
}
The exact format in which an RQS has to be specified is, like everything else, well documented in
the SGE man pages ("man sge_resource_quota").
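For completeness, the same rule set can also be created non-interactively by writing the definition to a file and loading it with the "-Arqs" switch. A minimal sketch (the file name is arbitrary):

$ cat ahogger_job_limit.txt
{
   name         ahogger_job_limit
   description  "limit ahogger jobs"
   enabled      TRUE
   limit        users ahogger to slots=1
   limit        users {ahogger} queues {headnodes.q} to slots=0
}
$ qconf -Arqs ahogger_job_limit.txt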
Roland Dittel
20 November 2006
1 Introduction
In large enterprise clusters it is necessary to prevent users from consuming all available resources.
In order to achieve this, N1GE6 supports complex attributes which can be configured on a global,
queue or host layer. This feature is sufficient in certain cases, especially in small clusters,
but has shortcomings and drawbacks for enterprise usage.
Customers have asked for a feature to enhance resource limits so that they apply to several kinds
of resources, several kinds of resource consumers, to all jobs in the cluster and to combinations
of consumers. In this context, "resources" are any defined complex attribute (see complex(5)) known
by the Grid Engine configuration. For example this can be slots, arch, mem_total, num_proc, swap_total
or any custom-defined resource like compiler_license. Resource consumers are (per) users, (per)
queues, (per) hosts, (per) projects, (per) parallel environments. This specification describes a
user interface to define such flexible resource limits.
This feature provides a way for administrators to limit the resources used at a single time by
a consumer. However, it is not a way to define priorities determining which user should obtain a resource.
Priorities can be defined by using the Share Tree feature released with N1GE6.
2 Project Overview
2.1 Project Aim
The aim of this project is a solution that allows utilization of built-in and user-defined resources
to be managed in a more flexible manner. In particular, this is a means to limit resources on a
per user basis and a per project basis. Similarly, resource limitations on the basis of user groups
and project groups are also required.
The issues targeted with this project are:

   Issue        Description
   -----------  ------------------------------------------------------------------
   74           Support maxujobs on a per host level
   1532         Max jobs per user on a queue basis
   1644         Per-user slot limits for limiting PE usage
   CR 6298406   Hostgroups should be added as another configuration layer b/w global and host
   CR 6289250   Request for Job limit per User of Queue
2.2 Project Benefit
The expectation is that the management of N1GE cluster resources will be possible in a much more
targeted manner. The enhancement must make it easy to freely manage limits for arbitrary resources
in relation to existing N1GE objects, such as project/user/host/queue/pe, without the burden of
doing micro-management with countless projects/users/hosts/queues.
Suggestions for future enhancements are:
- express these resource limits by means of percentages of a wider context (e.g. (a) memory
limit of 4G available for project1 and project2 (b) up to 70 percent available for project1
and (c) up to 60 percent available for project2)
- add a new built-in complex attribute "jobs", that always counts 1 for a job in all resource
containers
- define operators which can modify a set of resource limits as a means to allow hierarchical
management
The cluster is open to all Department of Statistics faculty, grad students, postdocs, and visitors
using their SCF logon. Biostatistics grad students, postdocs, and visitors do not currently have
access to the cluster, but can request access based on grant funding. Class account users do not
have access by default, but instructors can email [email protected]
to discuss access for their class.
Currently users may submit jobs on the following submit hosts:
arwen, beren, bilbo, crow, gimli, heffal, legolas, mole, pooh, rabbit, roo, shelob, springer, toad, treebeard, witch
The cluster has three job queues: high.q, interactive.q, and low.q.
Interactive jobs allow a user to work at the command line of a cluster node; instructions for interactive
jobs are provided in a later section of this document.
We have configured high.q and interactive.q to have priority over low.q: low.q jobs have lower
priority for obtaining system resources than jobs in the other queues, so when the cluster is busy,
low.q jobs will run more slowly.
A given user can use at most 12 slots in each of high.q and interactive.q at
any one time (this could be a single 12-core job, 12 single core jobs, or any combination in between).
Users can submit jobs requiring additional cores; such jobs will wait in the queue and start when
the user's previous job(s) end. The number of jobs and cores per user running in low.q is not restricted, except by
the physical limits of the cluster and jobs being run by other users. low.q jobs are limited to
28 days runtime while high.q jobs and interactive jobs are restricted to 7 days runtime (but see
the Section below on "How to Submit Long Jobs" for jobs you expect to take more than three days,
as jobs lasting longer than 5 days will be killed by default). Jobs on all queues are restricted
to 128GB RAM, and threaded/multi-core jobs will run on at most 32 cores. Jobs exceeding runtime
or memory will be silently killed off by SGE. It is therefore important to try to gauge how long
a job might run. SGE will default submitted jobs to low.q (if no queue is specified) and any job's
default output files to the CWD (current working directory) from which the job was submitted.
   Queue           Max. # cores per user (running)   Time Limit   Max. memory/job   Max. # cores/job
   --------------  --------------------------------  -----------  ----------------  ----------------
   interactive.q   12                                7 days*      128 GB            12
   high.q          12                                7 days*      128 GB            12
   low.q           256                               28 days*     128 GB            32**
* See the Section on "How to Submit Long Jobs" for jobs you expect to take more than three days.
** If you use MPI (including foreach with doMPI in R), you can run individual jobs on more than
32 cores. See the Section on "How to Submit MPI Jobs" for such cases.
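For example, to run in high.q you request the queue explicitly at submission time. A hedged sketch (the parallel environment name "smp" and the script name are assumptions; check the cluster documentation for the actual PE name):

qsub -q high.q job.sh                  # single-core job in high.q
qsub -q high.q -pe smp 12 job.sh       # a single 12-core job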
We have implemented a 'fair share' policy that governs the order in which jobs that are waiting
in a given queue start when resources become available. In particular, if two users each have a
job sitting in a queue, the job that will start first will be that of the user who has made less
use of the cluster recently (measured in terms of CPU time). The measurement of CPU time downweights
usage over time, with a half-life of one month, so a job that ran a month ago will count half as
much as a job that ran yesterday. Apart from this prioritization based on recent use, all users
are treated equally.
4 Functional Definition
4.1 Performance
4.2 Reliability, Availability, Serviceability (RAS)
- Enhancement of Scheduler Profiling
4.3 Diagnostics
- qquota (new command)
- qstat -j job_id (enhancement)
4.4 User Experience
4.4.1 Obsolete Configuration
4.4.2 Command Line (CLI)
4.4.2.1 CLI enhancements
qconf(1):

   switch                                 description
   -------------------------------------  ----------------------------------------------------------
   -aattr obj_nm attr_nm val obj_id_lst   add to a list attribute of an object
   -Aattr obj_nm fname obj_id_lst         add to a list attribute of an object
   -dattr obj_nm attr_nm val obj_id_lst   delete from a list attribute of an object
   -Dattr obj_nm fname obj_id_lst         delete from a list attribute of an object
   -mattr obj_nm attr_nm val obj_id_lst   modify an attribute (or element in a sublist) of an object
   -Mattr obj_nm fname obj_id_lst         modify an attribute (or element in a sublist) of an object
   -rattr obj_nm attr_nm val obj_id_lst   replace an attribute (or element in a sublist) of an object
   -Rattr obj_nm fname obj_id_lst         replace an attribute (or element in a sublist) of an object

   obj_nm       rqs - resource quota set
   attr_nm      name, enabled, description or limit
   val          new value of attr_nm
   obj_id_lst   rule set, or rule for limit

qstat(1):

   switch                   description
   -----------------------  --------------------------------
   -j job_identifier_list   show scheduler job information
   -u user_list             view only jobs of this user
4.4.2.2 CLI additions
qconf(1):

   switch               description
   -------------------  ----------------------------------------
   -arqs [name]         add resource quota set(s)
   -Arqs fname          add resource quota set(s) from file
   -mrqs [name]         modify resource quota set(s)
   -Mrqs fname [name]   modify resource quota set(s) from file
   -srqs [name_list]    show resource quota set(s)
   -srqsl               show resource quota set list
   -drqs [name_list]    delete resource quota set(s)

qquota(1):

   switch                   description
   -----------------------  --------------------------------------------
   -help                    print this help
   -h host_list             display only selected hosts
   -l resource_attributes   request the given resources
   -u user_list             display only selected users
   -pe pe_list              display only selected parallel environments
   -P project_list          display only selected projects
   -q wc_queue_list         display only selected queues
   -xml                     display the information in XML format
4.4.3 Graphical User Interface (GUI)
4.4.3.1 Configuration
Qmon will be enhanced to allow the configuration of resource quota sets. The configuration
will work the same as on the CLI, via an editor.
4.4.3.2 Diagnose
No diagnostic support will be provided by Qmon.
4.5 Manufacturing
- aimk - not affected
- Makefiles - minor changes, new source objects
4.6 Quality Assurance
- New testsuite tests needed
- Modules tests for core module testing
4.7 Security & Privacy
Not affected
4.8 Migration Path
- imposes no need for DB update
- imposes no need for update script
4.9 Documentation
This specification will be used by the documentation writers.
4.9.1 Man Page Changes:
- qquota(1) - new man page
- sge_resource_quota(5) - new man page
- qconf(1) - new switches
4.10 Installation
Installation will not change. For future releases the installation may change if the complex
configuration for global/queue/host becomes obsolete.
At installation time no default rules sets are created.
4.11 Packaging
Does not change
5 Component Descriptions
5.1 Component Resource Quota Rules
5.1.1 Overview
According to customers and the filed RFEs it's desired to define a limit only for specific consumers
like users or projects and only for specific providers like hosts or queues. To achieve this administrators
must be able to define a rule set which consists of the limiting resource and the limit value, and
additionally the consumers or providers to whom this rule should apply. Because every rule can be
expressed by a tuple of filter specifiers we decided to implement the rule sets in style of firewall
rules.
In practice a rule is defined by:
- who
- users (list of user or usersets/departments)
- projects (list of project)
- where
- parallel_environments (list of pe's)
- hosts (list of host or hostgroups)
- queues (list of cluster queues)
- what
- resource_attribute=max value
The Resource Quota Rules are separate configuration objects and only used for scheduling decisions.
They don't affect the overall cluster configuration like cluster queues, hosts or projects.
Deliberate restrictions in the first implementation step:
- Limits are counted per task, as in the current implementation. For example, if a PE job
gets 10 slots it will consume 10 licenses.
- Limits can only be set for fixed and consumable resources, not for load values (a future
enhancement; see "Migration Path"). Load values must still be configured at the global, host,
or queue level.
5.1.2 Functionality
Integration with current implementation
The Resource Quotas are an addition to the current global, host and queue instance based scheduling
order. The old implementation is still valid and can be used without the new rules. The rules enhance
the old implementation and add a new ordering layer on top of global, to define a more precise limitation.
The implications of the layer order on resources are described in complexes(5) under "Overriding
attributes". In general the layers are AND-associated, and if one layer denies the job, the
following layers are ignored. For example, a limit value of "slots=4" can be overridden in the global,
host or queue layer if the layer value is more restrictive, e.g., "slots=2". The exception (see
complexes(5)) is for boolean values; for example, "is_linux=true" defined in the tree cannot be
overridden to "is_linux=false" in the global, host or queue definition.
resource quotas
|-DENIED->break
|
global
|-DENIED->break
|
host
|-DENIED->break
|
queue
|-DENIED->break
|
OK
|
Resource Reservation
Resource Reservation will work for Resource Quotas analogously to the current global/host/queue
resource configuration. No changes are necessary on the client side.
5.1.3 Interfaces
Resource Quota Set Syntax
ALL: '*'
SEPARATOR: ','
STRING: [^\n]*
QUOTE: '\"'
S_EXPANDER: '{'
E_EXPANDER: '}'
NOT: '!'
BOOL: [tT][rR][uU][eE]
| 1
| [fF][aA][lL][sS][eE]
| 0
NAME: [a-zA-Z][a-zA-Z0-9_-]*
LISTVALUE: ALL | [NOT]STRING
LIST: LISTVALUE [SEPARATOR LISTVALUE]*
NOTSCOPE: LIST | S_EXPANDER LIST E_EXPANDER
SCOPE: ALL | STRING [SEPARATOR STRING]*
RESOURCEPAIR: STRING=STRING
RESOURCE: RESOURCEPAIR [SEPARATOR RESOURCEPAIR]*
rule: "limit" ["name" NAME] ["users" NOTSCOPE] ["projects" SCOPE] ["pes" SCOPE] \
["queues" SCOPE] ["hosts" NOTSCOPE] "to" RESOURCE NL
ruleset_attributes: ("name" NAME NL)
("enabled" BOOL NL)?
("description" QUOTE STRING QUOTE)?
ruleset: "{"
(ruleset_attributes)
(rule)+
"}" NL
rulesets: (ruleset)*
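To connect the grammar to a concrete instance, here is a minimal sketch of a rule set that exercises a named rule, an expanded user list, and the NOT operator (all names are hypothetical):

{
   name         grammar_demo
   enabled      TRUE
   description  "demo rule set"
   limit        name rule1 users {student1,student2} hosts !bigiron to slots=2
}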
Resource Quota Sets Format
users
Contains a comma separated list of UNIX users or ACLs (see access_list(5)). This parameter filters
for jobs by a user in the list or one of the ACLs in the list. Any user not in the list will not
be considered for the resource quota. The default value is '*' which means any user. An ACL is differentiated
from a UNIX user name by prefixing the ACL name with an '@' sign. To exclude a user or ACL from
the rule the name can be prefixed with the '!' sign. Defined UNIX user or ACL names need not be
known in the Grid Engine Configuration.
projects
Contains a comma separated list of projects (see project(5)). This parameter filters for jobs requesting
a project of the list. Any project not in the list will not be considered for the resource quota.
If no project filter is specified, all projects, and jobs with no requested project, match the rule.
The value '*' means all jobs with requested projects. To exclude a project from the rule the name
can be prefixed with the '!' sign. The value '!*' means only jobs with no project requested.
pes
Contains a comma separated list of PEs (see sge_pe(5)). This parameter filters for jobs requesting
a PE in the list. Any PE not in the list will not be considered for the resource quota. If no PE
filter is specified, all PEs, and jobs with no requested PE, match the rule. The value '*' means all
jobs with a requested PE. To exclude a PE from the rule, the name can be prefixed with the '!' sign.
The value '!*' means only jobs with no PE requested.
queues
Contains a comma separated list of cluster queues (see queue_conf(5)). This parameter filters for
jobs that may be scheduled in a queue in the list. Any queue not in the list will not be considered
for the resource quota. The default value is '*', which means any queue. To exclude a queue from the
rule, the name can be prefixed with the '!' sign.
hosts
Contains a comma separated list of hosts or hostgroups (see host(5) and hostgroup(5)). This parameter
filters for jobs that may be scheduled on a host in the list or on a host contained in one of the
listed hostgroups. Any host not in the list will not be considered for the resource quota. The
default value is '*', which means any host. To exclude a host or hostgroup from the rule, the name
can be prefixed with the '!' sign.
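As a brief illustration of how several filter criteria combine in a single rule (the ACL and queue names here are hypothetical), the following limits all jobs from users in the ACL @students running in queue all.q to 40 slots in total:

limit users @students queues all.q to slots=40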
Basic Configuration
Single Resource Quota Rule
Resource Quota rules specify the filter criteria that a job must match and the resulting limit
that is applied when a match is found.
A rule must always begin with the keyword "limit". The order of the filter criteria does not matter
when defining and entering a rule. After the new rule set is sent to the qmaster, the rules are
automatically reordered into a human-readable form.
Scope Lists
To define a rule for more than one filter scope, it is possible to group scopes into a list. The
defined resource limit then applies to all objects listed in the scope in sum.
For example we have a consumable virtual_free defined as:
#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
virtual_free vf MEMORY <= YES YES 1g 0
In the rule defined below, both users can use together only 5g of virtual_free:
limit users roland, andre to virtual_free=5g
|
If the administrator wants to limit each of the two users to 5g virtual_free he could define
two rules:
limit users roland to virtual_free=5g
limit users andre to virtual_free=5g
This is very cumbersome for large numbers of users or user groups. For this case a rule can be
defined with an expanded list. This would look like:
limit users {roland, andre} to virtual_free=5g
If the scope contains a user group, it is also expanded, and the limit then applies to each
member of that group.
For example, if the hostgroup @lx_hosts contains the hosts durin and carc, the following two rule sets are equivalent:
1)
limit users * hosts durin to virtual_free=10g
limit users * hosts carc to virtual_free=10g
2)
limit users * hosts {@lx_hosts} to virtual_free=10g
NOT Operator
Sometimes it is necessary to define a rule for a userset but exclude some users of that set.
This can be defined by using the NOT operator ('!' sign) in front of the user name. A rule so defined
will not affect the excluded user, even if the user is explicitly added to the rule.
For example, user "roland" is also member of usergroup "staff". If a resource quota rule looks
like this:
limit users @staff,!roland to slots=10
the limit will not be effective for user "roland". Even if the resource quota rule looks like this:
limit users @staff,!roland,roland to slots=10
the rule will not be effective for user "roland".
Dynamical Limits
Resource Quota rules always define a maximum value of a resource that can be used. In most
cases these values are static and equal for all matching filter scopes. If administrators want different
rule limits for different scopes, they have to define multiple rules; this leads to a duplication
of nearly identical rules. With the concept of dynamical limits this kind of duplication can be
avoided.
A dynamical limit is a simple algebraic expression used to derive the rule limit value. To be
dynamical, the formula can reference a complex attribute whose value is used for the calculation
of the resulting limit. The limit formula syntax is that of a summation of weighted complex
values, that is:
{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
Note that no blanks are allowed in the limit formula.
The following example clarifies the use of dynamical limits: Users are allowed to use 5 slots
per CPU on all linux hosts.
limit hosts {@linux_hosts} to slots=$num_proc*5
The complex attribute num_proc is defined on all hosts, and its value is the processor count of
every host. The limit is calculated by the formula "$num_proc*5" and so may differ from host to
host. On a 2-CPU host users can use 10 slots, whereas on a 1-CPU host users can use only 5 slots.
To be able to set the limitation to a well-defined value, some prerequisites must be fulfilled:
- Limit formulas are only possible for INT and DOUBLE complex attributes.
- The complex attribute must already be defined in the complex list.
- The complex attribute must be defined on the global, queue, or host layer so that the value
can be resolved.
- The limited complex attribute must have the same value definition as the referenced complex
attribute (for example slots=INT, num_proc=INT).
- The resource quota rule must be defined with an expanded list for the layer on which the
complex attribute is defined (for example hosts {*} for a $num_proc reference). It is not allowed
to reference a complex attribute for a sum of scopes (for example hosts * for $num_proc).
In principle any INT or DOUBLE complex value could be referenced, but due to time constraints
the first implementation allows only $num_proc in combination with an expanded host list.
Resource Quota Rules and Resource Quota Set Interaction
In practice administrators define some global limits and some limits that apply only to particular
resource consumers. These resource quota rules are of equal standing. In some cases, however, it is
necessary to define exceptions for certain resource consumers; such rules are not equal and dominate
others. It is therefore necessary to allow both a prioritized rule list and rule lists that apply
all of the time. This is done by grouping one or more single rules into a number of rule sets.
Inside one rule set the rules are ordered, and the first matching rule is used. This is analogous
to firewall rules, is generally understood by administrators, and allows the prioritization of some
rules. A rule set always results in at most one effective resource quota for a specific request.
All of the configured rule sets apply all of the time: if multiple rule sets are defined,
the most restrictive set is used, which allows equal-standing limits to be defined.
The following example clarifies the combination of rules and rule sets. We have a consumable
defined as:
#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
compiler_lic cl INT <= YES YES 0 0
The resource quota sets are defined as:
{
name ruleset1
limit users roland to compiler_lic=3
limit projects * to compiler_lic=2
limit users * to compiler_lic=1
}
{
name ruleset2
limit users * to compiler_lic=20
}
The first rule set, ruleset1, expresses:
- user roland is allowed to use 3 compiler_lic resources
- any request submitted in a project is allowed to use 2 compiler_lic resources
- the default value for all other users is 1 compiler_lic resource
The second rule set, ruleset2, expresses:
- all requests together are only allowed to use 20 compiler_lic resources.
Inside ruleset1 the priority is clearly defined: user roland will always get 3 compiler_lic resources,
even though he also matches the "users *" of the last rule in the rule set, and even if he submits
his request in a project. The interaction between ruleset1 and ruleset2 is also clearly defined, and
results in a rejection if 20 compiler_lic resources are already in use, even if user roland has not
used up all of his 3 compiler_lic resources.
CLI - Command Line Interface
qconf
With qconf it is possible to edit the rule sets in an editor session, as with most qconf
switches. To reduce the amount of data presented to the administrator, it is possible to select only
one rule set for editing.
It is not possible to edit single rules. Because the rules inside a rule set are ordered, the
meaning of a single rule depends on the context of all the other rules; therefore it does not make
sense to edit a single rule without presenting its context.
Switch Descriptions:
- -Arqs fname (add RQS configuration)
Add the resource quota set (RQS) defined in fname to the Grid Engine cluster. Returns 0 on success
and 1 if rqs is already defined. Requires root or manager privileges.
$ more rule_set.txt
{
name rule_set_2
enabled true
description "rule set 2"
}
$ qconf -Arqs rule_set.txt
rd141302@es-ergb01-01 added "rule_set_2" to resource quota set list
$ qconf -Arqs rule_set.txt
resource quota set "rule_set_2" already exists
- -Mrqs fname [rqs_name] (modify RQS configuration)
Same as -mrqs (see below) but instead of invoking an editor to modify the RQS configuration the
file fname is considered to contain a changed configuration. The name of the rule set in fname must
be the same as rqs_name. If rqs_name is empty all rule sets are overwritten by the rule sets in
fname. Refer to sge_rqs(5) for details on the RQS configuration format. Returns 0 on success and
1 on error. Requires root or manager privilege.
$ more rule_set.txt
{
name rule_set_3
enabled true
description "rule set 2"
}
$ qconf -Mrqs rule_set.txt rule_set_3
resource quota set "rule_set_3" does not exist
$ qconf -Mrqs rule_set.txt rule_set_4
resource quota set "rule_set_4" does not match rule set definition
$ qconf -Mrqs rule_set.txt
rd141302@es-ergb01-01 modified resource quota set list
- -arqs [name] (add new RQS)
Adds a resource quota set (RQS) description under the specified name to the list of RQSs maintained
by Grid Engine (see sge_rqs(5) for details on the format of an RQS definition). Qconf retrieves a
default RQS configuration and executes vi(1) (or $EDITOR if the EDITOR environment variable is set)
to allow you to customize the RQS configuration. Upon exit from the editor, the RQS is registered
with sge_qmaster(8). Returns 0 on success and 1 if rqs is already defined. Requires root/manager
privileges.
$ qconf -arqs
<- {
<- name template
<- enabled true
<- description ""
<- }
-> :q
resource quota set name "template" is not valid
$ qconf -arqs rule_set_1
<- {
<- name rule_set_1
<- enabled true
<- description ""
<- }
-> :wq
rd141302@es-ergb01-01 added "rule_set_1" to resource quota set list
$ qconf -arqs rule_set_1
resource quota set "rule_set_1" already exists
- -mrqs [name] (modify RQS configuration)
Retrieves either all rule sets or only the specified resource quota set (RQS) configuration,
executes an editor (either vi(1) or the editor indicated by the EDITOR environment variable)
and registers the new configuration with the sge_qmaster(8). Refer to sge_rqs(5) for details on
the RQS configuration format. Returns 0 on success and 1 on error. Requires root or manager privilege.
$ qconf -mrqs rule_set_1
<- {
<- name rule_set_1
<- enabled true
<- description ""
<- }
-> :wq
rd141302@es-ergb01-01 modified "rule_set_1" in resource quota set list
$ qconf -mrqs unknown_set
resource quota set "unknown_set" does not exist
$ qconf -mrqs
<- ...
<- name rule_set_1
<- ...
<- name rule_set_2
<- ...
- -srqs [name_list] (show RQS configuration)
Show the definition of the resource quota set (RQS) specified by the argument.
$ qconf -srqs
...
name rule_set_1
...
name rule_set_2
...
$ qconf -srqs rule_set_1
...
name rule_set_1
...
- -srqsl (show resource quota sets list)
Show a list of the names of all resource quota sets currently configured.
- -drqs name_list (delete RQS)
Deletes the specified resource quota sets (RQS). Returns 0 on success and 1 if rqs_name is unknown.
Requires root/manager privileges.
$ qconf -drqs rule_set_1
rd141302@es-ergb01-01 removed "rule_set_1" from resource quota set list
$ qconf -drqs unknown_rule_set
denied: resource quota set "unknown_rule_set" does not exist
$ qconf -drqs
rd141302@es-ergb01-01 removed resource quota set list
- -aattr obj_nm attr_nm val obj_id_lst
See qconf(1)
$ qconf -srqs ruleset_1
{
name ruleset_1
enabled true
limit users @eng to slots=10
limit name arch_rule users @eng to arch=lx24-amd64
}
$ qconf -aattr resource_quota limit slots=20 ruleset_1/1
No modification because "slots" already exists in "limit" of "ruleset_1/1"
$ qconf -aattr resource_quota limit compiler_lic=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in rqs list
$ qconf -aattr resource_quota limit arch=sol-sparc64 ruleset_1/arch_rule
No modification because "arch" already exists in "limit" of "ruleset_1/arch_rule"
- -Aattr obj_nm fname obj_id_lst
See qconf(1)
$ more resource.txt
limit slots=20
$ qconf -Aattr resource_quota resource.txt ruleset_1/1
No modification because "slots" already exists in "limit" of "ruleset_1/1"
$ more resource2.txt
limit compiler_lic=5
$ qconf -Aattr resource_quota resource2.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
- -dattr obj_nm attr_nm val obj_id_lst
See qconf(1)
$ qconf -dattr resource_quota limit compiler_lic=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in rqs list
$ qconf -dattr resource_quota limit compiler_lic=5 ruleset_1/1
"compiler_lic" does not exist in "limit" of "resource_quota"
- -Dattr obj_nm fname obj_id_lst
See qconf(1)
$ more resource.txt
limit compiler_lic=20
$ qconf -Dattr resource_quota resource.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ qconf -Dattr resource_quota resource.txt ruleset_1/1
"compiler_lic" does not exist in "limit" of "resource_quota"
- -mattr obj_nm attr_nm val obj_id_lst
See qconf(1)
$ qconf -mattr resource_quota limit slots=5 ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ qconf -mattr resource_quota limit new_resource=5 ruleset_1/1
Unable to find "new_resource" in "limit" of "resource_quota" - Adding new element.
$ qconf -mattr resource_quota enabled false ruleset_1
rd141302@es-ergb01-01 modified "ruleset_1" in resource_quota list
- -Mattr obj_nm fname obj_id_lst
See qconf(1)
$ more resource.txt
limit slots=20
$ qconf -Mattr resource_quota resource.txt ruleset_1/1
rd141302@es-ergb01-01 modified "ruleset_1/1" in resource_quota list
$ more resource2.txt
limit new_resource=5
$ qconf -Mattr resource_quota resource2.txt ruleset_1/1
Unable to find "new_resource" in "limit" of "resource_quota" - Adding new element.
- -rattr obj_nm attr_nm val obj_id_lst
See qconf(1)
- -Rattr obj_nm fname obj_id_lst
See qconf(1)
qstat
Switch Descriptions:
Additional Output
Example:
cannot run on cluster because exceeds limit in rule_set_1
cannot run on host "bla" because exceeds limit in rule_set_1
cannot run on queue instance "all.q@host" because exceeds limit in rule_set_1
To be consistent with qquota the default value of user_list changes from * (all users) to the
calling user.
qquota
The qquota command is a diagnostic tool for the resource quotas. The output is a table
with the following columns:
resource quota rule | limit | filter
For each matched rule per rule set, a line is printed if the usage count for that rule is not 0.
If one rule contains more than one resource attribute, one line is printed per resource attribute.
By default qquota shows the effective limits for the calling user; for all other filter criteria,
such as project or pe, the wildcard "*" is used, which means no explicit filter.
The output in the limit column is:
complex=used/limit (for example slots=2/20)
complex=value (for example arch=lx24-amd64)
The administrator and the user may define files (analogous to sge_qstat(5)) which can contain
any of the options described above. A cluster-wide sge_qquota file may be placed under
$SGE_ROOT/$SGE_CELL/common/sge_qquota, and a user-private file is searched for at $HOME/.sge_qquota.
The home-directory file takes precedence over the cluster-global file, and command-line options
override the flags contained in either file.
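For instance, a user-private request file might look like the following minimal sketch (the host name is a placeholder); with it, a plain qquota call behaves like "qquota -h durin -u *":

$ cat $HOME/.sge_qquota
-h durin
-u *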
Example:
- All users together should never take more than 20 slots
- All users together should take at most 5 slots across all linux hosts
- Every user is restricted to one slot per linux host, except that user "roland" is allowed
2 slots per linux host, and on all other hosts the slot count is set to 0
Rule Set:
{
name maxujobs
limit users * to slots=20
}
{
name max_linux
limit users * hosts @linux to slots=5
}
{
name max_per_host
limit users roland hosts {@linux} to slots=2
limit users {*} hosts {@linux} to slots=1
limit users * hosts * to slots=0
}
qstat Output:
$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------
27 0.55500 Sleeper roland r 02/21/2006 15:53:10 all.q@carc 1
29 0.55500 Sleeper roland r 02/21/2006 15:53:10 all.q@carc 1
30 0.55500 Sleeper roland r 02/21/2006 15:53:10 all.q@durin 1
26 0.55500 Sleeper roland r 02/21/2006 15:53:10 all.q@durin 1
28 0.55500 Sleeper user1 r 02/21/2006 15:53:10 all.q@durin 1
qquota Output:
$ qquota # as user roland
resource quota rule limit filter
--------------------------------------------------------------------------------
maxujobs/1 slots=5/20 -
max_linux/1 slots=5/5 hosts @linux
max_per_host/1 slots=2/2 users roland hosts durin
max_per_host/1 slots=2/2 users roland hosts carc
$ qquota -h durin # as user roland
resource quota limit filter
--------------------------------------------------------------------------------
maxujobs/1 slots=5/20 -
max_linux/1 slots=5/5 hosts @linux
max_per_host/1 slots=2/2 users roland hosts durin
$ qquota -u user1
resource quota limit filter
--------------------------------------------------------------------------------
maxujobs/1 slots=5/20 -
max_linux/1 slots=5/5 hosts @linux
max_per_host/1 slots=1/2 users user1 hosts durin
$ qquota -u *
resource quota limit filter
--------------------------------------------------------------------------------
maxujobs/1 slots=5/20 -
max_linux/1 slots=5/5 hosts @linux
max_per_host/1 slots=2/2 users roland hosts carc
max_per_host/1 slots=2/2 users roland hosts durin
max_per_host/1 slots=1/2 users user1 hosts durin
qquota XML Schema:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xsd:element name="qquota_result">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="qquota_rule" type="QQuotaRuleType" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="QQuotaRuleType">
<xsd:sequence>
<xsd:element name="user" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="xuser" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="project" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="xproject" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="pe" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="xpe" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="queue" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="xqueue" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="host" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="xhost" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="limit" type="ResourceLimitType" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="required"/>
</xsd:complexType>
<xsd:complexType name="ResourceLimitType">
<xsd:attribute name="resource" type="xsd:string" use="required"/>
<xsd:attribute name="limit" type="xsd:string" use="required"/>
<xsd:attribute name="value" type="xsd:string" use="optional"/>
</xsd:complexType>
</xsd:schema>
Internal data structures:
Additional Cull Lists
All lists are used by qmaster and scheduler
File sge_resource_quotaL.h
#ifndef __SGE_RESOURCE_QUOTAL_H
#define __SGE_RESOURCE_QUOTAL_H
#include "sge_boundaries.h"
#include "cull.h"
#ifdef __cplusplus
extern "C" {
#endif
/* *INDENT-OFF* */
/* Resource Quota Set */
enum {
RQS_name = RQS_LOWERBOUND,
RQS_description,
RQS_enabled,
RQS_rule
};
LISTDEF(RQS_Type)
JGDI_ROOT_OBJ(ResourceQuotaSet, SGE_RQS_LIST, ADD | MODIFY | DELETE | GET | GET_LIST)
JGDI_EVENT_OBJ(ADD(sgeE_RQS_ADD) | MODIFY(sgeE_RQS_MOD) | DELETE(sgeE_RQS_DEL) | GET_LIST(sgeE_RQS_LIST))
SGE_STRING(RQS_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
SGE_STRING(RQS_description, CULL_DEFAULT | CULL_SPOOL)
SGE_BOOL(RQS_enabled, CULL_DEFAULT | CULL_SPOOL)
SGE_LIST(RQS_rule, RQR_Type, CULL_DEFAULT | CULL_SPOOL)
LISTEND
NAMEDEF(RQSN)
NAME("RQS_name")
NAME("RQS_description")
NAME("RQS_enabled")
NAME("RQS_rule")
NAMEEND
#define RQSS sizeof(RQSN)/sizeof(char*)
/* Resource Quota Rule */
enum {
RQR_name = RQR_LOWERBOUND,
RQR_filter_users,
RQR_filter_projects,
RQR_filter_pes,
RQR_filter_queues,
RQR_filter_hosts,
RQR_limit,
RQR_level
};
LISTDEF(RQR_Type)
JGDI_OBJ(ResourceQuotaRule)
SGE_STRING(RQR_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
SGE_OBJECT(RQR_filter_users, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_OBJECT(RQR_filter_projects, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_OBJECT(RQR_filter_pes, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_OBJECT(RQR_filter_queues, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_OBJECT(RQR_filter_hosts, RQRF_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_LIST(RQR_limit, RQRL_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_ULONG(RQR_level, CULL_DEFAULT | CULL_JGDI_RO)
LISTEND
NAMEDEF(RQRN)
NAME("RQR_name")
NAME("RQR_filter_users")
NAME("RQR_filter_projects")
NAME("RQR_filter_pes")
NAME("RQR_filter_queues")
NAME("RQR_filter_hosts")
NAME("RQR_limit")
NAME("RQR_level")
NAMEEND
#define RQRS sizeof(RQRN)/sizeof(char*)
enum {
FILTER_USERS = 0,
FILTER_PROJECTS,
FILTER_PES,
FILTER_QUEUES,
FILTER_HOSTS
};
enum {
RQR_ALL = 0,
RQR_GLOBAL,
RQR_CQUEUE,
RQR_HOST,
RQR_QUEUEI
};
/* Resource Quota Rule Filter */
enum {
RQRF_expand = RQRF_LOWERBOUND,
RQRF_scope,
RQRF_xscope
};
LISTDEF(RQRF_Type)
JGDI_OBJ(ResourceQuotaRuleFilter)
SGE_BOOL(RQRF_expand, CULL_DEFAULT | CULL_SPOOL)
SGE_LIST(RQRF_scope, ST_Type, CULL_DEFAULT | CULL_SPOOL)
SGE_LIST(RQRF_xscope, ST_Type, CULL_DEFAULT | CULL_SPOOL)
LISTEND
NAMEDEF(RQRFN)
NAME("RQRF_expand")
NAME("RQRF_scope")
NAME("RQRF_xscope")
NAMEEND
#define RQRFS sizeof(RQRFN)/sizeof(char*)
/* Resource Quota Rule Limit */
enum {
RQRL_name = RQRL_LOWERBOUND,
RQRL_value,
RQRL_type,
RQRL_dvalue,
RQRL_usage,
RQRL_dynamic
};
LISTDEF(RQRL_Type)
JGDI_OBJ(ResourceQuotaRuleLimit)
SGE_STRING(RQRL_name, CULL_PRIMARY_KEY | CULL_HASH | CULL_UNIQUE | CULL_SPOOL)
SGE_STRING(RQRL_value, CULL_DEFAULT | CULL_SPOOL)
SGE_ULONG(RQRL_type, CULL_DEFAULT | CULL_SPOOL | CULL_JGDI_RO)
SGE_DOUBLE(RQRL_dvalue, CULL_DEFAULT | CULL_SPOOL | CULL_JGDI_RO)
SGE_LIST(RQRL_usage, RUE_Type, CULL_DEFAULT | CULL_JGDI_RO)
SGE_BOOL(RQRL_dynamic, CULL_DEFAULT | CULL_JGDI_RO)
LISTEND
NAMEDEF(RQRLN)
NAME("RQRL_name")
NAME("RQRL_value")
NAME("RQRL_type")
NAME("RQRL_dvalue")
NAME("RQRL_usage")
NAME("RQRL_dynamic")
NAMEEND
#define RQRLS sizeof(RQRLN)/sizeof(char*)
/* *INDENT-ON* */
#ifdef __cplusplus
}
#endif
#endif /* __SGE_RESOURCE_QUOTAL_H */
Additional GDI requests
- SGE_GDI_ADD(RQS, resource_quota)
This request allows for adding a new resource quota set. It contains the complete rule set configuration
and is used for implementing the qconf option '-arqs' and '-Arqs'.
- SGE_GDI_MOD(RQS, resource_quota)
This request allows for changing the complete resource quota set configuration. It contains a full rule
set configuration and is used for implementing qconf option '-mrqs' and '-Mrqs'.
- SGE_GDI_DEL(RQS, resource_quota)
This request allows for removing a complete resource quota set configuration. It contains only the name
of the resource quota to be removed and is used for implementing the qconf option '-drqs'.
- SGE_GDI_GET(RQS,where,what)
This request allows for retrieving resource quota sets. CULL 'where' expressions can be used for selecting
particular rule sets, CULL 'what' expressions can be used for selecting particular rule set fields.
The SGE_GDI_GET request is used for implementing the qconf option '-srqs'.
- SGE_GDI_MOD(RQS, resource_quota, fields)
- SGE_GDI_MOD(RQS, resource_quota, fields) + SGE_GDI_SET()
These requests are variations of SGE_GDI_MOD(RQS, resource_quota) and allow for replacing the selected
fields within a resource quota. Field selection is done by means of an incomplete resource quota set
configuration structure. The requests are used for implementing the qconf options '-rattr' and '-Rattr'.
- SGE_GDI_MOD(RQS, resource_quota, fields) + SGE_GDI_APPEND(rule_identifiers, list_elements)
This request allows for adding one or more list elements, with regard to one or more rule identifiers,
to each of the selected list fields within a resource quota set configuration. Field selections are done
by means of an incomplete rule set configuration structure. The rule_identifiers of each tuple below
each selected rule set field define which rule should be modified. All list elements belonging
to each tuple are added. Already existing list elements are silently overwritten; likewise, if the
selected rule configuration field is not a list field, the current setting is silently overwritten.
The request is used for implementing the qconf options '-aattr' and '-Aattr'.
- SGE_GDI_MOD(RQS, resource_quota, fields) + SGE_GDI_CHANGE(rule_identifiers, list_elements)
This request allows for replacing one or more list elements, with regard to one or more rule identifiers,
in each of the selected list fields within a resource quota set configuration. Field selections are
done by means of an incomplete rule set configuration structure. The request is used for implementing
the qconf options '-mattr' and '-Mattr'.
- SGE_GDI_MOD(RQS, resource_quota, fields) + SGE_GDI_REMOVE(rule_identifiers, list_elements)
This request allows for removing one or more list elements, with regard to one or more rule identifiers,
from each of the selected list fields within a resource quota set configuration. Field selections are
done by means of an incomplete rule set configuration structure. The request is used for implementing
the qconf options '-dattr' and '-Dattr'.
Additional Event Client requests
- sgeE_RQS_LIST()
This event is sent once, directly after event client registration, to initialize the resource quota set
list; it contains the complete list of all resource quota sets with their full configuration.
- sgeE_RQS_ADD(resource_quota)
This event is sent each time a new resource quota set configuration is created. It contains
the full resource quota set configuration, but no usage information.
- sgeE_RQS_DEL(resource_quota)
This event is sent each time an existing resource quota set configuration is removed; it contains
only the name of the resource quota set to be removed.
- sgeE_RQS_MOD(resource_quota)
This event is sent each time an existing resource quota set configuration changes. It contains
the full resource quota set configuration.
- sgeE_RQS_ADD(resource_quota, rule_identifier, usage)
- sgeE_RQS_MOD(resource_quota, rule_identifier, usage)
- sgeE_RQS_DEL(resource_quota, rule_identifier, usage)
These events are sent each time a usage object is added, modified, or deleted. The resource_quota
and rule_identifier contain only the names of the objects to be edited; the usage object is the
object to be modified.
Qmaster additions:
- add cull rule set definition (Internal data structures)
- spooling code for the rule sets (mainly for classic spooling)
- update resource usage in all rules
Scheduler additions:
- Create Resource Reservation Structure
- prepare_resource_schedules()
- Scheduling matching code
- sge_sequential_assignment()
- sge_select_parallel_environment()
- Debit Code
lib additions:
- add code for book keeping of resource usage
Book keeping of usage:
- started jobs
- finished/deleted/running jobs
- (suspended jobs)
- object modify (queue host)
- object add/delete (queue host)
A customer asked how to limit the total number of certain jobs on an SGE cluster. The customer
wants to limit certain jobs so that only one such job is allowed to run on each execution host,
and no more than a given number of concurrent jobs are allowed at any given time. The reason for
limiting the total number of concurrent jobs is to prevent the jobs from creating resource contention.
The customer is running an old version of SGE, which doesn't have the resource quota set feature
introduced in the SGE 6.1 release. This is doable with pre-6.1 releases, but it requires
a lot of work compared to what can be done with the SGE resource quota set.
The following demonstrates how easy it is to set up such a customization with the SGE resource
quota set feature.
The first thing to do is to create a resource counter that tracks how many such jobs are being
executed. Using the SGE complex parameter, one can define:
# qconf -sc
#name        shortcut  type  relop  requestable  consumable  default  urgency
#-----------------------------------------------------------------------------
concurjob    ccj       INT   <=     FORCED       YES         0        0
...
Now, all these special jobs should be executed in a special queue called the "archive" queue.
The archive queue will be configured so that all these special jobs must use the special resource
counter when a job is submitted.
# qconf -sq archive
qname             archive
...
complex_values    concurjob=1
...
As shown above, only one job will be scheduled to the archive queue instance per machine.
Now it's time to control the total number of such jobs globally. This can be done very easily
with the resource quota set (RQS). The following command can be used to create such an RQS
rule.
# qconf -arqs
{
   name          limit_concur_jobs
   description   NONE
   enabled       TRUE
   limit         to concurjob=10
}
The name and the limit line are the entries modified from the default template. This completes
all the customization needed to limit the total number of special jobs running concurrently on the
entire SGE cluster.
Now when you submit a special job to the archive queue, you must use the "-l concurjob=1" resource
request, which in turn will be used to track how many of those special jobs are running.
The following shows an example. For demonstration purposes, the archive queue is modified to accommodate
two jobs per queue instance, and the total number of allowed concurrent jobs is set to 1.
s4u-80a-bur02# qconf -sq archive | egrep 'host|archive|concur'
qname                 archive
hostlist              @allhosts
complex_values        concurjob=2
s4u-80a-bur02# qconf -srqs
{
   name          limit_concur_jobs
   description   NONE
   enabled       TRUE
   limit         to concurjob=1
}
s4u-80a-bur02# qsub -b y -o /dev/null -j y -l ccj=1 sleep 3600
Your job 53 ("sleep") has been submitted
s4u-80a-bur02# qsub -b y -o /dev/null -j y -l ccj=1 sleep 3600
Your job 54 ("sleep") has been submitted
s4u-80a-bur02# qstat -f
queuename                qtype resv/used/tot. load_avg arch         states
---------------------------------------------------------------------------------
archive@s4u-80a-bur02    BIP   0/2/10         0.02     sol-sparc64
     53 0.55500 sleep      root       r     10/24/2008 15:05:59     1
############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     54 0.00000 sleep      root       qw    10/24/2008 15:05:57     1
s4u-80a-bur02# qstat -j 54
...
scheduling info:
cannot run because it exceeds limit "/////" in rule "limit_concur_jobs/1"
As observed here, job 54 is waiting and will be scheduled when resources become available.
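To see which quota is holding a job back, the qquota diagnostics can also be queried for the consumable directly. A hedged sketch of such a check (the output is illustrative, following the qquota format shown earlier):

s4u-80a-bur02# qquota -l ccj
resource quota rule        limit                filter
--------------------------------------------------------------------------------
limit_concur_jobs/1        concurjob=1/1        -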
Sun Grid Engine
This page describes the scheduling schema that we use on our clusters. While these descriptions may
not represent the exact configuration, they should give you an idea of how jobs are processed, how
the queues behave, and the general use of SGE.
FIFO Queue
This is the default queuing system used by our clusters. There is a single queue and there is no
automatic load balancing: jobs go in, jobs come out. This method is useful for small groups of users
who can effectively communicate and manage the sharing of the cluster in 'real life'.
"Share Tree" Queue
This is modeled after AceNet's queuing system (as of 16/04/2009). Many thanks to Ross of AceNet for
helping me get all of the configuration information for this setup. The Share Tree configuration is
generated based on the users and their respective projects. Jobs are queued based on a shares
system: there is a total number of shares in the system, and any one group/user is allocated a base
number of shares. As they use them up, the priority of their jobs decreases. Over time they slowly
regain shares, up to the maximum allotted. This does not allow users to override jobs that have
already been submitted to nodes; it simply affects who gets the next open spot.
Example
- Nadia submits 1000 jobs, 100 of them are scheduled leaving 900 in the queue
900 `Nadia` jobs are now waiting
- Arnaud submits 500 jobs, there are no open slots leaving 1400 in the queue
900 `Nadia` & 500 `Arnaud` jobs are now waiting
- Nadia's first 10 jobs finish, there are now ten slots open. As Arnaud hasn't
run anything yet his jobs should be given priority
900 `Nadia` & 490 `Arnaud` jobs are now waiting
While this system isn't perfectly fair in the short-term (Nadia got 100% of the cluster) it should
even out Per User CPU time over longer periods of time.
Outage Calendaring
Using the SGE calendars we are able to prevent jobs from starting that would not finish in time for
a scheduled outage. This alows us to gracefully schedule time in the future without preventing all jobs
from being scheduled. If a job would run into the scheduled outage time it is simply 'held' until after
the outage is over. In order for the calendar to act as expected you should have the h_rt resource forced
for all jobs
qconf -mc
** Edit h_rt line and set it to FORCED **
h_rt h_rt TIME <= FORCED NO 0:0:0 0
This will require the additional -l h_rt=HH:MM:SS definition with all queued jobs. The hard CPU wall
time limit is used by SGE to determine if the job would run into the scheduled outage. Without this
the default h_rt time is used, which is often very large, and or incorrect.
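With h_rt forced, every submission has to state its expected runtime explicitly; for example (a minimal sketch with a hypothetical script name):

qsub -l h_rt=12:00:00 myjob.sh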
Example Calendar
calendar_name outage
year 21.may.2009=14-24=off 22.may.2009=0-8=off
week NONE
To create the calendar if it doesn't already exist, create a file with the above content (adjusted
as needed) and then run
qconf -Acal outage
This will create the calendar, but it will not add it to any queues. If you want to add it to all
queues, run
for q in `qconf -sql`; do qconf -mattr queue calendar outage $q; done
Below is a collection of useful scripts / tools that we've created to interact with our SGE installations.
re-enable_queues.sh - This script allows you to disable or enable SGE queues while maintaining the
existing queue states. This can be useful if you've disabled a few queues and want some of them to
remain disabled even after re-enabling everything else; a sketch of what such a script might look
like follows.
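The original script is not reproduced here; the following is a minimal sketch of the idea, assuming the standard qselect and qmod utilities (the state-file path is arbitrary):

#!/bin/sh
# Sketch: disable all queue instances, or re-enable everything except
# the queue instances that were already disabled beforehand.
STATEFILE=/tmp/sge_disabled_queues

case "$1" in
  disable)
    # Remember which queue instances are already disabled ...
    qselect -qs d > "$STATEFILE"
    # ... then disable all queue instances.
    qmod -d '*'
    ;;
  enable)
    # Re-enable everything ...
    qmod -e '*'
    # ... then re-disable the ones remembered earlier.
    while read q; do
      qmod -d "$q"
    done < "$STATEFILE"
    ;;
  *)
    echo "usage: $0 {disable|enable}" >&2
    exit 1
    ;;
esac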
Limiting Slots by Queue or User
On one of our clusters we have the need to limit the total jobs of a certain type that are running at
any one time. This is done using the resource quotas in SGE.
qconf -mrqs
--
{
name ioload_limit
description "Limit the jobs in the iolimit queue"
enabled TRUE
limit queues ioload.q to slots=75
}
In the above example we are limiting the number of slots in the queue "ioload.q" to a total of 75,
regardless of the total number of slots possible in the queue. You can verify this is set correctly
by running qquota. You can also limit a user's slots in this same manner by changing the limit
line to: limit users kvollmer to slots=100
resource quota rule limit filter
--------------------------------------------------------------------------------
ioload_limit/1 slots=75/75 queues ioload.q
Submitting a Job
When submitting a job to SGE there are a number of different flags that can be set to modify how the
job is going to behave.
Pass current working directory: This should be set on most of your jobs; it takes the current path
and passes that into the job. This can greatly simplify launching of jobs.
#$ -cwd
Name of the Job: This is useful for identification purposes; it doesn't affect how the job runs.
#$ -N dos_Na_1k_150
Output File: This sends the STDOUT of the job to the specified file. This can be useful for gathering
warnings and notices that occur during your job run. If this is not specified, the output goes to
NAME.oJOBID.
#$ -o Na.out
Hard Virtual Memory: This sets the hard virtual memory limit of your job. Your job will not be able
to request more than this amount. This setting can be useful if you need to run your job on a node
with at least this much memory, or if you are afraid that the job might have a memory leak and would
like it to be terminated if it goes over the specified amount.
#$ -l h_vmem=4G
Hard Wall Time: This sets the hard wall time limit. The format is HH:MM:SS. This is required on most
of our clusters. If your job runs longer than the specified time it will be terminated by SGE.
#$ -l h_rt=48:00:00
Pass Environment: This passes the current environment variables to the job. This is very useful when
using non-standard applications, or applications that require additional path designations. This is
recommended for all jobs.
#$ -V
Parallel Environment: For most jobs the MPI parallel environment will work. The format is
-pe [ENVIRONMENT] [SLOT COUNT].
#$ -pe mpi 2
Job Shell: This defines the shell that the job should be launched in. The default shell on most
clusters is BASH.
#$ -S /bin/sh
Example Job
#$ -S /bin/sh
#$ -cwd
#$ -N MyJob5
#$ -o Job.out
#$ -l h_vmem=4G
#$ -l h_rt=336:00:00
#$ -V
#$ -pe mpi 16
mpirun /share/apps/admin/example/mpijob/mpi-ring
Once you have your job submission script set up, all you have to do to get it into the queue is run
qsub myjob
This will submit it to the queue, and based on your priority and node availability it will either be
run right away or put in qw status, indicating that it is waiting to be run. You can check on the
status of your job by running qstat. If you want to see the status of everyone's jobs, run qstat -u \*
Debugging SGE_QMASTER
source /$SGE_ROOT/util/dl.sh
dl 1 # debug level 1 2 3
Reference
Name
SGE_resource_quota - Sun Grid Engine resource quota file format
Description
Resource quota sets (rqs) are a flexible way to set a maximum resource consumption for
any job request. They are used by the scheduler to select the next possible jobs for
running. Job requests are distinguished by a set of user, project, cluster queue, host
and pe filter criteria.
By using resource quota sets, administrators can define a fine-grained resource quota
configuration. This helps restrict some job requests to lower resource usage and grant
other job requests higher resource usage.
Note: Jobs requesting an Advance Reservation (AR) are not honored by resource quotas; they
are neither subject to the resulting limit nor debited in the usage consumption.
A list of currently configured rqs can be displayed via the
qconf(1) -srqsl option. The
contents of each listed rqs definition can be shown via the -srqs switch. The output
follows the SGE_resource_quota format description. New rqs can be created, and existing ones
modified or deleted, via the -arqs, -mrqs and -drqs options to qconf(1).
A resource quota set defines a maximum resource quota for a particular job request. All
of the configured rule sets apply all of the time. This means that if multiple resource
quota sets are defined, the most restrictive set is used.
Every resource quota set consists of one or more resource quota rules. These rules are
evaluated in order, and the first rule that matches a specific request will be used. A
resource quota set always results in at most one effective resource quota rule for a
specific request.
Note that Sun Grid Engine allows backslashes (\) to be used to escape newline (\newline)
characters. The backslash and the newline are replaced with a space (" ") character before
any interpretation.
Format
A resource quota set definition contains the following parameters:
name
- The resource quota set name.
enabled
- If set to true the resource quota set is active and will be considered for
scheduling decisions. The default value is false.
description
- This optional field can be set to an arbitrary string. The default value is NONE.
limit
- Every resource quota set needs at least one resource quota rule definition, started
by the limit field. Multiple resource quota rules can be defined, each on its own line.
A resource quota rule consists of an optional name, the filters for a specific job
request and the resource quota limit.
By default, the expressed limit applies to the entire filter scope. To express a
per-member limit, the filter list can be enclosed in '{' and '}' (an "expanded list");
the limit then applies to each member of the list separately. Only one complete filter
can be set in an expanded list.
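A minimal illustration of the difference, shown as two alternative rules (not meant to
be combined in one set):
limit users * to slots=10     # all users together may occupy 10 slots
limit users {*} to slots=10   # each single user may occupy 10 slots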
The tags for expressing a resource quota rule are:
- name
The name of the rule. Its use is optional. The rule name must be unique within one
resource quota set.
- users
Contains a comma-separated list of UNIX users or ACLs (see access_list(5)).
This parameter filters for jobs by a user in the list or in one of the ACLs in the list.
Any user not in the list will not be considered for the resource quota rule. The default
value is '*', which means any user. An ACL is differentiated from a UNIX user name by
prefixing the ACL name with an '@' sign. To exclude a user or ACL from the rule, the
name can be prefixed with the '!' sign. The UNIX user or ACL names used here need not
be known in the Sun Grid Engine configuration.
- projects
Contains a comma-separated list of projects (see project(5)). This parameter
filters for jobs requesting a project in the list. Any project not in the list will not
be considered for the resource quota rule. If no project filter is specified, all
projects and jobs with no requested project match the rule. The value '*' means all jobs
with a requested project. To exclude a project from the rule, the name can be prefixed
with the '!' sign. The value '!*' means only jobs with no project requested.
- pes
Contains a comma-separated list of PEs (see sge_pe(5)). This parameter filters
for jobs requesting a PE in the list. Any PE not in the list will not be considered for
the resource quota rule. If no PE filter is specified, all PEs and jobs with no requested
PE match the rule. The value '*' means all jobs with a requested PE. To exclude a PE
from the rule, the name can be prefixed with the '!' sign. The value '!*' means only
jobs with no PE requested.
- queues
Contains a comma-separated list of cluster queues (see queue_conf(5)). This
parameter filters for jobs that may be scheduled in a queue in the list. Any queue not
in the list will not be considered for the resource quota rule. The default value is
'*', which means any queue. To exclude a queue from the rule, the name can be prefixed
with the '!' sign.
- hosts
Contains a comma-separated list of hosts or hostgroups (see host(5) and
hostgroup(5)). This parameter filters for jobs that may be scheduled on a host in
the list or on a host contained in a hostgroup in the list. Any host not in the list
will not be considered for the resource quota rule. The default value is '*', which
means any host. To exclude a host or hostgroup from the rule, the name can be prefixed
with the '!' sign.
- to
This mandatory field defines the quota for resource attributes for this rule. The
quota is expressed by one or more limit definitions separated by commas. The
configuration allows two kinds of limit definitions:
- static limits
Static limits set fixed values as quotas. Each limit consists of a complex attribute
followed by an "=" sign and a value specification compliant with the complex attribute
type (see complex(5)).
- dynamic limits
A dynamic limit is a simple algebraic expression used to derive the limit value. To be
dynamic, the formula can reference a complex attribute whose value is used for the
calculation of the resulting limit. The formula expression syntax is that of a sum of
weighted complex values, that is:
{w1|$complex1[*w1]}[{+|-}{w2|$complex2[*w2]}[{+|-}...]]
The weighting factors (w1, ...) are positive integers or floating point numbers in double
precision. The complex values (complex1, ...) are specified by the name defined as type INT
or DOUBLE in the complex list (see
complex(5)).
Note: Dynamic limits can only be configured for a host-specific rule.
Please note that resource quotas are not enforced as job resource limits. Limiting,
for example, h_vmem in a resource quota set does not result in a memory limit being set
for job execution.
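Before the complete examples in the next section, here is a small hypothetical rule set
illustrating the filter tags (the ACL @students, PE mpi, queue all.q and hostgroup
@linux_hosts are all assumed names; each rule is shown mainly to illustrate its filter,
and within one set only the first matching rule applies to a given job):
=======================================================================
{
   name    filter_examples
   enabled true
   limit pes mpi queues all.q to slots=64   # mpi jobs in all.q, all together
   limit users {@students} to slots=4       # each member of the ACL separately
   limit projects !* to slots=2             # jobs submitted without a project
   limit hosts {@linux_hosts} to slots=8    # per host in the hostgroup
}
=======================================================================
And because quotas are not enforced at execution time, a job that should actually be
killed when exceeding 4G of memory must request the limit itself, e.g.
qsub -l h_vmem=4G myjob.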
Examples
The following is the simplest form of a resource quota set. It restricts all users
together to a maximum of 100 slots in the whole cluster.
=======================================================================
{
name max_u_slots
description "All users max use of 100 slots"
enabled true
limit to slots=100
}
=======================================================================
The next example restricts user1 and user2 to 6g of virtual_free and all other users to
a maximum of 4g of virtual_free on every host in hostgroup @lx_host.
=======================================================================
{
name max_virtual_free_on_lx_hosts
description "resource quota for virtual_free restriction"
enabled true
limit users {user1,user2} hosts {@lx_host} to virtual_free=6g
limit users {*} hosts {@lx_host} to virtual_free=4g
}
=======================================================================
The next example shows the use of a dynamic limit. On every host it restricts all users
together to a maximum number of slots equal to twice the value of num_proc on that host.
=======================================================================
{
name max_slots_on_every_host
enabled true
limit hosts {*} to slots=$num_proc*2
}
=======================================================================
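The quotas that currently apply, and how much of them is already consumed, can be
inspected with the qquota(1) client, for example:
qquota            # quotas affecting the calling user
qquota -u \*      # quota usage of all users
qquota -q all.q   # quotas affecting a particular queue (queue name assumed)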
See Also
sge_intro(1), qconf(1), qquota(1), access_list(5), complex(5), host(5), hostgroup(5), project(5).
Referenced By
qrsub(1)