Overview
Grid Engine tracks a fixed set of load parameters automatically. To track a load value that Grid Engine does not
monitor itself, use a load sensor: a small script, started by the execution daemon, that reports one or more
resource values on standard output. Each value is reported as a line of the form host:name:value, where name is
the resource name and value is its current value, and each report is framed by a "begin" line and an "end" line.
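The exchange between the execution daemon and a load sensor can be sketched as follows, mirroring the loop in tmpspace.sh at the end of this note: the daemon writes a line to the sensor's standard input whenever it wants a report, the sensor answers with begin, one host:name:value line per resource, and end, and the line "quit" (or end of input) tells it to stop. This is a minimal skeleton, not a complete sensor; the resource name myvalue and the helper measure_value are hypothetical placeholders.

```shell
#!/bin/sh
# Minimal load sensor skeleton (sketch). "myvalue" and measure_value
# are hypothetical; substitute a real resource name and measurement.

# Placeholder measurement; a real sensor would run a command here.
measure_value() {
    echo 42
}

# Answer each input line with one framed report; stop on "quit" or EOF.
sensor_loop() {
    host=$1
    while read input; do
        if [ "$input" = "quit" ]; then
            break
        fi
        echo "begin"
        echo "$host:myvalue:$(measure_value)"
        echo "end"
    done
}
```

Saved as a script, the file would end with a line that calls sensor_loop with the output of hostname, so that the loop starts as soon as the daemon launches it.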
The example below shows how to set up a load sensor that tracks the amount of /tmp space on each
Grid Engine host. It can serve as a template for load sensors that monitor other values.
Once a load sensor is added, the new resource can be used as a load threshold or as a consumable resource.
The steps for adding a load sensor are as follows:
Step 1: Define the resource
Step 2: Configure the resource
Step 3: View/Verify the resource
Step 4: Request the resource
Step 1: Define the resource attributes in the host complex
First, modify the complex called "host" or "global", depending on whether the resource is host-specific
or cluster-wide. In qmon, click "Complexes Configuration" on the main toolbar, select "host" or "global",
then click "Modify". If the global complex does not exist, create it by clicking "Add"; it must be named
"global". In the example below, host-specific resources are added to the "host" complex.
name      shortcut  type    value  relop  requestable  consumable  default
--------------------------------------------------------------------------
tmpfree   tmpfree   MEMORY  0      <=     YES          YES         0
tmptot    tmptot    MEMORY  0      <=     YES          NO          0
tmpused   tmpused   MEMORY  0      >=     NO           NO          0
The first line defines a complex attribute named "tmpfree" (shortcut "tmpfree") of type MEMORY.
Its value is supplied by the load sensor. It is requestable (YES) and consumable (YES), and its
default is 0.
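The relop column decides how a request is compared with what a host offers: a host qualifies when "requested value relop host value" is true, so with <= a host whose tmpfree is 300M satisfies a request for 100M. The sketch below spells out that comparison for plain integers such as byte counts. The function relop_ok is a hypothetical illustration, not part of Grid Engine, and it does not model the load-threshold use of relop.

```shell
#!/bin/sh
# relop_ok REQUESTED RELOP AVAILABLE
# Sketch: succeed when "REQUESTED RELOP AVAILABLE" holds (integers only).
relop_ok() {
    req=$1 op=$2 avail=$3
    case $op in
        "<=") [ "$req" -le "$avail" ] ;;
        "<")  [ "$req" -lt "$avail" ] ;;
        ">=") [ "$req" -ge "$avail" ] ;;
        ">")  [ "$req" -gt "$avail" ] ;;
        "==") [ "$req" -eq "$avail" ] ;;
        *)    echo "relop_ok: unsupported relop $op" >&2; return 2 ;;
    esac
}
```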
When using qmon, remember to press the "Add" button after entering each line. When all the lines in the
table above have been entered, press the "Ok" button to close the window.
The complex may be viewed at the command line by running the following:
% qconf -sc host
name      shortcut  type    value  relop  requestable  consumable  default
--------------------------------------------------------------------------
tmpfree   tmpfree   MEMORY  0      <=     YES          YES         0
tmptot    tmptot    MEMORY  0      <=     YES          NO          0
tmpused   tmpused   MEMORY  0      >=     NO           NO          0
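The listing can also be filtered with standard tools when complexes are checked from scripts. This sketch prints the names of requestable attributes from a listing in the eight-column layout of this example (name, shortcut, type, value, relop, requestable, consumable, default). The helper name requestable_attrs is hypothetical, and the column positions are an assumption tied to this layout; other Grid Engine releases order the columns differently.

```shell
#!/bin/sh
# requestable_attrs: read a complex listing on stdin and print the name
# of every attribute whose 6th column ("requestable") is YES.
# The header line and the dashed separator line are skipped.
requestable_attrs() {
    awk 'NF == 8 && $1 != "name" && $6 == "YES" { print $1 }'
}
```

For the complex above, qconf -sc host | requestable_attrs would print tmpfree and tmptot.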
Step 2: Configure the "global" host in the cluster configuration
In this example the load sensor should run on every host, since each host returns its own value, so it
is configured for the pseudo host "global", whose settings apply to all hosts. For a floating license,
by contrast, the load sensor would be configured on a single host: the value is the same everywhere, so
having every host report it would be redundant.
In the main qmon window, click "Cluster Configuration", highlight "global", then click "Modify".
On the "General Settings" tab, enter the full path of the load sensor program in the load sensor box.
The script tmpspace.sh, included below, can serve as an example or template. Once "OK" is pressed, the load
sensor is started automatically on each host. This may take several minutes.
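Before entering the path in the cluster configuration, it can help to run the sensor by hand and feed it the kind of input the execution daemon sends: an empty line requests one report, and "quit" asks it to exit. In this sketch probe_sensor is a hypothetical helper that takes the command to run; for tmpspace.sh, SGE_ROOT must be set in the environment so the script can find its utilities.

```shell
#!/bin/sh
# probe_sensor COMMAND [ARGS...]
# Request one load report, then send "quit". Print the report and
# succeed only if it starts with "begin" and ends with "end".
probe_sensor() {
    report=$(printf '\nquit\n' | "$@")
    printf '%s\n' "$report"
    first=$(printf '%s\n' "$report" | sed -n '1p')
    last=$(printf '%s\n' "$report" | sed -n '$p')
    [ "$first" = "begin" ] && [ "$last" = "end" ]
}
```

A typical call would be probe_sensor ./tmpspace.sh, run from the directory holding the script.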
Note: The resource names output by the load sensor must match the names added to the host or
global complex. If a resource is also consumable, Grid Engine takes both the load sensor value and its
own bookkeeping of the consumable into account and uses the smaller of the two as the current value.
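A mismatch between the names a sensor reports and the names in the complex can be caught by comparing the two. In this sketch unknown_names is a hypothetical helper; the names file (one attribute name per line) is assumed to be prepared by hand or extracted from the qconf -sc host listing.

```shell
#!/bin/sh
# unknown_names NAMES_FILE
# Read load sensor output (begin, host:name:value lines, end) on stdin
# and print each resource name that does not appear in NAMES_FILE.
unknown_names() {
    awk -F: -v names="$1" '
        BEGIN { while ((getline n < names) > 0) known[n] = 1 }
        NF == 3 && !($2 in known) { print $2 }
    '
}
```

With a names file containing tmpfree, tmptot and tmpused, piping one report from tmpspace.sh into unknown_names should print nothing.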
Step 3: View the new resources
The new resources can be viewed by running the following:
% qhost -F tmpfree,tmptot,tmpused
HOSTNAME   ARCH      NPROC  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------
BALROG     solaris6      2  1.47    1.0G  974.0M  150.0M  130.0M
   Host Resource(s):  hl:tmpfree=337.744000M
                      hl:tmptot=338.808000M
                      hl:tmpused=1.014709M
See the qhost(1) man page for more information.
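Scripts that need a single value can reduce the qhost -F report to host and value pairs. This sketch assumes a layout like the one above, where host lines start in the first column and resource entries look like hl:NAME=VALUE; host_resource is a hypothetical helper name.

```shell
#!/bin/sh
# host_resource NAME
# Read "qhost -F" output on stdin; print "HOST VALUE" for every host
# that reports resource NAME.
host_resource() {
    awk -v want="$1" '
        # A line starting with a letter or digit names a new host.
        /^[A-Za-z0-9]/ && $1 != "HOSTNAME" { host = $1 }
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] ~ (":" want "$")) print host, kv[2]
            }
        }
    '
}
```

Fed the output shown above, qhost -F tmpfree | host_resource tmpfree would print BALROG 337.744000M.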
Step 4: Requesting a resource
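Include the -l switch on the qsub command line to request a resource:
% qsub -l tmpfree=100M myjob.sh
This dispatches the job only to hosts whose tmpfree value is at least 100M (100 x 1024 x 1024 bytes).
Because tmpfree is of type MEMORY, a value without a suffix is counted in bytes, so -l tmpfree=100 would
request only 100 bytes.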
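Values of type MEMORY may carry a multiplier suffix, and the case matters: k, m and g are powers of 1000, while K, M and G are powers of 1024; a value without a suffix is counted in bytes. The sketch below converts such a value to bytes, which can help when comparing a request with a reported value by hand. The helper mem_to_bytes is hypothetical, handles integers only (not fractional values such as 337.744000M), and relies on POSIX shell arithmetic, which the original Bourne shell lacks.

```shell
#!/bin/sh
# mem_to_bytes VALUE
# Convert an integer memory value with an optional multiplier suffix
# to bytes: k/m/g are 1000-based, K/M/G are 1024-based, none = bytes.
mem_to_bytes() {
    v=$1
    case $v in
        *k) mult=1000;       num=${v%k} ;;
        *K) mult=1024;       num=${v%K} ;;
        *m) mult=1000000;    num=${v%m} ;;
        *M) mult=1048576;    num=${v%M} ;;
        *g) mult=1000000000; num=${v%g} ;;
        *G) mult=1073741824; num=${v%G} ;;
        *)  mult=1;          num=$v ;;
    esac
    echo $((num * mult))
}
```

For example, mem_to_bytes 100M prints 104857600, while mem_to_bytes 100 prints 100.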
Note on Using a Load Sensor for Floating Licenses
To track floating licenses that are also used outside of Grid Engine, a load sensor can be combined
with a consumable resource. Grid Engine uses the lesser of the consumable's value and the load sensor's
value, which prevents license oversubscription. In this case the load sensor needs to run on only a
single host. To report the value for the whole cluster, output the string 'global' in place of the host name.
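A report for a floating license differs from the tmpspace.sh report mainly in the host field. The sketch below prints one such report; it would be called from the same read loop as tmpspace.sh, replacing the echo commands that print the report. The resource name mylic and the helper free_licenses are hypothetical: in practice free_licenses would query the license server, and mylic would need to be defined in the global complex.

```shell
#!/bin/sh
# Hypothetical: print the number of currently unused licenses.
free_licenses() {
    echo 7    # placeholder value
}

# Print one load report for the floating license. The host field is
# 'global', so the value is attached to the whole cluster rather than
# to the machine running the sensor.
license_report() {
    echo "begin"
    echo "global:mylic:$(free_licenses)"
    echo "end"
}
```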
# ----------------< tmpspace.sh >-----------------------------
#!/bin/sh
# Grid Engine will automatically start/stop this script on exec hosts, if
# configured properly. See the application note for configuration
# instructions.

# file system to check
FS=/tmp

# The helper programs used below live under $SGE_ROOT; exit if it is unset.
if [ "$SGE_ROOT" = "" ]; then
   echo "tmpspace.sh: SGE_ROOT is not set" >&2
   exit 1
fi
root_dir=$SGE_ROOT

# invariant values
myarch=`$root_dir/util/arch`
myhost=`$root_dir/utilbin/$myarch/gethostname -name`

ende=false
while [ $ende = false ]; do
   # ----------------------------------------
   # wait for an input (one line per load report request)
   #
   read input
   result=$?
   if [ $result != 0 ]; then
      ende=true
      break
   fi

   if [ "$input" = "quit" ]; then
      ende=true
      break
   fi

   # ----------------------------------------
   # send mark for begin of load report
   # NOTE: for global consumable resources not attached
   # to each machine (ie. floating licenses), the load
   # sensor only needs to be run on one host. In that case,
   # echo the string 'global' instead of '$myhost'.
   echo "begin"

   # df -k reports sizes in 1024-byte blocks. On systems where long
   # device names wrap the output onto two lines, use "df -P -k".
   dfoutput="`df -k $FS | tail -1`"
   tmpfree=`echo $dfoutput | awk '{ print $4}'`
   tmptot=`echo $dfoutput | awk '{ print $2}'`
   tmpused=`echo $dfoutput | awk '{ print $3}'`

   # "K" tells Grid Engine to multiply by 1024 (a lowercase "k" means 1000).
   echo "$myhost:tmpfree:${tmpfree}K"
   echo "$myhost:tmptot:${tmptot}K"
   echo "$myhost:tmpused:${tmpused}K"

   echo "end"
done
#----------------------< CUT HERE >--------------------------------