(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
PBS Professional Floating Annual License (Per CPU Core)
If we assume the $18 per core that is cited on the Web, that means $18K per year for 1K cores (a small to medium-size cluster these days, as one server/blade usually has 20 to 32 cores).
Altair's PBS Works is a suite of on-demand cloud computing technologies that allows enterprises to maximize ROI on computing infrastructure assets. PBS Works is the most widely implemented software environment for grid-, cluster- and on-demand computing worldwide.
The suite's flagship product, PBS Professional, provides a flexible, on-demand computing environment that allows enterprises to easily share diverse (heterogeneous) computing resources across geographic boundaries. PBS Professional is a service-oriented architecture, field-proven cloud infrastructure software that increases productivity even in the most complex computing environments.
Jun 21, 2018 | community.pbspro.org
How to install pbs on compute node and configure the server and compute node?
Joey, Jun '16
Hi guys,
I am new to HPC and PBS or Torque. I was able to install PBS Pro from source code on my head node, but I am not sure how to install it on the compute node and configure it. I didn't see any documentation on GitHub either. Can anyone give me some help? Thanks
buchmann, Jun '16
Install is pretty similar on the compute nodes - however, you do not need the "server" parts.
There are OK docs on the Altair "pro" site, see answer to previous question "documentation-is-missing/81".
In short, use the Altair docs for v13, and/or the INSTALL file procedure. (Or install from pre-built binaries.)
The actual method will depend on your system type etc.
I prefer to install using pre-compiled RPMs (CentOS72 systems), which presently means that I compile these from tarball+spec-file (slightly modified spec-file).
Hope this helps.
/Bjarne
subhasisb, Jun '16
@Joey thanks for joining the pbspro forum.
You can find the documentation about pbspro here: https://pbspro.atlassian.net/wiki/display/PBSPro/User+Documentation
Kindly do not hesitate to post questions about any specific issues you are facing.
Thanks,
Subhasis
Joey, Jun '16
Thanks for your reply.
I rebuilt the CentOS72 rpm with the src from Centos7.zip,
installed pbspro-server-14.1.0-13.1.x86_64.rpm on my head node, and
installed pbspro-execution-14.1.0-13.1.x86_64.rpm on my compute node.
On the head node
create /var/spool/pbs/server_priv/nodes with the following:
computenode1 np=1
/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
On the compute node
/var/spool/pbs/mom_priv/config as following
$logevent 0x1ff
$clienthost headnode
$restrict_user_maxsysid 999
/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
After that I started PBS on the head node and the compute node without error:
#/etc/init.d/pbs start
But when I try to run pbsnodes -a, it tells me:
pbsnodes: Server has no node list
If I run a script, it will just sit in the queue.
Firewalld is turned off on both servers, and they can ping each other.
Can anyone give me some help? Thanks
subhasisb, Jul '16
Hi @Joey,
Unlike torque, pbspro uses a real relational database underneath to store information about nodes, queues, jobs etc. Thus creating a nodes file is not supported under pbspro.
To add a node to pbs cluster, use the qmgr command as follows:
qmgr -c "create node hostname"
HTH
regards,
Subhasis
Joey, Jul '16
Thanks for your reply. I thought PBS and Torque were the same except that one is open source and one is commercial.
subhasisb, Jul '16
Hi @Joey
They might feel similar since Torque was based on the OpenPBS codebase. OpenPBS was a version of PBS released as open source many years back.
Post that, Altair engineering has put in a huge amount of effort towards PBS Professional and added tons of features and improvements in terms of scalability, robustness and ease of use over decades which resulted in it becoming the number one work load manager in the HPC world. Altair has now open-sourced PBS Professional.
So, pbspro is actually very different from torque in terms of capability and performance, and is actually a completely different product.
Let us know if you need further information in switching to pbspro.
Thanks and Regards,
Subhasis
10 months later
sxy, Apr '17
Hi Subhasis,
> To add a node to pbs cluster, use the qmgr command as follows:
> qmgr -c "create node hostname"
If a site has a few hundred compute nodes, the above method is very tedious.
Would there be any easy/quick way to register compute nodes with the PBS server, like the nodes file in Torque?
Thanks,
Sue
mkaro, Apr '17
This is one way to accomplish it:
while read line; do [ -n "$line" ] && qmgr -c "create node $line"; done <nodefile
where nodefile contains the list of nodes, one per line.
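For a site with hundreds of nodes, the one-liner can be hardened a little. What follows is a minimal sketch, not an official PBS Pro procedure: it turns a node list into "create node" directives, skipping blank lines and lines starting with #, so the result can be reviewed before it is piped to qmgr (which reads directives from standard input when no -c is given). The function name gen_create_cmds and the file name nodefile are hypothetical.

```shell
#!/bin/bash
# Sketch: print one "create node <host>" qmgr directive per host in a node list.
# gen_create_cmds is a hypothetical helper name, not a PBS command.
gen_create_cmds() {
    local nodefile=$1 host
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while read -r host _ || [ -n "$host" ]; do
        # Skip blank lines and comment lines; only the first word is used.
        [ -z "$host" ] && continue
        case $host in '#'*) continue ;; esac
        printf 'create node %s\n' "$host"
    done < "$nodefile"
}

# Usage (assumes qmgr is on PATH and you have PBS manager privileges):
#   gen_create_cmds nodefile          # review the directives first
#   gen_create_cmds nodefile | qmgr   # then feed them to qmgr on stdin
```

Printing the directives first, rather than calling qmgr once per line, makes a dry run possible and uses a single qmgr session for the whole list.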
Last modified by Yanli Zhang on Jul 10, 2012
qsub Tutorial
Synopsis
- Synopsis
- What is qsub
- What does qsub do?
- Arguments to control behavior
- Declare the date/time a job becomes eligible for execution
- Defining the working directory path to be used for the job
- Manipulate the output files
- Mail job status at the start and end of a job
- Submit a job to a specific queue
- Submitting a job that is dependent on the output of another
- Submitting multiple jobs in a loop that depend on output of another job
- Opening an interactive shell to the compute node
- Passing an environment variable to your job
- Passing your environment to your job
- Submitting an array job: Managing groups of jobs
qsub [-a date_time] [-A account_string] [-b secs]
     [-c checkpoint_options]
         n          No checkpointing is to be performed.
         s          Checkpointing is to be performed only when the server executing the job is shut down.
         c          Checkpointing is to be performed at the default minimum time for the server executing the job.
         c=minutes  Checkpointing is to be performed at an interval of minutes, which is the integer number of minutes of CPU time used by the job. This value must be greater than zero.
     [-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h]
     [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options]
     [-N name] [-o path] [-p priority] [-P user[:group]] [-q destination]
     [-r c] [-S path_list] [-t array_request] [-u user_list]
     [-v variable_list] [-V] [-W additional_attributes] [-X] [-z] [script]
For detailed information, see this page.
What is qsub?
qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.
Useful Information
For more information on qsub, run:
$ man qsub
Overview
What does qsub do?
All of our clusters have a batch server, referred to as the cluster management server, running on the head node. This batch server monitors the status of the cluster and controls/monitors the various queues and job lists. Tied into the batch server, a scheduler makes decisions about how a job should be run and its placement in the queue. qsub interfaces with the batch server and lets it know that there is another job that has requested resources on the cluster. Once a job has been received by the batch server, the scheduler decides the placement and notifies the batch server, which in turn notifies qsub (Torque/PBS) whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.
Environment variables in qsub
The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The value for the following variables will be taken from the environment of the qsub command:
- HOME (the path to your home directory)
- LANG (which language you are using)
- LOGNAME (the name that you logged in with)
- PATH (standard path to executables)
- MAIL (location of the user's mail file)
- SHELL (command shell, e.g. bash, sh, zsh, csh, etc.)
- TZ (time zone)
These values will be assigned to a new name, which is the current name prefixed with the string "PBS_O_". For example, the job will have access to an environment variable named PBS_O_HOME, which has the value of the variable HOME in the qsub command environment.
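A quick way to see this renaming from inside a job is to print the PBS_O_ copies. The sketch below is only an illustration: show_origin_env is a hypothetical helper name, and it relies on printenv, so it only sees variables that are exported into the environment (as job environment variables are).

```shell
#!/bin/bash
# Sketch: report the submission-time values that qsub saved under PBS_O_* names.
# show_origin_env is a hypothetical helper. Outside a job these variables are
# normally unset, so each lookup falls back to "(unset)".
show_origin_env() {
    local name var value
    for name in HOME LANG LOGNAME PATH MAIL SHELL TZ; do
        var="PBS_O_${name}"
        # printenv exits non-zero when the variable is not in the environment.
        value=$(printenv "$var") || value="(unset)"
        printf '%s=%s\n' "$var" "$value"
    done
}

# Usage inside a job script:
#   show_origin_env
```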
In addition to these standard environment variables, there are additional environment variables available to the job.
- PBS_O_HOST (the name of the host upon which the qsub command is running)
- PBS_SERVER (the hostname of the pbs_server which qsub submits the job to)
- PBS_O_QUEUE (the name of the original queue to which the job was submitted)
- PBS_O_WORKDIR (the absolute path of the current working directory of the qsub command)
- PBS_ARRAYID (each member of a job array is assigned a unique identifier)
- PBS_ENVIRONMENT (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
- PBS_JOBID (the job identifier assigned to the job by the batch system)
- PBS_JOBNAME (the job name supplied by the user)
- PBS_NODEFILE (the name of the file containing the list of nodes assigned to the job)
- PBS_QUEUE (the name of the queue from which the job was executed)
- PBS_WALLTIME (the walltime requested by the user or default walltime allotted by the scheduler)
Arguments to control behavior
As stated before, there are several arguments that you can use to get your jobs to behave in a specific way. This is not an exhaustive list, but it covers some of the most widely used arguments and many that you will probably need to accomplish specific tasks.
Declare the date/time a job becomes eligible for execution
To set the date/time at which a job becomes eligible to run, use the -a argument. The date/time format is [[[[CC]YY]MM]DD]hhmm[.SS]. If -a is not specified, qsub assumes that the job should be run immediately.
Example
To test -a, get the current date from the command line and add a couple of minutes to it. It was 10:45 when I checked. Add hhmm to -a and submit a command from STDIN.
Example: Set the date/time at which a job becomes eligible to run
$ echo "sleep 30" | qsub -a 1047
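Instead of reading the clock and adding minutes by hand, the timestamp can be computed. This is a minimal sketch under one stated assumption: GNU date, whose -d option accepts relative times such as "+2 minutes". The helper name eligible_at is hypothetical. It prints the full CCYYMMDDhhmm form, which stays unambiguous around midnight.

```shell
#!/bin/bash
# Sketch: build a -a timestamp (CCYYMMDDhhmm) a given number of minutes ahead.
# Assumes GNU date; eligible_at is a hypothetical helper name.
eligible_at() {
    local minutes=${1:-2}    # default: two minutes from now
    date -d "+${minutes} minutes" +%Y%m%d%H%M
}

# Usage (assumes qsub is available):
#   echo "sleep 30" | qsub -a "$(eligible_at 2)"
```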
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -a 1047
Defining the working directory path to be used for the job
To define the working directory path to be used for the job, use the -d option. If it is not specified, the default working directory is the home directory.
Example: Define the working directory path to be used for the job
$ pwd
/home/manchu
$ cat dflag.pbs
echo "Working directory is $PWD"
$ qsub dflag.pbs
5596682.hpc0.local
$ cat dflag.pbs.o5596682
Working directory is /home/manchu
$ mv dflag.pbs random_pbs/
$ qsub -d /home/manchu/random_pbs/ /home/manchu/random_pbs/dflag.pbs
5596703.hpc0.local
$ cat random_pbs/dflag.pbs.o5596703
Working directory is /home/manchu/random_pbs
$ qsub /home/manchu/random_pbs/dflag.pbs
5596704.hpc0.local
$ cat dflag.pbs.o5596704
Working directory is /home/manchu
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -d /home/manchu/random_pbs
Manipulate the output files
By default, all jobs print their stdout (standard output) messages to a file named <job_name>.o<job_id>, and their stderr (standard error) messages to a file named <job_name>.e<job_id>. These files are copied to your working directory when the job completes. To rename the file or specify a different location for the standard output and error files, use -o for the standard output file and -e for the standard error file. You can also combine the output using -j.
Example
Create a simple submission file:
$ cat sleep.pbs
#!/bin/sh
for i in {1..60} ; do
    echo $i
    sleep 1
done
Submit your job with the standard output file renamed:
$ qsub -o sleep.log sleep.pbs
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -o sleep.log
Submit your job with the standard error file renamed:
$ qsub -e sleep.log sleep.pbs
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -e sleep.log
Combine them using the name sleep.log:
$ qsub -o sleep.log -j oe sleep.pbs
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -o sleep.log
#PBS -j oe
Warning
The order of the two letters next to the -j flag is important. It should always start with the letter that has already been defined, in this case 'o'.
Place the joined output in a location other than the working directory:
$ qsub -o $HOME/tutorials/logs/sleep.log -j oe sleep.pbs
Mail job status at the start and end of a job
The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job, and -M defines the users that emails will be sent to (multiple users can be specified in a list separated by commas). The conditions for the -m argument include:
- a: mail is sent when the job is aborted.
- b: mail is sent when the job begins.
- e: mail is sent when the job ends.
Example
Using the sleep.pbs script created earlier, submit a job that emails you for all conditions:
$ qsub -m abe -M [email protected] sleep.pbs
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -m abe
#PBS -M [email protected]
Submit a job to a specific queue
You can select a queue based on the walltime needed for your job. Use the 'qstat -q' command to see the maximum job times for each queue.
Example
Submit a job to the bigmem queue:
$ qsub -q bigmem sleep.pbs
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -q bigmem
Submitting a job that is dependent on the output of another
Often you will have jobs that depend on the output of another job in order to run. To add a dependency, use the -W (additional attributes) argument with the depend option. We will be using the afterok rule, but there are several other rules that may be useful (see man qsub).
Example
To illustrate the ability to hold execution of a specific job until another has completed, we will write two submission scripts. The first will create a list of random numbers. The second will sort those numbers. Since the second script will depend on the list that is created we will need to hold execution until the first has finished.
random.pbs
$ cat random.pbs
#!/bin/sh
cd $HOME
sleep 120
for i in {1..100}; do
    echo $RANDOM >> rand.list
done
sort.pbs
$ cat sort.pbs
#!/bin/sh
cd $HOME
sort -n rand.list > sorted.list
sleep 30
Once the files are created, let's see what happens when they are submitted at the same time:
Submit at the same time
$ qsub random.pbs ; qsub sort.pbs
5594670.hpc0.local
5594671.hpc0.local
$ ls
random.pbs sorted.list sort.pbs sort.pbs.e5594671 sort.pbs.o5594671
$ cat sort.pbs.e5594671
sort: open failed: rand.list: No such file or directory
Since they both ran at the same time, the sort script failed because the file rand.list had not been created yet. Now submit them with the dependencies added.
Submit them with the dependencies added
$ qsub random.pbs
5594674.hpc0.local
$ qsub -W depend=afterok:5594674.hpc0.local sort.pbs
5594675.hpc0.local
$ qstat -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5594674.hpc0.loc manchu ser2 random.pbs 18029 1 1 -- 48:00 R 00:00
5594675.hpc0.loc manchu ser2 sort.pbs 1 1 -- 48:00 H --
We now see that the sort.pbs job is in a hold state. Once the job it depends on completes, the sort job runs and we see:
Job status with the dependencies added
$ qstat -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5594675.hpc0.loc manchu ser2 sort.pbs 18165 1 1 -- 48:00 R --
Useful Information
- afterany:jobid[:jobid...] implies that the job may be scheduled for execution after jobs jobid have terminated, with or without errors.
- afterok:jobid[:jobid...] implies that the job may be scheduled for execution only after jobs jobid have terminated with no errors.
- afternotok:jobid[:jobid...] implies that the job may be scheduled for execution only after jobs jobid have terminated with errors.
Submitting multiple jobs in a loop that depend on output of another job
This example shows how to submit multiple jobs in a loop where each job depends on the output of the job submitted before it.
Example
Let's say we need to write numbers from 0 to 999999 in order onto a file output.txt. We can do 10 separate runs to achieve this, where each run has a separate pbs script writing 100,000 numbers to output file. Let's see what happens if we submit all 10 jobs at the same time.
The script below creates required pbs scripts for all the runs.
Create PBS Scripts for all the runs
$ cat creation.sh
#!/bin/bash
for i in {0..9}
do
cat > pbs.script.$i << EOF
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=600
cd \$PBS_O_WORKDIR
for ((i=$((i*100000)); i<$(((i+1)*100000)); i++))
{
echo "\$i" >> output.txt
}
exit 0;
EOF
done
Change permission to make it an executable
$ chmod u+x creation.sh
Run the Script
$ ./creation.sh
List of Created PBS Scripts
$ ls -l pbs.script.*
-rw-r--r-- 1 manchu wheel 134 Oct 27 16:32 pbs.script.0
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.1
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.2
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.3
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.4
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.5
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.6
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.7
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.8
-rw-r--r-- 1 manchu wheel 140 Oct 27 16:32 pbs.script.9
PBS Script
$ cat pbs.script.0
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=600
cd $PBS_O_WORKDIR
for ((i=0; i<100000; i++))
{
echo "$i" >> output.txt
}
exit 0;
Submit Multiple Jobs at a Time
$ for i in {0..9}; do qsub pbs.script.$i ; done
5633531.hpc0.local
5633532.hpc0.local
5633533.hpc0.local
5633534.hpc0.local
5633535.hpc0.local
5633536.hpc0.local
5633537.hpc0.local
5633538.hpc0.local
5633539.hpc0.local
5633540.hpc0.local
output.txt
$ tail output.txt
699990
699991
699992
699993
699994
699995
699996
699997
699998
699999
-bash-3.1$ grep -n 999999 $_
210510:999999
This clearly shows that the numbers are not in the order we wanted. This is because all the runs wrote to the same file at the same time.
Let's submit jobs using qsub dependency feature. This can be achieved with a simple script shown below.
Simple Script to Submit Multiple Dependent Jobs
$ cat dependency.pbs
#!/bin/bash
job=`qsub pbs.script.0`
for i in {1..9}
do
    job_next=`qsub -W depend=afterok:$job pbs.script.$i`
    job=$job_next
done
Let's make it an executable
$ chmod u+x dependency.pbs
Submit dependent jobs by running the script
$ ./dependency.pbs
$ qstat -u manchu
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5633541.hpc0.loc manchu ser2 pbs.script.0 28646 1 1 -- 00:10 R --
5633542.hpc0.loc manchu ser2 pbs.script.1 -- 1 1 -- 00:10 H --
5633543.hpc0.loc manchu ser2 pbs.script.2 -- 1 1 -- 00:10 H --
5633544.hpc0.loc manchu ser2 pbs.script.3 -- 1 1 -- 00:10 H --
5633545.hpc0.loc manchu ser2 pbs.script.4 -- 1 1 -- 00:10 H --
5633546.hpc0.loc manchu ser2 pbs.script.5 -- 1 1 -- 00:10 H --
5633547.hpc0.loc manchu ser2 pbs.script.6 -- 1 1 -- 00:10 H --
5633548.hpc0.loc manchu ser2 pbs.script.7 -- 1 1 -- 00:10 H --
5633549.hpc0.loc manchu ser2 pbs.script.8 -- 1 1 -- 00:10 H --
5633550.hpc0.loc manchu ser2 pbs.script.9 -- 1 1 -- 00:10 H --
Output after first run
$ tail output.txt
99990
99991
99992
99993
99994
99995
99996
99997
99998
99999
Output after final run
$ tail output.txt
999990
999991
999992
999993
999994
999995
999996
999997
999998
999999
$ grep -n 100000 output.txt
100001:100000
$ grep -n 999999 output.txt
1000000:999999
This shows that the numbers are written to output.txt in order, which in turn shows that each job ran only after the previous one had completed successfully.
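The same chaining idea can be wrapped in a small function that stops as soon as a submission fails, so later jobs are not chained to an empty job ID. This is a sketch only: submit_chain and the SUBMIT variable are hypothetical names, and SUBMIT exists so the chain can be dry-run against a stub instead of the real qsub.

```shell
#!/bin/bash
# Sketch: submit scripts as a chain in which each job waits (afterok) for the
# previous one. submit_chain is a hypothetical helper. SUBMIT defaults to qsub
# and can be pointed at a stub command for a dry run.
submit_chain() {
    local submit=${SUBMIT:-qsub} prev="" script job
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job has no dependency.
            job=$("$submit" "$script") || return 1
        else
            job=$("$submit" -W "depend=afterok:$prev" "$script") || return 1
        fi
        printf '%s\n' "$job"    # report each job ID as it is assigned
        prev=$job
    done
}

# Usage (on a cluster where qsub is available):
#   submit_chain pbs.script.0 pbs.script.1 pbs.script.2
```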
Opening an interactive shell to the compute node
To open an interactive shell to a compute node, use the -I argument. This is often used in conjunction with -X (X11 forwarding) and -V (pass all of the user's environment).
Example
Open an interactive shell to a compute node
$ qsub -I
Passing an environment variable to your job
You can pass user-defined environment variables to a job by using the -v argument.
Example
To test this we will use a simple script that prints out an environment variable.
Passing an environment variable
$ cat variable.pbs
#!/bin/sh
if [ "x" == "x$MYVAR" ] ; then
    echo "Variable is not set"
else
    echo "Variable says: $MYVAR"
fi
Next, use qsub without -v and check your standard output file:
qsub without -v
$ qsub variable.pbs
5596675.hpc0.local
$ cat variable.pbs.o5596675
Variable is not set
Then use -v to set the variable:
qsub with -v
$ qsub -v MYVAR="hello" variable.pbs
5596676.hpc0.local
$ cat variable.pbs.o5596676
Variable says: hello
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -v MYVAR="hello"
Useful Information
Multiple user-defined environment variables can be passed to a job at a time.
Passing Multiple Variables
$ cat variable.pbs
#!/bin/sh
echo "$VAR1 $VAR2 $VAR3" > output.txt
$ qsub -v VAR1="hello",VAR2="Sreedhar",VAR3="How are you?" variable.pbs
5627200.hpc0.local
$ cat output.txt
hello Sreedhar How are you?
Passing your environment to your job
You may declare that all of your environment variables be passed to the job by using the -V argument in qsub.
Example
Use qsub to perform an interactive login to one of the nodes:
Passing your environment: qsub with -V
$ qsub -I -V
Once the shell is opened, use the env command to see that your environment was passed to the job correctly. You should still have access to all the modules that you loaded previously.
Handy Hint
This option can be added to a PBS script with a PBS directive such as:
#PBS -V
Submitting an array job: Managing groups of jobs
.hostname would have PBS_ARRAYID set to 0. This will allow you to create job arrays where each job in the array will perform slightly different actions based on the value of this variable, such as performing the same tasks on different input files. One other difference in the environment between jobs in the same array is the value of the PBS_JOBNAME variable.
Example
First we need to create data to be read. Note that in a real application, this could be data, configuration setting or anything that your program needs to run.
Create Input Data
To create input data, run this simple one-liner:
Creating input data
$ for i in {0..4}; do echo "Input data file for an array $i" > input.$i ; done
$ ls input.*
input.0 input.1 input.2 input.3 input.4
$ cat input.0
Input data file for an array 0
Submission Script
Submission Script: array.pbs
$ cat array.pbs
#!/bin/sh
#PBS -l nodes=1:ppn=1,walltime=5:00
#PBS -N arraytest
cd ${PBS_O_WORKDIR} # Take me to the directory where I launched qsub
# This part of the script handles the data. In a real world situation you will probably
# be using an existing application.
cat input.${PBS_ARRAYID} > output.${PBS_ARRAYID}
echo "Job Name is ${PBS_JOBNAME}" >> output.${PBS_ARRAYID}
sleep 30
exit 0;
Submit & Monitor
Instead of running five qsub commands, we can simply enter:
Submitting and Monitoring Array of Jobs
$ qsub -t 0-4 array.pbs
5534017[].hpc0.local
qstat
$ qstat -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534017[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 R --
$ qstat -t -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534017[0].hpc0. sm4082 ser2 arraytest-0 12017 1 1 -- 00:05 R --
5534017[1].hpc0. sm4082 ser2 arraytest-1 12050 1 1 -- 00:05 R --
5534017[2].hpc0. sm4082 ser2 arraytest-2 12084 1 1 -- 00:05 R --
5534017[3].hpc0. sm4082 ser2 arraytest-3 12117 1 1 -- 00:05 R --
5534017[4].hpc0. sm4082 ser2 arraytest-4 12150 1 1 -- 00:05 R --
$ ls output.*
output.0 output.1 output.2 output.3 output.4
$ cat output.0
Input data file for an array 0
Job Name is arraytest-0
pbstop
pbstop by default doesn't show all the jobs in the array. Instead, it shows a single job in just one line in the job information. Pressing 'A' shows all the jobs in the array. Same can be achieved by giving the command line option '-A'. This option along with '-u <NetID>' shows all of your jobs including array as well as normal jobs.
pbstop
$ pbstop -A -u $USER
Note
Typing 'A' expands/collapses array job representation.
Comma delimited lists
The -t option of qsub also accepts comma delimited lists of job IDs so you are free to choose how to index the members of your job array. For example:
Comma delimited lists
$ rm output.*
$ qsub -t 2,5,7-9 array.pbs
5534018[].hpc0.local
$ qstat -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534018[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 Q --
$ qstat -t -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534018[2].hpc0. sm4082 ser2 arraytest-2 12319 1 1 -- 00:05 R --
5534018[5].hpc0. sm4082 ser2 arraytest-5 12353 1 1 -- 00:05 R --
5534018[7].hpc0. sm4082 ser2 arraytest-7 12386 1 1 -- 00:05 R --
5534018[8].hpc0. sm4082 ser2 arraytest-8 12419 1 1 -- 00:05 R --
5534018[9].hpc0. sm4082 ser2 arraytest-9 12452 1 1 -- 00:05 R --
$ ls output.*
output.2 output.5 output.7 output.8 output.9
$ cat output.2
Input data file for an array 2
Job Name is arraytest-2
A more general for loop - Arrays with step size
By default, PBS doesn't allow array jobs with a step size. qsub -t 0-10 <pbs.script> increments PBS_ARRAYID by 1. To submit jobs in steps of a certain size, say a step size of 3 starting at 0 and ending at 10, one has to do
qsub -t 0,3,6,9 <pbs.script>
To make it easy for users, we have put in place a wrapper that takes the starting point, ending point and step size as arguments to the -t flag. This removes the default requirement that the PBS_ARRAYID increment be 1. The above request can be accomplished with the following command (the conversion happens behind the scenes with the help of the wrapper):
qsub -t 0-10:3 <pbs.script>
Here, 0 is the starting point, 10 is the ending point and 3 is the step size. The starting point need not be 0; it can be any number. Incidentally, when the upper bound is not equal to the lower bound plus an integer multiple of the increment, as in
qsub -t 0-10:3 <pbs.script>
the wrapper automatically changes the upper bound, as shown in the example below.
Arrays with step size
[sm4082@login-0-0 ~]$ qsub -t 0-10:3 array.pbs
6390152[].hpc0.local
[sm4082@login-0-0 ~]$ qstat -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6390152[].hpc0.l sm4082 ser2 arraytest -- 1 1 -- 00:05 Q --
[sm4082@login-0-0 ~]$ qstat -t -u $USER
hpc0.local:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6390152[0].hpc0. sm4082 ser2 arraytest-0 25585 1 1 -- 00:05 R --
6390152[3].hpc0. sm4082 ser2 arraytest-3 28227 1 1 -- 00:05 R --
6390152[6].hpc0. sm4082 ser2 arraytest-6 8515 1 1 -- 00:05 R 00:00
6390152[9].hpc0. sm4082 ser2 arraytest-9 505 1 1 -- 00:05 R --
[sm4082@login-0-0 ~]$ ls output.*
output.0 output.3 output.6 output.9
[sm4082@login-0-0 ~]$ cat output.9
Input data file for an array 9
Job Name is arraytest-9
[sm4082@login-0-0 ~]$
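On a cluster without such a wrapper, the expansion from start-end:step to a comma delimited list can be done in the shell before calling qsub. The following is a minimal sketch of that conversion. It is not the site wrapper itself, and the function name expand_steps is hypothetical.

```shell
#!/bin/bash
# Sketch: expand "start-end:step" into the comma delimited list that plain
# qsub -t accepts. expand_steps is a hypothetical helper, not the site wrapper.
expand_steps() {
    local spec=$1 range step start end i list=""
    range=${spec%%:*}                    # part before ":" (e.g. 0-10)
    step=${spec#*:}                      # part after ":"  (e.g. 3)
    [ "$step" = "$spec" ] && step=1      # no ":step" given, default to 1
    start=${range%%-*}
    end=${range#*-}
    [ "$step" -gt 0 ] || return 1        # reject zero or negative steps
    i=$start
    while [ "$i" -le "$end" ]; do
        list="${list:+$list,}$i"         # append with a comma separator
        i=$((i + step))
    done
    printf '%s\n' "$list"
}

# Usage (assumes qsub is available):
#   qsub -t "$(expand_steps 0-10:3)" array.pbs
```

The loop stops at the last index that does not exceed the upper bound, which matches the behaviour shown above where 0-10:3 yields members 0, 3, 6 and 9.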
Note
By default, PBS doesn't support arrays with a step size. On our clusters, this is achieved with a wrapper. This option might not be available on clusters at other organizations/schools that use PBS/Torque.
Note
If you're trying to submit jobs through ssh to login nodes from your PBS scripts with a statement such as
ssh login-0-0 "cd ${PBS_O_WORKDIR};`which qsub` -t 0-10:3 <pbs.script>"
arrays with a step size won't work unless you either add
shopt -s expand_aliases
to your PBS script (if it is written in bash) or add it to the .bashrc in your home directory. Adding this makes the alias for qsub take effect, thereby letting the wrapper act on the command line options to qsub (for that matter, it brings any alias into effect for commands executed via SSH).
If you have
#PBS -t 0-10:3
in your PBS script, you don't need to add this either to your PBS script or to the .bashrc in your home directory.
A List of Input Files/Pulling data from the ith line of a file
Suppose we have a list of 1000 input files, rather than input files explicitly indexed by suffix, in a file file_list.text one per line:
A List of Input Files/Pulling data from the ith line of a file
[sm4082@login-0-2 ~]$ cat array.list
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=1,walltime=1:00:00
INPUT_FILE=`awk "NR==$PBS_ARRAYID" file_list.text`
#
# ...or use sed:
# sed -n -e "${PBS_ARRAYID}p" file_list.text
#
# ...or use head/tail
# $(cat file_list.text | head -n $PBS_ARRAYID | tail -n 1)
./executable < $INPUT_FILE
In the sed variant, the '-n' option suppresses all output except that which is explicitly printed (the line whose number equals PBS_ARRAYID).
qsub -t 1-1000 array.list
Let's say you have a list of 1000 numbers in a file, one number per line. For example, the numbers could be random number seeds for a simulation. For each task in an array job, you want to get the ith line from the file, where i equals PBS_ARRAYID, and use that value as the seed. This is accomplished by using the Unix head and tail commands or awk or sed just like above.
A List of Input Files/Pulling data from the ith line of a file
[sm4082@login-0-2 ~]$ cat array.seed
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=1,walltime=1:00:00
SEEDFILE=~/data/seeds
SEED=$(cat $SEEDFILE | head -n $PBS_ARRAYID | tail -n 1)
~/programs/executable $SEED > ~/results/output.$PBS_ARRAYID
qsub -t 1-1000 array.seed
You can use this trick for all sorts of things. For example, if your jobs all use the same program, but with very different command-line options, you can list all the options in the file, one set per line, and the exercise is basically the same as the above, and you only have two files to handle (or 3, if you have a perl script generate the file of command-lines).
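The command-line-options variant might look like the sketch below. The helper name task_options and the file name options.txt are hypothetical. The function prints the line whose number matches the array index and fails if that line is missing or empty, so a task with an out-of-range index can exit early instead of running the program with no options.

```shell
#!/bin/bash
# Sketch: fetch the options for one array task from a file that holds one set
# of command-line options per line. task_options is a hypothetical helper.
task_options() {
    local file=$1 index=$2
    # sed prints only line number $index; grep makes the pipeline fail when
    # that line does not exist or is empty.
    sed -n "${index}p" "$file" | grep .
}

# Usage inside a job script submitted with qsub -t 1-1000 (paths are examples):
#   opts=$(task_options ~/data/options.txt "$PBS_ARRAYID") || exit 1
#   ~/programs/executable $opts > ~/results/output.$PBS_ARRAYID
# $opts is left unquoted on purpose so the shell splits it into separate words.
```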
Delete
Delete all jobs in array
We can delete all the jobs in an array with a single command.
$ qsub -t 2-5 array.pbs
5534020[].hpc0.local
$ qstat -u $USER
hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534020[].hpc0.l     sm4082   ser2     arraytest               1     1   --     00:05 R --
$ qdel 5534020[]
$ qstat -u $USER
$
Delete a single job in array
Delete individual jobs in an array, e.g. numbers 4, 5 and 7:
$ qsub -t 0-8 array.pbs
5534021[].hpc0.local
$ qstat -u $USER
hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[].hpc0.l     sm4082   ser2     arraytest               1     1   --     00:05 Q --
$ qstat -t -u $USER
hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[0].hpc0.     sm4082   ser2     arraytest-0      26618  1     1   --     00:05 R --
5534021[1].hpc0.     sm4082   ser2     arraytest-1      14271  1     1   --     00:05 R --
5534021[2].hpc0.     sm4082   ser2     arraytest-2      14304  1     1   --     00:05 R --
5534021[3].hpc0.     sm4082   ser2     arraytest-3      14721  1     1   --     00:05 R --
5534021[4].hpc0.     sm4082   ser2     arraytest-4      14754  1     1   --     00:05 R --
5534021[5].hpc0.     sm4082   ser2     arraytest-5      14787  1     1   --     00:05 R --
5534021[6].hpc0.     sm4082   ser2     arraytest-6      10711  1     1   --     00:05 R --
5534021[7].hpc0.     sm4082   ser2     arraytest-7      10744  1     1   --     00:05 R --
5534021[8].hpc0.     sm4082   ser2     arraytest-8      9711   1     1   --     00:05 R --
$ qdel 5534021[4]
$ qdel 5534021[5]
$ qdel 5534021[7]
$ qstat -t -u $USER
hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[0].hpc0.     sm4082   ser2     arraytest-0      26618  1     1   --     00:05 R --
5534021[1].hpc0.     sm4082   ser2     arraytest-1      14271  1     1   --     00:05 R --
5534021[2].hpc0.     sm4082   ser2     arraytest-2      14304  1     1   --     00:05 R --
5534021[3].hpc0.     sm4082   ser2     arraytest-3      14721  1     1   --     00:05 R --
5534021[6].hpc0.     sm4082   ser2     arraytest-6      10711  1     1   --     00:05 R --
5534021[8].hpc0.     sm4082   ser2     arraytest-8      9711   1     1   --     00:05 R --
$ qstat -t -u $USER
$
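When several array members have to be removed, the repeated qdel calls above can be wrapped in a loop. This is a sketch under assumptions: the array ID and indices are the ones from the listing, and the QDEL variable is a hypothetical switch that makes the loop print the commands instead of running them unless it is set to qdel.

```shell
#!/bin/bash
# Values taken from the listing above; adjust to the ID printed by qsub.
array_id=5534021
indices="4 5 7"

# Dry run by default: the commands are only echoed.
# Run with QDEL=qdel in the environment to actually delete (hypothetical switch).
QDEL=${QDEL:-echo qdel}

deleted=""
for i in $indices; do
    subjob="${array_id}[${i}]"
    # Quoted, so the shell never treats [..] as a filename pattern.
    $QDEL "$subjob"
    deleted="$deleted $subjob"
done
```

Quoting the job ID matters because an unquoted 5534021[4] is a glob pattern and would be replaced by a matching file name if one happened to exist in the current directory.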
January 18, 2008
To address this, Altair Engineering has developed a unified pay-per-use licensing model across their entire product line - HyperWorks (CAE applications), PBS GridWorks (workload management) and HiQube (business intelligence analytics). The model is designed to circumvent some of the limitations of hardware-based licensing. Essentially what the company sells are license tokens that are only drawn when an application is running. Tokens are dispensed from a central license server when an application is executing and returned to the token pool when the application is finished. The model allows these application licenses to be shared across a system, a LAN, or even a wide area network.
The idea originated with the company's HyperWorks product line, where the token-based scheme was first introduced. Last summer, Altair converted its PBS GridWorks licensing to the HyperWorks model. Prior to that, the GridWorks products employed a more traditional licensing mechanism that treated each type of hardware platform differently. With the introduction of PBS Professional 9.0, a single "license" uses three GridWorks tokens to run a single job on a single processor core. In the U.S., each GridWorks token costs $4.50, which works out to $13.50 per license per year, or about 1/4 the price of the previous licensing scheme.
By isolating the software licensing requirements from the hardware platform, users are able to use the hardware more flexibly. This is an especially important distinction when multicore processors and utility computing environments are involved, since it provides a more equitable model for sharing hardware resources with other software running concurrently on the same platforms.
While license tokens are the common currency across all Altair products, the denomination of tokens used for GridWorks products is different from that used for HyperWorks products: 1 HyperWorks token unit = 100 GridWorks token units. Also, different Altair applications may draw different numbers of tokens. For example, HyperMesh (a CAE HyperWorks application), draws 21 HyperWorks tokens independent of the core count, while PBS Professional uses three GridWorks tokens per job per core. If the job needs 4 cores, GridWorks would draw 12 tokens from the pool. Once execution completes, those tokens are available for other applications. So the 21 HyperWorks tokens used to run a HyperMesh application during the day could be used to run 175 simultaneous PBS GridWorks jobs (using four cores each) at night.
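The token arithmetic in this passage can be checked with a few lines of shell. All figures come from the article text; the variable names are illustrative only.

```shell
#!/bin/bash
hw_tokens=21          # HyperWorks tokens drawn by HyperMesh
gw_per_hw=100         # 1 HyperWorks token unit = 100 GridWorks token units
gw_per_core=3         # GridWorks tokens per job per core (PBS Professional)
cores_per_job=4

gw_pool=$((hw_tokens * gw_per_hw))            # GridWorks tokens freed at night
gw_per_job=$((gw_per_core * cores_per_job))   # tokens drawn by one 4-core job
jobs=$((gw_pool / gw_per_job))                # simultaneous 4-core jobs

# Price of one per-core "license": three tokens at $4.50 each, kept in cents
# because shell arithmetic is integer-only.
token_price_cents=450
license_cents=$((gw_per_core * token_price_cents))
license_price=$(printf '%d.%02d' $((license_cents / 100)) $((license_cents % 100)))

echo "GridWorks pool: $gw_pool tokens"
echo "Per 4-core job: $gw_per_job tokens"
echo "Simultaneous jobs: $jobs"
echo "License price per year: \$$license_price"
```

The result is 2100 / 12 = 175 jobs, which matches the figure quoted in the article, and 3 x $4.50 = $13.50 per license.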
Codes that run 24/7, such as some solver applications (e.g., RADIOSS, used at Ford for crash simulations), would use dedicated licensing, where the tokens are statically locked to specific hardware. In this case, the tokens could still be returned to the pool, at least temporarily, if, for example, the hardware is taken down for maintenance.
According to Michael Humphrey, VP of the PBS GridWorks product line, the whole model is especially valuable if a company can maintain a continuous computing level. "The biggest companies - the Boeings, the Fords, the GMs of the world - get the most value out of their HyperWorks units because they use the most applications and they share them globally," explains Humphrey.
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: March 20, 2020