The SGE batch scheduling system dispatches jobs according to the availability of the resources they need to complete successfully. A job with an unspecified resource, or with a resource request that is unnecessarily high, may therefore wait longer than necessary before it is scheduled. Specify the required resources for each of your jobs carefully so that the HPC systems are used as efficiently as possible.
The following sections give a brief introduction on how to obtain information about the total and the currently available resources on the cluster.
To get an abbreviated overview of the currently available resources on each of the cluster's execution hosts, use the command qhost. For a complete representation of all available host-specific resource attributes execute
qhost -F
For more information about standard resource attributes consult the complex man page.
To obtain a list of the available queues on the cluster execute
qconf -sql
Use the following command to get detailed resource information for a specific queue:
qconf -sq queue
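The two qconf calls above can be combined to get a quick overview of one attribute across all queues. The following is only a sketch: the function name queue_runtime_limits is hypothetical, and it assumes that qconf -sq prints one "attribute value" pair per line, with the wallclock limit listed under h_rt.

```shell
# Hypothetical helper: print every cluster queue together with its h_rt limit.
# It relies only on the `qconf -sql` and `qconf -sq` commands shown above.
queue_runtime_limits() {
    for q in $(qconf -sql); do
        # Assumption: qconf -sq prints one "attribute value" pair per line
        limit=$(qconf -sq "$q" | awk '$1 == "h_rt" { print $2 }')
        printf '%s %s\n' "$q" "${limit:-unknown}"
    done
}
```

Calling queue_runtime_limits on a submit host prints one line per queue. Replacing h_rt in the awk pattern with another attribute name reports that attribute instead.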
There are limits on the number of slots available per user, as well as on the maximum number of slots per job. As a rule of thumb, a single job may occupy approximately half of the cluster, and each user may fill the cluster up to about 75% with their jobs.
Depending on the specific HPC system, there are also transient limits on the number of slots available per user. These come into effect only at times of high cluster load (and consequently increased competition for the available resources). Execute
qquota
to see these limits and your current resource consumption (nothing is displayed if you have no running jobs on the cluster).
Note: The transient limits are usually not enforced, but they may cause problems for interactive sessions (see the section Submitting interactive jobs) or for short-running jobs. Please contact the ZID cluster administration if you experience problems or need more resources for the progress of an urgent project.
The subsequent sections will provide sample cases of how to specify the resource requirements for your jobs.
For optimal scheduling of your jobs, i.e. to allow a job to run as soon as the necessary resources are available, specify the job runtime as closely as possible. This lets SGE use backfilling: while resources are being reserved for another job, a job that is known to finish in time can run on them in the meantime. Runtime limits are specified with the h_rt resource attribute. For example, if your job will certainly not need more than 4 hours and 30 minutes (wallclock time) to complete, submit it with the following command line:
qsub -l h_rt=4:30:00 job_script.sh
Note: Do not set runtime limits too aggressively, and do not set them at all if you are unsure about the actual duration of your jobs: a job is terminated as soon as it exceeds its specified runtime limit. If no runtime limit is provided, the default runtime limit of the system applies; it can be found in the queue-specific resource information.
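One way to avoid limits that are too tight is to derive h_rt from a measured runtime plus a safety margin. The following is a minimal sketch; the function name h_rt_from_seconds and the idea of a percentage margin are assumptions made for illustration and are not part of SGE.

```shell
# Hypothetical helper: turn a runtime in seconds into an h_rt value (H:MM:SS),
# padded by an optional safety margin given in percent (default: 0).
h_rt_from_seconds() {
    secs=$1
    margin=${2:-0}
    total=$(( secs + secs * margin / 100 ))
    printf '%d:%02d:%02d\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}
```

For a job that was measured at 13500 seconds, a 20% margin gives the value used in the example above: qsub -l h_rt=$(h_rt_from_seconds 13500 20) job_script.sh requests 4:30:00.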
In order to avoid job failures due to memory oversubscription, the maximum amount of memory available per process is limited by default to a cluster-specific value. To find out the default value, issue the command
qconf -sc | grep "default\|h_vmem"
If your job requires, for example, less than 1.5 GByte of memory per process, you can state this explicitly by setting SGE's resource attribute h_vmem as in the following example:
qsub -l h_vmem=1500M -pe openmpi-fillup 4 job_script.sh
This will reserve a total of about 6 GByte of memory for your job (4 slots with 1500M each), potentially distributed over several hosts. Memory values are specified in bytes as positive decimal (1500), octal (02734) or hexadecimal (0x5dc) integers. For convenience, the multipliers k (1000), K (1024), m (1000 × 1000), M (1024 × 1024), g (1000 × 1000 × 1000) and G (1024 × 1024 × 1024) can be appended.
Note: If you know that your memory requirements lie below the default limit, please
do specify the lower value.
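The memory arithmetic described above can be checked with a few lines of shell. This is a sketch only: the helper names sge_mem_to_bytes and job_mem_total are hypothetical, and the code simply applies the multiplier rules for k, K, m, M, g and G and multiplies the per-slot value by the slot count.

```shell
# Hypothetical helper: convert an SGE memory value (e.g. 1500M, 0x5dc, 02734)
# to a plain number of bytes.
sge_mem_to_bytes() {
    spec=$1
    case $spec in
        *k) mult=1000 ;;
        *K) mult=1024 ;;
        *m) mult=$(( 1000 * 1000 )) ;;
        *M) mult=$(( 1024 * 1024 )) ;;
        *g) mult=$(( 1000 * 1000 * 1000 )) ;;
        *G) mult=$(( 1024 * 1024 * 1024 )) ;;
        *)  mult=1 ;;
    esac
    # Strip the one-letter multiplier, if there was one
    if [ "$mult" -ne 1 ]; then
        spec=${spec%?}
    fi
    # Shell arithmetic accepts decimal, octal (leading 0) and hex (0x) constants
    echo $(( $spec * $mult ))
}

# Hypothetical helper: total memory of a job = per-slot h_vmem * number of slots
job_mem_total() {
    echo $(( $(sge_mem_to_bytes "$1") * $2 ))
}
```

For the qsub example above, job_mem_total 1500M 4 prints 6291456000, i.e. roughly 6 GByte.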
You can alter most of the resource requirements of pending jobs at any time with SGE's qalter command. For example, to change the parallel environment of a waiting parallel job, including the number of desired slots, enter:
qalter -pe openmpi-fillup 8-16 YOUR_JOB_ID
Limiting User Greed: Resource Quotas
Integrated over time, fair-share scheduling should ensure that each user gets their appropriate CPU usage (provided they submit sufficient jobs). Over and above this, we want to prevent any one user dominating any host-group at any given time.
- Prevent any one user dominating the serial queue:
{
   name         C6100-STD-serial.q.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues C6100-STD-serial.q to slots=48
   #
   # ..."users {*}" means "each and every user" while "users *" would
   #    mean "all users together"...
   #
}
- Limit total slot-count for each user on the main queues:
{
   name         CSF.q.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues R815.q,C6100-STD.q,C6100-STD-ib.q, \
                C6100-FAT.q,C6100-VFAT.q,R410-twoday.q to slots=256
}
- Discourage interactive work:
{
   name         C6100-STD-interactive.q.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues C6100-STD-interactive.q to slots=4
}
- Prevent any one user grabbing more than half of this one:
{
   name         R815.q.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues R815.q to slots=256
}
- Since we have so few M610x-hosted GPGPUs, limit to one per user:
{
   name         M610x.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} hosts @M610x-GPU to slots=1
}
- Limit total usage (sum of all users) on some queues:
{
   name         CSF-Queues-total-users.rqs
   description  NONE
   enabled      TRUE
   limit        users * queues C6100-STD-serial.q to slots=144
   limit        users * queues R410-twoday-interactive.q to slots=12
   limit        users * queues R410-short-interactive.q to slots=12
}
- Multiple queues on some hosts, but don't want to overload them:
{
   name         CSF-Hosts-slots.rqs
   description  NONE
   enabled      TRUE
   limit        hosts {@C6100-STD} to slots=12
   limit        hosts {@C6100-FAT} to slots=12
   limit        hosts {@C6100-STD-ib} to slots=12
   limit        hosts {@C6100-STD-test} to slots=12
   limit        hosts {@R815} to slots=32
   limit        hosts {@R410-twoday} to slots=12
   limit        hosts {@R410-short} to slots=12
}
- Don't want any individual to hog the precious IB-connected Intel nodes:
{
   name         CSF-PEs-each-user.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} pes orte-12-ib.pe to slots=96
}
- Limit MACE use of the non-IB Intel nodes as they contributed only AMD:
{
   name         CSF-Usersets.rqs
   description  NONE
   enabled      TRUE
   limit        users @mace01.userset queues C6100-STD.q to slots=36
}
- Limit each user's greed on each (well, most) queues:
{
   name         CSF-Queues-each-user.rqs
   description  NONE
   enabled      TRUE
   limit        users {*} queues C6100-FAT.q to slots=36
   limit        users {*} queues C6100-STD-serial.q to slots=36
   limit        users {*} queues C6100-STD-interactive.q to slots=4
   limit        users {*} queues R815.q to slots=256
   limit        users {*} queues R815.q,C6100-STD.q,C6100-STD-ib.q, \
                C6100-FAT.q,C6100-VFAT.q,R410-twoday.q to slots=256
   limit        users {*} queues M610x-GPU.q,M610x-GPU-interactive.q to slots=3
}
- Limit total usage (sum of users) on some PE/Queue combos:
{
   name         CSF-PEs-total-users.rqs
   description  NONE
   enabled      TRUE
   ## limit     users * pes orte.pe,orte-12.pe to slots=550
   limit        users * pes orte.pe,orte-12.pe queues C6100-STD.q to slots=96
   #
   # ...above, changed one t'other...
   #
   limit        users * pes smp.pe queues C6100-STD.q to slots=440
   ## limit     users * pes fluent-smp.pe queues C6100-STD.q to slots=48
   #
   # ...above, replaced by mace.userset quota...
}
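Rule sets like the ones above all follow the same pattern, so a new per-user limit can be generated from a template. The following is a sketch under assumptions: the function name make_user_slot_rqs and the names demo.rqs and demo.q are hypothetical, and the layout simply mirrors the examples listed above.

```shell
# Hypothetical helper: print a resource quota set that caps the slots of
# each individual user on one queue.
# Arguments: rule-set name, queue name, slot limit.
make_user_slot_rqs() {
    cat <<EOF
{
   name         $1
   description  NONE
   enabled      TRUE
   limit        users {*} queues $2 to slots=$3
}
EOF
}
```

An administrator could write the output to a file with make_user_slot_rqs demo.rqs demo.q 8 > demo.rqs, load it with qconf -Arqs demo.rqs, and display existing rule sets with qconf -srqs. Changing "users {*}" to "users *" in the template would turn the per-user cap into a cap on all users together.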
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: March, 12, 2019