Sun SGE 6.2u5 was the last open source version produced by Sun before the Oracle acquisition. It is classic software and probably the most widely used version of Grid Engine. Installation of the classic Sun version is essentially identical to the installation of Oracle Grid Engine:
It's difficult to find the original Sun 6.2u5 distribution on the Internet. The only place I know of is Open Grid Scheduler - Browse Files at SourceForge.net, which contains SGE 6.2u5p2.
If you need compiled binaries for Red Hat, your best bet is Son of Grid Engine, which is pretty close to the classic version in spirit and installation details (with important bug fixes and enhancements) and works on RHEL 6.5 pretty well.
Some universities, like Rutgers, have tar files available too. See, for example, SGE installation from Rutgers for Debian.
Debian packages of SGE 6.2u5 are maintained, but they do not use the Sun installer. They represent a pretty radical deviation from the traditional way of installing Grid Engine. See Debian Package Tracking System - gridengine.
It looks like the initial packager had too much zeal.
For example, here is what the file list of the gridengine-exec package for the amd64 architecture looks like:
/etc/init.d/gridengine-exec
/usr/lib/gridengine/qrsh_starter
/usr/lib/gridengine/sge_coshepherd
/usr/lib/gridengine/sge_execd
/usr/lib/gridengine/sge_shepherd
/usr/sbin/sge_coshepherd
/usr/sbin/sge_execd
/usr/sbin/sge_shepherd
/usr/share/doc/gridengine-exec/NEWS.Debian.gz
/usr/share/doc/gridengine-exec/changelog.Debian.gz
/usr/share/doc/gridengine-exec/copyright
/usr/share/man/man8/sge_execd.8.gz
/usr/share/man/man8/sge_shepherd.8.gz
For execution nodes there is also a separate package, gridengine-client. I do not understand why the utilities are duplicated between /usr/bin and /usr/lib/gridengine:
/usr/bin: qacct, qalter, qconf, qdel, qhold, qhost, qlogin, qmod, qping, qquota, qrdel, qresub, qrls, qrsh, qrstat, qrsub, qselect, qsh, qstat, qsub
/usr/lib/gridengine: qacct, qalter, qconf, qdel, qhold, qhost, qlogin, qmod, qping, qquota, qrdel, qresub, qrls, qrsh, qrstat, qrsub, qselect, qsh, qstat, qsub (the same utilities again)
/usr/share/doc/gridengine-client: NEWS.Debian.gz, changelog.Debian.gz, copyright, examples
/usr/share/man/man1: qacct.1.gz, qalter.1.gz, qconf.1.gz, qdel.1.gz, qhold.1.gz, qhost.1.gz, qlogin.1.gz, qmod.1.gz, qping.1.gz, qquota.1.gz, qrdel.1.gz, qresub.1.gz, qrls.1.gz, qrsh.1.gz, qrstat.1.gz, qrsub.1.gz, qselect.1.gz, qsh.1.gz, qstat.1.gz, qsub.1.gz, sge_submit.1.gz
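If you want to check this duplication on your own Debian or Ubuntu machine, dpkg can print the file list of a package directly. A minimal sketch, assuming the gridengine-client package is installed (or that you have its .deb at hand):

# Files owned by an installed package
dpkg -L gridengine-client
# Contents of a downloaded .deb, without installing it
dpkg -c gridengine-client_6.2u5-1ubuntu1_amd64.deb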
For RHEL, CentOS, or Fedora, SGE 6.2u5 RPMs can be found in the EPEL repository. They are close in spirit to the Debian distribution. The problem is that they do not work out of the box and the installer is seriously buggy ;-). In any case, for those who want to try them themselves, here are the links:
gridengine-6.2u5-10.el6.4.i686.rpm            2012-04-17 21:59   15M
gridengine-6.2u5-10.el6.4.x86_64.rpm          2012-04-17 21:59   15M
gridengine-devel-6.2u5-10.el6.4.i686.rpm      2012-04-17 21:59   74K
gridengine-devel-6.2u5-10.el6.4.x86_64.rpm    2012-04-17 21:59   74K
gridengine-execd-6.2u5-10.el6.4.x86_64.rpm    2012-04-17 21:59   1.3M
gridengine-qmaster-6.2u5-10.el6.4.x86_64.rpm  2012-04-17 21:59   1.5M
gridengine-qmon-6.2u5-10.el6.4.x86_64.rpm     2012-04-17 21:59   1.4M
gridengine RPM build for: Red Hat EL 6.
Name        : gridengine
Version     : 6.2u5
Vendor      : Fedora Project
Release     : 10.el6.4
Date        : 2012-04-17 20:58:35
Group       : Applications/System
Source RPM  : gridengine-6.2u5-10.el6.4.src.rpm
Size        : 42.94 MB
Packager    : Fedora Project
Summary     : Grid Engine - Distributed Computing Management software
Description :
In a typical network that does not have distributed resource management
software, workstations and servers are used from 5% to 20% of the time.
Even technical servers are generally less than fully utilized. This
means that there are a lot of cycles that can be used productively if
only users know where they are, can capture them, and put them to work.
Grid Engine finds a pool of idle resources and harnesses it
productively, so an organization gets as much as five to ten times the
usable power out of systems on the network. That can increase utilization
to as much as 98%.
Grid Engine software aggregates available compute resources and
delivers compute power as a network service.
These are the local files shared by both the qmaster and execd
daemons. You must install this package in order to use any one of them.
RPM found in directory: /mirror/download.fedora.redhat.com/pub/fedora/epel/6/x86_64
Provides :
config(gridengine)
libcore.so()(64bit)
libdrmaa.so.1.0()(64bit)
libjgdi.so()(64bit)
libjuti.so()(64bit)
libspoolb.so()(64bit)
libspoolc.so()(64bit)
perl(JSV)
gridengine
gridengine(x86-64)
Content of RPM :
/etc/profile.d/sge.csh
/etc/profile.d/sge.sh
/etc/sysconfig/gridengine
/usr/bin/qalter-ge
/usr/bin/qconf
/usr/bin/qdel-ge
/usr/bin/qevent
/usr/bin/qhold-ge
/usr/bin/qhost
/usr/bin/qlogin
/usr/bin/qmake-ge
/usr/bin/qmod
/usr/bin/qping
/usr/bin/qquota
/usr/bin/qrdel
/usr/bin/qresub
/usr/bin/qrls-ge
/usr/bin/qrsh
/usr/bin/qrstat
/usr/bin/qrsub
/usr/bin/qselect-ge
/usr/bin/qsh
/usr/bin/qstat-ge
/usr/bin/qsub-ge
/usr/bin/qtcsh
/usr/bin/sge_shadowd
/usr/bin/sgepasswd
/usr/lib64/gridengine
/usr/lib64/gridengine/jgdi.jar
/usr/lib64/gridengine/juti.jar
There are 397 more files in this RPM.
The Grid Engine RPMs that you will need are:
Install the master RPM on the server which will be your master host. On this machine, run:
$SGE_ROOT/install_qmaster
I think after that you will have no questions about the quality of those RPMs. You might consider Son of Grid Engine 8.1.8 RPMs instead, which can be installed on RHEL 6.5 without major problems; installation instructions, including recommendations on how to resolve dependencies, are available.
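For those who still want to experiment with the EPEL packages, here is a rough sketch of how they would be pulled onto an RHEL/CentOS 6 master host. The package names come from the listing above; the yum commands themselves are an assumption about your setup (EPEL already enabled), not something the EPEL README is quoted as saying:

# On the intended qmaster host, with the EPEL repository enabled
yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon
# The install notes shipped with the package (see the mailing-list excerpt further down)
less /usr/share/doc/gridengine-6.2u5/README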
If you are an experienced builder and like challenges, you might also consider Open Grid Scheduler -- another abandonware descendant of Sun SGE 6.2u5 with some bug fixes and enhancements (such as support for cgroups in Linux).
Open Grid Scheduler/Grid Engine is released under the Sun Industry Standards Source License (SISSL). New code (new file) is licensed under the BSD license.
The fine print: Most of the code was taken from Sun Grid Engine (more specifically SGE 6.2 update 5 released in 2009), which was developed by Sun Microsystems. Using 6.2u5 as the starting point, we add new features and fixes to create Open Grid Scheduler/Grid Engine.
Configuration of the classic Sun distribution is very close to the configuration of Oracle Grid Engine (which is actually a rebranded Sun SGE 6.2u7), so the Oracle documentation can be used. See Oracle Grid Engine.
I've had Sun GridEngine running on our cluster of 12-core HP blades from its earliest days. What has not been working is the inter-host communication (the ability of the system to schedule and distribute jobs across the nodes). I therefore set out to fix this situation. It turns out that the problems that prevented this from working are mainly caused by quirks in the way that the Debian (and, by inheritance, Ubuntu) packaging was done.
Prerequisites for gridengine: Most of the problems that I saw with the Debianised gridengine system are due to a lack of these prerequisites:
1. check the hosts file for localhost.localdomain type entries. If these are present, they will cause host communication to fail. Ensure that, at minimum, there is an entry in the hosts file of the master for each exec node, and in the hosts file of the exec nodes there should be an entry for the master. For example:
I will set up a cluster between my desktop machine, KWIAT22 and my laptop, caleb.
/etc/hosts on KWIAT22 contains:
127.0.0.1 localhost
#127.0.0.1 localhost.localdomain localhost
129.67.46.129 KWIAT22
129.67.46.255 caleb
plus some other irrelevant entries. Note that localhost.localdomain is commented out.
/etc/hosts on caleb contains:
127.0.0.1 caleb
#127.0.0.1 localhost.localdomain localhost
129.67.46.255 caleb
129.67.46.129 KWIAT22
Note again, the localhost.localdomain entry has been commented out.
2. Java is required for inter-host communication. We will use Sun Java, as it is assumed to be most compatible with Sun GridEngine. Edit /etc/apt/sources.list and uncomment the entries for the partner repository:
deb http://archive.canonical.com/ubuntu maverick partner
deb-src http://archive.canonical.com/ubuntu maverick partner
Then install the JRE:
apt-get install sun-java6-jre
Check which version of java we've got selected:
root@caleb:~# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
From that we can see that I still have OpenJDK selected, so we change that:
root@caleb:~# update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/jvm/java-6-openjdk/jre/bin/java 1061 auto mode
1 /usr/lib/jvm/java-6-openjdk/jre/bin/java 1061 manual mode
2 /usr/lib/jvm/java-6-sun/jre/bin/java 63 manual mode
Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-6-sun/jre/bin/java to provide /usr/bin/java (java) in manual mode.
root@caleb:~# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Now that we have these prerequisites satisfied, we can install the relevant gridengine packages. Installing gridengine on Ubuntu systems is made simple by the packages. We can install the packages on the master node (in our case KWIAT22):
apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
Configure SGE automatically? Yes
SGE cell name: default
SGE master hostname: KWIAT22 (this should be the fully qualified domain name of the SGE master, not localhost)
Output will typically look something like this:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
gridengine-common
The following NEW packages will be installed:
gridengine-client gridengine-common gridengine-exec gridengine-master gridengine-qmon
0 upgraded, 5 newly installed, 0 to remove and 37 not upgraded.
Need to get 0 B/18.7 MB of archives.
After this operation, 44.8 MB of additional disk space will be used.
Do you want to continue [Y/n]?
Preconfiguring packages ...
Selecting previously deselected package gridengine-common.
(Reading database ... 372804 files and directories currently installed.)
Unpacking gridengine-common (from .../gridengine-common_6.2u5-1ubuntu1_all.deb) ...
Selecting previously deselected package gridengine-client.
Unpacking gridengine-client (from .../gridengine-client_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-exec.
Unpacking gridengine-exec (from .../gridengine-exec_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-master.
Unpacking gridengine-master (from .../gridengine-master_6.2u5-1ubuntu1_amd64.deb) ...
Selecting previously deselected package gridengine-qmon.
Unpacking gridengine-qmon (from .../gridengine-qmon_6.2u5-1ubuntu1_amd64.deb) ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
Setting up gridengine-common (6.2u5-1ubuntu1) ...
Creating config file /etc/default/gridengine with new version
Setting up gridengine-client (6.2u5-1ubuntu1) ...
Setting up gridengine-exec (6.2u5-1ubuntu1) ...
error: communication error for "KWIAT22/execd/1" running on port 6445: "can't bind socket"
error: commlib error: can't bind socket (no additional information available)
..........................
critical error: abort qmaster registration due to communication errors
daemonize error: child exited before sending daemonize state
Setting up gridengine-master (6.2u5-1ubuntu1) ...
Initializing cluster with the following parameters:
=> SGE_ROOT: /var/lib/gridengine
=> SGE_CELL: default
=> Spool directory: /var/spool/gridengine/spooldb
=> Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete
Setting up gridengine-qmon (6.2u5-1ubuntu1) ...
Note that the execd cannot bind the socket. This occurs because of a left-over execd that failed to stop from a previous install. It also results if you don't have java installed, as the execd won't respond to /etc/init.d/gridengine-exec stop without java. Also, if you're doing an apt-get purge gridengine-* to get back to a fresh slate, typically the execd will not be stopped properly, despite being removed from the system. This can be fixed by:
root@KWIAT22:~# ps aux | grep sge
sgeadmin 22244 0.0 0.0 135172 4940 ? Sl 17:42 0:00 /usr/lib/gridengine/sge_qmaster
sgeadmin 24272 0.0 0.0 58688 2500 ? Sl May16 0:22 /usr/lib/gridengine/sge_execd
root@KWIAT22:~# kill 24272
root@KWIAT22:~# /etc/init.d/gridengine-exec start
root@KWIAT22:~# /etc/init.d/gridengine-master restart
* Restarting Sun Grid Engine Master Scheduler sge_qmaster
The logfiles we can use for tracking down problems in communication between the qmaster and execd processes are not in the standard Debian/Ubuntu locations. Instead, they are stored in /var/spool/gridengine/qmaster/messages for the qmaster and in /tmp/execd_messages.[pid] or /var/spool/gridengine/execd/messages for the execd processes. The log messages for our previous socket problem look like this (/tmp/execd_messages.24107):
05/16/2011 20:17:16| main|KWIAT22|E|communication error for "KWIAT22/execd/1" running on port 6445: "can't bind socket"
05/16/2011 20:17:17| main|KWIAT22|E|commlib error: can't bind socket (no additional information available)
05/16/2011 20:17:45| main|KWIAT22|C|abort qmaster registration due to communication errors
05/16/2011 20:17:47| main|KWIAT22|W|daemonize error: child exited before sending daemonize state
If you see any lines containing |E| then you have an error that must be addressed. Any lines with |W| are warnings, and it's probably wise to fix those too.
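A quick way to pull out only the interesting lines is a grep over the message files; a minimal sketch, using the locations mentioned above (adjust the paths if your spool directory differs):

# Errors (|E|) and critical messages (|C|) from the execd log fragments in /tmp
grep -E '\|[EC]\|' /tmp/execd_messages.*
# Errors, criticals and warnings from the spool-based execd log
grep -E '\|[EWC]\|' /var/spool/gridengine/execd/messages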
On the exec nodes:
apt-get install gridengine-exec
Configure SGE automatically? yes
SGE cell name: default
SGE master hostname: KWIAT22
After installing, you will see the following error in the /tmp/execd_messages.[pid] file, and the process will exit:
05/18/2011 17:53:00| main|caleb|E|getting configuration: denied: host "caleb" is neither submit nor admin host
05/18/2011 17:53:05| main|caleb|C|can't get configuration qmaster - terminating
This occurs because the master doesn't yet know about the exec node. We need to set up a basic configuration on the master. We will use the documentation in /usr/share/doc/gridengine-common/README.Debian, which I will duplicate here, to form the basis of our configuration:
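For orientation, the README.Debian sequence that the following sources walk through boils down to a handful of qconf calls on the master. A condensed sketch, using the hostnames of this example (KWIAT22 as master, caleb as exec node) and a placeholder user name myuser:

# All commands are run on the qmaster (KWIAT22)
sudo -u sgeadmin qconf -am myuser              # add yourself as a manager
qconf -au myuser users                         # ...and to the "users" access list
qconf -as KWIAT22                              # allow submission from the master
qconf -ahgrp @allhosts                         # create a host group (save the template unchanged)
qconf -aattr hostgroup hostlist KWIAT22 @allhosts
qconf -aq main.q                               # create a queue (save the template unchanged)
qconf -aattr queue hostlist @allhosts main.q   # attach the host group to the queue
qconf -aattr queue slots "[KWIAT22=1]" main.q  # give the master one slot
qconf -as caleb                                # later: declare the exec node as a submit host
qconf -aattr hostgroup hostlist caleb @allhosts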
Petter Gustad gridengine at gustad.com
Tue Oct 30 13:20:23 UTC 2012
From: Reuti <reuti at staff.uni-marburg.de>
Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
Date: Tue, 30 Oct 2012 11:27:49 +0100

Thank you for your reply.

> Am 30.10.2012 um 10:53 schrieb Petter Gustad:
>> Does anybody have a pointer to GRE installation docs for CentOS 6.3?
>> I've been running GRE version 6.2u5p2 (built from source) on Gentoo systems for some time. Now I'm trying to add some new nodes running CentOS 6.3 using the gridengine-6.2u5-10.el6.4.x86_64 package.
>
> Just use the version you have already in the shared /usr/sge or your particular mountpoint.

I should probably try this first, at least to verify that it's working. But later I would like to migrate to CentOS on all my exechosts and leave the installation to somebody else.

> For exechosts there is no real installation necessary. It's sufficient to add the new exechosts as adminhosts, and then start the sgeexecd on the nodes (you might want to install it in /etc/init.d/sgeexecd or alike with appropriate links so that they start while booting). The script you will find in /usr/sge/default/common/sgeexecd.

I'll try this manual approach.

> During startup of the sgeexecd they will become exechosts automatically in SGE's list of exechosts.
> NB: It's not advisable to mix different versions of SGE in a cluster (while it's fine to mix different platforms of the same version).

OK. I will try to get the old version running first, then migrate to the more recent version as I replace Gentoo with CentOS on the exechosts.

> PS: You installed SGE in addition locally on each node with the gridengine-6.2u5-10.el6.4.x86_64 package? It might get confused having more than one version of the tools.

There is no PATH pointing to the old version. When I first tried, the old version was not even mounted. However, I would assume CentOS users with no previous installation would experience the same problem.

Thank you again for your helpful reply.

Best regards
//Petter

>> But the installation procedure seems to be somewhat different from what I'm used to. So where can I find an installation guide for the CentOS version?
>> My problem seems to be related to sge_coshepherd and sge_shepherd being missing. Is this a problem with the CentOS 6.3 package or is there a different installation procedure on CentOS?
>> Here's the error message I get when I try to run inst_sge:
>>
>> # export SGE_ROOT=/usr/share/gridengine
>> # sh ./inst_sge -x
>> missing program >sge_coshepherd< in directory >./bin/lx26-amd64<
>> missing program >sge_shepherd< in directory >./bin/lx26-amd64<
>>
>> Missing Grid Engine binaries!
>>
>> A complete installation needs the following binaries in >./bin/lx26-amd64<:
>> qacct    qlogin   qrsh         sge_shepherd
>> qalter   qmake    qselect      sge_coshepherd
>> qconf    qmod     qsh          sge_execd
>> qdel     qmon     qstat        sge_qmaster
>> qhold    qresub   qsub         qhost
>> qrls     qtcsh    sge_shadowd  qping
>> qquota
>>
>> and the binaries in >./utilbin/lx26-amd64< should be:
>> adminrun  gethostbyaddr  loadcheck      rlogin        uidgid
>> authuser  checkprog      gethostbyname  now           rsh
>> infotext  checkuser      gethostname    openssl       rshd
>> filestat  getservbyname  qrsh_starter   testsuidroot
>>
>> Installation failed. Exit.
>>
>> Thanks!
>> Best regards
>> //Petter
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
[gridengine users] Configure gridengine on CentOS 6.3
Reuti reuti at staff.uni-marburg.de
Tue Oct 30 17:23:31 UTC 2012
Hi,

Am 30.10.2012 um 14:20 schrieb Petter Gustad:

>> Just use the version you have already in the shared /usr/sge or your particular mountpoint.
> I should probably try this first, at least to verify that it's working. But later I would like to migrate to CentOS on all my exechosts and leave the installation to somebody else.

Then it's best to start with the qmaster.

>> For exechosts there is no real installation necessary. It's sufficient to add the new exechosts as adminhosts, and then start the sgeexecd on the nodes (you might want to install it in /etc/init.d/sgeexecd or alike with appropriate links so that they start while booting). The script you will find in /usr/sge/default/common/sgeexecd.
> I'll try this manual approach.
>> NB: It's not advisable to mix different versions of SGE in a cluster (while it's fine to mix different platforms of the same version).
> OK. I will try to get the old version running first, then migrate to the more recent version as I replace Gentoo with CentOS on the exechosts.

Only the exechosts? Then I suggest reinstalling the qmaster on the head node with the version you want to use on all exechosts.

>> PS: You installed SGE in addition locally on each node with the gridengine-6.2u5-10.el6.4.x86_64 package? It might get confused having more than one version of the tools.
> There is no PATH pointing to the old version. When I first tried, the old version was not even mounted.

It's quite usual to mount /usr/sge (or alike) and /home in the cluster and have only SGE's spool directory local (e.g. /var/spool/sge): http://arc.liv.ac.uk/SGE/howto/nfsreduce.html

-- Reuti

Orion Poplawski orion at cora.nwra.com
Tue Oct 30 14:46:37 UTC 2012
On 10/30/2012 03:53 AM, Petter Gustad wrote:
> Does anybody have a pointer to GRE installation docs for CentOS 6.3?
> I've been running GRE version 6.2u5p2 (built from source) on Gentoo systems for some time. Now I'm trying to add some new nodes running CentOS 6.3 using the gridengine-6.2u5-10.el6.4.x86_64 package.
> But the installation procedure seems to be somewhat different from what I'm used to. So where can I find an installation guide for the CentOS version?

Strictly speaking it's not CentOS, but EPEL. The install guide is in: /usr/share/doc/gridengine-6.2u5/README

> My problem seems to be related to sge_coshepherd and sge_shepherd being missing. Is this a problem with the CentOS 6.3 package or is there a different installation procedure on CentOS?

Have you installed the gridengine-execd package on the exec hosts? Yes, the install is a little different. See the README.

-- Orion Poplawski, Technical Manager, NWRA Boulder Office, 303-415-9701 x222, FAX: 303-415-9702, 3380 Mitchell Lane, Boulder, CO 80301, orion at nwra.com, http://www.nwra.com
June 9, 2011 | wiki.unixh4cks.com
On master node
Installing prerequisites
apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi xfs xfstt
nano /etc/apt/sources.list
Uncomment these two lines:
deb http://archive.canonical.com/ubuntu natty partner
deb-src http://archive.canonical.com/ubuntu natty partner
Install the Java Runtime:
apt-get install sun-java6-jre
If you have multiple Java installations, select the required one (Sun Java 1.6 or higher) using:
update-alternatives --config java
nano /etc/hosts
Setting up environment
- Provide proper host definition
192.168.122.75 sge0.shadow.local sge0
192.168.122.115 sge1.shadow.local sge1
Install the Grid Engine master, client, and exec packages on the master node:
apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
Configure SGE automatically? Yes
SGE cell name: default
SGE master hostname: sge0.shadow.local (this should be the fully qualified domain name of the SGE master, not localhost)
On exec node
Install sun-java6-jre as above and set the proper host definition in the /etc/hosts file.
Install Gridengine exec package
apt-get install gridengine-exec
See the status of the exec process:
root@sge1:~# cat /tmp/execd_messages.[PID]
06/08/2011 21:48:52| main|sge1|E|can't connect to service
06/08/2011 21:48:52| main|sge1|E|can't get configuration from qmaster -- backgrounding
This occurs because the master doesn't yet know about the exec node. We need to set up a basic configuration on the master.
Configuration
We will use the documentation in /usr/share/doc/gridengine-common/README.Debian
- Initially, only the sgeadmin user has admin privileges. It is suggested that you add yourself as a manager and perform the rest of these tasks as your own user:
Syntax : sudo -u sgeadmin qconf -am user_name
root@sge0:~# sudo -u sgeadmin qconf -am basil
[email protected] added "basil" to manager list
- and to a userlist.
Syntax: qconf -au myuser users
root@sge0:~# qconf -au basil users
added "basil" to access list "users"
- Add a submission host.
Syntax : qconf -as myhost.mydomain
root@sge0:~# qconf -as sge0.shadow.local
sge0.shadow.local added to submit host list
- Add a new host group.
Syntax : qconf -ahgrp @allhosts
root@sge0:~# qconf -ahgrp @allhosts
# Just save the file without modifying it
[email protected] added "@allhosts" to host group list
- Add the exec host to the @allhosts list.
Syntax : qconf -aattr hostgroup hostlist myhost.mydomain @allhosts
root@sge0:~# qconf -aattr hostgroup hostlist sge0.shadow.local @allhosts
[email protected] modified "@allhosts" in host group list
- Add a queue.
Syntax : qconf -aq main.q
root@sge0:~# qconf -aq main.q
# just save the file without modifying it
[email protected] added "main.q" to cluster queue list
- Add the host group to the queue.
Syntax : qconf -aattr queue hostlist @allhosts main.q
root@sge0:~# qconf -aattr queue hostlist @allhosts main.q
[email protected] modified "main.q" in cluster queue list
- Make sure there is a slot allocated to the execd.
Syntax : qconf -aattr queue slots "[myhost.mydomain=1]" main.q
root@sge0:~# qconf -aattr queue slots "2, [sge0.shadow.local=3]" main.q
[email protected] modified "main.q" in cluster queue list
2 by default for all nodes, 1 specifically for sge0.shadow.local, which leaves 1 of the 2 cpus free for the master process.
adding Exec node to the grid
We then add sge1.shadow.local as a submit and exec host
root@sge0:~# qconf -as sge1.shadow.local
sge1.shadow.local added to submit host list
hostname              sge1.shadow.local
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
root@sge0:~# qconf -aattr hostgroup hostlist sge1.shadow.local @allhosts
[email protected] modified "@allhosts" in host group list
Kill the sge_execd process on the exec node and then start it via the init.d script. Check that it doesn't create a log file in /tmp/execd_messages.[pid]. If it doesn't, then it's OK.
Back on our master node, a qstat -f should now show us all set up. Use qmon if the master node has X running.
root@sge0:~# qstat -f
queuename                      qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
main.q@sge0.shadow.local       BIP   0/0/3          0.11     lx26-amd64
---------------------------------------------------------------------------------
main.q@sge1.shadow.local       BIP   0/0/2          0.02     lx26-amd64
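At this point it is worth submitting a throw-away job to confirm that the exec node really accepts work. A minimal sketch using only standard SGE client commands (the sleep job, the output path, and the job id are arbitrary examples):

qsub -b y -j y -o /tmp/sge_test.log sleep 30   # submit a trivial binary job
qstat -f                                       # it should appear in one of the main.q instances
qacct -j <job_id>                              # accounting record once it has finished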
- Download file for SGE, ge62u5_lx24-x86.tar.gz and ompi_par.txt
- Untar the archive:
tar -zxvf ge62u5_lx24-x86.tar.gz
cd ge6.2u5
- Create directory /usr/local/SGE and uncompress the two archives into it:
mkdir /usr/local/SGE
cp ge-6.2u5-bin-lx24-x86.tar.gz /usr/local/SGE
cp ge-6.2u5-common.tar.gz /usr/local/SGE
cd /usr/local/SGE
tar -zxvf ge-6.2u5-bin-lx24-x86.tar.gz
tar -zxvf ge-6.2u5-common.tar.gz
- Run the installation script for the master daemon:
cd /usr/local/SGE
./install_qmaster
Choose the 'classic spooling' setting during the installation. The other parameters can be the default ones.
- Add both the desktop and the node to the list of administrative hosts. Add the parallel environment for MPI.
qconf -ah desktop01
qconf -ah node01
qconf -Ap ompi_par.txt
./install_execd
- include the following line in file /etc/profile
. /usr/local/SGE/default/common/settings.sh
- Archive the installation and copy it onto node01:
cd /usr/local
tar -zcvf SGE.tar SGE
scp SGE.tar root@node01:/usr/local
- On the node, run installation of the execution daemon
cd /usr/local
tar -zxvf SGE.tar
cd SGE
./install_execd
- Install libmotif3 libraries:
apt-get install libmotif3
source /etc/profile
- Complete the SGE configuration by running qmon
qmon
fsl.fmrib.ox.ac.uk
This is a quick walk through to get Grid Engine going on Linux for those who would like to use it for something like FSL. This documentation is a little old, being written when the Grid Engine software was owned by Sun and often referred to as SGE (Sun Grid Engine). However, this covers the basic requirements. A quick start guide for Ubuntu/Debian is available here, but more detailed setup can be found on this page.
Since the demise of the open source (Sun) Grid Engine, various ports have sprung up. Ubuntu/Debian package the last publicly available release (6.2u5), but users of Red Hat variants (CentOS, Scientific Linux) or Debian/Ubuntu users wishing to use a more modern release should look to installing Son of Grid Engine which makes available RPM and DEB packages and is still actively maintained (last update November 2013).
Grid Engine generally consists of one master (qmaster) and a number of execute (exec) hosts. Note that the qmaster machine can also be an exec host, which is fine for small deployments, but large clusters should look to keeping these functions separate.
This documentation was originally produced by A. Janke ([email protected]) and is now maintained by the FSL team.
sit.auckland.ac.nz
The Sun Grid Engine makes it easy for users to run compute jobs.
http://gridengine.sunsource.net/
The Sun Grid Engine (SGE) allows jobs to be queued for running on a suitable compute host. Suitable means that there is currently spare CPU time, sufficient memory for your job, and any other number of characteristics which you wish to test. It can also be used to launch jobs running MPI.
From an administrator's perspective it allows a lot more control over jobs. If a host is overloaded (due to no fault of SGE's - sometimes jobs consume > 100% CPU) jobs can easily be suspended or rescheduled to run on a less-loaded host. Jobs will be run, and they'll be resubmitted until they complete (depending on job options).
A variety of material regarding SGE is on Stephen Cope's Sun Grid Engine page. This covers both administration, using SGE, and other common questions from users (namely, "I want my job running on the fastest machine, now.")
Available Sun Grid Engine installations:
web.stanford.edu
We're using the Debian packages of "Sun Grid Engine" which isn't quite "Sun" anymore since Oracle bought Sun, and the Debian packages are a bit behind the current forks of Open Grid Engine or Son of Grid Engine or Univa Grid Engine.
June 05, 2012 | Lindqvist
Finally, I've got nfs set up to share a folder from the front node (~/jobs) to all my subnodes. See here for instructions on how to set it up: http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-sharing-folder.html
When you use ecce, you can and SHOULD use local scratch folders i.e. use your nfs shared folder as the runtime folder, but set scratch to e.g. /tmp which isn't an nfs exported folder.
Before you start, stop and purge
If you've tried installing and configuring gridengine in the past, there may be processes and files which will interfere. On each computer do:
ps aux|grep sge
use sudo kill to kill any sge processes
Then
sudo apt-get purge gridengine-*
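If you have more than a couple of nodes, the same kill-and-purge dance can be scripted over ssh. This is only a rough sketch: the host names are examples from this walk-through, and pkill is used here as a shortcut for the manual ps/kill shown above:

# Example host names; adjust to your own cluster
for host in beryllium boron; do
    ssh root@$host 'pkill -f sge_qmaster; pkill -f sge_execd; apt-get -y purge gridengine-*'
done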
First install sun/oracle java on all nodes.[UPDATE 24 Aug 2013: openjdk-6-jre or openjdk-7-jre work fine, so you can skip this]
There's no sun/oracle java in the debian testing repos anymore, so we'll follow this: http://verahill.blogspot.com.au/2012/04/installing-sunoracle-java-in-debian.html
sudo apt-get install java-package
Download the jre-6u31-linux-x64.bin from here: http://java.com/en/download/manual.jsp?locale=en
make-jpkg jre-6u31-linux-x64.bin
sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb
Then select your shiny oracle java by doing:
sudo update-alternatives --config java
sudo update-alternatives --config javaws
Do that one every node, front and subnodes. You don't have to do all the steps though: you just built oracle-j2re1.6_1.6.0+update31_amd64.deb so copy that to your nodes, do sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb and then do the sudo update-alternatives dance.
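A sketch of that copy-and-install step over the subnodes; boron is the subnode used later in this walk-through, any other names would be placeholders, and update-alternatives --set is simply the non-interactive form of the --config step (the Sun JRE path matches the one selected earlier):

for host in boron; do    # add further subnodes here
    scp oracle-j2re1.6_1.6.0+update31_amd64.deb $host:/tmp/
    ssh -t $host 'sudo dpkg -i /tmp/oracle-j2re1.6_1.6.0+update31_amd64.deb && sudo update-alternatives --set java /usr/lib/jvm/java-6-sun/jre/bin/java'
done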
Front node:
sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
(at the moment this installs v 6.2u5-7)
I used the following:
Configure automatically: yes
=> SGE_ROOT: /var/lib/gridengine
Cell name: rupert
Master hostname: beryllium
=> SGE_CELL: rupert
=> Spool directory: /var/spool/gridengine/spooldb
=> Initial manager user: sgeadmin
Once it was installed, I added myself as an sgeadmin:
sudo -u sgeadmin qconf -am ${USER}
sgeadmin@beryllium added "verahill" to manager list
and to the user list:
qconf -au ${USER} users
added "verahill" to access list "users"
We add beryllium as a submit host:
qconf -as beryllium
beryllium added to submit host list
Create the group allhosts:
qconf -ahgrp @allhosts
group_name @allhosts
hostlist NONE
I made no changes.
Add beryllium to the hostlist
qconf -aattr hostgroup hostlist beryllium @allhosts
verahill@beryllium modified "@allhosts" in host group list
qconf -aq main.q
This opens another text file. I made no changes.
verahill@beryllium added "main.q" to cluster queue list
Add the host group to the queue:
qconf -aattr queue hostlist @allhosts main.q
verahill@beryllium modified "main.q" in cluster queue list
1 core on beryllium is added to SGE:
qconf -aattr queue slots "[beryllium=1]" main.q
verahill@beryllium modified "main.q" in cluster queue list
Add execution host:
qconf -ae
which opens a text file in vim. I edited hostname (boron) but nothing else. Saving returns:
added host boron to exec host list
Add boron as a submit host:
qconf -as boron
boron added to submit host list
Add 3 cores for boron:
qconf -aattr queue slots "[boron=3]" main.q
Add boron to the queue
qconf -aattr hostgroup hostlist boron @allhosts
Here's my history list in case you can't be bothered to read everything in detail above.
2015 sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
2016 sudo -u sgeadmin qconf -am ${USER}
2017 qconf -help
2018 qconf user_list
2019 qconf -au ${USER} users
2020 qconf -as beryllium
2021 qconf -ahgrp @allhosts
2022 qconf -aattr hostgroup hostlist beryllium @allhosts
2023 qconf -aq main.q
2024 qconf -aattr queue hostlist @allhosts main.q
2025 qconf -aattr queue slots "[beryllium=1]" main.q
2026 qconf -as boron
2027 qconf -ae
2028 qconf -aattr hostgroup hostlist beryllium @allhosts
2029 qconf -aattr queue slots "[boron=3]" main.q
2030 qconf -aattr hostgroup hostlist boron @allhosts
Next, set up your subnodes:
My example here is a subnode called boron.
On the subnode:
sudo apt-get install gridengine-exec gridengine-client
Configure automatically: yes
This node is called boron.
Cell name: rupert
Master hostname: beryllium
Check whether sge_execd got started after the install:
ps aux|grep sge
sgeadmin 25091 0.0 0.0 31712 1968 ? Sl 13:54 0:00 /usr/lib/gridengine/sge_execd
If not, and only if not, do:
/etc/init.d/gridengine-exec start
cat /tmp/execd_messages.*
If there's no message corresponding to the current iteration of sge (i.e. you may have old error messages from earlier attempts) then you're probably in a good place.
Back to the front node:
qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      6  0.57    7.8G    3.9G   14.9G  597.7M
boron                   lx26-amd64      3  0.62    3.8G  255.6M   14.9G     0.0
If the exec node isn't recognised (i.e. it's listed but no cpu info or anything else) then you're in a dark place. Probably you'll find a message about "request for user soandso does not match credentials" in your /tmp/execd_messages.* files on the exec node. The only way I got that solved was stopping all sge processes everywhere, purging all gridengine-* packages on all nodes and starting from the beginning -- hence why I posted the history output above.
qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
main.q@beryllium BIP 0/0/1 0.64 lx26-amd64
---------------------------------------------------------------------------------
main.q@boron BIP 0/0/3 0.72 lx26-amd64
GitHub
Debian provides binaries for Grid Engine with the packages gridengine-master, gridengine-exec, and gridengine-client. The queue master stores its log in /var/spool/gridengine/qmaster/messages. It contains scheduling decisions and error information about the daemon as well as failed jobs. The corresponding daemon sge_qmaster needs to be running in order to accept jobs; you can check this by looking for processes owned by the user sgeadmin. Control the master daemon using the script /etc/init.d/gridengine-master. In order to accept jobs from the queue master, each execution node needs to have a correctly configured sge_execd daemon running under the user account sgeadmin. Control the execution daemons using the init script /etc/init.d/gridengine-exec.
In case of communication problems between the queue master and an exec node, look out for log files like /tmp/execd_messages.[pid]. The queue master also indicates authorization problems with execution nodes in its log file. The job spool directory is located in /var/spool/gridengine/execd/.
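A quick health check that follows directly from this description: look for the daemons running under the sgeadmin account and glance at the message files (paths as given above; the tail length is arbitrary):

# Are sge_qmaster / sge_execd running under sgeadmin?
ps -u sgeadmin -o pid,lstart,cmd
# Recent qmaster and execd log entries
tail -n 20 /var/spool/gridengine/qmaster/messages
tail -n 20 /tmp/execd_messages.* 2>/dev/null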
Installation
The simplest setup configures a single machine to host the Grid Engine queue master, to act as an execution node, and to be a job submit node with the client command-line interface. The following example is built with a virtual machine named lxdev01.devops.test running Debian Wheezy as its operating system.
" apt-get install gridengine-master gridengine-exec gridengine-client
[...SNIP...]
" /etc/init.d/gridengine-exec start
" qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.42  497.0M   64.7M     0.0     0.0
After installing all the packages, the queue master daemon sge_qmaster should be running. Once the init script gridengine-exec starts an instance of the sge_execd daemon, the host can execute jobs. Before a job can be submitted, a host group @default is defined, which in turn is used to configure a queue default.
" qconf -ahgrp @default
[email protected] added "@default" to host group list
" qconf -shgrp @default
group_name @default
hostlist lxdev01.devops.test
" qconf -aq default
[email protected] added "default" to cluster queue list
" qconf -sq default | head -2
qname default
hostlist @default
The last thing to do is to add the host to the list of submit nodes.
" qstat -g c
CLUSTER QUEUE   CQLOAD  USED  RES  AVAIL  TOTAL  aoACDS  cdsuE
--------------------------------------------------------------------------------
default           0.02     0    0      1      1       0      0
" qstat -f
queuename                      qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
default@lxdev01.devops.test    BIP   0/0/1          0.01     lx26-amd64
" qconf -as $(hostname -f)
lxdev01.devops.test added to submit host list
Installation and configuration are done with root privileges; to submit the first job, a user account devops is used.
" cat echo.sge
echo $USER@`hostname`:`pwd`
" qsub -j y -o /tmp/job.log -wd /tmp echo.sge
Your job 1 ("echo.sge") has been submitted
" qstat
job-ID  prior    name      user    state  submit/start at      queue  slots
--------------------------------------------------------------------------------
      1 0.00000  echo.sge  devops  qw     12/06/2012 13:48:25         1
" cat /tmp/job.log
devops@lxdev01:/tmp
" qacct -j 1
==============================================================
qname        default
hostname     lxdev01.devops.test
group        devops
owner        devops
project      NONE
department   defaultdepartment
jobname      echo.sge
[...SNIP...]
Adding Another Execution-Node
To actually build a "cluster" of machines, at least a second execution node lxdev02.devops.test is needed. Before this node is installed, we can add it to the @default host group.
" qconf -mhgrp @default
lxdev01.devops.test modified "@default" in host group list
" qconf -shgrp @default
group_name @default
hostlist lxdev01.devops.test lxdev02.devops.test
" qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.01  497.0M   65.5M     0.0     0.0
lxdev02.devops.test     -               -     -       -       -       -       -
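As the mailing-list excerpt earlier on this page points out, a new exec node normally also has to be known to the qmaster as an administrative host before its execd can register cleanly. A one-line sketch, run on the qmaster with the hostname from this example:

qconf -ah lxdev02.devops.test    # declare the new node as an administrative host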
On the node itself, install only the execution node package and configure the address of the queue master in the file /var/lib/gridengine/default/common/act_qmaster.
" apt-get install gridengine-exec
[...SNIP...]
" echo "lxdev01.devops.test" > /var/lib/gridengine/default/common/act_qmaster
" service gridengine-exec restart
Restarting Sun Grid Engine Execution Daemon: sge_execd.
After restarting the execution daemon, it should register with the queue master.
" qhost
HOSTNAME                ARCH         NCPU  LOAD   MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -        -       -       -       -
lxdev01.devops.test     lx26-amd64      1  0.01   497.0M   65.7M     0.0     0.0
lxdev02.devops.test     lx26-amd64      1  0.14  1003.0M   64.5M     0.0     0.0
" for i in {1..10}; do qsub -b y sleep -- 10 ; done
Your job 7 ("sleep") has been submitted
Your job 8 ("sleep") has been submitted
Your job 9 ("sleep") has been submitted
Your job 10 ("sleep") has been submitted
Your job 11 ("sleep") has been submitted
Your job 12 ("sleep") has been submitted
Your job 13 ("sleep") has been submitted
Your job 14 ("sleep") has been submitted
Your job 15 ("sleep") has been submitted
Your job 16 ("sleep") has been submitted
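To see whether those sleep jobs really spread over both execution hosts, the same status commands used earlier on this page are enough; a minimal sketch:

qstat -g c          # cluster-queue summary; TOTAL should now count the slots of both hosts
qstat -f            # per-instance view of default@lxdev01... and default@lxdev02...
watch -n 5 qstat    # optional: refresh every 5 seconds while the sleep jobs drain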