1. Description
This node will be a Condor Submit node for the user jobs. You can install many such nodes to improve scalability of your system.
3. Needed software
Software | Notes | Install Before glideinWMS |
Linux OS | A reasonably recent Linux OS (RH/SL4 and RH/SL5 tested at press time). | X |
The OSG client software. | This can be installed prior to glideinWMS, but the installer can install it inline with the glideinWMS install | |
The Condor distribution as a tarball. | The installer will use the tarball to install and configure Condor inline | |
The glideinWMS software. |
NOTE:
- Condor version v7.3.1 has a known issue with incorrect return/exit codes of condor_status and condor_q.
- If you are using Condor version v7.3.2, disable VOMS checking in the condor_config file used by Condor daemons other than the user schedd. VOMS checking adds unnecessary overhead. To do so, set
USE_VOMS_ATTRIBUTES = False
or, for individual Condor daemons such as the collector,
COLLECTOR.USE_VOMS_ATTRIBUTES = False
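To make the difference between the global and the per-daemon form concrete, here is a minimal Python sketch of how such a lookup could behave. It is a deliberately simplified model and not Condor's actual parser: it assumes one `KEY = value` pair per line, that later lines override earlier ones, that names are case-insensitive, and that a `SUBSYS.KEY` entry takes precedence over a plain `KEY` entry for that daemon. The function name `effective_setting` is hypothetical.

```python
def effective_setting(config_text, name, subsystem=None):
    """Return the value a daemon would see for `name`, or None if unset.

    Simplified model (an assumption, not Condor's real parser): one
    `KEY = value` per line, later lines override earlier ones, names
    compare case-insensitively, and `SUBSYS.KEY` wins over plain `KEY`.
    """
    values = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip().upper()] = value.strip()
    if subsystem is not None:
        scoped_key = "{}.{}".format(subsystem, name).upper()
        if scoped_key in values:
            return values[scoped_key]
    return values.get(name.upper())


# Only the collector is configured here, so other daemons see no setting.
per_daemon = "COLLECTOR.USE_VOMS_ATTRIBUTES = False\n"
print(effective_setting(per_daemon, "USE_VOMS_ATTRIBUTES", "COLLECTOR"))
print(effective_setting(per_daemon, "USE_VOMS_ATTRIBUTES", "SCHEDD"))
```

In this model the per-daemon line affects only the collector, while the global line would apply to every daemon reading that file. To confirm what a running daemon actually uses, check the real configuration rather than relying on this sketch.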
4. Before you begin...
4.1 Required Certificates/Proxies
Each service in glideinWMS uses an x509 certificate to identify itself using GSI authentication (see the Quick Reference Guide for an overview). The installer will ask for several DNs for GSI authentication. You have the option of using a service certificate or a proxy. These should be created and put in place before running the installer. The following is a list of DNs the installer will ask for:
- Pool Collector cert/proxy DN
- Glidein Frontend cert/proxy DN
4.2 Miscellaneous Notes
By default, match authentication will be used.
5. Installation instructions
The scheduler node software should be installed as root. While it is possible to run the schedds as a non-privileged user, it has some serious drawbacks. The whole process is managed by an install script described below.
Move into
glideinWMS/install
and execute
./glideinWMS_install
You will be presented with this screen:
Welcome to the glideinWMS Installation Helper
What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Schedds and Collector
[2] Glidein Factory
[3] GCB
[4] User Pool Collector
[5] User Schedd
[6] Condor for Glidein Frontend
[7] Glidein Frontend
[8] Components
Select 5.
Now follow the instructions and install all the software components. Further details and a walk-through are presented below:
Field | Installation Text | Description |
Condor | Where do you have the Condor tarball? Where do you want to install it? If something goes wrong with Condor, who should get email about it? | The user Schedd is part of the Condor pool that will actually run the user's jobs. This will be the server that will schedule jobs from the users to the glidein pilots that will be running in the grid. This piece of the install will configure the collector to work with the submitted glideins. For this, you will need a Condor distribution and a location to install to. It will also prompt for an administrator email. Installing this into a user home directory is not recommended. |
GSI Security | Where can I find the directory with the trusted CAs? | GSI security is based on x509 certificates. You will next need a certificate or proxy for the WMS collector. See the previous section for more information on required certificates and proxies. |
GSI Security | Please insert all such DNs, together with a user nickname. | This will populate the condor_mapfile with the DNs and nicknames. See GSI Reference for more information. You will need to provide the DN of the pool collector and the DN of the frontend. Match authentication is enabled by default. This changes the Condor security model to rely more on the Collector. When enabled, only the Collector needs to know the DNs of all glidein proxies, and the SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION attribute is set to True in the condor_config of the Schedd service. If you have a reason not to use it: be sure to set the USE_MATCH_AUTH attribute to False in both the Factory and Frontend configuration files; you will need to list the DNs of all glidein proxies in both the Collector and the Schedd; and you will need the DN of the glidein startd as well. Any other DNs you list will be put into the condor_mapfile for authentication. |
Collector | What node is the collector running (i.e. CONDOR_HOST)? | Here you will have to provide the host (and port if not default) of the User pool collector. |
GCB | Please list all the GCB servers you will be using | If you are using GCB instead of CCB (now included with Condor), you will need to put the hostnames of the GCB nodes here. Note that GCB is not recommended unless you have network restrictions that prevent you from using CCB. |
Tuning | Do you want to enable the shared_port_daemon?: (y/n) What port should it use?: [9615] | The shared port daemon drastically reduces the number of ports used. This port must be open on any firewalls. Recommended for Condor versions 7.5.4 and above. (Does not work with older versions of Condor.) |
Tuning | How many secondary schedds do you want? | The proper value depends on many factors, including the memory and CPU of the server running it as well as the number of jobs submitted and the number of entry points. This should be increased enough so that each schedd will not have to handle multiple glidein requests to different entry points from factories simultaneously. The default install of 10 schedds should be enough to handle a site with around 10000 jobs. If you are only running hundreds of jobs, you may want to tune this down. Conversely, with a higher number of jobs, this may need to be increased. This value depends on your installation and can later be tuned based on load and average number of jobs. A rough estimate is to use one collector per 100 glideins with a hard limit on 200 glideins per collector. |
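The GSI Security step in the table above writes the DNs and nicknames you enter into the condor_mapfile. The Python sketch below illustrates what such entries could look like, assuming the usual Condor map file layout of an authentication method, a regular expression for the principal, and a canonical name. The helper `mapfile_line` is hypothetical, and the exact escaping written by the installer may differ from what `re.escape` produces; the DNs and nicknames are the ones used in the sample session in this document.

```python
import re


def mapfile_line(dn, nickname):
    """Build one GSI map file entry that matches exactly the given DN.

    Assumed layout: GSI "<anchored regex for the DN>" <nickname>
    """
    # Escape regex metacharacters (for example the dots in host names)
    # so that the pattern matches the DN literally.
    pattern = "^{}$".format(re.escape(dn))
    return 'GSI "{}" {}'.format(pattern, nickname)


entries = [
    ("/DC=org/DC=doegrids/OU=Services/CN=collector1.my.org", "collector"),
    ("/DC=org/DC=doegrids/OU=Service/CN=gfactory/gfactory1.my.org", "pilot"),
]

for dn, nickname in entries:
    print(mapfile_line(dn, nickname))
```

Anchoring the pattern with `^` and `$` keeps a DN from matching merely because it contains a trusted DN as a substring. Compare the output with the condor_mapfile the installer actually generated before relying on it.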
Here a possible set of answers is presented; your setup will probably be slightly different:
Welcome to the glideinWMS Installation Helper
What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Schedds and Collector
[2] Glidein Factory
[3] GCB
[4] User Pool Collector
[5] User Schedd
[6] Condor for Glidein Frontend
[7] Glidein Frontend
[8] Components
Please select: 5
The following profiles will be installed:
[5] User Schedd
Installing user submit schedds
Installing condor
You will now need the Condor tarball
You can find it on http://www.cs.wisc.edu/condor/
Versions v7.4.4 and 7.5.3 have been tested, but you should always use the latest one
Where do you have the Condor tarball? /root/condor-7.4.4-linux-x86_64-rhel5-dynamic.tar.gz
Checking... Seems condor version 7.4.4
Where do you want to install it?: [/opt/glidecondor] /opt/glidecondor
Directory '/opt/glidecondor' does not exist, should I create it?: (y/n) y
Installing condor in '/opt/glidecondor'
If something goes wrong with Condor, who should get email about it?: admin@my.org
Extracting from tarball
Running condor_configure
Installing Condor from /opt/glidecondor/tar/condor-7.4.4 to /opt/glidecondor
Condor has been installed into:
/opt/glidecondor
Configured condor using these configuration files:
global: /opt/glidecondor/etc/condor_config
local: /opt/glidecondor/condor_local/condor_config.local
You should look inside the installation log for some details about how Condor was installed.
Created scripts which can be sourced by users to setup their Condor environment variables.
These are:
sh: /opt/glidecondor/condor.sh
csh: /opt/glidecondor/condor.csh
Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
The Condor config has been put in your login files
Please remember to exit and reenter the terminal after the install
Condor installed
Configuring GSI security
To use the GSI security, you either need a valid GSI proxy or a valid x509 certificate and relative key
The cert/proxy subject of User Schedd, Glidein Pool Collector and Glidein Frontend (used by glidein startd) will be added as the trusted daemon in GSI_DAEMON_NAME in the condor configuration.
Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /etc/grid-security/hostcert.pem
Where is your certificate key located?: /etc/grid-security/hostkey.pem
My DN = '/DC=org/DC=doegrids/OU=Services/CN=submit1.my.org'
Glidein Pool Collector and Glidein Frontend and glidein startd will connect to and act as client to User Schedd. Subjects for these services should be in the gridmap file of the User Schedd.
Please insert all such DNs, together with a user nickname.
An empty DN entry means you are done.
DN: /DC=org/DC=doegrids/OU=Services/CN=collector1.my.org
nickname: [condor001] collector
Is this a trusted Condor daemon?: (y/n) y
DN: /DC=org/DC=doegrids/OU=Service/CN=gfactory/gfactory1.my.org
nickname: [condor002] pilot
Is this a trusted Condor daemon?: (y/n) y
DN: enter
Condor 7.5.3 and up support the use of a shared_port_daemon that drastically reduces the number of ports used and thus also improves the pool scalability.
Do you want to enable the shadred_port_daemon?: (y/n) y
What port should it use?: [9615] enter
Make sure you use Condor 7.5.3 and up in glideins, too
What node is the collector running (i.e. CONDOR_HOST)?: collector1.my.org
Please list all the GCB servers you will be using
Leave an empty line when finished
GCB node:enter
How many secondary schedds do you want?: [9] 4
The installer will also start the Condor daemons.
The installer also creates an init.d script in:
/etc/init.d/condor
Use it to stop and restart the schedd(s).
NOTE: Since the Schedd node is installed as root, the Condor installer will create condor.sh and condor.csh files in /etc/profile.d so that the correct environment is set up. If this machine hosts another Condor installation, or you prefer not to have the user PATH and CONDOR_CONFIG point to this Condor setup automatically, you need to manually move these files out of /etc/profile.d.
6. Submitting user jobs
From the user point of view, this is just a regular Condor pool.
However, since the resources potentially come from all over the world, users need to create more complex Requirements,
in order to prevent the jobs from landing on sites that cannot run their jobs.
Users will need to know which attributes the glideins publish and use them accordingly.
One useful attribute that all glideins publish is GLIDEIN_Site. If a user wants to restrict their jobs to a list of sites, they can do so by using:
+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)
Some glideins may also need to properly identify the final user, using GSI authentication. A user should thus add the following line:
x509userproxy = <path to X509 proxy>
to their Condor submission file.
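As an illustration of how these pieces fit together, the following Python sketch assembles a small submit description containing the site restriction and the proxy line described above. The helper `build_submit_description`, the job script name, and the proxy path are hypothetical placeholders; only the +DESIRED_Sites, Requirements, and x509userproxy lines come from this section, and the remaining lines are ordinary Condor submit commands.

```python
def build_submit_description(executable, sites, proxy_path):
    """Return the text of a submit file restricted to the given sites."""
    site_list = ",".join(sites)
    lines = [
        "universe = vanilla",
        "executable = {}".format(executable),
        # Custom job attribute listing the sites this job may run on.
        '+DESIRED_Sites = "{}"'.format(site_list),
        # Match only glideins whose published GLIDEIN_Site is in that list.
        "Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)",
        # Proxy used to identify the final user via GSI authentication.
        "x509userproxy = {}".format(proxy_path),
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job script and proxy location, for illustration only.
submit_text = build_submit_description(
    "my_job.sh",
    ["Site1", "Site4", "Site7", "Site22"],
    "/tmp/x509up_u1000",
)
print(submit_text)
```

The resulting text can be saved to a file and passed to condor_submit. Generating it programmatically is mainly useful when the site list changes often; for a fixed list, a hand-written submit file works equally well.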
7. Fine Tuning for Large Scale Installations
7.1 Increase the number of available file descriptors
The number of ports, and with them file descriptors, used by the condor_schedd process increases as the number of jobs running or queued in the schedd increases. The default limit on file descriptors per process is 1024 on most systems. Increase this limit to ~16k, or to a value higher than the number of jobs that might be in the queue at any given time. This is particularly important for large scale installations.
ulimit -n 16384
In most default installations, the user schedd is configured to start as root and is started through the script in /etc/init.d/condor. This script is a good place to set a higher file descriptor limit for the schedd process. Condor can also be started using the xinetd mechanism, so make sure to check the configuration in /etc/xinetd.d if you are using this functionality.
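A quick way to see whether a given environment already has a high enough limit is to query it from Python's standard `resource` module. This is only a sketch: the helper `nofile_limit_ok` is hypothetical, the 16384 threshold mirrors the ulimit example above, and the check reports the limit of the process running the script, which matches the schedd's limit only if both are started from the same environment.

```python
import resource


def nofile_limit_ok(required, limits=None):
    """Return True if the soft open-file limit is at least `required`.

    `limits` is an optional (soft, hard) pair; when omitted, the limits
    of the current process are queried.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, _hard = limits
    # An unlimited soft limit always satisfies the requirement.
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= required


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
if not nofile_limit_ok(16384):
    print("File descriptor limit is below 16384; consider raising it.")
```

The soft limit is the one that takes effect, so that is what the check compares. A non-root process can raise its soft limit only up to the hard limit, which is why the init script, running as root, is a convenient place for the ulimit call.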
For more information on advanced condor installations and multiple schedds, see the Advanced Condor page.