Example Configuration
Below is an example frontend configuration XML file.
<frontend>
   <downtimes/>
   <match match_expr="True">
      <factory query_expr="True">
         <match_attrs/>
         <collectors>
            <collector DN="/DC=org/DC=doegrids/OU=Services/CN=factory-server.fnal.gov" comment="" factory_identity="factoryuser@factory-server.fnal.gov" my_identity="frontenduser@frontend-server.fnal.gov" node="factory-server.fnal.gov:8618"/>
         </collectors>
      </factory>
      <job comment="" query_expr="(JobUniverse==5)&amp;&amp;(GLIDEIN_Is_Monitor =!= TRUE)&amp;&amp;(JOB_Is_Monitor =!= TRUE)">
         <match_attrs/>
         <schedds>
            <schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="userpool.fnal.gov"/>
            <schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs1@userpool.fnal.gov"/>
            <schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs2@userpool.fnal.gov"/>
         </schedds>
      </job>
   </match>
   <monitor base_dir="/var/www/html/vofrontend/monitor" flot_dir="/opt/javascriptrrd-0.6.3/flot" javascriptRRD_dir="/opt/javascriptrrd-0.6.3/src/lib" jquery_dir="/opt/javascriptrrd-0.6.3/flot"/>
   <monitor_footer display_txt="Legal Disclaimer" href_link="/site/disclaimer.html"/>
   <security classad_proxy="/etc/grid-security/vocert.pem" proxy_DN="/DC=org/DC=doegrids/OU=Services/CN=frontend-server.fnal.gov" proxy_selection_plugin="ProxyAll" security_name="frontenduser" sym_key="aes_256_cbc">
      <proxies>
         <proxy absfname="/tmp/x509up_u" security_class="frontend"/>
      </proxies>
   </security>
   <stage base_dir="/var/www/html/vofrontend/stage" use_symlink="True" web_base_url="http://frontend-server.fnal.gov:9000/vofrontend/stage"/>
   <work base_dir="/opt/vofrontend" base_log_dir="/opt/vofrontend/logs"/>
   <attrs>
      <attr name="GLIDECLIENT_Rank" glidein_publish="False" job_publish="False" parameter="True" type="string" value="1"/>
      <attr name="GLIDECLIENT_Start" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True"/>
      <attr name="GLIDEIN_Expose_Grid_Env" glidein_publish="True" job_publish="True" parameter="False" type="string" value="True"/>
      <attr name="GLIDEIN_Glexec_Use" glidein_publish="True" job_publish="True" parameter="False" type="string" value="OPTIONAL"/>
      <attr name="USE_MATCH_AUTH" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True"/>
   </attrs>
   <groups>
      <group name="main" enabled="True">
         <config>
            <idle_glideins_per_entry max="100" reserve="5"/>
            <idle_vms_per_entry curb="5" max="100"/>
            <running_glideins_per_entry max="10000" relative_to_queue="1.15"/>
         </config>
         <downtimes/>
         <match match_expr="True">
            <factory query_expr="True">
               <match_attrs/>
               <collectors/>
            </factory>
            <job query_expr="True">
               <match_attrs/>
               <schedds/>
            </job>
         </match>
         <security>
            <proxies/>
         </security>
         <attrs/>
         <files/>
      </group>
   </groups>
   <collectors groups="default,ha">
      <collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov" secondary="False" group="default"/>
      <collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov" secondary="False" group="ha"/>
   </collectors>
   <files/>
</frontend>
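Before starting a frontend on a hand-edited file, it can help to confirm that the XML is well-formed and contains what you expect. The sketch below uses only Python's standard xml.etree.ElementTree; it is not a glideinWMS tool, and the trimmed FRONTEND_XML string is a hypothetical excerpt modeled on the example above. It also shows why a Condor "&&" inside an XML attribute value has to be written as "&amp;&amp;": an XML parser rejects the bare form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt modeled on the example configuration above.
FRONTEND_XML = """
<frontend>
  <match match_expr="True">
    <job query_expr="(JobUniverse==5)&amp;&amp;(JOB_Is_Monitor =!= TRUE)">
      <schedds>
        <schedd fullname="schedd_jobs1@userpool.fnal.gov"/>
        <schedd fullname="schedd_jobs2@userpool.fnal.gov"/>
      </schedds>
    </job>
  </match>
  <groups>
    <group name="main" enabled="True"/>
  </groups>
</frontend>
"""


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def summarize(xml_text):
    """List the global schedds and whether each group is enabled."""
    root = ET.fromstring(xml_text)
    schedds = [s.get("fullname")
               for s in root.findall("./match/job/schedds/schedd")]
    groups = {g.get("name"): g.get("enabled") == "True"
              for g in root.findall("./groups/group")}
    return schedds, groups


if __name__ == "__main__":
    print(summarize(FRONTEND_XML))
    # A bare "&&" inside an attribute value is not well-formed XML.
    print(is_well_formed('<job query_expr="(A)&&(B)"/>'))
```

The parser turns "&amp;&amp;" back into "&&", so the query_expr Condor receives is unchanged.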
The Glidein Frontend Configuration
The Glidein Frontend configuration involves creating the configuration directory and files and then creating the daemons. As in the Glidein Factory setup, an XML file is converted into a configuration tree by a configuration tool.
For the installer to create the Glidein Frontend instance from the configuration directory and grid mapfile, the following objects can be defined:
- <frontend frontend_name="your name" advertise_delay="seconds" loop_delay="nr" advertise_with_tcp="True|False" advertise_with_multiple="True|False" restart_attempts="3" restart_interval="1800">The frontend_name is a combination of the frontend and instance names specified during installation. It is used to create the Glidein Frontend instance directory and files. The delay parameters define how active the Glidein Frontend should be. restart_attempts defines how many times the frontend should auto-restart a crashed frontend group within restart_interval (in seconds). If a group crashes more than restart_attempts times during restart_interval, the frontend will shut down. advertise_with_tcp defines whether TCP should be used to advertise the ClassAds to the factory, and advertise_with_multiple enables the condor_advertise -multiple option available in Condor 7.5.4 and later.
- <frontend><match><factory><collectors><collector my_identity="joe@collector1.myorg.com" node="collector1.myorg.com" factory_identity="gfactory@gfactory1.my.org" DN="/DC=org/DC=doegrids/OU=Services/CN=collector/collectory1.my.org"/>This specifies the WMS Collector the Glidein Frontend will use (node) and its certificate DN. my_identity is the identity the frontend itself will be mapped to, and factory_identity is the expected identity of the factory. These attributes must be configured correctly, or else the factory will drop requests from the frontend, citing security concerns.
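The restart_attempts / restart_interval rule described above is a sliding-window limit. The sketch below models it for one group, assuming that the window is measured back from each new crash and that exceeding the limit means the frontend should shut down; the RestartPolicy class and its method are hypothetical illustrations, not the frontend's internal code.

```python
import time
from collections import deque


class RestartPolicy:
    """Hypothetical model of restart_attempts / restart_interval for one group.

    A crashed group may be restarted while fewer than restart_attempts
    restarts happened in the last restart_interval seconds; otherwise the
    frontend should shut down.
    """

    def __init__(self, restart_attempts=3, restart_interval=1800):
        self.restart_attempts = restart_attempts
        self.restart_interval = restart_interval
        self._restarts = deque()  # timestamps of recent restarts

    def on_crash(self, now=None):
        """Return "restart" or "shutdown" for a group crash at time `now`."""
        now = time.time() if now is None else now
        # Forget restarts that fell out of the sliding window.
        while self._restarts and now - self._restarts[0] > self.restart_interval:
            self._restarts.popleft()
        if len(self._restarts) >= self.restart_attempts:
            return "shutdown"
        self._restarts.append(now)
        return "restart"


if __name__ == "__main__":
    policy = RestartPolicy(restart_attempts=3, restart_interval=1800)
    for t in (0, 10, 20, 30):
        print(t, policy.on_crash(t))
```

With the default values, three crashes in quick succession are restarted and a fourth within the same 1800 seconds triggers a shutdown; crashes spaced further apart than the interval are always restarted.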
To configure the WMS collector for multiple (secondary) ports, refer to the Advanced Condor Configuration Multiple Collectors document.
The configuration required for the frontend to communicate with the secondary WMS collectors is shown below:
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=collector/collectory1.my.org" comment="Primary collector"
   factory_identity="gfactory@gfactory1.my.org" my_identity="joe@collector1.myorg.com"
   node="collector1.myorg.com:primary_port"/>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=collector/collectory1.my.org"
   factory_identity="gfactory@gfactory1.my.org" my_identity="joe@collector1.myorg.com"
   node="collector1.myorg.com:secondary_port"/>
- <frontend><match><job query_expr="expression" >A Condor constraint used with condor_q when looking for user jobs (e.g. 'JobUniverse=?=5' to consider only vanilla universe jobs). To consider all user jobs, set this to TRUE.
- <frontend><match><job><schedds><schedd fullname="schedd name" />When you provide the user pool collector to the installer, it will find all the available schedds. Here you can specify which schedds to monitor for user jobs. The schedd fullname is the name under which the schedd is registered with the user pool collector.
- <frontend><monitor base_dir="web_dir" flot_dir="web_dir" javascriptRRD_dir="web_dir" jquery_dir="web_dir" />The base_dir defines where the web monitoring pages are located. The other entries point to where the javascriptRRD, Flot and JQuery libraries are located.
- <frontend><monitor_footer display_txt="Legal Disclaimer" href_link="/site/disclaimer.html" />OPTIONAL: If the display text and link are configured, the monitoring pages will display the text/link at the bottom of the page.
- <frontend><security proxy_DN="/DC=org/DC=doegrids/OU=Service/CN=frontend/frontend1.my.org" classad_proxy="proxy_dir" proxy_selection_plugin="ProxyAll" security_name="vofrontend1">
The security section of the configuration determines the proxies that the frontend will use to communicate with the WMS Condor Collector and factory (security parent tag) and which proxies will be used to submit grid jobs to sites (proxy child elements described below).
The <frontend><security> tag determines the proxy and security information used to communicate with the WMS collector and factory in order to submit requests for glideins and find out about useful entry points. It also sets some security defaults:
- classad_proxy: Location of the proxy to use to communicate with factory. You must specify the full path.
- security_name: signifies the name under which the frontend is registered with the factory
- proxy_DN: This is the DN of the proxy used to communicate with the factory. It must be registered in the Condor mapfile of the WMS collector and must match the DN contained in the proxy given by the classad_proxy attribute.
- proxy_selection_plugin: This determines which of the listed proxies will be used for jobs. Possible values are:
- ProxyAll: Return all proxies (default)
- ProxyFirst: Always return the first proxy
- ProxyUserCardinality: Return the first N proxies, where N is the number of users. Useful if the first proxies are of higher priority
- ProxyUserRR: Use proxies in a round-robin policy
- ProxyUserMapWRecycling: Map each user to the pilot proxy not used for the longest time. For existing users, re-use the existing mapping unless it has already been reassigned due to inactivity.
- sym_key: Type of symmetric key system used for secure message passing (default aes_256_cbc). Currently not used, but there for future functionality.
- <frontend><security><proxies><proxy absfname="proxy" security_class="frontend"/>The location of the proxy the Glidein Frontend forwards to the factory for use in submitting glideins. You can have multiple proxy entries listed here and they are used by all groups. The absfname attribute references the full path of the proxy, and the security class tells the factory which class of users to use to map this to a username.
- <frontend><stage base_dir="web_dir" web_base_url="URL" />The location of the web server directories. This is the staging area where the files needed to run your jobs (e.g. Condor) are located. You may need to change this according to your requirements.
- <frontend><work base_dir="directory" />This defines the path to the Glidein Frontend directory.
- <frontend><files><file absfname="filepath" relfname="filename" executable="boolean" />The file parameters are used to specify the location of additional files. The grid mapfile location must be specified. This can also be used to add custom scripts to the VO frontend. See below or the page dedicated to writing custom scripts for more information.
- <frontend><attrs><attr name="attr_name" value="value" parameter="True" type="string" glidein_publish="boolean" job_publish="boolean" >The following attributes are set in the frontend requests and/or published for jobs. Others can be specified as well.
- <attr name="GLIDEIN_Collector" >This contains the name of the pool collector, e.g. mymachine.mydomain.
- <attr name="GLIDEIN_Expose_Grid_Env" >This determines if you want to expose the user to the grid environment.
- <attr name="USE_MATCH_AUTH" >This determines whether or not you want to use match authentication. You specify the match expression in a group section of the config file.
- <attr name="GLIDEIN_Glexec_Use" >This determines whether or not you want to mandate the use of GLEXEC. Possible values are NEVER (do not use GLEXEC), OPTIONAL (use GLEXEC if the site is configured with it) or REQUIRED (always use GLEXEC). Mandating the use of GLEXEC also forces the factory to submit glideins only to sites that have GLEXEC configured.
- <attr name="GLIDECLIENT_Rank" >This is a condor expression that allows the ranking and prioritization of jobs. This attribute is propagated to condor_vars.lst.
- <attr name="GLIDECLIENT_Start" >This is a condor expression that determines whether a glidein job will start on a resource. It is used for resource management by Condor.
- <attr name="GLIDEIN_MaxMemMBs_Estimate" >If set to TRUE, the glidein will estimate the MEMORY that the Condor daemon can use, based on memory/core or memory/cpu. Estimation only happens if GLIDEIN_MaxMemMBs is not configured by the factory for the entry.
Other attributes can be specified as well. They are used by the VO frontend matchmaking and job matchmaking. The format is similar to the attributes in the Factory config file. The table below describes the <attr ... > tag in more detail.
Attribute Name | Attribute Description
name | Name of the attribute
value | Value of the attribute
parameter | Set to True if the attribute should be passed as a parameter. If set to False, the attribute will be put in the staging area to be accessed by the glidein startup scripts. Always set this to True unless you know what you are doing.
glidein_publish | If set to True, the attribute will be available in the condor_startd's classad. Used only if parameter is True.
job_publish | If set to True, the attribute will be available in the user job's environment. Used only if parameter is True.
comment | An optional description of the attribute.
type | Type of the attribute. Supported types are 'int', 'string' and 'expr'. Type expr is equivalent to a Condor constant/expression in condor_vars.lst.
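The parameter, glidein_publish and job_publish flags interact: the publish flags only matter when parameter is True. The sketch below encodes the table as a small decision function; the destination labels ("parameter", "startd_classad", "job_environment", "staging_area") and the helper names are hypothetical, chosen only to mirror the table's wording.

```python
def attr_destinations(parameter, glidein_publish, job_publish):
    """Return where a frontend <attr> ends up, per the table above (sketch).

    The destination labels are illustrative names, not glideinWMS identifiers.
    """
    if not parameter:
        # Not a parameter: placed in the staging area for the glidein
        # startup scripts; the publish flags are only used for parameters.
        return {"staging_area"}
    destinations = {"parameter"}
    if glidein_publish:
        destinations.add("startd_classad")
    if job_publish:
        destinations.add("job_environment")
    return destinations


def parse_bool(text):
    """Interpret the "True"/"False" strings used in frontend.xml."""
    return text.strip() == "True"


if __name__ == "__main__":
    # (name, parameter, glidein_publish, job_publish) from the example config
    example_attrs = [
        ("GLIDECLIENT_Rank", "True", "False", "False"),
        ("GLIDEIN_Expose_Grid_Env", "False", "True", "True"),
    ]
    for name, p, g, j in example_attrs:
        dest = attr_destinations(parse_bool(p), parse_bool(g), parse_bool(j))
        print(name, sorted(dest))
```

For instance, under this reading GLIDEIN_Expose_Grid_Env in the example configuration (parameter="False") would go to the staging area, and its publish flags would have no effect.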
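An example attribute would be:
<attr name="GLIDECLIENT_Start" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True"/>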
The following group parameters are used to configure multiple groups within a frontend. If only one group is specified, its settings apply to the whole frontend. The objects specified are used for creating and monitoring glideins. Groups are used to group users with similar requirements, such as proxies, criteria for matching job requirements with sites, and configuration of glideins.
- <frontend><groups><group name="name" enabled="boolean" >This specifies the name of the group and whether it is enabled.
- <frontend><groups><group><config>The parameters listed here define how many glideins to create and maintain, such as idle_glideins_per_entry and running_glideins_per_entry.
- idle_glideins_per_entry: This will limit the number of idle glideins sent to each entry point.
- running_glideins_per_entry: This will limit the number of running glideins sent to each entry point.
- <frontend><groups><group><match match_expr="expr" start_expr="expr" >The match_expr is a Python boolean expression used to match glideins to jobs. The glidein and job dictionaries are used in the expression, e.g. glidein["attrs"].get("GLIDEIN_Site") in job.get("DESIRED_Sites", "").split(","). To match everything, just specify True. All Condor attributes used in this expression should be listed below in the match_attrs section.
The start_expr is a Condor expression that will be evaluated to match glideins to jobs. This should be a valid Condor ClassAd expression.
Note that there are two expressions. The "match_expr" will be evaluated when matching glidein (pilot) requests to entries. This will determine which sites glideins are submitted to. The "start_expr" is evaluated when jobs are matched to pilots. This will determine, once glidein pilots are up and running, which jobs will run on them. See Match and Match Attributes below for more on match expressions.
- <frontend><groups><group><match><factory query_expr="expr" >This is a Condor ClassAd expression to select factory entry points. The expression will be evaluated by Condor and should be a valid Condor expression.
One example to select from entries that list CMS as a supported VO could be query_expr='stringListMember("CMS",GLIDEIN_Supported_VOs) && (GLIDEIN_Job_Max_Time=!=UNDEFINED)'
See the condor manual for more on valid expressions and functions.
Note that if you specify a factory constraint in the global default section as well as in the group factory tag (such as <factory query_expr='EXPR1'>... <group> <factory query_expr='EXPR2'> </group>), then the frontend will "AND" the expressions to create a combination (such as EXPR1 && EXPR2).
- <frontend><groups><group><match><factory><match_attrs><match_attr name="name" >A list of glidein factory attributes used in the match expression. Each match attribute should be a Condor classad attribute that is used in the match_expr above. See Match and Match Attributes below for more on match expressions.
- <frontend><groups><group><match><job query_expr="expr" >This is a Condor ClassAd expression to select valid user jobs to find glideins for. It will be evaluated by Condor and should be a valid Condor expression. For instance, query_expr="(JobUniverse==5)" will select only vanilla universe jobs.
See Match and Match Attributes below for more on match expressions.
- <frontend><groups><group><match><job><match_attrs><match_attr name="name" >A list of user job attributes used in the match expression. Each match attribute should be a Condor classad attribute that is used in the match_expr above. See Match and Match Attributes below for more on match expressions.
- <frontend><groups><group><security><proxies><proxy absfname="proxy" security_class="classname" >Group-specific proxies for submitting glideins can be defined that will be passed to the factory in addition to any globally defined proxies. See above for more information about the proxy tags.
- <collectors groups="list of collector groups"><collector DN="certificate DN" node="User Pool Collector" secondary="True/False" group="Group Name">List of user pool collector(s) where condor_startd started by glideins should report to as an available resource. This is where the user jobs will be matched with glidein resources.
For scalability purposes, multiple (secondary) user pool collectors can be used. Refer to the Advanced Condor Configuration Multiple Collectors document for details for the Condor configuration.
The configuration required for the frontend to communicate with the secondary user pool collectors is shown below:
<collectors>
   <collector DN="certificate DN" secondary="False" node="User Pool Collector:primary_port"/>
   <collector DN="certificate DN" secondary="True" node="User Pool Collector:secondary_port"/>
</collectors>
If the secondary port numbers were assigned a contiguous range, only one additional collector is required specifying the port range (e.g. 9640:9645); otherwise you will need a collector element for each secondary port.
For redundancy purposes, the user pool can be configured using the Condor High Availability feature. Refer to the Advanced Condor Configuration Multiple Collectors document for details of the Condor configuration. Support for this Condor feature is available in v2.6 and later.
This requires listing all the Condor collectors, primary as well as secondary, grouped by group name. Condor daemons started by the glideins will report back to one of the secondary collectors in each group. If a group has no secondary collector, the primary collector is selected for that group. The groups attribute of the collectors element defines the set of collector groups the frontend will use.
<collectors groups="default,1">
   <collector DN="certificate DN" secondary="False"
      node="User Pool Collector 1:primary_port" group="default"/>
   <collector DN="certificate DN" secondary="True"
      node="User Pool Collector 1:secondary_port" group="default"/>
   <collector DN="certificate DN" secondary="False"
      node="User Pool Collector 2:primary_port" group="1"/>
   <collector DN="certificate DN" secondary="True"
      node="User Pool Collector 2:secondary_port" group="1"/>
</collectors>
If the secondary port numbers were assigned a contiguous range, only one additional collector
is required specifying the port range (e.g. 9640:9645); otherwise you will need a collector
element for each secondary port.
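The selection rule above (one secondary collector per group, falling back to the primary when a group has no secondary) can be sketched as follows. The pick_collectors function, the dictionary layout and the node names are hypothetical; in particular, choosing among several secondaries at random is an assumption made for illustration, since the text only says "one of the secondary collectors".

```python
import random
from collections import defaultdict


def pick_collectors(collectors, groups, rng=random):
    """Pick one collector node per collector group (hypothetical sketch).

    collectors: list of dicts with "node", "secondary" (bool) and "group".
    groups: the comma-separated value of <collectors groups="...">.
    """
    by_group = defaultdict(lambda: {"primary": [], "secondary": []})
    for c in collectors:
        kind = "secondary" if c["secondary"] else "primary"
        by_group[c["group"]][kind].append(c["node"])

    chosen = {}
    for name in (g.strip() for g in groups.split(",")):
        entry = by_group.get(name)  # .get() does not create missing groups
        if entry is None:
            continue
        # Prefer a secondary collector; fall back to the primary.
        pool = entry["secondary"] or entry["primary"]
        if pool:
            chosen[name] = rng.choice(pool)
    return chosen


if __name__ == "__main__":
    example = [
        {"node": "pool1.example.org:9618", "secondary": False, "group": "default"},
        {"node": "pool1.example.org:9620", "secondary": True, "group": "default"},
        {"node": "pool2.example.org:9618", "secondary": False, "group": "1"},
    ]
    print(pick_collectors(example, "default,1"))
```

In this example, group "default" reports to its secondary collector, while group "1", which lists only a primary, reports to that primary.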
Adding Custom Code/Scripts to Glidein Frontend Glideins
You can add custom scripts to glideins created for this Glidein Frontend by adding scripts and files to the configuration in the files section (either at the top level or inside <groups><group>):
<files>
   <file absfname="script name" executable="True" comment="comment"/>
</files>
The script will be copied to the Web-accessible area and added to the glidein's file_list, and when a glidein starts, the glidein startup script will pull it and execute it. If any parameters are needed, they can be specified using <attr />.
Files will be in the "main" subdirectory for factory files and the "client" subdirectory for frontend files.
For more detailed information, see the page dedicated to writing custom scripts.
You can also create wrapper scripts or tarballs of files; see the factory entry page for the syntax (use groups/group tags instead of the factory's entry tag).
Match and Match Attributes
Several sections in the configuration allow a match expression.
Each of these sections allows an expression to be evaluated to
determine where glideins and jobs should be matched.
For example, expressions allowing a white list by the frontend
can be created in order to control where the glideins are submitted.
It can also allow you to give a Condor expression
to specify where jobs can run or to specify
which glidein_sites can run jobs.
There are two ways to restrict matching in most cases.
Note that match_expr clauses, such as <match match_expr>
will use Python-based expressions as explained below. Others,
such as <factory query_expr> and <job query_expr>
use Condor ClassAd expressions. For these, only valid Condor
expressions can be used. Python expressions cannot be
evaluated in these contexts.
Note that, for some tags (like
factory query_expr), you can specify expressions in both the
default global section as well as in individual group sections.
You should take special care before doing this to make sure the
expressions are correct, as the expressions are typically "AND"-ed
together.
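When a global and a group-level query_expr are both set, the result is their conjunction. The sketch below shows one way to build that string; wrapping each side in parentheses is a defensive choice made here (not taken from the frontend's code) so that an "||" inside either expression cannot change the meaning of the combination. The function name is hypothetical.

```python
def combine_query_exprs(global_expr, group_expr):
    """AND a global and a group-level ClassAd constraint (sketch).

    Each non-empty side is wrapped in parentheses so operator precedence
    inside either expression cannot alter the combination.
    """
    parts = [e.strip() for e in (global_expr, group_expr) if e and e.strip()]
    if not parts:
        return "True"  # no constraint at all: match everything
    return " && ".join("(%s)" % p for p in parts)


if __name__ == "__main__":
    print(combine_query_exprs("EXPR1", "EXPR2"))
    print(combine_query_exprs("A || B", "C"))
```

Without the parentheses, "A || B" combined with "C" would read "A || B && C", which ClassAds evaluate as "A || (B && C)", a different constraint.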
Each match expression is a Python expression that will be evaluated. Matches can be scoped either globally (<frontend><match>) or to a specific group.
Each Python expression will typically be a series of boolean tests, surrounded by parentheses and connected by the boolean operators "and", "or", and "not". You can use several dictionaries in these match expressions. The "job" dictionary contains the classad of the job being matched, and the "glidein" dictionary contains information about the factory (entry point) classad. There is also an "attr_dict" dictionary that can reference attributes defined in frontend.xml. While an extensive list of everything you can do in these expressions is out of scope, some examples are below:
- ("ImageSize" in job): Returns true if the job classad has the attribute "ImageSize".
- (job["NumJobStarts"]>5): Returns true if the job classad attribute "NumJobStarts" is greater than 5.
- ("GLIDEIN_Retire_Time" in glidein["attrs"]): Returns true if the factory entry classad has the attribute "GLIDEIN_Retire_Time".
- (glidein["attrs"]["GLIDEIN_Retire_Time"]>21600): Returns true if the factory entry classad's "GLIDEIN_Retire_Time" is greater than 21600.
- (attr_dict["NUM_USERS_ALLOWED"]>0): Returns true if frontend.xml defines an attribute NUM_USERS_ALLOWED that is greater than zero.
- (job["Owner"] in attr_dict["ITB_Users"].split(",")): Returns true if the job classad attribute "Owner" is in the comma-delimited string attribute ITB_Users (which would be defined in frontend.xml).
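To see how these expressions behave, you can evaluate them against plain dictionaries. The sketch below is an illustration only: evaluate_match and the sample JOB, GLIDEIN and ATTR_DICT contents are hypothetical and do not reproduce the frontend's matching code, but the expressions are the examples listed above. eval() runs arbitrary code, so, like the frontend configuration itself, it should only ever be fed trusted input.

```python
def evaluate_match(match_expr, job, glidein, attr_dict):
    """Evaluate a frontend-style match_expr for one job/entry pair (sketch).

    Only the names job, glidein and attr_dict are visible to the expression,
    and builtins are disabled; eval() must still only see trusted input.
    """
    names = {"job": job, "glidein": glidein, "attr_dict": attr_dict}
    return bool(eval(match_expr, {"__builtins__": {}}, names))


# Hypothetical classad contents, represented as dictionaries.
JOB = {"Owner": "alice", "NumJobStarts": 7, "ImageSize": 1000,
       "DESIRED_Sites": "FNAL,UCSD"}
GLIDEIN = {"attrs": {"GLIDEIN_Site": "UCSD", "GLIDEIN_Retire_Time": 28800}}
ATTR_DICT = {"ITB_Users": "alice,bob", "NUM_USERS_ALLOWED": 2}

EXAMPLES = [
    '("ImageSize" in job)',
    '(job["NumJobStarts"]>5)',
    '("GLIDEIN_Retire_Time" in glidein["attrs"]) and '
    '(glidein["attrs"]["GLIDEIN_Retire_Time"]>21600)',
    '(attr_dict["NUM_USERS_ALLOWED"]>0)',
    '(job["Owner"] in attr_dict["ITB_Users"].split(","))',
    'glidein["attrs"].get("GLIDEIN_Site") in job.get("DESIRED_Sites", "").split(",")',
]

if __name__ == "__main__":
    for expr in EXAMPLES:
        print(evaluate_match(expr, JOB, GLIDEIN, ATTR_DICT), expr)
```

The last expression uses job.get("DESIRED_Sites", "") so that a job without DESIRED_Sites simply fails to match instead of raising an exception.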
Each attribute used in a match expression should be declared in a subsequent match_attrs section. This makes classad variables available to the match expression. Attributes can be made available from the:
- Factory classad: (<match><factory><match_attr>)
- Job classad: (<match><job><match_attr>)
Each match_attr tag must contain a name.
This is the name of the attribute
in the appropriate classad.
It must also contain a type which can be
one of the following:
- string: A constant string of letters, numbers, or characters.
- int: An integer: a positive or negative number, or zero.
- real: A real number that could have decimal places
- bool: It can be "True" or "False"
- Expr: A ClassAd expression
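Declaring the right type matters because match_expr is ordinary Python: in Python 3, comparing the string "7" with the integer 5 raises a TypeError instead of returning a boolean. The sketch below shows one plausible coercion for each match_attr type. The function name is hypothetical, and two choices are assumptions rather than documented behavior: booleans are read case-insensitively (as ClassAds do), and Expr values are kept as unevaluated expression text.

```python
def convert_match_attr(value, attr_type):
    """Coerce a raw classad value according to a match_attr type (sketch)."""
    if attr_type == "string":
        return str(value)
    if attr_type == "int":
        return int(value)
    if attr_type == "real":
        return float(value)
    if attr_type == "bool":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in ("true", "false"):
            return text == "true"
        raise ValueError("not a boolean: %r" % (value,))
    if attr_type == "Expr":
        # Assumption: keep the ClassAd expression text unevaluated.
        return str(value)
    raise ValueError("unknown match_attr type: %r" % (attr_type,))


if __name__ == "__main__":
    starts = convert_match_attr("7", "int")
    print(starts > 5)  # works once the value is an int
```

Without the int declaration, an expression such as (job["NumJobStarts"]>5) would be comparing a string against a number.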
Example
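<match match_expr='glidein["attrs"].get("GLIDEIN_Site") in job.get("DESIRED_Sites", "").split(",")'>
<factory query_expr="(GLIDEIN_Site=!=UNDEFINED)">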
<match_attrs> <match_attr name="GLIDEIN_Site" type="string"/> </match_attrs>
<collectors> </collectors>
</factory>
<job query_expr="(DESIRED_Sites=!=UNDEFINED)">
<match_attrs> <match_attr name="DESIRED_Sites" type="string"/> </match_attrs>
<schedds> </schedds>
</job>
</match>
Example
Glideins can also use "start_expr" to make sure the correct jobs start on pilots. This is a Condor expression run on the pilot startd daemon. Here is an example:...
</match>
Using Multiple Proxies
Why would you want to use a pool of pilot proxies instead of a single one?
If your VO maps to a single group account at the remote grid sites, you wouldn't. A pool of pilot proxies (try saying that 5 times fast) does not gain you anything. If your VO maps to a pool of accounts at remote grid sites, you should consider using a pool of proxies equivalent to the number of users you have. Why?
Consider the following scenario: Alice, Bob, and Charlie are all in the FUNGUS experiment and form a VO. They are using glideinWMS. Alice sends 1000 jobs to FNAL via their glideinWMS using a single pilot proxy. The pilots map to the userid fungus01 at FNAL, and in accordance with the batch system's fairshare policies, the job priority for user fungus01 is decreased significantly.
Bob comes along and submits 1000 jobs via glideinWMS, while Charlie submits 1000 jobs under his own proxy without using glideinWMS. The glideinWMS pilots launch for Bob and map to fungus01. Charlie launches his own jobs, which get mapped to fungus73. Relative to fungus73, fungus01's priority is terrible, and Bob's jobs sit around waiting behind Charlie's, even though Bob didn't occupy the FNAL resources: Alice did!
The solution: have a pool of pilot proxies. We then spread the fairshare penalty amongst fungus01, fungus02, and fungus03, and Bob now can compete on a more equal footing with Charlie.
Using multiple proxies
Proxies can be specified in the <security><proxies><proxy> tags. Multiple proxy tags can be entered, one for each proxy file. These can be placed in the security section at the top of the XML, in which case the proxies are shared by all groups. They can also be placed within <group> tags, in which case they are used only by that group.
One example follows:
<security>
   <proxies>
      <proxy absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
      <proxy absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
      <proxy absfname="/home/frontend/.globus/x509_pilot07_cms_prio.proxy" security_class="cmsprio"/>
      <proxy absfname="/home/frontend/.globus/x509_pilot08_cms_prio.proxy" security_class="cmsprio"/>
      <proxy absfname="/home/frontend/.globus/x509_pilot09_cms_prio.proxy" security_class="cmsprio"/>
   </proxies>
</security>
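How a pool of proxies is handed out depends on proxy_selection_plugin. The sketch below models the listed policies as plain Python so the differences are easy to compare, for example how ProxyUserMapWRecycling keeps Alice and Bob on separate pilot accounts. These functions and classes are hypothetical stand-ins named after the plugins; they are not the glideinWMS plugin classes and do not reproduce their interface.

```python
import itertools


def proxy_all(proxies, users):
    """ProxyAll: every proxy is used."""
    return list(proxies)


def proxy_first(proxies, users):
    """ProxyFirst: only the first proxy."""
    return proxies[:1]


def proxy_user_cardinality(proxies, users):
    """ProxyUserCardinality: first N proxies, N = number of distinct users."""
    return proxies[:len(set(users))]


class ProxyUserRR:
    """ProxyUserRR: hand out proxies in round-robin order."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        return next(self._cycle)


class ProxyUserMapWRecycling:
    """Map users to the proxy idle the longest, recycling stale mappings."""

    def __init__(self, proxies):
        self.last_used = {p: -1 for p in proxies}  # proxy -> last use time
        self.user_map = {}                         # user -> proxy

    def proxy_for(self, user, now):
        proxy = self.user_map.get(user)
        if proxy is None:
            # Take the least recently used proxy; its old owner loses it.
            proxy = min(self.last_used, key=self.last_used.get)
            for u, p in list(self.user_map.items()):
                if p == proxy:
                    del self.user_map[u]
            self.user_map[user] = proxy
        self.last_used[proxy] = now
        return proxy


if __name__ == "__main__":
    mapper = ProxyUserMapWRecycling(["pilot05", "pilot06", "pilot07"])
    for t, user in enumerate(["alice", "bob", "alice", "charlie"]):
        print(user, mapper.proxy_for(user, t))
```

With three proxies, Alice, Bob and Charlie each get their own pilot proxy (and therefore their own site account), so Alice's fairshare penalty no longer lands on Bob.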
Starting a Glidein Frontend Daemon
Once you have the desired configuration file, move to the VO frontend directory and launch the command:
./frontend_startup start
All the activity messages will go into
group_*/log/frontend_info.<date>.log
while the warnings go into
group_*/log/frontend_err.<date>.log
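To check every group's warnings after startup, a short helper can collect the error logs matching the pattern above. This is a minimal sketch using Python's standard glob module; it assumes it is given the VO frontend directory the paths are relative to, and the function names are hypothetical. Results are sorted lexically, which matches chronological order only if <date> is written year-first.

```python
import glob
import os


def log_pattern(frontend_dir, kind="err"):
    """Glob pattern for frontend_<kind>.<date>.log files of every group."""
    return os.path.join(frontend_dir, "group_*", "log", "frontend_%s.*.log" % kind)


def find_logs(frontend_dir, kind="err"):
    """Return the matching log files, sorted lexically."""
    return sorted(glob.glob(log_pattern(frontend_dir, kind)))


if __name__ == "__main__":
    # Run from the VO frontend directory; use kind="info" for activity logs.
    for path in find_logs("."):
        print(path)
```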