Index

IMPORTANT NOTE: starting with version wrapper.sh-0.9.8, this documentation has moved into the wrapper package itself. This section is therefore no longer maintained.

New wrapper
    Motivation
    Wrapper.sh
    Plug-ins architecture

New wrapper

Motivation

The first piece of code that the PanDA system submits to sites, through different job submission mechanisms, is called the "pilot wrapper". It is the first code executed on the worker node: it performs some environment checks and downloads, from a valid URL, the next piece of code needed to continue operations, which in PanDA nomenclature is called the "pilot".

This "pilot wrapper" is not unique. There are multiple versions of this part of the system, depending, for example, on the final pilot type and the grid flavor.

This multiplicity forces us to maintain several pieces of software even though they share a common purpose.

On the other hand, for practical reasons these pilot wrappers are implemented in BASH, with the consequent lack of flexibility and inherent difficulty in implementing complicated operations. One practical case is the need to generate weighted random numbers in order to pick a specific development version of the ATLAS code only a given percentage of the time. Generating weighted random numbers is more complicated in BASH.
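As an illustration of why this is simpler in Python, the following minimal sketch picks a release name with probability proportional to its weight, using the standard random.choices function. The release names and the 90/10 split are hypothetical; the text does not say which versions or percentages are actually used.

```python
import random


def pick_release(releases, weights, rng=None):
    """Return one entry of releases, chosen with probability proportional to weights."""
    rng = rng or random.Random()
    # random.choices draws k items with replacement; only one is needed here.
    return rng.choices(releases, weights=weights, k=1)[0]


# Hypothetical split: use the development release about 10% of the time.
releases = ["production", "development"]
weights = [90, 10]
print(pick_release(releases, weights))
```

Passing an explicit random.Random instance makes the choice reproducible in tests; by default a fresh generator is used.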

In addition, the new AutoPyFactory pilot submission mechanism entered the picture. Its first version was implemented to submit a specific ad-hoc pilot wrapper, with a different set of input options and different formats. Moreover, this specific pilot wrapper is only valid for ATLAS in EGEE, and cannot be used for other purposes or at OSG sites. This discrepancy adds to the multiplicity of pilot wrapper versions, and makes it harder to deploy AutoPyFactory as a submission tool to replace the existing AutoPilot.

A final reason is that these wrappers need some improvements. One example is the absence of proper validation of the number and format of the input options. Since these improvements are important, it is always easier to introduce and maintain them in a single piece of code than in several.

For these reasons it was agreed that the different pilot wrappers needed to be refactored. The proposal is to create a single pilot wrapper, implemented in BASH, that performs the minimum number of checks. This unique code should be valid for any kind of final application, grid flavor, submission tool, etc. In particular, it will make it easy to deploy AutoPyFactory as a pilot submission tool.

After checking that the programs required to continue are present, and setting up the corresponding grid environment if needed, the wrapper downloads a second piece of code from a valid URL to continue operations. This second piece of code will be written in Python, which makes complex operations easier to implement and therefore improves maintainability and scalability. This requires reimplementing in Python all the BASH code from the multiple pilot wrappers, except the operations already performed by the new unified wrapper. Finally, in this second step, the final payload code is chosen, downloaded, and executed.

Wrapper.sh

A generic wrapper with minimal functionality

input options:
--wrappervo=vo is the Virtual Organization (VO).
--wrapperwmsqueue=wmsqueue is the wms queue (e.g. the panda siteid)
--wrapperbatchqueue=batchqueue is the batch queue (e.g. the panda queue)
--wrappergrid=grid_middleware is the grid flavor, e.g. OSG or EGI.
The reason to include it as an input option, instead of letting the wrapper discover the current platform by itself, is to be able to distinguish between these two scenarios:
(a) running on local cluster
(b) running on grid, but the setup file is missing

(b) is a failure and should be reported, whereas (a) is fine.

Another reason to include wrappergrid as an option in this very first wrapper is that, at sites running Condor as the local batch system, the $PATH environment variable is set up only after sourcing the OSG setup file. Only once $PATH is properly set up is it possible to run tools such as curl/wget to download the remaining files, or python to execute them.
--wrapperpurpose=purpose will be the VO in almost all cases, but not necessarily when several groups share the same VO.
An example is the OSG VO, shared by CHARMM, Daya, the OSG ITB testing group...
--wrapperserverurl=url is the URL of the PanDA server instance.
--wrappertarballurl=url is the base URL of the wrapper tarball to be downloaded.
--wrapperspecialcmd=special_cmd is a special command to be run, for some specific reason, just after sourcing the Grid environment but before doing anything else.
This was triggered by the need to run the command $ module load <module_name> at NERSC after sourcing the OSG grid environment.
--wrapperplugin=plugin is the plug-in module containing the code for the final wrapper flavor.
--wrapperpilottype=pilot_type is the actual pilot code to be executed at the end.
--wrapperloglevel=log_level sets the verbosity. Accepted values are debug (high verbosity) or info.
--wrappermode=mode allows performing all steps except querying for and running a real job.
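Since the lack of validation of the number and format of the input options is one of the motivations for the refactoring, here is a minimal sketch of how the Python stage could validate them with the standard argparse module. It only covers the option names listed above; treating every option as optional and defaulting the log level to info are assumptions, not documented behavior.

```python
import argparse

# Options listed in the text that take a free-form string value.
STRING_OPTIONS = [
    "--wrappervo", "--wrapperwmsqueue", "--wrapperbatchqueue", "--wrappergrid",
    "--wrapperpurpose", "--wrapperserverurl", "--wrappertarballurl",
    "--wrapperspecialcmd", "--wrapperplugin", "--wrapperpilottype", "--wrappermode",
]


def build_parser():
    # allow_abbrev=False rejects abbreviated option names such as --wrapperv.
    parser = argparse.ArgumentParser(prog="wrapper", allow_abbrev=False)
    for name in STRING_OPTIONS:
        parser.add_argument(name)
    # Only the two documented values are accepted; the default is an assumption.
    parser.add_argument("--wrapperloglevel", choices=["debug", "info"], default="info")
    return parser


opts = build_parser().parse_args(["--wrappervo=ATLAS", "--wrappergrid=OSG"])
print(opts.wrappervo, opts.wrappergrid, opts.wrapperloglevel)  # ATLAS OSG info
```

An unknown option or an unsupported log level makes argparse print an error message and exit with status 2, so malformed input is rejected before any work is done.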


Note: before the input options are parsed, they must be re-tokenized so that whitespace inside a value (e.g. --wrapperspecialcmd='module load osg') causes no confusion and is not taken as a separator between different input options.

The format in the Condor submission file (or JDL) for multi-word values is:

arguments = "--in1=val1 ... --inN=valN --cmd=""module load osg"""
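The re-tokenization can be sketched in Python with the standard shlex module. The sketch assumes, as the note implies, that a quoted multi-word value reaches the program split at whitespace, for example as the three arguments --cmd="module, load and osg"; it rejoins the pieces with single spaces and splits them again honoring shell-style quotes. The function name is hypothetical.

```python
import shlex


def retokenize(argv):
    """Rejoin argv pieces and re-split them, keeping quoted multi-word values together."""
    return shlex.split(" ".join(argv))


# A multi-word value that arrived split at whitespace (hypothetical argv).
argv = ["--in1=val1", '--cmd="module', "load", 'osg"']
print(retokenize(argv))  # ['--in1=val1', '--cmd=module load osg']
```

One caveat of this approach is that runs of several spaces inside a value collapse to a single space.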

This first wrapper performs two basic actions:
(1) it checks the environment and the availability of basic programs;

(2) it downloads a first tarball with Python code and passes all input options to this code. Using these options, the Python code downloads a second tarball with the final pilot code.
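wrapper.sh performs step (1) in BASH. Purely to illustrate the logic, here is an equivalent check sketched in Python with shutil.which. The choice of programs (curl or wget to download, python3 or python to run the code) follows the tools named earlier in this section, and the function name is hypothetical. The optional path argument reflects the point made above that $PATH may only be complete after sourcing the grid setup file.

```python
import shutil


def check_programs(path=None):
    """Return a list of problems; an empty list means the basic tools were found.

    path is a PATH-style string to search; None means the current $PATH.
    """
    problems = []
    # At least one downloader is needed to fetch the wrapper tarball.
    if not (shutil.which("curl", path=path) or shutil.which("wget", path=path)):
        problems.append("neither curl nor wget found")
    # An interpreter is needed to run wrapper.py.
    if not (shutil.which("python3", path=path) or shutil.which("python", path=path)):
        problems.append("no python interpreter found")
    return problems


for problem in check_programs():
    print("ERROR:", problem)
```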

Plug-ins architecture

This is the suggested architecture:

         AutoPyFactory ---> wrapper.sh ---> wrapper.py

wrapper.sh downloads a tarball (wrapper.tar.gz), untars it, and invokes wrapper.py. The content of the tarball is something like this:

- wrapper.py
- wrapperutils.py
- lookuptable.conf
- plugins/base.py
- plugins/<pilottype1>.py
- plugins/<pilottypeN>.py
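The download, untar and invoke sequence can be sketched in Python as follows, assuming the tarball unpacks wrapper.py at its top level as in the listing above. wrapper.sh itself does this in BASH; the function name, and the choice to forward all options unchanged as command-line arguments, are illustrative assumptions.

```python
import subprocess
import sys
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_run(tarball_url, options, workdir=None):
    """Download wrapper.tar.gz, unpack it and run wrapper.py with the given options."""
    workdir = Path(workdir or tempfile.mkdtemp(prefix="wrapper-"))
    tarball = workdir / "wrapper.tar.gz"
    # urllib also accepts file:// URLs, which is handy for local testing.
    with urllib.request.urlopen(tarball_url) as resp, open(tarball, "wb") as out:
        out.write(resp.read())
    with tarfile.open(str(tarball), "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)
    # Forward every input option unchanged to wrapper.py.
    cmd = [sys.executable, str(workdir / "wrapper.py"), *options]
    return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
```

For example, fetch_and_run(url, ["--wrappervo=ATLAS"]) returns a subprocess.CompletedProcess whose returncode and stdout can be inspected; url stands for the tarball location, presumably derived from --wrappertarballurl.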

The different plug-ins correspond to the different wrapper flavors, so far written in BASH: for example, trivialWrapper.sh, atlasProdPilotWrapper.sh, atlasProdPilotWrapperCERN.sh, atlasProdPilotWrapperUS.sh, etc. All of these wrappers share a lot of common functionality, with only small differences between them. To take advantage of this, the different wrapper flavors will be implemented as plug-ins.
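The text does not define the plug-in interface, so the following is only a sketch of the idea, with hypothetical class and method names: a base class holds the behavior common to all flavors, and each flavor overrides only what differs. In the real tarball each plug-in would live in plugins/<pilottype>.py and could be loaded with importlib.import_module; a dictionary registry keeps this example self-contained.

```python
class BasePlugin:
    """Hypothetical base class with the steps shared by every wrapper flavor."""

    def __init__(self, options):
        self.options = options

    def setup(self):
        # Common environment preparation would go here.
        return ["common setup"]

    def run_pilot(self):
        raise NotImplementedError("each flavor must say how to run its pilot")

    def execute(self):
        return self.setup() + self.run_pilot()


class TrivialPlugin(BasePlugin):
    def run_pilot(self):
        return ["run trivialPilot"]


class AtlasProdPilotPlugin(BasePlugin):
    def setup(self):
        # Reuse the shared steps and add the flavor-specific ones.
        return super().setup() + ["ATLAS-specific setup"]

    def run_pilot(self):
        return ["run ATLAS production pilot"]


# Keys follow the PLUGIN column of the lookup table.
PLUGINS = {"trivial": TrivialPlugin, "atlasprodpilot": AtlasProdPilotPlugin}


def load_plugin(name, options):
    cls = PLUGINS.get(name)
    if cls is None:
        raise ValueError(f"unknown plug-in: {name}")
    return cls(options)


print(load_plugin("atlasprodpilot", {}).execute())
# ['common setup', 'ATLAS-specific setup', 'run ATLAS production pilot']
```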

Look-up table

The current mechanism to choose the right plug-in works by inspecting a lookup table like this one:

# -------------------------------------------------------------------------------------------------------------------------------
#   VO      PURPOSE   GRID   WMSQUEUE          BATCHQUEUE               PLUGIN              PILOTCODE                PILOTCODEURL
# -------------------------------------------------------------------------------------------------------------------------------

# --- default values ---
    ATLAS   *         *      +                 +                        atlasprodpilot      pilotcode,pilotcode-rc   http://pandaserver.cern.ch:25080/cache/pilot
    ATLAS   devel     *      +                 +                        atlasprodpilotdev   pilotcode-dev            http://project-atlas-gmsb.web.cern.ch/project-atlas-gmsb
    OSG     *         OSG    +                 +                        trivial             trivialPilot             http://pandaserver.cern.ch:25080/cache/pilot

# --- testing sites ---
    OSG     *         *      TEST2             TEST2                    trivial             trivialPilot             http://pandaserver.cern.ch:25080/cache/pilot
    OSG     *         *      TEST3             TEST3                    trivial             trivialPilot             http://pandaserver.cern.ch:25080/cache/pilot
    ATLAS   *         *      ANALY_TEST-APF    ANALY_TEST-APF-condor    atlasprodpilot      pilotcode,pilotcode-rc   http://pandaserver.cern.ch:25080/cache/pilot
    ATLAS   *         *      ANALY_TEST-APF2   ANALY_TEST-APF2-condor   atlasprodpilot      pilotcode,pilotcode-rc   http://pandaserver.cern.ch:25080/cache/pilot
    ATLAS   *         *      BNL_TEST_APF      BNL_TEST_APF-condor      atlasprodpilot      pilotcode,pilotcode-rc   http://pandaserver.cern.ch:25080/cache/pilot

 + means any value is accepted, but one must be provided
 * means any value is accepted, or no value was provided
 - means no value was provided

If no value was provided for a given field:
1) a '-' in that field is matched first;
2) if the field is not '-', then '*' is checked.

If a value was provided for a given field:
1) the exact value is searched for;
2) if the value is not in the field, then '+' is checked;
3) finally, '*' is checked.

All rows have the same columns, in the same order.

The first N columns are patterns and the rest are outputs (in the table above, N = 5: VO, PURPOSE, GRID, WMSQUEUE and BATCHQUEUE). That is, N input values are provided each time, the row matching those inputs is selected, and its output values are returned.
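Reading the table can be sketched as follows, assuming whitespace-separated columns, '#' comment lines and N = 5 pattern columns as in the example above; the function name is hypothetical. Since all rows are expected to have the same columns, a row with no outputs, or with a different column count from the first row, is rejected.

```python
def parse_lookup_table(text, n_patterns=5):
    """Return a list of (patterns, outputs) tuples, one per table row."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, column headers and section comments
        fields = line.split()
        if len(fields) <= n_patterns:
            raise ValueError(f"line {lineno}: expected more than {n_patterns} columns")
        if rows and len(fields) != len(rows[0][0]) + len(rows[0][1]):
            raise ValueError(f"line {lineno}: column count differs from previous rows")
        rows.append((tuple(fields[:n_patterns]), tuple(fields[n_patterns:])))
    return rows


TABLE_TEXT = """
# --- default values ---
    ATLAS  *      *  +  +  atlasprodpilot     pilotcode,pilotcode-rc  http://pandaserver.cern.ch:25080/cache/pilot
    ATLAS  devel  *  +  +  atlasprodpilotdev  pilotcode-dev           http://project-atlas-gmsb.web.cern.ch/project-atlas-gmsb
"""

for patterns, outputs in parse_lookup_table(TABLE_TEXT):
    print(patterns, "->", outputs[0])
```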

Columns are parsed for matching from left to right: the first column is the most important field for matching, the second column is the next most important, and so on.

When a provided input exactly matches the content of the field in the table, that row is selected. If the input is not provided, or its value is not in the table, then the fields containing symbols are inspected.

If no row matches the input values completely, None should be returned.
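The rules above leave some room for interpretation. One consistent reading, sketched below in Python, gives each field a rank (0 for the preferred kind of match), rejects a row if any field does not match, and among the remaining rows picks the one whose tuple of ranks is smallest. Because Python compares tuples element by element from the left, the first column has the highest priority, and ties go to the row that appears first in the table. The function names, the PANDA_CACHE constant and the in-memory row format are hypothetical, and this may differ from the actual implementation in wrapper.py.

```python
PANDA_CACHE = "http://pandaserver.cern.ch:25080/cache/pilot"

# (patterns, outputs) rows copied from part of the table above.
TABLE = [
    (("ATLAS", "*", "*", "+", "+"), ("atlasprodpilot", "pilotcode,pilotcode-rc", PANDA_CACHE)),
    (("ATLAS", "devel", "*", "+", "+"),
     ("atlasprodpilotdev", "pilotcode-dev", "http://project-atlas-gmsb.web.cern.ch/project-atlas-gmsb")),
    (("OSG", "*", "OSG", "+", "+"), ("trivial", "trivialPilot", PANDA_CACHE)),
    (("OSG", "*", "*", "TEST2", "TEST2"), ("trivial", "trivialPilot", PANDA_CACHE)),
]


def field_rank(pattern, value):
    """Rank of one field (lower is better), or None if it does not match."""
    # No value: '-' first, then '*'. Value given: exact match, then '+', then '*'.
    preference = ["-", "*"] if value is None else [value, "+", "*"]
    return preference.index(pattern) if pattern in preference else None


def lookup(rows, inputs):
    """Return the outputs of the best matching row, or None if no row matches."""
    best_key, best_outputs = None, None
    for patterns, outputs in rows:
        ranks = tuple(field_rank(p, v) for p, v in zip(patterns, inputs))
        if None in ranks:
            continue  # some field does not match, so the row is rejected
        if best_key is None or ranks < best_key:  # strict '<' keeps the earlier row on ties
            best_key, best_outputs = ranks, outputs
    return best_outputs


print(lookup(TABLE, ["ATLAS", "devel", "EGI", "Q1", "Q1-condor"])[0])  # atlasprodpilotdev
print(lookup(TABLE, ["ATLAS", None, "EGI", "Q1", "Q1-condor"])[0])     # atlasprodpilot
print(lookup(TABLE, ["ATLAS", "ATLAS", "EGI", None, None]))            # None
```

The last call returns None because the ATLAS rows use '+' for the queue columns, which requires a value to be provided.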