5.3 The Job Object

A job is deployed to the Orchestrate Server to automate processes, such as coordinating VM provisioning, high-performance computing, or general data center management. Jobs consist of one or more Job Development Language (JDL) scripts and might have one or more policies associated with them. Policies define job arguments and other facts that are used by the job.

Usually a Job has logic that runs on the PlateSpin Orchestrate Server itself and schedules work to run on one or more managed resources that are running the PlateSpin Orchestrate Agent. The logic that is dispatched and run on the managed resources is called a joblet. A job may or may not define one or more joblets.

A JDL script is partitioned into a “Job” section and one or more “Joblet” sections. The joblet sections of the script describe most of the work of a job. The PlateSpin Orchestrate Server dispatches joblets to resources in the grid where the work is done.

The Job object also contains Facts with attributes that are used for job and joblet control. Policies associated with the job also control the job. The Orchestrate Development Client has an administrative (“admin”) view in the Explorer Panel that lets you edit these objects.

This section includes information about a Job object that is visible in the Explorer view and the accompanying Admin view of the Orchestrate Development Client:

5.3.1 Job Groups

Any group object displayed in the Explorer panel represents a collection of similar object types. Groups can also be created automatically, as in the case when a provisioning adapter (PA) discovers a local repository on a VM host. For example, the xen30 PA, upon discovery of a VM host, automatically creates a local repository for that VM host and places the created repository in a xen30 repository group. You can also create groups manually in the Development Client, either by clicking the Actions menu and choosing Create Job Group or by right-clicking a Job Group object (anywhere in the Job hierarchy) and selecting New Job Group.

5.3.2 The Job Info/Groups Tab

The page that opens under the Info/Groups tab of the Job admin view includes several collapsible sections where you can configure the general information and attributes of the job.

NOTE:Whenever you make changes to any Grid object, that object’s icon is overlaid with the write icon, signifying that the object has been altered. To save the changes you have made, click the Save icon on the Development Client toolbar.

Info

The following fields on the Information panel provide facts for the Job object:

Show Inherited Fact Values Check Box

Select this check box to show facts with overridden values supplied through attached or inherited policies. Such fact values are read-only (non-editable).

Job Control Settings

The Job Control Settings panel on the Info/Groups page includes the following fields:

NOTE:Tool tip text is available when you mouse over any of these fields.

Description: Enter information in this box that describes the nature or purpose of this job.

In the Fact Editor, this fact is listed as job.description:

<fact name="job.description" value="" type="String" />

Enabled: This check box is selected by default. When it is selected (that is, its value is “true”), the job is enabled (that is, it is ready to run).

In the Fact Editor, this fact is listed as job.enabled:

<fact name="job.enabled" value="true" type="Boolean" />

Job Visible to Users: This check box is selected by default. When it is selected (that is, its value is “true”), the job can be viewed in the Development Client, by using command line queries, or in the Orchestrate Server Portal. Deselecting this check box does not keep the job from running.

In the Fact Editor, this fact is listed as job.visible:

<fact name="job.visible" value="true" type="Boolean" />

JDL Debug Tracing: This check box is not selected by default. When it is selected (that is, its value is “true”), the job log includes tracing information when job events are executed.

In the Fact Editor, this fact is listed as job.tracing:

<fact name="job.tracing" value="false" type="Boolean" />
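The fact elements shown in this section carry a name, a value, and a type attribute. As a minimal sketch (not part of the product), the following Python reads such elements into typed values. The helper name parse_fact and the converter table are assumptions made for illustration; only the attribute names and type labels come from the snippets in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical converters, keyed by the type labels that appear in this section.
CONVERTERS = {
    "String": str,
    "Boolean": lambda text: text.strip().lower() == "true",
    "Integer": int,
    "Real": float,
}


def parse_fact(xml_text):
    """Return (name, typed_value) for a single <fact ... /> element."""
    element = ET.fromstring(xml_text)
    convert = CONVERTERS[element.get("type")]
    return element.get("name"), convert(element.get("value"))


# Fact snippets copied from the Job Control Settings descriptions.
facts = dict(
    parse_fact(snippet)
    for snippet in [
        '<fact name="job.description" value="" type="String" />',
        '<fact name="job.enabled" value="true" type="Boolean" />',
        '<fact name="job.visible" value="true" type="Boolean" />',
        '<fact name="job.tracing" value="false" type="Boolean" />',
    ]
)
```

With these inputs, facts["job.enabled"] is the Python value True and facts["job.tracing"] is False.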

Job Type: This drop-down list lets you choose the job type that applies to this job. This setting is optional and is leveraged by the server to provide better quality completion time calculation for the job.

The job type options (completion time algorithms) include:

  • normal: The default job type. If this job has joblets, the job is based on the PSPACE progression algorithm. If it does not have joblets, it is based on the historical wall time average.

  • workflow: This job type does not offer a time algorithm to the server.

  • pspace: If this job has joblets, the job is based on PSPACE progression. If it does not have joblets, this job type does not offer a time algorithm.

  • fixedtime: This job type directs the server to use a time algorithm based on historical wall time average.

  • fixedgcycles: If this job has joblets, the job is based on average gcycles and current consumption rate. If it does not have joblets, the job is based on historical wall time average.

NOTE:You can change this setting at runtime to refine the calculation time as the job progresses. For example, the zosmake job might start out as type normal, but when all tasks have been submitted, you could change it to type workflow to allow its subjobs to drive the end time.

In the Fact Editor, the Job Type fact is listed as job.jobtype:

<fact name="job.jobtype" value="normal" type="String" />
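The job type rules listed above can be summarized as a lookup on the job type and on whether the job has joblets. The following Python is only a sketch of that decision table. The algorithm labels and the function name are hypothetical; the server's actual calculation is not shown in this guide.

```python
# Hypothetical labels for the completion time algorithms described in the text.
# None means the job type offers no time algorithm to the server.
PSPACE = "pspace-progression"
WALLTIME = "historical-walltime-average"
GCYCLES = "average-gcycles-and-consumption-rate"

# job type -> (algorithm when the job has joblets, algorithm when it has none)
COMPLETION_ALGORITHMS = {
    "normal": (PSPACE, WALLTIME),
    "workflow": (None, None),
    "pspace": (PSPACE, None),
    "fixedtime": (WALLTIME, WALLTIME),
    "fixedgcycles": (GCYCLES, WALLTIME),
}


def completion_algorithm(job_type, has_joblets):
    """Return the label of the algorithm implied by job.jobtype, or None."""
    with_joblets, without_joblets = COMPLETION_ALGORITHMS[job_type]
    return with_joblets if has_joblets else without_joblets
```

For example, completion_algorithm("normal", False) returns the wall time label, and completion_algorithm("workflow", True) returns None.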

Job Timeout: Enter the amount of time (in seconds) after which the server can take action to cancel the whole job, including all joblets and subjobs. A value of -1 indicates no timeout.

In the Fact Editor, this fact is listed as job.timeout:

<fact name="job.timeout" value="-1" type="Integer" />

Job Auto Terminate: This check box is selected by default. When it is selected (that is, its value is “true”), the job ends when all child jobs and joblets are executed.

In the Fact Editor, this fact is listed as job.autoterminate:

<fact name="job.autoterminate" value="true" type="Boolean" />

Queue Type: This drop-down list lets you choose the queue type that applies to this job. This setting is optional and is leveraged by the server to provide a better quality start time calculation for the job.

The queue type options (start time algorithms) include:

  • none: The start time is always unknown for jobs that are queued.

  • pfifo: (Packet First In First Out) The start time is implemented through policies. The server is directed to look at the job as having a finite number of active slots, so its start time depends on its position in the queue and the estimated end time of running jobs of this type. The FIFO queue for this job reshuffles based on priority.

  • fifo: (First In First Out) The start time is implemented through policies. The server is directed to look at the job as having a finite number of active slots, so its start time depends on its position in the queue (first-come, first-served) and the estimated end time of running jobs of this type. The FIFO queue for this job does not reshuffle based on priority.

  • lifo: (Last In First Out) The start time is implemented through policies. The server is directed to look at the job as having a finite number of active slots, so its start time depends on its position in the queue and the estimated end time of running jobs of this type. The queue for this job does not reshuffle based on priority.

  • fixedtime: The start time is based on the historical average queue time. This can be explicitly overridden through setting the job.history.queuetime.average fact.

In the Fact Editor, this fact is listed as job.queuetype:

<fact name="job.queuetype" value="pfifo" type="String" />
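To illustrate how the three queue-based types differ, the following Python sketch orders a set of queued job instances for each type. It is not the server's scheduler. The QueuedJob tuple, the function name, and the convention that a larger priority number is served first are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical record: arrival is the order in which the instance was queued.
QueuedJob = namedtuple("QueuedJob", "name arrival priority")


def start_order(queued, queue_type):
    """Return the queued instances in the order they would reach a free slot."""
    if queue_type == "fifo":
        # First-come, first-served; priority is ignored.
        return sorted(queued, key=lambda job: job.arrival)
    if queue_type == "lifo":
        # Most recently queued first; priority is ignored.
        return sorted(queued, key=lambda job: job.arrival, reverse=True)
    if queue_type == "pfifo":
        # Assumption: higher priority first, arrival order breaks ties.
        return sorted(queued, key=lambda job: (-job.priority, job.arrival))
    # "none" and "fixedtime" do not derive a start time from queue position.
    raise ValueError("queue type %r has no queue ordering" % queue_type)


queue = [
    QueuedJob("a", arrival=1, priority=1),
    QueuedJob("b", arrival=2, priority=9),
    QueuedJob("c", arrival=3, priority=5),
]
fifo_names = [job.name for job in start_order(queue, "fifo")]
lifo_names = [job.name for job in start_order(queue, "lifo")]
pfifo_names = [job.name for job in start_order(queue, "pfifo")]
```

Under these assumptions the three orders are a, b, c for fifo; c, b, a for lifo; and b, c, a for pfifo.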

Job Queued Timeout: Enter the amount of time (in seconds) after which the server can take action to cancel a queued job, including all joblets and subjobs. A value of -1 indicates no timeout.

In the Fact Editor, this fact is listed as job.queuedtimeout:

<fact name="job.queuedtimeout" value="-1" type="Integer" />

Resource Match Cache TTL: This value specifies the job’s willingness to allow resource matches to be cached if the Job Scheduler becomes too loaded. The value is the time to live (TTL) of the cache, in seconds. Enter a value less than zero (<0) to disable caching.

In the Fact Editor, this fact is listed as job.cacheresourcematches.ttl:

<fact name="job.cacheresourcematches.ttl" value="30" type="Integer" />

Preemptible: This check box is not selected by default. When it is selected (that is, its value is “true”), you set the job’s ability to be preempted. This setting can be overridden by the job instance.

In the Fact Editor, this fact is listed as job.preemptible:

<fact name="job.preemptible" value="false" type="Boolean" />

Restartable: This check box is not selected by default. When it is selected (that is, its value is “true”), you set the job’s ability to be restarted when the server restarts. This setting can be overridden by the job instance.

In the Fact Editor, this fact is listed as job.restartable:

<fact name="job.restartable" value="false" type="Boolean" />

Absolute Max Joblets: This value specifies the absolute maximum number of joblets that you want this job to schedule.

In the Fact Editor, this fact is listed as job.joblet.max:

<fact name="job.joblet.max" value="1000" type="Integer" />

Max Joblet Failures: This value specifies the number of non-fatal joblet errors that you want this job to tolerate before the job fails completely. Set the value at -1 to attempt to continue after errors.

In the Fact Editor, this fact is listed as job.joblet.maxfailures:

<fact name="job.joblet.maxfailures" value="0" type="Integer" />

Max Node Failures: This value specifies the number of resource failures that you want this job to tolerate before the node is excluded from further joblet processing. Set the value at -1 to specify that unlimited failures are acceptable.

In the Fact Editor, this fact is listed as job.maxnodefailures:

<fact name="job.maxnodefailures" value="2" type="Integer" />

Max Resources: This value specifies the absolute maximum number of resources that you want the job to use at one time. PlateSpin Orchestrate does not exceed the value set here. Set the value at -1 to specify unlimited resources.

In the Fact Editor, this fact is listed as job.maxresources:

<fact name="job.maxresources" value="-1" type="Integer" />

Max Joblets Running: This value specifies the absolute maximum number of joblets that you want the job to have running at one time. PlateSpin Orchestrate does not exceed the value set here. Set the value at -1 to specify unlimited joblets.

In the Fact Editor, this fact is listed as job.joblet.maxrunning:

<fact name="job.joblet.maxrunning" value="-1" type="Integer" />

Max Joblets Per Resource: This value specifies the absolute maximum number of joblets that you want the job to occupy on a resource. Set the value at -1 to specify unlimited joblets.

In the Fact Editor, this fact is listed as job.joblet.maxperresource:

<fact name="job.joblet.maxperresource" value="-1" type="Integer" />
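Several of the limits above use -1 to mean "unlimited." A minimal Python sketch of how such limits could be checked before another joblet is started follows. The function names and the way the three joblet limits are combined are assumptions for illustration, not the server's implementation.

```python
UNLIMITED = -1  # sentinel described for maxrunning, maxperresource, maxresources


def within_limit(current, limit):
    """True if one more unit fits under limit; -1 means no limit applies."""
    return limit == UNLIMITED or current < limit


def may_start_joblet(facts, scheduled_total, running_total, running_on_resource):
    """Check the joblet limits for one candidate resource (illustrative only)."""
    return (
        # job.joblet.max is an absolute cap; the text gives it no -1 meaning.
        scheduled_total < facts["job.joblet.max"]
        and within_limit(running_total, facts["job.joblet.maxrunning"])
        and within_limit(running_on_resource, facts["job.joblet.maxperresource"])
    )


# Default values taken from the fact snippets in this section.
defaults = {
    "job.joblet.max": 1000,
    "job.joblet.maxrunning": -1,
    "job.joblet.maxperresource": -1,
}
```

With the defaults, only the absolute cap of 1000 scheduled joblets can block a new joblet. Setting job.joblet.maxperresource to 2, for example, blocks a third concurrent joblet on the same resource.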

Resource Selection Ranking: This field displays the ranking specification used to select suitable resources. Element syntax is fact/order, where order is either ascending (a) or descending (d).

In the Fact Editor, this fact is listed as an array:

<fact name="job.resources.rankby">
  <array>
    <string>resource.loadaverage/a</string>
    <string>resource.anything/a</string>
  </array>
</fact>

You can edit this array by clicking the button to open the Attribute element values dialog box. In this dialog box you can add or remove fact specifications to the array of element choices.
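As an illustration of the fact/order syntax, the following Python sketch parses ranking elements and sorts candidate resources by them. It assumes that the first element of the array is the most significant sort key and that resources are plain dictionaries of fact values; both are assumptions for this example, and resource.id is used only as an illustrative second fact.

```python
def parse_rank_spec(spec):
    """Split 'fact/order' into (fact_name, descending_flag)."""
    fact, _, order = spec.rpartition("/")
    if not fact or order not in ("a", "d"):
        raise ValueError("expected fact/a or fact/d, got %r" % spec)
    return fact, order == "d"


def rank_resources(resources, rankby):
    """Return resources sorted by the ranking specs, first spec most significant."""
    ranked = list(resources)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-key ordering.
    for spec in reversed(rankby):
        fact, descending = parse_rank_spec(spec)
        ranked.sort(key=lambda resource: resource[fact], reverse=descending)
    return ranked


candidates = [
    {"resource.id": "n1", "resource.loadaverage": 0.9},
    {"resource.id": "n2", "resource.loadaverage": 0.1},
    {"resource.id": "n3", "resource.loadaverage": 0.1},
]
ranked_ids = [
    resource["resource.id"]
    for resource in rank_resources(candidates, ["resource.loadaverage/a", "resource.id/d"])
]
```

Here the two lightly loaded resources come first, and the descending resource.id key orders them n3, n2, followed by n1.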

Persist Facts on Completion: This check box is not selected by default. When it is selected (that is, its value is “true”), you specify that the Grid objects that this job modifies are persisted at the end of the job. This setting is available and applicable only in a high availability setup.

In the Fact Editor, this fact is listed as job.persistfactsonfinish:

<fact name="job.persistfactsonfinish" value="false" type="Boolean" />

Joblet Control Settings

Joblet Timeout: This value specifies the amount of time (in seconds) you want the Orchestrate Server to wait until cancelling the joblet. Set the value at -1 to specify no timeout.

In the Fact Editor, this fact is listed as job.joblet.timeout:

<fact name="job.joblet.timeout" value="-1" type="Integer" />

Max Joblet Retries: This value specifies the number of joblet retries (of any type) to be attempted before the Orchestrate Server considers the joblet as failed. A value of zero (0) specifies that the joblet should not be retried. A value of less than zero (<0) specifies the joblet should be continually retried.

In the Fact Editor, this fact is listed as job.joblet.maxretry:

<fact name="job.joblet.maxretry" value="0" type="Integer" />

Retry Limit (Forced): This value specifies the number of forced joblet retries (that is, requested by the joblet to run on another resource) to be allowed before the Orchestrate Server considers the joblet as failed. A value of zero (0) specifies that the joblet should not be retried. A value of less than zero (<0) specifies the joblet should be continually retried. This value should never exceed the value in job.joblet.maxretry.

In the Fact Editor, this fact is listed as job.joblet.retrylimit.forced:

<fact name="job.joblet.retrylimit.forced" value="-1" type="Integer" />

Retry Limit (Unforced): This value specifies the number of unforced joblet retries to be allowed before the Orchestrate Server considers the joblet as failed. A value of zero (0) specifies that the joblet should not be retried. A value of less than zero (<0) specifies the joblet should be continually retried. This value should never exceed the value in job.joblet.maxretry.

In the Fact Editor, this fact is listed as job.joblet.retrylimit.unforced:

<fact name="job.joblet.retrylimit.unforced" value="-1" type="Integer" />

Retry Limit (Resource Disconnect): This value specifies the number of joblet retries caused by unexpected resource disconnect to be allowed before the Orchestrate Server considers the joblet as failed. A value of zero (0) specifies that the joblet should not be retried. A value of less than zero (<0) specifies the joblet should be continually retried. This value should never exceed the value in job.joblet.maxretry.

In the Fact Editor, this fact is listed as job.joblet.retrylimit.disconnect:

<fact name="job.joblet.retrylimit.disconnect" value="-1" type="Integer" />

Retry Limit (Timeout): This value specifies the number of joblet retries caused by server-initiated joblet timeout to be allowed before the Orchestrate Server considers the joblet as failed. A value of zero (0) specifies that the joblet should not be retried. A value of less than zero (<0) specifies the joblet should be continually retried. This value should never exceed the value in job.joblet.maxretry.

In the Fact Editor, this fact is listed as job.joblet.retrylimit.timeout:

<fact name="job.joblet.retrylimit.timeout" value="-1" type="Integer" />
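The retry limits described above combine an overall cap (job.joblet.maxretry) with a cap per retry cause. The following Python sketch shows one way to read those rules. The function names and the bookkeeping of retries in a dictionary are assumptions; the guide does not describe how the server tracks retries internally.

```python
def limit_allows(retries_so_far, limit):
    """Apply the documented convention: <0 retries forever, 0 never retries."""
    if limit < 0:
        return True
    return retries_so_far < limit


def may_retry(facts, retry_counts, cause):
    """cause is one of 'forced', 'unforced', 'disconnect', 'timeout'."""
    total_retries = sum(retry_counts.values())
    if not limit_allows(total_retries, facts["job.joblet.maxretry"]):
        return False
    per_cause_limit = facts["job.joblet.retrylimit." + cause]
    return limit_allows(retry_counts.get(cause, 0), per_cause_limit)


# Default values taken from the fact snippets in this section.
default_facts = {
    "job.joblet.maxretry": 0,
    "job.joblet.retrylimit.forced": -1,
    "job.joblet.retrylimit.unforced": -1,
    "job.joblet.retrylimit.disconnect": -1,
    "job.joblet.retrylimit.timeout": -1,
}
```

With the default values, no retry is attempted because job.joblet.maxretry is 0, even though every per-cause limit is -1.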

Immediately Retry Failed Joblet: This check box is not selected by default. When it is selected (that is, its value is “true”), you specify that you want the system to immediately retry a joblet rather than waiting until all others are either running or complete before retrying.

In the Fact Editor, this fact is listed as job.joblet.immediateretry:

<fact name="job.joblet.immediateretry" value="true" type="Boolean" />

Max Joblet Wait Time: This value specifies the amount of time (in seconds) you want a resource to wait before being utilized by a joblet. A setting of -1 indicates no timeout.

In the Fact Editor, this fact is listed as job.joblet.maxwaittime:

<fact name="job.joblet.maxwaittime" value="-1" type="Integer" />

Joblet JDL Debug Tracing: This check box is not selected by default. When it is selected (that is, its value is “true”), you specify that you want the joblet to include tracing information on the job log as it executes joblet events.

In the Fact Editor, this fact is listed as job.joblet.tracing:

<fact name="job.joblet.tracing" value="false" type="Boolean" />

Joblet Run Type: From the drop-down list, you can select whether or not the file and executable operations that run in the joblet are performed on behalf of the job user.

  • RunAsJobUserFallingBackToNodeUser: (The default setting.) If this option is selected, any joblet logic executes as the local user with the same name as the grid user. If a local user of a matching name is not available, the joblet logic runs as the same user who is running the Orchestrate Agent (also known as the “Node User”). By default, the agent (Node User) is root.

  • RunOnlyAsJobUser: If this option is selected, any joblet logic executes as the local user with the same name as the grid user (that is, the local user whose name matches the PlateSpin Orchestrate username). If a local user of a matching name is not available, the joblet logic (and perhaps the job) fails. By default, the agent (Node User) is root.

  • RunOnlyAsNodeUser: If this option is selected, any joblet logic runs as the same user who is running the Orchestrate Agent (also known as the “Node User”). It does not run as the OS user whose username matches the PlateSpin Orchestrate user name. By default, the agent (Node User) is root.

In the Fact Editor, this fact is listed as job.joblet.runtype:

<fact name="job.joblet.runtype" value="RunAsJobUserFallingBackToNodeUser" type="String" />
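The three run types amount to a small decision rule for the operating system user that executes the joblet logic. The following Python sketch restates that rule. The function name, its parameters, and the use of LookupError to represent a failed joblet are assumptions for illustration only.

```python
def resolve_run_user(run_type, grid_user, local_users, node_user="root"):
    """Return the local account a joblet would run as under job.joblet.runtype."""
    if run_type == "RunOnlyAsNodeUser":
        # Never switch to the grid user's local account.
        return node_user
    if run_type not in ("RunAsJobUserFallingBackToNodeUser", "RunOnlyAsJobUser"):
        raise ValueError("unknown run type %r" % run_type)
    if grid_user in local_users:
        return grid_user
    if run_type == "RunAsJobUserFallingBackToNodeUser":
        return node_user
    # RunOnlyAsJobUser with no matching local account: the joblet logic fails.
    raise LookupError("no local user named %r" % grid_user)


local_accounts = {"root", "alice"}  # hypothetical accounts on the resource
as_alice = resolve_run_user("RunOnlyAsJobUser", "alice", local_accounts)
fallback = resolve_run_user("RunAsJobUserFallingBackToNodeUser", "bob", local_accounts)
```

In this example as_alice is "alice" and fallback is "root", because bob has no local account on the resource.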

Automatic Resource Provisioning Settings

Max Resource Provisions: This value specifies the number of resources that can be automatically provisioned on behalf of this job. A setting of zero (0) turns off automatic provisioning behavior. A setting of -1 allows unlimited provisioning.

In the Fact Editor, this fact is listed as job.provision.maxcount:

<fact name="job.provision.maxcount" value="0" type="Integer" />

Max Pending Provisions: This value specifies the number of resources that can be automatically provisioned at one time (that is, simultaneously) on behalf of this job. A setting of less than or equal to zero (<=0) turns off automatic provisioning behavior.

In the Fact Editor, this fact is listed as job.provision.maxpending:

<fact name="job.provision.maxpending" value="1" type="Integer" />
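The two provisioning limits above interact: either one can turn automatic provisioning off. A minimal Python sketch of that check follows, assuming hypothetical counters for the resources already provisioned and those currently pending. It is a reading of the documented rules, not the server's code.

```python
def may_provision(provisioned_total, pending_now, max_count, max_pending):
    """Decide whether one more automatic provision may be started.

    max_count   -> job.provision.maxcount   (0 disables, -1 is unlimited)
    max_pending -> job.provision.maxpending (<= 0 disables)
    """
    if max_count == 0 or max_pending <= 0:
        return False
    if max_count != -1 and provisioned_total >= max_count:
        return False
    return pending_now < max_pending
```

With the default facts (job.provision.maxcount of 0 and job.provision.maxpending of 1), may_provision always returns False, which matches the statement that a setting of zero turns off automatic provisioning.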

Max Resource Provision Failures: This value specifies the maximum number of resource provision failures to be tolerated before excluding the node from future automatic provisioning. A setting of -1 indicates that unlimited failures are acceptable.

In the Fact Editor, this fact is listed as job.provision.maxnodefailures:

<fact name="job.provision.maxnodefailures" value="1" type="Integer" />

Provision Selection Ranking: This field displays the ranking specification used to select suitable resources to automatically provision. Element syntax is fact/order, where order is either ascending (a) or descending (d).

In the Fact Editor, this fact is listed as an array:

<fact name="job.provision.rankby">
  <array type="String">
  </array>
</fact>

You can edit this array by clicking the button to open the Attribute element values dialog box. In this dialog box you can add or remove fact specifications to the array of element choices.

Host Selection Strategy: This drop-down list lets you choose the type of strategy you want to use in finding a host for any automatically provisioned resource. The choices include:

  • queue: The queue option directs the server to use the default affinity wait period defined by the resource before considering all possible hosts. The request is queued until a suitable resource becomes available or a requesting job completes.

  • immediate: The immediate option directs the server to immediately consider the affinity host before trying to find any matching resources and to fail if a suitable resource is not available.

In the Fact Editor, this fact is listed as job.provision.hostselection:

<fact name="job.provision.hostselection" value="immediate" type="String" />

Resource Preemption Settings

Job Selection Ranking: This field displays the ranking specification used to select suitable jobs to automatically preempt on a resource. Element syntax is fact/order, where order is either ascending (a) or descending (d).

In the Fact Editor, this fact is listed as an array:

<fact name="job.preemption.rankby">
  <array>
    <string>jobinstance.priority/a</string>
    <string>jobinstance.joblets.running/d</string>
  </array>
</fact>

You can edit this array by clicking the button to open the Attribute element values dialog box. In this dialog box you can add or remove fact specifications to the array of element choices.

Job Counts

Total Instances: This field displays the total number of job instances of this type that exist in the PlateSpin Orchestrate system.

In the Fact Editor, this fact is listed as job.instances.total:

<fact name="job.instances.total" value="0" type="Integer" />

Active Instances: This field displays the total number of job instances of this type that are active in the PlateSpin Orchestrate system.

In the Fact Editor, this fact is listed as job.instances.active:

<fact name="job.instances.active" value="0" type="Integer" />

Queued Instances: This field displays the total number of job instances of this type that are in a queued state in the PlateSpin Orchestrate system.

In the Fact Editor, this fact is listed as job.instances.queued:

<fact name="job.instances.queued" value="0" type="Integer" />

Job Accounting Group: This drop-down list lets you select the Job Group whose statistics are updated by default when the job runs.

In the Fact Editor, this fact is listed as job.accountinggroup:

<fact name="job.accountinggroup" value="all" type="String" />

Job Resource Group: This drop-down list lets you select the default Resource Group whose members and any of its resource policies are selected for this job.

In the Fact Editor, this fact is listed as job.resourcegroup:

<fact name="job.resourcegroup" value="all" type="String" />

Job History

Shared Instance Count: (Read only) This field displays the total number of job instances (including those denied by “accept” constraints) of this job that have ever been initiated on this PlateSpin Orchestrate system.

In the Fact Editor, this fact is listed as job.history.jobcount:

<fact name="job.history.jobcount" value="0" type="Integer" />

Completed Count: (Read only) This field displays the total number of job instances (including those denied by “accept” constraints) of this job that have been completed.

In the Fact Editor, this fact is listed as job.history.jobcount.complete:

<fact name="job.history.jobcount.complete" value="0" type="Integer" />

Cancelled Count: (Read only) This field displays the total number of job instances (including those denied by “accept” constraints) of this job that have been canceled.

In the Fact Editor, this fact is listed as job.history.jobcount.cancelled:

<fact name="job.history.jobcount.cancelled" value="0" type="Integer" />

Failed Count: (Read only) This field displays the total number of job instances of this type that have failed.

In the Fact Editor, this fact is listed as job.history.jobcount.failed:

<fact name="job.history.jobcount.failed" value="0" type="Integer" />

Total Cost: This field displays the total cost of running this job. The amount is calculated since the job was deployed or last modified.

In the Fact Editor, this fact is listed as job.history.cost.total:

<fact name="job.history.cost.total" value="0.0000" type="Real" />

Average Cost: This field displays the average cost of running this job. The amount is calculated since the job was deployed or last modified and is updated only if the job finishes successfully.

In the Fact Editor, this fact is listed as job.history.cost.average:

<fact name="job.history.cost.average" value="0.0000" type="Real" />

Total Runtime: This field displays the total runtime (in seconds) since the job was deployed.

In the Fact Editor, this fact is listed as job.history.runtime.total:

<fact name="job.history.runtime.total" value="0" type="Integer" />

Average Runtime: This field displays the average runtime (in seconds) since the job was deployed.

In the Fact Editor, this fact is listed as job.history.runtime.average:

<fact name="job.history.runtime.average" value="0" type="Integer" />

Total Execution Time: This field displays the total combined resource wall time (in seconds) of all work performed on behalf of this job since the job was deployed.

In the Fact Editor, this fact is listed as job.history.time.total:

<fact name="job.history.time.total" value="0" type="Integer" />

Average Execution Time: This field displays the average resource wall time (in seconds) of all work performed on behalf of this job since the job was deployed.

In the Fact Editor, this fact is listed as job.history.time.average:

<fact name="job.history.time.average" value="0" type="Integer" />

Total Grid Time: This field displays the total amount of normalized grid time (in gcycles) consumed by this job since deployment.

In the Fact Editor, this fact is listed as job.history.gcycles.total:

<fact name="job.history.gcycles.total" value="0" type="Integer" />

NOTE:A gcycle can be thought of as a normalized second of compute time. It is a relative measure that approximates one second of real processing time on a 2 GHz Pentium* class Intel* processor.

Average Grid Time: This field displays the average amount of normalized grid time (in gcycles, which is a normalized grid cycle) consumed by running this job. The value is updated only if the job finishes successfully.

In the Fact Editor, this fact is listed as job.history.gcycles.average:

<fact name="job.history.gcycles.average" value="0" type="Integer" />

Total Queue Time: This field displays the total amount of time (in seconds) since deployment that the job has spent in a queued state.

In the Fact Editor, this fact is listed as job.history.queuetime.total:

<fact name="job.history.queuetime.total" value="0" type="Integer" />

Average Queue Time: This field displays the average amount of wall time (in seconds) spent waiting for this job to start.

In the Fact Editor, this fact is listed as job.history.queuetime.average:

<fact name="job.history.queuetime.average" value="0" type="Integer" />

Average Sample Size: This field displays the total number of points you want to use in the trailing average calculation for all historical averages.

In the Fact Editor, this fact is listed as job.history.samplesize:

<fact name="job.history.samplesize" value="2" type="Integer" />

NOTE:Similar to a moving average, a trailing average is the mean average measured over the last x datapoints.
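A trailing average over the last x data points can be sketched in a few lines of Python. The class below is illustrative only; its name and interface are assumptions, and the sample size of 2 mirrors the job.history.samplesize value shown above.

```python
from collections import deque


class TrailingAverage:
    """Mean of the most recent sample_size data points."""

    def __init__(self, sample_size):
        if sample_size < 1:
            raise ValueError("sample_size must be at least 1")
        # A bounded deque silently drops the oldest point when it is full.
        self._points = deque(maxlen=sample_size)

    def add(self, value):
        """Record a new data point and return the updated average."""
        self._points.append(value)
        return sum(self._points) / len(self._points)


average_runtime = TrailingAverage(sample_size=2)
first = average_runtime.add(10)   # only one point so far
second = average_runtime.add(20)  # mean of 10 and 20
third = average_runtime.add(40)   # mean of 20 and 40; the 10 has dropped out
```

After the third call, the oldest point no longer contributes, so the average reflects only the two most recent runs.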

Groups

This section of the Info/Groups page lists the groups of Job objects in the grid. Click Choose to open the Job Group Selection dialog box. In this dialog box, you can choose which Job Groups to display in the Explorer Panel by selecting a group and then clicking Add or Remove to move it to or from the Source Job Groups list.

5.3.3 The JDL Editor Tab

The JDL Editor tab of the Job admin view opens an editor where you can inspect and modify the job’s JDL code. This code is a Python-based script that contains the logic to control a job. The JDL code for each job includes commented documentation to explain the job’s purpose and methods for implementation.

Figure 5-2 The JDL Editor

A drop-down list at the top of the editor includes the Java classes and their methods that are bookmarked in the code. Select any of these to go to the location in the code where they are invoked. Clickable, colored blocks on the editor scroll bar perform a similar bookmarking function.

5.3.4 The Job Library Editor Tab

The Library Editor tab of the Job admin view opens an editor where you can inspect and modify the different library scripts for a job. The scripts for each job include instructions to the Orchestrate Server for handling job functions.

Figure 5-3 The Job Library Editor

There are two drop-down lists at the top of the Library Editor view. The first, labeled “Library,” lists the different libraries for the job, and the second lists the methods that are bookmarked in the code. Select a method in the second drop-down list to go to the location in the library code where that method is invoked. Clickable, colored blocks on the editor scroll bar perform a similar bookmarking function.

5.3.5 The Job Policies Tab

The Policies tab of the Job admin view opens a page that contains a policy viewer for each of the policies associated with a Job Grid object.

You can modify a policy by using the Policy Grid object. For more information, see Section 5.8.1, The Policy Object.

Click Choose in the admin view of the Policy viewer to launch a Policy Selection dialog box where you can add or remove individual policies to be applied to the selected Job Grid object.

Figure 5-4 The Policy Selection Dialog Box

5.3.6 The Job Constraints/Facts Tab

The Constraints/Facts tab opens a page that shows all of the effective constraints and facts for a Grid object. Each Grid object has an associated set of facts and constraints that define its properties. In essence, by changing the policy constraints and fact values for a job, you can change the behavior of the job and how the PlateSpin Orchestrate Server allocates available system resources to it. The Orchestrate Server assigns default values to each of the component facts, although the administrator can change them at any time unless they are read-only. Facts with mode r/o have read-only values, which can be viewed (by using the edit “pencil” icon) but cannot be changed.