Job Configurations

Common Parameters

Besides the type and the dependencies parameters, there are several parameters that Azkaban reserves for all jobs. All of the parameters below are optional.

| Parameter | Description |
|---|---|
| retries | The number of retries automatically attempted for failed jobs. |
| retry.backoff | The time in milliseconds between retry attempts. |
| working.dir | Overrides the working directory for the execution. By default, this is the directory that contains the job file being run. |
| env.property | Sets the environment variable with the given property name. |
| failure.emails | Comma-delimited list of emails to notify on failure. * |
| success.emails | Comma-delimited list of emails to notify on success. * |
| notify.emails | Comma-delimited list of emails to notify on either success or failure. * |

* Email properties are read from the last job in the flow and applied at the flow level. Email properties set on any other job in the flow are ignored.

Runtime Properties

Azkaban automatically adds these properties at runtime for a job to use.

| Parameter | Description |
|---|---|
| azkaban.job.attempt | The attempt number for the job. Starts at 0 and increments with every retry. |
| azkaban.flow.flowid | The name of the flow that the job is running in. |
| azkaban.flow.execid | The execution id assigned to the running flow. |
| azkaban.flow.projectid | The numerical project id. |
| azkaban.flow.projectversion | The project upload version. |
| azkaban.flow.uuid | A unique identifier assigned to a flow's execution. |
| azkaban.flow.start.timestamp | The start time in milliseconds since the epoch. |
| azkaban.flow.start.year | The start year. |
| azkaban.flow.start.month | The start month of the year (1-12). |
| azkaban.flow.start.day | The start day of the month. |
| azkaban.flow.start.hour | The start hour in the day. |
| azkaban.flow.start.minute | The start minute. |
| azkaban.flow.start.second | The start second in the minute. |
| azkaban.flow.start.milliseconds | The start millisecond in the second. |
| azkaban.flow.start.timezone | The start timezone that is set. |

Inherited Parameters

Any included .properties files will be treated as properties that are shared amongst the individual jobs of the flow. The properties are resolved in a hierarchical manner by directory.

For instance, suppose you have the following directory structure in your zip file.

system.properties
baz.job
myflow/
   myflow.properties
   myflow2.properties
   foo.job
   bar.job

That directory structure will be preserved when running in Azkaban. The baz job will inherit only from system.properties. The jobs foo and bar will inherit from myflow.properties and myflow2.properties, which in turn will inherit from system.properties.

The order in which properties files in the same directory are resolved is arbitrary.
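
The directory-based inheritance described above can be sketched in Python. This is a minimal illustration, not Azkaban's implementation: the helper names `load_properties` and `inherited_properties` are hypothetical, the parser handles only simple key=value lines, and the rule that properties in deeper directories override those in shallower ones is an assumption.

```python
import os


def load_properties(path):
    """Parse a simple key=value file, skipping blank lines and # comments."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def inherited_properties(root, job_path):
    """Merge .properties files from the root down to the job's directory.

    Assumption: deeper directories override shallower ones. Files within one
    directory are merged in sorted order only to keep this sketch
    deterministic; the documented order within a directory is arbitrary.
    """
    root = os.path.abspath(root)
    job_dir = os.path.dirname(os.path.abspath(job_path))
    rel = os.path.relpath(job_dir, root)
    parts = [] if rel == "." else rel.split(os.sep)

    # Build the chain of directories from the root to the job's directory.
    directories = [root]
    for part in parts:
        directories.append(os.path.join(directories[-1], part))

    merged = {}
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            if name.endswith(".properties"):
                merged.update(load_properties(os.path.join(directory, name)))
    return merged
```

With the directory structure shown earlier, `inherited_properties(root, "baz.job")` would see only system.properties, while a call for myflow/foo.job would also merge in myflow.properties and myflow2.properties.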

Parameter Substitution

Azkaban supports parameter substitution. Whenever a ${parameter} appears in a properties or job file, Azkaban attempts to replace it with that parameter's value. Parameters are resolved late, rather than when the files are first read.

# shared.properties
replaceparameter=bar

# myjob.job
param1=mytest
foo=${replaceparameter}
param2=${param1}

In the previous example, before myjob is run, foo will equal bar and param2 will equal mytest.

Parameter substitution is also recursive, allowing for constructs such as:

# global.properties
mycluster.environment=production

# shared.properties
production.database_host=db.example.com
staging.database_host=some_other_host.example.com

# myjob.job
database=${${mycluster.environment}.database_host}

In this example, myjob will use the value db.example.com, first resolving mycluster.environment to production and then resolving production.database_host.
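
To make the recursive behaviour concrete, here is a small Python sketch that expands references innermost first. It only illustrates the idea and is not Azkaban's resolver: the function name `resolve` is hypothetical, and raising an error for undefined or circular references is an assumption about how such cases might be handled.

```python
import re

# Matches an innermost reference: ${name} where name has no nested ${...}.
_REFERENCE = re.compile(r"\$\{([^${}]*)\}")


def resolve(value, props, _active=()):
    """Expand ${name} references in value using props, innermost first."""
    match = _REFERENCE.search(value)
    while match:
        name = match.group(1)
        if name in _active:
            # Assumption: a parameter that refers back to itself is an error.
            raise ValueError(f"circular reference: {name}")
        if name not in props:
            # Assumption: an undefined parameter is an error.
            raise KeyError(f"undefined parameter: {name}")
        # Resolve the referenced value fully before splicing it in.
        replacement = resolve(props[name], props, _active + (name,))
        value = value[:match.start()] + replacement + value[match.end():]
        match = _REFERENCE.search(value)
    return value


props = {
    "mycluster.environment": "production",
    "production.database_host": "db.example.com",
    "staging.database_host": "some_other_host.example.com",
}
database = resolve("${${mycluster.environment}.database_host}", props)
```

Here the inner reference is replaced first, giving ${production.database_host}, which is then replaced in turn, so `database` ends up as db.example.com.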

Parameter Passing

It is often useful to pass these parameters to the executing job code. How they are passed depends on the job type being run, but Azkaban usually writes them to a temporary file that the job can read. The path of this file is set in the JOB_PROP_FILE environment variable, and the file uses the same key-value format as properties files. Certain built-in job types do this automatically. The java type, for instance, invokes your Runnable and, given a suitable constructor, Azkaban passes the parameters to your code automatically.
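
For a job type that does not handle this automatically, a script could read the file itself. The following Python sketch assumes the file contains simple key=value lines; it does not handle the escapes or line continuations that full Java properties files allow, and the function name `read_job_properties` is hypothetical.

```python
import os


def read_job_properties(environ=os.environ):
    """Return the job's parameters from the file named by JOB_PROP_FILE.

    Returns an empty dict when the variable is not set, for example when the
    script is run by hand outside Azkaban.
    """
    path = environ.get("JOB_PROP_FILE")
    if not path:
        return {}
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props
```

A script might then call `read_job_properties().get("azkaban.job.attempt")` to find out which attempt it is running as.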

Parameter Output

A job can export properties to pass to the jobs that depend on it. Azkaban sets a second environment variable, JOB_OUTPUT_PROP_FILE. If a job writes a file to that path, Azkaban reads the file and passes its contents to the next jobs in the flow.

The output file should be in JSON format. Certain built-in job types, such as the java type, handle this automatically.
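
A job that writes its own output might do so as in the Python sketch below. The function name `write_output_properties` and the example keys are hypothetical, and writing a flat JSON object of string keys and values is an assumption about the expected shape.

```python
import json
import os


def write_output_properties(values, environ=os.environ):
    """Write values as JSON to the file named by JOB_OUTPUT_PROP_FILE.

    Returns False without writing anything when the variable is not set,
    for example when the script is run outside Azkaban.
    """
    path = environ.get("JOB_OUTPUT_PROP_FILE")
    if not path:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        # Assumption: a flat object mapping property names to string values.
        json.dump(values, handle)
    return True
```

For example, a job could call `write_output_properties({"row.count": "1200"})` as its last step so that downstream jobs can refer to that value.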