Scheduling and Coordinating Oozie Workflows in Hadoop

By Dirk deRoos

After you’ve created a set of workflows, you can use a series of Oozie coordinator jobs to schedule when they’re executed. You have two scheduling options: execution at a specific time, or execution at a certain time once specified data is available.

Time-based scheduling for Oozie coordinator jobs

Oozie coordinator jobs can be scheduled to start at a certain time and, after they’re started, to run at specified intervals. The following example shows a coordinator job that starts running at a specified start time and date:

<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="2014-06-01T00:01Z"
                 end="2014-06-01T01:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
   <controls>...</controls>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
      </workflow>
   </action>     
</coordinator-app>
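
To illustrate how the frequency interacts with the start and end window, here is a minimal sketch of a coordinator that would run every six hours over a one-week window. The name sixHourCoordinator and the dates are hypothetical; coord:hours is one of Oozie's frequency functions, alongside coord:days. The optional controls element is left out for brevity.

```xml
<!-- Hypothetical variant: same structure as the listing above,
     but with a six-hour frequency and a one-week window -->
<coordinator-app name="sixHourCoordinator"
                 frequency="${coord:hours(6)}"
                 start="2014-06-01T00:00Z"
                 end="2014-06-08T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
   <action>
      <workflow>
         <!-- workflowAppPath is expected to be defined in job.properties -->
         <app-path>${workflowAppPath}</app-path>
      </workflow>
   </action>
</coordinator-app>
```

Because the window is much longer than the frequency, this coordinator would trigger the workflow repeatedly until the end time is reached.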

Time and data availability-based scheduling for Oozie coordinator jobs

Oozie coordinator jobs can also be scheduled to execute at a certain time if specified data files or directories are available. The following listing shows an example of a coordinator that starts running at a specified start time and date, is executed once a day if the data set identified by triggerDatasetDir exists, and runs until the specified end time:

<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
   <controls>...</controls>
   <datasets>
      <dataset name="input" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="${timeZoneDef}">
         <uri-template>${triggerDatasetDir}</uri-template>
      </dataset>
   </datasets>
   <input-events>
      <data-in name="sampleInput" dataset="input">
         <instance>${startTime}</instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
      </workflow>
   </action>     
</coordinator-app>
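
In practice, input data often lands in a new directory each day. The following sketch assumes a hypothetical layout in which each day's data sits in a year/month/day subdirectory under triggerDatasetDir, and it passes the resolved directory to the workflow. The coordinator name datedInputCoordinator and the workflow parameter inputDir are assumptions made for illustration; YEAR, MONTH, DAY, coord:current, and coord:dataIn are Oozie coordinator expressions.

```xml
<coordinator-app name="datedInputCoordinator"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
   <datasets>
      <!-- One dataset instance per day; Oozie fills in YEAR, MONTH, and DAY -->
      <dataset name="input" frequency="${coord:days(1)}"
               initial-instance="${startTime}" timezone="${timeZoneDef}">
         <uri-template>${triggerDatasetDir}/${YEAR}/${MONTH}/${DAY}</uri-template>
      </dataset>
   </datasets>
   <input-events>
      <data-in name="sampleInput" dataset="input">
         <!-- coord:current(0) refers to the dataset instance for the current run -->
         <instance>${coord:current(0)}</instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
         <configuration>
            <!-- inputDir is a hypothetical parameter that the workflow would read -->
            <property>
               <name>inputDir</name>
               <value>${coord:dataIn('sampleInput')}</value>
            </property>
         </configuration>
      </workflow>
   </action>
</coordinator-app>
```

With this layout, each daily run waits for its own day's directory rather than for a single fixed path, and the workflow receives the directory it should process.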

Running Oozie coordinator jobs

Similar to Oozie workflow jobs, coordinator jobs require a job.properties file, and the coordinator.xml file needs to be loaded into HDFS. To run an Oozie coordinator job from the Oozie command-line interface, issue a command like the following while ensuring that the job.properties file is locally accessible:

$ oozie job -config sampleCoordinator/job.properties -run

After you submit the job, the coordinator is stored in the Oozie object database. On submission, Oozie returns an identifier that you can use to monitor and administer your coordinator, for example job: 0000001-00000001234567-oozie-C.

To check the status of this job, run the following command:

oozie job -info 0000001-00000001234567-oozie-C