Running Oozie Workflows in Hadoop - dummies

By Dirk deRoos

Before you run an Oozie workflow, all of its components need to exist within a specific directory structure. The workflow should have its own dedicated directory, with workflow.xml in the root of that directory and any code libraries in a subdirectory named lib. The workflow directory and all its files must exist in HDFS for the workflow to be executed.
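The layout described above can be sketched roughly as follows. The directory name sampleWorkload and the HDFS target under /user/<your account> are assumptions made for illustration, not requirements. The upload step runs only if a workflow.xml is already in place and the hdfs client is on your PATH.

```shell
# Hypothetical application directory name; pick your own.
APP_DIR="sampleWorkload"

# Local staging copy: workflow.xml goes in the root, libraries go under lib/.
mkdir -p "$APP_DIR/lib"

# Copy your own workflow.xml into "$APP_DIR/" and your JARs into "$APP_DIR/lib/"
# before uploading. This sketch does not create them for you.

# Upload the whole directory to HDFS (assumed target: /user/<your account>/).
if [ -f "$APP_DIR/workflow.xml" ] && command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "/user/$(whoami)"
  hdfs dfs -put -f "$APP_DIR" "/user/$(whoami)/"
else
  echo "workflow.xml or hdfs client missing; skipping upload"
fi
```

Staging the directory locally first keeps the HDFS copy a simple mirror of what you tested on disk.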

If you’ll be using the Oozie command-line interface to work with various jobs, be sure to set the OOZIE_URL environment variable. (This is easily done from a command line in a Linux terminal.) Doing so saves you a lot of typing because the Oozie server’s URL is then automatically included with your requests.

Here’s a sample command you can use to set the OOZIE_URL environment variable from the command line:

export OOZIE_URL="http://localhost:8080/oozie"
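To see what the variable saves you, the sketch below runs the same status check twice: once relying on OOZIE_URL and once spelling the URL out with the -oozie option. The URL is the sample value from above, so substitute your own server's address. The check is skipped if the oozie client isn't installed.

```shell
# Sample URL from the text above; replace with your Oozie server's address.
export OOZIE_URL="http://localhost:8080/oozie"

if command -v oozie >/dev/null 2>&1; then
  # With OOZIE_URL exported, the server URL is picked up automatically.
  oozie admin -status
  # Equivalent call that passes the URL explicitly with the -oozie option.
  oozie admin -oozie "http://localhost:8080/oozie" -status
else
  echo "oozie client not found; OOZIE_URL is set to $OOZIE_URL"
fi
```

The export lasts only for the current terminal session unless you also add it to your shell's startup file.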

To run an Oozie workload from the Oozie command-line interface, issue a command like the following. The job.properties file must be locally accessible: the account you’re using must be able to read it, and it has to be on the same system where you’re running Oozie commands.

oozie job -config sampleWorkload/job.properties -run
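For reference, a minimal job.properties might look something like the file written below. The host names and ports are placeholders, and nameNode and jobTracker are conventional variable names that only matter if your workflow.xml refers to them. The one property Oozie needs in order to locate a workflow is oozie.wf.application.path, which points at the workflow directory in HDFS.

```shell
# Note: this overwrites any existing sampleWorkload/job.properties.
mkdir -p sampleWorkload

# The quoted 'EOF' keeps the shell from expanding ${...}; Oozie resolves those.
cat > sampleWorkload/job.properties <<'EOF'
# Placeholder cluster addresses (assumptions); replace with your own.
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
# HDFS directory that holds workflow.xml and the lib subdirectory.
oozie.wf.application.path=${nameNode}/user/${user.name}/sampleWorkload
EOF
```

This file stays on the local file system, while the path it names lives in HDFS.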

After you submit a job, the workload is stored in the Oozie object database.

On submission, Oozie returns an identifier that you can use to monitor and administer your workflow (job: 0000001-00000001234567-oozie-W, for example).

To check the status of this job, you’d run the following command:

oozie job -info 0000001-00000001234567-oozie-W
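If you'd rather not copy the identifier by hand, one possible approach is to capture the submission output and strip the leading "job: " label. The sketch below uses the sample output quoted earlier in place of a live submission. The variable names submit_output and job_id are made up for illustration.

```shell
# Sample submission output from the text; a real run might capture it with:
#   submit_output=$(oozie job -config sampleWorkload/job.properties -run)
submit_output="job: 0000001-00000001234567-oozie-W"

# Remove the leading "job: " label so only the identifier remains.
job_id="${submit_output#job: }"
echo "$job_id"

# Query the job's status if the oozie client is installed.
if command -v oozie >/dev/null 2>&1; then
  oozie job -info "$job_id"
fi
```

Keeping the identifier in a variable lets you reuse it for later commands in the same session.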