About Oozie
Oozie is an open source project that simplifies workflow and coordination between jobs. It provides users with the ability to define actions and dependencies between actions. Oozie will then schedule actions to execute when the required dependencies have been met.
A workflow in Oozie is defined as a Directed Acyclic Graph (DAG). Acyclic means there are no loops in the graph: there is a starting point and an ending point, and every task and dependency points from start to end without going back. A DAG is made up of action nodes and flow control nodes. An action node can be a MapReduce job, a Pig application, a file system task, or a Java application. Flow control nodes direct execution based on the outcome of the preceding task in the graph. Examples of flow control nodes are decision, fork, and join nodes.
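To make the DAG idea concrete, here is a minimal Python sketch of dependency-driven scheduling. It is not how Oozie is implemented; the action names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduler. Actions that become ready at the same time model a fork, and an action with several dependencies models a join.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "clean_output": set(),                       # file system task, runs first
    "mapreduce_job": {"clean_output"},           # fork: these two can run
    "pig_report": {"clean_output"},              # in parallel
    "publish": {"mapreduce_job", "pig_report"},  # join: waits for both
}

def run_in_waves(dag):
    """Group actions into waves; a wave can start once the previous one is done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # actions whose dependencies are met
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves

waves = run_in_waves(workflow)
print(waves)
# [['clean_output'], ['mapreduce_job', 'pig_report'], ['publish']]
```

Adding an edge that points backwards (for example making `clean_output` depend on `publish`) makes `prepare()` raise `CycleError`, which is the "no loops" rule in practice.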
What is Oozie?
• Oozie allows a user to create Directed Acyclic
Graphs of workflows, and these can be run in
parallel or sequentially in Hadoop.
• Oozie can also run plain Java classes and Pig
workflows, and interact with HDFS
– Nice if you need to delete or move files before a
job runs
• Oozie can run jobs sequentially (one after the other) and in parallel (multiple at a time)
Why use Oozie instead of just
cascading jobs one after another?
• Major flexibility
– Start, Stop, Suspend, and re-run jobs
• Oozie allows you to restart from a failure
– You can tell Oozie to restart a job from a specific
node in the graph or to skip specific failed nodes
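As a rough illustration of the restart options, the sketch below builds the extra line you would add to the job properties before a rerun. The two property names, `oozie.wf.rerun.failnodes` and `oozie.wf.rerun.skip.nodes`, are taken from the Oozie rerun documentation as I understand it, where exactly one of them should be set; the helper function and the node names are hypothetical.

```python
def rerun_properties(skip_nodes=None, rerun_failed=False):
    """Return the single rerun property line for a workflow rerun.

    Exactly one option may be chosen: rerun only the failed nodes,
    or skip an explicit list of nodes.
    """
    if bool(skip_nodes) == bool(rerun_failed):
        raise ValueError("set exactly one of skip_nodes or rerun_failed")
    if rerun_failed:
        return "oozie.wf.rerun.failnodes=true"
    return "oozie.wf.rerun.skip.nodes=" + ",".join(skip_nodes)

# Rerun only the nodes that failed last time.
print(rerun_properties(rerun_failed=True))

# Skip two (hypothetical) nodes and rerun everything else.
print(rerun_properties(skip_nodes=["clean_output", "mapreduce_job"]))
```

The resulting line goes into the properties file passed to the rerun command, which in the Oozie CLI takes the form `oozie job -oozie <oozie-url> -rerun <job-id> -config job.properties`.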
Other Features
• Java Client API / Command Line Interface
– Launch, control, and monitor jobs from your Java
Apps
• Web Service API
– You can control jobs from anywhere
• Run Periodic jobs
– Have jobs that you need to run every hour, day,
week? Have Oozie run the jobs for you
• Receive an email when a job is complete
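For the Web Service API bullet, here is a small sketch of how a client might ask for a job's status over HTTP. The `/v1/job/<id>?show=info` path follows the Oozie Web Services API; the host, port, and job ID are placeholders, the `status` field in the JSON reply is an assumption, and authentication is not handled.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def job_info_url(base_url, job_id):
    """Build the job-info URL for an Oozie server rooted at base_url."""
    query = urlencode({"show": "info"})
    return base_url.rstrip("/") + "/v1/job/" + quote(job_id) + "?" + query

def fetch_job_status(base_url, job_id):
    """Fetch job info and return its status (needs a reachable Oozie server)."""
    with urlopen(job_info_url(base_url, job_id)) as response:
        info = json.load(response)
    return info.get("status")  # assumed field name in the JSON reply

# Placeholder server and job ID; only the URL is built here, nothing is sent.
print(job_info_url("http://localhost:11000/oozie",
                   "0000001-130606115200591-oozie-joe-W"))
```

Because this is plain HTTP, the same request works from any language or from a browser, which is what "control jobs from anywhere" amounts to.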
How do you make a workflow?
• First make a Hadoop job and make sure that it works
using the jar command in Hadoop
– This ensures that the configuration is correct for your job
• Make a jar out of your classes
• Then make a workflow.xml file and copy all of the job
configuration properties into the xml file. These
include:
– Input files
– Output files
– Input readers and writers
– Mappers and reducers
– Job specific arguments
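A workflow.xml covering the items above might look like the sketch below. It is embedded in a short Python script so it can be checked for well-formedness before it is copied anywhere. The workflow name, class names, and the `${inputDir}`/`${outputDir}` parameters are hypothetical, and the `mapred.*` property names follow the older convention used in Oozie's bundled map-reduce example, so your Hadoop version may expect different keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-action workflow; class names and paths are placeholders.
WORKFLOW_XML = """\
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <!-- remove old output before the job runs -->
                <delete path="${nameNode}${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>com.example.WordCountMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>com.example.WordCountReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>wordcount failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.5"}
root = ET.fromstring(WORKFLOW_XML)  # raises ParseError if the XML is malformed

# Collect the job configuration as a name -> value mapping.
props = {
    p.find("wf:name", NS).text: p.find("wf:value", NS).text
    for p in root.iterfind(".//wf:property", NS)
}
for name, value in props.items():
    print(name, "=", value)
```

The `${...}` placeholders are left unresolved on purpose: Oozie fills them in from the job properties at submission time. Input and output format classes and job-specific arguments would be added as further `<property>` entries in the same `<configuration>` block.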
How do you make a workflow?
• You also need a job.properties file. This file
defines the NameNode, JobTracker, etc.
• It also gives the location of the shared jars and
other files
• When you have these files ready, you need to
copy them into HDFS, and then you can
run the workflow from the command line
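The last two steps can be sketched as follows: write a job.properties file, then copy the application directory into HDFS and submit it. The host names, ports, user, and paths are placeholders for a local test cluster, and the script only prints the commands instead of running them.

```python
import shlex

# Hypothetical cluster endpoints and paths; replace with your own.
JOB_PROPERTIES = {
    "nameNode": "hdfs://localhost:8020",
    "jobTracker": "localhost:8021",
    "queueName": "default",
    "inputDir": "/user/${user.name}/wordcount/input",
    "outputDir": "/user/${user.name}/wordcount/output",
    # Use the shared library jars installed with Oozie.
    "oozie.use.system.libpath": "true",
    # HDFS directory that holds workflow.xml and lib/<your-job>.jar.
    "oozie.wf.application.path": "${nameNode}/user/${user.name}/wordcount",
}

def render_properties(props):
    """Render a dict as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"

properties_text = render_properties(JOB_PROPERTIES)

# Commands to run by hand once job.properties is saved (not executed here).
upload_cmd = ["hadoop", "fs", "-put", "wordcount", "/user/alice/wordcount"]
submit_cmd = ["oozie", "job", "-oozie", "http://localhost:11000/oozie",
              "-config", "job.properties", "-run"]

print(properties_text)
print(shlex.join(upload_cmd))
print(shlex.join(submit_cmd))
```

Note that job.properties itself stays on the local machine; only the application directory (workflow.xml plus the jar under `lib/`) needs to be in HDFS.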