Monday, February 17, 2014

What is Oozie?

About Oozie
Oozie is an open source project that simplifies workflow and coordination between jobs. It lets users define actions and the dependencies between them. Oozie then schedules each action to execute once its required dependencies have been met.

A workflow in Oozie is defined as a Directed Acyclic Graph (DAG). Acyclic means there are no loops in the graph: there is a starting point and an ending point, and all tasks and dependencies point from start to end without going back. A DAG is made up of action nodes and control flow nodes. An action node can be a MapReduce job, a Pig application, a file system task, or a Java application. Flow control in the graph is represented by node elements that provide logic based on the input from the preceding task in the graph. Examples of flow control nodes are decision, fork, and join nodes.
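To make the "no loops" rule concrete, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It is not Oozie code: the node names (`pig-clean`, `mr-count`, and so on) are made up for illustration, and the graph is just a dictionary mapping each node to the nodes it depends on. A fork fans out to two actions and a join waits for both.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each node maps to the set of nodes it depends on.
workflow = {
    "fork": {"start"},
    "pig-clean": {"fork"},
    "mr-count": {"fork"},
    "join": {"pig-clean", "mr-count"},  # join waits for both branches
    "end": {"join"},
}

def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a loop."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        raise ValueError(f"not a DAG: {exc.args[1]}") from exc

order = execution_order(workflow)
print(order)
```

If an edge pointing backwards is added (for example making `start` depend on `end`), `execution_order` raises an error, which is the kind of graph a workflow engine has to reject.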

What is Oozie?
•  Oozie allows a user to create Directed Acyclic Graphs of workflows, which can be run in parallel and sequentially in Hadoop.
•  Oozie can also run plain Java classes and Pig workflows, and interact with HDFS
–  Nice if you need to delete or move files before a job runs
•  Oozie can run jobs sequentially (one after the other) and in parallel (multiple at a time)

Why use Oozie instead of just cascading jobs one after another?
•  Major flexibility
–  Start, stop, suspend, and re-run jobs
•  Oozie allows you to restart from a failure
–  You can tell Oozie to restart a job from a specific node in the graph or to skip specific failed nodes
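As a rough sketch of how a rerun is configured: to my knowledge Oozie reads one of two properties for a workflow rerun, `oozie.wf.rerun.failnodes` or `oozie.wf.rerun.skip.nodes`, and expects only one of them to be set. The Python helper below is hypothetical (it is not part of Oozie) and only builds the extra lines you would add to job.properties. The node names in the usage line are made up.

```python
def rerun_properties(skip_nodes=None):
    """Build the extra job.properties entries for a workflow rerun.

    Emits either the list of nodes to skip or the rerun-failed-nodes
    flag, never both at once.
    """
    if skip_nodes:
        return {"oozie.wf.rerun.skip.nodes": ",".join(skip_nodes)}
    return {"oozie.wf.rerun.failnodes": "true"}

# Rerun only the nodes that failed:
print(rerun_properties())

# Skip two (hypothetical) nodes and rerun the rest:
print(rerun_properties(["pig-clean", "mr-count"]))
```

The rerun itself is then typically started with something like `oozie job -oozie <oozie-url> -rerun <job-id> -config job.properties`, where the URL and job id depend on your installation.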

Other Features
•  Java Client API / Command Line Interface
–  Launch, control, and monitor jobs from your Java apps
•  Web Service API
–  You can control jobs from anywhere
•  Run periodic jobs
–  Have jobs that you need to run every hour, day, or week? Have Oozie run the jobs for you
•  Receive an email when a job is complete
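Periodic jobs are described to Oozie in a coordinator definition that wraps a workflow. The Python sketch below generates a minimal hourly coordinator as XML (Python 3.9+ for `ET.indent`). The schema version, job name, dates, and application path are assumptions chosen for illustration, so check them against your Oozie version before using the output.

```python
import xml.etree.ElementTree as ET

def build_coordinator(name, start, end, app_path):
    """Return a minimal coordinator-app element that runs a workflow hourly."""
    coord = ET.Element("coordinator-app", {
        "xmlns": "uri:oozie:coordinator:0.2",  # assumed schema version
        "name": name,
        "frequency": "${coord:hours(1)}",      # Oozie EL function: every hour
        "start": start,
        "end": end,
        "timezone": "UTC",
    })
    action = ET.SubElement(coord, "action")
    workflow = ET.SubElement(action, "workflow")
    ET.SubElement(workflow, "app-path").text = app_path
    return coord

# Hypothetical name, dates, and HDFS path.
coordinator = build_coordinator(
    "hourly-job",
    "2014-02-17T00:00Z",
    "2014-02-18T00:00Z",
    "${nameNode}/user/${user.name}/apps/my-workflow",
)
ET.indent(coordinator)
print(ET.tostring(coordinator, encoding="unicode"))
```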

How do you make a workflow?
•  First make a Hadoop job and make sure that it works using the jar command in Hadoop
–  This ensures that the configuration is correct for your job
•  Make a jar out of your classes
•  Then make a workflow.xml file and copy all of the job configuration properties into the XML file. These include:
–  Input files
–  Output files
–  Input readers and writers
–  Mappers and reducers
–  Job-specific arguments
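The shape of a workflow.xml with a single MapReduce action can be sketched by generating it from Python (Python 3.9+ for `ET.indent`). This is an illustration under assumptions: the schema version, the node name `mr-node`, the class names, and the paths are all made up, and the property keys are the old-API Hadoop names (`mapred.*`), which may differ on your cluster.

```python
import xml.etree.ElementTree as ET

def build_workflow(name, properties):
    """Return a workflow-app element with one map-reduce action."""
    app = ET.Element("workflow-app",
                     {"xmlns": "uri:oozie:workflow:0.4", "name": name})  # assumed version
    ET.SubElement(app, "start", {"to": "mr-node"})
    action = ET.SubElement(app, "action", {"name": "mr-node"})
    mr = ET.SubElement(action, "map-reduce")
    # These placeholders are filled in from job.properties at submit time.
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    conf = ET.SubElement(mr, "configuration")
    for key, value in properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Map-reduce action failed"
    ET.SubElement(app, "end", {"name": "end"})
    return app

# Hypothetical job settings copied over from a working Hadoop job.
workflow_app = build_workflow("my-workflow", {
    "mapred.input.dir": "/user/${user.name}/input",
    "mapred.output.dir": "/user/${user.name}/output",
    "mapred.mapper.class": "com.example.MyMapper",
    "mapred.reducer.class": "com.example.MyReducer",
})
ET.indent(workflow_app)
print(ET.tostring(workflow_app, encoding="unicode"))
```

The `ok` and `error` transitions are what turn the action into a node of the graph: success moves on to `end`, failure goes to the `kill` node.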

•  You also need a job.properties file. This file defines the NameNode, JobTracker, etc.
•  It also gives the location of the shared jars and other files
•  When you have these files ready, you need to copy the workflow files into HDFS, and then you can run the job from the command line (job.properties is read from the local file system by the oozie command)
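A minimal job.properties can be sketched as follows. The host names, ports, and HDFS path are assumptions for a local single-node setup and will differ on a real cluster. The Python helper only renders the file contents and builds the submit command; it does not run anything.

```python
def render_properties(settings):
    """Render a dict as the key=value lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Hypothetical values for a local single-node cluster.
job_properties = render_properties({
    "nameNode": "hdfs://localhost:8020",
    "jobTracker": "localhost:8021",
    # Use the shared library jars installed with Oozie.
    "oozie.use.system.libpath": "true",
    # HDFS directory that holds workflow.xml and the lib/ folder with your jar.
    "oozie.wf.application.path": "${nameNode}/user/${user.name}/apps/my-workflow",
})
print(job_properties)

# Submit command, built but not executed here; the Oozie URL is an assumption.
submit_command = ["oozie", "job", "-oozie", "http://localhost:11000/oozie",
                  "-config", "job.properties", "-run"]
print(" ".join(submit_command))
```

The `nameNode` and `jobTracker` entries are the values that fill the `${nameNode}` and `${jobTracker}` placeholders in workflow.xml.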
