How To Install
Azkaban2 is designed to put a number of important functionalities into plugins, so that
a) they can be selectively installed/upgraded in different environments without changing the core Azkaban2, and
b) Azkaban2 can easily be extended for different systems.
These plugins handle important and essential functionalities of Azkaban2, and we recommend installing them along with Azkaban2.
Currently we have the following plugins:
- HDFS filesystem browser plugin, in azkaban-plugins/plugins/hdfsviewer
- jobtype plugin, in azkaban-plugins/plugins/jobtype
The default jobtype plugin bundle enables the 'java' and 'pig' job types, which can run mapreduce and pig jobs on a hadoop cluster.
To Install These Plugins
Get azkaban-plugins, either by downloading the tarball or by checking it out from GitHub.
Untar it into a my-azkaban-plugins directory.
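For example (a sketch; the release tarball name varies by version, and my-azkaban-plugins is just the directory name used throughout this page):

    # from a downloaded release tarball; <version> is a placeholder
    tar -xzf azkaban-plugins-<version>.tar.gz
    mv azkaban-plugins-<version> my-azkaban-plugins

    # or check the source out from GitHub
    git clone https://github.com/azkaban/azkaban-plugins.git my-azkaban-plugins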
Installing the jobtype plugins:
1. Build the bundle:
Go to azkaban-plugins/plugins/jobtype and run
ant package
This should create an install tarball at
my-azkaban-plugins/dist/jobtype/packages/azkaban-jobtype*.tar.gz
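For example, assuming the my-azkaban-plugins checkout from above:

    cd my-azkaban-plugins/plugins/jobtype
    ant package
    # the tarball lands under the top-level dist directory
    ls ../../dist/jobtype/packages/azkaban-jobtype*.tar.gz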
2. In the Azkaban2 config file, set where you want Azkaban2 to load the plugin jobtypes from:
Azkaban2-executor-server-install-directory/conf/azkaban.properties : azkaban.jobtype.plugin.dir=where-my-jobtype-plugins-go
If you don't set it, it defaults to 'plugins/jobtypes'.
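For example, keeping the default location:

    # in Azkaban2-executor-server-install-directory/conf/azkaban.properties
    azkaban.jobtype.plugin.dir=plugins/jobtypes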
3. Place the jobtype plugin tar file in the 'plugins' directory, untar it, and rename the resulting directory 'jobtypes' (or whatever name you set in step 2).
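For example (the extracted directory name varies by version, so <version> is a placeholder):

    cd Azkaban2-executor-server-install-directory/plugins
    tar -xzf azkaban-jobtype*.tar.gz
    # rename to match azkaban.jobtype.plugin.dir from step 2
    mv azkaban-jobtype-<version> jobtypes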
4. The 'java' and 'pig' job types in the default package work with hadoop clusters, so you should edit
the jobtypes/commonprivate.properties file
and put in the necessary hadoop cluster security settings.
Here are the settings you will likely use (a filled-in sketch follows the list):
hadoop.security.manager.class : the class that issues hadoop tokens and obtains proxy users. The one enclosed works with Hadoop 1.x versions; choose one that matches your specific hadoop installation and security settings.
azkaban.should.proxy : whether or not azkaban should proxy as the individual user. This should default to on; otherwise, all hadoop jobs will run as the user azkaban.
proxy.keytab.location : the kerberos keytab location
proxy.user : the proxy user for azkaban in your hadoop settings
jobtype.global.classpath : jars in this list will be inherited by all job types
jobtype.global.jvm.args : settings in this list will be inherited by all job types
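A sketch of what these settings might look like for a kerberized Hadoop 1.x cluster. The class name, jar version, paths, and user below are example values, not definitive ones; check the properties files shipped in the plugin bundle for the names that apply to your installation:

    # plugins/jobtypes/commonprivate.properties (example values only)
    hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_1_0
    azkaban.should.proxy=true
    proxy.keytab.location=/etc/security/keytabs/azkaban.keytab
    proxy.user=azkaban
    # jars and JVM args inherited by every job type
    jobtype.global.classpath=${hadoop.home}/hadoop-core-1.0.4.jar,${hadoop.home}/conf
    jobtype.global.jvm.args=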
Start the Azkaban2 executor server. Verify from the log that it loads the plugin jobtypes correctly. You should also run mapreduce/pig jobs to validate the installation before releasing Azkaban2 to the users; a minimal test job is sketched below.
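For a quick validation, a minimal pig job could look like the following (the property names assume the bundled pig jobtype; test.pig is a placeholder script you supply):

    # test-pig.job
    type=pig
    pig.script=test.pig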
Installing the HDFS browser plugin:
1. Build the bundle:
Go to azkaban-plugins/plugins/hdfsviewer and run
ant package
This should create an install tarball at
my-azkaban-plugins/dist/hdfsviewer/packages/azkaban-hdfs-viewer-*.tar.gz
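For example:

    cd my-azkaban-plugins/plugins/hdfsviewer
    ant package
    ls ../../dist/hdfsviewer/packages/azkaban-hdfs-viewer-*.tar.gz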
2. In the Azkaban2 config file, set where you want Azkaban2 to load the hdfs viewer plugin (the viewer runs in the web server, not the executor server):
Azkaban2-web-server-install-directory/conf/azkaban.properties :
# hdfs browser
viewer.plugins=hdfs
3. Place the viewer plugin tar file under the plugins/viewer directory, untar it, and rename the resulting directory 'hdfs' (or whatever name you set in step 2). In this example, you will have
plugins/viewer/hdfs/conf
plugins/viewer/hdfs/lib
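For example (the extracted directory name varies by version, so <version> is a placeholder):

    cd Azkaban2-web-server-install-directory/plugins/viewer
    tar -xzf azkaban-hdfs-viewer-*.tar.gz
    # rename to match the name set in viewer.plugins
    mv azkaban-hdfs-viewer-<version> hdfs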
4. If your cluster has security turned on, you need to set the actual hadoop security settings;
if security is turned off, you don't need to do anything.
(This version of the hdfs viewer is compiled against hadoop 1.x. It should not be difficult to create another plugin for hadoop 0.20.)
To make it work with secure hadoop, you need the following settings (a filled-in sketch follows the list):
azkaban.should.proxy : when true, azkaban will obtain a hadoop proxy user using the provided authentication
proxy.user : your proxy user, as configured in $HADOOP_HOME/conf/core-site.xml
proxy.keytab.location : the keytab you will use for kerberos authentication
allow.group.proxy : when true, azkaban can proxy to hadoop not only as the login user, but also as one of the user's associated headless accounts. For example, user 'someone' can log in to azkaban as 'someone' (whether via the xml usermanager or your custom ldap usermanager). She also has the hadoop account 'someone', with the groups 'users', 'headles1', 'headles2', and 'headles3'. In this case, the hdfs browser can proxy as 'headles1', 'headles2', or 'headles3' in addition to 'someone'. This group information should come from hadoop ldap upon user login. This feature is most useful in production environments with strict security settings. If your production cluster is not secured, there is no need to turn this on.
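A sketch of these settings for a secure cluster (the keytab path and proxy user are placeholders; the exact properties file to edit ships in the plugin's conf directory, plugins/viewer/hdfs/conf):

    # example values only; adjust for your cluster
    azkaban.should.proxy=true
    proxy.user=azkaban
    proxy.keytab.location=/etc/security/keytabs/azkaban.keytab
    allow.group.proxy=true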