VoldemortBuildandPush Type
Pushing data from Hadoop to a Voldemort store used to be done entirely in Java. This caused many problems, mostly because users had to keep track of jars and dependencies and keep them up to date. We created the "VoldemortBuildandPush" job type to address this: jars and dependencies are now managed by admins, and users do not need to supply any jars or Java code at all.
How-To-Use
This is essentially a hadoopJava job, with all jars controlled by the admins. Users only need to provide a .job file for the job and specify the parameters in it. The following parameters need to be specified:
Parameter | Description
---|---
type | The job type name as set by the admin, e.g. "VoldemortBuildandPush"
push.store.name | The Voldemort push store name
push.store.owners | The push store owners
push.store.description | The push store description
build.input.path | Build input path on HDFS
build.output.dir | Build output path on HDFS
build.replication.factor | The replication factor
user.to.proxy | The Hadoop user this job should run under
build.type.avro | true if building and pushing Avro data, otherwise false
avro.key.field | The key field, if using Avro data
avro.value.field | The value field, if using Avro data
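Since the .job file is the only thing users supply, a missing parameter is the most likely mistake. The Java sketch below is not part of Azkaban or Voldemort: the class JobFileCheck and every sample value are hypothetical, and it assumes the .job file parses as a standard Java properties file (simple key=value lines). It treats every parameter in the table as required, and adds avro.key.field and avro.value.field only when build.type.avro is true.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Azkaban or Voldemort): lists the parameters
// a VoldemortBuildandPush .job file is missing.
class JobFileCheck {

    // Parameters every VoldemortBuildandPush job sets, per the table above.
    static final String[] REQUIRED = {
        "type", "push.store.name", "push.store.owners", "push.store.description",
        "build.input.path", "build.output.dir", "build.replication.factor",
        "user.to.proxy", "build.type.avro"
    };

    // Assumes the .job file is plain key=value lines readable by java.util.Properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static List<String> missingKeys(Properties job) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (job.getProperty(key, "").trim().isEmpty()) {
                missing.add(key);
            }
        }
        // The Avro key/value fields only matter when building from Avro data.
        if (Boolean.parseBoolean(job.getProperty("build.type.avro", "false").trim())) {
            for (String key : new String[] {"avro.key.field", "avro.value.field"}) {
                if (job.getProperty(key, "").trim().isEmpty()) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample .job contents; every value is a placeholder.
        String jobFile = String.join("\n",
            "type=VoldemortBuildandPush",
            "push.store.name=example-store",
            "push.store.owners=owner@example.com",
            "push.store.description=Example store",
            "build.input.path=/user/example/input",
            "build.output.dir=/user/example/output",
            "build.replication.factor=2",
            "user.to.proxy=example",
            "build.type.avro=true",
            "avro.key.field=memberId");
        System.out.println("Missing: " + missingKeys(parse(jobFile)));
    }
}
```

Running main prints Missing: [avro.value.field], because the sample enables build.type.avro but sets only the key field.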
The following properties are required and are normally configured by the admin (always put properties common to all job types in common.properties and commonprivate.properties).
These go into private.properties:
Parameter | Description
---|---
hadoop.security.manager.class | The class that handles talking to Hadoop clusters.
azkaban.should.proxy | Whether Azkaban should proxy as individual users' Hadoop accounts.
proxy.user | The Azkaban user configured with Kerberos and Hadoop, for secure clusters.
proxy.keytab.location | The location of the keytab file with which Azkaban can authenticate with Kerberos for the specified proxy.user.
hadoop.home | The Hadoop home where the jars and conf resources are installed.
jobtype.classpath | The items that every such job should have on its classpath.
jobtype.class | Set to azkaban.jobtype.HadoopJavaJob.
obtain.binary.token | Whether Azkaban should request tokens. Set this to true for secure clusters.
azkaban.no.user.classpath | Set to true so that Azkaban does not pick up user-supplied jars.
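An admin might sanity-check private.properties before deploying the job type. In this hypothetical Java sketch (PrivatePropsCheck is not an Azkaban class), the first two checks follow the table directly. The third, that proxy.user and proxy.keytab.location must be set whenever obtain.binary.token is true, is an assumption based on all three being described as settings for secure clusters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical admin-side check of a VoldemortBuildandPush private.properties;
// not part of Azkaban.
class PrivatePropsCheck {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static boolean isTrue(Properties p, String key) {
        return Boolean.parseBoolean(p.getProperty(key, "false").trim());
    }

    static List<String> problems(Properties p) {
        List<String> out = new ArrayList<>();
        // Documented value for this job type.
        if (!"azkaban.jobtype.HadoopJavaJob".equals(p.getProperty("jobtype.class", "").trim())) {
            out.add("jobtype.class should be azkaban.jobtype.HadoopJavaJob");
        }
        // Keeps user-supplied jars off the classpath, so admins control all jars.
        if (!isTrue(p, "azkaban.no.user.classpath")) {
            out.add("azkaban.no.user.classpath should be true");
        }
        // Assumed rule: requesting tokens (secure cluster) needs a Kerberos proxy identity.
        if (isTrue(p, "obtain.binary.token")) {
            for (String key : new String[] {"proxy.user", "proxy.keytab.location"}) {
                if (p.getProperty(key, "").trim().isEmpty()) {
                    out.add(key + " is required when obtain.binary.token=true");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder values; the keytab location is deliberately left out.
        String privateProps = String.join("\n",
            "jobtype.class=azkaban.jobtype.HadoopJavaJob",
            "azkaban.no.user.classpath=true",
            "obtain.binary.token=true",
            "proxy.user=azkaban");
        problems(parse(privateProps)).forEach(System.out::println);
    }
}
```

With the sample input it prints a single line reporting that proxy.keytab.location is required.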
These go into plugin.properties:
Parameter | Description
---|---
job.class | Set to voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob
voldemort.fetcher.protocol | The fetcher protocol, e.g. webhdfs
hdfs.default.classpath.dir | HDFS location for the distributed cache
hdfs.default.classpath.dir.enable | Set to true if using the distributed cache to ship dependency jars
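A small check can also catch plugin.properties mistakes. In this hypothetical Java sketch (PluginPropsCheck is not an Azkaban class), job.class is compared with the value in the table, and, as an assumption about how the flag is meant to be used, hdfs.default.classpath.dir is required whenever hdfs.default.classpath.dir.enable is true.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical check of a VoldemortBuildandPush plugin.properties; not part of Azkaban.
class PluginPropsCheck {

    static final String JOB_CLASS = "voldemort.store.readonly.mr.azkaban.VoldemortBuildAndPushJob";

    static List<String> problems(String pluginProperties) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(pluginProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        if (!JOB_CLASS.equals(p.getProperty("job.class", "").trim())) {
            out.add("job.class should be " + JOB_CLASS);
        }
        // Assumed rule: shipping jars via the distributed cache needs an HDFS location.
        boolean cacheEnabled = Boolean.parseBoolean(
            p.getProperty("hdfs.default.classpath.dir.enable", "false").trim());
        if (cacheEnabled && p.getProperty("hdfs.default.classpath.dir", "").trim().isEmpty()) {
            out.add("hdfs.default.classpath.dir is required when hdfs.default.classpath.dir.enable=true");
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder contents; the distributed cache directory is deliberately left out.
        String plugin = String.join("\n",
            "job.class=" + JOB_CLASS,
            "voldemort.fetcher.protocol=webhdfs",
            "hdfs.default.classpath.dir.enable=true");
        problems(plugin).forEach(System.out::println);
    }
}
```

With the sample input it prints one line saying hdfs.default.classpath.dir is required.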
For more information, please refer to the Project Voldemort site.