Building Druid for Cloudera 5.4.x

Posted by Kaya Kupferschmidt • Monday, November 30. 2015 • Category: Java

The other day I wanted to evaluate Druid as a reporting backend database. Unfortunately, Druid doesn't work out of the box with Cloudera 5.4: I always got an error when running the Hadoop indexer, either via the CLI or via the indexing service. The exceptions in Hadoop always looked like this:

2015-11-30 11:42:37,653 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    ...

So the problem seems to be a classic version mismatch between Cloudera Hadoop and Druid. Specifically, the two projects use incompatible versions of the Jackson libraries (Cloudera still uses 2.2.3, while Druid uses 2.4.6). After some trials with different Jackson versions, I got it to work by changing Druid's own Jackson dependency to 2.3.5 and building Druid myself. Since I suspect that others may run into similar problems, here is what I did to get Druid up and running:

git clone https://github.com/druid-io/druid.git
cd druid
git checkout 0.8.2
sed -i "s#jackson.version>2.4.6<#jackson.version>2.3.5<#" pom.xml
mvn package -DskipTests

After that you will find a packaged version of Druid at

distribution/target/druid-0.8.3-SNAPSHOT-bin.tar.gz

which should work with Cloudera 5.4. 
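
One caveat with the sed step: sed exits successfully even when the pattern matches nothing, so if the checked-out pom.xml does not contain exactly 2.4.6, the build would silently proceed with the unchanged Jackson version. A minimal sketch of a sanity check follows; the helper name check_jackson_version is made up for illustration and is not part of Druid or Maven.

```shell
# Hypothetical helper (not part of Druid): succeeds only if the given pom
# contains the expected Jackson version in its jackson.version property.
check_jackson_version() {
    pom="$1"        # path to the pom.xml to inspect
    expected="$2"   # version string we expect, e.g. 2.3.5
    # -F treats the pattern as a fixed string, so the dots in the
    # version number are not interpreted as regex wildcards.
    grep -qF "jackson.version>${expected}<" "$pom"
}
```

Run inside the Druid checkout between the sed and mvn steps, something like `check_jackson_version pom.xml 2.3.5 || echo "pom.xml was not patched"` should flag a substitution that did not apply.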

Setting up an Apache Cluster with Vagrant

Posted by Kaya Kupferschmidt • Wednesday, February 4. 2015 • Category: Java

Vagrant is the perfect companion for developers who need to simulate complex cluster setups on a single machine. This is especially true when using vagrant-lxc as the provider, which runs the machines as Linux containers instead of fully virtualised guests.

Directory Structure

With the following ingredients you can set up a whole Apache Storm cluster. You can download the whole package on GitHub, but let us look at the details first. You will need the following directory structure:

+ Vagrantfile
|
+----- provision
         |
         +------ data
         |        + hosts
         |
         +------ puppet
         |        |
         |        +------ manifests
         |        |        + site.pp
         |        |
         |        +------ modules
         |        + Puppetfile
         |
         +------ scripts
                  + main.sh
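
If you would rather build this layout by hand than download the package, the tree above can be sketched as a small shell function. The function name make_skeleton and the idea of creating empty placeholder files are assumptions for illustration only; the real files obviously need content.

```shell
# Hypothetical helper: create the directory layout shown above under a
# given root directory, with empty placeholder files.
make_skeleton() {
    root="$1"   # directory that will hold the Vagrantfile
    # Create all directories of the tree (including the root) in one go.
    mkdir -p "$root/provision/data" \
             "$root/provision/puppet/manifests" \
             "$root/provision/puppet/modules" \
             "$root/provision/scripts"
    # Create empty placeholders for the files named in the tree.
    touch "$root/Vagrantfile" \
          "$root/provision/data/hosts" \
          "$root/provision/puppet/manifests/site.pp" \
          "$root/provision/puppet/Puppetfile" \
          "$root/provision/scripts/main.sh"
}
```

Calling `make_skeleton storm-cluster` would create the structure in a new storm-cluster directory (the directory name is arbitrary).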


