Tuesday, April 23, 2013

How to Build Oozie with Different Versions of Hadoop

After downloading the Oozie source code with

svn checkout http://svn.apache.org/repos/asf/oozie/tags/release-3.3.0/ .

and then building it against Hadoop 1.1.0 with the familiar

mvn clean compile -Dhadoop.version=1.1.0

I got the following error:

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:06.497s
[INFO] Finished at: Tue Apr 23 12:36:53 BRT 2013
[INFO] Final Memory: 20M/67M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project oozie-sharelib-distcp: Could not resolve dependencies for project org.apache.oozie:oozie-sharelib-distcp:jar:3.3.0: Could not find artifact org.apache.oozie:oozie-hadoop-distcp:jar:1.1.0.oozie-3.3.0 in central (http://repo1.maven.org/maven2) -> [Help 1]


Reading a bit about it, and checking some pom files, I realized that inside the hadooplibs directory (inside the Oozie home) there are three sub-directories whose poms have the Hadoop version hard-coded.
So when you pass -Dhadoop.version, these poms don't "change": they keep using their predefined version of Hadoop. The rest of the build then asks for artifacts such as oozie-hadoop-distcp 1.1.0.oozie-3.3.0, which nothing has produced, hence the error above.

I talked to the folks in the Oozie community, and they said the recommended approach is to change the pom files themselves rather than pass the version as a parameter.

In summary, if you want to build Oozie 3.3 against a different Hadoop version, edit these pom files:

oozie_home/hadooplibs/hadoop-1/pom.xml
oozie_home/hadooplibs/hadoop-distcp-1/pom.xml
oozie_home/hadooplibs/hadoop-test-1/pom.xml
oozie_home/pom.xml

Set the desired version of Hadoop in each of them. This applies, of course, if you are building against Hadoop 1.x. If you are building Oozie against Hadoop 2.x, edit:


oozie_home/hadooplibs/hadoop-2/pom.xml
oozie_home/hadooplibs/hadoop-distcp-2/pom.xml
oozie_home/hadooplibs/hadoop-test-2/pom.xml
oozie_home/pom.xml
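
Before editing, it can help to see exactly where the current Hadoop version appears in those four files. Below is a minimal sketch, not part of Oozie: the class HadoopPomFinder and its methods are hypothetical names made up for this illustration. It only reads the four pom files listed above and prints the lines that mention a given version string. It does not modify anything, and it assumes the directory layout described in this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an Oozie tool): lists where a version string
// shows up in the pom files that this post says must be edited.
public class HadoopPomFinder {

    // The four pom files named in the post for a Hadoop major line ("1" or "2").
    public static List<Path> pomsFor(Path oozieHome, String major) {
        List<Path> poms = new ArrayList<Path>();
        poms.add(oozieHome.resolve("hadooplibs/hadoop-" + major + "/pom.xml"));
        poms.add(oozieHome.resolve("hadooplibs/hadoop-distcp-" + major + "/pom.xml"));
        poms.add(oozieHome.resolve("hadooplibs/hadoop-test-" + major + "/pom.xml"));
        poms.add(oozieHome.resolve("pom.xml"));
        return poms;
    }

    // Returns one "path:lineNumber: text" entry per line containing the version.
    public static List<String> findVersion(Path oozieHome, String major, String version)
            throws IOException {
        List<String> hits = new ArrayList<String>();
        for (Path pom : pomsFor(oozieHome, major)) {
            if (!Files.isRegularFile(pom)) {
                continue; // skip files missing from this checkout
            }
            List<String> lines = Files.readAllLines(pom, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(version)) {
                    hits.add(pom + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: HadoopPomFinder <oozie_home> <1|2> <current-hadoop-version>");
            return;
        }
        for (String hit : findVersion(Paths.get(args[0]), args[1], args[2])) {
            System.out.println(hit);
        }
    }
}
```

Run it with the Oozie home, the Hadoop major line, and whatever Hadoop version your checkout currently declares. Each printed line is a candidate spot to change by hand.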


Friday, April 5, 2013

HashMap JVM Differences

Although Java's slogan is "Write once, run anywhere", which emphasizes its cross-platform benefit, in practice this is unfortunately not entirely true.

One known difference between the Sun JVM and other JVMs is the order in which a HashMap returns its entries.

When executing the exact same program and iterating through the exact same HashMap contents, a Sun JVM can produce its output in a different order than another JVM.

See the code below as an example:

 
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;


public class HashMapTest {

        static HashMap<String, String> result = new HashMap<String, String>();
        static Iterator<Map.Entry<String, String>> entryIter;
        static HashMap<String, String> thash = new HashMap<String, String>();

        public static void main(String[] args) {

                for (int i = 0; i < 10; i++){
                        thash.put(Integer.toString(10 - i), "abc");
                }

                result.putAll(thash);

                entryIter = result.entrySet().iterator();
                while (entryIter.hasNext()) {

                        Map.Entry<String, String> entry = entryIter.next();
                        String key = entry.getKey();
                        String value = entry.getValue();
                        System.out.println("Key: " + key + " Value: " + value);
                }
        }

}






Compiling and running this code with Sun Java produces the following output:

Key: 3 Value: abc
Key: 2 Value: abc
Key: 10 Value: abc
Key: 1 Value: abc
Key: 7 Value: abc
Key: 6 Value: abc
Key: 5 Value: abc
Key: 4 Value: abc
Key: 9 Value: abc
Key: 8 Value: abc

Whereas doing the same thing with IBM Java, you should get:

Key: 10 Value: abc
Key: 9 Value: abc
Key: 8 Value: abc
Key: 7 Value: abc
Key: 6 Value: abc
Key: 5 Value: abc
Key: 4 Value: abc
Key: 3 Value: abc
Key: 2 Value: abc
Key: 1 Value: abc

I don't want to get into the merits of which one is right and which one is wrong. I just want to alert people that this difference can cause serious discrepancies in a program's output.
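
For what it's worth, the HashMap documentation makes no guarantee about iteration order, so code that needs the same output on every JVM should not rely on it. One possible way around it, sketched below under the assumption that either insertion order or sorted order is acceptable for your program, is to use LinkedHashMap or TreeMap instead. The class OrderedMapDemo and its helper methods are names invented for this illustration; it fills each map with the same keys as the test program above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: same data as HashMapTest, stored in three Map types.
public class OrderedMapDemo {

    // Inserts the keys "10", "9", ..., "1" in that order, each mapped to "abc".
    public static void fill(Map<String, String> map) {
        for (int i = 0; i < 10; i++) {
            map.put(Integer.toString(10 - i), "abc");
        }
    }

    // Copies the keys into a list in the map's own iteration order.
    public static List<String> keysOf(Map<String, String> map) {
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<String, String>();
        Map<String, String> linked = new LinkedHashMap<String, String>();
        Map<String, String> tree = new TreeMap<String, String>();
        fill(hash);
        fill(linked);
        fill(tree);

        // Order is unspecified and may differ between JVMs.
        System.out.println("HashMap:       " + keysOf(hash));
        // Insertion order: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println("LinkedHashMap: " + keysOf(linked));
        // Sorted as strings: [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println("TreeMap:       " + keysOf(tree));
    }
}
```

Note that TreeMap sorts String keys lexicographically, which is why "10" comes right after "1". If numeric order matters, Integer keys would be the more natural choice.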