Posts

Showing posts from February, 2013

Why are there three Hadoop svn repositories (common, hdfs and mapreduce)? Where is the repository for YARN?

When developers start reading about Hadoop, one of the first things they see is:

"The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets."

So it can be a little confusing that, when they try to build Hadoop from source, they are told to check out only a repository called hadoop-common. It becomes even more confusing when you realize that there are two other repositories for Hadoop: hadoop-hdfs and hadoop-mapreduce. So which repository should you use? The answer is: hadoop-common encompasses all of these Hadoop modules. When looking at the hadoop-hdfs or hadoop-mapreduce repo