Showing posts from January, 2011

Apache Hadoop for Beginners

Apache Hadoop is a framework for distributed computing applications, inspired by Google's MapReduce and GFS (Google File System) papers. It is open-source software that enables the processing of massive amounts of data on commodity hardware. First introduced by Doug Cutting, who named the project after his son's toy (a yellow stuffed elephant), Hadoop is now one of the largest Apache projects, with many contributors and users around the world, including Yahoo!, IBM, Facebook and many others.

The framework has a master/worker, shared-nothing architecture. A Hadoop cluster is composed of a group of single nodes (computers), one of which acts as the master server while the others act as workers. The NameNode daemon and the JobTracker daemon usually run on the master node. The NameNode keeps file metadata, and the JobTracker manages the MapReduce tasks executed on the cluster. The management and monitoring of tasks are handled by the framework itself.
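To make the MapReduce idea behind Hadoop concrete, here is a minimal sketch of the classic word-count pattern, simulated locally in Python. This is not Hadoop's actual API: the `mapper`, `shuffle`, and `reducer` functions below are illustrative stand-ins for the map phase, the framework's shuffle-and-sort step, and the reduce phase that a real job would run across the cluster.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in the input line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle phase: group all values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    """Reduce phase: sum the counts emitted for one word."""
    return word, sum(counts)

# Two "input splits" standing in for blocks of a file stored in HDFS.
lines = ["hadoop is a framework", "hadoop runs on commodity hardware"]
pairs = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(word, counts) for word, counts in shuffle(pairs).items())
print(result["hadoop"])  # 2
```

In a real cluster, many mapper instances would run in parallel on the worker nodes holding the data, and the JobTracker would schedule and monitor those tasks; the logic per record, however, is exactly this simple.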