Showing posts from August, 2013

Understanding Apache Hive

Introduction

BigData and Hive

Apache Hive is a software application created to facilitate data analysis on Apache Hadoop. It is a Java framework that helps extract knowledge from data stored on an HDFS cluster by providing a SQL-like interface to it. Apache Hadoop is a major distributed computing project, and it is commonly regarded as the best approach when dealing with BigData challenges.

It is now well established that a great volume of data is produced every day. Whether it comes from system logs or from user purchases, the amount of information generated is such that existing database and data warehouse solutions do not seem to scale well enough. The MapReduce programming paradigm was introduced by Google in 2004 as a new approach to processing large datasets. In 2005 its open source implementation, Hadoop, was created by Doug Cutting. Although Hadoop is not meant to replace relational databases, it is a good solution for big data analyses.
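
To make the MapReduce idea a bit more concrete, here is a minimal sketch in plain Python (no Hadoop involved) of the map, shuffle and reduce phases that a SQL-like query such as `SELECT buyer, COUNT(*) FROM purchases GROUP BY buyer` roughly corresponds to. The `purchases` data, the column name `buyer` and the function names are hypothetical, chosen only for illustration. This is not how Hive or Hadoop are implemented internally.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (key, 1) pair for every record, as a mapper would."""
    for buyer, _item in records:
        yield buyer, 1


def shuffle(pairs):
    """Group values by key, mimicking the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical purchase log: (buyer, item) pairs.
purchases = [("ana", "book"), ("bob", "pen"), ("ana", "lamp")]

counts = reduce_phase(shuffle(map_phase(purchases)))
print(counts)  # {'ana': 2, 'bob': 1}
```

On a real cluster the map and reduce steps would run in parallel on many machines over data stored in HDFS. A SQL-like layer saves the analyst from writing those steps by hand.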