I have been working with big data quite a lot lately, and this area is mostly dominated by Apache open source projects.
So, naturally (given the nerd that I am), I tried to investigate their history. I created a list of the articles and companies from which most Apache big data projects originated.
Here it is! Hope you guys find it interesting too. :)
Apache Hadoop
Based on: Google MapReduce and GFS
Papers:
Apache Spark
Created by: University of California, Berkeley
Papers:
Apache Kafka
Created by: LinkedIn
Papers:
Apache Impala
Based on: Google F1
Papers:
Apache HBase
Based on: Google Bigtable
Papers:
Apache Drill
Based on: Google Dremel
Papers:
Apache Pig
Created by: Yahoo!
Papers:
Apache Oozie
Created by: Yahoo!
Papers:
Apache Sqoop
Started by Aaron Kimball as a module for Apache Hadoop, in issue https://issues.apache.org/jira/browse/HADOOP-5815.
Links:
Apache Flume
Links: