Posts

Showing posts from November, 2012

Dependencies on a Hadoop Ecosystem

When building a Hadoop cluster along with the other Apache projects related to it, it can be tricky to know what to install and how to install it. You should first understand your data and what you want to do with it. If you have log-like data that keeps growing all the time, and you have to keep loading it into the Hadoop cluster, you might want to consider Flume. Apache Flume is distributed, daemon-like software (and for that reason offers high availability) that can keep feeding data to the Hadoop cluster. If you need a NoSQL database with random read/write access, you can use HBase, which is modeled on Google's Bigtable database. If you have relatively structured data and you want to run query-like analyses on it, you might consider Pig or Hive. Both work on top of the Hadoop cluster and let you execute commands instead of writing MapReduce jobs in Java. Each provides its own language for those commands. Pig uses a textual language called Pig Latin, and Hive uses a syntax ver

Duine Open Source Recommender

Duine is an open source recommender system. It is a collection of software libraries, developed by Telematica Instituut/Novay, that aims to predict how interesting a piece of information is to a user. It provides collaborative filtering and content-based recommenders, along with other features such as an Explanation API (explanations of why a recommendation was made). Each prediction is quantified by a number ranging from -1 to +1; the greater the value, the more interesting the item should be to the user. One of the main advantages of Duine is its well-formed architecture. When it makes a recommendation, it can incorporate the user's feedback into its system. It also has a switching engine that can analyse which method (content-based or collaborative) suits the current state of the data better, and change to it dynamically. a. Architecture The following picture describes the main concept of the Duine framework. b. Installation To install

Open Source Recommendation Systems Survey

Here follows a survey I did back in 2010, when I was studying recommender systems. I hope it is useful. The growth of web content and the expansion of e-commerce have greatly increased interest in recommender systems. This has led to the development of several open source projects in the area. Among the recommender system implementations available on the web, we can distinguish the following: Duine, Apache Mahout, OpenSlopeOne, Cofi, SUGGEST and Vogoo. All of these projects offer collaborative filtering implementations, in different programming languages. The Duine Framework also supplies a hybrid implementation. It is Java software that combines content-based and collaborative filtering in a switching engine: it dynamically switches between the two prediction methods given the current state of the data. For example, if there aren't many ratings available, it uses the content-based approach, and it switches to the collaborative one when the scenario changes.
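
To make the switching idea concrete, here is a minimal Python sketch. It is not Duine's actual API or algorithm: the function names, the `MIN_RATINGS` threshold, the tag-overlap content predictor and the plain-mean collaborative predictor are all assumptions chosen for illustration. The only things taken from the description above are the switch on the number of available ratings and the -1 to +1 score range.

```python
from typing import Dict, Set, Tuple

# user -> item -> rating in [-1.0, +1.0] (the score range described for Duine)
Ratings = Dict[str, Dict[str, float]]

MIN_RATINGS = 3  # hypothetical threshold, not a Duine setting


def collaborative_predict(ratings: Ratings, user: str, item: str) -> float:
    """Illustrative stand-in: mean rating other users gave to the item."""
    others = [r[item] for u, r in ratings.items() if u != user and item in r]
    return sum(others) / len(others) if others else 0.0


def content_predict(ratings: Ratings, item_tags: Dict[str, Set[str]],
                    user: str, item: str) -> float:
    """Illustrative stand-in: mean of the user's own ratings for items
    that share at least one tag with the target item."""
    target = item_tags.get(item, set())
    similar = [score for rated, score in ratings.get(user, {}).items()
               if rated != item and item_tags.get(rated, set()) & target]
    return sum(similar) / len(similar) if similar else 0.0


def switching_predict(ratings: Ratings, item_tags: Dict[str, Set[str]],
                      user: str, item: str,
                      min_ratings: int = MIN_RATINGS) -> Tuple[str, float]:
    """Pick a predictor from the current state of the data.

    With enough ratings from other users, use the collaborative
    predictor; otherwise fall back to the content-based one.
    """
    available = sum(1 for u, r in ratings.items() if u != user and item in r)
    if available >= min_ratings:
        return "collaborative", collaborative_predict(ratings, user, item)
    return "content", content_predict(ratings, item_tags, user, item)
```

Calling `switching_predict` returns both the name of the method that was chosen and the predicted score, so the caller can see when the engine moves from the content-based predictor to the collaborative one as ratings accumulate.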