Project Description
This project explores recommender systems: the open source tools that provide recommender frameworks, several implementation options for recommenders, and ways to scale recommenders to large data sets using Hadoop. Among open source tools, the project focuses on Apache Mahout, which provides a set of Java-based machine-learning libraries packaged as JARs. While Mahout supports many areas of machine learning, including classification, clustering, and genetic algorithms, our work concentrates on the Mahout class libraries for recommender systems. Specifically, we review in depth how item-similarity and user-similarity recommenders generate movie recommendations for users in the MovieLens dataset, an openly distributed collection of user movie ratings gathered by the GroupLens research project at the University of Minnesota. The paper then examines and contrasts the modeling options Mahout provides. This includes our implementation of a simple Java “meta-evaluator” that compares the various similarity measures and their relationships to one another, and a review of additional parameters such as neighborhood size. The paper closes with a review of scaling. Here, Apache Hadoop is used to demonstrate an efficient MapReduce-based parallelization of the user-user comparisons on an Amazon EC2 cloud installation. We review installation options (and challenges) in the EC2 cloud, and we discuss high-level performance considerations for implementing a recommender system over large-scale data.
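
To make the user-similarity idea and the neighborhood-size parameter concrete, the following is a minimal plain-Java sketch. It is not Mahout's API and not the project's meta-evaluator. The class name `UserSimilaritySketch`, its methods, and the toy ratings in `sampleRatings()` are all hypothetical, chosen only for illustration. It assumes Pearson correlation as the similarity measure and a similarity-weighted average over the k most similar users as the estimate, which is one common formulation among several.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, toy data, no Mahout dependency.
class UserSimilaritySketch {

    // Pearson correlation over the items both users rated.
    // Returns 0.0 when fewer than two items overlap or a variance is zero.
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> common = new ArrayList<>();
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return 0.0;
        double meanA = 0, meanB = 0;
        for (Integer item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Integer item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) return 0.0;
        return cov / Math.sqrt(varA * varB);
    }

    // Estimate a rating from the k most similar users (the "neighborhood").
    // Only positively correlated users are kept; neighbors who did not rate
    // the item are skipped. Returns NaN when no neighbor rated the item.
    static double estimate(Map<Integer, Map<Integer, Double>> ratings,
                           int user, int item, int k) {
        List<double[]> neighbors = new ArrayList<>(); // {similarity, rating or NaN}
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) continue;
            double sim = pearson(ratings.get(user), e.getValue());
            if (sim <= 0) continue;
            Double r = e.getValue().get(item);
            neighbors.add(new double[] {sim, r == null ? Double.NaN : r});
        }
        neighbors.sort((x, y) -> Double.compare(y[0], x[0])); // most similar first
        double weighted = 0, total = 0;
        for (int i = 0; i < Math.min(k, neighbors.size()); i++) {
            double[] nb = neighbors.get(i);
            if (Double.isNaN(nb[1])) continue;
            weighted += nb[0] * nb[1];
            total += nb[0];
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    // Hypothetical toy data: userId -> (movieId -> rating).
    static Map<Integer, Map<Integer, Double>> sampleRatings() {
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put(1, new HashMap<>(Map.of(10, 5.0, 11, 3.0, 12, 1.0)));
        ratings.put(2, new HashMap<>(Map.of(10, 4.0, 11, 3.0, 12, 2.0, 13, 4.0)));
        ratings.put(3, new HashMap<>(Map.of(10, 1.0, 11, 3.0, 12, 5.0, 13, 2.0)));
        return ratings;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> ratings = sampleRatings();
        System.out.println("sim(1,2) = " + pearson(ratings.get(1), ratings.get(2)));
        System.out.println("estimate(user 1, movie 13) = " + estimate(ratings, 1, 13, 2));
    }
}
```

In this toy data, user 2 rates movies in the same order of preference as user 1 and user 3 in the opposite order, so only user 2 contributes to the estimate for user 1. Changing the similarity function or the value of `k` is the kind of variation that the comparison of similarity measures and neighborhood sizes is concerned with.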

Last edited May 11, 2012 at 12:29 AM by josephflu, version 2