Recommendation system: Difference between revisions
imported>Yash Prabhu |
imported>John R. Brews (→History: link) |
Latest revision as of 05:58, 23 July 2012
Definition
A recommendation system is a software program which attempts to narrow down selections for users based on their expressed preferences, past behavior, or other data which can be mined about the user or other users with similar interests.
History
Recommendation systems have their roots in Usenet, a worldwide distributed discussion system that originated at Duke University in the late 1970s. Usenet operated in a client/server format, and the posts made by users were categorized into specific "newsgroups," which were further divided into sub-categories where needed.
Information Filtering (IF) is a way of sifting through the overabundance of data on the Web. As newsgroups grew exponentially, database administrators scrambled for ways to reduce e-clutter. Early solutions to data overload included:
- Tapestry - developed at Xerox PARC; its developers coined the phrase "collaborative filtering"
- Lotus Notes - a component of this software had built-in collaborative filtering mechanisms
- GroupLens - started in 1992, this open-source project built on the premise of Tapestry. It aimed to simplify Usenet data by using distributed networks that addressed privacy issues and by making suggestions according to others' ratings
Pattie Maes played a leading role in the development of collaborative filtering through her work at MIT on a system called Firefly, a recommendation system for music lovers. Firefly was later purchased by Microsoft for an estimated 40 million dollars.
Through the 1990s and beyond, collaborative filtering recommendation systems included:
- Mosaic – First graphical browser allowing users to publish comments to Web pages
- HOMR – Helpful Online Music Recommendations; predecessor to Firefly
- Ringo – Social Information filtering system for music recommendations
- Firefly – Grew out of Ringo project, music and movies
- Yahoo! – Started by Stanford students David Filo and Jerry Yang
- Point’s Top 5% – NYC-based qualitative website rating
- PHOAKS – People Helping One Another Know Stuff
- Fab – Allowed users to create content-based filters
- Webdoggie – Helped people find websites according to their likes
- Alexa Internet – When someone visits a website, Alexa displays other websites they might be interested in
Recommendation systems are now an integral part of how Amazon.com drives purchases.
Classification
The current generation of recommendation methods can be broadly classified into the following five categories, based on the knowledge sources they use to make recommendations:
- Content-based Recommendations
- Collaborative Recommendations
- Knowledge-based Recommendations
- Demographic Recommendations
- Hybrid Recommendations
General Requirements for Recommendation Systems
To make a viable recommendation, three things are needed:
- Background Information - the information that the system has before the recommendation process begins
- Input Information - the information that a user must enter to the system in order to trigger a recommendation
- An Algorithm - this will combine background and input information to arrive at the recommendation
1. Content-based Recommendation
Content-based systems recommend items similar to the ones the user preferred in the past. For example, to recommend books to user u, a content-based book recommendation system looks for similarities among the books that u has rated highly in the past (specific writers, genres, subject matter). Only books with a high degree of similarity to the user’s preferences are recommended.
Content-based systems are designed mostly to recommend text-based items. The features that are evaluated against the user's preferences are called “keywords.”
Content-based recommendations can either be:
- Memory/Heuristic - uses the term frequency–inverse document frequency (TF-IDF) text retrieval method
- Model-based - uses decision trees, neural networks, Bayesian classifiers, clustering, or vector-based representations
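As a rough illustration of the memory/heuristic approach, the sketch below computes TF-IDF keyword weights for a tiny, made-up set of books. The function name `tf_idf`, the sample data, and the particular weighting formula (relative term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration; real systems often use smoothed variants.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per document (each a list of keywords)."""
    n_docs = len(documents)
    # Document frequency: the number of documents each keyword appears in.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weighted = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency (unsmoothed variant).
        weighted.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weighted

# Hypothetical keyword lists for three books.
books = [
    ["space", "robot", "war"],
    ["space", "romance"],
    ["cooking", "romance", "travel"],
]
weights = tf_idf(books)
```

In this toy data, "robot" appears in only one book and so receives a higher weight in the first book than "space", which appears in two.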
How Content-based Recommendation Works
The user profile of preferences is stored as a vector of keywords. These profiles are obtained by analyzing the content of the items previously seen and rated by the user and are usually constructed using keyword analysis techniques from information retrieval.
Information retrieval techniques assign weights to keywords using algorithms such as Winnow and Rocchio.
In the model-based implementation of content-based RS, other techniques are used, such as Bayesian classifiers and various machine learning techniques, including clustering, decision trees, and artificial neural networks.
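The profile-as-keyword-vector idea can be sketched as follows. This is a simplified, Rocchio-style construction under stated assumptions: the profile is the average vector of liked items minus a fraction of the average vector of disliked items, and candidates are ranked by cosine similarity to the profile. The function names, the `beta` and `gamma` defaults, and the sample items are all hypothetical.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Average a list of sparse {keyword: weight} vectors."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for word, weight in vec.items():
            total[word] += weight
    return {word: s / len(vectors) for word, s in total.items()}

def rocchio_profile(liked, disliked, beta=1.0, gamma=0.5):
    """Pull the profile toward liked items and away from disliked ones."""
    pos, neg = centroid(liked), centroid(disliked)
    return {
        word: beta * pos.get(word, 0.0) - gamma * neg.get(word, 0.0)
        for word in set(pos) | set(neg)
    }

def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(profile, candidates, top_n=2):
    """Rank candidate titles by similarity to the user profile."""
    ranked = sorted(candidates, key=lambda t: cosine(profile, candidates[t]),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical keyword vectors for items the user rated.
liked = [{"space": 1.0, "robot": 0.5}, {"space": 0.8, "war": 0.4}]
disliked = [{"romance": 1.0}]
profile = rocchio_profile(liked, disliked)

candidates = {
    "Star Robots": {"space": 1.0, "robot": 1.0},
    "Love Story": {"romance": 1.0},
    "Galactic War": {"space": 0.5, "war": 1.0},
}
top = recommend(profile, candidates)
```

Here the two space-themed titles rank ahead of "Love Story", whose similarity to the profile is negative because the user disliked a romance item.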
Limitations of Content-based Recommendation Systems
- Ramp-up Problem
This term describes a well-known issue with content-based systems and with recommendation systems in general. New items cannot be recommended to any user until they receive some ratings, so recommendations for items that are new to the database are weaker than those for more widely rated products. The same applies to users who are new to the system. In other words, until there is a large number of users whose habits are known, the system cannot be useful for most users, and until a sufficient number of rated items has been collected, the system cannot be useful for a particular user.
- Plasticity problem
Once a user’s profile has been set up in the system, it is hard for the system to adapt to changes in the user’s preferences. An alcoholic who becomes a teetotaler will continue to get alcohol recommendations from a content-based or collaborative recommender for some time, until newer ratings kick in.
2. Collaborative RS
Collaborative recommendation systems suggest items that people with similar taste preferred in the past. See also: Collaborative filtering
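A minimal sketch of user-based collaborative filtering, assuming a small in-memory ratings table: each unseen item is scored by a similarity-weighted average of other users' ratings. The user names, ratings, and the choice of cosine similarity over co-rated items are illustrative assumptions; Pearson correlation and item-based variants are common alternatives.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 5, "B": 5, "C": 2, "D": 5, "E": 2},
    "cara": {"A": 1, "B": 2, "C": 5, "D": 1, "E": 5},
}

def similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for one item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

def recommend(user, ratings, top_n=2):
    """Rank the items the user has not rated by predicted rating."""
    seen = set(ratings[user])
    unseen = {i for r in ratings.values() for i in r} - seen
    scored = {i: predict(user, i, ratings) for i in unseen}
    scored = {i: s for i, s in scored.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

suggestions = recommend("ann", ratings)
```

Because ann's ratings resemble bob's far more than cara's, item D (which bob rated highly) is ranked ahead of item E.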
3. Knowledge-based Recommendation
Knowledge-based recommendation uses knowledge about users and products to reason about which products meet the user's requirements. Some current systems effectively walk the user down a discrimination tree of product attributes, whereas others use a quantitative decision support tool for this task.
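One simple way to picture this is as constraint filtering over an engineered product catalog, with no user ratings involved. The sketch below assumes a hypothetical catalog and expresses the user's requirements as one predicate per product attribute; it is an illustration of the idea, not a description of any particular system.

```python
# Hypothetical engineered product knowledge: one dict of attributes per product.
catalog = [
    {"name": "Trail 200", "type": "tent", "capacity": 2, "price": 120},
    {"name": "Family Dome", "type": "tent", "capacity": 5, "price": 260},
    {"name": "Summit Lite", "type": "tent", "capacity": 2, "price": 340},
]

def knowledge_based(catalog, requirements):
    """Keep the products whose attributes satisfy every stated requirement."""
    return [
        product["name"]
        for product in catalog
        if all(check(product[attr]) for attr, check in requirements.items())
    ]

# The user's requirements, gathered for example by walking through attributes.
requirements = {
    "type": lambda value: value == "tent",
    "capacity": lambda value: value >= 2,
    "price": lambda value: value <= 300,
}
matches = knowledge_based(catalog, requirements)
```

Tightening or relaxing a requirement and re-running the filter mirrors how a user explores the information space and refines their needs.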
Advantages of Knowledge-based Recommendation
It doesn't have the “ramp-up” problem since its recommendations don’t depend on any database of user ratings. Users are encouraged to explore and understand the information space and, by doing so, they elaborate more on their needs.
Disadvantages of Knowledge-based Recommendation
It requires an engineered knowledge database to make useful recommendations. This knowledge base has to be updated to keep up with the ever-changing consumer ratings and preferences. This system tends to give static suggestions that limit the user to what is contained in the database.
4. Demographic-based Recommendation
A demographic-based system categorizes the user based on personal attributes and makes recommendations based on demographic classes, e.g. college students, teenagers, women, or men. The advantages and disadvantages of this system are similar to those of Knowledge-based Recommendation Systems.
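As a toy sketch of the idea, the code below assigns a user to a demographic class and returns that class's stock recommendations. The classification rules, class names, and item lists are entirely hypothetical and exist only to show the two steps (classify, then look up).

```python
def demographic_class(user):
    """Assign a user to one coarse demographic class (hypothetical rules)."""
    if 13 <= user["age"] <= 19:
        return "teenager"
    if user.get("student", False):
        return "college student"
    return "general"

# Hypothetical per-class recommendations, prepared in advance.
CLASS_FAVORITES = {
    "teenager": ["skateboard", "graphic novel"],
    "college student": ["laptop backpack", "textbook stand"],
    "general": ["bestselling novel"],
}

def recommend(user):
    """Recommend whatever is associated with the user's demographic class."""
    return CLASS_FAVORITES[demographic_class(user)]

picks = recommend({"age": 21, "student": True})
```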
5. Hybrid Recommendation Systems
All of the above-mentioned systems have complementary strengths and weaknesses. A hybrid recommendation system combines two or more recommendation techniques to achieve better performance with fewer of the weaknesses of any individual technique. The most popular hybrids combine content-based and collaborative filtering.
Methods/Strategies of Hybridization
There are different strategies by which hybridization can be achieved and they are broadly classified into seven categories:
- Weighted: Collaborative and content-based methods are implemented separately and their predictions are then combined.
- Switching: The system uses a switching criterion to switch between two recommendation systems operating on the same object.
- Feature Combination: Features from different recommendation systems' data sources are put into a single recommendation algorithm.
- Cascading: One recommendation system refines the results given by another.
- Meta Level: The model learned by one recommendation system is used as input to another. It differs from feature augmentation in that the entire model is used as input.
- Feature Augmentation: The output of one system is used as an input feature to another; for example, using the model generated by one to generate features that are used by another.
- Mixed: Recommendations from two or more techniques are presented at the same time; for example, content-based and collaborative filtering.
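Three of the strategies above (weighted, switching, and cascading) can be sketched in a few lines each, assuming each underlying recommender has already produced a score per item. The function names, weights, rating threshold, and sample scores are assumptions made for illustration; the cascade shown here simply lets the second recommender re-order the first one's short list.

```python
def weighted(content_scores, collab_scores, w_content=0.4, w_collab=0.6):
    """Weighted hybrid: linear combination of the two recommenders' scores."""
    items = set(content_scores) | set(collab_scores)
    return {
        item: w_content * content_scores.get(item, 0.0)
        + w_collab * collab_scores.get(item, 0.0)
        for item in items
    }

def switching(content_scores, collab_scores, n_user_ratings, min_ratings=5):
    """Switching hybrid: use collaborative scores only once the user has
    supplied enough ratings, otherwise fall back to content-based scores."""
    return collab_scores if n_user_ratings >= min_ratings else content_scores

def cascade(primary_scores, secondary_scores, keep=2):
    """Cascading hybrid: the secondary recommender re-orders the short list
    produced by the primary recommender."""
    shortlist = sorted(primary_scores, key=primary_scores.get, reverse=True)[:keep]
    return sorted(shortlist, key=lambda i: secondary_scores.get(i, 0.0), reverse=True)

# Hypothetical scores from a content-based and a collaborative recommender.
content = {"A": 0.9, "B": 0.2, "C": 0.5}
collab = {"A": 0.1, "B": 0.8, "C": 0.7}

combined = weighted(content, collab)
best = max(combined, key=combined.get)
refined = cascade(content, collab)
chosen = switching(content, collab, n_user_ratings=2)
```

With these numbers, item C wins the weighted combination, the cascade keeps A and C from the content-based list and then puts C first, and the switching rule falls back to content-based scores because the user has only two ratings.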
Issues
The five most challenging issues recommendation systems face are:
- Changing Data - Trying to keep pace with people's tastes and changing opinions
- Lack of Data - Getting users to rate products and enter information about their likes and dislikes
- Updating User Preferences - Initial preferences are stored but need to be updated by users
- Unpredictable Results - How would you ever guess that someone who listens to Barry Manilow is also a Depeche Mode fan?
- Lots of Work! - Although they might look simple, recommendation systems require a great deal of computation
More RS Issues
- Constructing accurate user models
- Compatibility among models
- Shilling
- SMS Spam
- Preferences crossing domains
- Costs
Future
The future of recommendation systems is unclear. Options discussed include:
- OpenFolders (OpenCola) is a way of storing recommendations on a user's computer so that when he or she logs on each day, a folder containing current items of interest is available. This idea has had mixed reviews and controversy regarding privacy.
- Other recommendation systems might use commonality to promote diversity. This technique takes advantage of the commonalities between two parties to promote a product or service that one favors but the other does not. It's based on a trust mechanism: if Sally and Sarah both like yoga, but Sally also likes motorcycle riding, maybe Sarah will, too.
Recommendation systems are also being targeted to the following industries:
- Intelligent tourist and restaurant guides
- Navigation aids
- Shopping systems that recommend based on user behavior
Recent Press
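Wired.com recently released an article on Caterina Fake and her work with Hunch.com, especially with respect to the cold start problem.[1]
[1] What You Want: Flickr Creator Spins Addictive New Web Service. http://www.wired.com/magazine/tag/caterina-fake/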