Rating Implementation Choices

Select EU-funded project:
Rating and filtering of scientific, technical and other network documents


Choices in the Implementation of Rating

Abstract  

Collaborative filtering systems are software systems that help Internet users find the most valuable and interesting information, aided by other people's ratings. Some collaborative filtering systems will even tailor the filtering to your particular needs, interests, competence and values, by using ratings produced by other people whose views are similar to yours. This paper gives a short overview of such existing tools, with links to further information, then discusses some issues in their design and presents an architecture for a rating and filtering system. The architecture is defined so that modules implemented by different people at different places can interwork.

Source data  

By Jacob Palme, e-mail: jpalme@dsv.su.se, at the research group for CMC (Computer Mediated Communication) in the Department of Computer and Systems Sciences at Stockholm University and KTH.

First version January 1997, revised version July 1997, last change 25 July 1998.

This document is available on the WWW in HTML format at URL http://dsv.su.se/jpalme/select/rating-choices.html and in Adobe Acrobat format at URL http://dsv.su.se/jpalme/select/rating-choices.pdf.


What is Rating?

Rating refers to services that guide your selection of resources to read by the quality of those resources, as judged by people who have read them. Rating is also known as "collaborative filtering" or "social filtering".

On the Internet, rating may be applied to many kinds of resources, such as web pages, messages, electronic journal papers and public domain software.

The purpose of rating may be to increase the quality of the resources you read, or to avoid certain resources deemed unsuitable in certain communities for certain groups of readers (example: violence, pornography).

In the world before the Internet, rating was commonly provided by services such as:

Rating is further described in [3].

Vocabulary

This vocabulary is partly based on [1]:
Category: Value system used in rating, for example "1;2;3" or "objectionable; acceptable". Also known as "dimension".
Censorship: See Parental control.
Content label: A data structure indicating a rating of a particular resource or set of resources. Also known as "rating" or "content rating".
Label bureau: A computer system that supplies, via a computer network, ratings of resources. It may or may not also provide the resources themselves.
Parental control: Software and services for use by parents and teachers to control children's usage of the Internet. The main goal of such software is to make it impossible, without special privileges, to download forbidden information. Such systems might thus also be labelled censorship systems. Compare with Peer collaborative filtering.
Peer collaborative filtering: Collaborative filtering systems used among peers to help each other find the most interesting information. Compare with Parental control.
PICS: Platform for Internet Content Selection, a W3C specification of formats and protocols for rating.
Rating service: An individual or organisation that assigns labels according to some rating system, and then distributes them, perhaps via a label bureau or via CD-ROM.
Rating system: A method for rating information, consisting of one or more categories.
Resource: Object or document on the net which can be rated, such as a web page, newsgroup article or downloadable software.
Scale: The range of permissible values for a category.

Existing rating software and services

Peer Collaborative Filtering Versus Parental Control

Most rating software and services at present (summer 1997) are defined for the specific goal of protecting children from information which is regarded as unsuitable for them. This is thus a kind of censorship system, meant to be used by parents and teachers to control children's usage of the Internet.

The software in such systems works partly differently from collaborative filtering software used among peers to help each other find the most interesting information. Parental control software makes it impossible, without special privileges, to download forbidden information. Peer collaborative filtering software, on the other hand, aims to give information to the user, and need not always remove or stop less desirable information. The categories and scales also differ. Typical categories in parental control are violence, sex, nudity, language, or the age at which children should be allowed to see the information. Typical categories in peer rating systems might be quality or newsworthiness.

Parental control

The PICS standard [1], [2], [4] was mainly developed for parental control, and most existing implementations of PICS have this goal, even though the PICS protocols are equally useful for peer collaborative filtering.

Many systems and services for parental control are available, such as Bess, Cyber Patrol, CyberSitter, Cyber Snoop, Gulliver's Guardian, Net Nanny, NetShepherd, etc. An overview with links to further information on such systems can be found at [5].

Links to parental control systems using the PICS standard can be found at [6].

One very well-known such service is the Recreational Software Advisory Council for the Internet (RSACi) [9]. The basis of RSACi is to give objective descriptive information about rated resources, not subjective judgements. The intention is that this would enable the owner of a resource to rate his own resources. RSACi rates resources on four dimensions: violence, nudity, sex and language. A questionnaire is provided with simple yes/no questions. By answering this questionnaire, RSACi ratings are automatically produced.

The main alternatives to RSACi are systems and services based on subjective judgements of what is and is not suitable for children of a certain age. Such services typically provide an age level, saying that a certain resource is not suitable for children below this age level. The most well-known such system is the Motion Picture Association of America (MPAA) system [7], [8].

Peer Collaborative Filtering Systems and Services

Some well-known collaborative filtering systems and services at present (summer 1997) are:

Firefly (http://www.firefly.net) is a company which sells both collaborative filtering software and services. Firefly is used by other Internet service providers; for example, Yahoo claims to provide a special ratings-based service, MyYahoo (http://my.yahoo.com).

A description of how Firefly collaborative filtering works can be found at [10]. Firefly says that it computes correlations between the scores given to resources by different users, and finds those other users whose scores have the highest correlation with your scores. Resources which those users rate highly are then suggested as being of interest to you. Firefly further says that it uses a system called Feature-Guided Automated Collaborative Filtering. This means that the information space is divided into different subject areas, and collaborative filtering is then performed only within such an area.
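The correlation idea described above can be illustrated with a minimal sketch. This is not Firefly's actual algorithm, which is not specified here; it assumes the Pearson correlation coefficient as the correlation measure, a single best-matching user, and a hypothetical `min_score` threshold for what counts as "rated highly". All names and data are illustrative.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the resources both users rated; None if undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None
    xs = [a[r] for r in common]
    ys = [b[r] for r in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # one user gave identical scores everywhere
    return cov / sqrt(vx * vy)

def suggest(me, others, min_score=7):
    """Suggest unseen resources rated highly by the best-correlated user."""
    best_name, best_corr = None, 0.0
    for name, ratings in others.items():
        c = pearson(me, ratings)
        if c is not None and c > best_corr:
            best_name, best_corr = name, c
    if best_name is None:
        return None, []
    picks = sorted(r for r, s in others[best_name].items()
                   if r not in me and s >= min_score)
    return best_name, picks

# Hypothetical ratings on a 1-10 scale, keyed by resource name.
me = {"a": 9, "b": 2, "c": 7}
others = {
    "u1": {"a": 8, "b": 1, "c": 6, "d": 9, "e": 3},
    "u2": {"a": 2, "b": 9, "c": 3, "d": 1, "e": 10},
}
print(suggest(me, others))
```

A production system would probably combine several well-correlated users rather than one, and, following the feature-guided approach, compute correlations separately within each subject area.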

Net Shepherd (previously at http://www.shepherd.net) started as a parental control service, but has evolved into the area of peer collaborative filtering. The description of their service in [11] seems to indicate that, as of summer 1997, they provide only majority ratings by all raters, not individually selected ratings from people whose interests and values are similar to the user's.

Net Perceptions (http://www.netperceptions.com) markets a collaborative filtering system called GroupLens [13]. GroupLens can collect explicit ratings, or can implicitly estimate ratings based on the time a user spends viewing a resource. It is mainly marketed to organisations that want to provide collaborative filtering to their own users, and is not marketed as a global collaborative filtering system for resources all over the Internet. GroupLens was originally developed at the MIT Centre for Coordination Science [14].

Sepia Technologies, Inc. in Quebec, Canada, has developed a collaborative filtering system for movies, music and books [15].

The PICS standard

The PICS standard, developed by the World Wide Web Consortium [1], [2], [4] is a very general-purpose standard for supplying ratings. Within the PICS standard, it is possible to define your own rating system, with your own categories and scales. Your rating system can contain several different categories with different scales. For example, the four RSAC scales of violence, nudity, sex and language can as easily be accommodated as the MPAA scales of age limits for children.

When you use PICS, you first define your rating categories and scales and specify these in a particular notation [1]. Here is an example of a description of a category in a rating system specification:

     (category
     (transmit-as "hue")
     (label (name "blue") (value 0))
     (label (name "red") (value 1))
     (label (name "green") (value 2)))

When a rating system has been defined, it is then possible to distribute rating labels [2]. A rating label contains a description of one or a set of resources. It is possible to define a rating label for a whole web site, but then to supply different rating labels to subspaces within that web site or to individual resources. The rating for the whole web site is then only used when no more narrow rating label is available for a particular resource.
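The fallback from narrow to broad labels can be sketched as a longest-prefix lookup. This is only an illustration under simplifying assumptions: labels are keyed by plain URL prefixes, and matching is a string comparison; the exact rules PICS uses are defined in [2]. The URLs and the category name "quality" are hypothetical.

```python
def most_specific_label(url, labels):
    """Return the rating whose URL prefix is the longest one matching url."""
    best = None
    for prefix, rating in labels.items():
        if url.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, rating)
    return None if best is None else best[1]

# Hypothetical labels: one for the whole site, one for a subspace, one for a page.
labels = {
    "http://example.org/": {"quality": 5},
    "http://example.org/lab/": {"quality": 8},
    "http://example.org/lab/paper.html": {"quality": 9},
}
print(most_specific_label("http://example.org/lab/other.html", labels))
```

The site-wide label is returned only when no narrower prefix matches, which mirrors the rule stated above.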

For HTML resources, the rating labels can be put as META fields in the HEAD of the HTML text, so that they are downloaded as part of the resource. PICS also specifies protocols by which a web site can run a special server providing ratings of its own web pages, and protocols for services which provide ratings of web pages other than their own.
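As a rough illustration of embedding a label in an HTML resource, the sketch below builds a simplified PICS-1.1 label string and wraps it in a META element. It assumes the basic label form described in [2] and omits optional fields such as dates; the helper names, the service URL and the resource URL are hypothetical, and [2] remains the authority on the exact syntax.

```python
def pics_label(service_url, resource_url, ratings):
    """Build a simplified PICS-1.1 label (optional fields omitted)."""
    pairs = " ".join(f"{cat} {val}" for cat, val in ratings.items())
    return f'(PICS-1.1 "{service_url}" labels for "{resource_url}" ratings ({pairs}))'

def meta_element(label):
    """Wrap a label in a META element for the HEAD of an HTML document."""
    return f"<META http-equiv=\"PICS-Label\" content='{label}'>"

# "hue" is the transmit-as name from the category example above.
label = pics_label("http://ratings.example/v1",
                   "http://example.org/page.html",
                   {"hue": 1})
print(meta_element(label))
```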

The resource being rated is identified by its URL. Since URLs [12] are not only available for web pages, but also for e-mail messages, Usenet News newsgroups and messages, etc., PICS can be used to rate all resources for which URLs are defined.

Some Problems with Rating

Some problems which can cause rating to work less well are:

  1. Too few ratings are supplied to give a good basis for rating.
  2. It may be difficult to collect ratings from users. Some systems solve this by implicitly guessing user ratings from the time the user spends reading a resource.
  3. Some raters may not do a good job of rating.
  4. People can unduly influence the rating to favour their own work, or work by their friends, relatives or co-workers.
  5. Ratings may not be set by people with the same values and views as yourself. For example, an expert in an area may prefer other choices than beginners. A resource to which experts give bad ratings may be good for beginners. Your values may also influence your choices. For example, political values may influence whether you prefer analyses based on a conservative, liberal or class struggle viewpoint, and a religious person may have different preferences than a cynical/sophisticated "modern" person.

A rating system designed to handle one of the above problems better may handle others less well. For example, restricting who may provide ratings may give higher-quality ratings (at least if your values and views are the same as those of the people providing the ratings), but reduces the number of ratings and rated resources available.
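Problem 2 above mentions guessing ratings from reading time. The sketch below shows one conceivable way to do this; it is not taken from any of the systems discussed. It assumes that the fraction of a text the reader had time to read is a proxy for interest, and maps that fraction linearly onto a 1 to 10 scale. The function name and the default reading speed of 4 words per second are assumptions.

```python
def implicit_rating(seconds_viewed, word_count, words_per_second=4.0):
    """Guess a 1-10 rating from the share of the text the reader had time to read."""
    if word_count <= 0:
        return None  # nothing to read, so no basis for a guess
    fraction = min(1.0, seconds_viewed * words_per_second / word_count)
    return 1 + round(9 * fraction)

print(implicit_rating(250, 1000))  # long enough to read all 1000 words
print(implicit_rating(0, 1000))    # closed immediately
```

Such a guess is noisy (a user may leave a page open while doing something else), so an implementation would probably treat it as weaker evidence than an explicit rating.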

Choices for Rating

The table below discusses the interaction of two choices in rating system design.

The horizontal axis represents the choice of restricting people's rights to submit ratings; the vertical axis represents the choice of whose ratings to use for your selection needs.

Table 1: Whose ratings are used where?

Columns: right to rate a resource

  A. Everyone can input any rating (except limitations that you cannot rate your own or your friends' resources).
  B. The right to input ratings is limited in some other way, to select the people most proficient at providing good ratings.

Rows: use of ratings in filtering

  1. An average of all ratings set by everyone or by members of your peer group.
     With A: Advantage: Lots of ratings available. Disadvantage: Ratings may not agree with your personal preferences.
     With B: Advantage: Better rating, may avoid misuse. Disadvantage: May reduce the amount of ratings available.
  2. Ratings of people with similar views to yourself are preferably used, through an automatic mechanism of comparing your ratings with those of other people.
     With A: Complex to implement, but might provide very good ratings for your views and requirements. Also, this might give larger availability of ratings, since only by giving your own ratings on resources can your preferences be matched to those of other people.
     With B: This combines two different ways of trying to achieve the same thing: Ratings set by those providing good ratings are given priority. This combination should not be used unless carefully analysed, since otherwise the two services can interact in unsuitable ways.

There are two alternative methods of getting higher-quality ratings: allowing only selected people to provide ratings, or letting anyone provide ratings but basing your selections on ratings made by people who share your values and views. Is it an advantage to combine both methods, or will they interact so that one method alone is better than the combination?

Resources to be rated

Common goals of rating:

A single common rating for a set of more than one resource (such as a site or all resources with a certain initial part of their URLs) has both pros and cons.

Pro: It is less effort to rate sets than every single resource, which means that more ratings will be available.

Con: The quality may vary between resources within the same set.

To reduce the disadvantage, rating on sets of resources should not encompass the whole of heterogeneous web sites. As an example, a university department should sometimes be rated separately for different researchers or research groups within the department.

Rating systems

Rating services may use different rating systems. A rating system to avoid objectionable resources may for example use terms like "unsuitable for children below 15 years" or "nakedness" while a rating system for movies may use a system of * to *****.

Suggested rating system for rating of other people's resources

A category scale from 1 to 10, defined as follows:

  1. Of no value at all, to be avoided.
  2. Of very little value.
  3. Of little value.
  4. Maybe of some value.
  5. Of some interest.
  6. Of interest, but not essential.
  7. Very interesting and/or valuable.
  8. Highly interesting and valuable.
  9. Close to excellent.
  10. Excellent.

Suggested rating system for rating of your own resources

Note: These categories use terms which are hard to misuse in order to give your own resources too high ratings:

  1. Flaming, jokes, advertisements, non-serious items.
  2. Ordinary personal viewpoint or discussion item.
  3. Very well-considered personal viewpoint or discussion item.
  4. Poems, short stories, novels.
  5. Art, music, fictional videos, etc.
  6. Well-considered and researched monograph.
  7. Article published in edited journal, book published by book publishing company of the kind which publishes quality books.
  8. Masters thesis at a university or of comparable quality.
  9. Paper accepted for publication in peer-reviewed scientific journal.
  10. Doctoral thesis or of comparable quality.

Architecture of a rating system

Source

This architecture was developed for, and is part of, the proposal for an EU grant to a research project on intelligent and collaborative filtering with the name SELECT. This proposal has been recommended for acceptance by the EU, and the research project is expected to start in January 1998.

Modularisation of a filtering and rating system

If a rating and filtering system is to be implemented by people and organisations in many different countries, then the system needs to be split into well-defined modules with well-defined interfaces between them. Here is a first attempt to define this set of modules:

Figure 1: Relations between modules

(Arrows indicate the direction of information flow, not the direction of control)

Table 2: Modules in the system

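Each entry below gives the name of a module, followed by its description and its relations to other modules. Numbers in parentheses refer to the interfaces listed in Table 3.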

Input of author ratings

An author can give his own resources ratings, using the scale above for author-specified ratings. Input from user interface (20), stored in RFC822 or HTML header (19), retrieved with the resource itself.

Input of reader ratings

A reader can, when reading an article, a message or a web resource, specify a rating using the scale above for reader-specified ratings.
  1. Input from user interface (1).
  2. Ratings are moved to a personal ratings data base (4), which can be used to automatically deduce better intelligent filtering methods for this user, and also:
  3. Ratings are moved to a multi-user ratings data base (2), to aid other people's filtering.

Personal ratings data base

A data base, accessible only by a certain person and agents working for that person. The data base contains a list of messages and ratings.

The data base should have news control, so that an agent connecting to it can download the ratings added since that agent last connected.

Intelligent filtering controls (5) can scan this data base and deduce filtering conditions from its contents.

Social filtering agents (6) can match the personal choices in this data base with the personal choices of other people, found in a multi-user ratings data base, to deduce which other persons have similar preferences to this user, so that their ratings can be used to guide this user.
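The "news control" requirement can be sketched as follows. This is a minimal in-memory illustration, not a proposed interface: it assumes each rating gets a monotonically increasing sequence number, and that each agent remembers the highest number it has seen. The class and method names are hypothetical.

```python
class RatingsStore:
    """In-memory stand-in for a ratings data base with news control."""

    def __init__(self):
        self._entries = []  # list of (sequence_number, resource_url, rating)

    def add(self, resource_url, rating):
        """Store a rating and return its sequence number."""
        seq = len(self._entries) + 1
        self._entries.append((seq, resource_url, rating))
        return seq

    def new_since(self, last_seen):
        """Return entries added after last_seen, plus the new high-water mark."""
        fresh = [e for e in self._entries if e[0] > last_seen]
        mark = fresh[-1][0] if fresh else last_seen
        return fresh, mark

store = RatingsStore()
store.add("http://example.org/a.html", 7)
store.add("http://example.org/b.html", 3)
fresh, mark = store.new_since(0)      # first connection: everything is new
store.add("http://example.org/c.html", 9)
fresh, mark = store.new_since(mark)   # later connection: only the third rating
print(fresh, mark)
```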

Multi-user ratings data base

A data base, accessible to rating and filtering agents. The data base contains a list of messages and ratings. For every rating, the data base contains a uni-directional encryption of the e-mail-address of the person who provided this rating. In this way, it is possible to identify ratings made by the same person, without knowing who this person is.

The data base should have news control, so that an agent connecting to it can download the ratings added since that agent last connected.

Used by and accessible to different kinds of agents, such as social filtering agents (3) and intelligent filtering controls (21). Can also be used as a research data base for the development of better rating and filtering systems, and should thus be accessible to researchers. To avoid misuse, it should perhaps not be accessible to anyone using any kind of software (since there is a risk of deriving the real user ID from the encrypted user ID).
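The uni-directional encryption of rater addresses could be realised in several ways. The sketch below assumes a keyed hash (HMAC with SHA-256), which is a substitution chosen for illustration and not something the architecture prescribes. A key held only by the data base operator makes it harder to recover an address by hashing a list of guessed addresses, which is the risk noted above. The function name and the sample key are hypothetical.

```python
import hashlib
import hmac

def rater_id(email, secret_key):
    """Derive a stable pseudonym for a rater from an e-mail address.

    secret_key is bytes known only to the data base operator (an assumption).
    """
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

key = b"replace-with-a-long-random-secret"
print(rater_id("Alice@Example.org", key) == rater_id("alice@example.org", key))
```

The same address always maps to the same pseudonym, so ratings by one person can be linked, while the address itself is not stored.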

Filter attribute creators

A filter attribute creator is a piece of software which derives filter attributes from a resource. Basic attributes are words (very common words excluded). Words may be transformed to a canonical form and be extended with synonyms. Other attributes are length of original and of quoted text, percentage of multi-syllable words and other genre-indicators, use of graphics and advanced HTML constructs, etc. Takes as input resources (articles, messages and web pages (17)) and produces additional data which is stored in a resource attribute data base (16).
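A minimal sketch of a filter attribute creator for plain text follows. It covers only three of the attributes mentioned: content words, length, and percentage of multi-syllable words. The stop-word list is a tiny illustrative sample, the syllable count is a rough vowel-group heuristic for English, and "multi-syllable" is assumed to mean three or more syllables; none of these choices come from the architecture itself.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}  # sample only

def count_syllables(word):
    """Rough estimate: number of groups of consecutive vowels, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def filter_attributes(text):
    """Derive simple filter attributes from the text of a resource."""
    words = re.findall(r"[a-z]+", text.lower())
    content = [w for w in words if w not in STOP_WORDS]
    multi = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "words": sorted(set(content)),
        "length": len(words),
        "multi_syllable_pct": round(100 * multi / len(words), 1) if words else 0.0,
    }

print(filter_attributes("The rating of a collaborative system"))
```

Canonical forms (stemming) and synonym expansion, also mentioned above, would be further steps applied to the "words" attribute.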

Resource attribute data base

A data base of attributes for a resource. The attributes may be stored in inverted form, so that you can rapidly search for resources with certain attributes or attribute combinations (this is often done by network search engines like Alta Vista or Euroseek). Input from filter attribute creators (16). Output to filtering and searching agents (15).
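Storing attributes "in inverted form" means mapping each attribute to the set of resources that have it. The sketch below shows the idea with an in-memory dictionary; the function names and sample data are hypothetical, and a real data base would persist the index.

```python
from collections import defaultdict

def build_inverted_index(resource_attrs):
    """Map each attribute to the set of resource URLs carrying it."""
    index = defaultdict(set)
    for url, attrs in resource_attrs.items():
        for attr in attrs:
            index[attr].add(url)
    return index

def search_all(index, wanted):
    """Return the resources that carry every attribute in wanted."""
    sets = [index.get(a, set()) for a in wanted]
    return set.intersection(*sets) if sets else set()

docs = {
    "http://example.org/1": ["rating", "pics"],
    "http://example.org/2": ["rating", "filtering"],
    "http://example.org/3": ["filtering"],
}
index = build_inverted_index(docs)
print(search_all(index, ["rating", "filtering"]))
```

A search for an attribute combination then costs a few set intersections rather than a scan of every resource.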

Intelligent filtering controls

Agent which reads the Personal ratings data base, looks at the resources you liked and disliked, and deduces filtering conditions to find the resources you like and not those you dislike. Note that this agent does not perform the actual filtering; it just provides input to the Personal filtering settings, which are then used to control the actual filtering. Input from Personal ratings data base (5). Output to Personal filtering settings (8).

Personal filtering settings

Settings which control your filtering agents. These settings include code in a language for specifying filtering conditions, probably based on Boolean algebra. Input from Intelligent filtering controls (8), output to Filtering agents (9) and input and output from Personal filtering control (10).
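The language for filtering conditions is still to be defined. Purely as an illustration of what a Boolean-algebra-based condition might look like, the sketch below represents a condition as nested tuples and evaluates it against the attribute set of a resource. The operators "has", "not", "and" and "or", and the tuple representation, are assumptions made for this example.

```python
def matches(condition, attributes):
    """Evaluate a nested Boolean condition against a set of attributes."""
    op = condition[0]
    if op == "has":
        return condition[1] in attributes
    if op == "not":
        return not matches(condition[1], attributes)
    if op == "and":
        return all(matches(c, attributes) for c in condition[1:])
    if op == "or":
        return any(matches(c, attributes) for c in condition[1:])
    raise ValueError(f"unknown operator: {op!r}")

# Accept resources about filtering that are not advertisements.
wanted = ("and", ("has", "filtering"), ("not", ("has", "advertisement")))
print(matches(wanted, {"filtering", "rating"}))
print(matches(wanted, {"filtering", "advertisement"}))
```

Because a condition is plain data, it could be produced by the Intelligent filtering controls, edited through the Personal filtering control, and interpreted by the Filtering agent.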

Personal filtering control

Lets you see and modify your personal filtering settings. User interface (11) and input and output to Personal filtering settings (10).

Filtering agent

Agent which uses the personal filtering settings to perform filtering of resources for you. Input from Personal filtering settings (9) and from Resource attribute data base (15), and from the Resource retrieval system (14).

Social filtering agent

Agent which uses the social filtering information to perform filtering of resources for you. Used as a subsystem by your Filtering agent (7), uses data from Multi-user ratings data base (3) and Personal ratings data base (6).

Resource retrieval system

System for getting resources from the Internet. Examples of such systems: E-mail system, Usenet News system, Web browser, Web search index provider, Web4Groups system. Input and output from user interface (13), and input and output from Filtering agent (14).

Resource data base

Existing data bases of Internet resources, such as part or whole of the WWW information space, mailing list archives or Usenet news servers. An author can Input author ratings (1) of the resources he has authored. For HTML documents, for example, such ratings can be stored as META fields in the HEAD.

Active search agent

Agent which automatically scans the net, searching for information of interest to a particular user. Controlled by Personal filtering settings (23), scans the net (Resource retrieval system) (24) and delivers results to the user (22).

Table 3: Interfaces to be defined

Each entry below gives the interface number, followed by the related modules, the operation, the format and the protocol.

1
Related modules: User and Input reader ratings.
Operation: User interface.
Format: To be defined by user interface experts.
Protocol: HTML/HTTP.

2
Related modules: Input reader ratings and Multi-user ratings data base.
Operation: Input reader ratings stores ratings in the Multi-user ratings data base.
Format: Might be based on PICS. PICS may have to be extended with a method of transmitting the name of the rater?
Protocol: To be defined, probably as a variant of HTTP.

3
Related modules: Multi-user ratings data base and the Social filtering agent.
Operation: The Social filtering agent can retrieve information from the Multi-user ratings data base.
Format: To be defined.
Protocol: To be defined, probably as a variant of HTTP. We have to decide whether much information is transported to the Social filtering agent, or whether the main processing is done in the Multi-user ratings data base and only the results transported to the Social filtering agent.

4
Related modules: Input reader ratings and Personal ratings data base.
Operation: Input reader ratings stores ratings in the Personal ratings data base.
Format: Can possibly be similar to 2 above.
Protocol: Can possibly be similar to 2 above.

5, 21
Related modules: Personal ratings data base, Multi-user ratings data base and the Intelligent filtering controls.
Operation: The Intelligent filtering controls can retrieve information from the Personal ratings data base.
Format: Can possibly be similar to 3 above, but the Intelligent filtering controls may need more information.
Protocol: Can possibly be similar to 3 above, but the Intelligent filtering controls may need more information.

6
Related modules: Personal ratings data base and the Social filtering agent.
Operation: The Social filtering agent can retrieve information from the Personal ratings data base.
Format: Can possibly be similar to 3 above.
Protocol: Can possibly be similar to 3 above.

7
Related modules: Social filtering agent and Filtering agent.
Operation: The Social filtering agent is used as a subsystem by the Filtering agent.
Format: Can PICS be used?
Protocol: Can PICS be used?

8
Related modules: Intelligent filtering controls and Personal filtering settings.
Operation: The Intelligent filtering controls can modify the Personal filtering settings.
Format: A format for personal filtering settings is needed. Might be based on Boolean algebra, but we should also look at fuzzy logic. We should also look at Compassware (http:/www.compassware.com).
Protocol: To be defined, probably as a variant of HTTP.

9
Related modules: Personal filtering settings and Filtering agent.
Operation: The Filtering agent can retrieve the Personal filtering settings.
Format: See 8.
Protocol: To be defined, probably as a variant of HTTP.

10
Related modules: Personal filtering control and Personal filtering settings.
Operation: The Personal filtering control can retrieve and modify the Personal filtering settings.
Format: See 8.
Protocol: To be defined, probably as a variant of HTTP.

11
Related modules: User and Personal filtering control.
Operation: User interface. There should be a simple mode for people who do not want to learn the language for specifying filtering conditions, and an advanced mode for those who want to learn this language.
Format: To be defined by user interface experts.
Protocol: HTML/HTTP.

12
Related modules: The Intelligent filtering controls and the Resource data base.
Operation: The Intelligent filtering controls can retrieve resources from the Resource data base.
Format: MIME resource formats.
Protocol: HTTP, FTP, Gopher, NNTP, Web4Groups.

13
Related modules: User and Resource retrieval system.
Operation: This is an augmented version of the normal user interface for the Resource retrieval system.
Format: To be defined by user interface experts.
Protocol: As used in the Resource retrieval system.

14
Related modules: The Filtering agent and the Resource retrieval system.
Operation: The Resource retrieval system can enlist the help (input and output) of the Filtering agent.
Format: To be defined.
Protocol: To be defined.

15
Related modules: Resource attribute data base and Filtering agent.
Operation: The Filtering agent can retrieve attributes from the Resource attribute data base.
Format: Variant of PICS?
Protocol: To be defined, probably as a variant of HTTP.

16
Related modules: Resource attribute data base and Filter attribute creators.
Operation: The Filter attribute creators store their results in the Resource attribute data base.
Format: Variant of PICS?
Protocol: To be defined, probably as a variant of HTTP.

17
Related modules: Resource data base and Filter attribute creators.
Operation: The Filter attribute creators use the normal access protocol to the Resource data base (such as HTTP, NNTP, POP, Web4Groups access protocol).
Format: MIME resource formats.
Protocol: HTTP, FTP, Gopher, NNTP, Web4Groups.

20
Related modules: User and Input author ratings.
Operation: User interface.
Format: To be defined by user interface experts.
Protocol: HTML/HTTP.

21
See 5 above.

22
Related modules: Active agent and User.
Operation: The Active agent delivers its results to the user.
Format: To be defined by user interface experts.
Protocol: This might be through the user interfaces already provided by one of the Resource retrieval systems used.

23
Related modules: Active agent and Personal filtering settings.
Operation: The Personal filtering settings are used by the user to guide the Active agent.
Format: See 9 above.
Protocol: See 9 above.

References

[1] Rating Services and Rating Systems (and Their Machine Readable Descriptions), by Jim Miller, Paul Resnick and David Singer, available at URL: http://www.w3.org/PICS/services.html. This document, together with [2], is the official definition of the PICS standard.
[2] PICS Label Distribution Label Syntax and Communication Protocols, by Jim Miller, Tim Krauskopf, Paul Resnick and Win Treese, URL http://www.w3.org/PICS/labels.html. This document, together with [1], is the official definition of the PICS standard.
[3] Voting and Rating: Perspectives for Information Collection, Decision Making and Collaborative Rating Using Web4Groups by Austrian Academy of Sciences. Internal Web4Groups paper, November 1996.
[4] PICS: Internet Access Controls Without Censorship, by Paul Resnick and James Miller, URL: http://www.bilkent.edu.tr/pub/WWW/PICS/iacwc.htm. An introductory overview to PICS.
[5] The Kids on the Web: Safety on the Net, by Brendan Kehoe, URL: http://www.zen.org/~brendan/kids-safe.html. A list of links to different parental control systems and services.
[6] Pics Third-Party Rating Services, URL: http://www.w3.org/PICS/raters.htm. A list of links to services based on the PICS standard.
[7] The MPAA Rating Systems, URL: www.cs.ucla.edu/ficus-members/reiher/film_miscellany/ratings.html. An introduction to the MPAA rating system.
[8] The Voluntary Movie Rating system, by Jack Valenti, URL http://www.mpaa.org/ratings.html. An overview of the MPAA rating system.
[9] Recreational Software Advisory Council, URL: http://www.rsac.org/. Home page for RSAC, with links to many informational documents on RSAC.
[10] Collaborative Filtering Technology: An Overview. URL: http://www.firefly.net/products/CollaborativeFiltering.html. A description of how the Firefly collaborative filtering system works.
[11] Net Shepherd 2.0 Frequently Asked Questions. Previously at URL: http://www.shepherd.net/products/NetShepherd2.0/faqs.HTM. A description of the peer collaborative filtering service provided by Net Shepherd.
[12] Uniform Resource Locators (URL), by T. Berners-Lee, L. Masinter and M. McCahill, Internet RFC 1738, December 1994, URL: http://ftp.sunet.se/pub/Internet-documents/rfc/rfc-index.txt. This is the Internet standard for URLs. There are numerous other IETF standards specifying the URL format for different kinds of resources.
[13] Building Customer Loyalty and Profitable 1-to-1 Customer Relationships with Net Perception's GroupLens™ Recommendation Engine. URL: http://www.netperceptions.com/product_whitepaper.html. A description of the collaborative filtering system from Net Perceptions.
[14] GroupLens: An Open Architecture for Collaborative Filtering of Netnews, by P. Resnick et al. Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, Pages 175-186, and at URL http://ccs.mit.edu/CCSWP165.html.
[15] Collaborative Filtering The SEPIA Suggestion Box ®, at URL http://www.sepia.com/suggestion_e.html.
[16] Filtering and Collaborative Filtering. Notes from the DELOS workshop, Budapest, November 1997 at URL http://dsv.su.se/jpalme/select/delos-filtering-notes-nov97.htm