
blogs

"Related modules" block for Drupal.org -- Past and Future

Our research group at the University of Michigan has been working on the "related modules" block for Drupal.org for more than 2 years now. We have published 2 papers on this project so far:

1) Assessment of Conversation Co-mentions as a Resource for Software Module Recommendation. To be presented at the ACM Recommender Systems Conference '09.

2) Conversation Pivots and Double Pivots. Presented at the ACM Computer Human Interaction Conference '08.

In short, we tuned the algorithm to recommend related modules based on conversation co-mentions, and showed that it works better than recommendations based on module co-installation data and as well as the ApacheSolr More Like This module.

Our next steps are:

First, we'll further improve the quality of "related modules" block by using the clickthrough data on the recommended links. Followup details will be at http://drupal.org/node/479812

Then, we'll migrate the "related conversations" block to be based on ApacheSolr. Followup details will be at http://drupal.org/node/511514

I'm looking forward to making more contributions to Drupal.org!

Finally, I'd like to thank Gerhard Killesreiter (killes) and Kieran Lal (amazon) at Drupal.org for their support.

Attachment    Size
recsys09.pdf  445.07 KB
chi08.pdf     784.26 KB

Announcing the Recommender Bundle

Thanks to the support of Google Summer of Code'09, I was able to develop the Recommender Bundle for Drupal, which includes the following modules:

http://drupal.org/project/recommender
http://drupal.org/project/history_rec
http://drupal.org/project/fivestar_rec
http://drupal.org/project/voting_rec
http://drupal.org/project/uc_rec
http://drupal.org/project/similargroups
http://drupal.org/project/media_rec (not developed)

Basically, the modules make content recommendations based on such criteria as "Users who viewed this node also viewed", or "You might be interested in these nodes because you have visited similar nodes". Please refer to the project description of each module for more details.

Follow-up feature development and code maintenance will be coordinated through the issue queue of each module. Please feel free to use it.

Have fun using the modules! :)

Assessment of Conversation Co-mentions as a Resource for Software Module Recommendation

This paper was submitted to the ACM Recommender Systems Conference '09, co-authored with my advisor Prof. Paul Resnick.

ABSTRACT

Conversation double pivots recommend target items related to a source item, based on co-mentions of source and target items in online forums. We deployed several variants on the drupal.org site that supports the Drupal open source community, and assessed them through clickthrough rates and subjective assessments of the relevance of the recommendations. Overall, clickthrough rates increased by a factor of 15 from the beginning to the end of the tuning process. A similarity metric based on correlation of mentions rather than mere co-occurrence reduced the problem of over-recommending the most popular modules, but additional corrections for recency and uniqueness of mentions were not helpful. Detection of more module mentions in conversations dramatically improved the quality of recommendations, even though the detection algorithm then had more false positives. Recommendations based on conversation co-mention were more effective than those based on co-installation, because co-installation data only led to recommendations of complementary modules and not substitutes.

The full paper is attached.

Discussion of algorithm improvement: http://drupal.org/node/479812

(Due to uncontrollable spam, comments are temporarily disabled. Sorry...)

Attachment                Size
pivots_v16_submitted.pdf  259.16 KB

Announcing my GSoC 2009 project -- Making Drupal Smart: The Recommender Bundle

My Google Summer of Code 2009 proposal was accepted. The basic idea is to develop at least three modules based on Recommender API. For example, one module is to recommend Flash videos based on users' viewing history like in YouTube. A mockup screenshot is like this:

For more details and discussion, please go to http://groups.drupal.org/node/19894.

(This site is heavily abused by spam, and the captchas don't work well for me. Commenting is temporarily disabled. Sorry about that...)

Gaming recommender systems for fun and profit

There's a big demand from the Drupal community to add fivestar-like ratings to the contrib modules. This would be a pretty cool feature, but it raises some concerns too.

In a paper presented at the WWW'04 conference titled "Shilling recommender systems for fun and profit" (PDF), the researchers discussed different ways of gaming recommender systems, with examples. Such attacks happen because the attackers want to 1) promote certain items, 2) "nuke" certain items, or 3) disrupt the entire system. The researchers also simulated 2 kinds of automated attacks, RandomBot and AverageBot, and showed that recommender systems are indeed vulnerable to such manipulation.

Apart from the academic research literature, it's easy to imagine possible manipulation of Drupal module ratings, either for fun or for profit. New modules are especially vulnerable because a few fake negative ratings early on could easily discourage further evaluations. In fact, if you search for "rating manipulation" on Google, you'll find that eBay, IMDb, Amazon, etc. have all been victims. And I heard the module rating system in Joomla suffered from the same thing too.

To cope with this issue, we need more sophisticated algorithms. My advisor, Paul Resnick, is one of the leading researchers in this area. He and Prof. Rahul Sami published a paper proposing a manipulation-resistant algorithm (PDF). However, a simpler and more intuitive alternative might be to maintain two rating scores -- one from all users, one from the experts. The experts could be defined as users who have been registered for more than 1 year and have submitted issues, etc.
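To make the two-score idea concrete, here is a minimal Python sketch. The field names (`stars`, `registered`, `issues_submitted`), the sample ratings, and the exact expert rule (registered for more than 365 days and at least one issue submitted) are assumptions for illustration, not an existing drupal.org schema or policy.

```python
from datetime import date

def two_scores(ratings, today, min_days=365):
    """Return (mean of all ratings, mean of expert ratings).

    Each rating is a dict with hypothetical keys:
    'stars' (int), 'registered' (date), 'issues_submitted' (int).
    A mean is None when there are no ratings of that kind.
    """
    def mean(values):
        return sum(values) / len(values) if values else None

    all_stars = [r["stars"] for r in ratings]
    # Assumed expert rule: account older than min_days AND at least one issue submitted.
    expert_stars = [
        r["stars"]
        for r in ratings
        if (today - r["registered"]).days > min_days and r["issues_submitted"] > 0
    ]
    return mean(all_stars), mean(expert_stars)

# Made-up example: three brand-new accounts "nuke" a module, two long-time users like it.
ratings = [
    {"stars": 1, "registered": date(2009, 2, 20), "issues_submitted": 0},
    {"stars": 1, "registered": date(2009, 2, 20), "issues_submitted": 0},
    {"stars": 1, "registered": date(2009, 2, 21), "issues_submitted": 0},
    {"stars": 5, "registered": date(2007, 1, 1), "issues_submitted": 12},
    {"stars": 4, "registered": date(2006, 6, 1), "issues_submitted": 3},
]
overall, expert = two_scores(ratings, today=date(2009, 3, 1))
print(overall, expert)
```

Showing both numbers side by side lets a visitor notice when the overall score has drifted away from the expert score. That gap is a hint, not proof, that something odd is going on.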

To sum up, in this post I'm trying to argue:

  1. A module rating system for drupal.org would be helpful,
  2. But the concern about gaming the system is real and legitimate,
  3. Countermeasures should be taken if the rating system is deployed.

My summer funding would probably come from an NSF research grant on manipulation-resistant algorithms. I'll try to make some proposals on how to prevent gaming the module rating system, if d.o. infrastructure team decides to implement it, after I read more papers in the area.

Roadmap for the pivots_block module recommendation on d.o.

A brief history to begin with ...

What's pivots_block?

The idea is to generate "related modules" recommendations based on co-citations. Suppose TinyMCE and FCKeditor are mentioned together in many forum discussions; then we consider the 2 modules related to some extent. Here is a detailed explanation.

Where are we now?

With the help of the d.o. infrastructure/webmaster team, we deployed pivots_block on d.o. Google Analytics reports showed that pivots_block drew about 3 times as many click-throughs as the simple "New forum posts" block. We also found that the classical correlation coefficient algorithm received more click-throughs than the other 3 extended algorithms. In general, we think pivots_block works quite well for the d.o. community.
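The co-citation idea can be sketched in a few lines of Python. This is only an illustration of the counting step, not the pivots_block code; the thread data below is made up.

```python
from collections import Counter
from itertools import combinations

def co_mention_counts(threads):
    """Count, for each unordered pair of modules, how many threads mention both."""
    counts = Counter()
    for mentioned in threads:
        # set() ignores repeated mentions within one thread;
        # sorted() gives every pair a single canonical order.
        for a, b in combinations(sorted(set(mentioned)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical input: each inner list holds the modules detected in one forum thread.
threads = [
    ["tinymce", "fckeditor"],
    ["tinymce", "fckeditor", "cck"],
    ["cck", "views"],
    ["tinymce", "fckeditor", "tinymce"],
]
counts = co_mention_counts(threads)
print(counts.most_common(3))
```

Raw counts like these favor modules that are mentioned everywhere, which is one reason a correlation-style score can be preferable to plain co-occurrence.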

The roadmap to the future ...

The next major improvement

One key factor for pivots_block is correctly detecting module citations in forum discussions. Currently we use 1) popular aliases such as CCK and 2) the module title names together with the keyword 'module' as detectors to match module citations in discussions. This might miss quite a few module citations.

To fix that, we recruited a graduate student at the University of Michigan who manually read through all 12,742 messages posted to the d.o. forum in November 2008. From that we hope to collect a list of module aliases used by the community, and then use it to improve the accuracy of detecting module mentions. The work is almost done, and we hope to apply it to d.o. soon and test whether it improves the recommendation quality.
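A rough Python sketch of the two detectors described above follows. The function name, the dictionaries, and the exact matching rules (whole-word, case-insensitive) are assumptions for illustration; the real detector may differ.

```python
import re

def detect_mentions(text, aliases, titles):
    """Return the set of module ids that appear to be mentioned in text.

    aliases: alias -> module id, matched as a whole word (e.g. "CCK").
    titles:  module title -> module id, matched only when the title is
             directly followed by the word "module" (e.g. "Views module").
    """
    found = set()
    for alias, module_id in aliases.items():
        if re.search(r"\b" + re.escape(alias) + r"\b", text, re.IGNORECASE):
            found.add(module_id)
    for title, module_id in titles.items():
        if re.search(r"\b" + re.escape(title) + r"\s+module\b", text, re.IGNORECASE):
            found.add(module_id)
    return found

# Hypothetical alias and title lists.
aliases = {"CCK": "cck"}
titles = {"Views": "views", "Panels": "panels"}

post = "I use CCK with the Views module, but my views are slow."
print(detect_mentions(post, aliases, titles))
```

Requiring the keyword "module" after a title keeps ordinary words such as "views" from matching, at the cost of missing posts that name a module without it.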

Other alternatives

One alternative is to generate module recommendations based on project usage data, which is currently running on d.o. Its limitation, however, is that it tends to recommend complementary modules rather than substitute modules, because people rarely use substitute modules on the same site. Google Analytics showed that this algorithm had a slightly lower click-through rate than the original co-citation algorithm, but the difference was not statistically significant.

Another alternative is to use ApacheSolr MoreLikeThis. This is promising because d.o. search is running on ApacheSolr already. However, to my knowledge (maybe limited), the relevancy matching algorithm of MoreLikeThis is text-based. That is, modules are related because their project text descriptions are similar. This might or might not work well for d.o. modules. But it's definitely a direction to explore.

The last alternative is to generate related modules based on module ratings, such as those on http://drupalmodules.com/. This is a promising idea too. One concern is that it might be subject to deliberate manipulation, as some research literature indicates. Besides, this approach is only possible after a module voting system is implemented in the d.o. redesign.

Action plan

First, I'd like to apply the next (probably last) major improvement of pivots_block to d.o., as described earlier, and measure its click-through rate. That would be the best we can get from the co-citation pivots algorithm.

Second, I'd like to work with the ApacheSolr team and see if we can use ApacheSolr MoreLikeThis to make "related modules" recommendations on d.o.

If ApacheSolr MoreLikeThis receives higher click-throughs, which would indicate that it's more helpful to the community, then it's better to keep MoreLikeThis. And vice versa. If we decide to keep pivots_block, my future plan then is to make it a more general-purpose module and build it on top of ApacheSolr (details will be announced later).

I'll try to finish this research in April and report it back to the community. Drupal ROCKS and hope we'll make d.o. module recommendations work better!!

Announcing the "Recommender API" module

From the experience of developing the "pivots" Drupal module recommendation system, I developed the general purpose Recommender API module. It was released today.

In fact, if you think about it, all recommender systems work pretty much the same way. Take Amazon for example: it knows which users bought which items, so it can calculate how similar users' tastes are, and finally make recommendations based on that. Take "pivots" (based on project_usage) as another example. It knows which sites are using which modules, and then calculates related modules based on the fact that related modules are used on the same sites.

In Drupal, we have a lot of such relations as the users-items relation in Amazon, or the sites-modules relation in pivots. For example, we have the nodes-votes relation in VotingAPI, or the users-products purchasing relation in Ubercart, or the nodes-terms relation in Taxonomy. You name it. Those relations can be used to calculate similarity among items, and then generate recommendations.

The question is how to calculate similarity among items, and how to generate recommendations. That's where the Recommender API module comes into the picture. It provides a set of general purpose APIs that take such a users-items relation as input, and then calculate similarities and predictions. The module itself doesn't have any end user interaction, but it provides powerful APIs for other module developers to write cool recommender-based modules.

My hope for this module is that it could give rise to a wave of new module development involving recommendations, e.g. Ubercart purchase recommendation, personalized ads, etc. I also wrote a simple example module, User-to-user Recommendation, to show how easy it is to use the API. I'm hoping the Content Recommendation Engine and Similar By Terms modules could also take advantage of this module, and support other recommender algorithms that might generate better results in different cases.

Future work on the Recommender API will focus on providing more algorithms (like PageRank, SVD, PCA, etc.), more features (like manipulation-resistance, etc.), and enhanced performance.

Your comments/feedback would be much appreciated :)
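To illustrate the general pattern (a relation goes in, similarities and predictions come out), here is a small Python sketch. It is not the Recommender API's actual interface, which is a Drupal module; the function names, the choice of cosine similarity, and the sample data are all assumptions.

```python
from math import sqrt

def item_similarities(relations):
    """Cosine similarity between items, from (user, item, weight) triples."""
    vectors = {}
    for user, item, weight in relations:
        vectors.setdefault(item, {})[user] = weight
    sims = {}
    items = sorted(vectors)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            dot = sum(w * vectors[b][u] for u, w in vectors[a].items() if u in vectors[b])
            if dot == 0:
                continue  # no shared users, so no similarity entry
            norm_a = sqrt(sum(w * w for w in vectors[a].values()))
            norm_b = sqrt(sum(w * w for w in vectors[b].values()))
            sims[(a, b)] = sims[(b, a)] = dot / (norm_a * norm_b)
    return sims

def recommend(user, relations, sims, top_n=3):
    """Rank items the user lacks by summed similarity to items the user has."""
    owned = {item: w for u, item, w in relations if u == user}
    scores = {}
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] = scores.get(b, 0.0) + s * owned[a]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical users-products purchase relation.
relations = [
    ("alice", "book", 1), ("alice", "lamp", 1),
    ("bob", "book", 1), ("bob", "lamp", 1), ("bob", "desk", 1),
    ("carol", "book", 1),
]
sims = item_similarities(relations)
print(recommend("carol", relations, sims))
```

The same two functions would work unchanged on a nodes-votes or sites-modules relation, which is the point of keeping the input as generic triples.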

"Related module" recommendations based on project_usage.

Previously, 'related modules' were generated based on discussions in the d.o. forum -- if several modules were mentioned in the same discussion threads, we considered them to be somewhat related. (A more detailed explanation of the algorithms can be found in my previous Planet Drupal blogs.)

We have now developed a new algorithm that generates "related modules" based on project usage -- if several modules are enabled on the same site, then we consider them to be related. We got the project usage information from the "project_usage" module (http://drupal.org/project/usage). The core algorithm is based on the statistical concept of "correlation" (http://en.wikipedia.org/wiki/Correlation).

Usually, a Drupal site will only enable a set of modules that are complementary to each other, rather than a set of modules that substitute each other. Therefore, the new algorithm will have a bias towards generating more complements than substitutes. We don't know if that works better than the previous algorithms.
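The correlation idea, and the bias toward complements, can be seen in a small Python sketch. It treats each module as a yes/no column over sites and computes the Pearson correlation of two such columns (the phi coefficient). The function name and the four sample sites are made up; the production algorithm may differ in detail.

```python
from math import sqrt

def usage_correlation(sites, module_a, module_b):
    """Pearson correlation of two modules' enabled/not-enabled indicators across sites."""
    n = len(sites)
    n_a = sum(1 for mods in sites if module_a in mods)
    n_b = sum(1 for mods in sites if module_b in mods)
    n_ab = sum(1 for mods in sites if module_a in mods and module_b in mods)
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        return 0.0  # a module enabled on every site or on none carries no signal
    return (n * n_ab - n_a * n_b) / denom

# Hypothetical usage data: the set of modules enabled on each site.
sites = [
    {"views", "cck"},
    {"views", "cck"},
    {"tinymce"},
    {"fckeditor"},
]
print(usage_correlation(sites, "views", "cck"))          # complements
print(usage_correlation(sites, "tinymce", "fckeditor"))  # substitutes
```

In this toy data the complementary pair scores 1.0 while the two editors, which never share a site, come out negative. That is exactly why usage data alone rarely surfaces substitutes.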

If everything goes well, the new algorithm will be enabled soon on d.o. We'll continue to use GA to see if the new algorithm works better or not.

Recent GA results for "pivots_block" module recommendation system

From the last Google Analytics (GA) study of the usefulness of "pivots_block" with 4 recommendation algorithms, we learned that the classical "relevancy" algorithm generated the best results. Therefore, we used the relevancy algorithm on d.o. from Dec/4/2008 to Jan/9/2009. The average click-thru rate was 0.474%.

Then, we wanted to know whether 0.474% was a good enough number to claim that "pivots_block" was useful to users. So we temporarily replaced "pivots_block" with the "New forum posts" block on the module pages, to see whether "New forum posts" would draw more clicks. It turned out that the average click-thru rate for "New forum posts" between Jan/9/2009 and Jan/18/2009 was only 0.144%. That means, more or less, that on the module pages "pivots_block" draws a little over 3 times as many clicks as "New forum posts" (0.474 / 0.144 ≈ 3.3).

Now we are much more confident that "pivots_block" and its intended goal of providing module recommendations is indeed useful to the community. So we are going to switch back to "pivots_block", and continue to make improvements to it.

One thing we learned from this study was that users seem to like more "serendipity" in "pivots_block". One problem with the current algorithm is that the recommended modules for a particular module remain quite stable over time. So we are going to try adding a little variability to the recommendations.
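One possible way to add variability, sketched in Python: instead of always showing the top k modules, draw k modules at random from a larger pool of top candidates, with higher-scored modules more likely to be picked. This is an illustrative assumption, not the approach actually planned for pivots_block; the function name, pool size, and scores are hypothetical.

```python
import random

def sample_recommendations(scored, k, pool_size=10, rng=None):
    """Pick k distinct modules from the top pool_size, weighted by score."""
    rng = rng or random.Random()
    # Keep only the best pool_size candidates, highest score first.
    pool = sorted(scored.items(), key=lambda kv: -kv[1])[:pool_size]
    chosen = []
    while pool and len(chosen) < k:
        weights = [max(score, 0.0) for _, score in pool]
        if sum(weights) == 0:
            index = rng.randrange(len(pool))  # all weights zero: pick uniformly
        else:
            index = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(index)[0])  # remove so a module is not picked twice
    return chosen

# Hypothetical similarity scores for modules related to one source module.
scored = {"views": 0.9, "cck": 0.8, "panels": 0.5, "token": 0.4, "pathauto": 0.1}
picks = sample_recommendations(scored, k=3, pool_size=4, rng=random.Random(42))
print(picks)
```

The `pool_size` parameter controls the trade-off: a pool equal to k reproduces the stable top-k list in shuffled order, while a larger pool lets lower-ranked modules appear occasionally.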

The new algorithm is coming soon!

Pivots module recommendation system Google Analytics results

We developed 4 module recommendation algorithms and tested them on Drupal.org, using Google Analytics to track the click-through rates. The overall click-through rate was 0.263%, co-occurrences 0.097%, relevance 0.141%, recency 0.114% and uniqueness 0.138%. The relevancy algorithm had the highest click-through rate, but it was only significantly higher than the co-occurrences algorithm. Despite that, we can still say that the relevancy algorithm is the best alternative among the 4.
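For readers who want to run this kind of comparison themselves, here is a Python sketch of a two-proportion z-test, one common way to check whether two click-through rates differ significantly. The post does not say which test was used or how many page views each algorithm received, so the view counts below are made-up placeholders chosen only to match the reported rates.

```python
from math import erfc, sqrt

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference between two click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# HYPOTHETICAL view counts: 100,000 page views per algorithm (not from the study).
VIEWS = 100_000
# relevance (0.141%) vs co-occurrences (0.097%)
z_cooc, p_cooc = two_proportion_z_test(141, VIEWS, 97, VIEWS)
# relevance (0.141%) vs uniqueness (0.138%)
z_uniq, p_uniq = two_proportion_z_test(141, VIEWS, 138, VIEWS)
print(f"relevance vs co-occurrences: z={z_cooc:.2f}, p={p_cooc:.4f}")
print(f"relevance vs uniqueness:     z={z_uniq:.2f}, p={p_uniq:.4f}")
```

With rates this small, whether a gap is significant depends heavily on the number of page views, so the real counts from Google Analytics would be needed to reproduce the conclusion.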

Now we have the baseline for future module recommendation algorithms, such as recommendations based on download statistics, or on module usage statistics. The new algorithm has to have a better click-through rate than the relevancy algorithm.

The 0.141% click-through rate, however, does not seem to be a large number. A general rule of thumb is that the best online ads have a click-through rate of about 1%. But that figure might be calculated with a different method in a different context. Therefore, I'm thinking of measuring the click-through rate when we put the "New forum posts" block on the module pages. Hope it can be done soon.
