

Pivots algorithms: An explanation of the recommendations block on Drupal.org

You might have noticed that the pivots_block is enabled on drupal.org module pages, such as http://drupal.org/project/i18n. This post explains the algorithms we use in the pivots recommendation system.

OK. To put it simply, "conversation pivots" finds conversations in the D.O. forum that talk about modules, and then displays those conversation threads on the pages of the modules they mention. "Double pivots" is built on top of conversation pivots: it calculates related modules based on the observation that relevant modules are usually mentioned in the same conversations. If you are interested in the details, please take a look at the attached file, Conversation Pivots and Double Pivots (CHI2008). That research paper was published at the ACM CHI conference this year.
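As a rough illustration of the conversation pivots idea, the Python sketch below turns a set of forum threads into a module-to-threads lookup, which is what a module page needs in order to list its related discussions. It assumes the module mentions in each thread have already been detected; the input format, the function name build_conversation_pivots and the sample thread IDs are hypothetical and are not taken from the actual (Java) indexer.

```python
from collections import defaultdict


def build_conversation_pivots(thread_mentions):
    """Map each module to the sorted list of thread IDs that mention it.

    thread_mentions: dict of thread_id -> set of module short names.
    This input format is a hypothetical stand-in for the D.O. tables.
    """
    pivots = defaultdict(set)
    for thread_id, modules in thread_mentions.items():
        for module in modules:
            pivots[module].add(thread_id)
    return {module: sorted(threads) for module, threads in pivots.items()}


# Made-up example data: three threads and the modules they mention.
threads = {
    101: {"views", "cck"},
    102: {"i18n", "views"},
    103: {"i18n"},
}
pivots = build_conversation_pivots(threads)
print(pivots["i18n"])  # threads to show on the i18n project page
```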

However, after the first few rounds of trials, the original "double pivots" algorithm described in the paper was not quite satisfactory. So we now use a "cosine similarity" relevancy algorithm, which the Wikipedia article on cosine similarity explains well. To put it simply, the algorithm not only measures the total number of co-references in the same conversations, but also takes into account the overall popularity of each module.
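A minimal sketch of cosine similarity in this setting, assuming each module is represented as a binary vector over forum threads (1 if the thread mentions the module). The dot product of two such vectors is the number of co-references, and dividing by the product of the vector lengths is what discounts popular modules. The function names and sample data are hypothetical; this is not the deployed code.

```python
import math


def cosine_similarity(threads_a, threads_b):
    """Cosine similarity of two modules given the sets of threads mentioning them.

    For binary vectors: |A & B| / sqrt(|A| * |B|).
    """
    if not threads_a or not threads_b:
        return 0.0
    co_refs = len(threads_a & threads_b)
    return co_refs / math.sqrt(len(threads_a) * len(threads_b))


def related_modules(target, pivots, top_n=5):
    """Rank other modules by cosine similarity to `target`."""
    scores = []
    for module, threads in pivots.items():
        if module == target:
            continue
        score = cosine_similarity(pivots[target], threads)
        if score > 0:
            scores.append((score, module))
    scores.sort(reverse=True)
    return [module for _, module in scores[:top_n]]


# Made-up thread IDs: both candidates share two threads with i18n,
# but cck is mentioned in many more threads overall.
pivots = {
    "i18n": {1, 2, 3, 4},
    "l10n_client": {1, 2},
    "cck": set(range(3, 13)),  # a popular module: 10 threads
}
print(related_modules("i18n", pivots))
```

A raw co-reference count would tie the two candidates at 2, while cosine similarity ranks l10n_client (about 0.71) above cck (about 0.32) because cck's score is divided by its larger thread count.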

Building on the relevancy algorithm, we implemented two more algorithms. The first is the "recency" algorithm, which gives more weight to recent conversation mentions of a module; it is meant to highlight new and active modules. The second is the "uniqueness" algorithm, which favors modules that receive less attention; we hope it will help users discover new, unfamiliar modules.
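The post does not give the exact formulas, so the Python sketch below is only one plausible way to build recency and uniqueness on top of a cosine-style relevancy score. The exponential decay with a 90-day half-life, the log-based popularity penalty, the strength parameter and all function names are hypothetical choices made for illustration.

```python
import math


def recency_weight(age_days, half_life_days=90.0):
    """Hypothetical decay: a mention loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def recency_score(threads_a, threads_b, thread_age_days, half_life_days=90.0):
    """Cosine-style score where each shared thread counts by its recency weight
    instead of counting as 1."""
    if not threads_a or not threads_b:
        return 0.0
    shared = threads_a & threads_b
    weighted = sum(recency_weight(thread_age_days[t], half_life_days) for t in shared)
    return weighted / math.sqrt(len(threads_a) * len(threads_b))


def uniqueness_score(relevancy, candidate_mentions, strength=0.5):
    """Hypothetical boost for less-discussed modules: divide the relevancy score
    by a slowly growing function of the candidate's total mention count."""
    return relevancy / (1.0 + strength * math.log1p(candidate_mentions))


# Made-up data: two shared threads, one brand new and one 90 days old.
ages = {1: 0, 2: 90}
print(recency_score({1, 2}, {1, 2}, ages))  # 0.75: (1.0 + 0.5) / sqrt(2 * 2)
print(uniqueness_score(0.5, 2), uniqueness_score(0.5, 200))
```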

We are rotating through these algorithms to see which one works best for d.o. The idea is that the best algorithm will attract the highest click-through rate. So we plan to run the recommendation sets generated by each algorithm on D.O. for a fixed period of time, and use Google Analytics to measure the click-through rate in each period. If the click-through rate in one period is significantly higher than in the others, the algorithm used in that period is the best one.
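The post does not say how "significantly higher" will be decided. One standard option, swapped in here purely as an assumption, is a two-proportion z-test on clicks and block views from two periods. The sketch below uses only the Python standard library; the function name and the numbers are made up.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for H0: both periods have the same click-through rate.

    Returns (z, p_value).
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("CTR is 0% or 100% in both periods; the test is undefined")
    z = (ctr_a - ctr_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: period A (3% CTR) vs. period B (2% CTR).
z, p = two_proportion_z_test(300, 10000, 200, 10000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Because each algorithm runs in a different time period, changes in site traffic between periods can also move the click-through rate, so a significant result is evidence for, not proof of, one algorithm being better.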

We can use the same technique to evaluate other module recommendation algorithms as they come out. Oh, by the way, module recommendations based on project usage are on the way :)

PS: The "conversation pivots" recommendation database is updated once per day on scratch.drupal.org. So new forum posts that mention modules are indexed into the database within one day, provided the indexer can detect the module references in the conversations. The "double pivots" database is updated occasionally, roughly every two weeks, because its results are relatively stable.
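How the indexer detects module references is not described here. As a hypothetical illustration only, the Python sketch below treats a module as mentioned if a post links to its drupal.org/project/<shortname> page or uses its short name as a bare word, keeping only names found in a known project list. The rule, the names and the sample data are assumptions.

```python
import re

# Hypothetical detection rule: a project link, or the short name as a bare word.
PROJECT_LINK = re.compile(r"drupal\.org/project/([a-z0-9_]+)", re.IGNORECASE)
WORD = re.compile(r"[a-z0-9_]+")


def detect_modules(text, known_modules):
    """Return the set of known module short names referenced in a post."""
    linked = {name.lower() for name in PROJECT_LINK.findall(text)}
    words = set(WORD.findall(text.lower()))
    # Keep only names that are real projects in the known list.
    return (linked | words) & known_modules


known = {"i18n", "views", "cck"}
post = "Try http://drupal.org/project/i18n together with Views."
print(sorted(detect_modules(post, known)))
```

Bare-word matching is cheap but noisy: a post saying "in my views" would count as a mention of the views module, which is one reason detection quality limits the recommendations.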

Attachment: chi1060-zhou.pdf (784.26 KB)

Diagram of the pivots recommendation system on d.o.

This diagram illustrates our proposed deployment of the pivots recommendation system on d.o. For a description in words, see the accompanying post on the deployment structure.
Attachment: pivots_diag.png (5.44 KB)

Deployment structure of pivots module recommendation block for Drupal.org

We hope to deploy the pivots block to Drupal.org soon. On each module page, the block displays related discussions and related modules. This article explains how the pivots block will be deployed on Drupal.org.

The pivots recommendation system consists of two components. The first is the Indexer (written in Java). It runs as a cron job on scratch.drupal.org and generates related conversations and related modules as recommendations. It has read-only access to 8 D.O. database tables, which it reads directly to compute the recommendations. The computed results are then saved to a separate pivots database on masterdb-other.d.o.

The second component is a patch to the "drupal_org" module. It reads the pivots database and displays the recommendations in a block on each module page. To access the pivots database, the "settings.php" file on D.O. must contain a line like $db_url['pivots']="...". We use db_set_active('pivots') to direct DB queries to the pivots database.
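To make the two-database split concrete, here is a small Python sketch using sqlite3, with two in-memory databases standing in for the D.O. tables and the separate pivots database. The table names thread_module and related_threads and both schemas are hypothetical. In the real deployment the indexer is Java and the block is PHP, where db_set_active('pivots') plays the role that the separate pivots connection plays here.

```python
import sqlite3


def run_indexer(source, pivots):
    """Read mentions from the source DB and rewrite the results in the
    separate pivots DB (hypothetical schemas on both sides)."""
    rows = source.execute(
        "SELECT module, thread_id FROM thread_module ORDER BY module, thread_id"
    ).fetchall()
    pivots.execute("DELETE FROM related_threads")
    pivots.executemany(
        "INSERT INTO related_threads (module, thread_id) VALUES (?, ?)", rows
    )
    pivots.commit()


def block_threads(pivots, module):
    """What the display side does: it queries only the pivots DB."""
    cursor = pivots.execute(
        "SELECT thread_id FROM related_threads WHERE module = ? ORDER BY thread_id",
        (module,),
    )
    return [thread_id for (thread_id,) in cursor]


source = sqlite3.connect(":memory:")  # stands in for the D.O. tables
pivots = sqlite3.connect(":memory:")  # stands in for the pivots database
source.execute("CREATE TABLE thread_module (thread_id INTEGER, module TEXT)")
source.executemany(
    "INSERT INTO thread_module VALUES (?, ?)",
    [(101, "views"), (102, "i18n"), (103, "i18n")],
)
pivots.execute("CREATE TABLE related_threads (module TEXT, thread_id INTEGER)")

run_indexer(source, pivots)
print(block_threads(pivots, "i18n"))
```

Keeping the computed results in their own database means the block never touches the source tables, so the indexer can be changed or rerun on scratch.d.o. independently.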

The two-component structure gives us, as developers, the flexibility to tweak the algorithm. We can update the recommendation algorithm on scratch.d.o. without affecting the production d.o. server. Once we have received enough feedback from the community and stabilized the algorithm, we can merge the two components and run everything on d.o. alone.

In the next post, I will explain how the recommendation algorithm works.

Thanks for reading :)

Weekly report: 9.29~10.5, 2008

Last week was a busy week too. The 502 students had their mid-term exam coming up, and the GSIs were busy helping them. I also finished some other work, including the 703/708 homework.

This coming week I'm going to do these things:

1. Work with the Minnesota team and Kieran to come up with a pivots usability study plan. Push pivots to go live at the beginning of the week. Make some improvements to the recency/uniqueness algorithms.

2. Some work on the Digg, MITS, and Smearbusters projects.

3. Routine 703/708/GSI work.

4 algorithms used in Drupal module recommendation

At the heart of the pivots Drupal module recommendation system are the secret recommender algorithms. Currently we are experimenting with four algorithms.

  • Co-references: The more frequently module A and module B are mentioned together in the same forum discussion thread, the more related they are in the recommendation list. This algorithm favors the most popular modules, because they tend to get more co-references regardless of relevance.
  • Relevancy: This uses the cosine-similarity technique. In short, this algorithm corrects the bias toward popular modules such as CCK.
  • Recency: This is based on the Relevancy algorithm, but adds a little preference for modules that were mentioned recently in the discussions.
  • Uniqueness: This is based on the Relevancy algorithm, but adds a little preference for less popular modules, so that users can discover new things.
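The Co-references algorithm in the list above boils down to counting, for each pair of modules, how many threads mention both. A minimal Python sketch under that reading follows; the function names and sample threads are hypothetical, and the real indexer may count differently.

```python
from collections import Counter
from itertools import combinations


def co_reference_counts(thread_mentions):
    """Count how many threads mention each unordered pair of modules."""
    counts = Counter()
    for modules in thread_mentions.values():
        for a, b in combinations(sorted(modules), 2):
            counts[(a, b)] += 1
    return counts


def related_by_co_references(target, counts, top_n=5):
    """Rank other modules by their raw co-reference count with `target`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == target:
            scores[b] += n
        elif b == target:
            scores[a] += n
    return [module for module, _ in scores.most_common(top_n)]


# Made-up threads: cck is discussed everywhere, so it piles up co-references.
threads = {
    1: {"i18n", "cck"},
    2: {"i18n", "cck"},
    3: {"i18n", "cck", "l10n_client"},
    4: {"i18n", "l10n_client"},
    5: {"cck", "views"},
}
counts = co_reference_counts(threads)
print(related_by_co_references("i18n", counts))
```

Nothing in this count accounts for how often each module is mentioned overall, which is the popularity bias the Relevancy algorithm is meant to correct.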

All four algorithms are already implemented, and you can see them on scratch.drupal.org, for example at http://scratch.drupal.org/project/ecommerce. We have also recently received the "project usage" data, which helps a lot in improving the quality of the algorithms. We'll try to find out which algorithm works best for users and make it go live on drupal.org.

The code has already been merged into drupal.org CVS. We are currently running some tests on scratch.drupal.org before making it live on drupal.org.

Your comments and suggestions are much appreciated! Thanks.

Weekly Report (Sept.22~28, 2008)

Last week was extremely busy. I got several things done on pivots, and the course workload was pretty heavy. Basically, I did what I was supposed to do for the week.

Next week:
1. Pivots: do all the admin stuff listed in my PIM.
2. MITBBS project: finish IRB, proposal
3. 703: lead discussion, China Media ch1., project outline
4. 708: PS3
5. Routine 703/708/GSI stuff.

Good luck and go blue!!

Weekly report (Sept.14~22)

This week is getting very busy with coursework, GSI teaching responsibilities and research.

In the coming week, I will do the following:
1. Try to get the tracker info from D.O. module pages and try to get the pivots_block patch into the repository.
2. Work out the recency/uniqueness pivots algorithm.
3. Finish the first iteration of the MITBBS program.
4. 703/708 study and 502 GSI work.

Weekly Report (Sept.8~14, 2008)

Last week was pretty busy as always, and quite productive too. I had a talk with Paul and discussed the research agenda for the rest of the semester. The two courses 703/708 and the GSI teaching sessions took much of my time.

This week I will still need to do 703/708/GSI things. In terms of research, I'll think about different algorithms on pivots and try to summarize a few points for the pre-candidacy paper.

Weekly Report (Sept.1~7, 2008)

The new semester began last week. The first week went fine. I attended the first classes of 703/708, and both seem really interesting and promising. I also did my first GSI teaching, which went quite well; I think I like it. In terms of research, I didn't do much on pivots because it is currently held up in the patch issue queue.

This week I'm going to talk with my advisor and decide on the research agenda for this term. In the meantime, I'm working on a new double pivots algorithm. I'm also working on the MITBBS study projects, which might fit into both 703 and 708. And I will do the coursework and GSI teaching as usual. Looking forward to a productive week!!!

The new term begins

Summer is over, and the new term begins. There are lots of things going on this term (teaching, research and taking courses), and all of it is exciting and challenging. Before I totally stress out (which might happen someday), I'll copy the Olympic Motto and Creed here, to lighten me up and to remind me of the purpose of this tough choice.

From Wikipedia:

The Olympic Motto is "Citius, Altius, Fortius," a Latin phrase meaning "Swifter, Higher, Stronger." Coubertin's ideals are probably best illustrated by the Olympic Creed:

"The most important thing in the Olympic Games is not to win but to take part, just as the most important thing in life is not the triumph but the struggle. The essential thing is not to have conquered but to have fought well."

Go BLUE!!!!!!!!!!!!!!
