You might have noticed that the pivots_block is enabled on drupal.org module pages, such as http://drupal.org/project/i18n. This post explains the algorithms we use in the pivots recommendation system.
OK. Put simply, "conversation pivots" finds conversations in the drupal.org forums that mention modules, and then displays those conversation threads on the pages of the modules that were mentioned. "Double pivots" is built on top of "conversation pivots": it calculates related modules based on the observation that related modules are usually mentioned in the same conversations. If you are interested in the details, please take a look at the attached file, Conversation Pivots and Double Pivots (CHI2008). That research paper was published at the ACM CHI conference this year.
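To make the two ideas concrete, here is a minimal sketch in Python. It is not the code running on drupal.org: the input format (a dict from thread ID to the list of module names detected in that thread) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def conversation_pivots(thread_mentions):
    """Map each module to the set of threads that mention it.

    thread_mentions is a hypothetical input: {thread_id: [module, ...]}.
    """
    pivots = defaultdict(set)
    for thread_id, modules in thread_mentions.items():
        for module in modules:
            pivots[module].add(thread_id)
    return pivots

def co_mention_counts(thread_mentions):
    """Count, for each pair of modules, the threads that mention both."""
    counts = defaultdict(int)
    for modules in thread_mentions.values():
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(modules)), 2):
            counts[(a, b)] += 1
    return counts

threads = {
    "thread-1": ["i18n", "views"],
    "thread-2": ["i18n", "views", "cck"],
    "thread-3": ["cck"],
}
pivots = conversation_pivots(threads)   # module -> threads to show on its page
pairs = co_mention_counts(threads)      # raw input for "double pivots"
```

In this toy data, i18n and views share two threads, so a count-based "double pivots" would rank views as the module most related to i18n.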
However, the original "double pivots" algorithm described in the paper was not quite satisfactory after the first few rounds of trials, so we now use a "cosine similarity" relevancy algorithm. The Wikipedia article on cosine similarity explains it well. Put simply, the algorithm not only measures the total number of co-references in the same conversations, it also takes into account the overall popularity of each module.
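As a rough illustration of why cosine similarity accounts for popularity, the sketch below treats each module as the set of threads that mention it. For such binary vectors, cosine similarity reduces to the number of shared threads divided by the square root of the product of the two mention counts. The function name and the set-based representation are assumptions for this example, not the production implementation.

```python
import math

def cosine_similarity(threads_a, threads_b):
    """Cosine similarity between two modules, each given as a set of thread IDs."""
    if not threads_a or not threads_b:
        return 0.0
    shared = len(threads_a & threads_b)
    return shared / math.sqrt(len(threads_a) * len(threads_b))

niche = {1, 2, 3, 4}            # a module mentioned in 4 threads
popular = set(range(1, 101))    # a module mentioned in 100 threads
other_niche = {1, 2, 3, 4}      # another module mentioned in the same 4 threads

score_popular = cosine_similarity(niche, popular)      # 4 / sqrt(4 * 100) = 0.2
score_niche = cosine_similarity(niche, other_niche)    # 4 / sqrt(4 * 4) = 1.0
```

Both pairs share four threads, but the very popular module scores much lower, because it gets mentioned alongside almost everything.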
Based on the relevancy algorithm, we implemented two more algorithms. The first is the "recency" algorithm. It gives more weight to recent conversations that mention a module, and is meant to highlight new and active modules. The second is the "uniqueness" algorithm. It favors modules that receive less attention. We hope this algorithm can help users discover new, unfamiliar modules.
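The exact weighting formulas are not spelled out here, so the following is only one possible way to sketch the two variants. The exponential decay, the 90-day half-life, and the logarithmic popularity penalty are all assumptions chosen for illustration.

```python
import math

def recency_weight(age_days, half_life_days=90.0):
    """Weight of one mention; halves every half_life_days (assumed decay)."""
    return 0.5 ** (age_days / half_life_days)

def recency_score(shared_mention_ages, half_life_days=90.0):
    """Sum of decayed weights over the threads two modules share.

    shared_mention_ages is a hypothetical input: the age in days of each shared thread.
    """
    return sum(recency_weight(age, half_life_days) for age in shared_mention_ages)

def uniqueness_score(relevancy, candidate_mentions):
    """Scale a relevancy score down as the candidate's total mentions grow (assumed penalty)."""
    return relevancy / math.log(2 + candidate_mentions)

fresh = recency_score([0, 90])                   # 1.0 + 0.5
stale = recency_score([720, 810])                # both shared threads are years old
rare = uniqueness_score(0.5, 10)                 # candidate with few mentions
common = uniqueness_score(0.5, 1000)             # candidate with many mentions
```

With these assumed formulas, `fresh` is larger than `stale` and `rare` is larger than `common`, which matches the intent described above: recent activity and lesser-known modules get a boost.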
We are alternating between these algorithms to see which one works best for drupal.org. The idea is that the best algorithm will attract the highest click-through rate. So we plan to show the recommendation sets generated by each algorithm on drupal.org, each for a certain amount of time, and then use Google Analytics to measure the click-through rate in each time period. If the click-through rate is significantly higher in one time period than in the others, then the algorithm used in that time period is the best one.
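One way to judge "significantly higher" is a two-proportion z-test on the clicks and page views of two periods. This is a sketch under assumptions: the counts are made up, and the post does not say which statistical test is actually used.

```python
import math

def click_through_rate(clicks, views):
    """Fraction of page views that led to a click on a recommendation."""
    return clicks / views if views else 0.0

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic for the difference between two click-through rates."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_a - p_b) / se if se else 0.0

# Hypothetical numbers for two time periods, one algorithm each.
ctr_a = click_through_rate(120, 4000)        # 3% in period A
ctr_b = click_through_rate(80, 4000)         # 2% in period B
z = two_proportion_z(120, 4000, 80, 4000)    # about 2.86
significant = abs(z) > 1.96                  # two-sided test at the 5% level
```

Because the periods run one after another, ordinary differences in traffic between periods also feed into the comparison, so a result near the threshold deserves a repeat run.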
We can use a similar technique to evaluate other module recommendations when new algorithms come out. Oh, btw, module recommendations based on project usage are on the way :)
PS: The database of "conversation pivots" recommendations is updated once per day on scratch.drupal.org. So new forum posts that mention modules should be indexed into the database within one day, provided that the indexer is able to detect the module references in the conversations. The "double pivots" database is updated less often, roughly every 2 weeks, because the results are relatively stable.
| Attachment | Size |
|---|---|
| chi1060-zhou.pdf | 784.26 KB |

