
blogs

Weekly report (Aug.25 ~ Aug.31)

What I did last week:

  • Attended several GSI seminars, preparing to be the 502 GSI
  • Worked on the Facebook dev competition; discussed the possibility of developing the expertise tagging app on Facebook
  • Developed a Python program that grabs content from MITBBS. It will probably be my 708 project and might prepare me for my prelim research.

The new Fall semester is starting next week. What I will do in the coming week:

  • Try to get the pivots_block patch applied to the "drupalorg" module and make it available to users. (Didn't do much on that last week. Need to catch up.)
  • Prepare and deliver my first two 502 GSI sections.
  • Start the study of 703 and 708 (If that's too much for me, I'll have to select different courses)
  • Extract data from the raw MITBBS crawl. Find other ways to automate it (maybe through direct telnet programming)

Weekly report (Aug.18 ~ Aug.24)

What I did last week:

  • Developed the patch to "drupalorg" module that migrates the pivots_block from s.d.o to d.o
  • Attended the SI502 GSI meeting, and prepared for the course
  • Finished self-studying Python
  • Started to develop a Python app that crawls MITBBS (this might be used in SI708 and my online deliberation research)
  • Explored the possibility of attending the fbFunds program

What I'll do this week:

  • Accelerate the process that applies the patch to "drupalorg" module
  • Attend all the required GSI sessions
  • Think about the pre-candidacy paper and the pivots empirical study
  • Finish the Python app development (in order to get some hands-on experience before SI502 starts)
  • Explore the fbFunds competition (deadline: Friday midnight)

Weekly report (Aug.4~Aug.17)

What I did in the last 2 weeks.

  • Developed the "cosine similarity" algorithm for the pivots system.
  • Finished the Regression II class.
  • Finished the first round study of Stata/Mathematica software packages
  • Read some books on Deliberative Democracy
  • Worked on building the China Green Party website, in the hope of building an online community for future research.

What I will do in the coming week.

  • Migrate the pivots system from S.D.O to D.O.
  • Read more papers on Deliberative/Conversation research
  • Prepare for next semester (teaching and courses)
  • Build an initial version of the China Green Party website

Latest usage analysis and future development plan

We now have two more weeks of usage data on authenticated D.O. users, from 7/28/2008 to 8/11/2008. The data shows a steady pattern of usage over time. Please refer to the attachment for a brief report. We are now quite confident that the pivots block has real value to users in terms of module recommendation.

The current implementation was deployed on S.D.O. So when a project page is brought up, the pivots block will first fetch data from S.D.O and then render the page. Such a process makes S.D.O a bottleneck and sometimes slows down project page loading. This problem will go away if the pivots block is deployed on D.O.

Currently the pivots block is disabled temporarily because of the problem mentioned above. We have collected 3 weeks of usage data to show that the pivots block is helpful to users, and we are hoping to migrate the system from S.D.O to D.O. (that will eliminate the project page loading problem).

To sum up, the next step is to develop a patch for the Drupal.org internal module, and hopefully it will be approved by the infrastructure team. I hope to finish that job no later than the beginning of next week.

Attachment: pivots_analysis.pdf (268.35 KB)

Switching "related projects" pivots algorithm

Previously we computed the "related projects" by using the frequency of project co-mentions in discussions. For example, if the modules Fivestar and CCK were mentioned together in 33 different discussion threads, whereas Fivestar and jRating were mentioned together in 10 threads, then we would consider CCK more relevant to Fivestar than jRating because it has more co-mentions.

However, that algorithm ignores the fact that some very popular modules like CCK simply have more chances to get mentioned, even if they are not that relevant in the context.

To strike a balance, we now use a different measure called "cosine similarity" (closely related to the correlation coefficient). From my personal observation, the results seem more relevant than those of the previous algorithm.
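Here is a minimal sketch of the idea in Python, not the actual pivots code. It assumes each project is represented as a 0/1 vector over discussion threads, so the cosine similarity reduces to the number of co-mentions divided by the square root of the product of the two projects' thread counts. The co-mention counts are the ones from the example above; the per-project thread totals and the function name are made up for illustration.

```python
import math

def cosine_similarity(co_mentions, mentions_a, mentions_b):
    """Cosine similarity of two binary thread-mention vectors.

    For 0/1 vectors the dot product is the number of threads that mention
    both projects, and each vector's norm is the square root of the number
    of threads that mention that project.
    """
    if mentions_a == 0 or mentions_b == 0:
        return 0.0
    return co_mentions / math.sqrt(mentions_a * mentions_b)

# Hypothetical thread totals (NOT real Drupal.org data): CCK is very popular,
# jRating is rarely mentioned.
total_threads = {"Fivestar": 100, "CCK": 2000, "jRating": 20}

# Co-mention counts with Fivestar, taken from the example in the text.
co_with_fivestar = {"CCK": 33, "jRating": 10}

scores = {
    name: cosine_similarity(count, total_threads["Fivestar"], total_threads[name])
    for name, count in co_with_fivestar.items()
}

# Rank related projects from most to least similar.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under these assumed totals jRating ranks above CCK even though CCK has more raw co-mentions, because CCK's score is discounted by how often it is mentioned everywhere.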

I'm now collecting data to see whether the new algorithm is more useful and invites more clicks than the previous one. If everything goes well, we hope to make it available to anonymous users soon.

Your comments would always be much appreciated. Thanks!

Drupal users welcome the Pivots recommendation block

The pivots system was enabled on D.O. to all authenticated users on 2008-07-22. This analysis report is based on the data we collected from 2008-07-22 midnight to 2008-07-28 midnight (6 days in total). There were 10454 distinct user IDs from 14701 distinct IP addresses participating during this period.

Compared to the previous one-month study on Web maintainers, the click-through rate of the Conversation pivot dropped from 2.76% to 1.81%, and that of the Double pivot rose from 1.25% to 1.31%. Both are higher than 1%, which is generally the click-through rate of the best online ads. The survey responses are almost 100% positive.

So far so good :)

(The report is in the attachment)

Attachment: pivots_analysis _0728.pdf (74.81 KB)

Weekly report (July 21 ~ July 27)

What I've done this week:

  • Made some code improvements to the pivots project for Drupal.org
  • Worked on Regression II
  • Finished the "Statistics" book
  • Read some papers on online conversations (but didn't manage to read much of the Deliberative Democracy book)
  • Spent some time on developing the Chinese student organization website at http://umcssa.info, which is going to be a potential community study project

What I plan to do next week:

  • Finish the "cosine similarity" double pivots algorithm.
  • Prepare to launch "Recent topics" as the comparison baseline
  • Spend some time on Regression II
  • Spend some time on Deliberative Democracy; try to find a research topic in that area through reading, observation and thinking
  • Limit time spent on UM-CSSA community-building and other extracurricular activities

Weekly report (7.14-7.20)

What I did last week:

  • Enabled Pivots on D.O. (thanks to the help from Kieran and the community)
  • Finished ICPSR Regression I class
  • Finished reading the book "Political Science"
  • Finished reading the first 10 chapters of "Statistics"

What I'll do this week:

  • Improve double pivots and fix small interface problems
  • Finish reading the book "Statistics"
  • Read several books/papers in Online Deliberation
  • Start ICPSR Regression II

Pivots recommendation system for Drupal.org: Help people find modules

Greetings all,

From now on, I'm going to report the project progress of the "pivots recommendation system for drupal.org" on this blog.

The "pivots" project is an attempt to generate module recommendations for D.O. users. The idea is to display on a module page the related forum conversations, along with the related modules that are referenced in the same conversations. We think it could provide useful information to users when they evaluate modules.

The system has been deployed on D.O. for site maintainers and CVS account holders since June 1, 2008. More than 900 users have tried it since then. The statistics show that the conversation pivot block got a 2.76% click-through rate and the double pivot block got a 1.25% click-through rate. The survey shows that 73.8% of the feedback on the conversation pivot was positive, and 77% of the feedback on the double pivot was positive. The brief report and the full report are attached to this post.

The study gives us reason to believe that pivots would help D.O. users find module recommendations, and we are now planning to make it available to more users. In the meantime, we will continue to make the algorithm work better. People's feedback and suggestions would be much appreciated.

Lastly, I'd like to thank Kieran Lal and my advisor Prof. Paul Resnick for their constant support.

Thanks!

--Daniel

Attachments:

  • pivots_analysis_brief.pdf (287.28 KB)
  • pivots_analysis_full.pdf (504.47 KB)

Summary: ICPSR Regression I (June 23 ~ July 18, 2008)

This ICPSR summer course was taught by Sandy Schneider from MSU. The focus of the course was to help students learn how to use the Regression Model to 1) describe, 2) explain, and 3) predict real problems. The textbook I used was Basic Econometrics, 4th Edition, by Gujarati.

As many former students said, this was not a difficult course; we could even call it an easy one. However, through the course we built up a very solid fundamental understanding of the regression model, and we learned how to use Stata to solve real research problems. We can apply what we learned immediately to our own research! Compared to ECON 671 Econometrics I, I would prefer the ICPSR course. ECON 671 spent too much time on technical details. It would take 3 classes just to prove that the OLS estimator is the Best Linear Unbiased Estimator, whereas I think we only need 10 minutes to understand the concept and the conclusion. Frankly, ECON 671 didn't teach me much about applying regression to research. So, I would recommend ICPSR Regression I to those who want immediate application.

Here are the things we learned from the course:

  • The interpretation of regression coefficients and p-values.
  • Assumptions required for regression models
  • Goodness of fit, residuals, R-squared, adjusted R-squared, Pearson's r
  • Multiple regression. Model specification. How to compare different models. How to decide whether to include/omit a variable.
  • Standardized coefficients (the beta values in the Stata output)
  • Dummy variables. We need to omit one category as the reference point.
  • Residual plots
  • Multicollinearity (VIF>10)
  • Outliers (dfits: the bigger the value, the more influential the observation)
  • Some examples of non-linear regression models.
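
To make the VIF>10 rule of thumb concrete, here is a small Python sketch (the course itself used Stata). It relies on the fact that with exactly two predictors, the variance inflation factor is 1 / (1 - r^2), where r is Pearson's correlation between them. The data and function names are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up data: x2 is almost exactly 2 * x1, x3 is only loosely related to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [3, 1, 4, 1, 5, 9]

for name, other in (("x2", x2), ("x3", x3)):
    vif = vif_two_predictors(x1, other)
    flag = "multicollinearity problem" if vif > 10 else "ok"
    print(f"x1 vs {name}: VIF = {vif:.2f} ({flag})")
```

With more than two predictors, r^2 is replaced by the R-squared from regressing each predictor on all the others.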