Monday, July 20, 2009

Eighth Week Update

Before I discuss the progress of the News Feed module, I want to recommend an old Joel Spolsky post. Spolsky is the CEO of Fog Creek Software (the maker of FogBugz) and a co-founder of Stack Overflow, and his post on Evidence-Based Scheduling has got me thinking about the next steps for evaluations, statistics, and tasks.

Evidence Based Scheduling (by Joel Spolsky)

Mentors can now evaluate students based on their past performance. Evaluations have been built to support an arbitrary number per cycle, so in a future cycle an evaluation could theoretically be given once a week or so. These evaluations are nonetheless discrete, backward-looking measures.

An improvement to this feature would be to collect more continuous data. Since SOC is built specifically for software projects, most projects could likely make use of a post-receive hook or another form of automation. The goal would be not to measure a student's past performance but to make predictions, such as whether a project will be completed on schedule. This would allow earlier intervention and would make the mentoring process more data-driven.
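To make the prediction idea concrete, here is a minimal sketch of the Monte Carlo step that Spolsky's article describes: each remaining estimate is divided by a randomly chosen historical velocity (estimate / actual), and the simulation is repeated many times. The function names, the sample data, and the idea of feeding it student task data are my own assumptions, not anything that exists in SOC today.

```python
import random


def simulate_completion(remaining_estimates, past_velocities, rounds=1000, seed=0):
    """Return a sorted list of simulated total hours for the remaining tasks.

    remaining_estimates: estimated hours for each unfinished task.
    past_velocities: historical estimate/actual ratios for the same person.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(rounds):
        # Each task is scaled by one randomly picked historical velocity.
        total = sum(est / rng.choice(past_velocities) for est in remaining_estimates)
        totals.append(total)
    totals.sort()
    return totals


def chance_of_finishing(totals, hours_available):
    """Fraction of simulated futures that fit into the hours still available."""
    on_time = sum(1 for total in totals if total <= hours_available)
    return on_time / float(len(totals))


# Hypothetical student: three tasks left, velocities taken from earlier tasks.
totals = simulate_completion([4, 8, 16], [1.0, 0.8, 0.5, 0.67])
print("chance of finishing in 40 hours: %.2f" % chance_of_finishing(totals, 40))
```

A mentor would then look at the probability rather than a single date, which is what makes earlier intervention possible.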

This would also help us measure performance at a finer level. As Spolsky notes, the smaller a task is, the more accurately it can be measured:

When I see a schedule measured in days, or even weeks, I know it’s not going to work. You have to break your schedule into very small tasks that can be measured in hours. Nothing longer than 16 hours.

This forces you to actually figure out what you are going to do. Write subroutine foo. Create this dialog box. Parse the Fizzbott file. Individual development tasks are easy to estimate, because you’ve written subroutines, created dialogs, and parsed files before.

In the last day, I've posted a new patch for NewsFeed tasks, and I should be posting another shortly that completes e-mail notification functionality.

One major piece of functionality that has been holding up this second patch is an access check that works for the widest range of feed item senders and receivers (the NewsFeedModule wiki page defines these). Because this feature needs to work with a variety of model schemas, I wanted to find the lowest common denominator. The natural choice is to use scope logic to determine the relationships between entities. But the original specification for NewsFeed envisioned some types of updates that are difficult to handle even with scope logic. Because of the complexity of this feature, I think it's all but necessary for me to provide some smoke tests to make sure that these checks are working properly.

The next step in feed customization is to provide a UI and logic for users to set their own feed preferences for e-mail notifications. (The plan as of now is to allow e-mails to be customized, but for XML feeds to include only public info and not be customizable.)

I should soon be ready to focus more on the secondary NewsFeed features, and I'm especially interested in prototyping how we could use post-receive hooks and real-time push notifications to make the newsfeed feature more useful for collaboration.
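As a starting point for that prototype, here is a minimal sketch of the parsing side of a hook, assuming the projects use Git. A Git post-receive hook receives one line per updated ref on standard input, in the form "old-rev new-rev ref-name". The RefUpdate type and the function name are hypothetical, and nothing here talks to the NewsFeed module yet.

```python
from collections import namedtuple

# Hypothetical container for one updated ref; not part of Melange.
RefUpdate = namedtuple("RefUpdate", "old_rev new_rev ref_name")


def parse_post_receive(lines):
    """Turn the lines Git feeds to a post-receive hook into RefUpdate tuples."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        updates.append(RefUpdate(*parts))
    return updates


# In a real hook script the input would be sys.stdin; here it is sample data.
sample = ["1111111aaaa 2222222bbbb refs/heads/master\n"]
for update in parse_post_receive(sample):
    print("%s: %s -> %s" % (update.ref_name, update.old_rev[:7], update.new_rev[:7]))
```

Each parsed update could then be turned into a feed item for the project it belongs to.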

GSoC 2009: GHOP - Eighth Week Status Update!

Hello everyone,
I spent most of the week relocating and figuring out how to get my internet connection working. Most of it works now, except for Skype and IRC. Setting up the internet did not turn out to be as easy as I had thought :(. This has not been a very satisfactory week.

I spent some more time fixing 505s for task create/edit for mentors. I reviewed some jQuery plugins for creating/editing/deleting tags and found one called jquery-tagger that is suitable for the task. It has yet to be integrated. After this I started working on the Task Comment infrastructure. Users are now able to post comments.

While working on the commenting infrastructure, I also reorganized the Task public view page. It now has an actions drop-down (the list is built dynamically based on context and user) followed by a single submit button, instead of different buttons for different actions. This tightly integrates actions and comments. Only logged-in users can post comments and perform actions, if any are available. The action common to all users is "Comment without any action". In addition, a student who is eligible to claim a task gets the "Request to claim" option in the list. A student who has already claimed the task gets the "Withdraw from the task" option.

An Org Admin or a Mentor gets the options "Accept" and "Reject" if a student has requested to claim a task. In all the above cases the comment is optional: one can perform the action without posting any comment. The "Changes" line, as shown in the issue tracker for each action, is being recorded in the datastore, but I am still working on the display of comments.
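The rules above can be sketched roughly as follows. This is only an illustration of how such a drop-down list might be built: the dictionary layout, the role names, and the status values "Open" and "ClaimRequested" are assumptions made for the example, and "eligible to claim" is simplified to "the task is open".

```python
def available_actions(user, task):
    """Return the labels for the actions drop-down for this user and task.

    user: None for anonymous visitors, else a dict with 'id' and 'role'.
    task: a dict with 'status' and 'claimed_by' (a student id or None).
    """
    if user is None:
        return []  # only logged-in users may comment or act

    actions = ["Comment without any action"]
    role = user["role"]
    if role == "student":
        if task["claimed_by"] == user["id"]:
            actions.append("Withdraw from the task")
        elif task["status"] == "Open":
            actions.append("Request to claim")
    elif role in ("mentor", "org_admin"):
        if task["status"] == "ClaimRequested":
            actions.extend(["Accept", "Reject"])
    return actions


task = {"status": "ClaimRequested", "claimed_by": "alice"}
print(available_actions({"id": "bob", "role": "mentor"}, task))
```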

My work for the next week will be to complete all the actions that must be available to every role in the Task Workflow. This includes Work Submission for the student who has claimed the task, and the NeedAction, NeedWork and Closed actions for an Org Admin or Mentor. In addition, I will complete the commenting infrastructure. If time permits, I will start working on task notifications, which were put on hold last week due to some adjustments and changes in how the Task Queue APIs are used in Melange.

Statistic Module: eighth week update

Here is the latest update on my work on the Statistic Module.

During the last week I focused almost entirely on the backend. As I already wrote on the blog, I was working on an abstraction layer for statistics. The goal was to separate statistics from the Python backend code. Until now, each statistic needed at least one function (in practice a few) to process it. For a request to collect a statistic, the logic looked for a function named 'update' + statistic_link_id. At the beginning this was an easy and rather convenient solution, but as the number of statistics grew, the statistic logic ended up with an awful lot of very similar, short functions. Worse, every statistic had to be hardcoded in the source code. As James Crook pointed out in one of his emails, this was a huge pitfall, because every time we wanted to add a new statistic, we had to add new code and *redeploy* Melange.
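To illustrate the old name-based lookup, here is a rough sketch. The class, the camel-casing of the link_id, and the return value are assumptions made for the example (the real Melange code is not reproduced here); the point is only that a statistic without a hardcoded function cannot be collected.

```python
def handler_name(link_id):
    """Build the function name, e.g. students_per_country -> updateStudentsPerCountry.

    The exact mapping from link_id to function name is an assumption.
    """
    return "update" + "".join(part.capitalize() for part in link_id.split("_"))


class OldStatisticLogic(object):
    """Hypothetical stand-in for the old, hardcoded statistic logic."""

    def updateStudentsPerCountry(self):
        return "collected students per country"

    def collect(self, link_id):
        handler = getattr(self, handler_name(link_id), None)
        if handler is None:
            # A new statistic means new code and a redeploy.
            raise NotImplementedError("no hardcoded function for %r" % link_id)
        return handler()


logic = OldStatisticLogic()
print(logic.collect("students_per_country"))
```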

Thus, I designed a solution that stores statistic-specific information as a JSON string in the statistic model: a new field, "instructions", was added. I will describe the meaning of all parameters and the dependencies between them in the upcoming days. Generally, some parameters that were previously set by statistic-specific functions are now retrieved from the instructions.
Let us take a look at the following example. We have a "Students Per Country" statistic, so all students are iterated through, and for each of them we check the country and update the choices list.
Before calling the collectStat function, we needed to set up at least two things:
logic: to student_logic (because we iterate students)
choices: to soc.models.countries.COUNTRIES_AND_TERRITORIES.
Of course we needed a special function named updateStudentsPerCountry, and we could easily set those parameters there. We could live with that, but now, let us say, we want to add "Mentors Per Country". Previously, we needed to:
1) add an appropriate entity to the data model
2) add an updateMentorsPerCountry function
3) set logic to mentor_logic
4) set choices to soc.models.countries.COUNTRIES_AND_TERRITORIES
5) redeploy Melange
A lot, isn't it?
Now all that stuff is done by parsing instructions.
Let us take a look at instructions for students_per_country:
instructions = {
"params": {"fields": ["res_country"]},
"field": "country",
"type": "per_field",
"logic": "student"
}
The most important field is "type", because it determines that we are dealing with a "per field" statistic. Actually, all statistics before last week were of the "per field" kind.
Then we have "logic", which says which logic will be used to iterate through entities. To get the actual logic, we look up the value in the logics_dict dictionary.
What we still need is the choices. So we have the "field" parameter, and we look up the choices list in the choices_dict dictionary.
Last but not least, there is "params". It is a dictionary which is passed to the collectStat function. Previously it was also set by statistic-specific functions.
And basically that is all. Let us consider what we have to do now in order to add "Mentors Per Country":
1) Add an entity to the data model with the same parameters as for "Students Per Country", but with "logic": "mentor" instead.
And that is all! No changes to the source code are necessary (I assume that we already have a value for "mentor" in logics_dict).
I hope you will agree with me that it is simpler now :-)
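To show the idea end to end, here is a simplified sketch of how a "per_field" statistic could be driven purely by its instructions. The names logics_dict and choices_dict come from the description above, but their contents, the sample entities, and the collect_per_field function are stand-ins invented for the example; the real collectStat works on datastore entities in batches.

```python
import json

# Stand-ins: each "logic" just returns a list of dicts instead of real entities.
logics_dict = {
    "student": lambda: [{"res_country": "Poland"},
                        {"res_country": "India"},
                        {"res_country": "Poland"}],
    "mentor": lambda: [{"res_country": "Norway"}],
}
choices_dict = {"country": ["India", "Norway", "Poland"]}


def collect_per_field(instructions_json):
    """Count entities per choice, using only what the instructions say."""
    instructions = json.loads(instructions_json)
    if instructions["type"] != "per_field":
        raise ValueError("unsupported type: %r" % instructions["type"])

    entities = logics_dict[instructions["logic"]]()
    choices = choices_dict[instructions["field"]]
    prop = instructions["params"]["fields"][0]  # entity property to read

    counts = dict((choice, 0) for choice in choices)
    for entity in entities:
        value = entity.get(prop)
        if value in counts:
            counts[value] += 1
    return counts


students_per_country = json.dumps({
    "params": {"fields": ["res_country"]},
    "field": "country",
    "type": "per_field",
    "logic": "student",
})
print(collect_per_field(students_per_country))
```

Switching "logic" to "mentor" in the JSON string is the only change needed to get the mentor variant, which is the whole point of the design.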

The next thing I worked on during the last week was dealing with statistics which have no fixed list of choices. Actually I had worked on that some time before, so I had some concepts, but because of the move to instructions, I had to make some changes. Here is how it is done now.

Let us say we want to have a Student Proposals Per Organization statistic. Before we start to iterate through student proposals, we have to have a list of all organizations. So, we just iterate through all organization entities (also in batches) and create a list of link_ids.

The question is how we know whether a statistic needs its list of choices collected dynamically, and what to collect. The answer is: by instructions, of course ;-) All it takes is adding a "choices_logic" parameter to the dictionary.
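A minimal sketch of that decision, under the assumption that a logic can be modelled as a fetch(offset, limit) callable returning dicts. The helper names and the sample organizations are invented for the example.

```python
def fetch_in_batches(fetch, batch_size=2):
    """Yield entities by calling fetch(offset, limit) until it returns nothing."""
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:
            break
        for entity in batch:
            yield entity
        offset += len(batch)


def resolve_choices(instructions, choices_dict, logics_dict):
    """Use a fixed choices list, or collect link_ids if "choices_logic" is set."""
    if "choices_logic" in instructions:
        fetch = logics_dict[instructions["choices_logic"]]
        return [entity["link_id"] for entity in fetch_in_batches(fetch)]
    return choices_dict[instructions["field"]]


# Hypothetical organizations standing in for datastore entities.
ORGS = [{"link_id": "melange"}, {"link_id": "python"}, {"link_id": "git"}]


def fetch_orgs(offset, limit):
    return ORGS[offset:offset + limit]


print(resolve_choices({"choices_logic": "organization"}, {},
                      {"organization": fetch_orgs}))
```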

As I said, I will try to provide more information about the rest of the parameters soon on the wiki. The most important one is "checker", which allows filtering the iterated entities by some criteria. For example, we can process only those students who have a project assigned.
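A checker can be pictured as a named predicate that is looked up from the instructions and applied before counting. The sketch below assumes a registry called CHECKERS, checker names "has_project" and "no_project", and a "projects" property on students; all of these are invented for the illustration.

```python
# Hypothetical registry of named predicates.
CHECKERS = {
    "has_project": lambda entity: bool(entity.get("projects")),
    "no_project": lambda entity: not entity.get("projects"),
}


def count_per_field(entities, field, checker_name=None):
    """Count entities per value of `field`, skipping those the checker rejects."""
    if checker_name is None:
        checker = lambda entity: True  # no filtering requested
    else:
        checker = CHECKERS[checker_name]

    counts = {}
    for entity in entities:
        if not checker(entity):
            continue
        value = entity[field]
        counts[value] = counts.get(value, 0) + 1
    return counts


students = [
    {"country": "Poland", "projects": ["statistics"]},
    {"country": "Poland", "projects": []},
    {"country": "India", "projects": ["ghop"]},
]
print(count_per_field(students, "country", "has_project"))
```

The same entities and field give the "with" and "without" variants of a statistic just by swapping the checker name.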

Some time ago Pawel and Sverre had a meeting about the final goals for the Statistic Module project. They put together a list of statistics which they would really like to have. Some of the statistics were already present, but some were not. So, I also worked on them during the last week.
The newly available statistics are:
* Mentors With Project Per Country
* Mentors Without Project Per Country
* Organizations Per Program
* Student Projects Per Country
* Student Projects Per Continent
* Student Proposals Per Country
* Student Proposals Per Continent
* Students With Project Per Country
* Students Without Project Per Country
* Students Per Graduation Year
* Students With Project Per Graduation Year
* Students Without Project Per Graduation Year
Note: As I mentioned above, one of the instruction parameters is "checker", which makes it possible to collect all those "with/without" statistics.
* Number of Students
* Number of Mentors
* Number of Student Proposals
* Number of Student Projects
* Number of Organization Admins
* Number of Mentors With Projects
* Number of Students With Projects
* Number of Students With Proposals
The last batch of statistics is, let us say, "per nothing". Of course we could have them "per program", but do we really want to?
So I put them in one single entity, "Gsoc2009 Overall". Its "type" in instructions is "overall", and such a statistic consists of many substatistics. They also use instructions, so it is quite easy to add new ones.

Currently, the only substatistics that are supported have "type" "number", so now I am going to add another kind, "average", because there are still two statistics left on our mentors' wish list:
* Average number of projects per mentor
* Average number of student proposals per student
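One possible shape for the planned "average" kind, next to the existing "number" kind, is sketched below. The instruction keys "of" and "per", the counts dictionary, and the function name are assumptions for the illustration only; the actual parameters are still to be designed.

```python
def compute_substatistic(instructions, counts):
    """Evaluate one substatistic of an "overall" statistic.

    counts maps a name such as "mentors" to the number of such entities.
    """
    kind = instructions["type"]
    if kind == "number":
        return counts[instructions["of"]]
    if kind == "average":
        denominator = counts[instructions["per"]]
        if denominator == 0:
            return 0.0  # avoid dividing by zero when nothing has been counted
        return counts[instructions["of"]] / float(denominator)
    raise ValueError("unsupported substatistic type: %r" % kind)


counts = {"student_projects": 6, "mentors": 4}
average = {"type": "average", "of": "student_projects", "per": "mentors"}
print(compute_substatistic(average, counts))
```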

As I said, I worked almost entirely on the backend; the only thing I did on the client side was reducing the number of columns to 2, as the mentors suggested.

The last thing: Last week I sent a first bunch of patches. Lennie and Sverre, thank you for the reviews. I will take them into account and will try to send new ones by Wednesday.