Thursday, August 20, 2009

GSoC09: Statistics module final weekly update: Javascript fireworks and Code Swarm! :)

It's no secret that I truly love the JavaScript language :)

So, as the last week of GSoC went by, I wanted to finish some of the Melange JavaScript refactoring work that I began in May/June, to give a real starting point and a solid base for a renewed JavaScript layer for Melange.

So, first of all, I've learnt some advanced bash programming, thanks to Mendel Cooper's awesome online book, "Advanced Bash-Scripting Guide" (PDF version). I found it very well written and really useful. Having learnt at least enough to deal with variables, loops, conditionals, exit statuses and functions, I've made a script to run ShrinkSafe over all our JavaScript files during the build, before deploying. This way we can shrink our JS files to about half their size, so they'll download faster once live.
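The build script itself was bash, but the idea is simple enough to sketch in Python too (helper names are hypothetical, and a `shrinksafe` command is assumed to be on the PATH): walk the source tree, collect the .js files, and pipe each one through the minifier.

```python
import os
import subprocess

def find_js_files(root):
    """Collect every .js file under root, sorted for stable output."""
    js_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.js'):
                js_files.append(os.path.join(dirpath, name))
    return sorted(js_files)

def shrink_all(root, shrinksafe_cmd='shrinksafe'):
    """Run the minifier over each JS file, overwriting it in place
    (assumes shrinksafe_cmd prints the minified source to stdout)."""
    for path in find_js_files(root):
        minified = subprocess.check_output([shrinksafe_cmd, path])
        with open(path, 'wb') as f:
            f.write(minified)
```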


Then, just as we use pylint to check the style of our Python files, I've made a bash script to run JSLint over all our JS files, and I've also coded a big mega-patch to make all our JS files JSLint compliant! And here's the final result!


Furthermore, I've written JSDoc-style comments in the melange.js and melange.graph.js files, to give the community an example of how to document JS files properly (it's very difficult to communicate the semantics of JS files to JSDoc Toolkit!). Running JSDoc Toolkit over those two files produced documentation for both private (developer) and public (API-only) purposes. Below is a sample excerpt of the final result.


Apart from that, I've also achieved this:


Doesn't seem so exciting, huh? Well, what this alert box means, behind the scenes, is that Melange now has a facility to expose a public JS API, more or less like Google does (with the Maps API and so on). Using this facility, I've made it possible to embed up-to-date, live statistics data in any web page by inserting only a script tag (and a div where the visualization should be shown).
There are two kinds of exporting:
  • LIVE: this connects the web page to a widget in the statistics dashboard. This means that if you change the visualization to a pie chart or the "virtual statistic" to another one, the visualization in your web page will be changed as well.
  • FROZEN: this exports the current visualization in the widget, and it will never change.
So, more things can happen in the future with this new opportunity (managing a wiki of visualizations, for example), but also embedding statistics into document pages in Melange (which is very straightforward to achieve now). And now, by exposing JS APIs, we can think of many exciting new features (what about something to embed in the organizations' real home pages?). Along with the JS refactoring and the seed_db frontend, these could be the next steps for my post-GSoC participation in the Melange community.

At the beginning of this week I also made two simple code swarm videos (coloured by programming language).
This one is for our main branch Melange repository:



And this one is for the Melange statistics branch repository (which has been merged with the main repository).




Before ending the post, I would like to thank everyone involved in the Melange community: my mentor Pawel Solyga, our project lead Lennard De Rijk, Sverre Rabbelier, my "in-place guide" James Crook (whom I met several times here in Dublin, thank you again for everything!), the other GSoC students (Daniel Hans, with whom the statistics module has been possible, Madhusudan C.S. and James Alexander Levy), not forgetting the great Daniel Diniz.
And obviously Google for making GSoC possible, and Leslie Hawthorn and Ellen Ko for their hard work on it.

Also, last but not least, I would like to give a huge hug to the OpenStreetMap community (and my former-year mentor Frederik Ramm), with whom I shared GSoC 2008, and with whom I had the opportunity to learn the JavaScript language in depth. Surely, as I've done this year, I'm going to stick around and contribute to the OSM community as well.

Tuesday, August 18, 2009

Twelfth Week NewsFeed Update

As the GSoC program wraps up, I'm finishing up refactoring and testing, and have added a couple of important new features in the last week:

* Subscribe-By-Star. The subscribe-by-star pattern allows an e-mail subscription for a particular entity to be added or removed by toggling a star on the news feed box on or off. Aside from the popup help message explaining how the feature works, subscribe-by-star is working on http://jamstage.appspot.com.

The ajax itself is simple. After using jQuery.ajax() to make the call, I first process the request data so that it is suitable for the logic (converting byte strings to native types, mostly):


def edit_subscription(request, *args, **kwargs):
  if request.GET.get('entity_key'):
    entity_key = request.GET.get('entity_key')
  else:
    return http.HttpResponseServerError()
  if request.GET.get('subscribe') == 'true':
    subscribe = True
  elif request.GET.get('subscribe') == 'false':
    subscribe = False
  else:
    return http.HttpResponseServerError()
  subscription_logic.editEntitySubscription(entity_key, subscribe)
  return http.HttpResponse('OK')



I then call the editEntitySubscription method of the subscription logic:


entity_key = db.Key(entity_key)
user = user_logic.getForCurrentAccount()
subscriber = self.getSubscriberForUser(user)
if subscribe and entity_key not in subscriber.subscriptions:
  subscriber.subscriptions.append(entity_key)
  logging.info('added subscription for entity_key %s for user %s'
               % (entity_key, user.key().name()))
elif not subscribe and entity_key in subscriber.subscriptions:
  subscriber.subscriptions.remove(entity_key)
  logging.info('removed subscription for entity_key %s for user %s'
               % (entity_key, user.key().name()))
else:
  return
db.put(subscriber)


I've also added a model-based class method for determining auto-subscriptions, which will help with what I've found to be the hardest part of this project. While the goal is simply to get all users with at least read access to an arbitrary Linkable entity, it is actually very difficult, and where possible it works only by manually creating the logic. Essentially, this list of users should be sharded into its own relational entity (just a reference to the Linkable and a ListProperty of users). I've also made various other refactorings of the subscription code.
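A minimal sketch of that sharded design, with plain Python standing in for an App Engine model (the ReadAccessList name and its methods are hypothetical, not Melange code):

```python
class ReadAccessList(object):
    """One entity per Linkable: a reference to the entity plus the
    list of users with read access (mirrors a db.ListProperty)."""

    def __init__(self, linkable_key):
        self.linkable_key = linkable_key
        self.user_keys = []

    def grant(self, user_key):
        """Add a user once; granting twice is a no-op."""
        if user_key not in self.user_keys:
            self.user_keys.append(user_key)

    def revoke(self, user_key):
        """Remove a user if present."""
        if user_key in self.user_keys:
            self.user_keys.remove(user_key)
```

The news feed logic would then resolve subscribers with a single fetch by `linkable_key` instead of re-deriving access rights every time.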


I look forward to committing code this week! At almost 2,000 lines, this will be a big commit, but worth the effort.

GSoC 2009: GHOP - The work begins now!

Hello everyone,
That was the last week of GSoC. I worked on very few things last week. I mostly worked on creating menus for easy access to GHOP Tasks per organization, for the general public and students. I also worked on converting the Student datastore model to introduce a School Type for students, and wrote an App Engine task to convert the existing Student entities. I also added another field to hold the student's grade if he is a high school student. All the other things, like cleaners for these fields and views, were already written during the beginning days of GSoC; I just merged them together.

I also made some modifications to the access.py module to accommodate program logic as a parameter rather than hard-coded logic. I have also fixed the bugs Lennie had noted and made some changes to the subscription star toggling, among other things. My demo instance is up and running at http://melange-madhusudancs.appspot.com/ and includes all the features mentioned above.

I have also started writing the Starter Manual on how to use GHOP features, for its users. I have set up a wiki page at http://code.google.com/p/soc/wiki/GHOPUserManual. It is a work in progress; I will complete it soon.

And coming to the reason why I said "The work begins now!": I really want to see this programme happen. If it does, I am quite sure there will be many feature requests for GHOP, which I am looking forward to implementing. We want to listen to the community for feedback on how we can improve GHOP, make it better and make it happen.

Last but not least, I would like to thank Lennie a lot, my mentor in particular, for extending a helping hand in every way possible and for spending so much time mentoring me. He was there every time I had a doubt, a question or a problem, or was feeling like pulling my hair out trying to resolve a bug. He spent much more time mentoring me than I expected at the start of the programme. Thank you so much Lennie. I would also like to thank Sverre and Pawel and everyone else on the Melange team who helped me get this project going.

-- Thank you,
Madhusudan.C.S

P.S. When can I call myself a Melange developer? :P

Wednesday, August 12, 2009

Newsfeed Module Update

This week, a lot of the work I'm doing is of the fixing/testing/documenting variety.

There are two new features that change the functionality of the newsfeed in notable ways.

One is PubSubHubbub support. It's working, and was surprisingly easy to get running.

It works with a hook in the newsfeed update process. Remember how the feed update process includes a sender (the updated entity) and a list of receivers? Because it's the feed of the receivers that is being updated, the pubsub ping is made for each of these receivers.

First we get the feed URL for the receiver:


# resolve the URL of the entity's ATOM feed
entity_feed_url = news_feed.getFeedUrl()



Then a POST request is sent to a central hub that includes the feed_url:


headers = {'content-type': 'application/x-www-form-urlencoded'}
post_params = {
  'hub.mode': 'publish',
  'hub.url': entity_feed_url,
}
payload = urllib.urlencode(post_params)
try:
  # can these be sent in a batch?
  response = urlfetch.fetch(HUB_URL, method='POST', payload=payload,
                            headers=headers)
except urlfetch.Error:
  logging.exception('Failed to deliver publishing message to %s', HUB_URL)



Wow, that was easy, wasn't it?


The second feature is the subscribe-by-star pattern, which already appears in plenty of other Google products and should be useful for people who don't like or understand RSS feeds, since subscribing via the star signs you up for e-mail updates.



Finally, I've posted questions that would be useful to ask Melange users in an optional survey.


The questions are on a survey on my staging server: http://jamstage.appspot.com/survey/take/program/google/gsoc2009/updatepreferences

Tuesday, August 11, 2009

GSoC09: Statistics module 8th weekly report: Refactoring, fixes... and jQuery 1.3!

So... the penultimate week of GSoC came and went, a great moment for maintenance tasks like refactoring, bug fixing and coding small features left behind during the project. Besides that, following Madhusudan's proposal, we've finally upgraded to jQuery 1.3 (commit 5723f329d0) and to jQuery UI 1.7 (commit 4e2789b8e8)!

Here's the story: following Pawel's suggestions, I've made small changes to the dashboard's code so it transcludes itself when encountering DIVs with the class "melangeDashboard" instead of MELANGE tags with the attribute type "statistic_dashboard". Furthermore, I removed the fancy stack menu (which was too different from the rest of the application's style) and put only a "new chart" link at the top of the page. That's a temporary solution; as more features come into place, we'll integrate another menu.

A couple of bug fixes:
  • Before this week, if you changed the visualization type right after creating a widget, the change wouldn't be saved on the backend. It would be saved only if you changed it after reloading the dashboard once. Now it's fixed.
  • Fixed the bug that Daniel Hans mentioned in his post: now, when switching the virtual statistic, the visualization is properly refreshed.
I've also focused on some refactoring. First of all, I've finally made the changes that Daniel Hans and Lennie suggested when I sent patches to the mailing list (mainly in the Python code) :)

The main work, however, has been on the JavaScript side, mainly to avoid code duplication (for example, the code for updating a widget on the backend was present three times) and to better separate dashboard code from widget code. For example, before the refactoring, the code to move and delete widgets in the dashboard lived in the widget code, even though these functions are more related to dashboard logic.

But while porting the logic to move widgets inside the dashboard was an easy task, since the dashboard already contained all the code to make the widgets move (using the jQuery sortable plugin and binding the function that updates the backend to the "stop" event), widget deletion was not so easy to accomplish. Why? Because the deletion button is created in the widget class, while the function to delete the widget on the backend is in the dashboard, so in another scope.

That could easily be worked around if I made the deletion code a "privileged" function of the dashboard... but, to me, that is not correct design. Deleting a widget (and so the chart entity on the backend) is a crucial task, so the function that does it should not be callable from other code. The solution? The publish/subscribe pattern! While this is built-in behavior in Dojo, it is not in jQuery. jQuery has namespaced events (which can be used as topics), but it turns out that when a "click.widgetdeletion" namespaced event bubbles up through the DOM, it loses the "widgetdeletion" part (appearing as a common "click" event). Furthermore, to keep the code clean, I would have liked to pass a reference to the widget instance with the event, but no data could be passed along with an event in jQuery 1.2.6.

So I made a rather ugly hack using common event delegation: detecting the target object and calling the deletion function if it has the "remove" class (which happens to be the class of the remove button... but what if we want to give that class some other name in the future?), then using jQuery traversal to extract the ID of the widget from the id of the parent div DOM node (which follows a pattern like widget-IDNUMBER, extracted with a simple regex), and finally using that ID to retrieve the widget instance from the widget list array kept as a variable in the dashboard. Something like:

var widget_id = jQuery((jQuery(event.originalTarget).parents(".widget"))[0]).attr("id");
var widget_instance = widgets_list[/^widget-(\d*)$/.exec(widget_id)[1]];

Pretty ugly, isn't it?

But then, with jQuery 1.3 and its new Event object, everything is smoother.

This is what happens after confirming deletion:

jQuery(this).trigger({type: "widgetdeletion", widget: _self}).remove();

You can see it's just an event trigger, with a specific type (acting as a publish/subscribe topic), passing the widget instance along with the event. In the dashboard, catching the event is as easy as:

dashboard.bind("widgetdeletion",function (event) {updateOnDelete(event.widget);});

That is, binding the "widgetdeletion" event and calling the proper function, passing it the widget instance reference. Simpler, cleaner and not prone to breaking if something in the HTML changes (such as the classes assigned to a widget or to its remove button).
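The publish/subscribe idea itself is language-agnostic; here is a minimal Python sketch of the same topic-based decoupling (EventBus is a hypothetical name, not Melange code):

```python
class EventBus(object):
    """Tiny publish/subscribe hub: handlers subscribe to a topic,
    and publishers fire the topic without knowing who listens."""

    def __init__(self):
        self._topics = {}

    def subscribe(self, topic, handler):
        """Register a handler to be called whenever topic is published."""
        self._topics.setdefault(topic, []).append(handler)

    def publish(self, topic, **data):
        """Call every handler subscribed to topic, passing data along."""
        for handler in self._topics.get(topic, []):
            handler(**data)
```

A widget would publish "widgetdeletion" without knowing who listens; the dashboard subscribes and keeps its privileged deletion function private.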

Other enhancements made during the week include live editing of widget titles and support for the new get_available_statistics function (made by Daniel Hans), which has proper rights checking and returns all statistics (GSoC/GHOP, 2008/2009... and so on) with their proper scope paths and link ids.

For refactoring purposes again, I've been looking for a suitable browser-side templating engine. In Dojo, there's the fabulous dojox.dtl Django port. But what about jQuery? There's a bunch of technologies out there (written as jQuery plugins or standalone), such as:

Standalone
Google JSTemplate
Ajax Pages
Pure (this can be used also along with jQuery)
JBST
jQFragment
JSHTML
twoBirds
JsonFX.NET

jQuery plugins/dependent
jTemplate
chain.js
Nano
John Resig's micro templating
Some other guidelines plus fixes to John Resig's micro templating
Custom made 1, custom made 2

It's really not easy to evaluate all these different options and explore their major advantages and drawbacks. I'll try to do my best during this week, because a templating engine is really a must-have for us, to avoid ugly mixing of JavaScript and HTML code.
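All the candidates above are JavaScript; as a language-neutral illustration of what such a micro-templating engine boils down to, here is a toy substitution function in Python (a sketch only, not one of the listed libraries):

```python
import re

def render(template, context):
    """Replace {{ name }} placeholders with values from context;
    unknown names render as an empty string."""
    def substitute(match):
        return str(context.get(match.group(1), ''))
    return re.sub(r'\{\{\s*(\w+)\s*\}\}', substitute, template)
```

Real engines add loops, conditionals and escaping on top, but the core win is the same: the HTML lives in a template, not concatenated inside the script.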

Monday, August 10, 2009

GSoC: Statistic Module update

Here is my next status update, for the eleventh week of Google Summer of Code 2009.

First of all, I finished working on displaying multiple visualizations inside one widget. Let us consider the standard 'Students Per Degree' example. Now you can create a new widget and switch the visualization between:
'Students Per Degree (all)',



'Students Per Degree (with projects)':



'Students Per Degree (without projects)':



'Students Per Degree (cumulative)':




The last one represents all three types in a single visualization.

This new option works pretty well on the statistic 'visualize' page (you can get there by clicking the 'visualize' link on the page which lists all statistics).
It does not work perfectly on the actual statistic dashboard, though. There is a bug: when you choose a new statistic to be displayed, the visualization does not change. However, when you then change the visualization type, a new visualization for the new statistic is displayed. To be fair, I have not found a solution for this issue yet. I tried to debug the JavaScript for a couple of hours and even discovered what was wrong, but could not fix it. I asked Mario for some help and he said he would try to take care of it.

The next thing I did was prepare a new series of patches based on Sverre's suggestions. The patches that were already fine I have pushed to the main repository.

Then I started to work on access issues. Each statistic should carry information about who is allowed to see it. Some statistics should be visible only to program administrators, while others can also be seen by organization admins. I added a new 'read_access' parameter to the statistic model. The frontend may now retrieve a complete list of statistics available to the current user by calling the /statistic/get_available_statistics request.
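A sketch of what that role-based filtering could look like (the role names and data shapes here are hypothetical; the real check lives in Melange's access logic):

```python
# Ordered roles, from least to most privileged (hypothetical names).
ROLE_RANK = {'user': 0, 'org_admin': 1, 'host': 2}

def available_statistics(statistics, role):
    """Return the names of statistics whose read_access requirement
    the given role satisfies, mimicking what a
    get_available_statistics-style request would filter on."""
    rank = ROLE_RANK[role]
    return [s['name'] for s in statistics
            if ROLE_RANK[s['read_access']] <= rank]
```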

GSoC 2009: GHOP - Eleventh Week Status Update!

Hello everyone,
A week of frustrations comes to an end. The goal for last week was to get the tag system in place for GHOP, including giving the Program Admin the ability to add, delete and edit tags. You might already know that we are using a modified version of taggable-mixin for tagging in Melange.

Although adding and deleting tags looked trivial to me, editing was not. The first reason is that the key name for a tag stored in the datastore uses the tag name. So every time you edit a tag, a new key needs to be created, but all the other properties must be retained, so the tag must be copied to the new entity. On top of this, on the UI side I was not even sure how to let the Program Admin add, delete and edit a tag in the same interface. After a lot of struggling, inner battles and frustration, I realized that for a GHOP Program the number of types of program-defined tags is fixed: one for Task difficulty level and another for Task type. So I created two pages, one for each, where one can perform all three actions (add/edit/delete).
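Because the key name embeds the tag name, renaming boils down to copy-then-delete. A sketch with a plain dict standing in for the datastore (the real code uses taggable-mixin entities):

```python
def rename_tag(datastore, old_name, new_name):
    """Rename a tag whose datastore key is derived from its name:
    copy all properties to an entity under the new key name,
    then delete the old entity."""
    old_key = 'tag:%s' % old_name
    new_key = 'tag:%s' % new_name
    entity = dict(datastore[old_key])  # retain all other properties
    entity['name'] = new_name
    datastore[new_key] = entity
    del datastore[old_key]
```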

To accomplish this on each page, I used a jQuery plugin called in-place-edit. This allows us to edit a tag in place: one can click the Add button to add new tags, and empty the contents of a tag to delete it, or edit the tag name. Editing the tag name automatically copies the old Tag entity to the newly created Tag entity. Each creation, edit and delete happens via AJAX calls, so one doesn't have to save the changes after editing the whole page; the Program Admin can just leave the page.

Once I solved this problem, a new one arose. Tags usually have to be sorted in some order; for example, difficulty level tags must be sorted from easiest to hardest. But tags need not be created in that order. For example, the Program Admin might initially think that only Easy, Medium and Hard are sufficient, but later realize a Trivial difficulty level is required. So tags need to be ordered in the order enforced by the Program Admin. Once again I went to jQuery for help: I used jQuery UI's sortable plugin, already in the Melange code base, to drag-and-drop sort the list of tags. So ordering is implemented as well.

After this, the interface to add tags when creating/editing tasks had to be added. For the difficulty level tag, it is a normal drop-down. But a Task can be of multiple types, like Documentation and Translation, so for this I used an HTML multi-select box. There is also a text box for adding arbitrary tags. At the moment, these two types of tags are merged and stored as entities of the same TaskTypeTag model in the datastore, with a mandatory property set to True for Task type tags and False for arbitrary tags. But Lennie wanted this implementation changed, separating them into two models. This requires a good amount of rework; I will do it this week.

After all this, I started working on search for tasks, and happily created multi-selects for Organizations, Task status, Difficulty level and Task type, and started generating queries for them. It was then that Lennie made me realize that there can be only 30 subqueries per query, so this kind of search mechanism won't work. I have currently stalled my work on it and am thinking of workarounds. Once we have a fair idea of what to do, I will resume the work on task search.
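One common workaround for the 30-subquery limit (a sketch, not what Melange settled on) is to split a long IN filter into chunks of at most 30 values, run one query per chunk, and merge the results:

```python
def chunked(values, size=30):
    """Split values into lists of at most `size` items, matching the
    datastore's limit of 30 subqueries per query."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def query_in(run_query, field, values):
    """Emulate `field IN values` for any number of values by merging
    the results of one bounded IN query per chunk. run_query is a
    hypothetical callable wrapping a single datastore query."""
    results = []
    for chunk in chunked(values):
        results.extend(run_query(field, chunk))
    return results
```

The trade-off is multiple round trips and client-side merging, which is why combining several multi-selects this way gets expensive quickly.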

Amidst all this work, I have been fixing bugs and making some implementation changes, like Task state transitions. This week, I will mostly be working on writing a starter manual, fixing bugs, and converting the student data model to include whether a student belongs to high school or university.

Last but not least, all my work so far is available on my demo instance at http://melange-madhusudancs.appspot.com

Tuesday, August 4, 2009

Tenth Week Newsfeed Development Update

At the end of last week, I submitted a major refactoring of the newsfeed component, which now includes a separate subscription component that manages e-mail subscriptions (which can contain more private information than the public ATOM feed).

After receiving feedback on this latest patch, the plan is to commit the newsfeed. Before I commit, one very important change is to taskify the sendMail method of the mail dispatcher. Obviously, this doesn't affect just the newsfeed component, but it's very important for newsfeed subscriptions, as I discovered after deploying and testing on my jamstage.appspot.com staging instance. The e-mail notification processing is done from within a task, and when a task returns a 500 error the task is retried. Because the task often sends out many e-mails at once, I hit an OverQuotaError from this task, which re-triggered the task, and it wasn't long before my e-mail inbox was flooded.

This has been a good lesson about working with the Task Queue API. It's good that a task will retry if it fails, but this means that only atomic (and idempotent) functionality should be contained within each task, so that if it fails, the retry will not bring about undesired behavior.
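An idempotency guard for such a task can be sketched like this (the structure is hypothetical, not the Melange code): remember which notifications already went out, so a retried task skips them instead of re-mailing.

```python
def process_notifications(pending, already_sent, send_mail):
    """Send each pending (notification_id, recipient) at most once,
    even if the task is retried after a mid-run failure; already_sent
    stands in for durable state that survives the retry."""
    for notification_id, recipient in pending:
        if notification_id in already_sent:
            continue  # a retry: this one already went out
        send_mail(recipient)
        already_sent.add(notification_id)
```

With this shape, a retry after a crash resumes where the first run stopped instead of flooding inboxes.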

Once the newsfeed is committed, I'd like to finish the UpdateLogic, which is necessary for automatically compiling subscriptions. This means that we need a list of users with read access. I describe the problem further on the NewsFeed wiki page:




The subscriptions logic contains an UpdateLogic class that is meant to automatically compile a list of users with read access to an arbitrary entity. Unfortunately, setting subscriptions manually is the only option right now, since the automatic subscription handler is not nearly where I hoped it would be by now. The idea is very simple: just subscribe users to the entities where they have read access. But the reality has been much more difficult to implement without creating an entirely new infrastructure, and I'm entertaining the idea of simply relying on a manual interface (subscribe-by-star). The takeaway is that perhaps the access component should be abstracted into a high-level API, because it appears to have gained dependencies and complexities that prevent it from being useful to the news feed logic.



Finally, I plan to implement a simple PubSub example, and do more in-depth testing (especially of edge cases).

GSoC 2009: GHOP - Tenth Week Status Update!

Hello everyone,
Last week was a very nice week for me :P, as most of the work got completed, or has only very minor issues left to fix. I started the week by completing the GHOP Task workflow, and I finished it. The complete workflow for GHOP Tasks is up and running on my demo instance at http://melange-madhusudancs.appspot.com.

Later I worked on GHOP Task subscription and the notification system. Now a user can subscribe to a GHOP Task and get both on-site notifications of Task updates and e-mails. This too is deployed on my demo instance. Along with that, I worked on GHOP Task work submissions for students who have claimed a Task and want to submit it. The work submissions are listed just after the comment box and actions, followed by the list of comments.

In addition, I added permalinks to the comments and work submissions displayed along with a GHOP Task. One can use those links to scroll directly to that comment or work submission on the comments page.

This week I will mostly be working on tag views and GHOP Task search filters.

Monday, August 3, 2009

GSoC09: Statistics module 7th weekly report: Organization home page map and fixes

Last week was an "exciting" one, because Lennie and I worked closely together toward the common objective of solving a long-standing Melange issue: we now have again (as last year) a cool map for each organization showing the connections between students and mentors!

A nice and obvious example is the Melange organization home page, which shows our own GSoC connections :).


That took a lot of work, mainly because we had to deal with privacy issues and legal stuff: we obviously couldn't make people's exact lat/lon publicly available :) So the agreement is for "city level" precision, which is something quite vague.

I proposed something more mathematical, like dropping precision on the given lat/lon pairs (for example keeping one decimal place, so about 40000 km (Earth's circumference) / 360 (degrees) / 10 = roughly 11 km of precision), but we obviously had to do something definitely compatible with the legal agreement, so...
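The arithmetic is easy to check: rounding to one decimal place keeps a coordinate to 0.1 of a degree, which along a meridian is about 11 km. A quick sketch:

```python
EARTH_CIRCUMFERENCE_KM = 40000.0

def precision_km(decimals):
    """Worst-case position error, in km along a meridian, from
    rounding a coordinate to the given number of decimal places."""
    km_per_degree = EARTH_CIRCUMFERENCE_KM / 360.0
    return km_per_degree * 10 ** -decimals

def blur(lat, lon, decimals=1):
    """Reduce a lat/lon pair to roughly city-level precision."""
    return (round(lat, decimals), round(lon, decimals))
```

(Along parallels the error shrinks with latitude, so this is the upper bound.)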

Lennie ran a script that he made with geopy to reset all lat/lon pairs in Melange to city level, and then... all the stuff I had made in JavaScript in the meantime (commits 9c7d31b824, c76c9c5916, 40cf7eaa03) came to life :)

We also took care of showing only "one world": if the screen resolution was too high, the div containing the map would stretch horizontally, showing more than one world :) So I've set the exact dimensions of the world (at a specific zoom) on the div (commit 37abba547f). More complications (like zooming exactly to contain each organization's markers without showing the whole world) are postponed for later, if needed.

More about the statistics module: during the last week, apart from talking with Daniel Hans about the steps to take for the last two weeks of GSoC, I've fixed some style issues that Pawel pointed out on the dev mailing list (I still have to fix what Daniel Hans and Lennie sent).

Then I worked on integrating the Google Chart API visualization into the list of available visualizations, which is needed for some kind of graph exporting. The problem is that, when I was programming everything, I didn't take into account (should I have?) that one visualization can contain "sub-visualizations", because that's what happens with the Google Chart visualization! I mean, the constructor is the same, "ImageChart", but the real kind of chart to display has to be put in the options! That led me to change some logic on the JavaScript side, and finally Google Charts are also available, along with an "Export" button (which appears and disappears so as to be shown only when the Google Chart visualization is selected) that basically shows an alert box with the HTML source to copy and paste into a page to show that chart.

After that, I worked on a bug that prevented the available visualizations (provided by the cool backend work Daniel Hans did during the week) from showing when a widget was initially created (they were shown only after the dashboard was reloaded). That was a little hard, because I also had to change some things on the backend to send the data along with the statistic up front, instead of only with the saved chart as before.

That's all folks! :)

Statistic Module: Ninth and Tenth Weeks Report

This is my blog update which describes the last two weeks of my work.

I mostly worked on supporting visualization options from the backend side, but to better understand my work, let us take a look at how visualizations are handled by the frontend. As mentioned a number of times, we use the Google Visualization API. In order to display a visualization, there has to be a script which is passed a dataTable object. The dataTable contains the actual data. Of course we could create the object on the client side (for example by sending raw JSON with the statistic data and parsing that string to construct the dataTable), but we decided to send dataTable objects directly from the server.

As you could see in some previous revisions, the functions responsible for that were hardcoded in the Python code. It was an easy and quick solution at the beginning of the program, but we do not want anything like that in the final version.

When I was dealing with the stats-collecting functions, I added a new JSON attribute to the statistic model which contained information on how to collect a particular statistic. This time I did a very similar thing: a new field, chart_options.

This is also a JSON string. Its format is still changing, but the field which refers to dataTable creation is "description". It describes all the columns for the visualization and is used by the dataTable constructor.
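As a sketch of how code might read that field (the JSON schema shown here is illustrative only, since the format was still changing):

```python
import json

def columns_from_chart_options(chart_options_json):
    """Parse the chart_options JSON field and return the dataTable
    column description as (name, type) pairs. The [name, type]
    pair layout is an assumed, illustrative schema."""
    options = json.loads(chart_options_json)
    return [(name, col_type)
            for name, col_type in options['description']]
```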

For the last couple of weeks I have been working on adding support for different visualizations for a single statistic. Some initial work had been done by Mario, who added a select list inside a widget. However, the list of options was static and contained all possible visualizations. That was not sufficient: for example, we want a GeoMap for statistics like 'Students Per Country', but certainly not for 'Students Per Degree'. The list of visualizations applicable to a single statistic should be remembered in the data model. I used the chart_options field again.

Having completed that, I started to work on the next issue. In the final goals list, the mentors asked for some statistics for students with projects / students without projects / all students. I added those statistics a few weeks ago, but each statistic ended up as three different entities. That was pretty much sufficient, but it also meant three different widgets. I really wanted to add the possibility of displaying all three options in a single visualization.

Now all three kinds of statistics are stored in a single entity whose final_json_string has, for example, the following format:
{undergraduate: [1200, 700, 500], master: [100, 40, 60], phd: [100, 90, 10]}
The first number is the count for all students, the second for students with projects and the last for students without projects.

So for every entity, one can define multiple virtual statistics by choosing a subset of columns. For example, we have defined 4 statistics:
Students Per Degree (all) [column 1]
Students Per Degree (with projects) [column 2]
Students Per Degree (without projects) [column 3]
Students Per Degree (cumulative) [columns 1, 2, 3]
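A minimal sketch of how these virtual statistics could be projected out of the stored entity (column indices below are 0-based, whereas the list above counts from 1; the helper names are hypothetical, not the actual Melange code):

```python
import json

# final_json_string in the format described above, with quoted JSON keys.
final_json_string = (
    '{"undergraduate": [1200, 700, 500],'
    ' "master": [100, 40, 60],'
    ' "phd": [100, 90, 10]}'
)

# Hypothetical definition of the four virtual statistics: each is a
# subset of the stored columns (0 = all, 1 = with projects, 2 = without).
VIRTUAL_STATISTICS = {
    'Students Per Degree (all)': [0],
    'Students Per Degree (with projects)': [1],
    'Students Per Degree (without projects)': [2],
    'Students Per Degree (cumulative)': [0, 1, 2],
}

def virtual_statistic(raw, columns):
    """Project the stored per-degree rows onto the requested columns."""
    data = json.loads(raw)
    return dict((degree, [row[i] for i in columns])
                for degree, row in data.items())

with_projects = virtual_statistic(
    final_json_string,
    VIRTUAL_STATISTICS['Students Per Degree (with projects)'])
# with_projects == {'undergraduate': [700], 'master': [40], 'phd': [90]}
```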

What is interesting (or not :-) is that we can set up a different list of possible visualizations for each virtual statistic. This matters, for example, for Students Per Country: for the first three kinds we can use a GeoMap, but it is not possible to use it for the fourth one.

Recently I have worked on adding the ability to switch the kind of virtual statistic on the /statistic/show page. That is already done, but now I need to add support for it on the dashboard page, which is a little more sophisticated. I expect to have it done by Wednesday. That day I am also going to send new patches with Sverre's suggestions taken into account.

That is basically all from me. As usual, I also spent some time fixing older bugs and the like, but that is not worth describing in detail.

Wednesday, July 29, 2009

GSoC09: Statistics module 6th weekly report: idling, then back to work :)

It's been a while since my last post. The reason is that I had to travel to Italy (and then back to Dublin) for a university exam that unfortunately kept me busy longer than I had foreseen.

During those days I basically stayed in sync with dev list posts and was almost always idling and pingable in the IRC channel. So, even if I was not very productive, at least I maintained constant communication with the community.

Apart from that, I've made a very small commit with a JavaScript style fix and sent some patches (which Pawel is reviewing) to the mailing list, just to be sure that my first contact with the Python side of the architecture is not totally clueless :P

I was back up and running for part of the last week, during which I generalized the code that produces a visualization in a widget, so the visualization can now be selected from a drop-down menu. This way, even though every widget is created as a Table, it can be switched later to a Pie Chart, a Geo Map or an Area Chart.

Apart from that, I coded (and am still coding right now) the JavaScript side to get the Organization Home Page map (with markers for students/mentors and their links) finally working... Lennie, expect the patch soonish :P

Tuesday, July 28, 2009

GSoC 2009: GHOP - Ninth Week Status Update!

Hello everyone,
A week of fair productivity again. I started out this week by implementing the actions that must be available to all users. The complete GHOPTask workflow has now been implemented. The code is also up and running in my demo instance at http://melange-madhusudancs.appspot.com.

To explain a bit about how it all works: any user, whether or not they are registered as a GHOP Student (but excluding Org Admins, Mentors and the Program Host), can request to claim a task if they have not already claimed one. Once this is done, the task is locked and no other user can claim it. Say there is a limit of at most one simultaneous task, as there is with GHOP; the user then cannot request to claim any more tasks, and the same holds for registered students. A Mentor or Org Admin of that org (irrespective of who created or modified the task) can then accept or reject the request, and the student can also withdraw from the task. If the student withdraws or the Mentor rejects, the task goes to the Reopened state.

If the student's claim is accepted, an automatic-update App Engine task is spawned, so that if no work is submitted by the student, the task state changes to Action Needed, thereby extending the deadline by 24 hours. If this deadline also passes, the task is automatically reopened by another App Engine task spawned by the previous ActionNeeded task.

However, if the student wishes to submit their work, they can provide a link to it in the Submit Work box along with a comment and an action. The Mentor either accepts the work by closing the task or requests more work, optionally extending the deadline. The student can then submit their work again.
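The workflow described in the last few paragraphs can be summarized as a transition table. This is only my paraphrase of the post; the state and action names in the real GHOPTask code may differ:

```python
# Hypothetical state machine for the GHOP task workflow described above.
# State and action names paraphrase the blog post, not the actual model.
TRANSITIONS = {
    ('Open', 'request_claim'): 'ClaimRequested',
    ('Reopened', 'request_claim'): 'ClaimRequested',
    ('ClaimRequested', 'accept_claim'): 'Claimed',
    ('ClaimRequested', 'reject_claim'): 'Reopened',
    ('Claimed', 'withdraw'): 'Reopened',
    ('Claimed', 'deadline_passed'): 'ActionNeeded',   # spawned App Engine task
    ('ActionNeeded', 'deadline_passed'): 'Reopened',  # 24h grace period over
    ('Claimed', 'submit_work'): 'NeedsReview',
    ('NeedsReview', 'accept_work'): 'Closed',
    ('NeedsReview', 'request_more_work'): 'Claimed',
}

def apply_action(state, action):
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError('%s is not allowed in state %s' % (action, state))
```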

Along with implementing this workflow, I have fixed a few 505s this week and the mentor_list field validation on the task edit form. Also, the Approve button no longer appears on the edit form for tasks that have already been approved. I am currently working on subscriptions and sending out notifications to the users subscribed to a task. This will be done by this weekend, and then I will start working on implementing views for tags based on Taggable.

Monday, July 27, 2009

Ninth Week NewsFeed Update

With the code for the NewsFeed module nearing completion, I'm beginning to focus more on testing and refactoring to increase the readability and general orthogonality of the NewsFeed code. The newsfeed module is unlike some other components in the codebase in that, outside of the core model, logic, and view code, there need to be snippets in each logic and view module that incorporate the newsfeed in some way.


That famous Voltaire quote, "the perfect is the enemy of the good", is very relevant to the testing stage of development. While it can be reasonably easy to be sure that a feature is 95% bug-free, determining that it is 100% bug-free is exponentially harder, since it requires trying every possible permutation (or at least every significantly different one), which makes automated testing practically necessary. This is especially true for an open source project, where someone else will likely pick up where you left off, and you don't want them to treat your code contribution in a cargo-cult manner.

Seed_db serves as a very basic smoke test, since I've added onCreate hooks that in turn create news feed items. While this smoke test does test the creation of new feed items, it does not test the consumption of news feed items, which is at least as important as their creation. Therefore I'll be working on unit tests this week, modeled after the test examples already in the codebase.


I've also done some refactoring, with help from Sverre's code review last week. Instead of a feed item being created for each sender-receiver pair, there is now only one entity created per feed update, with a list of receiver keys. This actually makes it easier to determine which updates should be sent to which users, and reduces the number of entities that need to be created. Because it makes the feature simpler, it is also likely to contain fewer bugs and to require less testing and maintenance.
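A plain-Python sketch of the before/after entity shapes (the real code uses App Engine db.Model classes, so these dict-based stand-ins and their field names are assumptions):

```python
# Before the refactoring: one entity per (sender, receiver) pair.
def make_feed_items_per_pair(sender_key, receiver_keys, payload):
    return [{'sender': sender_key, 'receiver': r, 'payload': payload}
            for r in receiver_keys]

# After: a single entity per update, carrying the list of receiver keys.
def make_feed_item(sender_key, receiver_keys, payload):
    return {'sender': sender_key,
            'receivers': list(receiver_keys),
            'payload': payload}

def updates_for(user_key, feed_items):
    """Finding a user's updates is now a membership test per entity."""
    return [item for item in feed_items if user_key in item['receivers']]
```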

Monday, July 20, 2009

Eighth Week Update

Before I discuss the progress of the News Feed module, I wanted to recommend an old Joel Spolsky post. Spolsky is the CEO of Fog Creek Software (makers of FogBugz) and co-founder of Stack Overflow, and his post on Evidence-based Scheduling has got me thinking about the next steps for evaluations, statistics, and tasks.

Evidence Based Scheduling (by Joel Spolsky)






Mentors can now evaluate students based on their past performance, and evaluations have been built for an arbitrary number of cycles, so that in the future an evaluation could theoretically be given once a week or so. These evaluations are nonetheless discrete, backwards-looking measures.

An improvement to this feature would be more continuous data. Since SOC is built specifically for software projects, most projects could likely make use of a post-receive hook or some other form of automation. The goal would be not to measure a student's past performance, but to make predictions, such as whether a project will be completed on schedule. This would allow earlier intervention and make the mentoring process generally more data-driven.

This would also help us measure performance at a finer level. As Spolsky notes, as the size of a task decreases, the ability to accurately measure it increases:

When I see a schedule measured in days, or even weeks, I know it’s not going to work. You have to break your schedule into very small tasks that can be measured in hours. Nothing longer than 16 hours.

This forces you to actually figure out what you are going to do. Write subroutine foo. Create this dialog box. Parse the Fizzbott file. Individual development tasks are easy to estimate, because you’ve written subroutines, created dialogs, and parsed files before.





In the last day, I've posted a new patch for NewsFeed tasks, and I should be posting another shortly that completes e-mail notification functionality.

One major piece of functionality that has been holding up this second patch is an access check that works for the widest range of feed item senders and receivers (the NewsFeedModule wiki page has a definition of these). Because this feature needs to work with a variety of model schemas, I wanted to find the lowest common denominator. Of course, this should be the use of scope logic to determine the relationships between entities. But the original NewsFeed specification envisioned some types of updates that are difficult even with scope logic. Because of the complexity of this feature, I think it's all but necessary for me to provide some smoke tests that make sure these checks are working properly.

And the next step in feed-customization is to provide a UI and logic for users to set their own feed preferences for e-mail notifications (the plan as of now is to allow e-mails to be customized but for XML feeds to only include public info and not be customizable).

I should soon be ready to focus more on the secondary NewsFeed features, and I'm especially interested in prototyping how we could use post-receive hooks and real-time push notifications to make the newsfeed feature more useful for collaboration.

GSoC 2009: GHOP - Eighth Week Status Update!

Hello everyone,
I have spent most of the week relocating and figuring out how to get my internet connection working. Most of it is working now, except for Skype and IRC. Setting up internet did not turn out to be as easy as I had thought :(. Not a very satisfactory week, this one.

I spent some more time fixing 505s in task create/edit for mentors. I reviewed some jQuery plugins for creating/editing/deleting tags and found one called jquery-tagger that is suitable for this task. It has yet to be integrated. After this I started working on the Task Comment infrastructure; users are now able to post comments.

While working on the commenting infrastructure, I also reorganized the Task public view page. It now has an actions drop-down (built dynamically based on context and user) and a single submit button, instead of different buttons for different actions. This tightly integrates actions and comments. Only logged-in users can post comments and perform actions, if any are available. The action common to all users is "Comment without any action". In addition, if a student is eligible to claim a task, they get the "Request to claim" option in the list; if they have already claimed the task, they get the "Withdraw from the task" option.

An Org Admin or a Mentor will get an option to "Accept" or "Reject" requests if a student has requested to claim a task. In all of the above cases, the comment is optional: one can perform an action without posting any comment. The "Changes" line per action, as shown in the code.google.com issue tracker, is being recorded in the datastore, but I am still working on the display of comments.

My work for the next week will be to complete all the actions that must be available to every role in the Task workflow: Work Submission for the student who has claimed the task, and the NeedAction, NeedWork and Closed actions for the Org Admin or Mentor, in addition to completing the commenting infrastructure. If time permits, I will start working on task notifications, which were put on hold last week due to some adjustments in how Task Queue APIs are being used in Melange.

Statistic Module: eighth week update

Here is the latest update on my work on the Statistic Module.

Basically, during the last week I focused almost entirely on the backend side. As I already wrote on the blog, I was working on an abstraction layer for statistics. The goal was to separate statistics from the Python backend code. The situation was that for each single statistic, we needed at least one (but in practice a few) functions to process it. For a request to collect a statistic, the logic looked for a function named 'update' + statistic_link_id. At the beginning this was an easy and rather convenient solution, but as the number of statistics grew, the statistic logic ended up with an awful lot of very similar, short functions. Worse, every statistic had to be hardcoded in the source. As James Crook pointed out in one of his emails, this was a huge pitfall, because every time we wanted to add a new statistic, we had to add new code and *redeploy* Melange.

Thus, I designed a solution to store this statistic-specific information as a JSON string in the statistic model: a new field, "instructions", was added. I will describe the meaning of all the parameters and the dependencies between them in the upcoming days. Generally, some parameters that were previously set by statistic-specific functions are now retrieved from the instructions.
Let us take a look at the following example. We have the "Students Per Country" statistic: all students are iterated through, and for each of them we check their country and update a choices list.
Before calling the collectStat function, we needed to set up at least two things:
* logic: student_logic (because we iterate over students)
* choices: soc.models.countries.COUNTRIES_AND_TERRITORIES
Of course we needed a special function named updateStudentsPerCountry, where we could easily set those parameters. We could live with that, but now let us say we want to add "Mentors Per Country". Previously, we needed to:
1) Add an appropriate entity to the data model.
2) Add an updateMentorsPerCountry function.
3) Set logic to mentor_logic.
4) Set choices to soc.models.countries.COUNTRIES_AND_TERRITORIES.
5) Redeploy Melange.
A lot, isn't it?
Now all that stuff is done by parsing instructions.
Let us take a look at instructions for students_per_country:
instructions = {
"params": {"fields": ["res_country"]},
"field": "country",
"type": "per_field",
"logic": "student"
}
The most important field is "type", because it determines that we are dealing with a "per field" statistic. Actually, all statistics before last week were of the "per field" kind.
Then we have "logic", which determines which logic will be used to iterate through entities. To get the actual logic, we look up the value in the logics_dict dictionary.
We still need choices: so we have the "field" parameter, and we look up the choices list in the choices_dict dictionary.
Last but not least, there is "params". It is a dictionary which is passed to the collectStat function. Previously it was also set by statistic-specific functions.
And basically that is all. Let us consider what we have to do now in order to add "Mentors Per Country":
1) Add an entity to the data model with the same params as for "Students Per Country", but with "logic": "mentor" instead.
And that is all! No changes to the source code are necessary (assuming we already have a value for "mentor" in logics_dict).
I hope you will agree with me that it is simpler now :-)
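To make the mechanism concrete, here is a minimal, self-contained sketch of instructions-driven collection. The logics_dict and choices_dict registries and the entity shape are illustrative stand-ins; the real collectStat iterates App Engine entities in batches:

```python
# Stand-in "logics": each returns an iterable of entity dicts. In the
# real module these would be Melange logic objects querying the datastore.
logics_dict = {
    'student': lambda: [{'res_country': 'Poland'},
                        {'res_country': 'Italy'},
                        {'res_country': 'Poland'}],
}

# Static choices lists, keyed by the "field" instruction parameter.
choices_dict = {
    'country': ['Poland', 'Italy', 'Ireland'],
}

def collect_per_field(instructions):
    """Count entities per value of the field named in the instructions."""
    assert instructions['type'] == 'per_field'
    entities = logics_dict[instructions['logic']]()
    field = instructions['params']['fields'][0]
    counts = dict((choice, 0)
                  for choice in choices_dict[instructions['field']])
    for entity in entities:
        value = entity.get(field)
        if value in counts:
            counts[value] += 1
    return counts

students_per_country = collect_per_field({
    'params': {'fields': ['res_country']},
    'field': 'country',
    'type': 'per_field',
    'logic': 'student',
})
# Adding "Mentors Per Country" is now just a new entity with
# 'logic': 'mentor' -- no new collection code is required.
```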

The next thing I worked on during the last week was statistics which have no fixed list of choices. I had actually worked on that some time before, so I had some concepts, but because of the instructions usage I had to make some changes. Here is how it is done now.

Let us say we want a Student Proposals Per Organization statistic. Before we start to iterate through student proposals, we need a list of all organizations. So we simply iterate through all organization entities (also in batches) and create a list of link_ids.

The question is how we know whether a statistic needs its list of choices dynamically collected, and what to collect. The answer: via the instructions, of course ;-) The only effort is to add a "choices_logic" parameter to the dictionary.

As I said, I will try to provide more information about the rest of the parameters soon on the wiki. The most important one is "checker", which allows filtering the iterated entities by some criteria. For example, we can process only those students who have a project assigned.

Some time ago Pawel and Sverre had a meeting about the final goals for the Statistic Module project. They put together a list of statistics which they would really like to have. Some of these statistics were already present, but some were not, so I also worked on them during the last week.
The new available statistics are:
* Mentors With Project Per Country
* Mentors Without Project Per Country
* Organizations Per Program
* Student Projects Per Country
* Student Projects Per Continent
* Student Proposals Per Country
* Student Proposals Per Continent
* Students With Project Per Country
* Students Without Project Per Country
* Students Per Graduation Year
* Students With Project Per Graduation Year
* Students Without Project Per Graduation Year
Note: As I mentioned above, one of the instruction parameters is "checker" which allows to collect all those "with/without" statistics.
* Number of Students
* Number of Mentors
* Number of Student Proposals
* Number of Student Projects
* Number of Organization Admins
* Number of Mentors With Projects
* Number of Students With Projects
* Number of Students With Proposals
The last batch of statistics is, let us say, "per nothing". I mean, of course we could have them "per program", but do we really want to?
So I put them in one single entity, "GSoC2009 Overall". Its "type" in the instructions is "overall", and such a statistic consists of many sub-statistics. They also use instructions, so it is quite easy to add new ones.

Currently, the only small statistics that are supported have type "number", so now I am going to add another kind, "average", because there are still two statistics left on our mentors' wish list:
* Average number of projects per mentor
* Average number of student proposals per student
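A possible shape for such an "average" statistic, with one logic providing the items to count and another the population to divide by. The instruction keys here are guesses at what the schema might look like, not the actual implementation:

```python
# Stand-in logics returning lists of entity keys; in Melange these would
# query the datastore. The registry names are illustrative assumptions.
logics_dict = {
    'student_proposal': lambda: ['p1', 'p2', 'p3', 'p4', 'p5'],
    'student': lambda: ['s1', 's2'],
}

def collect_average(instructions):
    """Average number of items per entity of the 'per' population."""
    assert instructions['type'] == 'average'
    items = logics_dict[instructions['items_logic']]()
    population = logics_dict[instructions['per_logic']]()
    if not population:
        return 0.0
    return float(len(items)) / len(population)

# e.g. average number of student proposals per student:
avg_proposals = collect_average({
    'type': 'average',
    'items_logic': 'student_proposal',
    'per_logic': 'student',
})
# avg_proposals == 2.5
```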

As I said, I worked almost entirely on the backend; the only thing I did on the client side was reducing the number of columns to 2, as the mentors suggested.

One last thing: last week I sent a first bunch of patches. Lennie and Sverre, thank you for the reviews. I will take them into account and try to send new ones by Wednesday.

Monday, July 13, 2009

Sixth/Seventh Week Status Update

It appears as though there are now some diffs about which week we're on! It depends on whether we're indexing from 0 or 1.

This week I find myself working on two different tasks with a very similar theme.

The first task is in the News Feed module. When we generate a feed item, we do it on the logic side: the onCreate, onUpdate, and onDelete hooks that create the feed item are in the model logic layer, and the feed item process itself (now being converted to a scheduled Task) is also in the logic layer.

However, the payload for a certain feed item requires access to view params in many cases so that model-specific payload templates can be re-used to render the payload for the activity item.

And you could say, "yes, logic can't retrieve from view, but isn't the view retrieving from the logic anyway?" This is true, and it is why this is normally not a problem. But the problem arises when the view for one model type (site) wants to access view params for another model type (document, or survey).

This is also a challenge for the view that lets org admins see results from the manage-a-student-project page. In this situation, the view model is StudentProject, but the most obviously DRY approach would be to also instantiate a view class for the survey view models, so that we could render a list of ProjectSurveys or GradingProjectSurveys.

Short version: the view layer seems somewhat impenetrable from the logic layer, as it is tied up in the HTTP request. As a view should be. But what do we do about these edge cases?

I've been impressed with the orthogonalness (is that a word?) of Melange, so it's entirely possible that I haven't yet figured out how to properly instantiate a view from the logic -- or why I shouldn't want to do that in the first place.


On an unrelated note, another good learning experience has been in how I've been thinking about privacy with regard to how news feed activity can be consumed. At first I was mostly alright with the idea of generating a hash key for hard-to-guess URLs. But news feed activity is mostly HTML, content, and keywords; in other words, if we're not careful, it could very easily end up crossing paths with a robot (edit: not if we set our robots.txt correctly!).

Statistic module: seventh week update

This is my update of the work I did during the last week.

Basically it was a pretty intensive week. Not only was it time for midterm evaluations, when we were expected to show "a working demo", but we were also integrating the statistic dashboard page with the Python backend, which was not a piece of cake for me, as it required a lot of JavaScript/jQuery work.

Let me describe shortly what was achieved in this area. Before that week the statistic dashboard page already had some functionality: users could create new widgets, but they could only move them, and after the page was reloaded they always reappeared in the third column in random order. The first feature I added was an option to collapse widgets, so that the user can shrink the ones he or she does not need.
Here is an example of standard widget:

And here you can see how the widget looks when it is collapsed.

As you may also notice, the widgets below are moved upwards.

The second thing was to allow users to remove widgets from their dashboards. One may not want to keep a widget forever... Now widgets can be easily and intuitively removed by pressing the X button in the upper right corner.
At that point it was already good: we had two important functionalities added, but they were still worth next to nothing, because the changes were gone whenever the dashboard page was reloaded. So the most important issue was to have all the settings downloaded from the server, and saved back whenever something changed. Of course it all had to be done in an AJAXy way, because the whole point of the dashboard page is that it should not be reloaded. jQuery requests were used, and I have to admit that I am quite impressed by them now that I have got to know them.
The next thing was to remember widget positions: we could create, remove or collapse widgets, but they always reappeared in the third column in random order :-) Once I learned that the sortable library can trigger events after a movement is finished, it was simple to remember column changes. I had some problems with positions within columns, but it got easier after Mario showed me Firefox tools like DOM Inspector, which lets you see the real DOM structure. Thank you for that! :-)
As a result, all changes that a user makes are saved on the server, so you can do whatever you want and should see the same dashboard after reloading the page.

Then I worked for some time on adding support for the Task Queue API to Melange. I know it is not part of the statistic module, but in my opinion it is important for the whole project. Basically, I took care of the comments that Pawel sent. There is just one thing left to do, so I will probably send a new patch tomorrow.

I also started to work on an abstraction for the statistic gathering functions. It is almost ready, but I will describe it more next week, because I expect some changes.

Tomorrow I am also going to send the first patches for the statistic module. After they are reviewed I hope to push them to the main repository.

GSoC 2009: GHOP - Seventh Week Status update!

Hello everyone,
Last week was not very productive. Since it was midterm evaluations week, I updated my demo instance and pushed some new patches to my patch queue at bitbucket.

I started out this week by smoothing out some issues related to a student's task claim, like putting message headers about the status of the task, among other things. The task page shows whether the task is open/reopened (available to be claimed) or claimed. If it is claimed by the student viewing the page, it provides the functionality to withdraw from the task.

Other than this, I have spent the week mostly reading the Task Queue API docs, existing Task API examples and code, and trying out a few small snippets outside Melange. I have also been fixing bugs (505s) that Lennie found in my demo instance while reviewing, repairing the Taggable-mixin patch for integration into Melange (I have also sent the patches to the dev list), and implementing small functionalities Lennie suggested on the dev list.

Next week, I will be working on adding Task API code to automatically update the status and deadlines of tasks, and on the mentors'/OrgAdmins' side of task management: "Accept Student Claim", "Reject Student Claim" and related things. If possible, I will start working on the commenting infrastructure for tasks.

Friday, July 10, 2009

Statistic module: sixth week update

This is an update of my work on statistic module for Melange.

Basically, for the last week I studied JavaScript. That mostly covered more advanced programming techniques like anonymous functions ((function () {})()) and closures, learning the jQuery library, and learning how it is all put together in the Melange source code. I am very grateful to Mario for all his explanations and for giving me links to some very interesting resources.

So my first JavaScript task was to make the 'Task' button on the collect_stats page work in an AJAXy way. Before, it started the stats collecting task and the page was reloaded, which was completely unnecessary. I changed it so that the Task button generates an XMLHttpRequest, and a message is shown after a task is successfully started. Work on that button took me more time than I expected. Anyway, it was my first jQuery job which I did mostly on my own. Before that week JavaScript was rather like a strange Java to me, and now I am discovering more and more of its advantages. I am sure that my efficiency will improve (and based on this week, I am right :-)

Then I started to help with work on the statistic dashboard page. However, the first task I assigned myself turned out to be too ambitious. I wanted to fix a bug where, when a widget with a pie chart visualization is moved, the pie chart disappears. I could not find the place in the code responsible for it; the odd part was that only the pie chart disappeared, while everything with tables was fine. Fortunately this bug does not have a very high priority at this stage, so I put off working on it and moved on to some simpler but more important tasks.

On the backend side we mainly worked on integrating it with the frontend. Some methods were added to the statistic_chart view; they were needed to generate JSON responses for the frontend's AJAX calls. Most of these methods were implemented by Mario, and I hope it helped him get more involved in the Python side of the module.

Apart from direct work on the statistic module, I created a wiki page on introducing the Task Queue API in Melange and how I imagine it. Thank you for all the comments; I will try to work on it and improve it as soon as possible.

Wednesday, July 8, 2009

GSoC09: Statistics module 5th weekly report: backend/frontend ajax communication

This week has been focused on backend/frontend integration for the statistics dashboard and on helping with survey module bug fixing. Apart from that, I feel this week has been very productive because I've begun to work more actively on the Python backend, and Daniel Hans has made a similar start with an awesome commitment on the frontend JavaScript side. I think this is the major step forward in our projects since we began committing to the same repository and organizing the work with meetings and issue tracking.

As for my work, the week started with introducing Daniel Hans to the depths of advanced JavaScript programming in general and, in particular, to the JavaScript layer's features and architecture for the statistics module (and, well, if we come up with a good architecture, who knows, it could become the main JS architecture for Melange in the future :)).

On Wednesday I pushed a new demo instance that could fetch a list of available statistics from the backend and create a new widget in the dashboard from it. The biggest part of this work was generalizing the former code so that statistic data is fetched dynamically through an Ajax call instead of being hardcoded as before.

After that, I kept working on the integration to achieve truly AJAXy communication between frontend and backend: once a skeleton was in place, every feature afterwards would be easier to add. So I began working on the data model for the dashboard and charts, and then on the Ajax communication. A dashboard entity is now created for the user on the first visit to the statistic dashboard page, and the correct entity is loaded if one is already present.

On Friday, after working a bit on the Safari bug for surveys, I met James Crook, and we also talked about the project, mainly focusing on what has been done and what can be done in the future with the current architecture. On Saturday I focused on the double-quotes bug in the survey project (the fix has been committed to the main repository).

On Sunday, finally, I helped fix a Safari 4 appearance bug (at least one of the two) in Survey's take.html template (committed to the main repository as well). On the statistics module, I added support for loading/saving widgets in backend chart entities. This way, every time a new widget is created from a statistic, it is automatically saved on the backend, and every time you reload the dashboard the same widgets reappear.

Tuesday, July 7, 2009

GSoC 2009: GHOP - Sixth Week Status update!

Hi everyone,
Another productive week came to an end! As mentioned in the previous week's report, I started out this week by completing bulk approval and publishing of tasks for Org Admins, which took over a day because of changes to the way the Mentors list of db.Key items is rendered and displayed. This is already available in my demo instance at [0].

After completing it, I started working on Taggable-mixin. Taggable-mixin's current architecture supports only one tag per App Engine datastore entity, but GHOP tasks require two tags per entity: one for difficulty and another for task type. So I did some research on how to extend it. Finally, based on suggestions from Adam Crossland (the developer of Taggable-mixin) and Lennie, I wrote property wrappers for creating any number N of tags in the model. I have made all the necessary changes to the extended Taggable-mixin class. The core tagging framework now works as required, but some code cleanup is still needed, along with a nice UI for adding, deleting and editing tags. The extended framework also needs to be integrated with the Melange code base.
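In the spirit of those property wrappers, here is a plain-Python sketch of generating one property per tag kind. The real Taggable-mixin wraps App Engine tag entities; this illustrative version just stores lists on the instance:

```python
def tag_property(kind):
    """Build a property that reads/writes tags of one named kind."""
    attr = '_tags_' + kind

    def getter(self):
        # Untouched tag kinds default to an empty list.
        return getattr(self, attr, [])

    def setter(self, values):
        setattr(self, attr, list(values))

    return property(getter, setter)

class GHOPTask(object):
    # Two tag kinds per task, as GHOP requires; adding a third is one line.
    difficulty = tag_property('difficulty')
    task_type = tag_property('task_type')

task = GHOPTask()
task.difficulty = ['hard']
task.task_type = ['code', 'documentation']
```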

After this, I started working on the student side of tasks. So far, the UI to view and claim a task is ready, and students can claim tasks. The restriction to at most the maximum number of simultaneous tasks allowed in the program configuration is also done, as are other minor details like storing the student reference (if already signed up) and storing the user reference. A few things remain: the task header showing whether the task is already claimed or closed, and requiring a student to sign up before claiming the next task after completing at least one. I will be implementing them this week. Expect the demo instance to be up and running with everything I have promised in a day or two.

This week, I will mostly concentrate on completing the student side of tasks and will move on to the commenting infrastructure for tasks. If everything goes well, I may also spend some time integrating Tags with the Melange code base and converting student datastore entities to support high-school-level students using the Task APIs.

[0] - http://melange-madhusudancs.appspot.com/

Monday, July 6, 2009

Sixth Week Update

It's great to finally see surveys in the wild. While there are a few rough edges, the feature should serve admirably, and the thorough refactoring that has taken place over the last week and a half greatly reduces the amount of technical debt and will make it very easy to add new features and tweaks through subclassing.

There are two outstanding requirements for surveys. And yes, they are indeed requirements; it's as if we are finishing construction of the railroad after the train has already left the station. Fun!

These requirements are 1) nagging notifications and 2) grade activation.

The SurveyRecordGroup entity kind that I've written about before makes both of these much simpler than they would otherwise be. At first, I was thinking that we could just use some logic at runtime to figure out when we'd need to send notifications for untaken surveys and to activate grades.

But after a plethora of hypothetical scenarios, it became apparent that unless we created a new entity kind tying together a given survey, a given project, and the status of that project before and after the survey, we'd open ourselves to all kinds of implausible but still possible bugs that might take much longer to find and fix if not nipped in the bud.

For instance, what if one mentor takes the midterm survey, and then has to drop the project for some reason, so another mentor has to take the next survey? Or what if - as will likely happen for GHOP - there are not simply two surveys, but an arbitrary number 'n' of surveys, with an arbitrary choice of statuses for the project in question?

And what about mentors who must take a given Grading Survey more than once for several student projects? How would we avoid any complications or mistakes that might come up, such as giving the wrong grade to a project?

And finally, how could we be sure - positive, in fact - that we're not activating grades for a given set of surveys more than once? If all the students end up getting paid twice - or not at all - I *really* don't want to be the one responsible.

Luckily, SurveyRecordGroup solves this by recording the status of the project when the survey is taken and after the survey is activated. It's very simple and intuitive to use, as in the following snippet:



# get the SurveyRecordGroup for this survey and survey taker
program = survey.scope
for project in program.student_projects.fetch(1000):
  this_record_group = SurveyRecordGroup.all().filter(
      "project = ", project).filter(
      "initial_status = ", project.status).get()




In addition to nagging notifications and activation, the third related feature that I'll be working on this week is a complete series of smoke tests. We don't strictly need them right now, but they'll make it much easier and safer for someone to develop on top of the surveys and be sure they haven't broken anything.


After finishing up these final survey features, I look forward to dedicating much more attention to the news feed that is my primary GSoC project.

I've already submitted patches of News Feed code and the Newsfeed Module wiki page is fairly up to date, but the current state of the news feed feature is a jumping off point, and I'd love to hear any ideas about news feed features that haven't yet been discussed. If you do have any, let me know!

Tuesday, June 30, 2009

Fifth Week Update - Surveys and Newsfeed

The beautiful chaos of open source development has many implications. It requires a level of cooperation that perhaps goes beyond that which we find in the ordinary workplace. Open source development requires collaboration, and constant communication.

At times, it's a struggle to maintain a good level of communication. This is especially true when the contributors are separated by thousands of miles, and have a wide variety of schedules. But communication in open source development is crucial. A lack of communication results in unnecessary emergencies, and can ultimately mean the difference between success and failure in an open source software development project.

Currently, I'm anxious about how to approach the challenges of these next few days, as we make the final preparations for deploying surveys. Ordinarily, I'd like to be testing, optimizing, and generally winding down development at the 11th hour before deploying a feature.

However, it's surprisingly easy to find yourself in "putting out fires" mode at these times, which can be a scary place to be, considering that last minute changes are often precursors to a variety of problems that can end up affecting the end user experience.

Part of the reason I find myself in this situation is due to my own relative inexperience and my inability to anticipate what features and code organization would be needed and desired. Being a newcomer to Melange development, I'm often unsure of design decisions and end up implementing a feature that doesn't match the functional or structural needs, and either I or someone else has to go back and make changes after dependencies and documentation have already been created.
If I had done a better job of communicating specifications for all my work, I probably would have saved a good deal of development time.

While in "putting out fires" mode, these design decisions can be even more difficult to make, since there is a great incentive to do things the quick and ugly way.

For instance, now that the Survey class has been subclassed as ProjectSurvey and GradingProjectSurvey, there needs to be a way for a user creating a survey to specify whether the survey is a Survey, a ProjectSurvey, or a GradingProjectSurvey.

The dynaform workflow does not suggest an obvious solution, and there are at least several possible implementations. Earlier tonight, I implemented one solution and then realized that my solution, while convenient from a short-term development perspective, would simply lead to more changes and more "putting out fires" mode, and would result in a net loss of time. So I decided to start over, with a different approach.
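One possible shape for the type selection is a plain mapping from a form parameter to the entity class - purely a sketch with made-up parameter values and stub classes, not the implementation that ended up in Melange:

```python
# stub classes standing in for the real Melange survey models
class Survey(object): pass
class ProjectSurvey(Survey): pass
class GradingProjectSurvey(ProjectSurvey): pass

# hypothetical mapping from a form/request value to the survey class
SURVEY_TYPES = {
    'survey': Survey,
    'project': ProjectSurvey,
    'grading': GradingProjectSurvey,
}

def survey_class_for(type_param):
  """Pick the survey entity class from a request parameter.

  Falls back to the plain Survey class for unknown values.
  """
  return SURVEY_TYPES.get(type_param, Survey)
```

The appeal of a table like this is that adding a fourth survey subclass later means one new dictionary entry rather than another branch in view code.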

The new approach is to confirm my decisions through communication, rather than solely self-confirmation. This approach may require some humility and patience, but it's surely better than staying up all night banging my head against the proverbial wall.

Knowing that Melange developers are all intelligent and friendly to the Nth degree, I'm certain that we all have it within us to strive to be more communicative and receptive to communication, and ultimately create a better product in the process.


A quick update about developments for the newsfeed feature -- the changes in the last few days have mostly been style guide changes. Since we've been sweeping through the survey code and updating style, I've used the opportunity to also get my newsfeed code up to style guide standards. I've segregated the news feed into a hibernating working branch now that there's more direct interaction between the soc hg trunk and my personal github repo. Since newsfeed requires changes in many view and logic files in addition to distinct "newsfeed.py" type modules, it's far easier for me to not even attempt to keep these changes in my survey-related work.

Monday, June 29, 2009

Statistic module - 4th and 5th week update

First of all, let me apologize for not updating the blog a week ago. I know there are no excuses - I just kept putting off this task until the whole week had passed.

To begin with, I will try to describe my 'two-weeks-ago' update.

First of all, I added support for some more sophisticated statistics. The question is: how do we define which statistics belong in this group? Let us first take a look at a simple statistic that was already counted correctly, for example Students Per Degree. The collection process for such a statistic looks like this: we have a static list of choices for degrees ('undergraduate', 'master', 'PhD') and we iterate through all student entities (in batches, of course, because of Google App Engine limitations); for each entity we check the degree field and increase the value in the appropriate bucket. But unfortunately, for some statistics we do not know the list of choices before starting the collection. Let us take a look at a statistic like Applications Per Student. In this case we could also have static choices (numbers from 1 through 20, because every student may submit up to 20 applications) and gather data by iterating through all student entities, and for each student iterate through all student proposal entities to check how many of them belong to that student. However, that would be at least as inefficient as bubble sort for large input - its complexity would be
O(proposals * students).
Of course there is a better way: use the list of all students as choices, iterate through student proposals, and for each of them increase the number associated with the student in scope. There is just one problem: we do not have the list of all students at the beginning. But there is a simple workaround: the list of choices starts empty and we add students to it dynamically. When we process a single proposal, we check whether the student in scope is already in the list. If so, we increase his count by one; otherwise we add him and set his count to one. This is quite a smooth solution. Of course I know this algorithm is not linear in the number of proposals, because we look up the student in a dictionary, so in the worst case it works like
O(proposals * log(students)),
where students is the number of students who submitted at least one proposal - but this approach is better than the first one.
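The dynamic-choices counting described above can be sketched in a few lines - a simplified illustration rather than the actual Melange code, with the proposal objects and field names made up:

```python
def applications_per_student(proposals):
  """Count proposals per student, building the list of choices on the fly."""
  counts = {}
  for proposal in proposals:
    student = proposal['student']
    # a student not seen before implicitly starts at zero
    counts[student] = counts.get(student, 0) + 1
  return counts
```

Each proposal is touched exactly once, so the work grows with the number of proposals rather than with proposals times students.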
Anyway, it still has a disadvantage: we have no information about the students who did not submit any proposals - and there are certainly a bunch of them. For other, similar statistics this problem is not so substantial. For example, for Applications Per Organization we may assume that each organization receives at least one proposal; for Students Per Program we may also assume that someone registers for each program (at least those hosted by Google :-), and so on. Nonetheless, Mario and I decided that all statistics should be fully covered.
Therefore, the statistics whose full list of choices we do not know will be collected in the following way: first we collect the full list of choices (also in batches, because their number can be large), and after that we collect the actual data. For example, let us consider Applications Per Student again: first we iterate through all student entities and then, having a full list of students, we iterate through the proposals and match them with the students.
This approach has not been merged into our repository yet, because collecting statistics this way in batches is awkward - the number of batches automatically increases. Still, it is ready and may be merged quickly after a conversation with mentors and/or other developers.

Some other things achieved during that week include creating standard views for statistic entities (create, list, delete) and adding support for a few new statistics.

And last but not least, Mario and I made some important decisions. Firstly, Mario set up the Issue Tracker on our bitbucket repository. Secondly, we decided to organize daily meetings. Thirdly, we postponed the abstraction of statistics until we add support for statistics for surveys.

And now, let me move on to the very last week. First of all, I was sick on Thursday and Friday, so in the end I could not do everything I had planned.
The most important thing is that statistics are now collected using the Task Queue API. I tried to use this API wisely and make the code at least a little bit reusable, because it will probably be useful for other problems Melange encounters in the future. I am going to describe my solution in the wiki and on the dev list; it would be great to get some feedback so as to improve it. The key idea is that we can divide a long job into smaller subtasks and then start a task. When a task executes and finds that the whole job is not completed, it may repeat itself or start a new one. Statistic collection is just one example: as we know, statistics are collected in batches, so we start a task which collects the first batch and saves its data, and because the whole collection is not yet completed, it restarts itself. The user only has to trigger the first task and does not have to click several times.
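The self-restarting pattern can be illustrated without App Engine at all. The sketch below simulates the task queue with a plain Python list; the batch size, state keys and function names are made up for the example and are not the actual Melange API:

```python
def collect_batch(data, state, batch_size=2):
  """Process one batch of work; return True when the whole job is done."""
  start = state['offset']
  for item in data[start:start + batch_size]:
    state['collected'].append(item)   # stand-in for saving statistic data
  state['offset'] = min(start + batch_size, len(data))
  return state['offset'] >= len(data)

def run_task_chain(data):
  """Emulate a task that re-enqueues itself until the job is complete."""
  state = {'offset': 0, 'collected': []}
  pending = [collect_batch]           # the simulated task queue
  while pending:
    task = pending.pop()
    if not task(data, state):
      pending.append(task)            # not finished: re-enqueue the same task
  return state['collected']
```

On App Engine the re-enqueue step would be a real Task Queue insertion, so each batch runs in its own short request and stays within the platform's execution limits.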

This week I also started getting involved in the JavaScript side of the module. Mario is the one who created the whole skeleton, so I had to understand his design. So far I have taken a good look at his code and gone through a jQuery tutorial. Today Mario briefly described the skeleton to me (thank you for that! :) and I am going to write some code on my own very soon.

GSoC09: Statistics module 4th weekly report: dashboard!

While the midterm is approaching, this week has been very productive: I started integrating the JavaScript and Python ends together and began producing the Ajaxy dashboard for the statistics module, pushing forward almost all the enhancements that were planned. It also helped me get better acquainted with the Python end, at least for basic tasks. Furthermore, I've continued helping with JS code review for the survey module and with fixing Melange bugs.

So, here's the story: starting on Tuesday, I set up my Windows partition to try to help with the Organization home Google map bug, but in the end I was not successful, as I couldn't reproduce the bug on my box :( I then created the model, logic, view and template for the Statistics dashboard page. After helping with another code review for the survey module, I fixed (well, almost fixed - discussion is still ongoing) Melange issue 645.

After that, I worked on the skeleton for the dashboard page, borrowing and taking inspiration from the "How to mimic the iGoogle interface" Nettuts tutorial (without widget moving at first), building the foundation to have widgets created dynamically (that way, we can retrieve settings for each chart from the Python end). I also added an OSX-style stack menu (even though we will drop it, as it's not very homogeneous with the rest of the Melange application GUI).

To make the widgets inside the dashboard movable, I had to integrate the sortable and draggable extensions of jQuery UI. But these had already been integrated by the survey module, so I decided to merge with the main branch instead of duplicating work, which introduced me to kdiff3. At the end of the merge, and after some JavaScript work, I finally had the widgets moving on the dashboard!


On Friday, after another code review for the Survey module's JavaScript side, I started a thread with Daniel Diniz (ajaksu) about JavaScript style, and found it very interesting and full of new ideas. That is the beauty of open source! :) The discussion needs to be continued, but we came up with some improvements to the JavaScript style guide wiki page.

The same day I worked on fixing dashboard bugs, because it didn't seem to work on Safari and IE. I then found that I was passing an anonymous function a variable called class, which turns out to be a reserved keyword in both Safari and IE... I renamed it to _class and everything began to work!

On Saturday I kept working on Melange issue 645, also finding a good article about event capturing (which is somewhat the opposite of event delegation).

Finally, on Sunday, I created the model, logic and view skeleton for charts in Python (thanks Lennie ^__^), which will store and retrieve individual chart instances for every user. I then used jQuery Thickbox to display a "new chart" window, beginning to integrate the JavaScript code with Python to get a list of available statistics data (made available by Daniel Hans as a JSON object to be retrieved asynchronously) to be injected into a widget on the dashboard.

Things are going to be exciting next week, when we'll probably have more interaction between the JavaScript and Python ends and finally see everything beginning to work!