Tuesday, August 4, 2009

Tenth Week Newsfeed Development Update

At the end of last week, I submitted a major refactoring of the newsfeed component, which now includes a separate subscription component that manages e-mail subscriptions (which can contain more private information than the public ATOM feed).

After receiving feedback on this latest patch, the plan is to commit the Newsfeed. Before I commit, one very important change is to taskify the sendMail method of the mail dispatcher. Obviously, this doesn't just affect the Newsfeed component, but it's very important for newsfeed subscriptions, as I discovered after deploying and testing on my jamstage.appspot.com staging instance. The e-mail notification processing is done from within a task, and when a task returns a 500 error the task is repeated. Because the task often sends out many e-mails at once, I hit an OverQuotaError from this task, which re-triggered the task, and it wasn't long before my e-mail inbox was flooded with e-mails.

This has been a good lesson about working with the Task Queue API. It's good that a task will retry if it fails, but this means that only atomic (and idempotent) functionality should be contained within each task, so that if it fails the retry behavior will not bring about undesired behavior.
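To make the retry problem concrete, here is a minimal sketch of a notification task that can safely be run again after a failure. It is not the Melange code: FakeMailer, the local OverQuotaError class, notify_subscribers and the in-memory delivered set are all hypothetical stand-ins, and on App Engine the delivery record would have to be persisted (for example in the datastore) rather than kept in memory.

```python
class OverQuotaError(Exception):
    """Hypothetical stand-in for the quota error raised by the mail service."""


class FakeMailer:
    """Hypothetical mailer that fails once its quota is used up."""

    def __init__(self, quota):
        self.quota = quota
        self.sent = []

    def send(self, recipient, body):
        if self.quota <= 0:
            raise OverQuotaError("mail quota exhausted")
        self.quota -= 1
        self.sent.append((recipient, body))


def notify_subscribers(recipients, body, mailer, delivered):
    """Send one e-mail per recipient, skipping those already delivered.

    Because each delivery is recorded right after it succeeds, re-running
    the task after a failure only sends the e-mails that are still missing.
    """
    for recipient in recipients:
        key = (recipient, body)
        if key in delivered:
            continue  # already sent during an earlier attempt
        mailer.send(recipient, body)
        delivered.add(key)


recipients = ["a@example.com", "b@example.com", "c@example.com"]
mailer = FakeMailer(quota=2)
delivered = set()

try:
    notify_subscribers(recipients, "update", mailer, delivered)
except OverQuotaError:
    # Simulate the queue retrying the task once quota is available again.
    mailer.quota = 5
    notify_subscribers(recipients, "update", mailer, delivered)

sent_to = [recipient for recipient, _ in mailer.sent]
```

Without the delivered check, the retry would send the first two e-mails a second time, which is the flooding behavior described above. Splitting the work into one task per e-mail would be another way to get the same effect.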

Once the Newsfeed is committed, I'd like to finish the UpdateLogic, which is necessary for automatically compiling subscriptions. This means that we need a list of users with read-access. I describe the problem further in the NewsFeed wiki page:




The subscriptions logic contains an UpdateLogic class that is meant to automatically compile a list of users with read-access to an arbitrary entity. Unfortunately, setting subscriptions manually is the only option right now, since the automatic subscription handler is not nearly where I hoped it would be by now. The idea is very simple - just subscribe users to the entities where they have read-access. But the reality has been much more difficult to do without creating an entirely new infrastructure, and I'm entertaining the idea of simply relying on a manual interface (subscribe-by-star). The takeaway from this is that perhaps the access component should be abstracted to a high-level API, because it appears to have gained dependencies and complexities that prevent it from being useful to the news feed logic.



Finally, I plan to write a simple PubSub example implementation and do more in-depth testing, especially of edge cases.

GSoC 2009: GHOP - Tenth Week Status Update!

Hello everyone,
Last week was a very nice week for me :P. Most of the work got completed, and only very minor issues remain to be fixed. I started the week by completing the GHOP Task Workflow. The complete workflow for a GHOP Task is up and running on my demo instance at http://melange-madhusudancs.appspot.com.

Later I worked on GHOP Task Subscription and the Notification system. A user can now subscribe to a GHOP Task and get both on-site notifications of Task updates and e-mails. This too is deployed on my demo instance. Along with that, I worked on GHOP Task Work Submissions for students who have claimed a Task and want to submit their work. The Work Submissions are listed just after the Comment box and actions, followed by the list of comments.

In addition to all this, I added permalinks to the Comments and Work Submissions that are displayed along with a GHOP Task. One can use those links to scroll directly to that comment or work submission on the comment page.

This week I will be mostly working on Tags Views and GHOP Task search filters.

Monday, August 3, 2009

GSoC09: Statistics module 7th weekly report: Organization home page map and fixes

Last week has been an "exciting" one, because Lennie and I worked closely together towards the common objective of solving a long-standing issue in Melange: we now have again (as last year) a cool map for each organization showing the connections between students and mentors!

A nice and obvious example is the Melange organization home page, which shows our own GSoC connections :).


That's been a lot of work, mainly because we had to deal with privacy issues and legal stuff: obviously we couldn't make the exact lat/lon of people publicly available :) So, the agreement is for a "city level" precision, which is something quite vague.

I proposed something more mathematical, like reducing the precision of the given lat/lon pairs (for example to one decimal place, so about 40000 (Earth's circumference) / 360 (degrees) / 10 = roughly 11 km of precision), but then ... we obviously had to do something definitely compatible with the legal agreement, so...
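As a rough sketch of that proposal (which is not what ended up in Melange), rounding a coordinate to one decimal place keeps it to a tenth of a degree, which is about 11 km along a meridian. The function names and the sample coordinates are made up for illustration, and the 40000 km figure is the same approximation used above.

```python
EARTH_CIRCUMFERENCE_KM = 40000.0  # rough figure, as in the estimate above


def km_per_step(decimals):
    """Approximate north-south distance covered by one step at this precision."""
    km_per_degree = EARTH_CIRCUMFERENCE_KM / 360.0
    return km_per_degree / (10 ** decimals)


def coarsen(lat, lon, decimals=1):
    """Round a lat/lon pair to the given number of decimal places."""
    return round(lat, decimals), round(lon, decimals)


step_km = km_per_step(1)            # about 11.1 km
coarse = coarsen(45.4642, 9.1900)   # hypothetical sample coordinates
```

The estimate only holds for latitude. One step of longitude covers less distance the further you are from the equator, so the real precision is not uniform.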

Lennie ran a script that he made with geopy to reset all lat/lon pairs in Melange to city level and then.. all the stuff I made in JavaScript (commits 9c7d31b824, c76c9c5916, 40cf7eaa03) in the meantime came to life :)

We also took care of showing only "one world": if the screen resolution was too high, the div containing the map would stretch horizontally, showing more than one world :) So I've set the div to the exact dimensions of the world (using a specific zoom) (commit 37abba547f). Further complications (like zooming exactly to fit each organization's markers without showing the whole world) are postponed for later, if needed.

More about the Statistics module update: during the last week, apart from talking with Daniel Hans about the steps to be taken for the last 2 weeks of GSoC, I've fixed some style issues that Pawel pointed out on the dev mailing list (I still have to fix what Daniel Hans and Lennie sent).

Then I worked on integrating the Google Chart API Visualization into the list of available visualizations, which is needed to do some kind of exporting of the graphs. The problem is that, when I was programming everything, I didn't take into account (should I have?) that one visualization can contain "sub-visualizations", because that's what happens with the Google Chart Visualization! I mean, the constructor is always the same, "ImageChart", but the real kind of chart to be displayed has to be put into the options! That led me to change some logic on the JavaScript side, and finally Google Charts are available too. They also show an "Export" button (which appears and disappears properly, so that it is shown only when the Google Chart Visualization is selected) that basically shows an alert box with the HTML source to be copied and pasted into a page to show that chart.

After that, I worked on a bug that prevented the available Visualizations (sent thanks to the cool work Daniel Hans has done on the backend during the week) from being shown when the widget was initially created (they were shown only if the dashboard was reloaded). That was a little hard because I also had to change some stuff on the backend to send the data along with the statistic from the start, instead of only with the saved chart as before.

That's all folks! :)

Statistic Module: Ninth and tenth weeks report

This is my blog update which describes the last two weeks of my work.

I mostly worked on supporting visualization options from the backend side, but to better understand my work, let us take a look at how visualizations are handled by the frontend. As has been mentioned a number of times, we use the Google Visualization API. In order to display a visualization, there has to be a script which is passed a dataTable object. The dataTable contains the actual data. Of course we could create the object on the client side (for example by sending raw JSON with statistic data and parsing that string to construct the dataTable), but we decided to send dataTable objects directly from the server.

As you could see in some previous revisions, the functions which were responsible for that were hardcoded in the Python code. It was an easy and quick solution at the beginning of the program, but now we do not want anything like this in the final version.

When I was dealing with stats collecting functions, I added a new JSON attribute to the statistic model which contained some information on how to collect a particular statistic. This time I did a very similar thing: a new field - chart_options.

This is also a JSON string. Its format is still changing, but the field which refers to dataTable creation is "description". It describes all columns for the visualization and is used by the dataTable constructor.
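To illustrate the idea of driving dataTable creation from a stored description, here is a small sketch in plain Python. It is only an assumption about how this could look: the real chart_options format is not shown in this post, so the (id, type, label) tuples, the build_data_table helper and the sample records are all hypothetical. Only the output shape, with "cols" and "rows" of "c"/"v" cells, follows the JSON layout that the Google Visualization DataTable accepts.

```python
import json

# Hypothetical "description": one (id, type, label) tuple per column.
description = [
    ("degree", "string", "Degree"),
    ("students", "number", "Students"),
]

# Hypothetical statistic data, one record per row.
records = [
    {"degree": "undergraduate", "students": 1200},
    {"degree": "master", "students": 100},
    {"degree": "phd", "students": 100},
]


def build_data_table(description, records):
    """Build a dict in the DataTable JSON layout from a column description."""
    cols = [
        {"id": col_id, "label": label, "type": col_type}
        for col_id, col_type, label in description
    ]
    rows = [
        {"c": [{"v": record[col_id]} for col_id, _, _ in description]}
        for record in records
    ]
    return {"cols": cols, "rows": rows}


table = build_data_table(description, records)
table_json = json.dumps(table)  # string that could be sent to the client
```

The point of the sketch is that nothing about a particular statistic is hardcoded in the function: changing the description is enough to get a different set of columns.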

For the last couple of days I have been working on adding support for different visualizations for a single statistic. Some initial work had been done by Mario, who added a select list inside a widget. However, the list of options was static and contained all possible visualizations. That was not sufficient, because for example we want a GeoMap for statistics like 'Students Per Country', but certainly not for 'Students Per Degree'. The list of visualizations applicable to a single statistic should be stored in the data model. I used the chart_options field again.

Having completed that, I started to work on the next issue. In the final goals list the mentors asked for statistics for students with projects / students without projects / all students. I added those statistics a few weeks ago, but for each statistic we ended up with three different entities. That was pretty much sufficient, but we also had to have three different widgets. I really wanted to add the possibility to display all three options in a single visualization.

Now all three kinds of statistics are stored in one single entity whose final_json_string has for example the following format:
{undergraduate: [1200, 700, 500], master: [100, 40, 60], phd: [100, 90, 10]}
The first number is the number for all students, the second number for students with projects and the last for students without projects.

So for every entity, one can define multiple virtual statistics by choosing a subset of columns. For example, we have defined 4 statistics:
Students Per Degree (all) [column 1]
Students Per Degree (with projects) [column 2]
Students Per Degree (without projects) [column 3]
Students Per Degree (cumulative) [columns 1, 2, 3]
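The column selection above can be sketched in a few lines of Python. This is only an illustration, not the actual Melange code: select_columns and the VIRTUAL_STATISTICS mapping are hypothetical names, and the column numbers are zero-based here while the list above counts from 1.

```python
# Data in the format shown above: [all, with projects, without projects].
final_data = {
    "undergraduate": [1200, 700, 500],
    "master": [100, 40, 60],
    "phd": [100, 90, 10],
}

# Hypothetical mapping from virtual statistic name to zero-based columns.
VIRTUAL_STATISTICS = {
    "Students Per Degree (all)": [0],
    "Students Per Degree (with projects)": [1],
    "Students Per Degree (without projects)": [2],
    "Students Per Degree (cumulative)": [0, 1, 2],
}


def select_columns(data, columns):
    """Keep only the chosen columns for every key of the statistic."""
    return {
        key: [values[index] for index in columns]
        for key, values in data.items()
    }


with_projects = select_columns(
    final_data, VIRTUAL_STATISTICS["Students Per Degree (with projects)"])
cumulative = select_columns(
    final_data, VIRTUAL_STATISTICS["Students Per Degree (cumulative)"])
```

With this layout a single stored entity can serve all four virtual statistics, and each name could carry its own list of allowed visualizations.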

What is interesting (or not :-), we can set up a different list of possible visualizations for each virtual statistic. This is important for example for Students Per Country. For the first three kinds we can use GeoMap, but it is not possible to use it for the fourth one.

Recently I have worked on adding the ability to switch the kind of virtual statistic on /statistic/show/page. That is already done, but now I need to add support for it on the dashboard page, which is a little bit more sophisticated. Anyway, I am expecting to have it done by Wednesday. That day I am also going to send new patches with Sverre's suggestions taken into account.

That is basically all from me. As usual, I also spent some time fixing some older bugs and so on, but that is not worth mentioning.