Friday, May 21, 2010

GSoC 2010 Melange Testing Project: the last week of Community Bonding - starting to code

This week I started adding more test cases. I first studied the relevant code base and the existing test cases in more detail to understand their functions and logic, and then tried to design some new test cases. Progress has been a bit slow because I am still not very familiar with the implementation details of Melange and ran into problems setting up the coding environment; for example, I could not get the new version of ctags to work well with Vim and Python. However, these preparations have built a good foundation for the official coding stage, which starts next week. I hope that during that period I can code fast and leave more time to improve and clean up my code, as well as do some extra work for the project.

GSoC 2010 Data Seeder - Project design

Another week of the community bonding period has gone by as we are getting closer and closer to actually having some code written :D

So here are some of the features that have been decided upon so far:
  • The configuration for the seeding operation will be customizable by the user through a web interface, which will conclude with the creation of a JSON configuration sheet. This configuration file will be usable both locally and on-site and can later be edited using the same web interface.
  • The configuration sheet is meant to be very easily customizable and extendable. Different scenarios for different needs can be saved in predefined configuration files and then executed at any time. Examples include saved states in different phases of a GSoC program (proposal phase, midterm survey etc.).
  • There will be an option to export data to Python fixture files, either directly by the use of a script along with a JSON configuration file, or by saving the state of a running instance.
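To make the idea of a JSON configuration sheet more concrete, here is a minimal sketch of saving and reloading one. The keys (`scenario`, `models`, `name`, `count`) and the model names are purely hypothetical and chosen for illustration; the actual schema has not been settled.

```python
import json

# Hypothetical configuration sheet: every key and model name below is an
# illustrative assumption, not the seeder's real schema.
config = {
    "scenario": "proposal_phase",
    "models": [
        {"name": "Student", "count": 50},
        {"name": "Proposal", "count": 120},
    ],
}


def save_config(cfg):
    """Serialize a configuration sheet to JSON text."""
    return json.dumps(cfg, indent=2, sort_keys=True)


def load_config(text):
    """Parse a configuration sheet and reject negative entity counts."""
    cfg = json.loads(text)
    for entry in cfg["models"]:
        if entry["count"] < 0:
            raise ValueError("negative count for %s" % entry["name"])
    return cfg


sheet = save_config(config)    # text that could be stored or edited later
restored = load_config(sheet)  # the same sheet, ready to drive a seeding run
```

Because the sheet is plain JSON, one file per scenario (proposal phase, midterm survey and so on) can be kept around and loaded whenever that state is needed.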

The script that I have used to generate the diagrams from my previous post has been committed to my Melange clone. You can try it out anytime, no further setup is required. Here it is:

Week 3: Getting Ready to Code

The community bonding period is drawing to an end this week, which means we begin the actual coding work on our projects starting next week! I am quite excited to start writing code :).

So far, I have finalized most of the design and workflow for the two features I am developing for my project. There are still a few loose ends with regards to design hanging around, which will be tied up soon. My project wiki is now updated with the design details and awaiting the approval of my mentor, Madhu :).

The main additions and modifications to the design this week were:

  • Some changes to the data model. The multi-valued properties and data models were scrapped to be replaced by a simple biography text field where users can write whatever they would like to share with the community.
  • Detailed workflows for the Calendar and Maps features (adding/requesting new events). Here are the use case diagrams.
  • Both the events and user_page data models have a tags property, which is basically a comma-separated string of tags related to the user or event (e.g. C#, Java, Django). Users can also be given the option to subscribe to feeds to be notified of events with certain tags.
  • I have also drawn up a detailed timeline, with a list of deliverables for every week up until the mid-term, to keep me oriented and directed with aims for every week.
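The comma-separated tags property mentioned above could be handled roughly like this. It is only a sketch: the helper names `parse_tags` and `matches_subscription` are hypothetical, and the case-insensitive matching is an assumption rather than a decided design.

```python
def parse_tags(raw):
    """Split a comma-separated tag string into a list of clean, unique tags."""
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = part.strip()
        key = tag.lower()  # assumption: tags are compared case-insensitively
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def matches_subscription(event_tags, subscribed_tags):
    """Return True if an event shares at least one tag with a subscription."""
    event = {t.lower() for t in parse_tags(event_tags)}
    wanted = {t.lower() for t in parse_tags(subscribed_tags)}
    return bool(event & wanted)
```

For example, `parse_tags("C#, Java,Django, java")` gives `["C#", "Java", "Django"]`, and an event tagged `"C#, Java"` would match a subscription to `"django, java"`.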

Coming up next week:
  • Coding, coding all the way!
  • Follow the timeline closely to reach each week's goals.
  • Resolve any bugs/issues, if assigned.

Thursday, May 20, 2010

New Document Editor: HTML diffs

One of the goals of my project is to create revision control for Melange documents. Each RC framework must provide some kind of diff engine. The engine should find changes in the document and represent them to the user. It is pretty simple with text, but Melange stores documents as HTML.

Talking about HTML diffs, we should consider two cases:
  1. HTML as text. A bunch of tags, attributes, values and content.
  2. HTML as a rendered document (e.g. image)
Another question is: what causes representation changes in a rendered HTML document?
  1. Tags (through browser's default CSS or applied CSS).
  2. Attributes class and id (through applied CSS)
  3. Applied CSS (server-side, in-document or in-line).
  4. Style modification with DOM.
Let's consider the first case: HTML as text. This case seems pretty obvious, because changes can be tracked with textual diff engines. But it's not that simple. There are several kinds of changes to HTML which don't influence representation:
  1. Tags are changed, but CSS is the same:
    <h1>Hello, world!</h1>
    is changed to
    <h2>Hello, world!</h2>
    but the CSS is
    h1, h2 {font-size: 12px; font-style: normal;}
  2. Some pieces of HTML are rendered the same:
    <div class="alert">Hello, world!</div>
    is changed to
    <div class="alert">
    <p>Hello, world!</p>
    </div>
  3. Class or id is changed but CSS is the same:
    <div class="original"></div>
    is changed to
    <div class="changed"></div>
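The first example above is easy to reproduce with a purely textual diff. The sketch below uses Python's standard `difflib` module (my choice for illustration, not necessarily the engine Melange will end up with) to show that the h1/h2 change is reported even though, with the CSS shown, both versions render identically. The helper name `changed_lines` is hypothetical.

```python
import difflib

old = "<h1>Hello, world!</h1>"
new = "<h2>Hello, world!</h2>"


def changed_lines(a, b):
    """Return only the removed ('- ') and added ('+ ') lines of a text diff."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# The diff engine sees a change, although the rendered page looks the same.
for line in changed_lines(old, new):
    print(line)
```

A textual engine has no knowledge of the stylesheet, so it cannot tell this kind of change apart from one that really alters what the user sees.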
The second case: HTML as an image. By this I mean HTML with applied CSS, as it is displayed to the user. Tracking changes to images is the correct way of handling HTML diffs, since it compares exactly what the user sees. It can be performed with several tools. One of them is the convert utility from ImageMagick. This approach is, however, a little tricky and CPU-consuming.

Let's return to HTML as text. If we can guarantee that the CSS renders different tags differently and there are no class or id changes, then everything seems right. We can focus on textual diffs for HTML. With TinyMCE (which is the default editor for Melange), all representation changes are made with tags. If there is no appropriate tag, then the style is applied with a <span> tag or a chain of <span> tags.

Textual HTML diffs can be generated by several tools. One of them is HTML diff for Python. I'm now thinking about using it as a skeleton and trying to build a smarter engine with Beautiful Soup.
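As a rough illustration of what a tag-aware engine might do, the sketch below splits HTML into tag and word tokens and diffs the token sequences instead of raw lines. It uses the standard library's `html.parser` and `difflib` as stand-ins (not Beautiful Soup or HTML diff for Python), and the names `Tokenizer`, `tokenize` and `diff_tokens` are hypothetical. Attributes are ignored, in line with the assumption that there are no class or id changes.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Collect start tags, end tags and words as a flat list of tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately dropped in this sketch.
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Whitespace-only text produces no tokens.
        for word in data.split():
            self.tokens.append(("text", word))


def tokenize(html):
    parser = Tokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def diff_tokens(old_html, new_html):
    """Return (operation, old_tokens, new_tokens) for every changed region."""
    old, new = tokenize(old_html), tokenize(new_html)
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


changes = diff_tokens("<p>Hello, world!</p>", "<p>Hello, <b>world!</b></p>")
```

Working on tokens makes it possible to tell "a tag was wrapped around existing text" from "the text itself changed", which a line-based diff would lump together.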