Saturday, December 30, 2006

Handling Method Arguments in Jython

I have been away from Jython for a while over the Christmas period but now I am back and eager to make some progress porting the csv module. So where was I? Oh yes, last time I managed to get into a position where test_csv.py ran successfully (success meaning that the test script executed - not that the tests passed!), so now I can start trying to get a test to pass.

I have decided to tackle the test_reader_arg_valid1() test first, simply because it's the first test in test_csv.py:


def test_reader_arg_valid1(self):
    self.assertRaises(TypeError, csv.reader)
    self.assertRaises(TypeError, csv.reader, None)
    self.assertRaises(AttributeError, csv.reader, [], bad_attr = 0)
    self.assertRaises(csv.Error, csv.reader, [], 'foo')
    class BadClass:
        def __init__(self):
            raise IOError
    self.assertRaises(IOError, csv.reader, [], BadClass)
    self.assertRaises(TypeError, csv.reader, [], None)
    class BadDialect:
        bad_attr = 0
    self.assertRaises(AttributeError, csv.reader, [], BadDialect)

As you can see, a series of checks is performed against the csv.reader() method, so I need to concentrate on implementing just enough of it to get the test to pass.

From the python documentation, csv.reader() is defined as follows:

reader(csvfile[, dialect='excel'[, fmtparam]])

csvfile is the only required argument and it can be any object that supports the iterator protocol. Next, the dialect name can be specified as an optional parameter or omitted (in which case, the dialect will default to excel). The other optional fmtparam keyword arguments can be given to override individual formatting parameters in the current dialect.
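To make the three kinds of argument concrete, here is a minimal sketch using the standard csv module on a current CPython. The sample rows are made up for illustration and are not taken from test_csv.py.

```python
import csv

# Two semicolon-separated lines; any iterable of strings works as csvfile.
rows = ["a;b;c", "1;2;3"]

# csvfile only: the dialect defaults to "excel", which splits on commas,
# so each line comes back as a single field.
default_rows = list(csv.reader(rows))

# csvfile plus an explicit dialect name, passed positionally.
excel_rows = list(csv.reader(rows, "excel"))

# csvfile plus a fmtparam keyword that overrides the dialect's delimiter.
semicolon_rows = list(csv.reader(rows, delimiter=";"))

print(default_rows)
print(semicolon_rows)
```

The first two calls give the same result because "excel" is the default dialect; only the keyword override changes how the lines are split.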

So csv.reader() has it all - mandatory, optional and keyword arguments. I am going to need to figure out how this works in Jython to pass the test_reader_arg_valid1() test.

In Jython three types of method are supported:

  • StandardCall: Mandatory, positional arguments. (e.g. void method(PyObject arg1, PyObject arg2) {} )

  • PyArgsCall: List of optional, positional arguments. (e.g. void method(PyObject[] args))

  • PyArgsKeywordsCall: List of optional, positional or keyword arguments. (e.g. void method(PyObject[] args, String[] keywords))


As csv.reader() must support keyword arguments, it must be defined as follows:

public static void reader(PyObject[] args, String[] keywords) {
}

The parameter list must be specified exactly like this (except for the identifier names which can differ) because Jython uses reflection to make a method of type PyArgsKeywordsCall only if it has exactly one PyObject array as the first argument and one String array as the second argument. If you add another argument to the beginning of the parameter list, then the method will automatically be of StandardCall type and won't support keyword arguments.

I can use the handy helper class ArgParser to parse the arguments and extract the relevant values. First, I need to create an instance of ArgParser as follows:

public static PyObject reader(PyObject[] args, String[] keywords) {

    ArgParser ap = new ArgParser(
        "reader",
        args,
        keywords,
        new String[] {
            "csvfile", "dialect", "delimiter",
            "doublequote", "escapechar", "lineterminator",
            "quotechar", "quoting", "skipinitialspace"
        });
    //..
}

args and keywords are passed into ArgParser along with a list of the names of each argument that the method supports. It is then possible to pick out the value of a parameter by invoking a getXXX() method, specifying the position of the argument in that list (counting from zero). So to get the value of "quotechar", you'd ask for the value at position 6 as follows:

String quotechar = ap.getString(6, "'");

Simple, eh? It is just as easy to support optional arguments. For example, the dialect argument would be extracted as follows:

String dialect = ap.getString(1, "excel");

Here, if dialect is not specified then it will default to "excel".
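The lookup that ArgParser performs can be modelled in a few lines of Python. This is only a sketch of the idea, not Jython's implementation: get_arg is a hypothetical helper, and it assumes the calling convention in which keyword values are appended to the end of args, with keywords naming those trailing values in order.

```python
def get_arg(args, keywords, names, pos, default=None):
    """Return the argument declared at index pos, or default if it was not supplied."""
    # Assumption: the last len(keywords) entries of args are keyword values.
    n_positional = len(args) - len(keywords)
    if pos < n_positional:
        return args[pos]
    name = names[pos]
    if name in keywords:
        return args[n_positional + keywords.index(name)]
    return default


NAMES = ["csvfile", "dialect", "delimiter",
         "doublequote", "escapechar", "lineterminator",
         "quotechar", "quoting", "skipinitialspace"]

# Models the Python call reader(["a,b"], quotechar="|").
args = [["a,b"], "|"]
keywords = ["quotechar"]

quotechar = get_arg(args, keywords, NAMES, 6, "'")    # supplied by keyword
dialect = get_arg(args, keywords, NAMES, 1, "excel")  # omitted, so the default applies
print(quotechar, dialect)
```

The declared name list is what ties a position to a keyword: index 6 resolves to "quotechar" whether the caller passed it positionally or by name.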

Now that I have learned how to support positional, optional and keyword arguments in Jython I can focus on type checking the arguments and throwing the appropriate exceptions in order to pass test_reader_arg_valid1().

Wednesday, December 13, 2006

Module Methods and Failing Tests

I have been looking at how CPython handles keyword arguments in methods today. I've had my fair share of experience with Python over the years (though not so much in the last few months) but I was totally unaware that methods may or may not support keyword arguments! Maybe that's because I often used the PyQt GUI toolkit bindings, which didn't support keyword arguments anyway; I'm not sure.

At the end of my last entry my _csv module was in a position where I was ready to implement the register_dialect() method. To do this I needed to figure out how Jython handles arguments, as I thought I would need to support keyword arguments for register_dialect() - it's a Python method after all, and all Python methods support keyword arguments, don't they? In fact, as it turns out, this isn't always the case! Although not explicitly mentioned in the documentation, some CPython methods don't support keyword arguments and if you try to use them you will get a TypeError. Indeed, csv.register_dialect() is one such method:


Python 2.3.6 (#1, Nov 17 2006, 22:32:43)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> csv.register_dialect(dialect=None, name="excel")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: register_dialect() takes no keyword arguments

Presumably register_dialect() behaves like this because it is not really a Python method. The csv.py module just exposes register_dialect() from the C Module but a normal python developer would not know this and would quite rightly expect the method to support keyword arguments. This inconsistency is less than ideal and it's tempting to fix it for Jython but I think that would be a mistake. Jython is supposed to mimic CPython's behaviour whether rightly or wrongly. From a Jython perspective it's right if it's the way CPython behaves.
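The same split between Python-level and C-level callables is easy to reproduce. The sketch below assumes a CPython 3 interpreter and uses the built-in len() as the C-implemented example rather than register_dialect(), whose signature has changed since the 2.3 session above; pure_python and rejects_keywords are hypothetical helpers written for this illustration.

```python
def pure_python(name, dialect=None):
    """An ordinary Python function: every parameter can be passed by keyword."""
    return (name, dialect)


def rejects_keywords(func, **kwargs):
    """Return True if calling func with only these keyword arguments raises TypeError."""
    try:
        func(**kwargs)
    except TypeError:
        return True
    return False


# A function defined in Python accepts its parameters by keyword, in any order.
python_result = rejects_keywords(pure_python, dialect=None, name="excel")

# len() is implemented in C and takes its argument positionally only.
builtin_result = rejects_keywords(len, obj=[1, 2, 3])

print(python_result, builtin_result)
```

Nothing in the call syntax tells the caller which kind of function they are dealing with, which is exactly why the behaviour comes as a surprise.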

So, in the case of register_dialect() I can explicitly specify the arguments as follows:

public static void register_dialect(PyObject name, PyObject dialect) {
}

If I try to run test_csv.py now I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 9, in ?
ImportError: no module named gc

The test_csv.py module uses the gc module, which isn't supported by Jython yet. For now, I have just completely side-stepped this problem by making a copy of test_csv.py and removing all the tests that involve gc! Problem solved (temporarily at least)!

Now, when I run my own copy of test_csv.py without the gc calls I get yet another annoying error:

Traceback (innermost last):
File "test_csv.py", line 363, in ?
File "test_csv.py", line 364, in TestEscapedExcel
File "/jy/dist/Lib/csv.py", line 39, in __init__
None: Dialect did not validate: quoting parameter not set

This, and no doubt many other cryptic errors to come, are due to the fact that all the identifiers are the wrong type - they are all PyObjects, which confuses Jython a great deal. Now is the right time to revisit each identifier and change it to the correct type.

It's worth noting at this stage that my goal for today is to get _csv into a state where it is good enough to fail all tests. Wow, what a statement - let's say that again: I want _csv to be good enough to fail all tests! What a strange goal to aim for. Well, actually, once I have _csv in a state where test_csv.py can properly execute I am in a far better position than I was before. I can analyse the output of test_csv and tackle one test at a time, gaining satisfaction and confidence as I go. This is one of the primary advantages of Test Driven Development and it's surprising how effective it is.

First, I will tackle the methods. Rather than figure out the parameters for each method I have simply specified "PyObject[] args" as the parameter list which just means the method supports 0 or more arguments. For example, I have implemented unregister_dialect() as follows:

static public PyObject unregister_dialect(PyObject[] args) {
    return null;
}

For all the QUOTE_xxx identifiers I looked in the _csv.c module and saw they were enums. In Java I just make these separate integers to get them to work initially. I left Error as a PyObject as I will need to spend some time looking at exceptions at a later date. Similarly, I have left Dialect well alone and will look into it when the time is right. Finally, I changed __doc__ and __version__ to empty Strings to complete the process.
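For reference, the values that the C enum gives these constants can be read straight off the csv module. This is a small sketch assuming a CPython interpreter; newer Python versions define additional QUOTE_ constants that are not listed here.

```python
import csv

# The four quoting constants that csv.py imports from _csv.
QUOTE_NAMES = ["QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE"]

# Look each one up by name so the mapping reflects the running interpreter.
quote_values = {name: getattr(csv, name) for name in QUOTE_NAMES}

for name in QUOTE_NAMES:
    print(name, quote_values[name])
```

Because they are plain integers, declaring them as separate integer fields on the Java side is enough for csv.py to import and compare them.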

With all the identifiers now the correct type, Jython is happy to run test_csv.py. Of course, not many tests pass and there is a lot of output - here's a sample of it:

======================================================================
FAIL: test_reader_arg_valid1 (__main__.Test_Csv)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/jy/dist/Lib/unittest.py", line 229, in __call__
File "test_csv.py", line 19, in test_reader_arg_valid1
File "/jy/dist/Lib/unittest.py", line 295, in failUnlessRaises
AssertionError: TypeError
----------------------------------------------------------------------
Ran 65 tests in 0.666s

FAILED (failures=4, errors=55)
Traceback (innermost last):
File "test_csv.py", line 716, in ?
File "test_csv.py", line 0, in test_main
File "/jy/dist/Lib/test/test_support.py", line 262, in run_unittest
File "/jy/dist/Lib/test/test_support.py", line 246, in run_suite
TestFailed: errors occurred; run in verbose mode for details

I may have 59 failures but this is a much better position than before. I now have something to focus on - I can tackle each test as it comes and gain confidence as the number of failures decreases and the number of passes increases until the porting process is complete. Yippee!

Now I am ready to implement the module proper. The first task is to find out what "c.s.v" stands for! ;) :)

Porting C Modules to Jython

CPython includes many library modules, some of which are written in pure Python (which is great because these will, hopefully, work in Jython without modification), but others are written in C, which means they must be rewritten in Java in order to work with Jython. Take the csv module for example. It is a Python library module, so it should be possible to use it in Jython as-is without any extra work. If only it were that simple! You see, if you look at the code for csv.py you'll notice it uses another module called _csv for most of its behaviour, and it just so happens that _csv is written in C. Therefore, it is necessary to port this module to Jython. It's worth noting that there are many cases where a Python library module is just a wrapper for an underlying C module, but not always - for example, cStringIO and cPickle are first-class library modules implemented in C.
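The wrapper relationship is easy to confirm from a CPython prompt. The check below is a small sketch that assumes CPython, where _csv is the compiled module; on another implementation the identity test may not hold.

```python
import csv
import _csv

# csv.py imports these names from _csv, so on CPython the public
# functions are the very same objects as the C-level ones.
same_reader = csv.reader is _csv.reader
same_writer = csv.writer is _csv.writer

print(same_reader, same_writer)
```

This is why porting only _csv is enough: csv.py itself can stay untouched as long as the names it imports exist and behave the same way.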

Before actually creating the _csv module, it's always helpful to see things fail first; then you get a nice feeling of satisfaction when you get the test to pass (or fail less!). So to prove that csv doesn't work in current Jython builds I ran the following test:

bash# jython dist/Lib/test/test_csv.py

which resulted in the following predictable error:


Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: no module named _csv

As expected, csv.py is unable to import the _csv module because it doesn't exist. To create it I followed the guidelines by Charlie Groves in the wiki (which - funnily enough - uses the csv module as an example - what a coincidence!). I created a _csv.java file in "$JYTHON_HOME/src/org/python/modules" then added "_csv" to the list of modules in Setup.java. After building Jython and running test_csv.py again I saw the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: cannot import names Error, __version__,
writer, reader, register_dialect, unregister_dialect,
get_dialect, list_dialects, QUOTE_MINIMAL, QUOTE_ALL,
QUOTE_NONNUMERIC, QUOTE_NONE, __doc__

Now, I have a different error, but the fact that the first error has disappeared means that Jython has recognised my new _csv module! The new error is just Jython complaining because _csv doesn't define any of the identifiers that it is expecting.
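The check behind that error amounts to asking which of the required names the module fails to provide. Here is a rough Python sketch of the idea; missing_names is a hypothetical helper and the stub module stands in for a partially implemented _csv, so none of this is Jython's actual import machinery.

```python
import types

# The names listed in the ImportError above.
REQUIRED = ["Error", "__version__", "writer", "reader",
            "register_dialect", "unregister_dialect",
            "get_dialect", "list_dialects", "QUOTE_MINIMAL",
            "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE", "__doc__"]


def missing_names(module, required):
    """Return the required names that the module does not define, in order."""
    return [name for name in required if not hasattr(module, name)]


# A stand-in module that so far defines only a version string.
# (Python module objects always carry a __doc__ attribute, even if it is None.)
stub = types.ModuleType("_csv_stub")
stub.__version__ = "1.0"

print(missing_names(stub, REQUIRED))
```

Running the same check after each addition gives a shrinking to-do list, which mirrors how the error message changes as identifiers are filled in.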

Some of the missing identifiers are simple to resolve, like __doc__ which is just a string. Others are more difficult and will require further investigation like Error which is an exception and I don't know how to do exceptions in Jython yet. For now, I will just add everything as a PyObject to get past the error, then I will revisit each in turn. Here's _csv.java as it looks after adding all the missing identifiers:

public class _csv {
    public static PyObject Error;
    public static String __version__ = "1.0";
    public static PyObject Dialect;
    public static PyObject writer;
    public static PyObject reader;
    public static PyObject register_dialect;
    public static PyObject unregister_dialect;
    public static PyObject get_dialect;
    public static PyObject list_dialects;
    public static PyObject QUOTE_MINIMAL;
    public static PyObject QUOTE_ALL;
    public static PyObject QUOTE_NONNUMERIC;
    public static PyObject QUOTE_NONE;
    public static String __doc__;
}

Now, when I run test_csv.py I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 87, in ?
TypeError: call of non-function ('NoneType' object)

Although the error isn't very helpful, I can go to line 87 in csv.py (just by clicking on the error in Eclipse) and see that Jython is unhappy because register_dialect is supposed to be a method yet I have defined it as a PyObject, so now I can forget about the other identifiers and focus on getting this method to work.

This is where this entry ends while I go and figure out how methods and dynamic arguments work in Jython!

Monday, November 20, 2006

Trac: More Than Just a Bug Tracker

Introducing Trac
Trac is an enhanced wiki and issue tracking system for software development projects.

Trac uses a minimalistic approach to web-based software project management. As you can see from the navigation bar above Trac includes a wiki and a bug tracker (where bugs and tasks are referred to as tickets), as well as other less obvious features:
  • Timeline - lists all Trac events that have occurred in chronological order, a brief description of each event and if applicable, the person responsible for the change.
  • Roadmap - provides a view on the ticket system that helps planning and managing the future development of a project.
  • Browse Source - Trac is fully integrated with Subversion - more on this later!
  • Lots More!

Creating a New Ticket
Entering a new ticket is simple. Just select the type, enter a description, select the relevant properties then hit "Submit Ticket".


One of the major advantages of Trac is that it's extremely easy to add new fields to the ticket system.

Integrated Wiki
A fully featured wiki is integrated into the Trac system with full history and diff support.

Queries
Searches and queries can be done through the SQL-style reports or the more user-friendly "Custom Query" screen shown here.


The query interface supports custom fields and the results can be sorted by column.

Project Management
The roadmap section provides an interactive graphical overview of progress for each milestone.

Clicking on the filled part of the bar takes you to a query showing all completed tickets and clicking on the empty part shows all active tickets.

Milestones


Clicking on a particular milestone from the Roadmap will take you to a detailed view showing more statistics, this time for various different properties. You can view tickets by owner, severity, etc.



Timeline
The timeline page shows a chronological list of all events and is a good way to see what's changed since your last visit.


It includes all sorts of interesting events from wiki changes to subversion commits to milestone completions. Each event provides a link to more detailed information. For example, an svn commit links directly to a visual diff of the changeset, which neatly brings me onto the next feature...

Changesets
Trac is extremely well integrated with Subversion and provides a nifty diff viewer. Shown here is the in-line viewer but you can alter it to show changes side-by-side.


Diffs aren't limited to the previous change - you can do a diff on any revision in the repository as illustrated in the next section.

Source Browser


The source browser lists all the changes in the repository and allows you to compare any two revisions - very powerful!


Links, Links, Links!
Trac provides extensive support for linking to various events and items within the system for both wiki pages and ticket comments.


Linking to source code changes is particularly powerful. When a fix is detected for a particular bug, the developer can easily link to the changeset from the ticket allowing readers to jump to a diff showing exactly what changes were required to fix the bug.

Inline Diffs
If linking to a particular changeset isn't immediate enough for you, then why not display the diff directly in the wiki page or ticket comment?


Summary

For more information on Trac refer to the website. There is also a demo project that you can check out to evaluate and play around with Trac before downloading.

Tuesday, June 13, 2006

Doxygen Versus Javadoc

As a C++ programmer accustomed to Doxygen I was always curious to learn a language where automatic documentation generation was taken seriously and supported as standard. When I finally moved over to a Java project I was shocked to discover how obtrusive and "in your face" Javadoc is. It seems to go out of its way to get in my way!

By comparison, Doxygen is about as good as it gets. It is designed to produce great looking documentation with the least amount of developer effort. Javadoc, on the other hand, expects developer contribution in areas that I feel are perfect candidates for automation. Take paragraphs for example: Javadoc expects the developer to use the standard HTML paragraph tag <p> in the comments. Why? Why? Why? Surely it would be quite simple to automatically detect an empty line as the start of a new paragraph?

Many Java developers - including the Javadoc development team I'm sure - would take the view that HTML is the obvious choice for Javadoc text formatting and I agree that, at least theoretically, it seems an obvious choice. In practice, however, HTML is only needed for simple text formatting such as marking text as bold, italic, etc. Using HTML for anything else is overkill in a source comment and only serves to make the comment unreadable in source form.

Examine the following Javadoc comment:


/**
This is <i>the</i> Rectangle class.
<p>
Refer to <a href="./doc-files/shapes-overview.html">
shape-overview</a> for more details.
<p>
There are four types of supported {@link Shape}:
<ul>
<li>{@link Rectangle} (this class)</li>
<li>{@link Circle}</li>
<li>{@link Square}</li>
<li>{@link Triangle}</li>
</ul>
*/

Here is the equivalent comment using Doxygen:

/**
This is <i>the</i> Rectangle class.

Refer to \ref shape-overview for more details.

There are four types of supported Shape:
- Rectangle (this class)
- Circle
- Square
- Triangle
*/


Points to note in this comparison are:

  • Doxygen will automatically recognise all code objects and insert a hyperlink, hence there is no need for a @link tag.

  • Doxygen provides a very convenient shorthand notation for lists.

  • Notice how easy it is to reference another page in the documentation compared to Javadoc's use of the HTML HREF tag.


The most important problem with the Javadoc comment in the comparison is how much I need to concentrate on formatting issues while writing it. When writing Javadoc I am constantly thinking about what should and shouldn't be linked, whether the list will look right, etc. This is frustrating because, while I do want to document my code well I also want to focus on coding. Therefore, due to the effort involved in commenting Javadoc-style, I usually focus on the code while in a heavy development session then I go through and document everything afterwards. I'd much rather document my code incrementally during development, but Javadoc, it seems, almost strives to make this as difficult as possible! Doxygen allows me to use HTML where it works well (marking text as bold, etc.) but also supports convenient shorthand for lists and is intelligent enough to realise that an empty line should be converted into the start of a new paragraph in the generated documentation.

I have provided the generated documentation of the Shape example for both Doxygen and Javadoc so you can decide for yourself which approach you prefer. Note the following Doxygen features:

  • Doxygen provides a hyperlinked graphical class hierarchy although it is initially well hidden! From the main page, select the "Classes" tab, then the "Class Hierarchy" sub-tab then click "Goto the graphical hierarchy".

  • Doxygen will produce a hyperlinked graphical class hierarchy for every class at the top of the page.


  • In the Doxygen page for Rectangle, notice that there is a link to the source code. Doxygen generates a hyperlinked HTML source browser for all source code.


  • Doxygen has a "\todo" command and will automatically generate a hyperlinked todo page. Handy! It also supports a bug list and test list.


  • Doxygen provides many more features including full support for graphical class charts, grouping of classes, mathematical formulas and multiple output formats (HTML, LaTeX, PDF, man pages, HTMLHelp, etc.).


So what do you think? If you are a Java developer, are you surprised how powerful Doxygen is or do you feel that the Javadoc approach is better? I'd be interested to hear from developers who really prefer the Javadoc approach as it baffles me, that's for sure!

Tuesday, May 16, 2006

Vim7.0 Released

A new and much improved version of my favourite editor was released last week (I would've posted earlier but my blog didn't exist then!). It includes handy features such as tabbed windows, visual on-the-fly spell checking and an intellisense/auto-complete feature called "omni-completion".

I have played with it a little but haven't been able to get the omni-completion working for Java yet. I will figure it out when I have more time, maybe later in the week.