Wednesday, December 13, 2006

Module Methods and Failing Tests

I have been looking at how CPython handles keyword arguments in methods today. I've had my fair share of experience with Python over the years (though not so much in the last few months), but I was totally unaware that methods may or may not support keyword arguments! Maybe that's because I often used the PyQt GUI toolkit bindings, which didn't support keyword arguments anyway; I'm not sure.

At the end of my last entry my _csv module was in a position where I was ready to implement the register_dialect() method. To do this I needed to figure out how Jython handles arguments, as I thought I would need to support keyword arguments for register_dialect(). It's a Python method after all, and all Python methods support keyword arguments, don't they? As it turns out, this isn't always the case! Although not explicitly mentioned in the documentation, some CPython methods don't support keyword arguments, and if you try to use them you will get a TypeError. Indeed, csv.register_dialect() is one such method:


Python 2.3.6 (#1, Nov 17 2006, 22:32:43)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> csv.register_dialect(dialect=None, name="excel")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: register_dialect() takes no keyword arguments

Presumably register_dialect() behaves like this because it is not really a Python method. The csv.py module just exposes register_dialect() from the C module, but a normal Python developer would not know this and would quite rightly expect the method to support keyword arguments. This inconsistency is less than ideal and it's tempting to fix it for Jython, but I think that would be a mistake. Jython is supposed to mimic CPython's behaviour, whether rightly or wrongly. From a Jython perspective, it's right if it's the way CPython behaves.
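The same kind of rejection is easy to reproduce on a current CPython. This is only a sketch, assuming Python 3.8 or later (the "/" positional-only marker did not exist in the 2.3 interpreter shown above). register_dialect_stub and call_with_keyword are hypothetical names for illustration, not part of the csv module:

```python
def call_with_keyword(func, **kwargs):
    """Call func with keyword arguments only; return the TypeError text, or None."""
    try:
        func(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# len() is implemented in C and rejects keyword arguments outright.
builtin_error = call_with_keyword(len, obj=[1, 2, 3])


# Hypothetical pure-Python stand-in: the "/" makes both parameters
# positional-only, mimicking a C function that parses arguments by position.
def register_dialect_stub(name, dialect, /):
    return (name, dialect)


stub_error = call_with_keyword(register_dialect_stub, dialect=None, name="excel")

print(builtin_error)
print(stub_error)
```

Called positionally, register_dialect_stub("excel", None) works as expected; only the keyword form is refused.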

So, in the case of register_dialect() I can explicitly specify the arguments as follows:

public static void register_dialect(PyObject name, PyObject dialect) {
}

If I try to run test_csv.py now I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 9, in ?
ImportError: no module named gc

The test_csv.py module uses the gc module which isn't supported by Jython yet. For now, I have just completely side-tracked this problem by making a copy of test_csv.py and removing all the tests that involve gc! Problem solved (temporarily at least)!
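A gentler alternative to deleting the tests would be to guard the import and skip whatever depends on it. The sketch below assumes a modern unittest with skipTest(), which the unittest shipped with the Jython of this post may well not have; GcDependentTest and run are hypothetical names:

```python
import unittest

try:
    import gc  # missing on the Jython build described above
except ImportError:
    gc = None


class GcDependentTest(unittest.TestCase):  # hypothetical example test
    def test_collect_returns_int(self):
        if gc is None:
            # Report the test as skipped instead of failing at import time.
            self.skipTest("gc module not available")
        self.assertIsInstance(gc.collect(), int)


def run():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GcDependentTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

This way the gc tests stay in the file and start running again as soon as the module exists.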

Now, when I run my own copy of test_csv.py without the gc calls I get yet another annoying error:

Traceback (innermost last):
File "test_csv.py", line 363, in ?
File "test_csv.py", line 364, in TestEscapedExcel
File "/jy/dist/Lib/csv.py", line 39, in __init__
None: Dialect did not validate: quoting parameter not set

This error, and no doubt many other cryptic errors to come, is due to the fact that all the identifiers are the wrong type: they are all PyObjects, which confuses Jython a great deal. Now is the right time to revisit each identifier and change it to the correct type.

It's worth noting at this stage that my goal for today is to get _csv into a state where it is good enough to fail all tests. Wow, what a statement - let's say that again: I want _csv to be good enough to fail all tests! What a strange goal to aim for. Well, actually, once I have _csv in a state where test_csv.py can properly execute, I am in a far better position than I was before. I can analyse the output of test_csv and tackle one test at a time, gaining satisfaction and confidence as I go. This is one of the primary advantages of Test Driven Development and it's surprising how effective it is.

First, I will tackle the methods. Rather than figure out the parameters for each method I have simply specified "PyObject[] args" as the parameter list which just means the method supports 0 or more arguments. For example, I have implemented unregister_dialect() as follows:

public static PyObject unregister_dialect(PyObject[] args) {
    // Stub: accepts any number of arguments and does nothing yet.
    return null;
}
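For comparison, the closest Python-level analogue of a "zero or more arguments" stub is a function taking *args. This is only an illustration of the idea in Python, not of how Jython dispatches to the Java method:

```python
def unregister_dialect(*args):
    # Stub: accepts zero or more positional arguments and does nothing yet.
    return None


# Any number of positional arguments is accepted...
no_args = unregister_dialect()
two_args = unregister_dialect("excel", "extra")

# ...but keyword arguments are still refused, because there is no **kwargs.
try:
    unregister_dialect(name="excel")
    keyword_rejected = False
except TypeError:
    keyword_rejected = True

print(no_args, two_args, keyword_rejected)  # prints: None None True
```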

For all the QUOTE_xxx identifiers I looked in the _csv.c module and saw they were enums. In Java I just make these separate integers to get them to work initially. I left Error as a PyObject as I will need to spend some time looking at exceptions at a later date. Similarly, I have left Dialect well alone and will look into it when the time is right. Finally, I changed __doc__ and __version__ to empty Strings to complete the process.
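On a current CPython the four constants named in this post are plain integers that follow the order of the C enum, which can be checked directly (newer Python versions define additional quoting constants that are not shown here):

```python
import csv

# The four quoting constants mentioned in this post, in C enum order.
quote_styles = {
    "QUOTE_MINIMAL": csv.QUOTE_MINIMAL,
    "QUOTE_ALL": csv.QUOTE_ALL,
    "QUOTE_NONNUMERIC": csv.QUOTE_NONNUMERIC,
    "QUOTE_NONE": csv.QUOTE_NONE,
}

for name, value in quote_styles.items():
    print(name, value)
```

So separate integer fields with the values 0 to 3 are a reasonable first stand-in on the Java side.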

With all the identifiers now the correct type, Jython is happy to run test_csv.py. Of course, not many tests pass and there is a lot of output - here's a sample of it:

======================================================================
FAIL: test_reader_arg_valid1 (__main__.Test_Csv)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/jy/dist/Lib/unittest.py", line 229, in __call__
File "test_csv.py", line 19, in test_reader_arg_valid1
File "/jy/dist/Lib/unittest.py", line 295, in failUnlessRaises
AssertionError: TypeError
----------------------------------------------------------------------
Ran 65 tests in 0.666s

FAILED (failures=4, errors=55)
Traceback (innermost last):
File "test_csv.py", line 716, in ?
File "test_csv.py", line 0, in test_main
File "/jy/dist/Lib/test/test_support.py", line 262, in run_unittest
File "/jy/dist/Lib/test/test_support.py", line 246, in run_suite
TestFailed: errors occurred; run in verbose mode for details

I may have 59 failing tests (4 failures plus 55 errors), but this is a much better position than before. I now have something to focus on: I can tackle each test as it comes and gain confidence as the number of failures decreases and the number of passes increases, until the porting process is complete. Yippee!
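The summary line distinguishes failures (an assertion did not hold) from errors (the test raised some other exception). A small sketch with made-up tests shows how unittest counts each; build_demo_case and its test methods are hypothetical:

```python
import io
import unittest


def build_demo_case():
    """Build a throwaway TestCase with one failure, one error and one pass."""

    class Demo(unittest.TestCase):
        def test_failure(self):
            self.assertEqual(1 + 1, 3)  # assertion fails -> counted as a failure

        def test_error(self):
            raise RuntimeError("not implemented yet")  # counted as an error

        def test_pass(self):
            self.assertEqual(1 + 1, 2)

    return Demo


suite = unittest.defaultTestLoader.loadTestsFromTestCase(build_demo_case())
# Send the runner's report to a buffer so only our own summary is printed.
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)

print(result.testsRun, len(result.failures), len(result.errors))  # prints: 3 1 1
```

While a module is still mostly stubs, most tests tend to show up as errors rather than failures, which matches the 55-to-4 split above.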

Now I am ready to implement the module proper. The first task is to find out what "c.s.v" stands for! ;) :)

Porting C Modules to Jython

CPython includes many library modules. Some are written in pure Python, which is great because these will (hopefully) work in Jython without modification, but others are written in C, which means they must be rewritten in Java in order to work with Jython. Take the csv module for example. It is a Python library module, so it should be possible to use it in Jython as-is without any extra work. If only it were that simple! You see, if you look at the code for csv.py you'll notice it uses another module called _csv for most of its behaviour, and it just so happens that _csv is written in C. Therefore, it is necessary to port this module to Jython. It's worth noting that there are many cases where a Python library module is just a wrapper for an underlying C module, but not always - for example, cStringIO and cPickle are first-class library modules implemented in C.

Before actually creating the _csv module, it's always helpful to see things fail first; then you get a nice feeling of satisfaction when you get the test to pass (or fail less!). So, to prove that csv doesn't work in current Jython builds, I ran the following test:

bash# jython dist/Lib/test/test_csv.py

which resulted in the following predictable error:


Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: no module named _csv

As expected, csv.py is unable to import the _csv module because it doesn't exist. To create it I followed the guidelines by Charlie Groves in the wiki (which - funnily enough - uses the csv module as an example - what a coincidence!). I created a _csv.java file in $JYTHON_HOME/src/org/python/modules, then added "_csv" to the list of modules in Setup.java. After building Jython and running test_csv.py again I saw the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: cannot import names Error, __version__,
writer, reader, register_dialect, unregister_dialect,
get_dialect, list_dialects, QUOTE_MINIMAL, QUOTE_ALL,
QUOTE_NONNUMERIC, QUOTE_NONE, __doc__

Now, I have a different error, but the fact that the first error has disappeared means that Jython has recognised my new _csv module! The new error is just Jython complaining because _csv doesn't define any of the identifiers that it is expecting.
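The same situation can be imitated in plain Python with an empty stand-in module. This is only a sketch: _csv_stub is a hypothetical module name, and the EXPECTED list is copied from the traceback above.

```python
import sys
import types

# Names csv.py tried to import, taken from the ImportError above.
EXPECTED = ["Error", "__version__", "writer", "reader", "register_dialect",
            "unregister_dialect", "get_dialect", "list_dialects",
            "QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE",
            "__doc__"]

# Hypothetical empty module: it can be found, but defines nothing useful.
stub = types.ModuleType("_csv_stub")
sys.modules["_csv_stub"] = stub

try:
    from _csv_stub import reader  # the module exists, the name does not
    import_error = None
except ImportError as exc:
    import_error = exc

# A fresh module object already carries __doc__ (set to None),
# so that one name is not reported as missing here.
missing = [name for name in EXPECTED if not hasattr(stub, name)]

print(import_error)
print(missing)
```

A list like missing makes a handy checklist of identifiers still to be added.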

Some of the missing identifiers are simple to resolve, like __doc__ which is just a string. Others are more difficult and will require further investigation like Error which is an exception and I don't know how to do exceptions in Jython yet. For now, I will just add everything as a PyObject to get past the error, then I will revisit each in turn. Here's _csv.java as it looks after adding all the missing identifiers:

package org.python.modules;

import org.python.core.PyObject;

public class _csv {
    // Placeholders only: each field will be given its proper type later.
    public static PyObject Error;
    public static String __version__ = "1.0";
    public static PyObject Dialect;
    public static PyObject writer;
    public static PyObject reader;
    public static PyObject register_dialect;
    public static PyObject unregister_dialect;
    public static PyObject get_dialect;
    public static PyObject list_dialects;
    public static PyObject QUOTE_MINIMAL;
    public static PyObject QUOTE_ALL;
    public static PyObject QUOTE_NONNUMERIC;
    public static PyObject QUOTE_NONE;
    public static String __doc__;
}

Now, when I run test_csv.py I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 87, in ?
TypeError: call of non-function ('NoneType' object)

Although the error isn't very helpful, I can go to line 87 in csv.py (just by clicking on the error in Eclipse) and see that Jython is unhappy because register_dialect is supposed to be a method yet I have defined it as a PyObject, so now I can forget about the other identifiers and focus on getting this method to work.
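The error comes down to calling something that is None. A minimal Python illustration, where the register_dialect variable simply stands in for the unset placeholder (the exact message wording differs between interpreters):

```python
# Stand-in for the placeholder that csv.py receives: a name bound to None.
register_dialect = None

try:
    register_dialect("excel", object())  # csv.py calls it like a function
    message = None
except TypeError as exc:
    message = str(exc)

print(callable(register_dialect))  # prints: False
print(message)
```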

This is where this entry ends while I go and figure out how methods and dynamic arguments work in Jython!