Saturday, December 30, 2006

Handling Method Arguments in Jython

I have been away from Jython for a while over the Christmas period but now I am back and eager to make some progress porting the csv module. So where was I? Oh yes, last time I managed to get into a position where test_csv.py ran successfully (success meaning that the test script executed - not that the tests passed!), so now I can start trying to get a test to pass.

I have decided to tackle the test_reader_arg_valid1() test first, simply because it's the first test in test_csv.py:


def test_reader_arg_valid1(self):
    self.assertRaises(TypeError, csv.reader)
    self.assertRaises(TypeError, csv.reader, None)
    self.assertRaises(AttributeError, csv.reader, [], bad_attr = 0)
    self.assertRaises(csv.Error, csv.reader, [], 'foo')
    class BadClass:
        def __init__(self):
            raise IOError
    self.assertRaises(IOError, csv.reader, [], BadClass)
    self.assertRaises(TypeError, csv.reader, [], None)
    class BadDialect:
        bad_attr = 0
    self.assertRaises(AttributeError, csv.reader, [], BadDialect)

As you can see, the test performs a series of checks on the csv.reader() method, so I need to concentrate on implementing just enough of it to get the test to pass.

From the Python documentation, csv.reader() is defined as follows:

reader(csvfile[, dialect='excel'[, fmtparam]])

csvfile is the only required argument and it can be any object that supports the iterator protocol. Next, the dialect name can be specified as an optional parameter or omitted (in which case, the dialect will default to excel). The other optional fmtparam keyword arguments can be given to override individual formatting parameters in the current dialect.
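To make the signature concrete, here is a small sketch of the three kinds of argument in use. It assumes a current CPython 3 interpreter rather than Jython, and the sample data is made up for illustration.

```python
import csv

# Any iterable of strings works as csvfile; a list stands in for a file here.
lines = ["a;b;c", "1;2;3"]

# Positional dialect name plus a keyword fmtparam that overrides the delimiter.
rows = list(csv.reader(lines, "excel", delimiter=";"))

# With no dialect given, the "excel" dialect (comma-delimited) is used.
default_rows = list(csv.reader(["a,b,c"]))
```

The first call mixes a mandatory argument, an optional positional argument and a keyword argument, which is exactly the combination the Jython port has to cope with.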

So csv.reader() has it all - mandatory, optional and keyword arguments. I am going to need to figure out how this works in Jython to pass the test_reader_arg_valid1() test.

In Jython three types of method are supported:

  • StandardCall: Mandatory, positional arguments (e.g. void method(PyObject arg1, PyObject arg2) {} )

  • PyArgsCall: List of optional, positional arguments (e.g. void method(PyObject[] args) {} )

  • PyArgsKeywordsCall: List of optional, positional or keyword arguments (e.g. void method(PyObject[] args, String[] keywords) {} )
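As a rough analogy only, the three shapes look like this when written as plain Python functions. The function names are hypothetical, and the positional-only marker (/) needs Python 3.8 or later; none of this is Jython code.

```python
def standard_call(arg1, arg2, /):
    """Like StandardCall: a fixed number of positional-only arguments."""
    return (arg1, arg2)


def py_args_call(*args):
    """Like PyArgsCall: zero or more positional arguments, no keywords."""
    return args


def py_args_keywords_call(*args, **keywords):
    """Like PyArgsKeywordsCall: positional and keyword arguments."""
    return args, keywords


# A positional-only function rejects keyword arguments with a TypeError.
try:
    standard_call(arg1=1, arg2=2)
    standard_accepts_keywords = True
except TypeError:
    standard_accepts_keywords = False
```

Only the last shape can accept something like dialect="excel", which is why csv.reader() needs it.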


As csv.reader() must support keyword arguments, it must be defined as follows:

public static void reader(PyObject[] args, String[] keywords) {
}

The parameter list must be specified exactly like this (except for the identifier names which can differ) because Jython uses reflection to make a method of type PyArgsKeywordsCall only if it has exactly one PyObject array as the first argument and one String array as the second argument. If you add another argument to the beginning of the parameter list, then the method will automatically be of StandardCall type and won't support keyword arguments.

I can use the handy helper class, ArgParser to parse the arguments and extract the relevant values. First, I need to create an instance of ArgParser as follows:

public static PyObject reader(PyObject[] args, String[] keywords) {

    ArgParser ap = new ArgParser(
        "reader",
        args,
        keywords,
        new String[] {
            "csvfile", "dialect", "delimiter",
            "doublequote", "escapechar", "lineterminator",
            "quotechar", "quoting", "skipinitialspace"
        });
    //..
}

args and keywords are passed into ArgParser along with a list of the names of each argument that the method supports. Then it is possible to simply pick out the value of a parameter by invoking a getXXX() method specifying the position of the argument. So to get the value of "quotechar", you'd ask for the value at position 6 as follows:

String quotechar = ap.getString(6, "'");

Simple, eh? It is equally easy to support optional arguments. For example, the dialect argument would be extracted as follows:

String dialect = ap.getString(1, "excel");

Here, if dialect is not specified then it will default to "excel".
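To show the idea behind this kind of lookup, here is a minimal Python sketch of a parser that resolves arguments by position with a default. SimpleArgParser is a hypothetical stand-in, not Jython's ArgParser, and it assumes that the keyword values are the last len(keywords) items of args.

```python
class SimpleArgParser:
    """Hypothetical stand-in for an ArgParser-style helper (not Jython's code)."""

    def __init__(self, funcname, args, keywords, param_names):
        self.param_names = param_names
        # Assumed layout: keyword values sit at the end of args.
        n_positional = len(args) - len(keywords)
        if n_positional > len(param_names):
            raise TypeError("%s() got too many arguments" % funcname)
        # Positional values are matched to parameter names by position.
        self.values = dict(zip(param_names, args[:n_positional]))
        # Keyword values are matched by name.
        for name, value in zip(keywords, args[n_positional:]):
            if name not in param_names:
                raise TypeError("%s() got an unexpected keyword argument '%s'"
                                % (funcname, name))
            if name in self.values:
                raise TypeError("%s() got multiple values for '%s'"
                                % (funcname, name))
            self.values[name] = value

    def get(self, pos, default=None):
        """Return the argument declared at index pos, or default if absent."""
        return self.values.get(self.param_names[pos], default)


PARAMS = ["csvfile", "dialect", "delimiter", "doublequote", "escapechar",
          "lineterminator", "quotechar", "quoting", "skipinitialspace"]

# Equivalent to calling reader(["a,b"], delimiter="|").
ap = SimpleArgParser("reader", [["a,b"], "|"], ["delimiter"], PARAMS)
dialect = ap.get(1, "excel")    # not supplied, so the default is used
delimiter = ap.get(2, ",")      # supplied by keyword
quotechar = ap.get(6, '"')      # not supplied, so the default is used
```

The caller never needs to know whether a value arrived positionally or by keyword; it only asks for the index of the parameter in the declared name list.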

Now that I have learned how to support positional, optional and keyword arguments in Jython I can focus on type checking the arguments and throwing the appropriate exceptions in order to pass test_reader_arg_valid1().

Wednesday, December 13, 2006

Module Methods and Failing Tests

I have been looking at how CPython handles keyword arguments in methods today. I've had my fair share of experience with Python over the years (though not so much in the last few months) but I was totally unaware that methods may or may not support keyword arguments! Maybe that's because I often used the PyQt GUI toolkit bindings, which didn't support keyword arguments anyway; I'm not sure.

At the end of my last entry my _csv module was in a position where I was ready to implement the register_dialect() method. To do this I needed to figure out how Jython handles arguments, as I thought I would need to support keyword arguments for register_dialect() - it's a Python method after all, and all Python methods support keyword arguments, don't they? In fact, as it turns out, this isn't always the case! Although not explicitly mentioned in the documentation, some CPython methods don't support keyword arguments and if you try to use them you will get a TypeError. Indeed, csv.register_dialect() is one such method:


Python 2.3.6 (#1, Nov 17 2006, 22:32:43)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> csv.register_dialect(dialect=None, name="excel")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: register_dialect() takes no keyword arguments

Presumably register_dialect() behaves like this because it is not really a Python method. The csv.py module just exposes register_dialect() from the C module, but a normal Python developer would not know this and would quite rightly expect the method to support keyword arguments. This inconsistency is less than ideal and it's tempting to fix it for Jython, but I think that would be a mistake. Jython is supposed to mimic CPython's behaviour, whether rightly or wrongly. From a Jython perspective it's right if it's the way CPython behaves.
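The same split between Python-level and C-level functions is easy to reproduce with a builtin. This sketch assumes a current CPython 3 interpreter and uses len() rather than register_dialect(), since the exact behaviour of the latter may differ between versions. The wrapper name pure_python_len is made up for illustration.

```python
def pure_python_len(obj):
    """A function written in Python accepts its parameter by keyword."""
    return len(obj)


keyword_ok = pure_python_len(obj=[1, 2, 3])

# The builtin len() is implemented in C and takes no keyword arguments.
try:
    len(obj=[1, 2, 3])
    builtin_rejected = False
except TypeError:
    builtin_rejected = True
```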

So, in the case of register_dialect() I can explicitly specify the arguments as follows:

public static void register_dialect(PyObject name, PyObject dialect) {
}

If I try to run test_csv.py now I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 9, in ?
ImportError: no module named gc

The test_csv.py module uses the gc module which isn't supported by Jython yet. For now, I have just completely side-tracked this problem by making a copy of test_csv.py and removing all the tests that involve gc! Problem solved (temporarily at least)!
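An alternative to deleting the tests, which is not what I did here, would be to guard the import so that gc-dependent code can be skipped when the module is missing. A minimal sketch, with collect_garbage as a hypothetical helper name:

```python
try:
    import gc
except ImportError:
    gc = None  # e.g. an interpreter that does not ship a gc module


def collect_garbage():
    """Run a collection when gc exists; report whether one was run."""
    if gc is None:
        return False
    gc.collect()
    return True


ran = collect_garbage()
```

A test that depends on gc could then check the return value and bail out early instead of crashing at import time.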

Now, when I run my own copy of test_csv.py without the gc calls I get yet another annoying error:

Traceback (innermost last):
File "test_csv.py", line 363, in ?
File "test_csv.py", line 364, in TestEscapedExcel
File "/jy/dist/Lib/csv.py", line 39, in __init__
None: Dialect did not validate: quoting parameter not set

This error, and no doubt many other cryptic errors to come, is due to the fact that all the identifiers are the wrong type: they are all PyObjects, which confuses Jython a great deal. Now is the right time to revisit each identifier and change it to the correct type.

It's worth noting at this stage that my goal for today is to get _csv into a state where it is good enough to fail all tests. Wow, what a statement - let's say that again: I want _csv to be good enough to fail all tests! What a strange goal to aim for. Well, actually, once I have _csv in a state where test_csv.py can properly execute I am in a far better position than I was before. I can analyse the output of test_csv and tackle one test at a time, gaining satisfaction and confidence as I go. This is one of the primary advantages of Test Driven Development and it's surprising how effective it is.

First, I will tackle the methods. Rather than figure out the parameters for each method I have simply specified "PyObject[] args" as the parameter list which just means the method supports 0 or more arguments. For example, I have implemented unregister_dialect() as follows:

static public PyObject unregister_dialect(PyObject[] args) {
    return null;
}

For all the QUOTE_xxx identifiers I looked in the _csv.c module and saw they were enums. In Java I just make these separate integers to get them to work initially. I left Error as a PyObject as I will need to spend some time looking at exceptions at a later date. Similarly, I have left Dialect well alone and will look into it when the time is right. Finally, I changed __doc__ and __version__ to empty Strings to complete the process.
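For reference, the QUOTE_xxx constants are visible from Python as plain integers. This sketch assumes a current CPython 3 interpreter and simply collects the four values that csv.py imports from _csv:

```python
import csv

# Gather the four quoting constants by name.
quote_constants = {
    name: getattr(csv, name)
    for name in ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE")
}
```

Since they are ordinary ints that follow the enum order in _csv.c, separate integer fields on the Java side are a reasonable first approximation.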

With all the identifiers now the correct type, Jython is happy to run test_csv.py. Of course, not many tests pass and there is a lot of output - here's a sample of it:

======================================================================
FAIL: test_reader_arg_valid1 (__main__.Test_Csv)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/jy/dist/Lib/unittest.py", line 229, in __call__
File "test_csv.py", line 19, in test_reader_arg_valid1
File "/jy/dist/Lib/unittest.py", line 295, in failUnlessRaises
AssertionError: TypeError
----------------------------------------------------------------------
Ran 65 tests in 0.666s

FAILED (failures=4, errors=55)
Traceback (innermost last):
File "test_csv.py", line 716, in ?
File "test_csv.py", line 0, in test_main
File "/jy/dist/Lib/test/test_support.py", line 262, in run_unittest
File "/jy/dist/Lib/test/test_support.py", line 246, in run_suite
TestFailed: errors occurred; run in verbose mode for details

I may have 59 failures but this is a much better position than before. I now have something to focus on - I can tackle each test as it comes and gain confidence as the number of failures decreases and the number of passes increases until the porting process is complete. Yippee!

Now I am ready to implement the module proper. The first task is to find out what "c.s.v" stands for! ;) :)
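The summary line distinguishes failures (an assertion did not hold) from errors (the test raised some other exception). Here is a small sketch of that distinction using unittest directly; the Demo test case and the run_demo helper are made up for illustration and have nothing to do with test_csv.py.

```python
import unittest


def run_demo():
    """Run three tiny tests and return (tests run, failures, errors)."""

    class Demo(unittest.TestCase):
        def test_failure(self):
            self.assertEqual(1, 2)      # assertion does not hold: a failure

        def test_error(self):
            raise RuntimeError("boom")  # unexpected exception: an error

        def test_pass(self):
            self.assertEqual(1, 1)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    result = unittest.TestResult()
    suite.run(result)
    return (result.testsRun, len(result.failures), len(result.errors))


summary = run_demo()
```

With most of _csv still unimplemented, it makes sense that errors far outnumber failures: most tests blow up before they ever reach an assertion.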

Porting C Modules to Jython

CPython includes many library modules, some of which are written in pure Python (which is great because these will, hopefully, work in Jython without modification), but others are written in C, which means they must be rewritten in Java in order to work with Jython. Take the csv module for example. It is a Python library module so it should be possible to use it in Jython as-is without any extra work. If only it were that simple! You see, if you look at the code for csv.py you'll notice it uses another module called _csv for most of its behaviour, and it just so happens that _csv is written in C. Therefore, it is necessary to port this module to Jython. It's worth noting that there are many cases where a Python library module is just a wrapper for an underlying C module, but not always - for example, cStringIO and cPickle are first-class library modules implemented in C.

Before actually creating the _csv module, it's always helpful to see things fail first; then you get a nice feeling of satisfaction when you get the test to pass (or fail less!). So to prove that csv doesn't work in current Jython builds I ran the following test:
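The wrapper relationship can be checked from the interpreter. This sketch assumes a current CPython 3 build, where csv.py still imports its core names from _csv:

```python
import csv
import _csv

# csv re-exports these names from _csv, so both refer to the same objects.
same_reader = csv.reader is _csv.reader
same_error = csv.Error is _csv.Error
```

Both comparisons being true is what makes _csv, rather than csv.py, the piece that has to be ported.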

bash# jython dist/Lib/test/test_csv.py

which resulted in the following predictable error:


Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: no module named _csv

As expected, csv.py is unable to import the _csv module because it doesn't exist. To create it I followed the guidelines by Charlie Groves in the wiki (which, funnily enough, uses the csv module as an example - what a coincidence!). I created a _csv.java file in "$JYTHON_HOME/src/org/python/modules" then added "_csv" to the list of modules in Setup.java. After building Jython and running test_csv.py again I saw the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 7, in ?
ImportError: cannot import names Error, __version__,
writer, reader, register_dialect, unregister_dialect,
get_dialect, list_dialects, QUOTE_MINIMAL, QUOTE_ALL,
QUOTE_NONNUMERIC, QUOTE_NONE, __doc__

Now, I have a different error, but the fact that the first error has disappeared means that Jython has recognised my new _csv module! The new error is just Jython complaining because _csv doesn't define any of the identifiers that it is expecting.

Some of the missing identifiers are simple to resolve, like __doc__ which is just a string. Others, like Error, are more difficult and will require further investigation: Error is an exception and I don't know how to do exceptions in Jython yet. For now, I will just add everything as a PyObject to get past the error, then I will revisit each in turn. Here's _csv.java as it looks after adding all the missing identifiers:

public class _csv {
    public static PyObject Error;
    public static String __version__ = "1.0";
    public static PyObject Dialect;
    public static PyObject writer;
    public static PyObject reader;
    public static PyObject register_dialect;
    public static PyObject unregister_dialect;
    public static PyObject get_dialect;
    public static PyObject list_dialects;
    public static PyObject QUOTE_MINIMAL;
    public static PyObject QUOTE_ALL;
    public static PyObject QUOTE_NONNUMERIC;
    public static PyObject QUOTE_NONE;
    public static String __doc__;
}

Now, when I run test_csv.py I get the following error:

Traceback (innermost last):
File "dist/Lib/test/test_csv.py", line 8, in ?
File "/work/jython/dist/Lib/csv.py", line 87, in ?
TypeError: call of non-function ('NoneType' object)

Although the error isn't very helpful, I can go to line 87 in csv.py (just by clicking on the error in Eclipse) and see that Jython is unhappy because register_dialect is supposed to be a method yet I have defined it as a PyObject, so now I can forget about the other identifiers and focus on getting this method to work.
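The same class of error is easy to reproduce in plain Python: a name that is bound to None instead of a function cannot be called. This is a sketch on a current CPython 3 interpreter, where the message wording differs from Jython's; the placeholder variable simply mirrors the uninitialised field.

```python
# A placeholder that was never bound to a real function.
register_dialect = None

try:
    register_dialect("excel", object())
    message = None
except TypeError as exc:
    message = str(exc)  # mentions that a 'NoneType' object was called
```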

This is where this entry ends while I go and figure out how methods and dynamic arguments work in Jython!