Planet Jython

Last update: July 04, 2008 05:47 PM

June 27, 2008


Frank Wierzbicki

Welcome Leonardo Soto, Jython's Newest Committer

I just flipped Leonardo Soto's commit bit this morning. Leo has been working with the Jython project for about a year now. He was instrumental in getting Django running on Jython, submitting patches to both the Django project and to the Jython project. Many of the patches to the Jython project fixed features in the very core of Jython that prevented Django from running. Leo continues his work on Django on Jython this summer with a Google Summer of Code grant.

Please join me in welcoming Leo aboard!

June 27, 2008 11:20 AM


A. Sundararajan

Working from an office -- for a change!

I work from home in Chennai, India. There is a maintenance power shutdown in my part of the city today (from 9.00 AM to 5.00 PM). I'm writing this blog from a Sun office in Apeejay Business Centre, Chennai. It is nice to be in an office after quite some time - at least as a change! But I think I'd rather avoid the travel, the preparation to go to the office, etc. every day :-)

June 27, 2008 06:32 AM

June 26, 2008


A. Sundararajan

BTrace aggregations - contribution from community

If you have used DTrace, chances are that you have used aggregations. For performance issues, aggregated data is often more useful than individual data points. With BTrace, aggregating data is a bit painful (you have to manage it using Maps explicitly). It would be nice to have DTrace-style aggregation functions such as sum, max, min and so on. Glencross, Christian M (cited in my previous entry) has contributed code changes, docs and a sample for an easy-to-use aggregation facility for BTrace. Please refer to the sample code (JdbcQueries.java) that demonstrates aggregations.
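To make the "manage using Maps explicitly" point concrete, here is a minimal plain-Java sketch of hand-rolled aggregation. It is only an illustration: it does not use the BTrace API or the contributed aggregation facility, and the class name ManualAggregation and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; this is NOT the BTrace aggregation API.
// ManualAggregation and its method names are made up for this sketch.
class ManualAggregation {
    // key -> {count, sum, min, max}
    private final Map<String, long[]> stats = new LinkedHashMap<String, long[]>();

    // Fold one data point into the running aggregates for its key.
    void record(String key, long value) {
        long[] s = stats.get(key);
        if (s == null) {
            stats.put(key, new long[] {1, value, value, value});
        } else {
            s[0]++;
            s[1] += value;
            s[2] = Math.min(s[2], value);
            s[3] = Math.max(s[3], value);
        }
    }

    long count(String key) { return stats.get(key)[0]; }
    long sum(String key)   { return stats.get(key)[1]; }
    long min(String key)   { return stats.get(key)[2]; }
    long max(String key)   { return stats.get(key)[3]; }

    public static void main(String[] args) {
        ManualAggregation agg = new ManualAggregation();
        agg.record("SELECT 1", 120);
        agg.record("SELECT 1", 80);
        System.out.println(agg.sum("SELECT 1") + " " + agg.min("SELECT 1") + " " + agg.max("SELECT 1"));
    }
}
```

Every probe that wants a sum, min or max has to repeat this bookkeeping by hand, which is the pain that DTrace-style aggregation functions remove.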

Now something unrelated to aggregations, but related to BTrace: I came to know about another use case of BTrace. See http://blog.igorminar.com/2008/06/btrace-dtrace-for-java.html

June 26, 2008 06:58 AM

June 24, 2008


Jim Baker

Django on Jython: Minding the Gap

Summary

The most important thing to know about Django on Jython is that we are almost there, and with clean code. End-to-end functionality is demonstrated by the admin tool running in full CRUD, along with a substantial number of unit tests and syncdb. So far, this has required only 6 lines of changes to Django trunk. (There will be more, however; see below.)

Running on Jython

To run Django on Jython, with a PostgreSQL backend, the following steps are necessary:

Status

Here's what works:

syncdb and the very cool Django admin run; many unit tests pass. You can run with internationalization enabled. You do need to run the dev server with --noreload for now. We need to document here how to run with modjy, which is Alan Kennedy's servlet container for WSGI apps.

In running the model unit tests, here are the things we seem to be missing, accounting for most of the approximately 75 failures:

There may be some other rough categories; we need to look at the failures more systematically. All that doctest noise is certainly annoying!

Next Steps

On the Django front, get more of the unit tests running!

Before we can push modern into trunk, the following needs to be done:

Updates - 2008-06-24: I should have put this up a while ago, but Django on Jython is becoming a reality. Most importantly, Leo Soto is working with me through the Google Summer of Code on this project. The modern branch was merged into trunk earlier this year, and has since been retired. CHM in fact has the right semantics, something I may discuss at a future point. Django has redone the doctest dict literals that were causing problems, and Leo provided a general solution when used with XML/XHTML.

June 24, 2008 02:02 PM

Realizing Jython 2.5

Jython 2.5 is really, finally, unbelievably coming together. This is the next release of Jython, after last summer's 2.2. In a nutshell, we have completed all new language features using an Antlr parser, except for absolute imports. All bytecode generation work, now using an ASM backend, is done. Of course, there are many outstanding bugs. And Python is not just a language; we need to fully support the fact that "batteries are included". But let's look at where we are. Through the prism of what's new in 2.3, 2.4, and 2.5, here's what's working:
  • 2.3: sets (PEP 218), generators (255), source code encoding (263), universal newline (278), enumerate (279), logging (282), Boolean (285), distutils (301), new import hooks (302), pickle enhancements (307), extended slices, datetimes, optparse. Still to go: csv, removing a dictionary in builtin that ensures that interned strings don't get GC'ed (pre-2.3 behavior!, it helps to read what's new). Also various string, Unicode, and regex changes are mostly done in a separate utf16 branch that I'm currently in the midst of merging against trunk.

  • 2.4: unifying long integers (237), generator expressions (289), string.Template (292, but also needs new utf16 work), decorators (318), reverse iteration (322), subprocess module (324), multi-line imports (328), removal of OverflowWarning, min & max with keyword support, sorted. But we still need partial import with sys.modules, and I'm sure some more stuff I forgot. Decimal and -m support are working in student branches; we just need to incorporate them.

  • 2.5: conditional expressions (308), partial function application (309, but we're cheating with a pure-Python version), distutils metadata (314), unified try/except/finally (341), coroutines and other generator functionality (342), with-statement, including contextlib (343), any, all. But we haven't done the exceptions remapping to new-style classes, absolute and relative imports, or all of the context manager support, such as in file. ctypes was a proposed Google Summer of Code project, but apparently PyPy has some work that's 95% of the way there; we will talk with them at EuroPython. We need to look into what is necessary to make ElementTree work. sqlite3 depends on ctypes. As I was writing this, I tried out wsgiref; it works and I just committed it to the asm branch. (At some point, we will repoint everything like this to CPythonLib, but for now we are mixing it up as we go. Bear with us!)

Even quit() and exit() now work; I don't know when these oh-so-major features were added. We even now support large string constants. And of course, who can forget our support for the GIL (global interpreter lock) in Jython, something that Tobias Ivarsson, my Google Summer of Code student who is now working on an advanced compiler, added to __future__ as an Easter egg:

>>> from __future__ import GIL
Traceback (most recent call last):
(no code object) at line 0
File "", line 0
SyntaxError: Never going to happen!

I would imagine that's definitive: we go against Java's native threads and compile to Java bytecode. It would be hard to have a GIL, even if we wanted one.

However, we are just turning the corner. The Antlr parser in the asm branch currently does not support partial parses, and this breaks not only interactive sessions but doctests. Until this is solved - and Frank Wierzbicki is working like mad on this - we can't merge this branch onto trunk. But that should happen very soon.

With few exceptions, we simply go against the standard Python unit tests. Straightforward, cunning, or devious, we have labored against these unit tests. And in others, we have used Python as our foil: we support the same 2.5 AST parse tree, and we know this by comparing our parses with CPython's for all of the standard library - including those unit tests.

There's a lot more going on. I can't say enough about the work done by Charlie Groves, Philip Jenvey, Alan Kennedy, Nicholas Riley, and others to make this happen. Leo Soto, my other GSoC student, is making amazing progress on supporting Django on Jython, while finding and fixing bugs in Jython itself. Supporting Django forces us to find those gaps in compatibility. Similar efforts are going on with Pylons, TurboGears 2 (Ariane Paola, GSoC), and Zope (Georgy Berdyshev, GSoC). I'm also working on greenlet/Stackless support and involved in a collaboration with Jeremy Siek and Joe Angell at the University of Colorado to add gradual typing (yes types! but only when you want to) to Jython. We have a T2000 contributed by Sun to let us see how much concurrency - in this case 32 hardware threads, 64 GB of memory - Jython can take advantage of. And so on.

Back to work!

Updates - 2008-06-24: we have support for new-style exceptions, the parser is now usable (but there are a couple of bugs left there), and Unicode support has been updated to UTF-16. See this posting, Flipping the 2.5 Bit for Jython.

June 24, 2008 01:49 PM

Flipping the 2.5 Bit for Jython

Something worth pointing out: as of 8 AM this morning (MDT) in rev 4748, Frank Wierzbicki flipped the bits and pronounced this about the ASM branch:

jbaker:~/jythondev/asm jbaker$ dist/bin/jython
Jython 2.5a0+ (asm:4750, Jun 24 2008, 10:56:16)
[Java HotSpot(TM) Client VM ("Apple Computer, Inc.")] on java1.5.0_13
Type "help", "copyright", "credits" or "license" for more information.
>>>

Yesterday there were easily the most commits we have seen in the Jython project. The real threshold was reached when we incorporated the UTF-16 and new-style exception branches into this branch, fixed the grammar to support most incremental parses, while repointing the standard library to CPythonLib 2.5. Along with a flurry of other fixes!

There's a lot more to go, but this should be an encouraging sign for everyone interested in Jython!

June 24, 2008 12:10 PM

Adopting UTF-16

Jython 2.5 standardizes on Java 5 as the base version for its implementation. Jython has always mapped both unicode and str types to java.lang.String, but the semantics of String changed as of Java 5. Instead of encoding characters as UCS-2, that is, just the basic multilingual plane of 65536 code points, Java - like .Net - adopted the UTF-16 encoding. UTF-16 can represent all 1114112 Unicode code points (U+0 to U+10FFFF), except for isolated surrogates (U+D800 to U+DFFF). These surrogates act as escape characters in the UTF-16 encoding.
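As a worked example of how surrogates act as escapes, here is a small sketch of the UTF-16 encoding arithmetic for a single code point. The class SurrogateDemo and its encode method are hypothetical names for illustration; in real code Character.toChars does the same job.

```java
// Illustrative sketch; SurrogateDemo and encode are made-up names.
class SurrogateDemo {
    // Encode one Unicode code point as one or two UTF-16 code units.
    static char[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint < 0x10000) {
            // Basic plane: a single code unit, same value as the code point.
            return new char[] {(char) codePoint};
        }
        int offset = codePoint - 0x10000;              // 20 significant bits
        char high = (char) (0xD800 + (offset >> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1D11E); // U+1D11E, MUSICAL SYMBOL G CLEF
        System.out.println(String.format("%04X %04X", (int) pair[0], (int) pair[1]));
        // prints "D834 DD1E"
    }
}
```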

This makes things somewhat more complicated, to put it mildly. And this is without even considering combining characters!

Instead of a simple uniform encoding that we see in the narrow (UCS-2) or wide (UCS-4) builds of CPython, we get a variable-length encoding. And unlike UTF-8, it's usually not too efficient. In addition, we lose the ability to represent the isolated surrogates. Finally, because UTF-16 is so very close to UCS-2, it's prone to bugs.
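A minimal sketch of why the closeness to UCS-2 invites bugs: char-based operations look correct until a string contains a code point outside the basic plane. The class and method names here are hypothetical and chosen only for this illustration.

```java
// Illustrative sketch; Utf16Pitfalls and its methods are made-up names.
class Utf16Pitfalls {
    // Number of UTF-16 code units (what String#length reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (surrogate pairs count once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Naive "first character": may cut a surrogate pair in half.
    static String firstCharNaive(String s) {
        return s.substring(0, 1);
    }

    // Code-point-aware "first character" (assumes s is non-empty).
    static String firstCodePoint(String s) {
        return s.substring(0, Character.charCount(s.codePointAt(0)));
    }

    public static void main(String[] args) {
        String s = "\uD834\uDD1Ea"; // U+1D11E followed by 'a'
        System.out.println(codeUnits(s) + " code units, " + codePoints(s) + " code points");
    }
}
```

For a string drawn entirely from the basic plane the two counts agree, so code written against UCS-2 assumptions passes most tests and only fails on astral input.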

Here's the implementation strategy we adopted. In supporting the unicode type with PyUnicode, we first determine if it's in the basic plane or not:


private enum Plane {
    UNKNOWN, BASIC, ASTRAL
}

private volatile Plane plane = Plane.UNKNOWN;

public boolean isBasicPlane() {
    if (plane == Plane.BASIC) {
        return true;
    } else if (plane == Plane.UNKNOWN) {
        plane = (string.length() == getCodePointCount()) ?
                Plane.BASIC : Plane.ASTRAL;
    }
    return plane == Plane.BASIC;
}

getCodePointCount is in turn implemented using String#codePointCount. Like other code point methods, it decodes any surrogate pairs.

String immutability means we can cache the result in the volatile field plane; idempotence of this operation ensures consistency. For basic plane strings, this allows us to equate code units (char) to code points (int), and use the implementations provided by PyString. As it turns out, this was always done before; the only difference between str and unicode was in the encoding rules.

In the rather rare case that the string isn't in the basic plane, we read with our SubsequenceIteratorImpl (which does a decode and then moves forward in the string, rather useful) or String#codePointAt and write with StringBuilder#appendCodePoint using iterators. A seemingly good alternative would be to use String#offsetByCodePoints. Too bad it doesn't reliably work. So instead we have our iterator implementations, lots and lots of them. And sometimes crazy stuff like this, seen in the implementation of PyUnicode#unicode_strip:

return new PyUnicode(new ReversedIterator(
        new StripIterator(sep,
                new ReversedIterator(
                        new StripIterator(sep,
                                newSubsequenceIterator())))));

If the strip method were used extensively on strings that weren't in the basic plane, it might make sense to rewrite this to decode to an int[] buffer. But that's not likely to be the case.
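For comparison, the int[] buffer alternative could look roughly like the sketch below. This is not Jython code: CodePointStrip, decode, contains and strip are hypothetical names, and the sketch works on plain java.lang.String rather than PyUnicode.

```java
// Illustrative sketch of stripping via an int[] buffer of code points.
// Not the Jython implementation; all names here are made up.
class CodePointStrip {
    // Decode a UTF-16 string into an array of code points.
    static int[] decode(String s) {
        int[] buf = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            buf[n++] = cp;
            i += Character.charCount(cp); // 1 for basic plane, 2 for a pair
        }
        return buf;
    }

    static boolean contains(int[] set, int cp) {
        for (int c : set) {
            if (c == cp) {
                return true;
            }
        }
        return false;
    }

    // Remove leading and trailing code points that occur in sep.
    static String strip(String s, String sep) {
        int[] cps = decode(s);
        int[] seps = decode(sep);
        int start = 0;
        int end = cps.length;
        while (start < end && contains(seps, cps[start])) {
            start++;
        }
        while (end > start && contains(seps, cps[end - 1])) {
            end--;
        }
        // String(int[], int, int) re-encodes the code points as UTF-16.
        return new String(cps, start, end - start);
    }

    public static void main(String[] args) {
        System.out.println(strip("xxhixx", "x"));
    }
}
```

The trade-off is an up-front copy of the whole string, which the iterator chain avoids; that only pays off if astral strings are stripped often.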

That's also the reason we avoid making the basic plane test unless we have to. There are many situations where Unicode can pass in and out of Jython - specifically to/from Java - without us caring about what planes its characters are drawn from. We assume some overhead from boxing with PyUnicode (although HotSpot mitigates the indirection cost), but we don't have to overdo it by computing this test on construction.

When comparing this with CPython, we do lose the ability to include isolated surrogate code points in Unicode strings. There are even some unit tests for this case. But ultimately this seemed like an implementation detail like testing ref counting, one certainly not worth time spent supporting.

It's worth mentioning that one alternative is to create our own representation, much like JRuby. Ruby's strings are mutable, unlike Python's. This forced the issue for the JRuby developers, because Ruby, like Python, needs good string performance. So JRuby uses byte arrays for strings, although they do use UTF-16 encoded, interned java.lang.String's to uniquely represent symbols (:xyz). Given that symbols are not strings, this works well. Ruby doesn't say anything about the encoding of such strings (ouch!), but JRuby does assume they're UTF-8 encoded when crossing the boundary with Java.

Supporting widened Unicode means having support for this in regular expressions. The first step was to just widen the SRE engine used by Jython to represent characters with int instead of short. So we always unpack to int in this case; see strip above. This engine is a direct translation of the CPython equivalent: it's a mini-VM, much like the pickle VM, and regexes are compiled to SRE bytecode. In the future, we may consider using JRuby's implementation (Joni, a port of Oniguruma to Java), but the devil is in supporting some specifics of Python. As was seen in the CPython case, it was quite straightforward to just do the widening.

At this point, the biggest outstanding issue is backporting the changes to SRE to support wide character classes (aka big character sets), a pickle problem, as well as various bug fixes. A total of four test cases are currently failing in test_re in the asm branch.

And then that's it, at least until we start doing performance profiling.

June 24, 2008 11:49 AM

June 21, 2008


Frank Wierzbicki

EuroPython - Anyone have some cool demos?

So Jim Baker and I have a talk at this year's EuroPython, Cool Stuff With Jython, and I was hoping that some of you have cooler stuff than I do :). I'll be sure to give credit for demos I show -- and I'll promise a t-shirt for any demo code that I use -- though I have yet to have any made (stealing the idea from the JRuby guys) -- you'll have to wait until I have them :). The best place to send demo ideas and code is to the jython-dev or jython-users mailing list. I'll watch the comments here too of course.

June 21, 2008 11:52 AM

Jython 2.5 Approaches an Alpha Release

Jim Baker just published some great analysis of the remaining issues that are left before releasing an alpha of Jython 2.5. I'll add that we need to get re/sre support fixed so that we can run the 2.5 Lib tests and pull in CPython's 2.5 Lib. I just checked in some grammar changes that make the asm branch interpreter much better. So close... stay tuned!

June 21, 2008 11:48 AM

June 19, 2008


Ed Taekema

Good Low Carb Resources

Videos

Articles

Book

June 19, 2008 04:48 AM

June 16, 2008


A. Sundararajan

BTrace in the real world

In the last few weeks, I came to know about two cases of real world use of BTrace.

  1. Glencross, Christian M (his blog?) wrote about attempting to write a script to track SQL statements executed by a Java application (private email). Thanks to him for permitting me to blog about his BTrace script. I've made a few formatting changes to fit his code in this blog and added a few explanatory comments (starting with "VERBOSE:").
    
    
    import static com.sun.btrace.BTraceUtils.*;
    
    import java.sql.Statement;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;
    
    import com.sun.btrace.*;
    import com.sun.btrace.annotations.*;
    
    /**
     * BTrace script to print timings for all executed JDBC statements on an event.
     * <p>
     * 
     * @author Chris Glencross
     */
    @BTrace
    public class JdbcQueries {
    
        private static Map<Statement, String> preparedStatementDescriptions = newWeakMap();
    
        private static Map<String, AtomicLong> statementDurations = newHashMap();
    
        // VERBOSE: @TLS makes the field "thread local" -- sort of like using java.lang.ThreadLocal
        @TLS
        private static String preparingStatement;
    
        @TLS
        private static long timeStampNanos;
    
        @TLS
        private static String executingStatement;
    
        /**
         * If "--stack" is passed on command line, print the Java stack trace of the JDBC statement.
         *
         * VERBOSE: Command line arguments to BTrace are accessed as $(N) where N is the command line arg position.
         * 
         * Otherwise we print the SQL.
         */
        private static boolean useStackTrace = $(2) != null && strcmp("--stack", $(2)) == 0;
    
        // The first couple of probes capture whenever prepared statement and callable statements are
        // instantiated, in order to let us track what SQL they contain.
    
        /**
         * Capture SQL used to create prepared statements.
         *
         * VERBOSE: +foo in clazz means foo and its subtypes. Note the use of a regular expression
         * for method names. With that, BTrace matches all methods starting with "prepare". The
         * type "AnyType" matches any Java type.
         * 
         * @param args - the list of method parameters. args[1] is the SQL.
         */
        @OnMethod(clazz = "+java.sql.Connection", method = "/prepare.*/")
        public static void onPrepare(AnyType[] args) {
            preparingStatement = useStackTrace ? jstackStr() : str(args[1]);
        }
    
        /**
         * Cache SQL associated with a prepared statement.
         *
         * VERBOSE: By default, @OnMethod matches method entry points. The @Location
         * annotation modifies it to match the method return points instead.
         * 
         * @param arg - the return value from the prepareXxx() method.
         */
        @OnMethod(clazz = "+java.sql.Connection", method = "/prepare.*/", location = @Location(Kind.RETURN))
        public static void onPrepareReturn(AnyType arg) {
            if (preparingStatement != null) {
                print("P"); // Debug Prepared
                Statement preparedStatement = (Statement) arg;
                put(preparedStatementDescriptions, preparedStatement, preparingStatement);
                preparingStatement = null;
            }
        }
    
        // The next couple of probes intercept the execution of a statement. If execute is called with
        // no args, then it must be a prepared statement or callable statement. Get the SQL from the
        // probes up above. Otherwise the SQL is in the first argument.
    
        @OnMethod(clazz = "+java.sql.Statement", method = "/execute.*/")
        public static void onExecute(AnyType[] args) {
            timeStampNanos = timeNanos();
            if (args.length == 1) {
                // No SQL argument; lookup the SQL from the prepared statement
                Statement currentStatement = (Statement) args[0]; // this
                executingStatement = get(preparedStatementDescriptions, currentStatement);
            } else {
                // Direct SQL in the first argument
                executingStatement = useStackTrace ? jstackStr() : str(args[1]);
            }
        }
    
        @OnMethod(clazz = "+java.sql.Statement", method = "/execute.*/", location = @Location(Kind.RETURN))
        public static void onExecuteReturn() {
    
            if (executingStatement == null) {
                return;
            }
    
            print("X"); // Debug Executed
    
            long durationMicros = (timeNanos() - timeStampNanos) / 1000;
            AtomicLong ai = get(statementDurations, executingStatement);
            if (ai == null) {
                ai = newAtomicLong(durationMicros);
                put(statementDurations, executingStatement, ai);
            } else {
                addAndGet(ai, durationMicros);
            }
    
            executingStatement = null;
        }
    
        // VERBOSE: The @OnEvent probe fires whenever the BTrace client sends an "event" command.
        // The command line BTrace client sends BTrace events when the user presses Ctrl-C
        // (more precisely, on receiving the SIGINT signal).
        @OnEvent
        public static void onEvent() {
            println("---------------------------------------------");
            printNumberMap("JDBC statement executions / microseconds:", statementDurations);
            println("---------------------------------------------");
        }
    
    }
    
    

    He has also expressed a few wishes for BTrace based on his experience with DTrace. We plan to investigate those items in the near future.



  2. Binod P.G exchanged private e-mails about BTrace usage to track down a memory leak. Subsequently, he has blogged about the same.

June 16, 2008 04:12 AM