Pimp my Java!

There seem to be lots of new programming languages popping up at the moment. Clojure, Scala, Opa, Dart, Go, Rust, and so on. I’m always a fan of new programming languages, as I feel there is still so much untapped innovation to explore. However, it can take years, or even decades for a new language to reach commercial acceptance (a sad reality). In the meantime, mainstream languages can incorporate proven features from more experimental cousins. Java, for instance, has adopted generics and annotations, both features I have some fondness for. In the spirit of encouraging further progress I offer the following as a list of largely conservative enhancements to Java that are in keeping with it’s fundamental design as an imperative object-oriented language. I don’t wish to turn Java into Haskell, because Haskell is already good enough.

So, here’s my top five “easy” extensions to Java:

tl;dr summary

For those who do not have time to read a ~3,300 word article, here’s an executive summary. Essentially, I propose 5 simple extensions to Java that naturally extend existing mechanisms to make certain programming tasks simpler:

  1. AutoCloseable iterators (and for-each support) to allow cleaning up of resources even in case of early termination;
  2. Ability for iterators to throw checked exceptions to improve exception safety, via new CheckedIterator and CheckedIterable interfaces (super-interfaces of the existing unchecked versions);
  3. Changing throw to be an expression rather than a statement;
  4. Allowing for-each to iterate over multiple iterators in parallel;
  5. Support for generator methods allowing much simpler definition of custom iterators (including a 100% pure Java implementation that you can try right now).

As a result of these changes we demonstrate how these changes radically simplify writing producer-consumer designs with a simple example. We also compare and contrast the approach to the recent Java Lambda proposal (currently scheduled for Java 8) and argue that, while lambdas are a welcome addition to the language, the emphasis on internal iterators in the Collections API is limited. Generator methods represent a much nicer approach, that both fits better with the existing iterator model and generalises more easily to iterating over multiple collections at once.

For those who want to see the final code, here it is. Firstly, our “consumer” program that prints out a line-numbered listing of some text file:

public void printListing(final TextFile file) throws IOException {
    for (final String line : file.lines();
         final int lineNum : Integers.from(1)) {
        System.out.printf("%5d %s%n", lineNum, line);
    }
}

And the corresponding “producer” backend:

public class TextFile extends java.io.File {
    public TextFile(final String path) { super(path); }
    public BufferedReader reader() throws FileNotFoundException {
        return new BufferedReader(new FileReader(this));
    }
    public Generator<String,IOException> lines() {
        return new Generator<String,IOException>() {
            public void generate() throws IOException {
                String line;
                try (final BufferedReader in : reader()) {
                    while ((line = in.readLine()) != null)
                        yield(line);
                }
            }
        };
    }
}

Now on to the main article:

1. AutoCloseable iterators and for-each.

The first enhancement is a simple one: that the for-each construct should also respect the AutoCloseable interface as for try-with-resources. That is, if the Iterator consumed in the for-each loop implements AutoCloseable then the close() method should be called whenever the loop is exited – whether normally, or by break or return. For example, suppose the java.io.File class had a lines() method that returns an Iterable<String> to allow iterating over the lines of the file. We might try and implement this as follows (note: this code is deliberately rough; we will be successively refining it with new features):

/**
 * Extension of {@link java.io.File} that provides a convenience
 * method for iterating over the lines of text in the file.
 *
 * @author Neil Madden
 */
public class TextFile extends java.io.File {

    private static final long serialVersionUID = 1L;

    public TextFile(final String path) {
        super(path);
    }

    /**
     * Get a buffered reader for the file using the platform
     * default character set.
     */
    public BufferedReader reader() throws FileNotFoundException {
        return new BufferedReader(new FileReader(this));
    }

    /**
     * Get a buffered reader for the file using the given character
     * set.
     * @param charset the character set to use.
     */
    public BufferedReader reader(final Charset charset) 
    throws FileNotFoundException {
        return new BufferedReader(
               new InputStreamReader(
               new FileInputStream(this, charset)));
    }

    /**
     * Return an {@link Iterable} view of the lines in the text file.
     */
    public Lines lines() throws FileNotFoundException {
        return new Lines(this.reader());
    }

    /**
     * Return an {@link Iterable} view of the lines in the text file,
     * using the given character set.
     */
    public Lines lines(final Charset charset) throws FileNotFoundException {
        return new Lines(this.reader(charset));
    }

    /**
     * Simple iterable view of the lines in a text file.
     */
    public static class Lines implements Iterable<String> {
        private final BufferedReader in;

        public Lines(final BufferedReader in) {
            this.in = in;
        }

        @Override
        public LineIterator iterator() {
            return new LineIterator(in);
        }
    }

    /**
     * {@link Iterator} over the lines in a text file.
     */
    public static class LineIterator implements Iterator<String> {

        private final BufferedReader in;
        private String next;

        public LineIterator(final BufferedReader in) {
            this.in = in;
        }

        @Override
        public boolean hasNext() {
            if (next == null) {
                try {
                    next = in.readLine();
                    if (next == null)
                        in.close();
                }
                catch (final IOException ex) {
                    RuntimeException toThrow = new RuntimeException(ex);
                    try { in.close(); }
                    catch (final IOException ex2) {
                        toThrow = new RuntimeException(ex2);
                        toThrow.addSuppressed(ex);
                    }
                    throw toThrow;
                }
            }
            return (next != null);
        }

        @Override
        public String next() {
            try {
                if (hasNext()) {
                    return next;
                } else {
                    throw new NoSuchElementException();
                }
            }
            finally {
                next = null;
            }
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }
    }
}

Wow! What a lot of work! OK, we’ve probably done more than we strictly need to, in pursuit of a nice API, but still — this is an absurd amount of code for such a basic task. We’re also not really handling exceptions very well; simply re-throwing them as generic runtime exceptions. We’ll come back to these issues, and show how the new features proposed result in a dramatic reduction in line count, together with an increase in exception safety. For now, let’s consider how we would use this class to print a simple code listing with line numbers:

public void printListing(final TextFile file) throws IOException {
    int lineNum = 0;
    for (final String line : file.lines()) {
        System.out.printf("%5d %s%n", ++lineNum, line);
    }
}

This seems nice enough, but what if we want to exit the loop early using break or return? The iterator is not notified of this event, and so has no chance to cleanup and close the underlying FileReader. We could refactor our for-loop to have an outer try/finally (or try-with-resources) to cleanup, but what do we call close() on? Any way we look at it, we end up having to push more of the responsibility for resource management onto clients and the initially appealing lines() method becomes quite cumbersome both to implement and to use. Yuck! This post is about a number of tweaks that could be made to Java to make these kinds of cases much simpler for both providers and consumers.

The first tweak is to let the for-each construct understand AutoCloseable. The idea is that for-each takes on some of the responsibilities of try-with-resources: if the iterator returned from the iterable also implements AutoCloseable, then the for-each loop guarantees that close() will be called exactly once on that iterator, no matter how the loop exits. This ensures that our simple client remains the same, but is now robust against early exits without needing to know anything about the underlying resources involved. The implementation can also become cleaner, as we no longer need to try to close the FileReader in various cases: the for-each will ensure that close() is called for us (even in error cases). Our LineIterator is then a little better:

    public static class LineIterator 
        implements Iterator<String>, AutoCloseable {

        private final BufferedReader in;
        private String next;

        public LineIterator(final BufferedReader in) {
            this.in = in;
        }

        @Override
        public boolean hasNext() {
            if (next == null) {
                try { next = in.readLine(); }
                catch (final IOException ex) {
                    throw new RuntimeException(ex);
                }
            }
            return (next != null);
        }

        @Override
        public String next() {
            try {
                if (hasNext()) {
                    return next;
                } else {
                    throw new NoSuchElementException();
                }
            }
            finally {
                next = null;
            }
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

2. Iterators with checked exceptions

The next step in our tidy up is to redefine the Iterator interface itself so that we don’t have to wrap those IOExceptions in generic RuntimeExceptions. The idea is simple: just provide an alternative Iterable/Iterator interface that allows checked exceptions, as follows. While we’re at it, we’ll factor out the optional remove() method into a separate interface (After all, why is remove granted special status? Why not insert?):

public interface CheckedIterator<T, E extends Exception> {
    boolean hasNext() throws E;
    T next() throws E;
}
public interface CheckedIterable<T, E extends Exception> {
    CheckedIterator<T, E> iterator();
}

The for-each construct can then be adjusted to also throw this type of exception when it is passed a CheckedIterable. This further simplifies our implementation:

    public static class LineIterator 
        implements CheckedIterator<String, IOException>,
                   AutoCloseable {

        private final BufferedReader in;
        private String next;

        public LineIterator(final BufferedReader in) {
            this.in = in;
        }

        @Override
        public boolean hasNext() throws IOException {
            if (next == null)
                next = in.readLine();
            return (next != null);
        }

        @Override
        public String next() throws IOException {
            try {
                if (hasNext()) {
                    return next;
                } else {
                    throw new NoSuchElementException();
                }
            }
            finally {
                next = null;
            }
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

Again, we have improved both the code and the interface. The implementation is simpler as we don’t have to catch and re-throw the checked exceptions. The interface is better as it now makes clear that IOExceptions can be thrown and should be handled.

3. Make throw an expression

I am personally a big fan of functional programming techniques. For an example, I’m a big fan of using the ternary operator instead of simple if statements. For one-liners I believe this makes the code terser and easier on the eye. For example, I would like to write the next() implementation above as follows:

public String next() throws IOException {
    try { 
        return hasNext() ? next 
                         : throw new NoSuchElementException(); 
    }
    finally { next = null; }
}

However, the above won’t compile, because throw is a statement and not an expression. After all, what would it’s return type be? Well, it turns out there’s a good answer to that question: any type at all! As it never returns, it can effectively have any type required by the context. So this is my next proposal: make throw an expression.

It turns out that we can actually write this using generics, even if it is a little cumbersome:

public static <T, E extends Exception> T raise(
        final Class<T> cls, final E ex) 
throws E {
    throw ex;
}
public String next() throws IOException {
    try { 
        return hasNext() 
               ? next 
               : raise(String.class, new NoSuchElementException()) 
    }
    finally { next = null; }
}

Just about workable, but a keyword would be better.

4. Multiple for-each

Before we show the final enhancement that will really clean up our implementation, let’s turn back to the consumer code. It may seem that this code could not be improved much, as it is already quite concise. However, there is one niggle there: the need to manually manage the line counter. Ideally, we would like the for-each loop to handle both iterating over the collection and keeping track of the loop counter. We could add a special case to the for-each construct for this, but there is a more general concept here: that of iterating over multiple collections in parallel. In this case, the second collection we wish to iterate over is the set of positive integers from 1 to the total number of lines in the file.

In order to achieve this, we propose that the for-each construct is extended to allow multiple iterators to be specified. On each loop, we fetch the next value from each iterator and assign it to the given variable and execute the loop body. The loop exits when any of the iterators is exhausted (calling close() on those that support it). Together with a utility class to generate iterators over sub-sets of the integers (code left as an exercise), we can then write our consumer code even more simply:

public void printListing(final TextFile file) throws IOException {
    for (final String line : file.lines();
         final int lineNum : Integers.from(1)) {
         System.out.printf("%5d %s%n", lineNum, line);
    }
}

In this case, the improvement is not massive. However, imagine we were concatenating lines from multiple files into a single output file. Here, the code is much simpler than the alternative:

try (final PrintWriter out = new ...) {
    for (final String lineA : fileA.lines();
         final String lineB : fileB.lines()) {
        out.writeln(lineA + " " + lineB);
    }
}

5. Generators

The final improvement is the biggest, both in terms of shift in programming model, and also in the radical reductions in line count that it enables. The idea is to adopt the notion of generators found in Python, Icon, Lua, and other programming languages. A generator (and it’s more general cousin, the coroutine) allows us to write the equivalent of an Iterator without having to break our logic into separate hasNext() and next() methods. Instead, we can write the iteration logic as a normal method which “returns” more than once. The Java runtime executes the method as normal, but when it encounters this new kind of return statement (usually called yield) it suspends execution and returns the new value. When the next value is requested, the JVM resumes the suspended method call and continues where it left off.

If you’re not familiar with generators from other languages, this may all sound like madness. How can a method return more than once? Well, let’s take a look at how our lines() example might look in a hypothetical Java with generators:

public Generator<String, IOException> lines() {
    return new Generator<String, IOException>() {
        public void generate() throws IOException {
            String line;
            try (final BufferedReader in : reader()) {
                while ((line = in.readLine()) != null)
                    yield(line);
            }
        }
    }
}

Wow! Look back at the first version of this code we wrote at the top of this post, and then look at this final version. The line count reduction is extraordinary. We no longer need two nested utility classes. We no longer need to catch and awkwardly convert checked exceptions. We no longer need to worry about early termination of the loop leaking our file descriptor. The code is much shorter and simpler: we simply open the file and then yield each line that we read from it, using a try-with-resources to ensure cleanup.

You may at this point be thinking “OK, sure, but doesn’t this require some exotic machinery in the JVM?” Well, the answer turns out to be “No!”. The JVM already contains all the exotic machinery needed, in the form of threads. (I deliberately made the code mimick the code for constructing a thread for this reason). So let’s look at what it takes to make this happen.

A generator consists of a producer thread communicating to the consumer via a BlockingQueue.

The figure shows the overall design. A generator consists of two parts: a producer and a consumer. The producer is the user-supplied code of the generator, which runs in a new thread and produces new values by calling yield(). The yield() method wraps up the code to push new values onto a BlockingQueue. The consumer is the Iterator (here CheckedIterator) interface of the generator. Calls to next() on this iterator interface consume elements off the queue and returns them to the caller. The remaining details concern how to handle exceptions in the generator body, and how to communicate early termination back to the caller. For the first problem, we can add an additional one-shot channel for communicating an exception to the consumer (via an AtomicReference). For the latter, we add a new BreakException that the consumer can use to communicate back to the producer.

Without further ado, here’s the code. Consider this code to be Copyright (C) 2012, Neil Madden, and released under the terms of the MIT License. (Note: you can convert this code to work with normal Iterators fairly simply, by discarding the exception handling code. I leave this as an exercise for the reader).

package com.wordpress.neilmadden;

import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

/**
 * Implementation of generator expressions for Java using background
 * threads.
 * 
 * @author Neil Madden
 */
public abstract class Generator<T, E extends Exception>
    implements CheckedIterable<T, E> {

    private static final Logger log = Logger.getLogger(Generator.class.getName());

    /**
     * Thread-local to hold the current generator task. Allows 
     * {@link #yield(java.lang.Object)} to determine which instance 
     * is yielding. This is purely an aesthetic choice: you could
     * alternatively pass the GeneratorTask into the generate method.
     */
    private final ThreadLocal<GeneratorTask> threadLocalTask = 
            new ThreadLocal<>();

    /**
     * The abstract implementation that each generator must 
     * implement. The body of the generator should call 
     * {@link #yield(java.lang.Object)} to communicate values to 
     * the consumer.
     * 
     * @throws E if an error occurs.
     */
    protected abstract void generate() throws E;

    /**
     * Called by the body of the generator to yield values to the 
     * consumer.
     * 
     * @param it the value to yield to the consumer.
     * @throws BreakException if the consumer has terminated early.
     */
    protected void yield(final T it) {
        log.fine("Yielding value: "  + it);
        threadLocalTask.get().yield(it);
    }

    @Override
    public CheckedIterator<T, E> iterator() {
        final GeneratorTask task = new GeneratorTask();
        log.fine("Starting generator task: " + task);

        // Start the generator in a new thread
        new Thread(task).start();

        return task;
    }

    /**
     * Exception used to indicate early termination of the consumer.
     */
    public static class BreakException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    /**
     * The core implementation of the generator task. Runs as a 
     * separate thread, reading values from the underlying generator,
     * and communicating them to the consumer thread through a 
     * {@link BlockingQueue}.
     */
    private class GeneratorTask implements Runnable, 
                                           CheckedIterator<T, E>,
                                           AutoCloseable {

        /**
         * Queue to use for communicating values from the producer 
         * (generator) to the consumer. We use an 
         * {@link ArrayBlockingQueue} to limit the amount of memory 
         * that the generator can use if it runs ahead of the
         * consumer.
         */
        private final BlockingQueue<T> queue = 
                new ArrayBlockingQueue<>(11);
        /**
         * Volatile flag to indicate when the generator is finished,
         * either because it has terminated normally, or because the
         * consumer has exited.
         */
        private volatile boolean isDone = false;
        /**
         * Atomic reference for communicating any exceptions thrown 
         * in the generator to the consuming code.
         */
        private final AtomicReference<E> exception =
                new AtomicReference<>();

        @Override
        @SuppressWarnings("unchecked")
        public void run() {
            // Install a reference to ourselves in the thread-local
            threadLocalTask.set(this);

            try {
                // Call the actual generator code
                generate();
            }
            catch (final BreakException ex) {
                log.fine("Early termination detected.");
            }
            catch (final RuntimeException ex) {
                throw ex;
            }
            catch (final Exception ex) {
                // Note: this *must* be an instance of E, but Java 
                // doesn't allow us to express that type constraint.
                exception.set((E) ex);
            }
            finally {
                log.fine("Generator task terminating: " + this);
                // Mark the task as completed
                isDone = true;

                // Remove the thread-local reference
                threadLocalTask.remove();
            }            
        }

        public void yield(final T it) {
            log.fine("Yielding value: " + it);
            while (!isDone) {
                try {
                    // Offer with timeout, so that we do not hang 
                    // forever in case of early termination.
                    if (queue.offer(it, 10, TimeUnit.MILLISECONDS)) {
                        return;
                    }
                }
                catch (final InterruptedException _) {
                    // Reset interrupted status so that callers can 
                    // detect this.
                    Thread.currentThread().interrupt();
                }
            }
            log.fine("Detected early termination");
            throw new BreakException();
        }

        @Override
        public void close() {
            isDone = true;
        }

        @Override
        public boolean hasNext() throws E {
            final E ex = exception.getAndSet(null);
            if (ex != null)
                throw ex;

            return !(isDone && queue.isEmpty());
        }

        @Override
        public T next() throws E {
            log.fine("Retrieving next element");
            while (hasNext()) {
                try {
                    // Poll with timeout so that we do not hang 
                    // forever in case the generator has died.
                    final T next = queue.poll(10, 
                                         TimeUnit.MILLISECONDS);
                    if (next != null) {
                        log.fine("Retrieved element: " + next);
                        return next;
                    }
                }
                catch (final InterruptedException _) {
                    // Reset interrupted status
                    Thread.currentThread().interrupt();
                }
            }
            throw new NoSuchElementException();
        }
    }
}

Discussion

The big bang in this article is the concept of generators. The other changes proposed are rather minor, incremental changes to the platform. Adding generators would be a bigger step. However, as shown, they can be implemented right now on top of threads, albeit in a less efficient manner than they could be implemented in the JVM itself. I believe that the code reduction that generators enable is a huge advantage, and greatly simplifies the construction of producer/consumer designs.

Some astute readers may be wondering how generators compare to lambda expressions. The approach using lambdas would be to allow lines() to accept a lambda which it calls with each line:

public void lines(final Function<String> f) throws IOException {
    String line;
    try (BufferedReader in = this.reader()) {
        while ((line = in.readLine()) != null)
            f.call(line);
    }
} 
// Usage:
public void printListing(final TextFile file) throws IOException {
    final int[] lineNum = new int[] {0};
    file.lines(line -> {
        System.out.printf("%5d %s%n", ++(lineNum[0]), line);
    });
}

This is both simple and expressive. There are a number of downsides however:

  • the caller must be refactored to pass a lambda (continuation-passing style);
  • we cannot use existing code that expects to use an Iterator or Iterable (also a problem for CheckedIterators, of course);
  • we cannot easily consume from more than one source at a time, as in the multiple-for-each statement (item 4 in this article).

The beauty of the generator approach is that it effectively performs this continuation-passing style (CPS) conversion for you. The final bullet is, to me, the real killer here. Consider our line-numbering example written using two generators:

public Generator<Integer,RuntimeException> intsFrom(final int from){
    return new Generator<Integer,RuntimeException>() {
        public void generate() {
            int next = from;
            while (true) { yield(next++); }
        }
    }
} 
public void printListing(final TextFile file) throws IOException {
    for (final String line : file.lines();
         final int lineNum : intsFrom(1)) {
        System.out.printf("%5d %s%n", lineNum, line);
    }
}

Now see if you can write this equivalently using lambdas (and consider the more complex cases). Finally, we can refactor the generator syntax to make use of lambda syntax too:

public Generator<String, IOException> lines() {
    return Generator.of({
        String line;
        try (BufferedReader in = reader()) {
            while ((line = in.readLine()) != null)
                yield(line);
        }
    });
}
About these ads

About Neil Madden

I am a senior Java developer at ForgeRock based in the UK. Previously, I have held positions at Atos and IBM, and was a Research Fellow at the University of Nottingham, where I also completed my PhD. I am interested in programming languages, software engineering, logic and artificial intelligence.
This entry was posted in Computing and tagged . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s