Sunday, April 24, 2005

Generics: The Importance of Wildcards

or "Why type erasure is not such a bad thing" or "Why generics in C# are not that good"

Last Friday, I read an article that has been on my to do list way too long: Adding Wildcards to the Java Programming Language. I've seen wildcards in Java; I've used wildcards in Java; and I've even read the Java Language Specification on wildcards, but did yet not get the essence of wildcards.

This paper makes the need for wildcards very clear and explains why the work on wildcards and parameterized types is novel. Unfortunately, generics are often ridiculed by functional programmers. They claim that their type systems have been more expressive for decades. Fortunately, this paper clearly explains the issues of introducing generics in an object oriented setting.

The problem with basic parameterized types is subtyping. For example, although Integer is a subclass of Number, a List<Integer> is not a subtype of a List<Number>. Hence, if a method requires a List<Number> as an argument, then you cannot pass a list of List<Integer> to it.

Why is a List<Integer> not a subclass of List<Number>? Well, this is related to covariance and contravariance. A type declaration is covariant if it allowed to be more specific. For example, return types are covariant. A method that is declared to return a Number can be overridden to be more specific and return an Integer. On the other hand, parameter types are contravariant: a method that is declared to accept a Integer argument can be implemented in a more general way by allowing all Numbers.

The problem with type parameters is that they are used in method parameters as well as return types. Hence, they are restricted to the intersection of covariance and contravariance: invariance. Thus, type parameters are invariant and a List<Integer> is not a subclass of List<Number>.

If Java would be restricted to basic parameterized types and methods, then it is quite difficult to come up with a good signature for a method that works on a List that contains any Number. In fact, you cannot even declare the type of lists with abitrary numbers! Allowing arbitrary numbers a generic method with a dummy type parameter for the 'real' Number, i.e.

      <T> void doSomething(List<T extends Number>) { ... }

This works, but it gets quite mind-boggling if the types get more complex ("The more interesting your types get, the less fun it is to write them down!" -- Benjamin C. Pierce). These types are not only hard to write down: it does not even work in all cases. For example, you cannot declare a field that contains arbitrary Numbers, since you cannot introduce a dummy type variable for a field.

Wildcards are a language feature that make it a bit more fun to write down these types. Actually, it is a language feature that is necessary to write down these types, since the dummy type variable is just a workaround and uses the type of the method to declare the type of the argument. The field example shows that you cannot write down the type itself. Wildcards are based on the notion of use-site variance. Using wildcards, you can declare that your list is covariant: List<? extends Number> or contravariant: List<? super Number>. For more details, read the paper!

Unfortunately, C# will not support wildcards or a similar mechanism. The implementation strategy does not allow the introduction of wildcards (generics are implemented in the runtime instead of by type erasure). This is a bit surprising, since the implementation strategy is often claimed to be superior. What disappoints me is that the designers of C# are not willing to admit that subtyping is an issue and that wildcards are a solution. See the weblog of Eric Gunnerson: Puzzling through Erasure II and the section on wildcards in JavaOne: Day One.

Tuesday, April 19, 2005

Java Surprise 3: The Return of the Class

First of all: good news! The final version of The Java Language Specification, Third Edition is now available online! The specification has been improved considerably since the latest draft. Gilad Bracha seems to be responsible for the bulk of the work, which is a tough job. I think that the result is pretty good, although I'm afraid that I will keep bothering him with comments and requests for clarification ;).

Now back to the issue of this post. First of all: I have no idea how well-known the issue in this post is. I didn't know it, but it might actually be quite well-known. I have some references to previous discussions on this issue at the end of the post.

First, I want to say something about what influences the return type of a method in Java. Before Java 1.5, the return type of a method was just the plain return type specified in the method declaration. In other words, the return type did not depend on anything.

Java 1.5 introduces parameterized types and generic methods. The return type of a method can now also include type variables. This makes the return type dependent on the values of the type variables that occur in it. The type variables can have two different scopes: the class of the method or just the method itself, which makes it a generic method. So, the actual return type of a method now also depends on the value of these type variables.

However, there is one method in the Java library that does not return what it declares to return and needs another dependency. Indeed, there is an additional factor that influences the return type of this method.

The method I'm talking about is Object.getClass(), which returns the class of an object. In Java 1.5, Class itself is parameterized with the type that it represents. For example, the Class for String is Class<String>. The question is: what should the type parameter of the Class returned by Object.getClass() be? Well, at the declaration of the method we basically know nothing, and that is indeed the declared return type: a wildcard (unknown type) with a very general bounds: the type must extend Object.

   public final Class<? extends Object> getClass()

However, let's take a look at a piece of code where getClass is invoked. Assuming that getClass returns what it claims to return, we cannot declare c to be of a more specific type, for example Class<List>. We must declare it with a very general value for the type parameter: a wildcard.

   List<String> list = ...;
   Class<?> c = list.getClass();

This is unfortunate, since we actually know more about the type parameter of Class. We know at the invocation site that it is a List, but of course we cannot declare that in the return type of getClass in this way! So, we would like to let the return type of getClass dependent on the static type of the Object on which the method is invoked. In this way, the variable c could be declared to be of type Class<List>.

The developers of the Java specification decided to make the return type of this method a special case. That is, the Java Language Specification defines that an invocation of the getClass method must be treated in special way. In other words, the return type of the method is different from the one declared on the source code. The bounds of the Class returned by Object.getClass() is changed by the specification to the static type of the expression on which the method getClass is invoked. This is a useful feature, but it is a pity that this return type cannot be declared!

This post is getting way too long, but I would like to relate this to the implicit this argument of methods in object-oriented languages. For ordinary method arguments, you can declare types, which might include type variables. These type variables can influence the return type of the method. This is more or less what we want, but now we need this for our implicit this argument. I'm not sure if a solution in this direction is more attractive, but there is some link ... Are there more methods whose return type we would like to dependent on the static type of the object at the invocation site? If so, then this should not be supported by the language itself. Unfortunately, I cannot think of an example at the moment ;) .

There is even more to tell about this getClass method, since the type parameter of the Class is not the static type of the subject expression, but the erased variant of it. Maybe I'll make that a future post ...

Some references to related discussions:

Sunday, April 17, 2005

Moby: I Like It

I just read in Moby's Journal that his new album Hotel is doing extraordinary well in Europe. I'm happy about that: I have a lot of respect for the way Moby lives and truely appreciate his music.

However, I did not expect this album to be a best-seller. When I first listened to Hotel, I was quite surprised: Moby is singing on most of the tracks. I think he has a wonderful voice for the kind of music he makes nowadays, but a lot of people (well, at least according to the opinions at Amazonon) seem to dislike his vocal contribution to the songs. I cannot disagree with them if they claim that his voice isn't really beautiful. However, it's loaded with emotion, love, and sometimes compassion, which makes his current work stand out.

My favourite track? Well, it depends on my mood, but usually it is Dream about me. Yes, I'm a sensitive fool ;) . Where you end is a great song as well. If you like Hotel, you might also like some previous tracks that feature Moby singing. My favourites are New Dawn Fades (from the album I Like to Score, or even better: Live on the Play DVD) and Stay and Afterlife from the album 18 B-sides.

I can almost hear you think: "WTF? A non-technical post?". Yes, I'm sorry ;) . If you don't understand the dual meaning of the title of this post, then go and buy the album :P .

Wednesday, April 06, 2005

Java Surprise 2: Motivation

In the previous posts I showed that the priority of a cast to a reference type is different from the cast to a primitive type. Martijn Vermaat asked me why the designers of the Java language made this decision. Of course, they have good reasons for design decision, but still the decision is questionable, especially now we have autoboxing.

Let's take a look at this example from the original post:

$ echo "(Integer) - 2" | parse-java -s Expr | aterm2xml --implicit
<Minus>
  <ExprName><Id>Integer</Id></ExprName>
  <Lit><Deci>2</Deci></Lit>
</Minus>

If no priorities where defined in the Java language, then this expression would be ambiguous. I can illustrate this by parsing the same expression using a Java grammar that does not declare priorities. I'm using the SGLR parser for this, which is capable of producing a parse forest (multiple parse trees) if an input is ambiguous. The alternatives are represented by an amb element with 2 or more children.

$ "(Integer) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
<amb>
  <Minus>
    <ExprName>
      <Id>Integer</Id>
    </ExprName>
    <Lit>
      <Deci>2</Deci>
    </Lit>
  </Minus>
  <CastRef>
    <ClassOrInterfaceType>
      <TypeName>
        <Id>Integer</Id>
      </TypeName>
    </ClassOrInterfaceType>
    <Minus>
      <Lit>
        <Deci>2</Deci>
      </Lit>
    </Minus>
  </CastRef>
</amb>

This clearly shows that the input is ambiguous: the first alternative is the binary operator (which is the alternative chosen by the Java language) and the other alternative is a cast to a reference type.

However, the cast to an int is not ambiguous, since int is a reserved keyword, thus forbidden as an identifier. So, for this input there is only a single parse option, even in the ambiguous version of Java.

$ echo "(int) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
<CastPrim>
  <Int/>
  <Minus>
    <Lit><Deci>2</Deci></Lit>
  </Minus>
</CastPrim>

The ambiguity in the first example has to be resolved. So, what should the language designer do? Prefer the cast, or prefer the binary minus? Well, that decision is not very hard: in the first example, the (Integer) is a parenthesized expression, where the expression is the variable Integer. If we ignore this actual value (since it is quite distracting), then the structure of the expression is ( Expression ) - Expression. You will recognize the need for this pattern, since the expression (a * b) - c has exactly the same structure!

The cast to a primitive type does not have the ambiguity problem, since all primitives types are keywords and all keywords are forbidden as identifiers. So, there is no reason to disallow this a primitive cast at this location and for this reason the language designers changed the priority of the primitive cast.

Are there alternatives? Yes, there are, but they are not very attractive either. First, a parenthesized expression name could be forbidden. Using parentheses for a plain identifier (or a qualified name) does not make a lot of sense. Another option is disallow casts to primitive types at this location. This can be annoying, but it makes things more clear and consistent.

Of course, having two different production rules for casts is not attractive. It's just a single language construct, so it should be defined by a single production as well. I wonder what the language designers would have done if autoboxing was already included in the first version of Java, since autoboxing makes this distinction between a reference cast and a primitive cast visible.

Tuesday, April 05, 2005

Java Surprise 2: Another Example

While browsing through the micro testsuites of Java-front, I was remembered of another typical example:

int x = (int) ++y;
int x = (Integer) ++y;

The first statement is allowed. The second is not.

(See the first post on Java Surprise 2 for an explanation)

Java Surprise 2: Cast Priority

I promised a Java Surprise series. Thanks to my full-time job this promise is not hard to remember: every few days there is a fresh surprise for me ;) . The second surprise in this series is actually one a discovered last summer, so I'm cheating a bit. If you know me in real life, then I've probably already bothered you with this one.

First of all: please take a seat. Are you sitting comfortably? Excellent. Did you know that the syntactical priority of a cast to a primitive type is different from the cast to a reference type? Well, it is. Most likely, you will never encounter this, but it is not hard to find an example that will surprise you.

You are probably familiar with autoboxing in Java 1.5. In short, autoboxing can convert primitive types (such as int) to reference types (such as Integer) for you if necessary. Hence, you can assign an int to an Integer and you can also cast an int to an Integer. Some (correct) statements:

Integer x = 3;
Integer y = (Integer) 3;
int z = (Integer) 3;

I'm going to abuse your familiarity with autoboxing to show how weird it can be that the priority of primitive casts is different from reference casts. The following program is a correct program that includes a (redundant) cast to an int.

public class JavaSurprise2 {
  public static void main(String[] ps) {
    int y = (int) - 2;
    System.out.println(String.valueOf(y));
  }
}

Compile and run:

martin@logistico:~/tmp> javac JavaSurprise2.java
martin@logistico:~/tmp> java JavaSurprise2
-2

Well, that looks great. Now, let's replace the int with an Integer.

public class JavaSurprise2 {
  public static void main(String[] ps) {
    int y = (Integer) - 2;
    System.out.println(String.valueOf(y));
  }
}

Compile ...

martin@logistico:~/tmp> javac JavaSurprise2.java
JavaSurprise2.java:4: cannot find symbol
symbol  : variable Integer
location: class JavaSurprise2
    int y = (Integer) - 2;
             ^
JavaSurprise2.java:4: illegal start of type
    int y = (Integer) - 2;
            ^
2 errors

What the heck? Cannot find symbol? Let's give it a symbol ...

public class JavaSurprise2 {
  public static void main(String[] ps) {
    int Integer = 3;
    int y = (Integer) - 2;
    System.out.println(String.valueOf(y));
  }
}

Compile and run ...

martin@logistico:~/tmp> javac JavaSurprise2.java
martin@logistico:~/tmp> java JavaSurprise2
1

So, what happens? Of course the compiler is right. As I said in the beginning, the priority of a cast to a primitive type is different from a reference type. Because of the priorities defined in the Java Language Specification, the int example is parsed as a cast. However, the Integer version is parsed to an expression name: an expression that can be referred to using a name (aka variable). The Java compiler will never come back to this decision to make it a cast anyway: syntactical choices are always committed.

I can illustrate these different parses using Java-front, a package that provides a Java parser that is generated from a declarative syntax definition for Java in SDF (yes, I'm the developer: marketing intended ;) )

martin@logistico:~/tmp> echo "(Integer) - 2" | parse-java -s Expr
Minus(ExprName(Id("Integer")),Lit(Deci("2")))

martin@logistico:~/tmp> echo "(int) - 2" | parse-java -s Expr
CastPrim(Int,Minus(Lit(Deci("2"))))

Or in terms of XML:

martin@logistico:~/tmp> echo "(Integer) - 2"
    | parse-java -s Expr | aterm2xml --implicit
<Minus>
  <ExprName><Id>Integer</Id></ExprName>
  <Lit><Deci>2</Deci></Lit>
</Minus>

martin@logistico:~/tmp> echo "(int) - 2"
    | parse-java -s Expr | aterm2xml --implicit
<CastPrim>
  <Int/>
  <Minus>
    <Lit><Deci>2</Deci></Lit>
  </Minus>
</CastPrim>

Surprised? You'd better be!