Sunday, April 24, 2005

Generics: The Importance of Wildcards

or "Why type erasure is not such a bad thing" or "Why generics in C# are not that good"

Last Friday, I read an article that has been on my to do list way too long: Adding Wildcards to the Java Programming Language. I've seen wildcards in Java; I've used wildcards in Java; and I've even read the Java Language Specification on wildcards, but did yet not get the essence of wildcards.

This paper makes the need for wildcards very clear and explains why the work on wildcards and parameterized types is novel. Unfortunately, generics are often ridiculed by functional programmers. They claim that their type systems have been more expressive for decades. Fortunately, this paper clearly explains the issues of introducing generics in an object oriented setting.

The problem with basic parameterized types is subtyping. For example, although Integer is a subclass of Number, a List<Integer> is not a subtype of a List<Number>. Hence, if a method requires a List<Number> as an argument, then you cannot pass a list of List<Integer> to it.

Why is a List<Integer> not a subclass of List<Number>? Well, this is related to covariance and contravariance. A type declaration is covariant if it allowed to be more specific. For example, return types are covariant. A method that is declared to return a Number can be overridden to be more specific and return an Integer. On the other hand, parameter types are contravariant: a method that is declared to accept a Integer argument can be implemented in a more general way by allowing all Numbers.

The problem with type parameters is that they are used in method parameters as well as return types. Hence, they are restricted to the intersection of covariance and contravariance: invariance. Thus, type parameters are invariant and a List<Integer> is not a subclass of List<Number>.

If Java would be restricted to basic parameterized types and methods, then it is quite difficult to come up with a good signature for a method that works on a List that contains any Number. In fact, you cannot even declare the type of lists with abitrary numbers! Allowing arbitrary numbers a generic method with a dummy type parameter for the 'real' Number, i.e.

      <T> void doSomething(List<T extends Number>) { ... }

This works, but it gets quite mind-boggling if the types get more complex ("The more interesting your types get, the less fun it is to write them down!" -- Benjamin C. Pierce). These types are not only hard to write down: it does not even work in all cases. For example, you cannot declare a field that contains arbitrary Numbers, since you cannot introduce a dummy type variable for a field.

Wildcards are a language feature that make it a bit more fun to write down these types. Actually, it is a language feature that is necessary to write down these types, since the dummy type variable is just a workaround and uses the type of the method to declare the type of the argument. The field example shows that you cannot write down the type itself. Wildcards are based on the notion of use-site variance. Using wildcards, you can declare that your list is covariant: List<? extends Number> or contravariant: List<? super Number>. For more details, read the paper!

Unfortunately, C# will not support wildcards or a similar mechanism. The implementation strategy does not allow the introduction of wildcards (generics are implemented in the runtime instead of by type erasure). This is a bit surprising, since the implementation strategy is often claimed to be superior. What disappoints me is that the designers of C# are not willing to admit that subtyping is an issue and that wildcards are a solution. See the weblog of Eric Gunnerson: Puzzling through Erasure II and the section on wildcards in JavaOne: Day One.

4 comments:

Ivan Tikhonov said...

Are all this stuff really usable?

For me it looks like object-oriented strictly typed languages (C++ clones lol) constantly get more and more complex syntax.

C# described in a 3-, may be 5-, year old book had much simpler syntax then C++. Now they are at least comparable.

Java going this way too. Looks like since 1.1.8 it got templates.

Personaly, i switched to agile languages few years ago. I like languages described in a few sentences.

Erlang pretty close to be good for me.

Lisp isn't an option because it's really hard behind it's illusory simplicity with all that functions to be used and untamed macros. Python and Perl all got too many methods/operators to remember. Ruby looks like Python clone with cool features for me.

I doesn't want to start flame war, i just want to get wider understanding of things. May be i missed something really important and going wrong way?

Steve Austin said...

Informative blog. I have a 2005 conference xhtml blog.

Aaron said...

Yes Ivan, this stuff is important.
Just a quick example, a while ago I implemeted a small GA framework for Java, using generics, and recently tried to port it to C#, but couldnt implement it as generic as I could in Java.
A snippet of Java code:

interface AlleleValue< T > {
T minValue();
T maxValue();
// Can have different bounds for same type //parameter T
T value();
}

interface Chromosome< T extends AlleleValue< ? > > {
T[] genes;
}
- So I know I can check upper and lower bounds on the Allele (Gene values) in each chromosome easily.

However, this is imposible in C#.
The only way is to have 2 type parameters for Chromosome:
interface Chromosome< T,V > where T : AlleleValue< V > <-- Here I would much prefer a wildcard!

In the Java version I can have a chromosome with some Integer genes, and some String genes mixed, but in the c# version only a single value type must be used for all alleles. Of course I ended up removing the type parameter from the AlleleValue interface having each method return object instead, so the code using the api is full of (potentially bad) class casts.... UGLY!

Aaron said...

Addendum:
Ivan, Java's generics are a completely different beast to C++'s templates.

Which leads to far less complications in most cases imho...

People seem fond of saying generics are making things too complicated, but in my experience, they result in much cleaner (and safer) code...