2008-12-04

Why the JVM Spec defines checkcast for interface types

I'm working on the specification of pointer analysis for Java using Datalog. Basically, a pointer analysis computes for each variable in a program the set of objects it may point to at run-time.

For this purpose I need to express parts of the JVM Spec in Datalog as well. As a simple example, the following Datalog rules define when a class is a subclass of another class.

/**
 * JVM Spec:
 * - A class A is a subclass of a class C if A is a direct 
 *   subclass of C
 */
Subclass(?c, ?a) <-
  DirectSubclass[?a] = ?c.

/**
 * JVM Spec:
 * - A class A is a subclass of a class C if there is a direct
 *   subclass B of C and class A is a subclass of B
 */
Subclass(?c, ?a) <-
  Subclass(?b, ?a),
  DirectSubclass[?b] = ?c.

As you can see, this is remarkably close to the original specification (quoted in comments). You can clearly see the relationship between the spec and the code, even if you are not familiar with Datalog.

Recently, I was working on the specification of the checkcast instruction. This instruction performs the run-time check if an object can be cast to some type. The JVM Spec for checkcast first defines some variables:

The following rules are used to determine whether an objectref that is not null can be cast to the resolved type: if S is the class of the object referred to by objectref and T is the resolved class, array, or interface type, checkcast determines whether objectref can be cast to type T as follows:

So, this basically says that we're checking the cast (T) S.

The first rule for this cast is straightforward:

If S is an ordinary (nonarray) class, then:
  • If T is a class type, then S must be the same class as T, or a subclass of T.
  • If T is an interface type, then S must implement interface T.
Well, if you're somewhat familiar with Java, or object-oriented programming, then this part is obvious. Again, the specification in Datalog is easy:
CheckCast(?s, ?s) <-
  ClassType(?s).

CheckCast(?s, ?t) <-
  Subclass(?t, ?s).

CheckCast(?s, ?t) <-
  ClassType(?s),
  Superinterface(?t, ?s).
However, the next alternative in the specification is confusing:
If S is an interface type, then:
  • If T is a class type, then T must be Object.
  • If T is an interface type, then T must be the same interface as S or a superinterface of S.

The specification is crystal clear, but how can S ever be an interface type? S is the type of the object that is being cast, and how can an object ever have a run-time type that is an interface? Of course, the static type of an expression can be an interface, but we're talking about the run-time here!

I searched the web, which only resulted in a few hits. There was one question on a Sun forum years ago, where the one answer didn't make a lot of sense.

It turns out that this is indeed an `impossible' case. The reason why this item is in the specification, is because checkcast is recursively defined for arrays:

If S is a class representing the array type SC[], that is, an array of components of type SC, then:
  • ...
  • If T is an array type TC[], that is, an array of components of type TC, then one of the following must be true:
    • ...
    • TC and SC are reference types, and type SC can be cast to TC by recursive application of these rules.

So, if you have an object of type List[] that is cast to an Collection[], then the rules for checkcast get recursively invoked for the types S = List and T = Collection. Notice that List is an interface, but an object can have type List[] at run-time. If have not verified this with the JVM Spec maintainers, but as far as I can see, this is the only reason why the rule for interface types is there.

Just to show a little bit more of my specifications, here is the rule for the array case I just quoted from the JVM Spec:

CheckCast(?s, ?t) <-
  ComponentType[?s] = ?sc,
  ComponentType[?t] = ?tc,
  ReferenceType(?sc),
  ReferenceType(?tc),
  CheckCast(?sc, ?tc).

Isn't it beautiful how this exactly corresponds to the formal specification?

Unfortunately, even formal specifications can have errors, so I also specified a large testsuite that checks the specifications with concrete code. Here are some of the tests for CheckCast.

test Casting to self
  using database tests/hello/Empty.jar
  assert
    CheckCast("java.lang.Integer", "java.lang.Integer")

test Casting to superclasses
  using database tests/hello/Empty.jar
  assert
    CheckCast("java.lang.Integer", "java.lang.Number")
    CheckCast("java.lang.Integer", "java.lang.Object")

test Cast ArrayList to various superinterfaces
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.ArrayList", "java.util.List")
    CheckCast("java.util.ArrayList", "java.util.Collection")
    CheckCast("java.util.ArrayList", "java.io.Serializable")

test Cast class[] to implemented interface[]
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.ArrayList[]", "java.util.List[]")
    CheckCast("java.lang.Integer[]", "java.io.Serializable[]")

test Cast interface[] to superinterface[]
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.List[]", "java.util.Collection[]")

The tests are specified in a little domain-specific language for unit-testing Datalog that I implemented, initially for IRIS and later for LogicBlox. This tool is similar to parse-unit, a tool I wrote earlier for testing parsers in Stratego/XT. The concise syntax of a test encourages you to write a lot of tests. Domain-specific languages rock for this purpose!