In the previous posts I showed that the priority of a cast to a reference type is different from that of a cast to a primitive type. Martijn Vermaat asked me why the designers of the Java language made this decision. Of course, they had good reasons for this design decision, but it is still questionable, especially now that we have autoboxing.
Let's take a look at this example from the original post:
$ echo "(Integer) - 2" | parse-java -s Expr | aterm2xml --implicit
<Minus>
  <ExprName><Id>Integer</Id></ExprName>
  <Lit><Deci>2</Deci></Lit>
</Minus>
If no priorities were defined in the Java language, then this
expression would be ambiguous. I can illustrate this by parsing the
same expression using a Java grammar that does not declare
priorities. I'm using the SGLR parser for this,
which is capable of producing a parse forest (multiple parse trees) if
an input is ambiguous. The alternatives are represented by an
amb element with 2 or more children.
$ echo "(Integer) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
<amb>
  <Minus>
    <ExprName>
      <Id>Integer</Id>
    </ExprName>
    <Lit>
      <Deci>2</Deci>
    </Lit>
  </Minus>
  <CastRef>
    <ClassOrInterfaceType>
      <TypeName>
        <Id>Integer</Id>
      </TypeName>
    </ClassOrInterfaceType>
    <Minus>
      <Lit>
        <Deci>2</Deci>
      </Lit>
    </Minus>
  </CastRef>
</amb>
This clearly shows that the input is ambiguous: the first alternative is the binary operator (which is the alternative chosen by the Java language) and the other alternative is a cast to a reference type.
However, the cast to an
int is not ambiguous, since
int is a reserved keyword and thus forbidden as an
identifier. So, for this input there is only a single parse option,
even in the ambiguous version of Java.
$ echo "(int) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
<CastPrim>
  <Int/>
  <Minus>
    <Lit><Deci>2</Deci></Lit>
  </Minus>
</CastPrim>
The ambiguity in the first example has to be resolved. So, what should
the language designer do? Prefer the cast, or prefer the binary minus?
Well, that decision is not very hard: in the first example, the
(Integer) is a parenthesized expression, where the
expression is the variable
Integer. If we ignore this
actual value (since it is quite distracting), then the structure of
the expression is
( Expression ) - Expression. You will
recognize the need for this pattern, since the expression
(a * b) - c has exactly the same structure!
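To make this concrete, here is a small sketch of how the three shapes behave in plain Java. The class and method names are my own, chosen only for illustration. The second method declares a local variable that happens to be called Integer, which is legal but obviously not something you would write in real code.

```java
public class CastPriority {
    // Primitive cast: int is a keyword, so this can only be a cast
    // applied to the unary minus expression -2.
    static int primitiveCast() {
        return (int) - 2;
    }

    // With a variable named Integer in scope, the same shape is a
    // subtraction: ( Expression ) - Expression.
    static int parenthesizedName() {
        int Integer = 5;
        return (Integer) - 2;
    }

    // A reference cast of a negative literal needs explicit parentheses
    // around the operand (this relies on boxing, so Java 5 or later).
    static Integer referenceCast() {
        return (Integer) (-2);
    }

    public static void main(String[] args) {
        System.out.println(primitiveCast());      // -2
        System.out.println(parenthesizedName());  // 3
        System.out.println(referenceCast());      // -2
    }
}
```

Without the local variable, the expression (Integer) - 2 is still parsed as a subtraction, and the compiler then rejects it because Integer does not name a variable.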
The cast to a primitive type does not have the ambiguity problem, since all primitive types are keywords and all keywords are forbidden as identifiers. So, there is no reason to disallow a primitive cast at this location, and for this reason the language designers changed the priority of the primitive cast.
Are there alternatives? Yes, there are, but they are not very attractive either. First, a parenthesized expression name could be forbidden. Using parentheses for a plain identifier (or a qualified name) does not make a lot of sense. Another option is to disallow casts to primitive types at this location. This can be annoying, but it makes things clearer and more consistent.
Of course, having two different production rules for casts is not attractive. It's just a single language construct, so it should be defined by a single production as well. I wonder what the language designers would have done if autoboxing had already been included in the first version of Java, since autoboxing makes this distinction between a reference cast and a primitive cast visible.