Monday, July 31, 2006

Java-front versus Jackpot, APT, Eclipse JDT

Recently, there have been a few interesting developments in standard support for open compilers and program transformation. For example, Sun released the annotation processing tool (APT) as part of JDK 5, which opens up Sun's Java compiler a bit. Also, there is Jackpot, a plugin for NetBeans for transforming Java code. The obvious question is how this relates to work that has been done in research on open compilers and program transformation.

Olivier Lefevre sent me an email to ask how the tree for Java provided by Jackpot and javac compares to our support for parsing and transforming Java in Java-front. The answer is probably useful in general, so I'll quote it here. Feel free to share your opinion in the comments!

As you may know, starting with Java 6 the Sun JDK will ship with an API to the AST (see the Jackpot API).

Yes, Jackpot and APT are great projects. However, there is not yet a full API to the AST in the standard JDK, afaik. The compiler will be more 'open' in two different ways.

First, the current annotation processing tool (APT) is going to be combined with javac, but APT only provides access to the global structure of a Java source file and does not include the statement and expression level. Also, this API does not allow modification of the Java representation. APT is read-only: you can only generate new code.

Second, there is Jackpot, which is a rule-based language for transforming Java code. For Jackpot, the representation of Java used by javac has been opened and cleaned up a bit to make it more usable in external tools. However, this representation is not standardized, and Sun recommends against using classes from com.sun.*. Afaik, Jackpot will be shipped as part of NetBeans and not as part of the JDK.

How does this compare to Java-front?

That's a good question. The answer depends on the application.

If you just need an AST for Java, then the advantage of the com.sun.source.tree AST is that you can be absolutely sure that the AST conforms to javac, since the implementation is exactly the same. Of course, the same holds for ecj and the AST of Java that is provided by Eclipse (org.eclipse.jdt.core.dom.*). However, the grammar provided by Java-front is very good, so I don't expect any parsing problems. It has been tested and used heavily in the last few years, and the development of this grammar has even resulted in a number of fixes in the JLS.
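To give an idea of what working with the com.sun.source.tree AST looks like, here is a minimal sketch that parses a source string with javac and lists every method invocation. It assumes a full JDK 9 or later (ToolProvider.getSystemJavaCompiler() returns null on a plain JRE, and List.of needs Java 9); the class ListInvocations and the helper StringSource are hypothetical names. Note that it only parses: no names are resolved or type-checked.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ListInvocations {

    // An in-memory source file, so the sketch does not need files on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String path, String code) {
            super(URI.create("string:///" + path), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses the source with javac and returns the method-select part of every invocation.
    static List<String> invocations(String path, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(path, code)));
        List<String> result = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                unit.accept(new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                        result.add(node.getMethodSelect().toString());
                        return super.visitMethodInvocation(node, p); // keep scanning arguments
                    }
                }, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "class Hello { void f() { System.out.println(\"hi\"); } }";
        System.out.println(invocations("Hello.java", code));
    }
}
```

The TreeScanner base class visits every node, so only the node type of interest needs an override.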

An advantage of Java-front is that it is a bit more language independent. Obviously, the Eclipse and javac ASTs are meant to be used from Java. If you want to implement a transformation of Java in a different language, then you have to write an exporter. Java-front outputs ASTs in a language independent exchange format (ATerms), which can also be converted to XML. Of course, Java-front is most useful if you combine it with a language that is designed for program transformation and operates on ATerms, such as Stratego. One of the biggest advantages of Stratego is that it is very easy to do traversals over the AST: no tiresome visitors.
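To illustrate why generic traversals make visitors unnecessary, here is a small Java sketch of the idea. It is not Stratego and not the ATerm library: the Term record is a hypothetical, ATerm-like stand-in, the constructor names are made up, and all and bottomup only mimic the Stratego combinators of the same name. The point is that the rewrite rule mentions just the one node shape it cares about, and the traversal handles every other constructor generically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public final class Terms {

    // Hypothetical ATerm-like node: a constructor name applied to a list of children.
    // String leaves are modeled as childless nodes whose name includes the quotes.
    record Term(String name, List<Term> kids) {
        static Term app(String name, Term... kids) {
            return new Term(name, List.of(kids));
        }

        static Term str(String s) {
            return new Term("\"" + s + "\"", List.of());
        }

        @Override
        public String toString() {
            if (kids.isEmpty()) {
                return name;
            }
            List<String> parts = new ArrayList<>();
            for (Term k : kids) {
                parts.add(k.toString());
            }
            return name + "(" + String.join(",", parts) + ")";
        }
    }

    // One-level generic traversal: apply f to every direct child, whatever the constructor.
    static Term all(Term t, UnaryOperator<Term> f) {
        List<Term> kids = new ArrayList<>();
        for (Term k : t.kids()) {
            kids.add(f.apply(k));
        }
        return new Term(t.name(), kids);
    }

    // Full traversal built from 'all': rewrite the children first, then the node itself.
    static Term bottomup(Term t, UnaryOperator<Term> s) {
        return s.apply(all(t, k -> bottomup(k, s)));
    }

    // A rewrite rule that only mentions the node it cares about: Id("x") -> Id("y").
    static Term renameX(Term t) {
        if (t.name().equals("Id") && t.kids().equals(List.of(Term.str("x")))) {
            return Term.app("Id", Term.str("y"));
        }
        return t;
    }

    public static void main(String[] args) {
        Term e = Term.app("Plus",
                Term.app("Id", Term.str("x")),
                Term.app("Times", Term.app("Id", Term.str("x")), Term.app("Id", Term.str("z"))));
        System.out.println(bottomup(e, Terms::renameX)); // Plus(Id("y"),Times(Id("y"),Id("z")))
    }
}
```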

If you need more information about Java than can be defined in a context-free grammar, then you need more than just a parser. For more complex transformations (including simple refactorings), you'll probably need an implementation of disambiguation (reclassification) and qualification of names. Without such an analysis, even a simple statement like System.out.println is highly ambiguous: is System a variable? a class? a package? Is out an inner class? a field? Most likely, you'll need type information as well. Javac and Eclipse have the major advantage that you can safely assume that their type checkers are pretty good. For Jackpot, I suppose that there is some way to get type information (since type information can be used in Jackpot), but from a quick scan I cannot figure out how to do this from the public API. For Java-front, there is an extension (Dryad) that supports type-checking and disambiguation, but this work is not yet complete. Using an existing compiler is of course a safer alternative. For experiments, the stuff provided by Dryad should be ok (we use it in our course on program transformation).
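As a concrete case of this ambiguity, the following small program (class names hypothetical) is legal Java, and in it System.out.println does not refer to java.lang.System at all. A local variable named System obscures the class of the same name (JLS 6.4.2), so a tool that works on the parse tree alone would misread the call.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class Obscuring {

    // Hypothetical class with a field named "out" that happens to have a println method.
    static class Console {
        final StringWriter buffer = new StringWriter();
        final PrintWriter out = new PrintWriter(buffer, true);
    }

    static String demo() {
        // This local variable obscures java.lang.System, so System.out below
        // is the field of this Console object, not the standard output stream.
        Console System = new Console();
        System.out.println("hello");
        return System.buffer.toString();
    }

    public static void main(String[] args) {
        // Here no variable is named System, so this is java.lang.System.out again.
        System.out.println("captured: " + demo().trim());
    }
}
```

Syntactically the two println calls are identical; only name resolution tells them apart.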

A different application is the implementation of Java language extensions. Javac and ECJ do not support this: the Java representation is open, but not extensible. Java-front uses a modular syntax definition formalism (SDF) that allows you to extend the grammar of Java in an almost trivial way. The strength of this approach is illustrated by the embedding of the Java syntax in Stratego (GPCE '02) and Java (GPCE '05), the applications of the grammar in MetaBorg (OOPSLA '04), and the modular extension of the grammar for the definition of the AspectJ syntax (OOPSLA '06). Of course, these applications are not really interesting if you are just looking for a Java program transformation tool, but they illustrate the reusability of such a syntax definition (as opposed to the grammars of ecj and javac, and those written for most other parser generators).

You'll need tools for pretty-printing as well. Outside of Eclipse, pretty-printing the JDT Core DOM is troublesome and mostly useful only for debugging. Inside Eclipse, the support for pretty-printing and preserving the layout of a program is of course excellent (see the existing implementations of refactorings). Jackpot provides a pretty-printer as well, but I don't know if it can be used outside NetBeans. Java-front provides the tool pp-java, which has been heavily tested and inserts parentheses in exactly the right places.
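Inserting parentheses correctly is less trivial than it sounds. The following is not how pp-java works; it is a minimal Java sketch of the precedence and associativity reasoning any such pretty-printer needs, for a hypothetical expression type with only a few left-associative binary operators. A weaker operand always needs parentheses, and a right operand of equal strength needs them too: a - (b - c) differs from a - b - c, and even with + the grouping matters in Java, because "a" + 1 + 2 is "a12" while "a" + (1 + 2) is "a3".

```java
public final class Parens {

    // Hypothetical expression AST: a variable, or a binary operator applied to two operands.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // Binding strength: higher binds tighter. Variables are atomic and never need parentheses.
    static int prec(Expr e) {
        if (e instanceof Bin b) {
            return switch (b.op()) {
                case "*", "/", "%" -> 2;
                case "+", "-" -> 1;
                default -> throw new IllegalArgumentException("unknown operator: " + b.op());
            };
        }
        return 3;
    }

    static String print(Expr e) {
        if (e instanceof Var v) {
            return v.name();
        }
        Bin b = (Bin) e;
        int p = prec(b);
        // All operators here are left-associative: a weaker operand always needs
        // parentheses, and a right operand of equal strength needs them too.
        String left = wrap(b.left(), prec(b.left()) < p);
        String right = wrap(b.right(), prec(b.right()) <= p);
        return left + " " + b.op() + " " + right;
    }

    private static String wrap(Expr e, boolean parens) {
        String s = print(e);
        return parens ? "(" + s + ")" : s;
    }

    public static void main(String[] args) {
        Expr a = new Var("a"), b = new Var("b"), c = new Var("c");
        System.out.println(print(new Bin("-", a, new Bin("-", b, c))));  // a - (b - c)
        System.out.println(print(new Bin("-", new Bin("-", a, b), c)));  // a - b - c
        System.out.println(print(new Bin("*", new Bin("+", a, b), c)));  // (a + b) * c
        System.out.println(print(new Bin("+", a, new Bin("*", b, c))));  // a + b * c
    }
}
```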

I am interested in implementing small refactorings.

If you want to implement solid refactorings that could eventually even be deployed, then I would suggest using an existing framework for refactoring, since there is much more to do than just getting an AST. A few years ago, I implemented an extract method refactoring in JRefactory, which was quite a useful experience. I suppose it's a bit obsolete now, since the refactoring market is dominated by refactorings directly supported by IDEs. You could consider Eclipse or NetBeans.

If your objective is to play a bit with program transformations and maybe even be a bit more adventurous by using real program transformation languages, then it might be nice to use Java-front and Stratego. Using Stratego is a major advantage over the tiresome implementation of traversals in Java (and most other languages).

Hope this helps :) Feel free to ask more questions if anything is unclear :)

Tuesday, April 18, 2006

On the Details of Protected Access in Java

When I was working on my master thesis I spent most of my time in the Software Technology Lab of our department. This was really a gorgeous place to work, mostly due to a group of great fellow students. After finishing our master study, we all left this lab a few years ago and have a job now. One of the students I worked with first went to a research institute to work on maritime simulations and is now moving to a company that produces Real Software.

Of course, the problem is that universities do a terrible job at teaching students how to write Real Software, so the first thing these companies do is give their employees some proper education. Real Software is written in Java, so the first step is to become a Sun Certified Programmer for the Java 2 Platform, aka SCJP.

This guy is now going through the SCJP materials, which cover Java at a surprising level of detail. Universities do at least one thing right: they stimulate you to think about the things you learn, instead of just accepting it all as-is. As a result, we have had some nice discussions about the details of Java, and he has only just started ;) .

Last week, we started talking about the details of protected access in Java. The study material seems to mention the detailed rules, but does not explain why the rules are the way they are. Strangely enough, I could not find any good resource that does this, and I couldn't remember where I learned about this, so I decided to explain it myself.

Protected Access

The Java Language Specification has two subsections that define the rules for accessibility. The general rules are in subsection 6.6.1: Determining Accessibility and some of the more complicated rules for protected access are in subsection 6.6.2: Details on Protected Access. The rules of the first subsection are rather straightforward and don't need much explanation.

For protected members and constructors, this subsection defines that they are accessible if the access occurs from within the same package. This rule is clear, although it seems to surprise many people. However, the second case refers to subsection 6.6.2, which is where most of the confusion lies. This subsection defines the additional accessibility rules for protected access, which most people know informally as “a protected member is accessible from subclasses”. A simple example where this accessibility rule is applied:

package a;

public class A {
  protected int secret;
}
package b;

public class B extends a.A {
  void f() {
    secret = 5;
  }
}

However, the rules that define accessibility of protected members from subclasses are a bit more complex than you might expect. The problem is that the rule you know informally is no longer that clear if the access of the protected member is qualified (i.e. is applied to an object, not implicitly to this). Consider this simple example:

package a;

public class A {
  protected int secret;
}
package b;

public class B2 extends a.A {
  void f(a.A a) {
    a.secret = 5;
  }
}

In this example, the access to the protected instance field secret of A occurs from a subclass B2 of A, so according to our informal idea of protected access, this should be allowed. However, this example should make you feel a bit uncomfortable. Indeed, this is not allowed in Java. Let's take a look at what the compilers say:

$ javac b/B2.java
b/B2.java:5: secret has protected access in a.A
    a.secret = 5;
$ jikes b/B2.java
Found 1 semantic error compiling "b/B2.java":

     5.     a.secret = 5;
              ^----^
*** Semantic Error: The instance field "secret" in class "A" has protected access, but the qualifying expression is not of type "B2" or any of its enclosing types.
$ ecj b/B2.java
----------
1. ERROR in b/B2.java
 (at line 5)
        a.secret = 5;
        ^^^^^^^^
The field A.secret is not visible
----------

Side note: I was a bit surprised by the error report of ecj. This could be a bug: the protected field is visible but it is not accessible. The error report of jikes is by far the best.
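If you would rather check these two cases from a program than from the shell, the standard compiler API in javax.tools can compile the examples directly from strings. This is a minimal sketch, assuming a full JDK 9 or later (the system compiler is null on a plain JRE); the class ProtectedCheck and its helpers are hypothetical names, it writes class files to a fresh temporary directory, and the exact wording of javac's error message may differ between JDK versions, so only the number of errors is checked.

```java
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ProtectedCheck {

    // An in-memory source file; the URI path must end in the public class's file name.
    static class Source extends SimpleJavaFileObject {
        private final String code;

        Source(String path, String code) {
            super(URI.create("string:///" + path), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static final Source A = new Source("a/A.java",
            "package a; public class A { protected int secret; }");
    static final Source B = new Source("b/B.java",
            "package b; public class B extends a.A { void f() { secret = 5; } }");
    static final Source B2 = new Source("b/B2.java",
            "package b; public class B2 extends a.A { void f(a.A a) { a.secret = 5; } }");

    // Compiles the sources with javac and returns the error messages (empty if it compiled).
    static List<String> errors(Source... sources) {
        try {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            Path out = Files.createTempDirectory("protected-check");
            javac.getTask(null, null, diagnostics, List.of("-d", out.toString()),
                    null, List.of(sources)).call();
            List<String> errors = new ArrayList<>();
            for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
                if (d.getKind() == Diagnostic.Kind.ERROR) {
                    errors.add(d.getMessage(null));
                }
            }
            return errors;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("A + B:  " + errors(A, B));
        System.out.println("A + B2: " + errors(A, B2));
    }
}
```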

Basically, if this access were allowed, then you could access any protected field of any class by just making a subclass of the class that declares the protected field. Hence, you could never really protect your protected fields if this kind of access were allowed. Consider this example:

package a;

public class A {
  protected int secret;
}
package b;

public final class MySecurityHazard extends a.A {
}
package c;

public class C extends a.A {
  void f(b.MySecurityHazard b) {
    b.secret = 5;
  }
}

In this example, the class MySecurityHazard has deliberately been declared final, so that nobody can create a subclass of it to access its sensitive fields. However, according to our (now deprecated) informal knowledge of protected access, we can just create another subclass C of A that can be used to access the protected fields of MySecurityHazard.

How can we define the protected access that we would like to have? Of course, qualified access to protected members could be forbidden completely (and maybe that would have been a good idea), but Java is a bit more flexible, without introducing security problems. The basic problem with the unwanted access is that you start a new inheritance branch and access protected fields from there. So, qualified access to protected fields should be restricted to the inheritance branch of the object to which it is applied. This is exactly what the details of protected access are about: they define that access from a class S is permitted only if the type of the qualifier is S or a subclass of S. Let me now finally quote the specification:

Let C be the class in which a protected member m is declared. Access is permitted only within the body of a subclass S of C. In addition, if Id denotes an instance field or instance method, then:

  • If the access is by a qualified name Q.Id, where Q is an ExpressionName, then the access is permitted if and only if the type of the expression Q is S or a subclass of S.
  • If the access is by a field access expression E.Id, where E is a Primary expression, or by a method invocation expression E.Id(. . .), where E is a Primary expression, then the access is permitted if and only if the type of E is S or a subclass of S.

See JLS3, Section 6.6.2.1: Access to a protected Member
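To make the quoted rule concrete, here is a small Java model of it, combined with the same-package rule from 6.6.1 and the exemption for static members discussed in the next paragraph. This is not how javac implements access checking: the ClassInfo record and the method names are hypothetical, and the model deliberately ignores nested and enclosing classes (which the jikes message above hints at). The usage in main replays the examples from this post.

```java
public final class ProtectedRule {

    // Hypothetical minimal class model: name, package, and superclass (null at the root).
    record ClassInfo(String name, String pkg, ClassInfo superclass) {}

    // True if sub is cls itself or a direct or indirect subclass of cls.
    static boolean isSameOrSubclass(ClassInfo sub, ClassInfo cls) {
        for (ClassInfo c = sub; c != null; c = c.superclass()) {
            if (c.equals(cls)) {
                return true;
            }
        }
        return false;
    }

    // Simplified model of JLS 6.6.1 and 6.6.2.1 for a qualified access Q.m, where m is a
    // protected member declared in 'declaring', the access occurs in the body of 'accessor',
    // and Q has static type 'qualifier'. Nested and enclosing classes are ignored.
    static boolean qualifiedAccessAllowed(ClassInfo declaring, ClassInfo accessor,
                                          ClassInfo qualifier, boolean isStatic) {
        if (accessor.pkg().equals(declaring.pkg())) {
            return true;                               // same package: always accessible
        }
        if (!isSameOrSubclass(accessor, declaring)) {
            return false;                              // must be inside a subclass S of C
        }
        if (isStatic) {
            return true;                               // the extra restriction covers instance members only
        }
        return isSameOrSubclass(qualifier, accessor);  // type of Q must be S or a subclass of S
    }

    public static void main(String[] args) {
        ClassInfo a = new ClassInfo("A", "a", null);
        ClassInfo b2 = new ClassInfo("B2", "b", a);
        ClassInfo hazard = new ClassInfo("MySecurityHazard", "b", a);
        ClassInfo c = new ClassInfo("C", "c", a);

        System.out.println(qualifiedAccessAllowed(a, b2, a, false));     // false: a.secret in B2
        System.out.println(qualifiedAccessAllowed(a, c, hazard, false)); // false: b.secret in C
        System.out.println(qualifiedAccessAllowed(a, b2, b2, false));    // true: qualifier of type B2
        System.out.println(qualifiedAccessAllowed(a, c, hazard, true));  // true: if secret were static
    }
}
```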

Note that these more specific rules only apply to instance members, not to static ones. If you are using a protected static field or method, make sure that you understand what you're doing: anyone will be able to access the field or method by just creating another subclass of the class that declares the protected field or method. It would not make any sense to impose these additional rules on static members, since all subclasses actually share the static member! So, while protected access on instance members could be used for security reasons, protected access on static members is only useful for hiding the member by restricting its accessibility to a part of a program where it is relevant.

If you read the rules carefully, they are not that unclear. The real source of confusion seems to be that there is no motivation for why these more complex rules are necessary. I hope this post has solved that.