Tuesday, April 18, 2006

On the Details of Protected Access in Java

When I was working on my master's thesis, I spent most of my time in the Software Technology Lab of our department. It was a gorgeous place to work, mostly thanks to a group of great fellow students. After finishing our master's degrees, we all left the lab a few years ago and now have jobs. One of the students I worked with first went to a research institute to work on maritime simulations and is now moving to a company that produces Real Software.

Of course, the problem is that universities do a terrible job of teaching students how to write Real Software, so the first thing these companies do is give their employees some proper education. Real Software is written in Java, so the first step is to become a Sun Certified Programmer for the Java 2 Platform, aka SCJP.

This guy is now going through the SCJP materials, which cover Java at a surprising level of detail. Universities do at least one thing right: they encourage you to think about the things you learn, instead of just accepting it all as-is. As a result, we have had some nice discussions about the details of Java, and he has only just started ;) .

Last week, we started talking about the details of protected access in Java. The study material seems to mention the detailed rules, but does not explain why the rules are the way they are. Strangely enough, I could not find any good resource that does this, and I couldn't remember where I learned about it, so I decided to explain it myself.

Protected Access

The Java Language Specification has two subsections that define the rules for accessibility. The general rules are in subsection 6.6.1: Determining Accessibility and some of the more complicated rules for protected access are in subsection 6.6.2: Details on Protected Access. The rules of the first subsection are rather straightforward and don't need much explanation.

For protected members and constructors, this subsection states that they are accessible if the access occurs from within the same package. This rule is clear, although many people seem to be surprised by it. The second case, however, refers to subsection 6.6.2, which is where most of the confusion arises. This subsection defines the additional accessibility rules for protected access, which most people know informally as “a protected member is accessible from subclasses”. A simple example where this accessibility rule is applied:

package a;

public class A {
  protected int secret;
}

package b;

public class B extends a.A {
  void f() {
    secret = 5;
  }
}

However, the rules that define the accessibility of protected members from subclasses are a bit more complex than you might expect. The rule you know informally is no longer so clear when the access to the protected member is qualified (i.e. it is applied to an explicit object, not implicitly to this). Consider this simple example:

package a;

public class A {
  protected int secret;
}

package b;

public class B2 extends a.A {
  void f(a.A a) {
    a.secret = 5;
  }
}

In this example, the access to the protected instance field secret of A occurs from a subclass B2 of A, so according to our informal idea of protected access, this should be allowed. However, this example should make you feel a bit uncomfortable. Indeed, this is not allowed in Java. Let's take a look at what the compilers say:

$ javac b/B2.java
b/B2.java:5: secret has protected access in a.A
    a.secret = 5;
$ jikes b/B2.java
Found 1 semantic error compiling "b/B2.java":

     5.     a.secret = 5;
              ^----^
*** Semantic Error: The instance field "secret" in class "A" has protected access, but the qualifying expression is not of type "B2" or any of its enclosing types.
$ ecj b/B2.java
----------
1. ERROR in b/B2.java
 (at line 5)
        a.secret = 5;
        ^^^^^^^^
The field A.secret is not visible
----------

Side note: I was a bit surprised by the error report of ecj. This could be a bug: the protected field is visible but it is not accessible. The error report of jikes is by far the best.

Basically, if this access were allowed, you could access any protected field of any class simply by making a subclass of the class that declares the protected field. Hence, you could never really protect your protected fields. Consider this example:

package a;

public class A {
  protected int secret;
}

package b;

public final class MySecurityHazard extends a.A {
}

package c;

public class C extends a.A {
  void f(b.MySecurityHazard b) {
    b.secret = 5;
  }
}

In this example, the class MySecurityHazard has deliberately been declared final to prevent its sensitive fields from being accessed through a subclass. However, according to our (now deprecated) informal knowledge of protected access, we can just create another subclass C of A and use it to access the protected fields of MySecurityHazard.

How can we define the protected access that we would like to have? Of course, qualified access to protected members could be forbidden completely (and maybe that would have been a good idea), but Java is a bit more flexible, without introducing security problems. The basic problem of the unwanted access is that you start a new inheritance branch and access protected fields from there. So, the qualified access to protected fields should be restricted to the same inheritance branch as the object to which it is applied. This is exactly what the details of protected access are about. They define that access from a class S is permitted only if the type of the qualifier is S or a subclass of S. Let me now finally quote the specification:

Let C be the class in which a protected member m is declared. Access is permitted only within the body of a subclass S of C. In addition, if Id denotes an instance field or instance method, then:

  • If the access is by a qualified name Q.Id, where Q is an ExpressionName, then the access is permitted if and only if the type of the expression Q is S or a subclass of S.
  • If the access is by a field access expression E.Id, where E is a Primary expression, or by a method invocation expression E.Id(. . .), where E is a Primary expression, then the access is permitted if and only if the type of E is S or a subclass of S.

See JLS3, Section 6.6.2.1: Access to a protected Member
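To see the quoted rule from the permitted side as well, here is a rough sketch that feeds a few small classes to the compiler through the standard javax.tools API and reports whether they are accepted. The classes B3 and B4 and the harness class ProtectedAccessCheck are made up for this illustration; B2 is the rejected example from above. It assumes you run it on a JDK, since a plain JRE has no system compiler.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ProtectedAccessCheck {

    /** A source file held in memory; its path has to match the public class it declares. */
    static final class Source extends SimpleJavaFileObject {
        private final String code;

        Source(String path, String code) {
            super(URI.create("string:///" + path), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // The class that declares the protected instance field.
    static final Source BASE = new Source("a/A.java",
        "package a; public class A { protected int secret; }");

    // Qualifier has type a.A, which is neither B2 nor a subclass of B2.
    static final Source VIA_SUPERCLASS = new Source("b/B2.java",
        "package b; public class B2 extends a.A {"
        + " void f(a.A a) { a.secret = 5; } }");

    // Qualifiers have type B3 (S itself) and B4 (a subclass of S).
    static final Source VIA_OWN_TYPE = new Source("b/B3.java",
        "package b; public class B3 extends a.A {"
        + " void f(B3 other, B4 sub) { other.secret = 5; sub.secret = 6; } }");
    static final Source SUBCLASS = new Source("b/B4.java",
        "package b; public class B4 extends B3 { }");

    /** Compiles a.A together with the given sources; true if the compiler accepts them. */
    static boolean compiles(Source... extra) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        List<Source> units = new ArrayList<>(Arrays.asList(extra));
        units.add(BASE);
        try {
            // Class files go to a scratch directory; diagnostics are collected, not printed.
            List<String> options = Arrays.asList(
                "-d", Files.createTempDirectory("protected-access").toString());
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            return javac.getTask(null, null, diagnostics, options, null, units).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Rejected by 6.6.2.1: prints false.
        System.out.println("qualifier of type a.A: " + compiles(VIA_SUPERCLASS));
        // Permitted by 6.6.2.1: prints true.
        System.out.println("qualifier of type B3 or B4: " + compiles(VIA_OWN_TYPE, SUBCLASS));
    }
}
```

All of B2, B3 and B4 live in package b, away from a.A, so the same-package rule of 6.6.1 does not come into play; only the type of the qualifier decides the outcome.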

Note that these more specific rules only apply to instance members, not to static ones. If you are using a protected static field or method, make sure that you understand what you're doing: anyone will be able to access the field or method by just creating another subclass of the class that declares the protected field or method. It would not make any sense to impose these additional rules on static members, since all subclasses actually share the static member! So, while protected access on instance members could be used for security reasons, protected access on static members is only useful for hiding the member by restricting its accessibility to a part of a program where it is relevant.
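That all subclasses share a static member is easy to see in a small single-file sketch. The names Base, Left and Right are invented for this illustration. Because everything lives in one file, and therefore in one package, it shows only the sharing, not the cross-package access check.

```java
public class SharedStaticDemo {

    static class Base {
        protected static int counter; // a single field, owned by Base
    }

    static class Left extends Base { }

    static class Right extends Base { }

    /** Writes through one subclass, updates through another, reads through the base. */
    static int run() {
        Left.counter = 41; // really Base.counter
        Right.counter++;   // the very same Base.counter
        return Base.counter;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

There is no per-subclass copy of the field, so there is no inheritance branch to confine the access to.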

If you read the rules carefully, they are actually quite clear. The real source of confusion seems to be that the specification gives no motivation for why these more complex rules are necessary. I hope this blog post has provided it.