<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6943366</id><updated>2011-08-24T15:03:56.372+02:00</updated><category term='php-front'/><category term='grammar engineering tools'/><category term='stringborg'/><title type='text'>Subject to Meta Programming</title><subtitle type='html'>Syntax Definition, Domain-Specific Languages, Language Specifications, Program Transformation</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>34</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6943366.post-7674963467625906327</id><published>2008-12-04T14:58:00.003+01:00</published><updated>2008-12-04T15:03:48.121+01:00</updated><title type='text'>Why the JVM Spec defines checkcast for interface types</title><content type='html'>&lt;p&gt;
I'm working on the specification of pointer analysis for Java using Datalog. Basically, a pointer analysis computes for each variable in a program the set of objects it may point to at run-time.
&lt;/p&gt;

&lt;p&gt;
For this purpose I need to express parts of the JVM Spec in Datalog as well. As a simple example, the following Datalog rules define when a class is a subclass of another class.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
/**
 * JVM Spec:
 * - A class A is a subclass of a class C if A is a direct 
 *   subclass of C
 */
Subclass(?c, ?a) &amp;lt;-
  DirectSubclass[?a] = ?c.

/**
 * JVM Spec:
 * - A class A is a subclass of a class C if there is a direct
 *   subclass B of C and class A is a subclass of B
 */
Subclass(?c, ?a) &amp;lt;-
  Subclass(?b, ?a),
  DirectSubclass[?b] = ?c.
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
As you can see, this is remarkably close to the original specification (quoted in comments). You can clearly see the relationship between the spec and the code, even if you are not familiar with Datalog.
&lt;/p&gt;
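
&lt;p&gt;
For readers who prefer to read the two rules operationally, here is a minimal Python sketch of how they can be evaluated as a fixpoint. This is not how a Datalog engine actually evaluates them; the &lt;code&gt;DIRECT_SUBCLASS&lt;/code&gt; facts and the &lt;code&gt;subclass_pairs&lt;/code&gt; function are made up for illustration.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```python
# Hypothetical facts: DIRECT_SUBCLASS[a] == c means
# "a is a direct subclass of c".
DIRECT_SUBCLASS = {
    "java.lang.Integer": "java.lang.Number",
    "java.lang.Number": "java.lang.Object",
}


def subclass_pairs(direct_subclass):
    """Return all (c, a) pairs such that a is a subclass of c."""
    # First rule: every direct subclass is a subclass.
    pairs = {(c, a) for a, c in direct_subclass.items()}
    changed = True
    while changed:
        changed = False
        # Second rule: if a is a subclass of b and b is a direct
        # subclass of c, then a is a subclass of c.
        for b, a in list(pairs):
            c = direct_subclass.get(b)
            if c is not None and (c, a) not in pairs:
                pairs.add((c, a))
                changed = True
    return pairs


SUBCLASS = subclass_pairs(DIRECT_SUBCLASS)
```
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The loop keeps applying the second rule until no new pairs appear, which is the naive way to compute the least fixpoint of a recursive Datalog rule.
&lt;/p&gt;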

&lt;p&gt;
Recently, I was working on the specification of the &lt;code&gt;checkcast&lt;/code&gt; instruction. This instruction checks at run-time whether an object can be cast to some type. The &lt;a href="http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc2.html"&gt;JVM Spec&lt;/a&gt; for checkcast first defines some variables:
&lt;/p&gt;

&lt;blockquote&gt;
  The following rules are used to determine whether an objectref that
  is not null can be cast to the resolved type: if S is the class of
  the object referred to by objectref and T is the resolved class,
  array, or interface type, checkcast determines whether objectref can
  be cast to type T as follows:
&lt;/blockquote&gt;

&lt;p&gt;
So, this basically says that we're checking the cast &lt;code&gt;(T)
S&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
The first rule for this cast is straightforward:
&lt;/p&gt;

&lt;blockquote&gt;
If S is an ordinary (nonarray) class, then:
&lt;ul&gt;
  &lt;li&gt;If T is a class type, then S must be the same class as T, or a
  subclass of T.&lt;/li&gt;
  &lt;li&gt;If T is an interface type, then S must implement interface
  T.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Well, if you're somewhat familiar with Java, or object-oriented
programming, then this part is obvious. Again, the specification in
Datalog is easy:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
CheckCast(?s, ?s) &amp;lt;-
  ClassType(?s).

CheckCast(?s, ?t) &amp;lt;-
  Subclass(?t, ?s).

CheckCast(?s, ?t) &amp;lt;-
  ClassType(?s),
  Superinterface(?t, ?s).
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
However, the next alternative in the specification is confusing:
&lt;/p&gt;

&lt;blockquote&gt;
If S is an interface type, then:
&lt;ul&gt;
  &lt;li&gt;If T is a class type, then T must be Object.&lt;/li&gt;

  &lt;li&gt;If T is an interface type, then T must be the same interface as
  S or a superinterface of S.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The specification is crystal clear, but how can S ever be an interface
type? S is the type of the object that is being cast, and how can an
object ever have a run-time type that is an interface? Of course, the
static type of an expression can be an interface, but we're talking
about the run-time here!
&lt;/p&gt;

&lt;p&gt;
I &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=checkcast+%22If+S+is+an+interface+type%22"&gt;searched
the web&lt;/a&gt;, which only resulted in a few hits. There was one &lt;a href="http://forums-beta.sun.com/thread.jspa?messageID=4335864&amp;amp;tstart=0"&gt;question on a Sun forum&lt;/a&gt; years ago, where the one answer didn't make a lot of sense.
&lt;/p&gt;

&lt;p&gt;
It turns out that this is indeed an `impossible' case. The reason
this item is in the specification is that checkcast is defined
recursively for arrays:
&lt;/p&gt;

&lt;blockquote&gt;
If S is a class representing the array type SC[], that is, an array of
components of type SC, then:
&lt;ul&gt;
  &lt;li&gt;...&lt;/li&gt;
  &lt;li&gt;If T is an array type TC[], that is, an array of components of
  type TC, then one of the following must be true:
  &lt;ul&gt;
    &lt;li&gt;...&lt;/li&gt;
    &lt;li&gt;TC and SC are reference types, and type SC can be cast to TC
    by recursive application of these rules.&lt;/li&gt;
  &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;
So, if you have an object of type &lt;code&gt;List[]&lt;/code&gt; that is cast to
a &lt;code&gt;Collection[]&lt;/code&gt;, then the rules for checkcast get
recursively invoked for the types &lt;code&gt;S = List&lt;/code&gt; and &lt;code&gt;T =
Collection&lt;/code&gt;. Notice that List is an interface, but an object can
have type List[] at run-time. I have not verified this with the JVM
Spec maintainers, but as far as I can see, this is the only reason why
the rule for interface types is there.
&lt;/p&gt;

&lt;p&gt;
Just to show a little bit more of my specifications, here is the rule
for the array case I just quoted from the JVM Spec:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
CheckCast(?s, ?t) &amp;lt;-
  ComponentType[?s] = ?sc,
  ComponentType[?t] = ?tc,
  ReferenceType(?sc),
  ReferenceType(?tc),
  CheckCast(?sc, ?tc).
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Isn't it beautiful how this &lt;em&gt;exactly&lt;/em&gt; corresponds to the formal
specification?
&lt;/p&gt;
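
&lt;p&gt;
To make the recursion concrete, here is a rough Python sketch of the rules quoted in this post. It is an illustration only: the type tables are a small, hand-written subset of the Java library, the function names are made up, and the array rules that I elided above (and primitive component types) are simply not modelled.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```python
OBJECT = "java.lang.Object"

# Hypothetical, hand-written subset of the Java library type hierarchy.
CLASSES = {
    OBJECT,
    "java.util.AbstractCollection",
    "java.util.AbstractList",
    "java.util.ArrayList",
}
INTERFACES = {"java.util.Collection", "java.util.List"}
DIRECT_SUPERCLASS = {
    "java.util.ArrayList": "java.util.AbstractList",
    "java.util.AbstractList": "java.util.AbstractCollection",
    "java.util.AbstractCollection": OBJECT,
}
DIRECT_SUPERINTERFACES = {
    "java.util.ArrayList": {"java.util.List"},
    "java.util.AbstractList": {"java.util.List"},
    "java.util.AbstractCollection": {"java.util.Collection"},
    "java.util.List": {"java.util.Collection"},
}


def superclasses(s):
    """Proper superclasses of class s, nearest first."""
    result = []
    while s in DIRECT_SUPERCLASS:
        s = DIRECT_SUPERCLASS[s]
        result.append(s)
    return result


def superinterfaces(s):
    """All interfaces that s implements or extends, directly or not."""
    found = set()
    todo = [s] + superclasses(s)
    while todo:
        for i in DIRECT_SUPERINTERFACES.get(todo.pop(), ()):
            if i not in found:
                found.add(i)
                todo.append(i)
    return found


def check_cast(s, t):
    """Can an object whose run-time type is s be cast to t?"""
    if s.endswith("[]"):
        if t.endswith("[]"):
            # Array case: recurse on the component types. This is the
            # only place where s can become an interface type.
            return check_cast(s[:-2], t[:-2])
        raise NotImplementedError("array rules elided in the post")
    if s in CLASSES:
        if t in CLASSES:
            return s == t or t in superclasses(s)
        return t in superinterfaces(s)
    if s in INTERFACES:
        if t in CLASSES:
            return t == OBJECT
        return s == t or t in superinterfaces(s)
    raise ValueError("unknown type: " + s)
```
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
With these tables, &lt;code&gt;check_cast("java.util.List[]", "java.util.Collection[]")&lt;/code&gt; ends up in the interface branch for &lt;code&gt;List&lt;/code&gt; and &lt;code&gt;Collection&lt;/code&gt;, and it can only get there through the array case.
&lt;/p&gt;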

&lt;p&gt;
Unfortunately, even formal specifications can have errors, so I also
specified a large test suite that checks the specifications against
concrete code. Here are some of the tests for CheckCast.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
test Casting to self
  using database tests/hello/Empty.jar
  assert
    CheckCast("java.lang.Integer", "java.lang.Integer")

test Casting to superclasses
  using database tests/hello/Empty.jar
  assert
    CheckCast("java.lang.Integer", "java.lang.Number")
    CheckCast("java.lang.Integer", "java.lang.Object")

test Cast ArrayList to various superinterfaces
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.ArrayList", "java.util.List")
    CheckCast("java.util.ArrayList", "java.util.Collection")
    CheckCast("java.util.ArrayList", "java.io.Serializable")

test Cast class[] to implemented interface[]
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.ArrayList[]", "java.util.List[]")
    CheckCast("java.lang.Integer[]", "java.io.Serializable[]")

test Cast interface[] to superinterface[]
  using database tests/hello/Arrays.jar
  assert
    CheckCast("java.util.List[]", "java.util.Collection[]")
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The tests are specified in a little domain-specific language for
unit-testing Datalog that I implemented, initially for &lt;a href="http://www.iris-reasoner.org"&gt;IRIS&lt;/a&gt; and later for &lt;a href="http://www.logicblox.com"&gt;LogicBlox&lt;/a&gt;. This tool is similar to
&lt;a href="http://releases.strategoxt.org/strategoxt-manual/unstable/manual/chunk-chapter/tutorial-sdf.html#sdf-unit-testing"&gt;parse-unit&lt;/a&gt;,
a tool I wrote earlier for testing parsers in &lt;a href="http://www.strategoxt.org"&gt;Stratego/XT&lt;/a&gt;. The concise syntax
of a test encourages you to write a lot of tests. Domain-specific
languages rock for this purpose!
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-7674963467625906327?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/7674963467625906327/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=7674963467625906327' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7674963467625906327'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7674963467625906327'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2008/12/why-jvm-spec-defines-checkcast-for.html' title='Why the JVM Spec defines checkcast for interface types'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-7964712993697270374</id><published>2008-07-01T06:49:00.004+02:00</published><updated>2008-07-01T08:02:25.580+02:00</updated><title type='text'>New Conference on Software Language Engineering</title><content type='html'>&lt;p&gt;
This year is special. There is a new and exciting conference: the &lt;a href="http://planet-sl.org/sle2008/"&gt;International Conference on Software Language Engineering (SLE)&lt;/a&gt;. The deadline for submission of papers is July 14th, which is coming up soon! Before I start raving about the topics covered by this conference, here is the disclaimer: I'm on the program committee of this conference, and as such I believe it's my duty to advertise the conference.
&lt;/p&gt;

&lt;p&gt;
Anyway, if done right, this conference has the potential to become a major and prestigious conference. The conference fills a clear gap: the topics of software language engineering do not exactly fit in major programming language conferences like OOPSLA, PLDI, POPL, and ECOOP. Nor do they fit exactly in the area of compiler construction (CC). CC typically does not accept more engineering or methodology-oriented papers. For OOPSLA and ECOOP the work more or less has to be in the context of object-oriented programming, for POPL it immediately has to be a principle (whatever that is), and for PLDI there are usually just a few slots available for papers that don't do something with memory management, garbage collection, program analysis, or concurrency. Personally, I've been pretty successful at getting papers in the area of software language engineering accepted at OOPSLA, but a full conference devoted to this topic is much better!
&lt;/p&gt;

&lt;p&gt;
Another reason why I think that this conference has a lot of potential is that if I look at the list of topics of interest in the &lt;a href="http://planet-sl.org/sle2008/index.php?option=com_content&amp;task=view&amp;id=4&amp;Itemid=4"&gt;call for papers&lt;/a&gt;, then I can only think of one summary: everything that's fun! I'm convinced I'm not the only one who thinks these topics are fun. When talking to colleagues, I notice again and again most of us just love languages. The engineering of those languages is an issue for almost all computer scientists and many programmers in industry, and this conference will be the most obvious target for papers about this!
&lt;/p&gt;

&lt;p&gt;
Also, the formalisms used for the specification and implementation of (domain-specific) languages are still very much an open research topic. Standardization of languages is still far from perfect, as discussed by many posts on this blog. Also, new language implementation techniques are being proposed all the time, and extensible compilers for developing language extensions are more popular than ever. Not to mention the increasing interest in using domain-specific languages to help solve the software development problems we're facing.
&lt;/p&gt;

&lt;p&gt;
Earlier in this post I wrote that this conference has major potential &lt;em&gt;if done right&lt;/em&gt;. There are a few risks. First, the conference has been started by two relatively small communities: ATEM and LDTA. I think the conference should attract a much larger community than the union of those two communities. I hope lots of people outside of the ATEM and LDTA communities will consider submitting a paper. Second, this year the conference is co-located with MODELS. Many programming language people are slightly allergic to model-driven engineering. I hope they will realize that this conference is &lt;em&gt;not&lt;/em&gt; specifically a model-driven conference. Finally, the whole setup of the conference should be international and varied. I'm sorry to say that at this point I'm not entirely happy with the choice of keynote speakers. This is nothing personal: I respect both keynote speakers, but the particular combination of the two speakers is a bit unfortunate. First, they are both Dutch. Second, neither of them is extremely well-known in the communities of OOPSLA, PLDI, or ECOOP. I hope that this will not affect the potential of this interesting conference.
&lt;/p&gt;

&lt;p&gt;
Now go work on your submission!
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-7964712993697270374?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/7964712993697270374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=7964712993697270374' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7964712993697270374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7964712993697270374'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2008/07/new-conference-on-software-language.html' title='New Conference on Software Language Engineering'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-7955536333739439693</id><published>2008-06-03T06:19:00.003+02:00</published><updated>2008-06-03T07:14:55.949+02:00</updated><title type='text'>Dubious Conferences: How do they threat people?</title><content type='html'>&lt;p&gt;
I just got a call for papers for the &lt;a href="http://www.ieccs.net"&gt;International e-Conference on Computer Science&lt;/a&gt; 2008 (IeCCS 2008). The IeCCS conference organizers and committee members are one of a kind! The submission deadline for papers is the 20th of June. Notification of acceptance is 25th of June. That's 5 days for reviewing. The camera ready deadline is 27th of June. That's 2 days for revising your paper. You've got to love the efficiency of these people! The 2007 edition of this conference has a program of 11 pages of accepted papers. When I review papers, I rarely get more than 2 done per day. If the IeCCS committee want to accept a similar number of papers this year, then they'd better make sure to get enough coffee (or tea, as advised in my Ph.D. thesis).
&lt;/p&gt;

&lt;p&gt;
Now, if you are a computer science researcher, such emails are hardly surprising. I delete several of them every week. Everybody is aware of conferences with questionable reviewing practices (see the &lt;a href="http://pdos.csail.mit.edu/scigen/"&gt;SCIgen paper generator&lt;/a&gt;). What surprised me about the IeCCS call for papers is that there is actually a researcher on the committee whom I vaguely know from when I was a student. So, I searched the web a bit to see how obvious the evidence is that the reviewing practices of IeCCS are questionable. Interestingly, I could find only one reference that mentions IeCCS as a conference you'd better not submit to. It's an interesting &lt;a href="http://pike.psu.edu/presentations/oracle.pdf "&gt;presentation&lt;/a&gt; by somebody at PSU.
&lt;/p&gt;

&lt;p&gt;
It seems that lists of conferences with a dubious reputation (also known as fake conferences) are impossible to keep online. I've seen a few lists in the past, but they've all disappeared. What interests me is why those lists are taken down. The best-known list, by Arlindo Oliveira, was &lt;a href="http://www.inesc-id.pt/~aml/trash.html"&gt;taken down&lt;/a&gt; after he received threats from conference organizers. I've never quite understood that: how serious can such a threat be? Maybe they'll publish a random paper with my name? They'll put me on the program committee next year?
&lt;/p&gt;

&lt;p&gt;
So well, here we go. Let's see what happens.
&lt;/p&gt;

&lt;p&gt;
Notice: IeCCS is not fake. It is very real!
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-7955536333739439693?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/7955536333739439693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=7955536333739439693' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7955536333739439693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/7955536333739439693'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2008/06/dubious-conferences-how-do-they-threat.html' title='Dubious Conferences: How do they threat people?'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-1613880718242304738</id><published>2008-01-20T10:57:00.000+01:00</published><updated>2008-01-20T12:08:00.422+01:00</updated><title type='text'>Ph.D. Thesis: Exercises in Free Syntax</title><content type='html'>&lt;p&gt;
It has been awfully quiet here, and I'm sorry about that. There are a few reasons. The first one is that I assembled my PhD thesis from my publications. This took quite some time and energy, but the result is great! My dissertation &lt;a href="http://martin.bravenboer.name/thesis.html"&gt;Exercises in Free Syntax&lt;/a&gt; is available online. If you are interested in having a dead-tree version, just let me know!
&lt;/p&gt;

&lt;p&gt;
I will defend my thesis tomorrow, January 21 (see the Dutch &lt;a href="http://applicaties.csc.uu.nl/uupona/bekijkpromotie.cfm?npromotieid=1972"&gt;announcement&lt;/a&gt;). It's weird to realize that tomorrow is the culmination of 4 years of intense work!
&lt;/p&gt;

&lt;p&gt;
For the library I created an English abstract. To give you an idea what the thesis is about, let me quote it here:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
In modern software development the use of multiple software languages
to constitute a single application is ubiquitous. Despite the
omnipresent use of combinations of languages, the principles and
techniques for using languages together are ad-hoc, unfriendly to
programmers, and result in a poor level of integration. We work
towards a principled and generic solution to language extension by
studying the applicability of modular syntax definition, scannerless
parsing, generalized parsing algorithms, and program transformations.
&lt;/p&gt;

&lt;p&gt;
We describe MetaBorg, a method for providing concrete syntax for
domain abstractions to application programmers. Since object-oriented
languages are designed for extensibility and reuse, the language
constructs are often sufficient for expressing domain abstractions at
the semantic level. However, they do not provide the right
abstractions at the syntactic level. The MetaBorg method consists of
embedding domain-specific languages in a general purpose host language
and assimilating the embedded domain code into the surrounding host
code.  Instead of extending the implementation of the host language,
the assimilation phase implements domain abstractions in terms of
existing APIs leaving the host language undisturbed.
&lt;/p&gt;

&lt;p&gt;
We present a solution to injection vulnerabilities. Software written
in one language often needs to construct sentences in another
language, such as SQL queries, XML output, or shell command
invocations.  This is almost always done using unhygienic string
manipulation.  A client can then supply specially crafted input that
causes the constructed sentence to be interpreted in an unintended
way, leading to an injection attack. We describe a more natural style
of programming that yields code that is impervious to injections by
construction.  Our approach embeds the grammars of the guest languages
into that of the host language and automatically generates code that
maps the embedded language to constructs in the host language that
reconstruct the embedded sentences, adding escaping functions where
appropriate.
&lt;/p&gt;

&lt;p&gt;
We study AspectJ as a typical example of a language conglomerate,
i.e. a language composed of a number of separate languages with
different syntactic styles. We show that the combination of the
lexical syntax leads to considerable complexity in the lexical states
to be processed. We show how scannerless parsing elegantly addresses
this. We present the design of a modular, extensible, and formal
definition of the lexical and context-free aspects of the AspectJ
syntax. We introduce grammar mixins, which allows the declarative
definition of keyword policies and combination of extensions.
&lt;/p&gt;

&lt;p&gt;
We introduce separate compilation of grammars to enable deployment of
languages as plugins to a compiler. Current extensible compilers focus
on source-level extensibility, which requires users to compile the
compiler with a specific configuration of extensions. A compound
parser needs to be generated for every combination. We introduce an
algorithm for parse table composition to support separate compilation
of grammars to parse table components. Parse table components can be
composed (linked) efficiently at runtime, i.e. just before
parsing. For realistic language combination scenarios involving
grammars for real languages, our parse table composition algorithm is
an order of magnitude faster than computation of the parse table for
the combined grammars, making online language composition feasible.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Also, they asked me for a Dutch, non-technical summary for news websites. Here it is, translated into English:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
We present a collection of methods and techniques for combining
programming languages. Our methods make it possible, for example, to
use within a general-purpose programming language a sublanguage that
better fits the domain of a particular part of an application. This
allows a programmer to implement an aspect of the software more
clearly and more compactly.
&lt;/p&gt;
&lt;p&gt;
Based on the same techniques, we present a method that protects
programmers against the mistakes that cause the most common security
problem, the so-called injection attack. By programming in a slightly
different way, the programmer is guaranteed that the software is not
vulnerable to such attacks. In contrast to previously proposed
solutions, our method gives absolute guarantees, is simpler for the
programmer, and can be used in all cases where injection attacks can
occur (for example, it is not specific to the SQL language).
&lt;/p&gt;
&lt;p&gt;
Finally, our techniques make it possible to define the syntax of some
programming languages more clearly and more formally. Some modern
programming languages are really a fusion of several sublanguages
(so-called language conglomerates). For such languages it was until
now unclear how the syntax could be formulated precisely, which is
necessary for standardization and compatibility.
&lt;/p&gt;
&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-1613880718242304738?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/1613880718242304738/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=1613880718242304738' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/1613880718242304738'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/1613880718242304738'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2008/01/phd-thesis-exercises-in-free-syntax.html' title='Ph.D. Thesis: Exercises in Free Syntax'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-3523845320301489824</id><published>2007-04-03T16:13:00.000+02:00</published><updated>2007-04-03T16:19:33.547+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='grammar engineering tools'/><title type='text'>LDTA'07 slides on Grammar Engineering Tools</title><content type='html'>The &lt;a href="http://martin.bravenboer.name/docs/ldta07-slides.pdf"&gt;slides&lt;/a&gt; of our presentation of the LDTA'07 paper &lt;a href="http://martin.bravenboer.name/docs/ldta07.pdf"&gt;Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking&lt;/a&gt; are now available online. The slides are a spectacular demonstration of latex masochism, so please take a look ;) . 
There are a few bonus slides after the conclusion that I wasn't able to show during the 30-minute version of the talk.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-3523845320301489824?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/3523845320301489824/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=3523845320301489824' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3523845320301489824'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3523845320301489824'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/04/ldta07-slides-on-grammar-engineering.html' title='LDTA&apos;07 slides on Grammar Engineering Tools'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-2611748389855552008</id><published>2007-04-03T15:57:00.000+02:00</published><updated>2007-04-03T17:00:54.504+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='grammar engineering tools'/><category scheme='http://www.blogger.com/atom/ns#' term='php-front'/><title type='text'>Migration of the YACC grammar for PHP to SDF</title><content type='html'>&lt;p&gt;
Last summer, &lt;a href="http://ericbouwers.blogspot.com"&gt;Eric Bouwers&lt;/a&gt; started working on infrastructure for PHP program transformation and analysis, sponsored by the Google Summer of Code. He did an excellent job, thanks to his expertise in PHP and his thorough knowledge of &lt;a href="http://www.strategoxt.org"&gt;Stratego/XT&lt;/a&gt;. To enjoy all the language engineering support in Stratego/XT, Eric developed a PHP grammar in SDF, the grammar formalism that is usually applied in Stratego/XT projects. Unfortunately it proved to be very difficult to get the grammar of PHP right.
&lt;/p&gt;

&lt;h2&gt;PHP precedence problems&lt;/h2&gt;

&lt;p&gt;
PHP features many operators, and the precedence of the operators is somewhat unusual and challenging for a grammar formalism. For example, PHP allows the weak binding assignment operator as an argument of the binary, strong binding &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; operator:
&lt;/p&gt;

&lt;pre&gt;
  if ($is_upload &amp;amp;&amp;amp; $file = fopen($fname, 'w')) { 
    ...
  }
&lt;/pre&gt;

&lt;p&gt;
The same holds for the unary, strong binding &lt;code&gt;!&lt;/code&gt; operator:
&lt;/p&gt;

&lt;pre&gt;
  if(!$foo = getenv('BAR')){
    ...
  }
&lt;/pre&gt;

&lt;p&gt;
A similar precedence rule for the &lt;code&gt;include&lt;/code&gt; operator allows an &lt;code&gt;include&lt;/code&gt; to occur as the argument of the strong binding &lt;code&gt;@&lt;/code&gt; operator:
&lt;/p&gt;

&lt;pre&gt;
  @include_once 'Var/Renderer/' . $mode . '.php'
&lt;/pre&gt;

&lt;h2&gt;Precedence rule recovery&lt;/h2&gt;

&lt;p&gt;
The most serious problem was to find out what the exact precedence rules of PHP operators are. The syntax of PHP is defined by a YACC grammar, which has a notion of precedence declarations that is heavily used by the PHP grammar. Unfortunately, for more complex grammars it is far from clear what the exact effect of the precedence declarations is. The precedence declarations are only used for conflict resolution in the parse table generator, so if there is no conflict, then the precedence declarations do not actually have any effect on a particular combination of operators. That's why we developed support for recovering precedence rules from YACC grammars, which I already wrote about in a &lt;a href="http://mbravenboer.blogspot.com/2007/01/grammar-engineering-im-loving-it.html"&gt;previous blog post&lt;/a&gt;. Based on these tools, we now have a very precise specification of the precedence rules of PHP.
&lt;/p&gt;

&lt;p&gt;
The next step in the process of getting a perfect PHP grammar was to actually use this specification to develop very precise precedence declarations for the SDF grammar of PHP. However, the precedence rule specification involves about 1650 rules, so migrating these precedence rules to SDF precedence declarations by hand is not really an option. Fortunately, all the ingredients are actually there to &lt;em&gt;generate&lt;/em&gt; SDF priority declarations from the precedence rules that we recover from the YACC grammar.
&lt;/p&gt;

&lt;h2&gt;Argument-specific priorities&lt;/h2&gt;

&lt;p&gt;
Thanks to two new features of SDF, these precedence rules can be translated directly to SDF. The first feature is argument-specific priorities. In the past, SDF only allowed priority declarations between productions. For example, the SDF priority
&lt;/p&gt;

&lt;pre&gt;
  E "*" E -&gt; E &gt; E "+" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
defines that the production for the &lt;code&gt;+&lt;/code&gt; operator cannot be applied to produce any of the &lt;code&gt;E&lt;/code&gt; arguments of the production for the &lt;code&gt;*&lt;/code&gt; operator, hence the production for the addition operator cannot be applied on the left-hand side or right-hand side of the multiplication operator. This priority implies that the multiplication operator binds stronger than the addition operator. This single SDF priority corresponds to the following &lt;em&gt;two&lt;/em&gt; precedence rules in the formalism-independent notation we are using in the &lt;a href="http://www.stratego-language.org/Stratego/GrammarEngineeringTools"&gt;Stratego/XT Grammar Engineering Tools&lt;/a&gt;:
&lt;/p&gt;

&lt;pre&gt;
  &amp;lt;E -&gt; &amp;lt;E -&gt; E + E&gt; * E&gt;
  &amp;lt;E -&gt; E * &amp;lt;E -&gt; E + E&gt;&gt;
&lt;/pre&gt;

&lt;p&gt;
For many languages, precedence rules are different for arguments of the same production. That's why we use the more specific representation of precedence rules in our grammar engineering tools. Fortunately, SDF now supports argument-specific priorities as well. These argument-specific priorities are just plain numbers that indicate to which arguments of a production the priority applies. For example, the following SDF priority forbids the assignment operator only at the left-most and the right-most &lt;code&gt;E&lt;/code&gt; of the conditional operator:
&lt;/p&gt;

&lt;pre&gt;
  E "?" E ":" E -&gt; E &amp;lt;0,4&gt; &gt; E "=" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
This corresponds to the following precedence rules:
&lt;/p&gt;

&lt;pre&gt;
  &amp;lt;E -&gt; &amp;lt;E -&gt; E = E&gt; ? E : E&gt;
  &amp;lt;E -&gt; E ? E : &amp;lt;E -&gt; E = E&gt;&gt;
&lt;/pre&gt;

&lt;h2&gt;Non-transitive priorities&lt;/h2&gt;

&lt;p&gt;
The second new SDF feature that is required for expressing the PHP precedence rules is non-transitive priorities. Before the introduction of this feature, all SDF priorities were transitively closed. For example, if there are two separate priorities
&lt;/p&gt;

&lt;pre&gt;
  "!" E -&gt; E   &gt; E "+" E -&gt; E
  E "+" E -&gt; E &gt; V "=" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
then by the transitive closure of priorities this would imply the priority
&lt;/p&gt;

&lt;pre&gt;
  "!" E -&gt; E &gt;  V "=" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
This transitive closure feature is useful in most cases, but for some languages (such as PHP) the precedence rules are in fact not transitively closed, which makes the definition of these rules in SDF slightly problematic. For this reason, SDF now also features non-transitive priorities, using a dot before the &lt;code&gt;&gt;&lt;/code&gt; of the priority:
&lt;/p&gt;

&lt;pre&gt;
  "!" E -&gt; E .&gt; E "+" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
Non-transitive priorities will not be included in the transitive closure, which gives you very precise control over the precedence rules.
&lt;/p&gt;
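
&lt;p&gt;
To make the difference concrete, here is a minimal Java sketch of how such a set of priorities could be closed. It is only a model of the idea, not how the SDF parser generator implements it: the class &lt;code&gt;PriorityClosure&lt;/code&gt;, the &lt;code&gt;Op&lt;/code&gt; enum, and the methods &lt;code&gt;gt&lt;/code&gt;, &lt;code&gt;dotGt&lt;/code&gt;, and &lt;code&gt;forbids&lt;/code&gt; are names invented for this illustration. Priorities declared with &lt;code&gt;&gt;&lt;/code&gt; take part in the closure, while priorities declared with &lt;code&gt;.&gt;&lt;/code&gt; only apply to the pair they mention.
&lt;/p&gt;

&lt;pre&gt;
  class PriorityClosure {
    // Hypothetical operator set, just large enough for the example above.
    enum Op { NOT, PLUS, ASSIGN }

    static final int N = Op.values().length;

    // closable[a][b]: a transitive priority of a over b (assumed encoding).
    final boolean[][] closable = new boolean[N][N];
    // direct[a][b]: a non-transitive (dotted) priority of a over b.
    final boolean[][] direct = new boolean[N][N];

    void gt(Op a, Op b) { closable[a.ordinal()][b.ordinal()] = true; }

    void dotGt(Op a, Op b) { direct[a.ordinal()][b.ordinal()] = true; }

    // True if the production of b may not occur as an argument of a.
    boolean forbids(Op a, Op b) {
      boolean[][] closed = new boolean[N][N];
      for (Op i : Op.values()) {
        closed[i.ordinal()] = closable[i.ordinal()].clone();
      }
      // Warshall-style closure over the transitive priorities only.
      for (Op k : Op.values()) {
        for (Op i : Op.values()) {
          for (Op j : Op.values()) {
            if (closed[i.ordinal()][k.ordinal()]) {
              if (closed[k.ordinal()][j.ordinal()]) {
                closed[i.ordinal()][j.ordinal()] = true;
              }
            }
          }
        }
      }
      // Dotted priorities are added afterwards, so they never chain.
      return closed[a.ordinal()][b.ordinal()] || direct[a.ordinal()][b.ordinal()];
    }
  }
&lt;/pre&gt;

&lt;p&gt;
In this model, registering both priorities of the earlier example with &lt;code&gt;gt&lt;/code&gt; makes &lt;code&gt;forbids(Op.NOT, Op.ASSIGN)&lt;/code&gt; hold. Registering the first one with &lt;code&gt;dotGt&lt;/code&gt; instead leaves only the two declared pairs, which is the kind of control the PHP rules need.
&lt;/p&gt;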

&lt;h2&gt;Precedence rule migration&lt;/h2&gt;

&lt;p&gt;
Thanks to the argument-specific, non-transitive priorities of SDF, the precedence rules that we recover from the YACC grammar for PHP can now be mapped directly to SDF priority declarations. The two precedence rules mentioned earlier:
&lt;/p&gt;

&lt;pre&gt;
  &amp;lt;E -&gt; &amp;lt;E -&gt; E + E&gt; * E&gt;
  &amp;lt;E -&gt; E * &amp;lt;E -&gt; E + E&gt;&gt;
&lt;/pre&gt;

&lt;p&gt;
now translate directly to SDF priorities:
&lt;/p&gt;

&lt;pre&gt;
  E "*" E -&gt; E &amp;lt;0&gt; .&gt; E "+" E -&gt; E
  E "*" E -&gt; E &amp;lt;2&gt; .&gt; E "+" E -&gt; E
&lt;/pre&gt;

&lt;p&gt;
The migration of the recovered YACC precedence rules results in about 1650 of these SDF priorities, but thanks to the fully automatic migration this huge number of priorities is not really a problem. The resulting PHP syntax definition immediately &lt;a href="https://bugs.cs.uu.nl/browse/PSAT-55"&gt;solved&lt;/a&gt; &lt;a href="https://bugs.cs.uu.nl/browse/PSAT-58"&gt;all&lt;/a&gt; the &lt;a href="https://bugs.cs.uu.nl/browse/PSAT-49"&gt;known&lt;/a&gt; &lt;a href="https://bugs.cs.uu.nl/browse/PSAT-53"&gt;issues&lt;/a&gt; with the PHP syntax definition, which shows that this migration was most reliable and successful.
&lt;/p&gt;

&lt;h2&gt;Future&lt;/h2&gt;

&lt;p&gt;
There is a lot of interesting work left to be done. First, it would be interesting to develop a more formal grammar for PHP, similar to the grammars of the C, Java, and C# specifications. These specifications all encode the precedence rules of the operators in the production rules, by introducing non-terminals for all the precedence levels. It should not be too difficult to automatically determine such an encoding from the precedence rules we recover. This would result in a formal specification of the PHP syntax, which will benefit many other parser generators. One of the remarkable things we found out is that the unary &lt;code&gt;-&lt;/code&gt; operator has the same precedence as the binary &lt;code&gt;-&lt;/code&gt; (usually it binds stronger), which results in &lt;code&gt;-1 * 3&lt;/code&gt; being parsed as &lt;code&gt;-(1 * 3)&lt;/code&gt;. We have not been able to find an example where this strange precedence rule results in unexpected behaviour, but for the development of a solid parser it is essential that such precedence rules are defined precisely.
&lt;/p&gt;

&lt;p&gt;
Second, it would be good to try to minimize the number of generated SDF priorities by determining a priority declaration that you can actually oversee as a human. This would involve finding out where the transitive closure feature of SDF priorities can be used to remove redundant priority declarations.
&lt;/p&gt;

&lt;p&gt;
Third, it would be great to integrate the precedence rule migration in a tool that completely migrates a YACC/FLEX grammar to SDF. For this, we need tools to parse and understand a FLEX specification and extend the existing support for precedence rule migration to other YACC productions.
&lt;/p&gt;

&lt;p&gt;
Clearly, there is lots of interesting (and useful!) grammar engineering work to do in this direction!
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-2611748389855552008?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/2611748389855552008/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=2611748389855552008' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/2611748389855552008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/2611748389855552008'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/04/migration-of-yacc-grammar-for-php-to.html' title='Migration of the YACC grammar for PHP to SDF'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-369307005206427580</id><published>2007-03-01T17:18:00.000+01:00</published><updated>2007-03-03T01:47:17.454+01:00</updated><title type='text'>x86-64 support for Stratego/XT!</title><content type='html'>&lt;p&gt;
Today is the day that &lt;a href="http://www.strategoxt.org"&gt;Stratego/XT&lt;/a&gt; supports 64-bit processors!  Stratego/XT supports x86-64 from release &lt;a href="http://buildfarm.st.ewi.tudelft.nl/releases/strategoxt/strategoxt-0.17M3pre16744/"&gt;0.17M3pre16744&lt;/a&gt; (&lt;a href="http://buildfarm.st.ewi.tudelft.nl/releases/strategoxt/strategoxt-unstable-latest/"&gt;or later&lt;/a&gt;), the sdf2-bundle from release &lt;a href="http://buildfarm.st.ewi.tudelft.nl/releases/meta-environment/sdf2-bundle-2.4pre212034-sqzzbkp3/"&gt;2.4pre212034&lt;/a&gt; (&lt;a href="http://buildfarm.st.ewi.tudelft.nl/releases/meta-environment/sdf2-bundle-unstable-latest/"&gt;or later&lt;/a&gt;). The releases are available from our new &lt;a href="http://buildfarm.st.ewi.tudelft.nl/releases"&gt;Nix buildfarm&lt;/a&gt; at the TU Delft.
&lt;/p&gt;

&lt;h2&gt;Some history&lt;/h2&gt;

&lt;p&gt;
About 6 years ago, various people &lt;a href="http://mail.cs.uu.nl/pipermail/stratego-dev/2003q2/000262.html"&gt;started&lt;/a&gt; &lt;a href="http://mail.cs.uu.nl/pipermail/stratego-dev/2003q2/000266.html"&gt;to&lt;/a&gt; &lt;a href="http://mail.cs.uu.nl/pipermail/stratego-dev/2003q3/000516.html"&gt;complain&lt;/a&gt; &lt;a href="http://mail.cs.uu.nl/pipermail/stratego-dev/2003q3/000517.html"&gt;about&lt;/a&gt; &lt;a href="http://mail.cs.uu.nl/pipermail/stratego/2003q4/000080.html"&gt;the&lt;/a&gt; &lt;a href="http://mail.cs.uu.nl/pipermail/stratego/2005q4/000440.html"&gt;lack&lt;/a&gt; &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=190"&gt;of&lt;/a&gt; &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=354"&gt;64-bit&lt;/a&gt; &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=606"&gt;processor&lt;/a&gt; support. At the time, most complaints came from our very own Unix geek Armijn Hemel, mostly because of his passion for Sun and these strange UltraSparc machines. However, similar to the limited distribution of Unix geeks, 64-bit systems were rather uncommon at the time. The requests we got were about more obscure processors, like Sun's UltraSparc and Intel's IA-64 (Itanium).
&lt;/p&gt;

&lt;p&gt;
The 64-bit issues were never solved because &lt;em&gt;(1)&lt;/em&gt; we never had a decent 64-bit machine at our disposal, &lt;em&gt;(2)&lt;/em&gt; users with 64-bit systems were uncommon, and &lt;em&gt;(3)&lt;/em&gt; most of the issues were actually not Stratego/XT issues, but problems in the &lt;a href="http://www.cwi.nl/htbin/sen1/twiki/bin/view/Meta-Environment/ATerms"&gt;ATerm library&lt;/a&gt;, which is not maintained by the Stratego/XT developers.
&lt;/p&gt;

&lt;h2&gt;Some first steps&lt;/h2&gt;

&lt;p&gt;
However, it is no longer possible to ignore 64-bit systems: Intel and AMD both sell 64-bit processors for consumers these days. Several users of Stratego/XT already have x86-64 machines, and the only reason why they don't complain en masse is that there is always the option to compile in 32-bit mode (using &lt;code&gt;gcc -m32&lt;/code&gt;).
&lt;/p&gt;

&lt;p&gt;
At the TU Delft (the new Stratego/XT headquarters), we now have an amazing buildfarm with some &lt;a href="http://blog.eelcovisser.net/index.php?/archives/36-Bootfarm.html"&gt;real, dedicated hardware&lt;/a&gt; bought specifically for the purpose of building software. At the moment, all our build machines (except for the Mac Minis) have x86-64 processors, so the lack of 64-bit machines is no longer an excuse.
&lt;/p&gt;

&lt;p&gt;
Also, the ATerm library now enjoys a few more contributors. Last summer, Eelco Dolstra from Utrecht University created the first complete 64-bit patch for the ATerm library (Meta-Environment issue &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=606"&gt;606&lt;/a&gt;), simply because his &lt;a href="http://nix.cs.uu.nl"&gt;Nix package management system&lt;/a&gt; uses the ATerm library and portability of Nix is important. Also, Erik Scheffers from the Eindhoven University of Technology has done an excellent job on the development of ATerm branches that support GCC 4.x and 64-bit machines.
&lt;/p&gt;

&lt;h2&gt;The final step .. uh, steps&lt;/h2&gt;

&lt;p&gt;
As a result, it was now feasible to fully support x86-64 systems. The only thing left for me to do was to use all the right patches and branches and enable an x86-64 build in our buildfarm. At least, that's what I thought ... Well, if you know computer scientists, then you'll also know that they are always far too optimistic.
&lt;/p&gt;

&lt;p&gt;
In the end, it took me a day or four to get everything working. This is rather stressful work, I must say. Debugging code that is affected by a mixture of 32-bit assumptions and aliasing bugs introduced by GCC optimizations is definitely &lt;em&gt;not&lt;/em&gt; much fun. You can stare at C code for as long as you like, but if the actual code being executed is completely different, then this won't help much. In the end, this little project resulted in quite a few new issues:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;a href="https://bugs.cs.uu.nl/browse/STR-701"&gt;STR-701&lt;/a&gt; is a bug that was raised by casting a pointer to an integer in the &lt;code&gt;address&lt;/code&gt; strategy of &lt;code&gt;libstratego-lib&lt;/code&gt;, which returns an integer representation of the address of an ATerm. The Stratego Library has had this strategy for a long time, and indeed the most natural representation of an address is an integer datatype. Unfortunately, ATerm integers are fixed size, 32-bit integers, hence it cannot be used to represent a pointer of 64 bits. The new representation is a string, which is acceptable for most of the applications of &lt;code&gt;address&lt;/code&gt;.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;p&gt;
      &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=720"&gt;Meta-Environment issue 720&lt;/a&gt; is related to GCC optimizations based on strict alias analysis. In this case, the optimization seems to be applied only in the x86-64 backend of GCC, while the underlying problem is in fact architecture independent.
    &lt;/p&gt;

    &lt;p&gt;
      The code that raises this bug applies efficient memory allocation by allocating blocks of objects rather than individual ones. The available objects are encoded efficiently in a linked list, with only a &lt;code&gt;next&lt;/code&gt; field. The memory of this &lt;code&gt;next&lt;/code&gt; field is used for the actual data of the object, as well as for the link to the next available object. The objects are character classes, having the name &lt;code&gt;CC_Class&lt;/code&gt;, which is a typedef for an array of longs. Roughly, the invalid code for adding a node to the linked list looks like this:
    &lt;/p&gt;

&lt;pre&gt;
struct CC_Node {
  struct CC_Node *next;
};

static struct CC_Node *free_nodes = NULL;

void add_node(CC_Class* c) {
  struct CC_Node *node = (struct CC_Node *) c;
  node-&gt;next = free_nodes;
  free_nodes = node;
}
&lt;/pre&gt;

&lt;p&gt;
The problem with this efficient linked list is that the same memory location is accessed through pointers of different types, in this case a pointer to a &lt;code&gt;CC_Node struct&lt;/code&gt; and a pointer to a &lt;code&gt;CC_Class&lt;/code&gt;. Hence, the code creates aliases of different types, which is invalid in C (see for example this nice &lt;a href="http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html"&gt;introduction to strict aliasing&lt;/a&gt;). In this case, C compilers are allowed to assume that the two variables do not alias, which enables a whole bunch of optimizations that are invalid if they do in fact alias.
&lt;/p&gt;

&lt;p&gt;
The solution for this is to use a C union, which explicitly informs the compiler that a certain memory location is accessed through two different types. Using a union, the above code translates to:
&lt;/p&gt;

&lt;pre&gt;
union CC_Node {
  CC_Class *cc;
  CC_Class **next;
};

static union CC_Node free_node = {NULL};

void add_node(CC_Class* c) {
  union CC_Node node;  /* local view of the object that is returned to the free list */
  node.cc = c;
  *(node.next) = free_node.cc;
  free_node.cc = node.cc;
}
&lt;/pre&gt;

&lt;blockquote&gt;
  &lt;p style="font-style: italic; font-size: small;"&gt;
    Sidenote: I'm not really a C union expert, and I'm not 100% sure whether in this case a union is necessary for a &lt;code&gt;CC_Class*&lt;/code&gt; and &lt;code&gt;CC_Class**&lt;/code&gt; or &lt;code&gt;CC_Class&lt;/code&gt; and &lt;code&gt;CC_Class*&lt;/code&gt;. The union I've chosen solves the bug, but I should figure out what the exact solution should be. Feedback is welcome.
  &lt;/p&gt;
&lt;/blockquote&gt;

  &lt;/li&gt;

  &lt;li&gt;
    &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=718"&gt;Meta-Environment issue 718&lt;/a&gt; is related to the previous bug. The problem here is that the same memory locations are accessed through a generic datatype (ATerm) as well as pointers to more specific structs, which again leads to strict aliasing problems. This time, the issue has been solved in a more ad-hoc way by declaring a variable as volatile. This solves the issue for now, but a more fundamental solution (probably a union) is necessary here as well.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;p&gt;
      &lt;a href="https://bugs.cs.uu.nl/browse/STR-705"&gt;STR-705&lt;/a&gt; adds some checks for the size of various types to the Stratego/XT build system, called Auto/XT. These checks are necessary for the header files of the ATerm library, which determine the characteristics of the platform based on the size of longs, integers, and void pointers, which are defined as macros (a feature that is under discussion in &lt;a href="http://sjofar.sen.cwi.nl:8080/show_bug.cgi?id=606"&gt;Meta-Environment issue 606&lt;/a&gt;: it does not play very well with cross compilation and compilation in 32-bit mode on a 64-bit platform). The ATerm library we are using at the moment is the branch &lt;code&gt;64-bit-fixes&lt;/code&gt;, which has been developed by Eelco Dolstra and Erik Scheffers.
    &lt;/p&gt;

    &lt;p&gt;
      The new macro &lt;code&gt;XT_C_TYPE_CHARACTERISTICS&lt;/code&gt; checks the sizes and defines the macros that are required by these headers. The macro &lt;code&gt;XT_SETUP&lt;/code&gt; invokes the &lt;code&gt;XT_C_TYPE_CHARACTERISTICS&lt;/code&gt; macro, so all packages based on Stratego/XT will automatically support the 64-bit branch of the ATerm library.
    &lt;/p&gt;
  &lt;/li&gt;

  &lt;li&gt;
    &lt;a href="https://bugs.cs.uu.nl/browse/STR-703"&gt;STR-703&lt;/a&gt; is related to the previous issues. In packages based on the GNU Autotools and Auto/XT, the C code is compiled by the Automake-based build system, not by the Stratego compiler itself (which only produces the C code). In this case, the &lt;code&gt;XT_C_TYPE_CHARACTERISTICS&lt;/code&gt; takes care of the required defines. However, the Stratego compiler can also be used as a standalone compiler, where &lt;code&gt;strc&lt;/code&gt; invokes the C compiler itself. In this case, &lt;code&gt;strc&lt;/code&gt; needs to pass the definitions of macros to the C compiler.
  &lt;/li&gt;

  &lt;li&gt;
    &lt;p&gt;
      &lt;a href="https://bugs.cs.uu.nl/browse/STR-704"&gt;STR-704&lt;/a&gt; drops the use of autoheader in stratego-libraries. Autoheader replaces the command-line definition of macros to a generated &lt;code&gt;config.h&lt;/code&gt;. This generated file used to be installed as &lt;code&gt;stratego-config.h&lt;/code&gt;, but this header file is no longer necessary: there is no configuration option in this file that is still necessary as part of the Stratego/XT installation. The mechanism of &lt;code&gt;config.h&lt;/code&gt; installation is rather fragile (some macro definitions have to be removed), so if it is not necessary anymore, then why not drop it ...
    &lt;/p&gt;

    &lt;p&gt;
      The relation to x86-64 support is that several C files in the stratego-libraries package did not correctly include the generated &lt;code&gt;config.h&lt;/code&gt; before &lt;code&gt;aterm2.h&lt;/code&gt;. This breaks on x86-64 systems because &lt;code&gt;aterm2.h&lt;/code&gt; requires the aforementioned macro definitions.
    &lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
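
&lt;p&gt;
The STR-701 problem from the list above is easy to reproduce in miniature. The following Java sketch is only an analogy: a &lt;code&gt;long&lt;/code&gt; stands in for a 64-bit pointer and an &lt;code&gt;int&lt;/code&gt; for a 32-bit ATerm integer. The class &lt;code&gt;AddressRepresentation&lt;/code&gt; and its methods are made up for this illustration, and the hexadecimal string format is an assumption, not necessarily what &lt;code&gt;address&lt;/code&gt; produces.
&lt;/p&gt;

&lt;pre&gt;
class AddressRepresentation {
  // Stand-in for storing an address in a 32-bit integer:
  // the cast silently drops the upper 32 bits.
  static int asInt32(long address) {
    return (int) address;
  }

  // A string representation keeps all 64 bits (hex format is an assumption).
  static String asString(long address) {
    return Long.toHexString(address);
  }
}
&lt;/pre&gt;

&lt;p&gt;
For an address such as &lt;code&gt;0x00007f3a12345678&lt;/code&gt;, &lt;code&gt;asInt32&lt;/code&gt; keeps only &lt;code&gt;0x12345678&lt;/code&gt;, so two different terms could end up with the same integer "address". The string keeps them apart.
&lt;/p&gt;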

&lt;h2&gt;Short version&lt;/h2&gt;

&lt;p&gt;
The net result of this operation is that we now support x86-64 systems. And this time we will keep supporting 64-bit processors, whatever it takes.
&lt;/p&gt;

&lt;p&gt;
It would be fun to check now if UltraSparc and IA-64 machines work out of the box, but I don't have access to any of these. If you have one, I would love to know if it works.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-369307005206427580?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/369307005206427580/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=369307005206427580' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/369307005206427580'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/369307005206427580'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/03/x86-64-support-for-strategoxt.html' title='x86-64 support for Stratego/XT!'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-3787388678361515601</id><published>2007-02-15T00:48:00.001+01:00</published><updated>2007-02-15T00:54:23.507+01:00</updated><title type='text'>Base access in the C# specification</title><content type='html'>&lt;p&gt;
In a &lt;a href="http://mbravenboer.blogspot.com/2007/02/informal-specifications-are-not-so.html"&gt;previous post&lt;/a&gt;, I discussed a bug in the Java Language Specification on super field access of protected fields. If you haven't read this yet, I would suggest to give it a read before you continue with this post. Thanks to a discussion with &lt;a href="http://blogs.sun.com/abuckley/"&gt;Alex Buckley&lt;/a&gt; (the new maintainer of the Java Language specification), there is now a proposal to fix this bug in an elegant way. I'll report on the solution and the nice discussion on the relation to super field accesses in bytecode later.
&lt;/p&gt;

&lt;p&gt;
However, first I would like to illustrate the risk of reuse. While writing on issues in the Java Language Specification, I figured that the C# specification probably has the same issue. After all, C# features the same &lt;a href="http://mbravenboer.blogspot.com/2006/04/on-details-of-protected-access-in-java.html"&gt;details of protected access&lt;/a&gt;. Consider the following two C# classes:
&lt;/p&gt;

&lt;pre&gt;
  class A {
    protected int secret;
  }

  class B : A {
    public void f(A a) {
      a.secret = 5;
    }
  }
&lt;/pre&gt;

&lt;p&gt;
Due to the details of protected access, this example won't compile. The Mono C# compiler clearly explains the problem:
&lt;/p&gt;

&lt;pre&gt;
  A.cs(17,5): error CS1540: Cannot access protected 
  member  `A.secret' via a qualifier of type `A'. The 
  qualifier must be of type `B' or derived from it
&lt;/pre&gt;

&lt;p&gt;
Of course, C# also supports access to fields of base classes (aka super classes). Indeed, checking the C# specification reveals that the definition of base access is exactly the same as super field access in Java. In Section 14.5.8 of the C# Language Specification (&lt;a href="http://www.ecma-international.org/publications/standards/Ecma-334.htm"&gt;ECMA-334&lt;/a&gt;), the semantics of base access expressions is defined in the following way:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;em&gt;
"At compile-time, base-access expressions of the form &lt;code&gt;base.I&lt;/code&gt; and &lt;code&gt;base[E]&lt;/code&gt; are evaluated exactly as if they were written &lt;code&gt;((B)this).I&lt;/code&gt; and &lt;code&gt;((B)this)[E]&lt;/code&gt;, where &lt;code&gt;B&lt;/code&gt; is the base class of the class or struct in which the construct occurs."
&lt;/em&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Compare this definition to the Java Language Specification:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;em&gt;
"Suppose that a field access expression &lt;code&gt;super.name&lt;/code&gt; appears within class &lt;code&gt;C&lt;/code&gt;, and the immediate super class of &lt;code&gt;C&lt;/code&gt; is class &lt;code&gt;S&lt;/code&gt;. Then &lt;code&gt;super.name&lt;/code&gt; is treated exactly as if it had been the expression &lt;code&gt;((S)this).name&lt;/code&gt;."
&lt;/em&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The good thing about this reuse is that I can reuse the examples of my previous post as well. Consider the following two C# classes that compile without any problem:
&lt;/p&gt;

&lt;pre&gt;
  class A {
    protected int secret;
  }

  class B : A {
    public void f() {
      base.secret = 5;
    }
  }
&lt;/pre&gt;

&lt;p&gt;
Next, consider a variant of this example, where class B has been modified to refer to the field &lt;code&gt;secret&lt;/code&gt; using &lt;code&gt;(A) this&lt;/code&gt;, which according to the specification is exactly the same as a reference through &lt;code&gt;base&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
  class B : A {
    public void f() {
      ((A) this).secret = 5;
    }
  }
&lt;/pre&gt;

&lt;p&gt;
Similar to Java, this class won't compile, due to the details of protected access in C#. Again, the Mono C# compiler explains the issue:
&lt;/p&gt;

&lt;pre&gt;
  A.cs(13,7): error CS1540: Cannot access protected 
  member `A.secret' via a qualifier of type `A'. The 
  qualifier must be of type `B' or derived from it
&lt;/pre&gt;

&lt;p&gt;
This example shows that for C# the two expressions &lt;code&gt;base.secret&lt;/code&gt; and &lt;code&gt;((A) this).secret&lt;/code&gt; are not evaluated in the same way, so the previously reported problem in the Java Language Specification also applies to the C# specification.
&lt;/p&gt;

&lt;p&gt;
Now I have to figure out how to report issues in the C# specification
...
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-3787388678361515601?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/3787388678361515601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=3787388678361515601' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3787388678361515601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3787388678361515601'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/02/base-access-in-c-specification.html' title='Base access in the C# specification'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-5153186230253633082</id><published>2007-02-14T14:38:00.000+01:00</published><updated>2007-02-14T15:20:54.576+01:00</updated><title type='text'>New Arrivals</title><content type='html'>Yesterday was the day of two new arrivals:
&lt;ul&gt;
  &lt;li&gt;&lt;a href="http://www.zefhemel.com"&gt;Zef Hemel&lt;/a&gt; is &lt;a href="http://www.zefhemel.com/archives/2007/02/14/another-change-in-direction"&gt;joining&lt;/a&gt; the &lt;a href="http://swerl.tudelft.nl/bin/view/MoDSE/WebHome"&gt;MoDSE&lt;/a&gt; project at the TU Delft. Welcome Zef!&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://freeflycode.blogspot.com/"&gt;Shan Shan Huang&lt;/a&gt; has started blogging about her work on &lt;a href="http://freeflycode.blogspot.com/2007/02/mj.html"&gt;MJ&lt;/a&gt;, a very cool code generation solution. Yannis, Shan Shan, and David have been searching for a sweet spot in safe code generation during the last few years, and this sounds like a very interesting approach. Go and read their ECOOP paper!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post was unusually short. I'm sorry for that, but I had to mention this short list ;)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-5153186230253633082?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/5153186230253633082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=5153186230253633082' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/5153186230253633082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/5153186230253633082'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/02/new-arrivals.html' title='New Arrivals'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-3882147617116565486</id><published>2007-02-11T02:31:00.000+01:00</published><updated>2007-02-08T00:45:29.973+01:00</updated><title type='text'>Informal specifications are not so super</title><content type='html'>&lt;p&gt;
I'm trying to get back to the good habit of blogging about our work. I'm not very fond of dumping random links or remarks, so the most challenging part of blogging is to find a good topic to write a decent story about. This time, I fall back on a topic I actually worked on about two years ago, but that is still relevant. At that time, I was actively developing a type checker for Java, as part of the &lt;a href="http://www.stratego-language.org/Stratego/TheDryad"&gt;Dryad&lt;/a&gt; project. This story is about a bug in the &lt;a href="http://java.sun.com/docs/books/jls/"&gt;Java Language Specification&lt;/a&gt; that, for whatever bizarre reason, has never been reported (afaik).
&lt;/p&gt;


&lt;h2&gt;Super field access&lt;/h2&gt;

&lt;p&gt;
Java supports access to fields of super classes using the &lt;code&gt;super&lt;/code&gt; keyword, even if this field is hidden by a declaration of another field with the same name. For example, the following sample will print &lt;code&gt;super.x = 1 and x = 2&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
  class S {
    int x = 1;
  }

  class C extends S {
    int x = 2;

    void print() {
      System.out.println("super.x = " + super.x + " and x = " + x);
    }

    public static void main(String[] ps) {
      new C().print(); 
    }
  }
&lt;/pre&gt;

&lt;p&gt;
To allow access from inner classes to hidden fields of enclosing instances, Java also supports qualified super field accesses. In this case, the &lt;code&gt;super&lt;/code&gt; keyword is prefixed with the name of a lexically enclosing class. This feature is related to the qualified &lt;code&gt;this&lt;/code&gt; expression, which allows you to refer to an enclosing instance.
&lt;/p&gt;
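
&lt;p&gt;
As a small illustration of a qualified super field access, here is a sketch of my own (the class names &lt;code&gt;QualifiedSuperDemo&lt;/code&gt; and &lt;code&gt;Inner&lt;/code&gt; are made up for this example). The inner class reads the hidden field of the superclass of its enclosing class, next to the field of the enclosing instance and its own field.
&lt;/p&gt;

&lt;pre&gt;
```java
public class QualifiedSuperDemo {
    static class S {
        int x = 1;
    }

    static class C extends S {
        int x = 2; // hides S.x

        class Inner {
            int x = 3; // a field of the inner class itself

            String describe() {
                // C.super.x: field x of the superclass of C, read on the enclosing instance
                // C.this.x:  field x declared in C, read on the same enclosing instance
                return "C.super.x = " + C.super.x
                    + ", C.this.x = " + C.this.x
                    + ", x = " + x;
            }
        }
    }

    static String describe() {
        C outer = new C();
        C.Inner inner = outer.new Inner();
        return inner.describe();
    }

    public static void main(String[] args) {
        // prints: C.super.x = 1, C.this.x = 2, x = 3
        System.out.println(describe());
    }
}
```
&lt;/pre&gt;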

&lt;h2&gt;Current specification&lt;/h2&gt;

&lt;p&gt;
We all have a reasonable, though informal, idea of what the semantics of this language feature is. Of course, for a real specification the semantics has to be defined more precisely. For example, two things that need to be defined are the type of such an expression and whether the field is accessible at all. The specification concisely defines the semantics of this language feature by &lt;em&gt;forwarding&lt;/em&gt; the semantic rules to existing, more basic language features. For &lt;code&gt;super.name&lt;/code&gt;, the JLS specifies:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;em&gt;
"Suppose that a field access expression &lt;code&gt;super.name&lt;/code&gt; appears within class &lt;code&gt;C&lt;/code&gt;, and the immediate super class of &lt;code&gt;C&lt;/code&gt; is class &lt;code&gt;S&lt;/code&gt;. Then &lt;code&gt;super.name&lt;/code&gt; is treated exactly as if it had been the expression &lt;code&gt;((S)this).name&lt;/code&gt;."
&lt;/em&gt;
&lt;/blockquote&gt;

&lt;p&gt;
So, in the example I gave, &lt;code&gt;super.x&lt;/code&gt; would be &lt;em&gt;exactly&lt;/em&gt; equivalent to &lt;code&gt;((S)this).x&lt;/code&gt;. Obviously, the emphasis on "exactly" is on purpose. Why would they use this word? Does this suggest that there is also a notion of being treated &lt;em&gt;almost exactly&lt;/em&gt; in the same way? ;)
&lt;/p&gt;

&lt;p&gt;
For qualified field access, the specification is almost the same, but this time using a qualified &lt;code&gt;this&lt;/code&gt; instead of &lt;code&gt;this&lt;/code&gt;.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;em&gt;
"Suppose that a field access expression &lt;code&gt;T.super.name&lt;/code&gt; appears within class &lt;code&gt;C&lt;/code&gt;, and the immediate super class of the class denoted by &lt;code&gt;T&lt;/code&gt; is a class whose fully qualified name is &lt;code&gt;S&lt;/code&gt;. Then &lt;code&gt;T.super.name&lt;/code&gt; is treated exactly as if it had been the expression &lt;code&gt;((S)T.this).name&lt;/code&gt;."
&lt;/em&gt;
&lt;/blockquote&gt;

&lt;p&gt;
This specification looks very reasonable, considering that for a field access only the &lt;em&gt;compile-time type&lt;/em&gt; of the subject expression is used to determine which field is to be used. By casting the subject expression (&lt;code&gt;this&lt;/code&gt;) to the right type, the expected field of the super class is accessed.
&lt;/p&gt;
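
&lt;p&gt;
The role of the compile-time type can be made concrete with a small sketch of my own (the names &lt;code&gt;FieldCastDemo&lt;/code&gt; and &lt;code&gt;who&lt;/code&gt; are invented for illustration). Field selection follows the static type of the subject expression, so the cast picks the field of &lt;code&gt;S&lt;/code&gt;, whereas a method call on the same cast expression is still dispatched dynamically.
&lt;/p&gt;

&lt;pre&gt;
```java
public class FieldCastDemo {
    static class S {
        int x = 1;

        String who() {
            return "S";
        }
    }

    static class C extends S {
        int x = 2; // hides S.x

        @Override
        String who() {
            return "C";
        }

        String report() {
            // Fields are selected by the compile-time type of the subject expression,
            // methods by the run-time class of the object.
            return "super.x = " + super.x
                + ", ((S) this).x = " + ((S) this).x
                + ", x = " + x
                + ", ((S) this).who() = " + ((S) this).who();
        }
    }

    static String report() {
        return new C().report();
    }

    public static void main(String[] args) {
        // prints: super.x = 1, ((S) this).x = 1, x = 2, ((S) this).who() = C
        System.out.println(report());
    }
}
```
&lt;/pre&gt;

&lt;p&gt;
Both classes live in the same package here, so accessibility plays no role in this sketch.
&lt;/p&gt;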

&lt;h2&gt;Oops&lt;/h2&gt;

&lt;p&gt;
Of course, it is always nice to have your type checker as compact as possible, so I was very happy with this specification. I could just forward everything related to super field accesses to the corresponding expression with a cast and a &lt;code&gt;this&lt;/code&gt; expression. The Dryad typing rules looked something like this:
&lt;/p&gt;

&lt;pre&gt;
  attributes:
    |[ super.x ]| -&gt; &amp;lt;attributes&gt; |[ ((reft) this).x ]|
    where
      &lt;em&gt;reft is the superclass of the current class&lt;/em&gt;

  attributes:
    |[ cname.super.x ]| -&gt;  &amp;lt;attributes&gt; |[ ((reft) cname.this).x ]|
    where
      &lt;em&gt;reft is the superclass of the class cname&lt;/em&gt;
&lt;/pre&gt;

&lt;p&gt;
This implementation looks very attractive, but ... it didn't work. The reason is that &lt;code&gt;super.name&lt;/code&gt; is in fact &lt;em&gt;not&lt;/em&gt; exactly the same as &lt;code&gt;((S)this).name&lt;/code&gt;. The cause lies in the details of protected access, which I've &lt;a href="http://mbravenboer.blogspot.com/2006/04/on-details-of-protected-access-in-java.html"&gt;previously&lt;/a&gt; written about on my blog. I'm not going to redo that, so let me just give an example (based on the example in the previous post) where this assumed equivalence is invalid. First, the following two classes are valid and can be compiled without any problems:
&lt;/p&gt;

&lt;pre&gt;
  package a;
  public class A {
    protected int secret;
  }

  package b;
  public class B extends a.A {
    void f() {
      super.secret = 5;
    }
  }
&lt;/pre&gt;

&lt;p&gt;
Next, let's now change the assignment to the expression &lt;code&gt;((a.A) this).secret&lt;/code&gt;, which is equivalent to &lt;code&gt;super.secret&lt;/code&gt; according to the specification.
&lt;/p&gt;

&lt;pre&gt;
  package b;

  public class B extends a.A {
    void f() {
      ((a.A) this).secret = 5;
    }
  }
&lt;/pre&gt;

&lt;p&gt;
Unfortunately, this won't compile, due to the details of protected access:
&lt;/p&gt;

&lt;pre&gt;
b/B.java:5: secret has protected access in a.A
    ((a.A) this).secret = 5;
    ^
1 error
&lt;/pre&gt;

&lt;p&gt;
This example shows that the two expressions are not treated in the same way, so this looks like a problem in the Java Language Specification to me. It also shows that in the semantics of languages like Java the devil is really in the detail. What surprises me is that nobody has mentioned this before. Several major Java compilers have been implemented, right? Shouldn't the programmers responsible for these compilers have encountered this problem?
&lt;/p&gt;
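
&lt;p&gt;
To make the rule behind this error a bit more tangible, here is a rough model of the extra condition for protected instance fields accessed from a different package. It is a sketch of my own, not how javac implements it: the method &lt;code&gt;mayAccess&lt;/code&gt; and the stand-in classes &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; are hypothetical, and the sketch assumes the accessing class is in another package than the declaring class. The access is allowed only if the accessing class is a subclass of the declaring class and the compile-time type of the subject expression is the accessing class or one of its subclasses.
&lt;/p&gt;

&lt;pre&gt;
```java
public class ProtectedAccessSketch {
    // Stand-ins for a.A and b.B; imagine them in different packages.
    static class A { }
    static class B extends A { }

    /**
     * Simplified model of the check for "e.f", where f is a protected instance
     * field declared in class declaring, the access occurs in class accessing
     * (assumed to be in another package), and e has compile-time type qualifierType.
     */
    static boolean mayAccess(Class declaring, Class accessing, Class qualifierType) {
        // The access must occur in a subclass of the declaring class ...
        if (!declaring.isAssignableFrom(accessing)) {
            return false;
        }
        // ... and the type of e must be the accessing class or a subclass of it.
        return accessing.isAssignableFrom(qualifierType);
    }

    public static void main(String[] args) {
        // ((a.A) this).secret inside B: the subject expression has type A
        System.out.println(mayAccess(A.class, B.class, A.class)); // false
        // this.secret inside B: the subject expression has type B
        System.out.println(mayAccess(A.class, B.class, B.class)); // true
    }
}
```
&lt;/pre&gt;

&lt;p&gt;
In this model the cast to &lt;code&gt;a.A&lt;/code&gt; is precisely what makes the access fail, which is why forwarding &lt;code&gt;super.secret&lt;/code&gt; to the cast expression cannot be right.
&lt;/p&gt;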

&lt;h2&gt;Java Virtual Machine&lt;/h2&gt;

&lt;p&gt;
Another interesting thing is how the Java Virtual Machine specification deals with this. There is no special bytecode operator for accessing fields of super classes: all field assignments are performed using the &lt;code&gt;putfield&lt;/code&gt; operator. Assuming that the source compiler would ignore the protected access problem, the two unequal examples I just gave would compile to exactly the same bytecode. So how can the JVM report an error about an illegal access for the expression &lt;code&gt;((a.A) this).secret&lt;/code&gt;? Well, it turns out that it doesn't.
&lt;/p&gt;

&lt;p&gt;
We can show this by first making &lt;code&gt;secret&lt;/code&gt; public and compiling &lt;code&gt;B&lt;/code&gt;, then making &lt;code&gt;secret&lt;/code&gt; protected and recompiling only &lt;code&gt;A&lt;/code&gt;. This works like a charm: doing this trick for the following example prints &lt;code&gt;secret = 5&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
  package a;
  public class A {
    protected int secret;
    public void print() {
      System.out.println("secret = " + secret);
    }
  }

  package b;
  public class B extends a.A {
    void f() {
      ((a.A) this).secret = 5;
    }
    public static void main(String[] ps) {
      B b = new B();
      b.f();
      b.print();
    }
  }
&lt;/pre&gt;

&lt;p&gt;
However, if bytecode allowed this in general, then the security vulnerability that was fixed with the details of protected access would only be fixed at the &lt;em&gt;source&lt;/em&gt; level. Obviously, that would be no protection at all: you can safely assume that attackers are capable of writing bytecode. So let's make the example a bit more adventurous by passing the subject expression to the &lt;code&gt;f&lt;/code&gt; method:


&lt;/p&gt;

&lt;pre&gt;
  package b;
  public class B extends a.A {
    void f(a.A a) {
      a.secret = 5;
    }
    public static void main(String[] ps) {
      B b = new B();
      b.f(b);
      b.print();
    }
  }
&lt;/pre&gt;

&lt;p&gt;
This time, the verifier reports an error:
&lt;/p&gt;

&lt;pre&gt;
  Exception in thread "main" java.lang.VerifyError:
    (class: b/B, method: f signature: (La/A;)V)
    Bad access to protected data
&lt;/pre&gt;

&lt;p&gt;
This error report is correct, so apparently the verifier does check for illegal protected access. In the first case, it was just a bit more liberal than the source language. The question is, how is this specified in the Java Virtual Machine specification? My first impression was that there might be some special handling of accesses to &lt;code&gt;this&lt;/code&gt;. However, this would require the verifier to trace which local variables might have the value of &lt;code&gt;this&lt;/code&gt;, which is rather unlikely. Then, Dick Eimers (who did lots of scary bytecode stuff for his master's thesis) pointed me to a paper that covers exactly this subject: &lt;a href="http://www.jot.fm/issues/issue_2005_10/article3"&gt;Checking Access to Protected Members in the Java Virtual Machine&lt;/a&gt; by &lt;a href="http://www.kestrel.edu/home/people/coglio/"&gt;Alessandro Coglio&lt;/a&gt;. Strangely enough, this paper is not cited anywhere, while I think that the discussion of this issue is pretty good.
&lt;/p&gt;

&lt;p&gt;
It turns out that the difference in accessibility between super field accesses and ordinary field accesses is handled &lt;em&gt;implicitly&lt;/em&gt;, thanks to the type inferencer used by the Java Virtual Machine. The inferred type of the operand of the field access will be more specific than the type in the corresponding source code, which makes the access to the protected field valid in bytecode. I don't think that this implicit handling of the observed difference is a very good idea.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-3882147617116565486?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/3882147617116565486/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=3882147617116565486' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3882147617116565486'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3882147617116565486'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/02/informal-specifications-are-not-so.html' title='Informal specifications are not so super'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-6325466175191844697</id><published>2007-02-06T20:46:00.001+01:00</published><updated>2007-02-19T23:28:24.905+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stringborg'/><title type='text'>Our take on injection attacks</title><content type='html'>&lt;p&gt;
If you haven't been hiding under some really impressive rock for the last decade, then you probably know that injection attacks are a major issue in web applications. The problem of &lt;a href="http://en.wikipedia.org/wiki/SQL_injection"&gt;SQL injection&lt;/a&gt; is well-known, but you see similar issues &lt;em&gt;everywhere&lt;/em&gt;: &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=sql+injection&amp;amp;btnG=Search"&gt;SQL&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=shell+injection&amp;amp;btnG=Search"&gt;Shell&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=xml+injection&amp;amp;btnG=Search"&gt;XML&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=html+injection&amp;amp;btnG=Search"&gt;HTML&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=ldap+injection&amp;amp;btnG=Search"&gt;LDAP search filters&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=xpath+injection&amp;amp;btnG=Search"&gt;XPath&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=xquery+injection&amp;amp;btnG=Search"&gt;XQuery&lt;/a&gt;, and a whole series of enterprisey query languages, such as &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=hql+injection&amp;amp;btnG=Search"&gt;HQL&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=jdoql+injection&amp;amp;btnG=Search"&gt;JDOQL&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=ejbql+injection&amp;amp;btnG=Search"&gt;EJBQL&lt;/a&gt;, &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=oql+injection&amp;amp;btnG=Search"&gt;OQL&lt;/a&gt; are all potential candidates for injection attacks. Just search for any of these languages together with the term injection and observe the horror. 
Recently, it has also become more popular to mix a program written in &lt;a href="http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/"&gt;Java with scripts&lt;/a&gt;, usually something like JavaScript, Ruby or Groovy. If you include user input in the script, then this is yet another vector of attack.
&lt;/p&gt;

&lt;h2&gt;Solutions?&lt;/h2&gt;

&lt;p&gt;
Of course it is possible to just advise programmers to properly escape all user inputs, which prevents most injection attacks. However, that's like telling people to do their own &lt;a href="http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)"&gt;memory management&lt;/a&gt; or to do the dishes every day (which is a particular problem I have). In other words: you won't get it right.
    &lt;/p&gt;

&lt;p&gt;
Most of the research on injection attacks has focused on finding injection problems in existing source code using static and/or runtime analysis. Usually, this results in tools that check for injection attacks for specific languages (e.g. SQL) in specific host languages (e.g. PHP). This is very important and useful work, since it can easily be applied to detect or prevent injection attacks in existing code bases. However, at some point we just fundamentally need to reconsider the way we program. Why just fight the symptoms if you can fix the problem?
&lt;/p&gt;

&lt;p&gt;
So that's what we've done in our latest work, called &lt;a href="http://www.stringborg.org"&gt;StringBorg&lt;/a&gt;. I'm not going to claim that all your injection problems will be over tomorrow, but at least I think that what we propose here gives us some perspective on solving these issues once and for all in a few years. The solution we propose is to use syntax embeddings of the &lt;em&gt;guest&lt;/em&gt; languages (SQL, LDAP, Shell, XPath, JavaScript) in the &lt;em&gt;host&lt;/em&gt; language (PHP, Java) and let the system do all the proper &lt;em&gt;escaping&lt;/em&gt; and &lt;em&gt;positive checking&lt;/em&gt; of user input.
&lt;/p&gt;

&lt;h2&gt;Examples&lt;/h2&gt;

&lt;p&gt;
The paper I'll mention later explains all the technical details, and I cannot redo that in a better way in a blog, so I'll just give a bunch of examples that illustrate how it works.
&lt;/p&gt;

&lt;h4&gt;SQL&lt;/h4&gt;

&lt;p&gt;
The first example is an embedding of SQL in Java. This example illustrates how you can insert strings in SQL queries and compose SQL queries at runtime.  The first code fragment is the classic, vulnerable, way of composing SQL queries using string concatenation.
&lt;/p&gt;

&lt;pre&gt;
  String s = "'; DROP TABLE Users; --";
  String e = "username = \'" + s + "\'";
  String q = "SELECT password FROM Users WHERE " + e;
  System.out.println(q);
&lt;/pre&gt;

&lt;p&gt;
Clearly, if the string &lt;code&gt;s&lt;/code&gt; was provided by the user, then this would result in an injection attack: the final query is &lt;code&gt;SELECT password FROM Users WHERE username = ''; DROP TABLE Users; --'&lt;/code&gt;. Bad luck, the &lt;code&gt;Users&lt;/code&gt; table is gone! (or maybe you can thank your database administrator).
&lt;/p&gt;

&lt;p&gt;
With StringBorg, you can introduce some kind of literal syntax for SQL. The SQL code is written between the quotation symbols &lt;code&gt;&amp;lt;|...|&gt;&lt;/code&gt;. SQL code or strings can be inserted in another SQL query using the syntax &lt;code&gt;${...}&lt;/code&gt;. The example would be written in StringBorg as:
&lt;/p&gt;

&lt;pre&gt;
  String s = "'; DROP TABLE Users; --";
  SQL e = &amp;lt;| username = ${s} |&gt;;
  SQL q = &amp;lt;| SELECT password FROM Users WHERE ${e} |&gt;;
  System.out.println(q.toString());
&lt;/pre&gt;

&lt;p&gt;
This will result in the correct query, &lt;code&gt;SELECT password FROM Users WHERE username = '''; DROP TABLE Users; --'&lt;/code&gt;, where the single quotes have been escaped by StringBorg according to the rules of the SQL standard (the exact escaping rules depend on the SQL dialect). Not only does the StringBorg solution solve the injection problem, it is also much prettier! This example also shows that it is not required to know the full SQL query at compile-time: for example, the actual condition &lt;code&gt;e&lt;/code&gt; could be different for two branches of an &lt;code&gt;if&lt;/code&gt; statement, or could even be constructed in a &lt;code&gt;while&lt;/code&gt; statement.
&lt;/p&gt;
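
&lt;p&gt;
To give an idea of the escaping step for SQL string literals, here is a plain Java sketch. This is not StringBorg's implementation: the helpers &lt;code&gt;quoteSqlString&lt;/code&gt; and &lt;code&gt;usernameQuery&lt;/code&gt; are hypothetical, and the sketch assumes a dialect where a single quote inside a string literal is escaped by doubling it, as in the SQL standard.
&lt;/p&gt;

&lt;pre&gt;
```java
public class SqlQuoteSketch {
    /** Turns an arbitrary string into a SQL string literal by doubling single quotes. */
    static String quoteSqlString(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the query from the example, with the user input inserted as a literal. */
    static String usernameQuery(String username) {
        return "SELECT password FROM Users WHERE username = " + quoteSqlString(username);
    }

    public static void main(String[] args) {
        // prints: SELECT password FROM Users WHERE username = '''; DROP TABLE Users; --'
        System.out.println(usernameQuery("'; DROP TABLE Users; --"));
    }
}
```
&lt;/pre&gt;

&lt;p&gt;
Dialects that also treat a backslash as an escape character inside string literals would need additional rules.
&lt;/p&gt;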

&lt;p&gt;
The nice thing about StringBorg is that the SQL support is not restricted to a specific language, in this case Java. For PHP, you can do exactly the same thing:
&lt;/p&gt;

&lt;pre&gt;
  $s = "'; DROP TABLE Users; --";
  $e = &amp;lt;| username = ${$s} |&gt;;
  $q = &amp;lt;| SELECT password FROM Users WHERE ${$e} |&gt;;
  echo $q-&gt;toString(), "\n";
&lt;/pre&gt;

&lt;h4&gt;LDAP&lt;/h4&gt;

&lt;p&gt;
Using user input in LDAP search filters has very similar injection problems. First a basic example, where there is no problem with the user input:
&lt;/p&gt;

&lt;pre&gt;
  String name = "Babs Jensen";
  LDAP q = (| (cn=$(name)) |);
  System.out.println(q.toString());
&lt;/pre&gt;

&lt;p&gt;
The resulting LDAP filter will be &lt;code&gt;(cn=Babs Jensen)&lt;/code&gt;, which is what you would expect. If the string has the value &lt;code&gt;Babs (Jensen)&lt;/code&gt;, then the parentheses need to be escaped. Indeed, StringBorg will produce the filter &lt;code&gt;(cn=Babs \28Jensen\29)&lt;/code&gt;. This input might have been an accident, but of course we can easily change this into a real injection attempt by using the string &lt;code&gt;*&lt;/code&gt;. Again, StringBorg will properly escape this, resulting in the query &lt;code&gt;(cn=\2a)&lt;/code&gt;.
&lt;/p&gt;
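
&lt;p&gt;
The escaping used in these LDAP examples follows the usual rules for search filter values, where special characters are replaced by a backslash and their two-digit hexadecimal code. A minimal Java sketch could look as follows (again not the StringBorg code: &lt;code&gt;escapeFilterValue&lt;/code&gt; and &lt;code&gt;cnFilter&lt;/code&gt; are names I made up).
&lt;/p&gt;

&lt;pre&gt;
```java
public class LdapFilterEscapeSketch {
    /** Escapes the characters that have a special meaning in an LDAP search filter value. */
    static String escapeFilterValue(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\5c"); break; // the escape character itself
                case '*':  out.append("\\2a"); break; // wildcard
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break; // NUL
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the filter from the example with the name inserted as an escaped value. */
    static String cnFilter(String name) {
        return "(cn=" + escapeFilterValue(name) + ")";
    }

    public static void main(String[] args) {
        System.out.println(cnFilter("Babs Jensen"));   // (cn=Babs Jensen)
        System.out.println(cnFilter("Babs (Jensen)")); // (cn=Babs \28Jensen\29)
        System.out.println(cnFilter("*"));             // (cn=\2a)
    }
}
```
&lt;/pre&gt;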

&lt;h4&gt;Shell&lt;/h4&gt;

&lt;p&gt;
Programs that invoke shell commands can be vulnerable to injection attacks as well (as the TWiki developers and users &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=twiki+shell+injection&amp;amp;btnG=Search"&gt;have learned the hard way&lt;/a&gt;). Similar to the other examples, StringBorg introduces a syntax to construct shell commands and escape strings:
&lt;/p&gt;

&lt;pre&gt;
  Shell cmd = &amp;lt;| /bin/echo svn cat http://x -r &amp;lt;| s |&gt; |&gt;;
  System.out.println(cmd.toString());
&lt;/pre&gt;

&lt;p&gt;
  If &lt;code&gt;s&lt;/code&gt; has the values &lt;code&gt;bravo&lt;/code&gt;, &lt;code&gt;foo
  bar&lt;/code&gt;, &lt;code&gt;*&lt;/code&gt; and &lt;code&gt;; echo pwn3d!&lt;/code&gt;
  respectively, then the resulting commands are:
&lt;/p&gt;

&lt;pre&gt;
  /bin/echo svn cat http://x -r bravo
  /bin/echo svn cat http://x -r foo\ bar
  /bin/echo svn cat http://x -r \*
  /bin/echo svn cat http://x -r \;\ echo\ pwn3d\!
&lt;/pre&gt;
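
&lt;p&gt;
The commands above are consistent with a simple strategy: put a backslash in front of every character that is not known to be harmless. Here is a sketch of that strategy in plain Java. It is my own approximation and not taken from StringBorg; the set of safe characters and the names &lt;code&gt;escapeArgument&lt;/code&gt; and &lt;code&gt;svnCat&lt;/code&gt; are assumptions for illustration.
&lt;/p&gt;

&lt;pre&gt;
```java
public class ShellEscapeSketch {
    // Characters assumed to be harmless in an unquoted shell word.
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-./:";

    /** Prefixes every character outside the safe set with a backslash. */
    static String escapeArgument(String arg) {
        StringBuilder out = new StringBuilder();
        for (char c : arg.toCharArray()) {
            if (SAFE.indexOf(c) == -1) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Builds the command from the example with the revision as an escaped argument. */
    static String svnCat(String revision) {
        return "/bin/echo svn cat http://x -r " + escapeArgument(revision);
    }

    public static void main(String[] args) {
        System.out.println(svnCat("bravo"));
        System.out.println(svnCat("foo bar"));
        System.out.println(svnCat("*"));
        System.out.println(svnCat("; echo pwn3d!"));
    }
}
```
&lt;/pre&gt;

&lt;p&gt;
This sketch ignores two cases that a real implementation has to handle: the empty string, which needs quoting to survive as an argument, and newlines, because a backslash followed by a newline is a line continuation in the shell.
&lt;/p&gt;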

&lt;h4&gt;JavaScript&lt;/h4&gt;

&lt;p&gt;
Not only does StringBorg prevent injection attacks, it also makes composing SQL, XQuery, JavaScript, etc. more attractive: you don't have to concatenate all these nasty strings anymore. For example, the following example, taken from an article on the new Java scripting support, is just plain ugly:
&lt;/p&gt;

&lt;pre&gt;
  jsEngine.eval(
    "function printNames1(namesList) {" +
    "  var x;" +
    "  var names = namesList.toArray();" +
    "  for(x in names) {" +
    "    println(names[x]);" +
    "  }" +
    "}" +

    "function addName(namesList, name) {" +
    "  namesList.add(name);" +
    "}"
  );
&lt;/pre&gt;

&lt;p&gt;
whereas this looks quite reasonable:
&lt;/p&gt;

&lt;pre&gt;
  jsEngine.eval(|[
    function printNames1(namesList) {
      var x;
      var names = namesList.toArray();
      for(x in names) {
        println(names[x]);
      }
    }

    function addName(namesList, name) {
      namesList.add(name);
    }
  ]| );
&lt;/pre&gt;

&lt;p&gt;
Of course, this would be easy to fix by introducing multi-line string literals in Java, but in addition to the nicer syntax, you get protection against injection attacks and compile-time syntactic checking of the code for free!
&lt;/p&gt;

&lt;h2&gt;Generic, generic, generic&lt;/h2&gt;

&lt;p&gt;
Now, if you are familiar with our work, then this solution won't really surprise you, since we have been working on syntax embeddings for some time now (although in different application areas, such as meta programming). However, this work is quite a fundamental step towards making these syntax embeddings easier to use by ordinary programmers. First, the system now supports ambiguities, which was always the weak point of our code generation work: if you don't support ambiguities, then the programmer needs to be familiar with the details of the grammar of the guest language, which you really don't want. Fortunately, this is now a technical detail that you can forget about! Second, &lt;em&gt;no meta-programming&lt;/em&gt; is required at all to add a new guest language (e.g. XPath) to the system. All you need to do is define the syntax of the language, define the syntax of the embedding, and optionally define escaping rules for strings, and you're all set. Thus, compared to our previous work on &lt;a href="http://www.stratego-language.org/Stratego/ConcreteSyntaxForObjects"&gt;MetaBorg (OOPSLA '04)&lt;/a&gt;, there is no need for implementing the mapping from the syntax of the guest language to code in the host language.
&lt;/p&gt;

&lt;p&gt;
This is a pretty amazing property: basically, this means that you can just use &lt;em&gt;languages&lt;/em&gt; as &lt;em&gt;libraries&lt;/em&gt;. You can just pick the languages you want to use in a source file and that's it! No difficult meta-programming stuff, no program transformation, no rewrite rules and strategies, no limitations. In fact, this even goes beyond libraries: libraries are always language specific (for example for Java or PHP), but the implementation of support for a guest language (e.g. SQL) is &lt;em&gt;language independent&lt;/em&gt;. This means that if some person or company implements support for a guest language (e.g. SQL) then &lt;em&gt;all&lt;/em&gt; host languages (Java, PHP, etc) are immediately supported.
&lt;/p&gt;

&lt;h2&gt;Future?&lt;/h2&gt;

&lt;p&gt;
The paper we wrote about this is titled &lt;em&gt;"Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach"&lt;/em&gt; and is now available as &lt;a href="http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2007-003.pdf"&gt;technical report&lt;/a&gt;. Last week, we submitted this paper to the &lt;a href="http://www.usenix.org/events/sec07/index.html"&gt;USENIX Security Symposium&lt;/a&gt;. We won't know if the paper is accepted until April 4, but I would be flabbergasted if it got rejected ;) . Our prototype implementation, called &lt;a href="http://www.stringborg.org"&gt;StringBorg&lt;/a&gt;, is available as well. I'm looking forward to your feedback and opinions. I'll add some examples to the webpage later this week, so make sure to come back!
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://blogs.sun.com/ahe"&gt;Peter Ah&amp;eacute;&lt;/a&gt; already has a general solution for embedding foreign languages on his &lt;a href="http://blogs.sun.com/ahe/entry/java_se_7_wish_list"&gt;wish list&lt;/a&gt; (as opposed to an XML specific solution), so could this actually be happening in the near future?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-6325466175191844697?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/6325466175191844697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=6325466175191844697' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/6325466175191844697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/6325466175191844697'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/02/our-take-on-injection-attacks.html' title='Our take on injection attacks'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-3867944523659561926</id><published>2007-02-04T22:41:00.000+01:00</published><updated>2007-02-04T22:53:54.657+01:00</updated><title type='text'>Some random thoughts on the complexity of syntax</title><content type='html'>&lt;p&gt;
For some reason I was invited to join the &lt;a href="https://lists.csail.mit.edu/mailman/listinfo/jsr308"&gt;JSR-308 mailing list&lt;/a&gt;, which is a public mailing list discussing &lt;a href="http://jcp.org/en/jsr/detail?id=308"&gt;JSR-308&lt;/a&gt;. I've reported some problems with the Java grammar of the Java Language Specification in the past, so maybe I'm now on some list of people that might be interested. I'm not sure if I will contribute to the discussion, but at least lurking has been quite interesting. If you're not familiar with the proposal, JSR-308 has been started to allow annotations at more places in a Java program. The title of the JSR is "Annotations on Java Types", but the current discussion seems to interpret the goal of the JSR a bit more ambitiously, since there is a lot of talk going on about annotations of statements, and even expressions. I don't have a particularly strong opinion on this, but it's interesting to observe how the members of the list are approaching this change in the language.
    &lt;/p&gt;

    &lt;p&gt;
&lt;a href="http://www.gafter.com/~neal/"&gt;Neal Gafter&lt;/a&gt; seems to represent the realistic camp in the discussion (not to call his opinion conservative). Neal was once the main person responsible for the Java Compiler at Sun, so you can safely assume that he knows what he's talking about. Together with Joshua Bloch, he is now mainly responsible for the position of Google in these Java matters. Last week, he sent another interesting message to the list: &lt;a href="https://lists.csail.mit.edu/pipermail/jsr308/2007-February/000083.html"&gt;Can we agree on our goals?&lt;/a&gt;. As I mentioned, I don't have a very strong opinion on what the goal of the JSR should be, but Neal raised a point about syntax that reminded me again of some thoughts on syntax that have been lingering in my mind for some time now. Neal wrote:
    &lt;/p&gt;

    &lt;blockquote&gt;
&lt;em&gt;"I think the right way to design a language with a general annotation facility is to support (or at least consider supporting) a way of annotating every semantically meaningful nonterminal. Doing that requires a language design with a very simple syntax. Java isn't syntactically simple, and I don't think there is anything we can do it make it so. If we wanted to take this approach with Java we'd have to come up with a syntactic solution for every construct that we want to be annotatable. Given the shape of the Java grammar, that solution would probably be a different special case for every thing we might want to annotate."&lt;/em&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;
Whether you like it or not, this is a most valid concern. The interesting point about this annotation thing is that it is a language feature that applies in a completely different way to existing language constructs. Adding an expression, a statement, or some modifier to the language is not difficult to do, since this adds only an &lt;em&gt;alternative&lt;/em&gt; to the existing structure of the language. Annotations, on the other hand, do not add just another alternative, but crosscut (sorry, I couldn't avoid the term) the language. If you are an annotation guy, then you want to have them everywhere, since you essentially want to add information to arbitrary language constructs. Now this is quite a problem if you have a lot of language constructs: not alternative language constructs, but distinct &lt;em&gt;kinds&lt;/em&gt; of language constructs (of course known as nonterminals to grammar people). This would be trivial to do in a language where there are not that many language constructs, such as Lisp and Scheme, and even model-based languages.
    &lt;/p&gt;

    &lt;p&gt;
This makes you wonder what a good language syntax is. Should adding such a crosscutting language feature be easy? Conceptually, it is beyond any doubt attractive to have a limited number of language constructs, but on the other hand it is very convenient that Java has this natural syntax for things like modifiers, return types, formal parameters, formal type parameters, throws clauses, array initializers, multiple variable declarations on the same line, and so on. If you want to add annotations to all these different language constructs, then you basically have to &lt;em&gt;break&lt;/em&gt; their abstraction, which suddenly makes them look unnatural, since it becomes clear that a syntactical construct that used to be easy to read has some explicit semantic meaning. That is, the entire structure of the language is exposed in this way. It is no longer possible to &lt;em&gt;read&lt;/em&gt; a program, abstracting over all the details of the language. Also, for several locations it is very unclear to the reader what an annotation refers to. For example, the current &lt;a href="http://pag.csail.mit.edu/jsr308/java-annotation-design.html"&gt;draft specification&lt;/a&gt; states that
    &lt;/p&gt;

    &lt;blockquote&gt;
&lt;em&gt;
"There is no need for new syntax for annotations on return types, because Java already permits an annotation to appear before a method return type. Currently, such annotations are interpreted as on the method declaration — for example, the @Deprecated annotation indicates that the method is deprecated. The person who defines the annotation decides whether an annotation that appears before the return value applies to the method declaration or to the return type."
      &lt;/em&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;
Clearly, there is a problem in this case, since an annotation in the header of the method could refer to several things. The reason for this is the syntactical conciseness of the language for method declarations: you don't have to identify every part explicitly, hence if you want to annotate only some part, then you have a problem. Moving that decision to the declaration side of the annotation is not an attractive solution: for example, there will be annotations that are applicable to both declarations and types.
    &lt;/p&gt;
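
    &lt;p&gt;
To show what such a dual-purpose annotation looks like in practice, here is a sketch that relies on the &lt;code&gt;ElementType.TYPE_USE&lt;/code&gt; target that Java 8 eventually introduced, so it goes beyond the draft discussed here. The annotation &lt;code&gt;Tag&lt;/code&gt; and the class &lt;code&gt;Example&lt;/code&gt; are made up for illustration. Because &lt;code&gt;Tag&lt;/code&gt; is declared for both methods and type uses, a single occurrence in front of the return type is, as far as I can tell, attached to the method declaration as well as to the return type.
    &lt;/p&gt;

&lt;pre&gt;
```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class ReturnAnnotationDemo {
    // Hypothetical annotation that is applicable to method declarations and to type uses.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.TYPE_USE})
    @interface Tag { }

    static class Example {
        // Does Tag annotate the method, the return type String, or both?
        @Tag String name() {
            return "x";
        }
    }

    static String describe() throws NoSuchMethodException {
        Method m = Example.class.getDeclaredMethod("name");
        boolean onMethod = m.isAnnotationPresent(Tag.class);
        boolean onReturnType = m.getAnnotatedReturnType().isAnnotationPresent(Tag.class);
        return "method: " + onMethod + ", return type: " + onReturnType;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```
&lt;/pre&gt;

    &lt;p&gt;
On a Java 8 or later runtime I would expect both checks to report &lt;code&gt;true&lt;/code&gt;, which is exactly the situation where the reader cannot tell from the method header which of the two was intended.
    &lt;/p&gt;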

    &lt;p&gt;
This all brings us to the question of how to determine whether the syntax of a programming language is simple. Is that really just some subjective idea, or is it possible to determine this semi-automatically with more objective methods? I assume that the answer depends on the way the language is applied. For example, in program transformation it is rather inconvenient to have all kinds of optional clauses for a language construct. This reminds me of a &lt;a href="http://blog.nicksieger.com/articles/2006/10/27/visualization-of-rubys-grammar"&gt;post&lt;/a&gt; by Nick Sieger, who applied a visualization tool to some grammars. For some reason, this post was very popular and was discussed all over the web, including &lt;a href="http://lambda-the-ultimate.org/node/1849"&gt;Lambda the Ultimate&lt;/a&gt; and &lt;a href="http://lwn.net/Articles/206533/"&gt;LWN&lt;/a&gt;. However, most people seemed to agree that the visualizations did not tell much about the complexity of the languages. Indeed, the most visible aspects of the pictures are the &lt;em&gt;encodings&lt;/em&gt; of the actual grammar that had to be applied to make the grammar non-ambiguous or to fit in the used grammar formalism. For example, the encoding of precedence rules for expressions makes the graph look pretty, but conceptually this is just a single expression. As a first guess, I would expect that some balance between the number of nodes and edges would be a better measure: lots of edges to a single node mean that the nonterminal is allowed in a lot of places, which is probably good for the orthogonality of the language (more people have been claiming this in the discussion about these visualizations).
    &lt;/p&gt;

    &lt;p&gt;
But well, this makes you wonder if there has been any research on this. The only work I'm familiar with is &lt;a href="http://wiki.di.uminho.pt/wiki/bin/view/PURe/SdfMetz"&gt;SdfMetz&lt;/a&gt;, which is a metrics tool for &lt;a href="http://www.syntax-definition.org"&gt;SDF&lt;/a&gt; developed by &lt;a href="http://wiki.di.uminho.pt/twiki/bin/view/Joost"&gt;Joost Visser&lt;/a&gt; and &lt;a href="http://wiki.di.uminho.pt/wiki/bin/view/Main/TiagoAlves"&gt;Tiago Alves&lt;/a&gt;. SDF grammars are usually closer to the intended design of a language than LR or LL grammars, so if you are interested in the complexity of the syntax of a language, then using SDF grammars sounds like a good idea. SdfMetz supports quite an interesting list of metrics. Some are rather obvious (count productions etc), but there are also some more complex metrics. I'm quite sure that (combinations of) these metrics can give some indication of the complexity of a language. Unfortunately, the work on SdfMetz was not mentioned at all in the discussion of these visualizations. Why is it that a quick and dirty blogpost is discussed all over the web and solid research does not get mentioned? Clearly, the SdfMetz researchers should just post a few fancy pictures for achieving instant fame ;) . Back to the question what is a good syntax, they have mostly focused on the facts until now (see their paper &lt;a href="http://wiki.di.uminho.pt/wiki/pub/PURe/PurePublications/DI-PURe-05-05-01.pdf"&gt;Metrication of SDF Grammars&lt;/a&gt;), and have not done much work on &lt;em&gt;interpreting&lt;/em&gt; the metrics they have collected. It would be interesting if somebody would start doing this.
    &lt;/p&gt;

    &lt;p&gt;
Joost Visser and Tiago Alves will be presenting SdfMetz at &lt;a href="http://www.di.uminho.pt/ldta07/"&gt;LDTA 2007&lt;/a&gt;, the Workshop on Language Descriptions, Tools and Applications (program now available!). As I mentioned in my previous post, we will be presenting our work on precedence rule recovery and compatibility checking there as well. So, if you are in the neighbourhood (or maybe even visiting ETAPS) and you are interested, make sure to drop by!
    &lt;/p&gt;

    &lt;p&gt;
Another thing that finally seems to get some well-deserved attention is ambiguity analysis. It strikes me that the people on the JSR-308 list approach this rather informally, by just guessing what might be ambiguous or not. It should be much easier to play with a language and determine how to introduce a new feature in a non-ambiguous way. &lt;a href="http://www.i3s.unice.fr/~schmitz/"&gt;Sylvain Schmitz&lt;/a&gt; will be presenting a Bison-based ambiguity detection tool at LDTA, so that should be interesting to learn about. The paper is already online, but I haven't read it yet. Maybe I'll report on it later.
    &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-3867944523659561926?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/3867944523659561926/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=3867944523659561926' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3867944523659561926'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/3867944523659561926'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/02/some-random-thoughts-on-complexity-of.html' title='Some random thoughts on the complexity of syntax'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-4623110381714576397</id><published>2007-01-25T10:12:00.000+01:00</published><updated>2007-04-03T16:01:26.458+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='grammar engineering tools'/><title type='text'>Grammar Engineering, I'm loving it</title><content type='html'>&lt;p&gt;
I think that the most attractive things to work on as a researcher are the problems that you actually encounter yourself. Obviously, it is an option to try to encounter these problems by studying or writing code that you are not actually directly interested in, but it is much more fun if you can work on issues in code that you just write out of your own interest.
    &lt;/p&gt;

    &lt;p&gt;
      This is what we've done in our latest paper, titled "Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking". Most of our work involves syntax definitions, for example to provide general support for the implementation of program transformations and also for our research on embedding and composing languages (for various applications). One of the problems we encounter is that the conversion of a grammar from one grammar formalism to another is rather unreliable. For example, if you need to convert a grammar from YACC to SDF, then you basically have no idea if the two grammars are compatible. For programs in general, this is understandable, since imperative source code is very difficult to compare. But, if you have a more or less declarative specification of a grammar, how is it possible that you cannot compare them at all?
    &lt;/p&gt;

    &lt;p&gt;
      As a first step towards supporting grammar compatibility checking, we have implemented a tool that compares the precedence rules of grammars. A very simple example of a precedence rule is that &lt;code&gt;1 + 2 * 3&lt;/code&gt; should be parsed as &lt;code&gt;1 + (2 * 3)&lt;/code&gt;. The precedence rules of a grammar might look like a trivial property at first sight, but actually it is rather complex to understand as a human what the precedence rules of a YACC or SDF grammar are. This tool has already been most successful for comparing existing C grammars written in YACC and SDF and deriving the exact precedence rules of PHP, which has quite a bizarre expression language.
    &lt;/p&gt;
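
    &lt;p&gt;
      To make the notion of a precedence rule a bit more concrete, here is a minimal sketch in Java. It is not how our tool works and it has nothing to do with YACC or SDF: it is just a tiny hand-written precedence-climbing parser for single-digit operands with + and *, and the class name PrecedenceDemo and the numeric precedence levels are made up for this illustration. It returns the fully parenthesized form of an expression, so the precedence rule becomes visible.
    &lt;/p&gt;

```java
public class PrecedenceDemo {
    private final String input;
    private int pos = 0;

    PrecedenceDemo(String input) {
        // Whitespace is irrelevant for this sketch, so drop it up front.
        this.input = input.replace(" ", "");
    }

    // Binding strength of a binary operator; 0 means "not an operator".
    // These levels are an assumption of this sketch: * binds tighter than +.
    private static int precedence(char op) {
        switch (op) {
            case '+': return 1;
            case '*': return 2;
            default:  return 0;
        }
    }

    // Parses an expression in which every operator binds at least as tightly
    // as minPrec, and returns it as a fully parenthesized string.
    String parse(int minPrec) {
        String left = String.valueOf(input.charAt(pos++)); // single-digit operand
        while (pos != input.length()) {
            char op = input.charAt(pos);
            int prec = precedence(op);
            if (prec == 0 || minPrec > prec) {
                break; // operator binds too weakly for this level
            }
            pos++;
            // Left-associative: the right operand may only contain tighter operators.
            String right = parse(prec + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static String parenthesize(String expr) {
        return new PrecedenceDemo(expr).parse(1);
    }

    public static void main(String[] args) {
        System.out.println(parenthesize("1 + 2 * 3")); // (1 + (2 * 3))
        System.out.println(parenthesize("1 * 2 + 3")); // ((1 * 2) + 3)
    }
}
```

    &lt;p&gt;
      In this sketch the precedence rules are explicit in a small table. In a real YACC or SDF grammar they are spread over the productions and declarations, which is exactly why they are hard to read off by hand.
    &lt;/p&gt;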

    &lt;p&gt;
      The paper has been accepted for &lt;a href="http://www.di.uminho.pt/ldta07/"&gt;LDTA 2007&lt;/a&gt;, the Workshop on Language Descriptions, Tools and Applications, which is an excellent place for this subject. We will present our work at this workshop at the end of March. A draft version of the paper is available from the &lt;a href="http://martin.bravenboer.name/publications.html"&gt;publication list&lt;/a&gt; at my homepage. The implementation is available as part of the &lt;a href="http://www.stratego-language.org/Stratego/GrammarEngineeringTools"&gt;Stratego/XT Grammar Engineering Tools&lt;/a&gt;. The website includes a bunch of examples. In the future, we hope to provide more tools to assist with the maintenance, testing, conversion, and analysis of grammars. In fact, Stratego/XT itself already contains some interesting tools for this, most prominently the grammar unit-testing tool &lt;code&gt;parse-unit&lt;/code&gt;.
    &lt;/p&gt;

    &lt;p&gt;
      Now I think of it, it's probably a bad idea as a vegetarian to paraphrase a campaign by McDonald's ...
    &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-4623110381714576397?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/4623110381714576397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=4623110381714576397' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/4623110381714576397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/4623110381714576397'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2007/01/grammar-engineering-im-loving-it.html' title='Grammar Engineering, I&apos;m loving it'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-115436388271407790</id><published>2006-07-31T18:20:00.000+02:00</published><updated>2006-07-31T18:40:54.660+02:00</updated><title type='text'>Java-front versus Jackpot, APT, Eclipse JDT</title><content type='html'>&lt;p&gt;
Recently, there have been a few interesting developments in standard support for open compilers and program transformation. For example, Sun released the annotation processing tool (APT) as part of JDK 5, which opens up Sun's Java compiler a bit. Also, there is &lt;a href="http://jackpot.netbeans.org/"&gt;Jackpot&lt;/a&gt;, a plugin for NetBeans for transforming Java code. The obvious question is how this relates to work that has been done in research on open compilers and program transformation.
&lt;/p&gt;

&lt;p&gt;
Olivier Lefevre sent me an email to ask how the tree for Java provided by Jackpot and javac compares to our support for parsing and transforming Java in &lt;a href="http://www.stratego-language.org/Stratego/JavaFront"&gt;Java-front&lt;/a&gt;. The answer is probably useful in general, so I'll quote it here. Feel free to share your opinion in the comments!
&lt;/p&gt;

&lt;blockquote&gt;
As you may know, starting with Java 6 the Sun JDK will ship with an API to the AST: see &lt;a href="http://jackpot.netbeans.org/docs/org-netbeans-libs-javacapi/overview-summary.html"&gt;jackpot api&lt;/a&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Yes, Jackpot and APT are great projects. However, there is not yet a full API to the AST in the standard JDK, afaik. The compiler will be more 'open' in two different ways.
&lt;/p&gt;

&lt;p&gt;
First, the current annotation processing tool (APT) is going to be combined with javac, but APT only provides access to the global structure of a Java source file and does not include the statement and expression level. Also, this API does not allow modification of the Java representation. APT is read-only: you can only generate new code.
&lt;/p&gt;

&lt;p&gt;
Second, there is Jackpot, which is a rule-based language for transforming Java code. For Jackpot, the representation of Java used by javac has been opened and cleaned up a bit to make it more usable in external tools. However, this representation is not standardized and Sun recommends not to use stuff from com.sun.*. Afaik, Jackpot will be shipped as part of NetBeans and not as part of the JDK.
&lt;/p&gt;

&lt;blockquote&gt;How does this compare to Java-front?&lt;/blockquote&gt;

&lt;p&gt;
That's a good question. The answer depends on the application.
&lt;/p&gt;

&lt;p&gt;
If you just need an AST for Java, then the advantage of the com.sun.source.tree AST is that you are absolutely sure that the AST conforms to javac, since the implementation is exactly the same. Of course, the same holds for ecj and the AST of Java that is provided by Eclipse (org.eclipse.jdt.core.dom.*). However, the grammar provided by Java-front is very good, so I don't expect any parsing problems. It has been tested and used heavily in the last few years and the development of this grammar has even resulted in a number of fixes in the JLS.
&lt;/p&gt;

&lt;p&gt;
An advantage of Java-front is that it is a bit more language independent. Obviously, the Eclipse and Javac ASTs are to be used in Java. If you want to implement a transformation of Java in a different language, then you have to write an exporter. Java-front outputs ASTs in a language independent exchange format (ATerms), which can also be converted to XML. Of course, Java-front is most useful if you combine it with a language that is designed for program transformation and operates on ATerms, such as Stratego. One of the biggest advantages of Stratego is that it is very easy to do traversals over the AST: no tiresome visitors.
&lt;/p&gt;

&lt;p&gt;
If you need more information about Java than can be defined in a context-free grammar, then you need more than just a parser. For more complex transformations (which includes simple refactorings), you'll probably need an implementation of disambiguation (reclassification) and qualification of names. A simple expression like System.out.println is already highly ambiguous without an analysis: is System a variable? a class? a package? Is out an inner class? a field? Most likely, you'll need type information as well. Javac and Eclipse have the major advantage that you can safely assume that their type checkers are pretty good. For Jackpot, I suppose that there is some way to get type information (since type information can be used in Jackpot), but from a quick scan I cannot figure out how to do this from the public
API. For Java-front, there is an extension (&lt;a href="http://www.stratego-language.org/Stratego/TheDryad"&gt;Dryad&lt;/a&gt;) that supports type-checking and disambiguation, but this work is not yet complete. Using an existing compiler is of course a safer alternative. For experiments, the stuff provided by Dryad should be ok (we use it in our course on program transformation).
&lt;/p&gt;

&lt;p&gt;
A different application is the implementation of Java language extensions. Javac and ECJ do not support this. The Java representation is open, but not extensible. Java-front uses a modular syntax definition formalism (SDF) that allows you to extend the grammar of Java in an almost trivial way. The strength of this approach is illustrated by the embedding of the Java syntax in Stratego (&lt;a href="http://www.stratego-language.org/Stratego/MetaProgrammingWithConcreteObjectSyntax"&gt;GPCE '02&lt;/a&gt;) and Java (&lt;a href="http://www.cs.uu.nl/wiki/Visser/GeneralizedTypeBasedDisambiguationOfMetaProgramsWithConcreteObjectSyntax"&gt;GPCE '05&lt;/a&gt;), the applications of the grammar in &lt;a href="http://www.stratego-language.org/Stratego/ConcreteSyntaxForObjects"&gt;MetaBorg&lt;/a&gt; (OOPSLA '04), and the modular extension of the grammar for the definition of the &lt;a href="http://www.stratego-language.org/Stratego/AspectJFront"&gt;AspectJ syntax&lt;/a&gt; (OOPSLA '06). Of course, these applications are not really interesting if you are just interested in a Java program transformation tool, but it illustrates the reusability of such a syntax definition (as opposed to the grammars used by ecj, javac and most other parser generators). You'll need tools for pretty-printing as well. Outside of Eclipse, pretty-printing the JDT Core DOM is troublesome and mostly useful for debugging the output only. Inside Eclipse, the support for pretty-printing and preserving the layout of a program is of course excellent (see the existing implementations of refactoring). Jackpot provides a pretty-printer as well, but I don't know if it can be used outside NetBeans. Java-front provides the tool pp-java, which has been heavily tested and can insert parentheses in exactly the right places.
&lt;/p&gt;

&lt;blockquote&gt;I am interesting in implementing small refactorings.&lt;/blockquote&gt;

&lt;p&gt;
If you want to implement solid refactorings that could eventually even be deployed, then I would suggest using an existing framework for refactoring, since there is much more to do than just getting an AST. A few years ago, I implemented an extract method refactoring in JRefactory, which was quite a useful experience. I suppose it's a bit obsolete now, since the refactoring market is dominated by refactorings directly supported by IDEs. You could consider Eclipse or NetBeans.
&lt;/p&gt;

&lt;p&gt;
If your objective is to play a bit with program transformations and maybe even be a bit more adventurous by using real program transformation languages, then it might be nice to use Java-front and Stratego. Using Stratego is a major advantage over the tiresome implementation of traversals in Java (and most other languages).
&lt;/p&gt;

&lt;p&gt;
Hope this helps :) Feel free to ask more questions if anything is unclear :)
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-115436388271407790?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/115436388271407790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=115436388271407790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/115436388271407790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/115436388271407790'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2006/07/java-front-versus-jackpot-apt-eclipse.html' title='Java-front versus Jackpot, APT, Eclipse JDT'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-114540064937940124</id><published>2006-04-19T00:49:00.000+02:00</published><updated>2006-04-19T12:32:53.130+02:00</updated><title type='text'>On the Details of Protected Access in Java</title><content type='html'>&lt;p&gt;
    When I was working on my master thesis I spent most of my time in
    the Software Technology Lab of our department. This was really a
    gorgeous place to work, mostly due to a group of great fellow
    students. After finishing our master study, we all left this lab a
    few years ago and have a job now. One of the students I worked
    with first went to a research institute to work on maritime
    simulations and is now moving to a company that produces Real
    Software.
  &lt;/p&gt;

  &lt;p&gt;
    Of course, the problem is that universities do a terrible job at
    teaching students how to write Real Software, so the first thing
    these companies do is give their employees some proper
    education. Real Software is written in Java, so the first step is
    to become a &lt;a href="http://www.sun.com/training/certification/java/java_progj2se.html"&gt;Sun
    Certified Programmer for the Java 2 Platform&lt;/a&gt;, aka SCJP.
  &lt;/p&gt;

  &lt;p&gt;
    This guy is now going through the SCJP materials, which cover
    Java at a surprising level of detail. Universities do at least one
    thing right: they stimulate you to think about the things you
    learn, instead of just accepting it all as-is. As a result, we
    have had some nice discussions about the details of Java, and he
    has only just started ;) .
  &lt;/p&gt;

  &lt;p&gt;
    Last week, we started talking about the details of protected
    access in Java. The study material seems to mention the detailed
    rules, but does not explain why the rules are the way they
    are. Strangely enough, I could not find any good resource that does
    this, and I couldn't remember where I learned about this, so I
    decided to explain it myself.
  &lt;/p&gt;

  &lt;h2&gt;Protected Access&lt;/h2&gt;

  &lt;p&gt;
    The Java Language Specification has two subsections that define
    the rules for accessibility. The general rules are in subsection
    &lt;a href="http://java.sun.com/docs/books/jls/second_edition/html/names.doc.html#102765"&gt;6.6.1:
    Determining Accessibility&lt;/a&gt; and some of the more complicated
    rules for protected access are in subsection &lt;a href="http://java.sun.com/docs/books/jls/second_edition/html/names.doc.html#62587"&gt;6.6.2:
    Details on Protected Access&lt;/a&gt;. The rules of the first subsection
    are rather straightforward and don't need much explanation.
  &lt;/p&gt;
  &lt;p&gt;
    For protected members and constructors this subsection defines
    that the members are accessible if the access occurs from within
    the same package. This rule is clear, although many people seem to
    be surprised by this. However, the second case refers to
    subsection 6.6.2, which is what most of the confusion is about. This
    subsection defines the additional accessibility rules for
    protected access, which most people know informally as
    &amp;ldquo;a protected member is accessible from subclasses&amp;rdquo;. A
    simple example where this accessibility rule is applied:
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
package a;

public class A {
  protected int secret;
}
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
package b;

public class B extends a.A {
  void f() {
    secret = 5;
  }
}
&lt;/pre&gt;
  &lt;/blockquote&gt;

  &lt;p&gt;
    However, the rules that define accessibility of protected members
    from subclasses are a bit more complex than you might expect. The
    problem is that the rule you know informally, is not that clear
    anymore if the access of the protected member is qualified
    (i.e. is applied to an object, not implicitly to
    &lt;code&gt;this&lt;/code&gt;). Consider this simple example:
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
package a;

public class A {
  protected int secret;
}
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
package b;

public class B2 extends a.A {
  void f(a.A a) {
    a.secret = 5;
  }
}
&lt;/pre&gt;
  &lt;/blockquote&gt;

  &lt;p&gt;
    In this example, the access to the protected instance field secret of A
    occurs from a subclass B2 of A, so according to our informal idea
    of protected access, this should be allowed. However, this example
    should make you feel a bit uncomfortable. Indeed, this is not
    allowed in Java. Let's take a look at what the compilers say:
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
$ javac b/B2.java
b/B2.java:5: secret has protected access in a.A
    a.secret = 5;
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
$ jikes b/B2.java
Found 1 semantic error compiling "b/B2.java":

     5.     a.secret = 5;
              ^----^
&lt;/pre&gt;
&lt;code&gt;
*** Semantic Error: The instance field "secret" in class "A" has
protected access, but the qualifying expression is not of type "B2" or
any of its enclosing types.
&lt;/code&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
$ ecj b/B2.java
----------
1. ERROR in b/B2.java
 (at line 5)
        a.secret = 5;
        ^^^^^^^^
The field A.secret is not visible
----------
&lt;/pre&gt;
  &lt;/blockquote&gt;

  &lt;blockquote&gt;
    &lt;p style="font-style: italic; font-size: small;"&gt;
      Side note: I was a bit surprised by the error report of
      ecj. This could be a bug: the protected field &lt;em&gt;is&lt;/em&gt;
      visible but it is not &lt;em&gt;accessible&lt;/em&gt;. The error report of
      jikes is by far the best.
    &lt;/p&gt;
  &lt;/blockquote&gt;

  &lt;p&gt;
    Basically, if this access were allowed, then you could access
    any protected field of any class, by just making a subclass of the
    class that declares the protected field. Hence, you could never
    &lt;em&gt;really&lt;/em&gt; protect your protected fields if this kind of
    access were allowed. Consider this example:
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
package a;

public class A {
  protected int secret;
}
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
package b;

public final class MySecurityHazard extends a.A {
}
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;
package c;

public class C extends a.A {
  void f(b.MySecurityHazard b) {
    b.secret = 5;
  }
}
&lt;/pre&gt;
  &lt;/blockquote&gt;

  &lt;p&gt;
    In this example the class MySecurityHazard has deliberately been
    declared &lt;code&gt;final&lt;/code&gt; to prevent the sensitive
    fields of this class from being accessed through a subclass. However, according to our
    (now deprecated) informal knowledge of protected access, we can
    just create another subclass &lt;code&gt;C&lt;/code&gt; of &lt;code&gt;A&lt;/code&gt; that
    can be used to access the protected fields of
    &lt;code&gt;MySecurityHazard&lt;/code&gt;.
  &lt;/p&gt;

  &lt;p&gt;
    How can we define the protected access that we would like to have?
    Of course, qualified access to protected members could be
    forbidden completely (and maybe that would have been a good idea),
    but Java is a bit more flexible, without introducing security
    problems. The basic problem of the unwanted access is that you
    start a new inheritance branch and access protected fields from
    there. So, the qualified access to protected fields should be
    restricted to the same inheritance branch as the object to which
    it is applied. This is exactly what the details of protected
    access are about. They define that access from a class
    &lt;code&gt;S&lt;/code&gt; is permitted only if the type of the qualifier is
    &lt;code&gt;S&lt;/code&gt; or a subclass of &lt;code&gt;S&lt;/code&gt;. Let me now finally
    quote the specification:
  &lt;/p&gt;

  &lt;blockquote&gt;
    &lt;p&gt;
      Let &lt;i&gt;C&lt;/i&gt; be the class in which a &lt;code&gt;protected&lt;/code&gt;
      member m is declared. Access is permitted only within the body
      of a subclass &lt;i&gt;S&lt;/i&gt; of &lt;i&gt;C&lt;/i&gt;. In addition, if &lt;i&gt;Id&lt;/i&gt;
      denotes an instance field or instance method, then:
    &lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
 If the access is by a qualified name
 &lt;i&gt;Q&lt;/i&gt;&lt;code&gt;.&lt;/code&gt;&lt;i&gt;Id&lt;/i&gt;, where &lt;i&gt;Q&lt;/i&gt; is an
 &lt;em&gt;ExpressionName&lt;/em&gt;, then the access is permitted if and
 only if the type of the expression &lt;i&gt;Q&lt;/i&gt; is &lt;i&gt;S&lt;/i&gt; or a
 subclass of &lt;i&gt;S&lt;/i&gt;.
      &lt;/li&gt;
      &lt;li&gt;
 If the access is by a field access expression
 &lt;i&gt;E&lt;/i&gt;&lt;code&gt;.&lt;/code&gt;&lt;i&gt;Id&lt;/i&gt;, where &lt;i&gt;E&lt;/i&gt; is a
 &lt;em&gt;Primary&lt;/em&gt; expression, or by a method invocation
 expression
 &lt;i&gt;E&lt;/i&gt;&lt;code&gt;.&lt;/code&gt;&lt;i&gt;Id&lt;/i&gt;&lt;code&gt;(&lt;/code&gt;. . .&lt;code&gt;)&lt;/code&gt;,
 where &lt;i&gt;E&lt;/i&gt; is a &lt;em&gt;Primary&lt;/em&gt; expression, then the
 access is permitted if and only if the type of &lt;i&gt;E&lt;/i&gt; is
 &lt;i&gt;S&lt;/i&gt; or a subclass of &lt;i&gt;S&lt;/i&gt;.
      &lt;/li&gt;
    &lt;/ul&gt;
    &lt;p style="text-align: right; font-size: small; margin-bottom: 0pt;"&gt;
      See &lt;a href="http://java.sun.com/docs/books/jls/second_edition/html/names.doc.html#62587"&gt;JLS3,
      Section 6.6.2.1: Access to a protected Member&lt;/a&gt;
    &lt;/p&gt;
  &lt;/blockquote&gt;
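
  &lt;p&gt;
    To see how this rule sorts out the examples above, here is a
    minimal sketch that models the check in plain Java. It is only a
    model, not a piece of any compiler: the names ProtectedAccessModel,
    ClassInfo and permitted are invented for this illustration, classes
    are represented by a name and a superclass, and the sketch assumes
    that the access occurs outside the package of the declaring class
    (so the same-package rule of 6.6.1 does not apply) and that the
    member is an instance member accessed through a qualifier.
  &lt;/p&gt;

```java
public class ProtectedAccessModel {
    // A class in the modelled program: its name and direct superclass (null for a root).
    record ClassInfo(String name, ClassInfo superclass) { }

    // True if sub is sup itself or a (transitive) subclass of sup.
    static boolean isSameOrSubclass(ClassInfo sub, ClassInfo sup) {
        for (ClassInfo t = sub; t != null; t = t.superclass()) {
            if (t == sup) {
                return true;
            }
        }
        return false;
    }

    // Models a qualified access Q.Id to a protected instance member declared in c,
    // occurring in the body of class s, where the qualifier Q has static type q.
    // Assumption of this sketch: s lives in a different package than c.
    public static boolean permitted(ClassInfo c, ClassInfo s, ClassInfo q) {
        if (!isSameOrSubclass(s, c)) {
            return false; // access must occur within a subclass S of C
        }
        return isSameOrSubclass(q, s); // the type of Q must be S or a subclass of S
    }

    public static void main(String[] args) {
        ClassInfo a = new ClassInfo("a.A", null);
        ClassInfo b2 = new ClassInfo("b.B2", a);
        ClassInfo hazard = new ClassInfo("b.MySecurityHazard", a);
        ClassInfo c = new ClassInfo("c.C", a);

        System.out.println(permitted(a, b2, a));     // a.secret inside B2, qualifier of type A
        System.out.println(permitted(a, b2, b2));    // qualifier of type B2 inside B2
        System.out.println(permitted(a, c, hazard)); // b.secret inside C, qualifier of type MySecurityHazard
    }
}
```

  &lt;p&gt;
    Under these assumptions the model rejects the first and the third
    access and accepts the second one, which matches what the
    compilers reported for B2 and what we want for MySecurityHazard.
  &lt;/p&gt;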

  &lt;p&gt;
    Note that these more specific rules only apply to instance
    members, not to static ones. If you are using a protected static
    field or method, make sure that you understand what you're doing:
    anyone will be able to access the field or method by just creating
    another subclass of the class that declares the protected field or
    method. It would not make any sense to impose these additional
    rules on static members, since all subclasses actually share the
    static member! So, while protected access on instance members
    could be used for security reasons, protected access on static
    members is only useful for hiding the member by restricting its
    accessibility to a part of a program where it is relevant.
  &lt;/p&gt;

  &lt;p&gt;
    If you read the rules carefully, then they are not too
    unclear. The real source of confusion seems to be that there is no
    explanation of why these more complex rules are necessary. I hope this
    blog post has solved that.
  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-114540064937940124?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/114540064937940124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=114540064937940124' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/114540064937940124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/114540064937940124'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2006/04/on-details-of-protected-access-in-java.html' title='On the Details of Protected Access in Java'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-112301956666843956</id><published>2005-08-02T23:51:00.000+02:00</published><updated>2005-08-02T23:52:46.676+02:00</updated><title type='text'>Lifting Member Classes from Generic Classes</title><content type='html'>&lt;p&gt;
  I've been working on Java generics and member classes during the
  last few weeks. In particular, I had to find out how the additional
  information on Java generics is exactly represented in bytecode
  attributes of generic classes and methods (aka generic
  signatures). I was surprised by the way member classes of generic
  classes are compiled and I'm worried about the consequences of this
  for future updates of the JVM specification. That's what this entry
  is about.
&lt;/p&gt;

&lt;p&gt;
  First, something about the relation between member classes and
  lambda lifting. The Java language supports member classes, but Java
  bytecode does not. Therefore, Java compilers have to lift member
  classes to top-level classes, a transformation that is comparable to
  lambda lifting (see for example the paper &lt;a href="http://danae.uni-muenster.de/lehre/kuchen/JFLP/articles/2004/A2004-01/JFLP-A2004-01.pdf"&gt;Lambda-Lifting in Quadratic Time&lt;/a&gt;).
&lt;/p&gt;

&lt;p&gt;
  Member classes are compiled to ordinary top-level classes where the
  constructor takes an extra argument for the instance of the
  enclosing class. For example, the constructor of a member class Bar of
  class Foo will get an additional argument of type Foo for the
  enclosing instance of a Bar object. Constructors of local classes
  (classes declared in a method) also take additional arguments for
  the local variables that they use from their enclosing method. This
  process of lifting classes (that have lexical scope) is very similar
  to lifting nested functions in lambda lifting: all local variables
  that are used in the nested class become explicit arguments to make
  the nested class &lt;em&gt;scope insensitive&lt;/em&gt;. After that, the class
  can simply be lifted out of its original scope to the top-level. An
  essential property of the class (or function) after lifting is that
  the nested class (or function) no longer directly refers to
  variables of the original scope of the nested class or function.
&lt;/p&gt;
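
&lt;p&gt;
  To make this concrete, here is a hand-written sketch of what lifting
  a local class could look like at the source level. This is not
  actual compiler output: the class names and the fields _enclosing
  and _step are made up for this illustration.
&lt;/p&gt;

&lt;pre&gt;
// Before lifting: Printer uses the local variable step of make.
class Counter {
  Runnable make(final int step) {
    class Printer implements Runnable {
      public void run() { System.out.println(step); }
    }
    return new Printer();
  }
}

// After lifting: the enclosing instance and the captured local
// variable have become explicit constructor arguments.
class LiftedCounter {
  Runnable make(final int step) {
    return new LiftedPrinter(this, step);
  }
}

class LiftedPrinter implements Runnable {
  private final LiftedCounter _enclosing;
  private final int _step;

  LiftedPrinter(LiftedCounter enclosing, int step) {
    _enclosing = enclosing;
    _step = step;
  }

  // The body no longer refers to the scope of make.
  public void run() { System.out.println(_step); }
}
&lt;/pre&gt;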

&lt;p&gt;
  In Java 5.0, parameterized types and methods (aka generics) have
  been introduced. In combination with member classes, this raises the
  question how &lt;em&gt;type&lt;/em&gt; variables should be handled when lifting
  member classes. From the source code point of view, this is pretty
  obvious:
&lt;/p&gt;

&lt;pre&gt;
class Foo&amp;lt;A&gt; {
  class Bar {
    A get() { ... }
  }
}
&lt;/pre&gt;

&lt;p&gt;
  If the class Bar is lifted, then its constructor gets an additional
  parameter for the enclosing Foo instance. This Foo instance is
  parameterized using a type variable A, so the lifted class Bar
  should also be parameterized with a type: the type parameter of its
  enclosing instance. This lifting of type parameters is comparable to
  the lifting of parameters for normal variables. So, the result of
  source-level lifting the Bar class could be:
&lt;/p&gt;

&lt;pre&gt;
class Foo&amp;lt;A&gt; {
}

class Bar&amp;lt;A&gt; {
  private final Foo&amp;lt;A&gt; _enclosing;

  public Bar(Foo&amp;lt;A&gt; enclosing) {
    _enclosing = enclosing;
  }

  A get() { ... }
}
&lt;/pre&gt;

&lt;p&gt;
  Indeed, the Eclipse implementation of the refactoring &lt;em&gt;"Move
  Member Type to New File"&lt;/em&gt; adds the type parameter to the lifted
  class (thumbs up for the generics support in Eclipse!).
&lt;/p&gt;

&lt;p&gt;
  So, what happens in Java bytecode? Should the lifted class have a
  type parameter? Should the lifted class be a valid class,
  generically speaking? (of course, a JVM is currently not required to
  understand generics related information in bytecode).
&lt;/p&gt;

&lt;p&gt;
  Well, the lifted class does not have a type parameter and,
  generically speaking, it is not a valid class. Let's take a look at
  the bytecode, represented in a structured way as an aterm, produced
  by a tool called class2aterm. I use ... to leave out some details and
  // to explain what the code means. The full aterm is available &lt;a href="http://www.cs.uu.nl/people/martin/FooBar.aterm"&gt;here&lt;/a&gt;.
&lt;/p&gt;

&lt;pre&gt;
$ class2aterm -i Foo\$Bar.class --parse-sig | pp-aterm
ClassFile(
  ...
  // field for the enclosing Foo instance.
  Field(
    AccessFlags([Final, Synthetic])
  , Name("this$0")
  , FieldDescriptor(ObjectType("Foo"))
  , Attributes([])
  )
  ...

  // constructor, taking a Foo argument.
  Method(
    AccessFlags([])
  , Name("&amp;lt;init&gt;")
  , MethodDescriptor([ObjectType("Foo")], Void)
  , Attributes([])
  )
  ...

  // get method with a generic signature
  Method(
    AccessFlags([])
  , Name("get")
  , MethodDescriptor([], ObjectType("java.lang.Object"))
  , Attributes(
      [ MethodSignature(
          TypeParams([])
        , Params([])
        , Returns(TypeVar(Id("A")))
        , Throws([])
        )
      ]
    )
  )
  ...

  // attributes of the class Bar
  Attributes(
    [ SourceFile("Foo.java")
    , InnerClasses( ... )
    ]
  )
  ...
)
&lt;/pre&gt;

&lt;p&gt;
  This disassembled class file reveals some interesting details about
  the way nested classes are lifted:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    The lifted class Bar is not parameterized: it has no
    ClassSignature attribute, which should be there if the class
    takes formal type parameters.
  &lt;/li&gt;

  &lt;li&gt;
    The field for the enclosing class does not have a parameterized
    type. Its type is the &lt;em&gt;raw&lt;/em&gt; type Foo!
  &lt;/li&gt;

  &lt;li&gt;
    The constructor of Bar (the method named &amp;lt;init&gt;) has no generic
    signature and takes a raw type Foo as an argument.
  &lt;/li&gt;

  &lt;li&gt;
    The get method &lt;em&gt;does&lt;/em&gt; have a generic signature, which
    describes that the method returns a type variable A.
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
  Of course, all the information of the original source can be
  reconstructed by a tool that knows about member classes &lt;em&gt;and&lt;/em&gt;
  generics. But, to a tool that only knows about generics, this code
  would be considered incorrect. Hence, if the virtual machine were to
  support generics in the future (which is an option explicitly left
  open), then this code would be incorrect! The type variable
  mentioned in the generic signature of the get method is &lt;em&gt;not in
  scope&lt;/em&gt;. Hence, the JVM would be required to have knowledge of
  inner classes as well as generics to be able to find out what type
  parameter this type variable refers to. Unless, of course, the
  bytecode format is changed, but then code compiled to the current
  bytecode format could no longer run under the new JVM. Being able to
  run existing bytecode has always been an important requirement for
  Sun when working on extensions of the Java platform (language and
  virtual machine).
&lt;/p&gt;

&lt;p&gt;
  Furthermore, the type variable in the signature of the get method is
  not qualified.  Every single name in Java bytecode is fully
  qualified, which is very useful for tools that need to work on
  bytecode: they don't have to do name analysis to find out to what
  construct a name refers.  Type variables are not qualified, which
  complicates the analysis that has to be performed by a tool that
  operates on bytecode. Not only can this type variable refer to type
  parameters of arbitrary enclosing classes, it could also refer to
  type parameters of enclosing generic methods (for local classes or
  member classes in local classes).
&lt;/p&gt;

&lt;p&gt;
  The fact that type variables in bytecode are not qualified is
  already quite annoying without considering member classes. In the
  Java language, it is allowed to redeclare type variables. For
  example:
&lt;/p&gt;

&lt;pre&gt;
class Foo&amp;lt;A&gt; {
  &amp;lt;A&gt; void foo(A x) {
  }
}
&lt;/pre&gt;

&lt;p&gt;
  In this example the type parameter A of the foo method is a
  different type parameter than the A parameter of the class Foo. This
  basically means that a bytecode processing tool with knowledge of
  generics has to do name analysis, which is definitely not something
  that is desirable for a bytecode format. Introducing canonical,
  fully qualified names for type variables would solve this.
&lt;/p&gt;
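
&lt;p&gt;
  A minimal sketch of the name analysis this requires, using the
  reflection API (the class name ShadowDemo is made up for this
  illustration): both type variables are called A, and only the
  element that declares them tells them apart.
&lt;/p&gt;

&lt;pre&gt;
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

public class ShadowDemo {
  static class Foo&amp;lt;A&gt; {
    &amp;lt;A&gt; void foo(A x) {
    }
  }

  public static void main(String[] args) throws Exception {
    // The type parameter A of the class Foo.
    TypeVariable&amp;lt;?&gt; classVar = Foo.class.getTypeParameters()[0];

    // The type parameter A of the method foo; its parameter erases to Object.
    Method foo = Foo.class.getDeclaredMethod("foo", Object.class);
    TypeVariable&amp;lt;?&gt; methodVar = foo.getTypeParameters()[0];

    // The unqualified names are the same ...
    System.out.println(classVar.getName() + " / " + methodVar.getName());

    // ... so the declaring element is needed to distinguish them.
    System.out.println(classVar.getGenericDeclaration());
    System.out.println(methodVar.getGenericDeclaration());
  }
}
&lt;/pre&gt;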

&lt;p&gt;
  As you might know, I'm working on semantic analysis for Java in the
  context of the &lt;a href="http://www.strategoxt.org"&gt;Stratego/XT&lt;/a&gt;
  project. My goal is to make it possible to define program
  transformations in Stratego at the semantic level: program
  transformations that consider the actual meaning of names, types of
  expressions, and so on, without requiring the programmer to redo the
  semantic analysis, which is quite complex for a 'real' language like
  Java. Obviously, I have decided to qualify type variables. For
  example, the parameter A of the method foo in class Foo in the last
  example is represented as:
&lt;/p&gt;

&lt;pre&gt;
Param(
  []
, TypeVar(
    MethodName(TypeName(PackageName([]), Id("Foo")), Id("foo"))
  , Id("A")
  )
, Id("x")
)
&lt;/pre&gt;

&lt;p&gt;
  The MethodName is the qualifier of the type variable in this
  example. This qualifier makes it immediately clear that the type
  variable refers to the type parameter of the method foo.
&lt;/p&gt;

&lt;p&gt;
  I don't know if this would have been fixed (maybe I see this
  completely wrong), but still it's a pity that I wasn't able to give
  feedback on this before JSR14 was finished. At that time, I was
  still working on the syntactic part of my Java transformation
  project (which is now available as &lt;a href="http://www.stratego-languahe.org/Stratego/JavaFront"&gt;Java
  Front&lt;/a&gt;). I gave some feedback on the syntax of generics,
  annotations and enumerations (mostly typos and minor bugs), but
  that's about it. To reduce the number of possible problems, I
  think that it would be very useful if new language features, such as
  generics, were also implemented with alternative techniques and
  tools. For example, I was able to give some feedback on the syntax
  of Java, because I was implementing a parser by creating a
  declarative syntax definition in &lt;a href="http://www.syntax-definition.org"&gt;SDF&lt;/a&gt;, a modular syntax
  definition formalism that integrates lexical and context-free
  syntax. These unconventional approaches might in general result in
  valuable feedback on proposals for new language features.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-112301956666843956?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/112301956666843956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=112301956666843956' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/112301956666843956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/112301956666843956'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/08/lifting-member-classes-from-generic.html' title='Lifting Member Classes from Generic Classes'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-112300817911603394</id><published>2005-08-02T20:21:00.000+02:00</published><updated>2005-08-02T20:50:21.740+02:00</updated><title type='text'>Cool Researchers Blog</title><content type='html'>&lt;p&gt;
      Long time no see! I'm sorry about that. Again, I'm going to work
      on my blogging practices.
    &lt;/p&gt;

    &lt;p&gt;
      About a year ago, I wrote about &lt;a href="http://mbravenboer.blogspot.com/2004/10/research-and-blogging.html"&gt;Research
      and Blogging&lt;/a&gt;, basically claiming that researchers should
      start blogging, since blogging is an efficient way of spreading
      knowledge. A few days ago &lt;a href="http://www.cs.uu.nl/wiki/Visser/WebHome"&gt;Eelco Visser&lt;/a&gt;
      (my supervisor) started &lt;a href="http://eelco-visser.blogspot.com/"&gt;blogging&lt;/a&gt;. Eelco is
      responsible for strategic rewriting, the &lt;a href="http://www.strategoxt.org"&gt;Stratego&lt;/a&gt; program
      transformation language and &lt;a href="http://www.cs.uu.nl/wiki/Visser/SyntaxDefinition"&gt;Scannerless Generalized-LR parsing and SDF2&lt;/a&gt;. Also, he was program
      co-chair of &lt;a href="http://www.gpce.org"&gt;GPCE&lt;/a&gt; 2004, and he
      supervised the development of the amazing &lt;a href="http://www.cs.uu.nl/wiki/Trace/Nix"&gt;Nix Deployment System&lt;/a&gt; (developed by &lt;a href="http://www.cs.uu.nl/people/eelco/"&gt;Eelco Dolstra&lt;/a&gt;). So,
      you can safely assume that his blog will be quite interesting to
      subscribe to!
    &lt;/p&gt;

&lt;p&gt;
  Coming up after the break: lifting member classes from parameterized classes ...
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-112300817911603394?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/112300817911603394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=112300817911603394' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/112300817911603394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/112300817911603394'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/08/cool-researchers-blog.html' title='Cool Researchers Blog'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111780377624989944</id><published>2005-06-03T14:57:00.000+02:00</published><updated>2005-06-03T15:02:56.256+02:00</updated><title type='text'>Understanding a Problem</title><content type='html'>&lt;p&gt;
      This week I've been reviewing the solutions of assignments
      submitted by students of our &lt;a href="http://www.cs.uu.nl/wiki/Pt"&gt;program transformation course&lt;/a&gt;. One of the things that strikes me again and again is
      how hard it is for students to get a grasp of the problem that
      they have to solve in an assignment.
    &lt;/p&gt;
    
    &lt;p&gt;
      In the last few installments of our program transformation
      course, the students had to develop a program instrumentation
      that traces the number of calls for every callee/caller pair for
      one of the assignments (we usually have about 10
      assignments). Last year, we just described the problem in the &lt;a href="http://www.cs.uu.nl/wiki/Pt03/AssignmentScopedDynamicRewriteRules"&gt;assignment&lt;/a&gt;. The students' solutions were OK, but not really exciting. They failed to handle all kinds of cases, and some solutions did not even terminate for some input programs.
    &lt;/p&gt;

    &lt;p&gt;
      This year, I included a set of tests in the &lt;a href="http://www.cs.uu.nl/wiki/Pt04/AssignmentConcreteObjectSyntax"&gt;assignment&lt;/a&gt;, which illustrate most (but not all) of the problems in this
      program transformation. Surprise, surprise: the students
      suddenly were able to handle all the issues illustrated by the
      testsuite that I provided. However, obviously I did not give the students all
      tests (evil grin). Indeed, several solutions could not handle the tests that
      I did not provide.
    &lt;/p&gt;

    &lt;p&gt;
      Most students don't write tests. Worse, if they do test, then
      they create a single file and &lt;em&gt;modify&lt;/em&gt; the test to check
      a new situation that they might have discovered. In this way
      they don't build up a nice testsuite. The important part of
      testing is that tests can be repeated automatically, not that you
      run a test once! I'm trying very hard to convince students to
      test their code properly, but they don't seem to understand the
      need for it, so you typically get questions like &lt;em&gt;"Is 10
      tests enough?"&lt;/em&gt;. I'm afraid that this is not a problem
      specific to students.
    &lt;/p&gt;

    &lt;p&gt;
      This might not be very surprising to you, but I &lt;em&gt;am&lt;/em&gt;
      surprised how clear the results of this small 'experiment' are
      (I did not do this experiment on purpose). I wonder what the
      implications of this should be for education. Clearly, having a
      grasp of a problem is the most important part of the solution.
    &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111780377624989944?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111780377624989944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111780377624989944' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111780377624989944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111780377624989944'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/06/understanding-problem.html' title='Understanding a Problem'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111436116255395503</id><published>2005-04-24T18:42:00.000+02:00</published><updated>2005-04-24T18:51:09.013+02:00</updated><title type='text'>Generics: The Importance of Wildcards</title><content type='html'>&lt;p&gt;
      or &lt;i&gt;"Why type erasure is not such a bad thing"&lt;/i&gt; or &lt;i&gt;"Why
      generics in C# are not that good"&lt;/i&gt;
    &lt;/p&gt;

    &lt;p&gt;
      Last Friday, I read an article that has been on my to do list
      way too long: &lt;a href="http://bracha.org/wildcards.pdf"&gt;Adding Wildcards to the Java Programming Language&lt;/a&gt;. I've seen
      wildcards in Java; I've used wildcards in Java; and I've even
      read the &lt;a href="http://java.sun.com/docs/books/jls/"&gt;Java Language Specification&lt;/a&gt; on wildcards, but I did not yet get the
      essence of wildcards.
    &lt;/p&gt;

    &lt;p&gt;
      This paper makes the need for wildcards very clear &lt;em&gt;and&lt;/em&gt;
      explains why the work on wildcards and parameterized types is
      novel. Unfortunately, generics are often ridiculed by functional
      programmers. They claim that their type systems have been more
      expressive for decades. Fortunately, this paper clearly explains
      the issues of introducing generics in an object oriented
      setting.
    &lt;/p&gt;

    &lt;p&gt;
      The problem with basic parameterized types is subtyping. For
      example, although &lt;code&gt;Integer&lt;/code&gt; is a subclass of
      &lt;code&gt;Number&lt;/code&gt;, a &lt;code&gt;List&amp;lt;Integer&gt;&lt;/code&gt; is not a
      subtype of a &lt;code&gt;List&amp;lt;Number&gt;&lt;/code&gt;. Hence, if a method
      requires a &lt;code&gt;List&amp;lt;Number&gt;&lt;/code&gt; as an argument, then you
      cannot pass a &lt;code&gt;List&amp;lt;Integer&gt;&lt;/code&gt; to it.
    &lt;/p&gt;

    &lt;p&gt;
      Why is a &lt;code&gt;List&amp;lt;Integer&gt;&lt;/code&gt; not a subtype of
      &lt;code&gt;List&amp;lt;Number&gt;&lt;/code&gt;? Well, this is related to
      covariance and contravariance. A type declaration is covariant
      if it is allowed to be more specific. For example, return types
      are covariant. A method that is declared to return a
      &lt;code&gt;Number&lt;/code&gt; can be overridden to be &lt;em&gt;more
      specific&lt;/em&gt; and return an &lt;code&gt;Integer&lt;/code&gt;. On the other
      hand, parameter types are &lt;em&gt;contravariant&lt;/em&gt;: a method that
      is declared to accept an &lt;code&gt;Integer&lt;/code&gt; argument can be
      implemented in a &lt;em&gt;more general&lt;/em&gt; way by allowing all
      &lt;code&gt;Number&lt;/code&gt;s.
    &lt;/p&gt;

    &lt;p&gt;
      The problem with type parameters is that they are used in method
      parameters as well as return types. Hence, they are restricted
      to the intersection of covariance and contravariance:
      invariance. Thus, type parameters are invariant and a
      &lt;code&gt;List&amp;lt;Integer&gt;&lt;/code&gt; is not a subclass of
      &lt;code&gt;List&amp;lt;Number&gt;&lt;/code&gt;.
    &lt;/p&gt;
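
    &lt;p&gt;
      A small sketch of why this restriction is needed (the class and
      variable names are made up for this illustration): if the
      commented-out assignment were allowed, a Double could end up in
      a list of Integers.
    &lt;/p&gt;

    &lt;pre&gt;
import java.util.ArrayList;
import java.util.List;

public class Invariance {
  public static void main(String[] args) {
    List&amp;lt;Integer&gt; ints = new ArrayList&amp;lt;Integer&gt;();
    ints.add(1);

    // Rejected by the compiler: List&amp;lt;Integer&gt; is not a List&amp;lt;Number&gt;.
    // List&amp;lt;Number&gt; nums = ints;
    // nums.add(3.14);          // would store a Double in ints
    // Integer i = ints.get(1); // and fail here at run-time

    // The element type itself is still a subtype: an Integer is a Number.
    Number first = ints.get(0);
    System.out.println(first);
  }
}&lt;/pre&gt;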

    &lt;p&gt;
      If Java were restricted to basic parameterized types and
      methods, then it would be quite difficult to come up with a good
      signature for a method that works on a &lt;code&gt;List&lt;/code&gt; that
      contains any &lt;code&gt;Number&lt;/code&gt;. In fact, you cannot even
      declare the type of lists with arbitrary numbers! Allowing
      arbitrary numbers requires a generic method with a dummy type
      parameter for the 'real' Number, i.e.
    &lt;/p&gt;

    &lt;pre&gt;
      &amp;lt;T extends Number&gt; void doSomething(List&amp;lt;T&gt; list) { ... }&lt;/pre&gt;

    &lt;p&gt;
      This works, but it gets quite mind-boggling if the types get
      more complex (&lt;i&gt;"The more interesting your types get, the less
      fun it is to write them down!"&lt;/i&gt; -- &lt;a href="http://www.cis.upenn.edu/~bcpierce/papers/tng-lics2003-slides.pdf"&gt;Benjamin C. Pierce&lt;/a&gt;).
      These types are not only hard to write down: this approach
      does not even work in all cases. For example, you cannot declare
      a field that contains arbitrary &lt;code&gt;Number&lt;/code&gt;s, since you
      cannot introduce a dummy type variable for a field.
    &lt;/p&gt;

    &lt;p&gt;
      Wildcards are a language feature that makes it a bit more fun to
      write down these types. Actually, it is a language feature that
      is &lt;em&gt;necessary&lt;/em&gt; to write down these types, since the dummy
      type variable is just a workaround and uses the type of the
      method to declare the type of the argument. The field example
      shows that you cannot write down the type itself. Wildcards are
      based on the notion of &lt;em&gt;use-site variance&lt;/em&gt;. Using
      wildcards, you can declare that your list is covariant:
      &lt;code&gt;List&amp;lt;? extends Number&gt;&lt;/code&gt; or contravariant:
      &lt;code&gt;List&amp;lt;? super Number&gt;&lt;/code&gt;. For more details, read the
      paper!
    &lt;/p&gt;
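
    &lt;p&gt;
      As a minimal sketch of use-site variance (the class, field and
      method names are made up for this illustration), wildcards can
      be used in method signatures and also in the type of a field:
    &lt;/p&gt;

    &lt;pre&gt;
import java.util.ArrayList;
import java.util.List;

public class Wildcards {
  // A field holding a list of some unknown kind of Number.
  static List&amp;lt;? extends Number&gt; numbers = new ArrayList&amp;lt;Integer&gt;();

  // Covariant use: the elements can be read as Numbers.
  static double sum(List&amp;lt;? extends Number&gt; list) {
    double total = 0;
    for (Number n : list) {
      total += n.doubleValue();
    }
    return total;
  }

  // Contravariant use: Integers can be written into the list.
  static void fill(List&amp;lt;? super Integer&gt; list) {
    list.add(1);
    list.add(2);
  }

  public static void main(String[] args) {
    List&amp;lt;Number&gt; target = new ArrayList&amp;lt;Number&gt;();
    fill(target);
    System.out.println(sum(target));
    System.out.println(sum(numbers));
  }
}&lt;/pre&gt;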

    &lt;p&gt;
      Unfortunately, C# will not support wildcards or a similar
      mechanism. The implementation strategy does not allow the
      introduction of wildcards (generics are implemented in the
      runtime instead of by type erasure). This is a bit surprising,
      since the implementation strategy is often claimed to be
      superior. What disappoints me is that the designers of C# are
      not willing to admit that subtyping is an issue and that
      wildcards are a solution. See the weblog of Eric Gunnerson: &lt;a href="http://blogs.msdn.com/ericgu/archive/2004/09/23/233438.aspx"&gt;Puzzling through Erasure II&lt;/a&gt; and the section on wildcards in &lt;a href="http://blogs.msdn.com/ericgu/archive/2004/06/29/168808.aspx"&gt;JavaOne: Day One&lt;/a&gt;.
    &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111436116255395503?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111436116255395503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111436116255395503' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111436116255395503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111436116255395503'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/generics-importance-of-wildcards.html' title='Generics: The Importance of Wildcards'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111394875302531677</id><published>2005-04-20T00:10:00.000+02:00</published><updated>2006-01-17T20:27:25.453+01:00</updated><title type='text'>Java Surprise 3: The Return of the Class</title><content type='html'>&lt;p&gt;
  First of all: good news! The final version of &lt;a href="http://java.sun.com/docs/books/jls/"&gt;The Java Language Specification, Third Edition&lt;/a&gt; is now available online! The
  specification has been improved considerably since the latest
  draft. &lt;a href="http://bracha.org"&gt;Gilad Bracha&lt;/a&gt; seems to be
  responsible for the bulk of the work, which is a tough job. I think
  that the result is pretty good, although I'm afraid that I will keep
  bothering him with comments and requests for clarification ;).
&lt;/p&gt;

&lt;p&gt;
  Now back to the issue of this post. First of all: I have no idea how
  well-known the issue in this post is. I didn't know it, but it might
  actually be quite well-known. I have some references to previous
  discussions on this issue at the end of the post.
&lt;/p&gt;

&lt;p&gt;
  First, I want to say something about what influences the return type
  of a method in Java. Before Java 1.5, the return type of a method
  was just the plain return type specified in the method
  declaration. In other words, the return type did not depend on
  anything.
&lt;/p&gt;

&lt;p&gt;
  Java 1.5 introduces parameterized types and generic methods. The
  return type of a method can now also include type variables. This
  makes the return type dependent on the values of the type variables
  that occur in it. The type variables can have two different scopes:
  the class of the method or just the method itself, which makes it a
  generic method. So, the actual return type of a method now also
  depends on the value of these type variables.
&lt;/p&gt;

&lt;p&gt;
  However, there is &lt;em&gt;one&lt;/em&gt; method in the Java library that does
  not return what it declares to return and needs another
  dependency. Indeed, there is an additional factor that influences
  the return type of this method.
&lt;/p&gt;

&lt;p&gt;
  The method I'm talking about is &lt;code&gt;Object.getClass()&lt;/code&gt;,
  which returns the class of an object. In Java 1.5,
  &lt;code&gt;Class&lt;/code&gt; itself is parameterized with the type that it
  represents. For example, the &lt;code&gt;Class&lt;/code&gt; for
  &lt;code&gt;String&lt;/code&gt; is &lt;code&gt;Class&amp;lt;String&gt;&lt;/code&gt;. The question
  is: what should the type parameter of the &lt;code&gt;Class&lt;/code&gt;
  returned by &lt;code&gt;Object.getClass()&lt;/code&gt; be?  Well, at the
  declaration of the method we basically know nothing, and that is
  indeed the declared return type: a wildcard (unknown type) with a
  very general bound: the type must extend &lt;code&gt;Object&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
   public final Class&amp;lt;? extends Object&gt; getClass()
&lt;/pre&gt;

&lt;p&gt;
  However, let's take a look at a piece of code where
  &lt;code&gt;getClass&lt;/code&gt; is invoked. Assuming that
  &lt;code&gt;getClass&lt;/code&gt; returns what it claims to return, we cannot
  declare &lt;code&gt;c&lt;/code&gt; to be of a more specific type, for example
  &lt;code&gt;Class&amp;lt;List&gt;&lt;/code&gt;. We must declare it with a very general
  value for the type parameter: a wildcard.
&lt;/p&gt;

&lt;pre&gt;
   List&amp;lt;String&gt; list = ...;
   Class&amp;lt;?&gt; c = list.getClass();
&lt;/pre&gt;

&lt;p&gt;
  This is unfortunate, since we actually know more about the type
  parameter of &lt;code&gt;Class&lt;/code&gt;. We know &lt;em&gt;at the invocation
  site&lt;/em&gt; that it is a &lt;code&gt;List&lt;/code&gt;, but of course we cannot
  declare that in the return type of &lt;code&gt;getClass&lt;/code&gt; in this
  way! So, we would like the return type of
  &lt;code&gt;getClass&lt;/code&gt; to depend on the static type of the
  &lt;code&gt;Object&lt;/code&gt; on which the method is invoked. In this way, the
  variable c could be declared with a more specific type such as
  &lt;code&gt;Class&amp;lt;? extends List&gt;&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
  The developers of the Java specification decided to make the return
  type of this method a special case. That is, the Java Language
  Specification defines that an invocation of the
  &lt;code&gt;getClass&lt;/code&gt; method must be treated in a special way. In
  other words, the return type of the method is different from the one
  declared in the source code. The bound of the &lt;code&gt;Class&lt;/code&gt;
  returned by &lt;code&gt;Object.getClass()&lt;/code&gt; is changed by the
  specification to the static type of the expression on which the
  method &lt;code&gt;getClass&lt;/code&gt; is invoked. This is a useful feature,
  but it is a pity that this return type cannot be declared!
&lt;/p&gt;

&lt;p&gt;
  This post is getting &lt;em&gt;way&lt;/em&gt; too long, but I would like to
  relate this to the implicit &lt;code&gt;this&lt;/code&gt; argument of methods in
  object-oriented languages. For ordinary method arguments, you can
  declare types, which might include type variables. These type
  variables can influence the return type of the method. This is more
  or less what we want, but now we need this for our implicit
  &lt;code&gt;this&lt;/code&gt; argument. I'm not sure if a solution in this
  direction is more attractive, but there is some link ... Are there
  more methods whose return type we would like to depend on the
  static type of the object at the invocation site? If so, then this
  should not be supported by the language itself. Unfortunately, I
  cannot think of an example at the moment ;) .
&lt;/p&gt;

&lt;p&gt;
  There is even more to tell about this &lt;code&gt;getClass&lt;/code&gt; method,
  since the type parameter of the &lt;code&gt;Class&lt;/code&gt; is not the static
  type of the subject expression, but the erased variant of it. Maybe
  I'll make that a future post ...
&lt;/p&gt;
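
&lt;p&gt;
  A minimal sketch of the special treatment (the class and variable
  names are made up for this illustration). The second declaration is
  accepted only because the compiler uses the erased static type of
  the subject expression as the bound of the wildcard:
&lt;/p&gt;

&lt;pre&gt;
import java.util.ArrayList;
import java.util.List;

public class GetClassDemo {
  public static void main(String[] args) {
    List&amp;lt;String&gt; list = new ArrayList&amp;lt;String&gt;();

    // Always allowed: matches the declared return type of getClass.
    Class&amp;lt;?&gt; general = list.getClass();

    // Allowed because of the special rule: the bound is the erasure
    // of the static type of list, which is the raw type List.
    Class&amp;lt;? extends List&gt; specific = list.getClass();

    // Both variables refer to the same Class object at run-time.
    System.out.println(general == specific);
  }
}
&lt;/pre&gt;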

&lt;p&gt;
  Some references to related discussions:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    Bug report in the Sun bug database about using the erased type as
    the parameter of the resulting Class:
    &lt;a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5004321"&gt;Object.getClass() should return erased class type&lt;/a&gt;
  &lt;/li&gt;

  &lt;li&gt;
    Discussion in the Java Generics forum on the same issue:
    &lt;a href="http://forum.java.sun.com/thread.jspa?threadID=496028&amp;amp;start=0&amp;amp;tstart=0"&gt;Are there bugs in the generics tutorial?&lt;/a&gt;
  &lt;/li&gt;

  &lt;li&gt;
    Bug report for the Eclipse JDT subproject:
    &lt;a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=58666"&gt;Object.getClass() need to be treated special ?&lt;/a&gt;
  &lt;/li&gt;

  &lt;li&gt;
    Another Generics FAQ:
    &lt;a href="http://www.angelikalanger.com/GenericsFAQ/FAQSections/TechnicalDetails.html#Is%20the%20capture%20of%20a%20bounded%20wildcard%20compatible%20to%20the%20bound?"&gt;Is the capture of a bounded wildcard compatible to the bound?&lt;/a&gt;
  &lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111394875302531677?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111394875302531677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111394875302531677' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111394875302531677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111394875302531677'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/java-surprise-3-return-of-class.html' title='Java Surprise 3: The Return of the Class'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111374764826148310</id><published>2005-04-17T16:13:00.000+02:00</published><updated>2005-04-17T16:23:17.160+02:00</updated><title type='text'>Moby: I Like It</title><content type='html'>&lt;p&gt;
      I just read in &lt;a href="http://www.moby.com/journal"&gt;Moby's Journal&lt;/a&gt; that his new album &lt;em&gt;Hotel&lt;/em&gt; is doing extraordinarily well in Europe. I'm happy about that: I have a lot of respect for the way Moby lives and truly appreciate his music.
    &lt;/p&gt;

    &lt;p&gt;
      However, I did not expect this album to be a best-seller. When I
      first listened to Hotel, I was quite surprised: Moby is singing
      on most of the tracks. I think he has a wonderful voice for the
      kind of music he makes nowadays, but a lot of people (well, at
      least according to the opinions at Amazon) seem to dislike his
      vocal contribution to the songs. I cannot disagree with them if
      they claim that his voice isn't really beautiful. However, it's
      loaded with emotion, love, and sometimes compassion, which
      makes his current work stand out.
    &lt;/p&gt;

    &lt;p&gt;
      My favourite track? Well, it depends on my mood, but usually it
      is &lt;em&gt;Dream about me&lt;/em&gt;. Yes, I'm a sensitive fool ;) . &lt;em&gt;Where you end&lt;/em&gt; is a great song as well. If you like Hotel, you might also like some previous tracks that feature Moby singing. My favourites are &lt;em&gt;New Dawn Fades&lt;/em&gt;
      (from the album I Like to Score, or even better: Live on the
      Play DVD) and &lt;em&gt;Stay&lt;/em&gt; and &lt;em&gt;Afterlife&lt;/em&gt; from the
      album 18 B-sides.
    &lt;/p&gt;

    &lt;p&gt;
      I can almost hear you think: &lt;i&gt;"WTF? A non-technical post?"&lt;/i&gt;. Yes, I'm sorry ;) . If you don't understand the dual meaning of the title of this post, then go and buy the album :P .
    &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111374764826148310?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111374764826148310/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111374764826148310' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111374764826148310'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111374764826148310'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/moby-i-like-it.html' title='Moby: I Like It'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111281126433206614</id><published>2005-04-06T20:13:00.000+02:00</published><updated>2005-04-06T20:17:16.330+02:00</updated><title type='text'>Java Surprise 2: Motivation</title><content type='html'>&lt;p&gt;
In the previous posts I showed that the priority of a cast to a
reference type is different from the cast to a primitive type. &lt;a href="http://www.cs.vu.nl/~mvermaat/"&gt;Martijn Vermaat&lt;/a&gt; asked me why
the designers of the Java language made this decision. Of course, they
had good reasons for this design decision, but still the decision is
questionable, especially now that we have autoboxing.
&lt;/p&gt;

&lt;p&gt;
Let's take a look at this example from the original post:
&lt;/p&gt;

&lt;pre&gt;
$ echo "(Integer) - 2" | parse-java -s Expr | aterm2xml --implicit
&amp;lt;Minus&gt;
  &amp;lt;ExprName&gt;&amp;lt;Id&gt;Integer&amp;lt;/Id&gt;&amp;lt;/ExprName&gt;
  &amp;lt;Lit&gt;&amp;lt;Deci&gt;2&amp;lt;/Deci&gt;&amp;lt;/Lit&gt;
&amp;lt;/Minus&gt;
&lt;/pre&gt;

&lt;p&gt;
If no priorities were defined in the Java language, then this
expression would be ambiguous. I can illustrate this by parsing the
same expression using a Java grammar that does not declare
priorities. I'm using the &lt;a href="http://www.syntax-definition.org/SdfSoftware"&gt;SGLR&lt;/a&gt; parser for this,
which is capable of producing a parse forest (multiple parse trees) if
an input is ambiguous. The alternatives are represented by an
&lt;code&gt;amb&lt;/code&gt; element with 2 or more children.
&lt;/p&gt;

&lt;pre&gt;
$ echo "(Integer) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
&amp;lt;amb&gt;
  &amp;lt;Minus&gt;
    &amp;lt;ExprName&gt;
      &amp;lt;Id&gt;Integer&amp;lt;/Id&gt;
    &amp;lt;/ExprName&gt;
    &amp;lt;Lit&gt;
      &amp;lt;Deci&gt;2&amp;lt;/Deci&gt;
    &amp;lt;/Lit&gt;
  &amp;lt;/Minus&gt;
  &amp;lt;CastRef&gt;
    &amp;lt;ClassOrInterfaceType&gt;
      &amp;lt;TypeName&gt;
        &amp;lt;Id&gt;Integer&amp;lt;/Id&gt;
      &amp;lt;/TypeName&gt;
    &amp;lt;/ClassOrInterfaceType&gt;
    &amp;lt;Minus&gt;
      &amp;lt;Lit&gt;
        &amp;lt;Deci&gt;2&amp;lt;/Deci&gt;
      &amp;lt;/Lit&gt;
    &amp;lt;/Minus&gt;
  &amp;lt;/CastRef&gt;
&amp;lt;/amb&gt;
&lt;/pre&gt;

&lt;p&gt;
This clearly shows that the input is ambiguous: the first alternative
is the binary operator (which is the alternative chosen by the Java
language) and the other alternative is a cast to a reference
type.
&lt;/p&gt;

&lt;p&gt;
However, the cast to an &lt;code&gt;int&lt;/code&gt; is &lt;em&gt;not&lt;/em&gt; ambiguous,
since &lt;code&gt;int&lt;/code&gt; is a reserved keyword, thus forbidden as an
identifier. So, for this input there is only a single parse option,
even in the ambiguous version of Java.
&lt;/p&gt;

&lt;pre&gt;
$ echo "(int) - 2" | sglri -p JavaAmb.tbl | aterm2xml --implicit
&amp;lt;CastPrim&gt;
  &amp;lt;Int/&gt;
  &amp;lt;Minus&gt;
    &amp;lt;Lit&gt;&amp;lt;Deci&gt;2&amp;lt;/Deci&gt;&amp;lt;/Lit&gt;
  &amp;lt;/Minus&gt;
&amp;lt;/CastPrim&gt;
&lt;/pre&gt;

&lt;p&gt;
The ambiguity in the first example has to be resolved. So, what should
the language designer do? Prefer the cast, or prefer the binary minus?
Well, that decision is not very hard: in the first example, the
&lt;code&gt;(Integer)&lt;/code&gt; is a parenthesized expression, where the
expression is the variable &lt;code&gt;Integer&lt;/code&gt;. If we ignore this
actual value (since it is quite distracting), then the structure of
the expression is &lt;code&gt;( Expression ) - Expression&lt;/code&gt;. You will
recognize the need for this pattern, since the expression &lt;code&gt;(a * b) - c&lt;/code&gt; has exactly the same structure!
&lt;/p&gt;

&lt;p&gt;
The cast to a primitive type does not have the ambiguity problem,
since all primitive types are keywords and all keywords are forbidden
as identifiers. So, there is no reason to disallow a primitive
cast at this location, and for this reason the language designers
gave the primitive cast a different priority.
&lt;/p&gt;

&lt;p&gt;
Are there alternatives? Yes, there are, but they are not very
attractive either. First, a parenthesized expression name could be
forbidden. Using parentheses for a plain identifier (or a qualified
name) does not make a lot of sense. Another option is to disallow casts
to primitive types at this location. This can be annoying, but it
makes things more clear and consistent.
&lt;/p&gt;

&lt;p&gt;
Of course, having two different production rules for casts is not
attractive. It's just a single language construct, so it should be
defined by a single production as well. I wonder what the language
designers would have done if autoboxing was already included in the
first version of Java, since autoboxing makes this distinction between
a reference cast and a primitive cast visible.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111281126433206614?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111281126433206614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111281126433206614' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111281126433206614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111281126433206614'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/java-surprise-2-motivation.html' title='Java Surprise 2: Motivation'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111273584340688870</id><published>2005-04-05T23:12:00.000+02:00</published><updated>2005-04-05T23:17:23.406+02:00</updated><title type='text'>Java Surprise 2: Another Example</title><content type='html'>&lt;p&gt;
While browsing through the &lt;a href="https://svn.cs.uu.nl:12443/repos/StrategoXT/java-front/trunk/test/v1.5/expressions.testsuite"&gt;micro testsuites&lt;/a&gt; of Java-front, I was reminded of another typical example:
&lt;/p&gt;

&lt;pre&gt;
int x = (int) ++y;
int x = (Integer) ++y;
&lt;/pre&gt;

&lt;p&gt;The first statement is allowed. The second is not.&lt;/p&gt;
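
&lt;p&gt;
A minimal sketch of the two statements in a compilable form. The class and method names are made up for this illustration, and wrapping the operand in parentheses is one possible way to make the reference cast acceptable, not something taken from the testsuite.
&lt;/p&gt;

```java
public class CastIncrementDemo {
  // Cast to a primitive type: a prefix increment is accepted as the operand.
  static int primitiveCast() {
    int y = 1;
    int x = (int) ++y;
    return x;
  }

  // Cast to a reference type: "(Integer) ++y" does not compile.
  // Parenthesizing the operand lets the parser read "(Integer)" as a cast.
  static int referenceCast() {
    int y = 1;
    int x = (Integer) (++y);
    return x;
  }

  public static void main(String[] args) {
    System.out.println(primitiveCast());
    System.out.println(referenceCast());
  }
}
```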

&lt;p&gt;(See the first post on Java Surprise 2 for an explanation)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111273584340688870?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111273584340688870/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111273584340688870' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111273584340688870'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111273584340688870'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/java-surprise-2-another-example.html' title='Java Surprise 2: Another Example'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111273403464878748</id><published>2005-04-05T22:42:00.000+02:00</published><updated>2005-04-05T23:23:38.966+02:00</updated><title type='text'>Java Surprise 2: Cast Priority</title><content type='html'>&lt;p&gt;
I promised a Java Surprise series. Thanks to my full-time job this
promise is not hard to remember: every few days there is a
fresh surprise for me ;) . The second surprise in this series is
actually one I discovered last summer, so I'm cheating a bit. If you
know me in real life, then I've probably already bothered you with
this one.
&lt;/p&gt;

&lt;p&gt;
First of all: please take a seat. Are you sitting comfortably?
Excellent. Did you know that the syntactical priority of a cast to a
primitive type is different from the cast to a reference type? Well,
it is. Most likely, you will never encounter this, but it is not hard
to find an example that will surprise you.
&lt;/p&gt;

&lt;p&gt;
You are probably familiar with autoboxing in Java 1.5. In short,
autoboxing can convert primitive types (such as &lt;code&gt;int&lt;/code&gt;) to
reference types (such as &lt;code&gt;Integer&lt;/code&gt;) for you if
necessary. Hence, you can assign an &lt;code&gt;int&lt;/code&gt; to an
&lt;code&gt;Integer&lt;/code&gt; and you can also cast an &lt;code&gt;int&lt;/code&gt; to an
&lt;code&gt;Integer&lt;/code&gt;. Some (correct) statements:
&lt;/p&gt;

&lt;pre&gt;
Integer x = 3;
Integer y = (Integer) 3;
int z = (Integer) 3;
&lt;/pre&gt;

&lt;p&gt;
I'm going to abuse your familiarity with autoboxing to show how weird
it can be that the priority of primitive casts is different from
reference casts. The following program is a correct program that
includes a (redundant) cast to an &lt;code&gt;int&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
public class JavaSurprise2 {
  public static void main(String[] ps) {
    int y = (int) - 2;
    System.out.println(String.valueOf(y));
  }
}
&lt;/pre&gt;

&lt;p&gt;
Compile and run:
&lt;/p&gt;

&lt;pre&gt;
martin@logistico:~/tmp&gt; javac JavaSurprise2.java
martin@logistico:~/tmp&gt; java JavaSurprise2
-2
&lt;/pre&gt;

&lt;p&gt;
Well, that looks great. Now, let's replace the &lt;code&gt;int&lt;/code&gt; with
an &lt;code&gt;Integer&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;
public class JavaSurprise2 {
  public static void main(String[] ps) {
    int y = (Integer) - 2;
    System.out.println(String.valueOf(y));
  }
}
&lt;/pre&gt;

&lt;p&gt;
Compile ...
&lt;/p&gt;

&lt;pre&gt;
martin@logistico:~/tmp&gt; javac JavaSurprise2.java
JavaSurprise2.java:4: cannot find symbol
symbol  : variable Integer
location: class JavaSurprise2
    int y = (Integer) - 2;
             ^
JavaSurprise2.java:4: illegal start of type
    int y = (Integer) - 2;
            ^
2 errors
&lt;/pre&gt;

&lt;p&gt;
What the heck? Cannot find symbol? Let's give it a symbol ...
&lt;/p&gt;

&lt;pre&gt;
public class JavaSurprise2 {
  public static void main(String[] ps) {
    int Integer = 3;
    int y = (Integer) - 2;
    System.out.println(String.valueOf(y));
  }
}
&lt;/pre&gt;

&lt;p&gt;
Compile and run ...
&lt;/p&gt;

&lt;pre&gt;
martin@logistico:~/tmp&gt; javac JavaSurprise2.java
martin@logistico:~/tmp&gt; java JavaSurprise2
1
&lt;/pre&gt;

&lt;p&gt;
So, what happens? Of course the compiler is right. As I said in the
beginning, the priority of a cast to a primitive type is different
from a reference type. Because of the priorities defined in the Java
Language Specification, the &lt;code&gt;int&lt;/code&gt; example is parsed as a
cast. However, the &lt;code&gt;Integer&lt;/code&gt; version is parsed as a
parenthesized expression name: an expression that refers to something
by name (aka a variable). The Java compiler will never revisit this
decision and turn it into a cast after all: syntactic choices are
always final.
&lt;/p&gt;
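
&lt;p&gt;
The three readings can be put side by side in a small sketch. The class and method names are invented for this illustration, and the parenthesized operand in the second method is just one way to get the reference cast that was probably intended.
&lt;/p&gt;

```java
public class CastPriorityDemo {
  // Primitive cast: the unary minus is accepted as the operand of the cast.
  static int primitiveCast() {
    int y = (int) - 2;
    return y;
  }

  // Reference cast: with the operand in parentheses, "(Integer)" can only
  // be a cast. The int is boxed by the cast and unboxed by the assignment.
  static int referenceCast() {
    int y = (Integer) (-2);
    return y;
  }

  // With a variable called Integer in scope, "(Integer)" is a parenthesized
  // expression name and the minus is the binary operator: 3 - 2.
  static int binaryMinus() {
    int Integer = 3;
    int y = (Integer) - 2;
    return y;
  }

  public static void main(String[] args) {
    System.out.println(primitiveCast());
    System.out.println(referenceCast());
    System.out.println(binaryMinus());
  }
}
```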

&lt;p&gt;
I can illustrate these different parses using &lt;a href="http://www.strategoxt.org/Stratego/JavaFront"&gt;Java-front&lt;/a&gt;, a
package that provides a Java parser that is generated from a
declarative syntax definition for Java in &lt;a href="http://www.syntax-definition.org"&gt;SDF&lt;/a&gt; (yes, I'm the developer: marketing intended ;) )
&lt;/p&gt;

&lt;pre&gt;
martin@logistico:~/tmp&gt; echo "(Integer) - 2" | parse-java -s Expr
Minus(ExprName(Id("Integer")),Lit(Deci("2")))

martin@logistico:~/tmp&gt; echo "(int) - 2" | parse-java -s Expr
CastPrim(Int,Minus(Lit(Deci("2"))))
&lt;/pre&gt;

&lt;p&gt;
Or in terms of XML:
&lt;/p&gt;

&lt;pre&gt;
martin@logistico:~/tmp&gt; echo "(Integer) - 2"
    | parse-java -s Expr | aterm2xml --implicit
&amp;lt;Minus&gt;
  &amp;lt;ExprName&gt;&amp;lt;Id&gt;Integer&amp;lt;/Id&gt;&amp;lt;/ExprName&gt;
  &amp;lt;Lit&gt;&amp;lt;Deci&gt;2&amp;lt;/Deci&gt;&amp;lt;/Lit&gt;
&amp;lt;/Minus&gt;

martin@logistico:~/tmp&gt; echo "(int) - 2"
    | parse-java -s Expr | aterm2xml --implicit
&amp;lt;CastPrim&gt;
  &amp;lt;Int/&gt;
  &amp;lt;Minus&gt;
    &amp;lt;Lit&gt;&amp;lt;Deci&gt;2&amp;lt;/Deci&gt;&amp;lt;/Lit&gt;
  &amp;lt;/Minus&gt;
&amp;lt;/CastPrim&gt;
&lt;/pre&gt;

&lt;p&gt;
Surprised? You'd better be!
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111273403464878748?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111273403464878748/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111273403464878748' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111273403464878748'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111273403464878748'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/04/java-surprise-2-cast-priority.html' title='Java Surprise 2: Cast Priority'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-111208380314598111</id><published>2005-03-29T09:55:00.000+02:00</published><updated>2005-04-05T22:01:37.500+02:00</updated><title type='text'>Java Surprise 1: Overloading and Inner Classes</title><content type='html'>&lt;p&gt;
As some of you might know, I'm working on implementing components of a Java compiler in Stratego. Obviously, I have to study the Java Language Specification in great detail for that. I had the impression that I knew a lot about the Java language, but I'm still learning a lot of new details. Some of these details are funny, some are not. I've already encountered a lot of these cases and I'll try to blog about them from now on.
&lt;/p&gt;

&lt;p&gt;
My first post in this series is about this fragment:
&lt;/p&gt;
&lt;pre&gt;
class Foo {
  void f(String s) {}
 
  class Bar {
    void f(int x) {}

    class Fred {
      void g() { f("aaa"); }  
    }
  }
}
&lt;/pre&gt;
&lt;p&gt;
Did you know that you cannot overload the method &lt;code&gt;f&lt;/code&gt; in this way?
&lt;/p&gt;

&lt;p&gt;
The reason for this is that the specification separates method invocation into a few phases. The first compile-time phase determines the class to search for the method to invoke. For a plain method invocation (just an identifier), the JLS specifies that the class to search for methods is the &lt;em&gt;innermost&lt;/em&gt; type declaration that has a method of that name. In this case, this will be the class &lt;code&gt;Bar&lt;/code&gt;. Hence, the later phases that handle method overloading will not consider the method &lt;code&gt;f&lt;/code&gt; that takes a String argument.
&lt;/p&gt;
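
&lt;p&gt;
As a sketch of how to live with this rule: qualifying the call with &lt;code&gt;Foo.this&lt;/code&gt; names the class to search explicitly, so the innermost-declaration lookup no longer applies. The return values, the extra method &lt;code&gt;h&lt;/code&gt; and the &lt;code&gt;main&lt;/code&gt; method are additions for this illustration only.
&lt;/p&gt;

```java
public class Foo {
  String f(String s) { return "Foo.f(String)"; }

  class Bar {
    String f(int x) { return "Bar.f(int)"; }

    class Fred {
      String g() {
        // A plain f("aaa") does not compile here: only Bar is searched,
        // because Bar is the innermost class that declares a method named f.
        // The qualified form selects Foo as the class to search.
        return Foo.this.f("aaa");
      }

      String h() {
        // An unqualified call still works for the overload declared in Bar.
        return f(42);
      }
    }
  }

  public static void main(String[] args) {
    Foo.Bar.Fred fred = new Foo().new Bar().new Fred();
    System.out.println(fred.g());
    System.out.println(fred.h());
  }
}
```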

&lt;p&gt;
I don't think I've encountered this issue during my Java programming. Did you?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-111208380314598111?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/111208380314598111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=111208380314598111' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111208380314598111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/111208380314598111'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2005/03/java-surprise-1-overloading-and-inner.html' title='Java Surprise 1: Overloading and Inner Classes'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-110115116272586703</id><published>2004-11-22T20:09:00.000+01:00</published><updated>2004-11-22T20:19:22.726+01:00</updated><title type='text'>Lexical Macros are Bad</title><content type='html'>&lt;p&gt;
  &lt;a href="http://arthurvd.blogspot.com/"&gt;Arthur van Dam&lt;/a&gt; just created this nice picture, with a clear statement, for me:
&lt;/p&gt;

&lt;center&gt;
&lt;img src="http://losser.st-lab.cs.uu.nl/~adam/lexical_macros_are_bad.jpg" width="400"&gt;
&lt;/center&gt;

&lt;p&gt;
It's going to be featured in our discussion of &lt;a href="http://www.brics.dk/RS/00/24/"&gt;Growing Languages with Metamorphic Syntax Macros&lt;/a&gt;, this Tuesday as part of the Software Generation and Configuration course I mentioned before. Thanks Arthur!
&lt;/p&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-110115116272586703?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/110115116272586703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=110115116272586703' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110115116272586703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110115116272586703'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/11/lexical-macros-are-bad.html' title='Lexical Macros are Bad'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-110105370578617927</id><published>2004-11-21T17:05:00.000+01:00</published><updated>2004-11-21T17:19:34.516+01:00</updated><title type='text'>Meta Blog: How Do I Look?</title><content type='html'>  &lt;p&gt;
    Last week I changed the look of my &lt;a
    href="http://www.cs.uu.nl/groups/ST/Martin/WebHome"&gt;homepage&lt;/a&gt;
    (which is a Wiki) to a style derived from Blogger's Rounders 3
    template, which was designed by &lt;a
    href="http://www.stopdesign.com"&gt;Douglas Bowman&lt;/a&gt;. Of course, I
    wanted to change the look of my blog as well; what you see now is
    the result of this. I hope you like it. As you can see, I'm fond
    of the combination of shades of blue and gray ;) .
  &lt;/p&gt;

  &lt;p&gt;
    The sources of my blog template are &lt;a href="https://svn.cs.uu.nl:12443/repos/mbravenboer/blog/"&gt;available&lt;/a&gt; from my Subversion repository. The &lt;a href="https://svn.cs.uu.nl:12443/repos/mbravenboer/BlueBoxSkin/"&gt;sources of the Wiki skin&lt;/a&gt; for my homepage are there as well. Feel free to use it.
  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-110105370578617927?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/110105370578617927/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=110105370578617927' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110105370578617927'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110105370578617927'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/11/meta-blog-how-do-i-look.html' title='Meta Blog: How Do I Look?'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-110068323831959264</id><published>2004-11-17T10:58:00.000+01:00</published><updated>2004-11-17T11:40:01.996+01:00</updated><title type='text'>Paper of the Day</title><content type='html'>  &lt;p&gt;
    Yesterday, I read the article &lt;a
    href="http://www2.parc.com/csl/groups/sda/publications/papers/Kiczales-IMSA92/for-web.pdf"&gt;"Towards
    a New Model of Abstraction in Software Engineering"&lt;/a&gt; by Gregor
    Kiczales. We are going to discuss this paper tomorrow (Thursday)
    in our &lt;a href="http://www.cs.uu.nl/groups/ST/Sgc/WebHome"&gt;master
    seminar on software generation and configuration&lt;/a&gt;. I'm not
    really convinced that aspect-oriented programming (as it is
    currently implemented in AspectJ) is the way to go, but this
    earlier article is brilliant!
  &lt;/p&gt;

  &lt;p&gt;
    The problem with abstraction is very well described: abstractions
    cannot hide their implementations. The need for a separation of
    meta-level interfaces from base interfaces is entirely clear after
    reading this paper. The paper immediately reminded me of &lt;a
    href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html"&gt;The
    Law of Leaky Abstractions&lt;/a&gt;. The law introduced in this
    excellent article by Joel Spolsky is cited quite
    frequently. However, the credits for identifying this problem (and
    suggesting a solution!) should go to this article by Gregor
    Kiczales. I think that many of the ideas expressed in his article
    have still not been realized or researched thoroughly enough.
  &lt;/p&gt;

  &lt;p&gt;
    Another interesting thing to note is that annotations and
    attributes as they are available in C# and Java are not really
    that novel. Until now, it was unclear to me where the idea of
    attributes in C# actually came from. I think other people have
    this problem as well, since the idea of adding attributes to
    source code is often described as being truly novel. After
    reading more about &lt;a
    href="http://www.cs.uu.nl/groups/ST/Sgc/OpenCompilers"&gt;metaobject
    protocols&lt;/a&gt;, it seems that annotations are nothing more than
    what was already available in the earliest MOP systems. Why has
    this link never been explained? Or did I miss something?
  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-110068323831959264?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/110068323831959264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=110068323831959264' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110068323831959264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/110068323831959264'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/11/paper-of-day.html' title='Paper of the Day'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-109777102043920421</id><published>2004-10-14T18:21:00.000+02:00</published><updated>2004-10-14T18:23:40.440+02:00</updated><title type='text'>Preparing for OOPSLA and GPCE</title><content type='html'>  &lt;p&gt;
    Next week we're going to &lt;a
    href="http://www.oopsla.org/2004"&gt;OOPSLA&lt;/a&gt; and &lt;a
    href="http://gpce04.gpce.org/"&gt;GPCE&lt;/a&gt;!  I'm going to &lt;a
    href="http://www.oopsla.org/2004/ShowEvent.do?id=27"&gt;present&lt;/a&gt;
    our paper &lt;a
    href="http://www.stratego-language.org/Stratego/ConcreteSyntaxForObjects"&gt;Concrete
    syntax for Objects&lt;/a&gt; at OOPSLA. I'm really looking forward to
    the conference and the talk, although it is a little bit scary to
    have your first conference talk ever at OOPSLA. I'm going to
    prepare this talk in the coming week, so get ready for MetaBorg
    posts ;) .
  &lt;/p&gt;

  &lt;p&gt;
    We're also going to the &lt;a
    href="http://www.oopsla.org/2004/ShowEvent.do?id=292"&gt;Software
    Transformation Systems workshop&lt;/a&gt;. I expect that this is going
    to be very interesting: quite a few transformation system
    researchers will be there.
  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-109777102043920421?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/109777102043920421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=109777102043920421' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109777102043920421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109777102043920421'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/10/preparing-for-oopsla-and-gpce.html' title='Preparing for OOPSLA and GPCE'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-109760805450970660</id><published>2004-10-12T22:07:00.000+02:00</published><updated>2004-10-12T21:31:13.953+02:00</updated><title type='text'>Software Engineering Lectures</title><content type='html'>  &lt;p&gt;
    Today, I gave a lecture on the &lt;a
    href="http://www.cs.uu.nl/docs/vakken/swe/slides/SWE04-CBSE.pdf"&gt;concept
    and techniques of component-based software engineering&lt;/a&gt; as part
    of our &lt;a
    href="http://www.cs.uu.nl/groups/ST/Swe/WebHome"&gt;software
    engineering course&lt;/a&gt;. The lecture is divided into two parts:
    concepts of component software and the implementation of these
    concepts in current platforms (Java and .NET). I hope the students
    liked the lecture; at least I had a lot of fun myself
    ;). Preparing a lecture from scratch (which was the case) takes a
    lot of time, but it usually provides some fresh ideas for stuff
    that we could do for our 'real work'.
  &lt;/p&gt;

  &lt;p&gt;
    For lectures like these, you get to read material that you
    wouldn't consider if you only focused on academic stuff (&lt;a
    href="http://www.amazon.com/exec/obidos/tg/detail/-/0201734117/002-8868428-8727207?v=glance"&gt;Essential
    .NET&lt;/a&gt; and &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0201753065/002-8868428-8727207?v=glance"&gt;Component Development for the Java Platform&lt;/a&gt;
    are great books!). About two weeks ago, I gave a lecture on &lt;a
    href="http://www.cs.uu.nl/docs/vakken/swe/slides/SWE04-TestingTools.pdf"&gt;testing
    tools and techniques&lt;/a&gt;. The preparation for this lecture (and
    today's one as well) provided me with some more insight into the use
    of reflection for meta-programming. Concerning testing, it's
    particularly interesting to see how people work around the poor
    support for implementing tests in current programming
    languages. In particular, mock objects (&lt;a
    href="http://www.easymock.org"&gt;EasyMock&lt;/a&gt; is impressive) and &lt;a
    href="http://java.sun.com/j2se/1.5.0/docs/guide/reflection/proxy.html"&gt;dynamic
    proxies&lt;/a&gt; were useful to learn more about.
  &lt;/p&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-109760805450970660?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/109760805450970660/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=109760805450970660' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109760805450970660'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109760805450970660'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/10/software-engineering-lectures.html' title='Software Engineering Lectures'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-109722219322175495</id><published>2004-10-08T10:55:00.000+02:00</published><updated>2004-10-08T09:56:33.220+02:00</updated><title type='text'>Data-flow Components are the Magic Bullet</title><content type='html'>  &lt;p&gt;
    &lt;a href="http://seanmcgrath.blogspot.com/"&gt;Sean McGrath&lt;/a&gt; has
    posted a nice set of &lt;a
    href="http://seanmcgrath.blogspot.com/pipelines.ppt"&gt;slides&lt;/a&gt;
    (ppt) that he used for his presentation on XML pipelining at &lt;a
    href="http://www.xmlopen.org/"&gt;XML 2004&lt;/a&gt;. Sean is more or less
    the pipe guy of the XML community, which is an honorable position.
    Most interesting is his slide (nr 4) on API components versus
    data-flow components. Unfortunately, I don't know exactly what he
    said about this slide (I wasn't there), but he must have raised
    some interesting points.
  &lt;/p&gt;

  &lt;p&gt;
    The last few decades have made clear that data-flow components
    never go out of fashion. In particular, the only really successful
    component model is the idea of pipes and filters of Unix. Unix
    components can easily be composed; they can be implemented in any
    language and they are suitable for anticipated (e.g. shell
    scripts) as well as ad-hoc (the command line) composition. API-based
    component models tend to be language, or at least language family,
    specific. Hence, components relying on such a component model are
    coupled to a language or platform. As a result, they go out of
    fashion. This doesn't make API-based component models useless, but
    components based on such a mechanism just don't last. Over time,
    components will have to survive in a &lt;em&gt;heterogeneous&lt;/em&gt;
    environment, since fashion changes while time passes by. In this
    way, the Internet is comparable to time. If you want to connect
    components in a heterogeneous environment, then
    language-specific solutions just don't work. Your current environment is
    the Internet, which is quite heterogeneous.
  &lt;/p&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-109722219322175495?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/109722219322175495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=109722219322175495' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109722219322175495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109722219322175495'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/10/data-flow-components-are-magic-bullet.html' title='Data-flow Components are the Magic Bullet'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-109721733372931017</id><published>2004-10-08T08:34:00.000+02:00</published><updated>2004-10-08T08:35:33.730+02:00</updated><title type='text'>Tool Hell and Encapsulation</title><content type='html'>    After more than 7 years of &lt;a
    href="http://www.stratego-language.org"&gt;StrategoXT&lt;/a&gt; development
    (to be honest, I joined the project only about 3 years ago) we
    have developed a huge number of tools: the &lt;code&gt;bin&lt;/code&gt;
    directory of my StrategoXT installation contains 96 executables
    and the &lt;em&gt;libexec&lt;/em&gt; directory adds 93 tools to that. That's
    quite a pile of tools, and this pile is a big problem for new
    users: how to learn all these tools?  What command-line arguments
    do all these tools need?
  &lt;/p&gt;

  &lt;p&gt;
    To make things worse, many of these tools do not only take input
    and produce output, but they are also &lt;em&gt;generic&lt;/em&gt; and need to
    be &lt;em&gt;specialized&lt;/em&gt; with some configuration for a specific
    programming language. This idea has been explained in the article
    &lt;a
    href="http://www.cs.uu.nl/groups/ST/Merijn/PaperGrammarsAsContracts"&gt;Grammars
    as Contracts&lt;/a&gt;. The syntax definition of a programming language
    can be used to configure these generic tools in the typical
    pipeline of a program transformation. In this way, the grammar is
    used to specialize these tools to language-specific tools. The
    disadvantage of this approach is that all these tools need to be
    configured and that this configuration is not just the syntax
    definition, but for example a parse table, a pretty-print table,
    an abstract syntax definition, and so on. Users need to know all
    these file types, tools, and their configuration.
  &lt;/p&gt;

  &lt;p&gt;
    We are all aware of this issue, but for some reason we never
    really tried to solve this, except by adding more abstract and
    easy to use tools. In a discussion with &lt;a
    href="http://journal.boblycat.org/karltk/"&gt;Karl Trygve
    Kalleberg&lt;/a&gt; we realized that this really needs to be improved,
    since StrategoXT users now need to apply far too many tools for
    generating all the configuration files required for a basic
    source-to-source program transformation:
  &lt;/p&gt;

  &lt;ul&gt;
    &lt;li&gt;
      &lt;code&gt;pack-sdf&lt;/code&gt;, for collecting a set of SDF modules into
      a single syntax definition.
    &lt;/li&gt;

    &lt;li&gt;
      &lt;code&gt;sdf2table&lt;/code&gt;, for creating a parse table.
    &lt;/li&gt;

    &lt;li&gt;
      &lt;code&gt;sdf2rtg&lt;/code&gt;, for creating an abstract syntax
      definition.
    &lt;/li&gt;

    &lt;li&gt;
      &lt;code&gt;rtg2sig&lt;/code&gt;, for creating a Stratego signature
      (abstract data type declarations for Stratego).
    &lt;/li&gt;

    &lt;li&gt;
      &lt;code&gt;ppgen&lt;/code&gt;, for generating a pretty-printer.
    &lt;/li&gt;
  &lt;/ul&gt;

  &lt;p&gt;
    Obviously, this situation cannot be explained to new users. They
    have to know all these tools and the file types they operate on
    (&lt;code&gt;.sdf, .def, .tbl, .rtg, .str, .pp&lt;/code&gt;). Having all these
    tools and file types is not really a bad thing, but we provide no
    mechanism to &lt;em&gt;abstract&lt;/em&gt; over these tools and files. Karl
    came up with a very practical solution to this problem (he also &lt;a
    href="http://journal.boblycat.org/karltk/archives/000097.html"&gt;blogged
    about it&lt;/a&gt;). We need a &lt;em&gt;single tool: xtar&lt;/em&gt; that applies
    all these programs and produces a single file: an XT archive
    (&lt;code&gt;.xtar&lt;/code&gt;). This archive can be passed to all the tools
    in StrategoXT and they just take out all the configuration files
    that they need to do their work. So, the user no longer needs to
    know all generators of configuration files, nor does he need to
    know all these file types. This is a huge improvement for
    new (but also experienced) users. We hope to implement this as
    soon as possible! A little bit more information is &lt;a
    href="http://www.stratego-language.org/Stratego/XtArchive"&gt;available&lt;/a&gt;
    at the Stratego Wiki.
  &lt;/p&gt;
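
  &lt;p&gt;
    The archive idea can be sketched in a few lines of Python. This is
    only an illustration, not the actual design: the zip container, the
    function names &lt;code&gt;build_archive&lt;/code&gt; and
    &lt;code&gt;extract_config&lt;/code&gt;, and the mapping from file extension
    to configuration kind are all assumptions made for the example, and
    the file contents are placeholders.
  &lt;/p&gt;

```python
import io
import zipfile

# Assumed mapping from file extension to the kind of configuration it holds.
# The extensions appear in the post; the pairing with descriptions is a guess.
KINDS = {
    ".def": "packed syntax definition",
    ".tbl": "parse table",
    ".rtg": "abstract syntax definition",
    ".str": "Stratego signature",
    ".pp": "pretty-print table",
}


def build_archive(language, artifacts):
    """Bundle generated configuration files into one in-memory archive.

    `artifacts` maps an extension such as ".tbl" to the file's bytes.
    A zip file stands in for the (hypothetical) .xtar container here.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for extension, data in artifacts.items():
            if extension not in KINDS:
                raise ValueError("unknown configuration type: " + extension)
            archive.writestr(language + extension, data)
    return buffer.getvalue()


def extract_config(archive_bytes, extension):
    """Return the first member with the given extension, as a tool would."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(extension):
                return archive.read(name)
    raise KeyError("archive has no " + extension + " file")


# Placeholder contents; real files would come from the generators listed above.
xtar = build_archive("Java", {".tbl": b"parse table", ".pp": b"pretty-print table"})
parse_table = extract_config(xtar, ".tbl")
```

  &lt;p&gt;
    A tool asks only for the extension it needs; which generators
    produced the files stays hidden behind the archive.
  &lt;/p&gt;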

  &lt;p&gt;
    The xtar file and tool are a good example of encapsulation:
    instead of exposing the internal implementation of our tools, we
    now just expose the concept of an XT archive for a language. In this
    way, we can also change the implementation details more easily,
    which used to be quite difficult in the past. It is interesting to
    see that this idea corresponds exactly to widely applied
    object-oriented design techniques. Why haven't we learned from
    this earlier? Maybe we should apply more design patterns to our
    set of command-line tools?
  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-109721733372931017?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/109721733372931017/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=109721733372931017' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109721733372931017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109721733372931017'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/10/tool-hell-and-encapsulation.html' title='Tool Hell and Encapsulation'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-109708307528220903</id><published>2004-10-06T19:10:00.000+02:00</published><updated>2004-10-06T19:54:39.043+02:00</updated><title type='text'>Research and Blogging</title><content type='html'>&lt;p&gt;
Yesterday &lt;a href="http://journal.boblycat.org/karltk"&gt;Karl Trygve Kalleberg&lt;/a&gt; returned from his trip to the &lt;a href="http://se.inf.ethz.ch/laser/2004/index.htm"&gt;LASER Summer School on Software Engineering&lt;/a&gt; (Elba!) and Norway. Having Karl in Utrecht is really great. He is (over?)loaded with fresh ideas for improving our software and he's always ready for a good discussion.
&lt;/p&gt;
&lt;p&gt;
Yesterday, he convinced me to start blogging. That wasn't too difficult, since I love weblogs. It's truly amazing how many bright people are communicating their ideas and thoughts in their weblogs. This is not limited to the famous guys in the computer industry and blogging scene (such as &lt;a href="http://patricklogan.blogspot.com"&gt;Patrick Logan&lt;/a&gt;, &lt;a href="http://www.iunknown.com"&gt;John Lam&lt;/a&gt; and &lt;a href="http://seanmcgrath.blogspot.com"&gt;Sean McGrath&lt;/a&gt;). Less well-known bloggers write very interesting and well-phrased stuff as well. For example, &lt;a href="http://www.zefhemel.com/"&gt;Zef Hemel&lt;/a&gt; (a student at the University of Groningen, The Netherlands) maintains a very good weblog. Almost every day he adds an interesting story, straight from his mind. Isn't that great?
&lt;/p&gt;
&lt;p&gt;
This network of bloggers is incredibly powerful. You are no longer thinking for yourself: the blogging scene is a hive mind that spreads knowledge at a faster pace than ever before (at least for software development). I'm a PhD student specializing in software technology, so I'm part of what you might call the research community. What surprises (and maybe disappoints) me is the limited number of researchers that are blogging. If blogging is such an excellent medium for spreading knowledge, then why aren't we part of it?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-109708307528220903?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/109708307528220903/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=109708307528220903' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109708307528220903'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/109708307528220903'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/10/research-and-blogging.html' title='Research and Blogging'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6943366.post-108430169100774603</id><published>2004-05-11T20:54:00.000+02:00</published><updated>2004-05-11T20:54:51.006+02:00</updated><title type='text'>First Post</title><content type='html'>Blogger now has a comment feature, which encourages me to try this thing again :) .&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6943366-108430169100774603?l=mbravenboer.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mbravenboer.blogspot.com/feeds/108430169100774603/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6943366&amp;postID=108430169100774603' 
title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/108430169100774603'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6943366/posts/default/108430169100774603'/><link rel='alternate' type='text/html' href='http://mbravenboer.blogspot.com/2004/05/first-post.html' title='First Post'/><author><name>Martin</name><uri>http://www.blogger.com/profile/10341098356068600999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry></feed>
