If you haven't been hiding under some really impressive rock for the last decade, then you probably know that injection attacks are a major issue in web applications. The problem of SQL injection is well-known, but you see similar issues everywhere: SQL, Shell, XML, HTML, LDAP search filters, XPath, XQuery, and a whole series of enterprisey query languages, such as HQL, JDOQL, EJBQL, and OQL, are all potential candidates for injection attacks. Just search for any of these languages together with the term "injection" and observe the horror. Recently, it has also become more popular to mix a program written in Java with scripts, usually something like JavaScript, Ruby or Groovy. If you include user input in the script, then this is yet another vector of attack.
Solutions?
Of course it is possible to just advise programmers to properly escape all user input, which prevents most injection attacks. However, that's like telling people to do their own memory management or to do the dishes every day (which is a particular problem I have). In other words: you won't get it right.
Most of the research on injection attacks has focused on finding injection problems in existing source code using static and/or runtime analysis. Usually, this results in tools that check for injection attacks for specific languages (e.g. SQL) in specific host languages (e.g. PHP). This is very important and useful work, since it can easily be applied to detect or prevent injection attacks in existing code bases. However, at some point we just fundamentally need to reconsider the way we program. Why just fight the symptoms if you can fix the problem?
So that's what we've done in our latest work, called StringBorg. I'm not going to claim that all your injection problems will be over tomorrow, but at least I think that what we propose here gives us some perspective on solving these issues once and for all in a few years. The solution we propose is to use syntax embeddings of the guest languages (SQL, LDAP, Shell, XPath, JavaScript) in the host language (PHP, Java) and let the system do all the proper escaping and positive checking of user input.
Examples
The paper I'll mention later explains all the technical details, and I cannot redo that in a better way in a blog, so I'll just give a bunch of examples that illustrate how it works.
SQL
The first example is an embedding of SQL in Java. This example illustrates how you can insert strings in SQL queries and compose SQL queries at runtime. The first code fragment is the classic, vulnerable, way of composing SQL queries using string concatenation.
String s = "'; DROP TABLE Users; --";
String e = "username = \'" + s + "\'";
String q = "SELECT password FROM Users WHERE " + e;
System.out.println(q);
Clearly, if the string s was provided by the user, then this would result in an injection attack: the final query is SELECT password FROM Users WHERE username = ''; DROP TABLE Users; --'. Bad luck, the Users table is gone! (Or maybe you can thank your database administrator.)
With StringBorg, you can introduce some kind of literal syntax for SQL. The SQL code is written between the quotation symbols <|...|>. SQL code or strings can be inserted in another SQL query using the syntax ${...}. The example would be written in StringBorg as:
String s = "'; DROP TABLE Users; --";
SQL e = <| username = ${s} |>;
SQL q = <| SELECT password FROM Users WHERE ${e} |>;
System.out.println(q.toString());
This will result in the correct query, SELECT password FROM Users WHERE username = '''; DROP TABLE Users; --', where the single quotes have been escaped by StringBorg according to the rules of the SQL standard (the exact escaping rules depend on the SQL dialect). Not only does the StringBorg solution solve the injection problem, it is also much prettier! This example also shows that it is not required to know the full SQL query at compile-time. For example, the actual condition e could be different for two branches of an if statement, or could even be constructed in a while statement.
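To make the escaping step a bit more concrete, here is a minimal sketch in plain Java of what conceptually happens to the string at runtime. This is not StringBorg's implementation or API: the class SqlSketch and its methods code, string and then are hypothetical names made up for this illustration, and the sketch only covers the standard rule of doubling single quotes (other dialects need other rules). It also does none of the compile-time syntax checking. The point is just that SQL code and plain strings are kept apart, so a string can only enter a query through the escaping path.

```java
// Hypothetical sketch, NOT StringBorg's API: all names here are invented.
final class SqlSketch {
    private final String text;

    private SqlSketch(String text) {
        this.text = text;
    }

    /** Trusted SQL text written by the programmer (stands in for <|...|>). */
    static SqlSketch code(String trustedSql) {
        return new SqlSketch(trustedSql);
    }

    /** Untrusted string: becomes a SQL string literal with single quotes doubled. */
    static SqlSketch string(String value) {
        return new SqlSketch("'" + value.replace("'", "''") + "'");
    }

    /** Appends another fragment (stands in for ${...} at the end of a quotation). */
    SqlSketch then(SqlSketch next) {
        return new SqlSketch(text + next.text);
    }

    @Override
    public String toString() {
        return text;
    }

    public static void main(String[] args) {
        String s = "'; DROP TABLE Users; --";
        SqlSketch e = SqlSketch.code("username = ").then(SqlSketch.string(s));
        SqlSketch q = SqlSketch.code("SELECT password FROM Users WHERE ").then(e);
        // prints: SELECT password FROM Users WHERE username = '''; DROP TABLE Users; --'
        System.out.println(q);
    }
}
```

Because the constructor is private, the only way to get user input into a query in this sketch is through string, which always escapes.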
The nice thing about StringBorg is that the SQL support is not restricted to a specific language, in this case Java. For PHP, you can do exactly the same thing:
$s = "'; DROP TABLE Users; --";
$e = <| username = ${$s} |>;
$q = <| SELECT password FROM Users WHERE ${$e} |>;
echo $q->toString(), "\n";
LDAP
Using user input in LDAP search filters has very similar injection problems. First a basic example, where there is no problem with the user input:
String name = "Babs Jensen";
LDAP q = (| (cn=$(name)) |);
System.out.println(q.toString());
The resulting LDAP filter will be (cn=Babs Jensen), which is what you would expect. If the string has the value Babs (Jensen), then the parentheses need to be escaped. Indeed, StringBorg will produce the filter (cn=Babs \28Jensen\29). This input might have been an accident, but of course we can easily change this into a real injection attempt by using the string *. Again, StringBorg will properly escape this, resulting in the query (cn=\2a).
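For the curious, the escaping rule behind these outputs can be sketched in a few lines of plain Java. This is only an illustration under my own assumptions, not StringBorg's code: the class LdapEscapeSketch and its methods are hypothetical names, and the rule implemented is the one from RFC 4515, where the characters *, (, ), \ and NUL in a filter value are written as a backslash followed by two hex digits.

```java
// Hypothetical sketch, NOT StringBorg's API: escapes a value for use inside an
// LDAP search filter, following the escaping rule of RFC 4515.
final class LdapEscapeSketch {
    static String escapeValue(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break; // backslash
                case '*':  out.append("\\2a"); break; // wildcard
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break; // NUL
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the (cn=...) filter used in the examples above. */
    static String cnFilter(String name) {
        return "(cn=" + escapeValue(name) + ")";
    }

    public static void main(String[] args) {
        System.out.println(cnFilter("Babs Jensen"));   // prints: (cn=Babs Jensen)
        System.out.println(cnFilter("Babs (Jensen)")); // prints: (cn=Babs \28Jensen\29)
        System.out.println(cnFilter("*"));             // prints: (cn=\2a)
    }
}
```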
Shell
Programs that invoke shell commands could be vulnerable to injection attacks as well (as the TWiki developers and users have learned the hard way). Similar to the other examples, StringBorg introduces a syntax to construct shell commands and escape strings:
Shell cmd = <| /bin/echo svn cat http://x -r <| s |> |>;
System.out.println(cmd.toString());
If s has the values bravo, foo bar, * and ; echo pwn3d! respectively, then the resulting commands are:
/bin/echo svn cat http://x -r bravo
/bin/echo svn cat http://x -r foo\ bar
/bin/echo svn cat http://x -r \*
/bin/echo svn cat http://x -r \;\ echo\ pwn3d\!
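A backslash-based escaping rule that reproduces the four commands above could look roughly like the following plain Java sketch. The class ShellEscapeSketch and its methods are hypothetical names, and the choice of "safe" characters is my own assumption rather than the rule StringBorg actually uses. The sketch targets a POSIX-style shell, passes non-ASCII characters through unchanged, and wraps newlines in single quotes, since a backslash followed by a newline would be read as a line continuation.

```java
// Hypothetical sketch, NOT StringBorg's API: backslash-escapes one argument
// for a POSIX-style shell. The set of "safe" characters is an assumption.
final class ShellEscapeSketch {
    static String escapeArg(String arg) {
        if (arg.isEmpty()) {
            return "''"; // an empty argument must still show up as a word
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || "_-./:=,@%+".indexOf(c) >= 0
                    || c > 127; // non-ASCII is passed through unchanged
            if (c == '\n') {
                out.append("'\n'"); // backslash-newline would be a line continuation
            } else if (safe) {
                out.append(c);
            } else {
                out.append('\\').append(c);
            }
        }
        return out.toString();
    }

    /** Builds the command from the example above with s escaped. */
    static String command(String s) {
        return "/bin/echo svn cat http://x -r " + escapeArg(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"bravo", "foo bar", "*", "; echo pwn3d!"}) {
            System.out.println(command(s));
        }
    }
}
```

Note that escaping only guarantees that s stays a single argument. Whether the invoked program does something unpleasant with that argument (for instance, a value starting with a dash) is a separate question.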
JavaScript
Not only does StringBorg prevent injection attacks, it also makes composing SQL, XQuery, JavaScript, etc. more attractive: you don't have to concatenate all these nasty strings anymore. For example, the following example, taken from an article on the new Java scripting support, is just plain ugly:
jsEngine.eval(
  "function printNames1(namesList) {" +
  " var x;" +
  " var names = namesList.toArray();" +
  " for(x in names) {" +
  " println(names[x]);" +
  " }" +
  "}" +
  "function addName(namesList, name) {" +
  " namesList.add(name);" +
  "}"
);
whereas this looks quite reasonable:
jsEngine.eval(|[
  function printNames1(namesList) {
    var x;
    var names = namesList.toArray();
    for(x in names) {
      println(names[x]);
    }
  }
  function addName(namesList, name) {
    namesList.add(name);
  }
]| );
Of course, this would be easy to fix by introducing multi-line string literals in Java, but in addition to the nicer syntax, you get protection against injection attacks and compile-time syntactic checking of the code for free!
Generic, generic, generic
Now, if you are familiar with our work, then this solution won't really surprise you, since we have been working on syntax embeddings for some time now (although in different application areas, such as meta programming). However, this work is quite a fundamental step towards making these syntax embeddings easier to use by ordinary programmers. First, the system now supports ambiguities, which was always the weak point of our code generation work: if you don't support ambiguities, then the programmer needs to be familiar with the details of the grammar of the guest language, which you really don't want. Fortunately, this is now a technical detail that you can forget about! Second, no meta-programming is required at all to add a new guest language (e.g. XPath) to the system. All you need to do is define the syntax of the language, define the syntax of the embedding, and optionally define escaping rules for strings, and you're all set. Thus, compared to our previous work on MetaBorg (OOPSLA '04), there is no need for implementing the mapping from the syntax of the guest language to code in the host language.
This is a pretty amazing property: basically, this means that you can just use languages as libraries. You can just pick the languages you want to use in a source file and that's it! No difficult meta-programming stuff, no program transformation, no rewrite rules and strategies, no limitations. In fact, this even goes beyond libraries: libraries are always language specific (for example for Java or PHP), but the implementation of support for a guest language (e.g. SQL) is language independent. This means that if some person or company implements support for a guest language (e.g. SQL) then all host languages (Java, PHP, etc) are immediately supported.
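The idea that a guest language boils down to "a syntax plus an escaping rule" can be loosely illustrated in plain Java. Everything in this sketch is hypothetical: GuestLanguageSketch, EscapeRule and splice are names invented for the illustration, and it fills holes by simple string substitution, whereas the real system works on the syntax of the guest language and checks it at compile-time. What it does show is that the composition machinery is written once and only the escaping rule differs per language.

```java
// Hypothetical sketch, NOT StringBorg's API: one generic splice mechanism,
// with the escaping rule as the only language-specific ingredient.
final class GuestLanguageSketch {
    /** How to turn a raw string into safe text of the guest language. */
    interface EscapeRule {
        String escape(String raw);
    }

    private final String name;
    private final EscapeRule rule;

    GuestLanguageSketch(String name, EscapeRule rule) {
        this.name = name;
        this.rule = rule;
    }

    /** Fills the first "${}" hole of a trusted template with an escaped, untrusted string. */
    String splice(String template, String untrusted) {
        int hole = template.indexOf("${}");
        if (hole < 0) {
            throw new IllegalArgumentException(name + " template has no ${} hole");
        }
        return template.substring(0, hole)
                + rule.escape(untrusted)
                + template.substring(hole + 3);
    }

    public static void main(String[] args) {
        // Each guest language is "added" by supplying only its escaping rule.
        GuestLanguageSketch sql = new GuestLanguageSketch("SQL",
                raw -> "'" + raw.replace("'", "''") + "'");
        GuestLanguageSketch ldap = new GuestLanguageSketch("LDAP",
                raw -> raw.replace("\\", "\\5c").replace("*", "\\2a")
                          .replace("(", "\\28").replace(")", "\\29"));

        // prints: SELECT password FROM Users WHERE username = 'O''Brien'
        System.out.println(sql.splice("SELECT password FROM Users WHERE username = ${}", "O'Brien"));
        // prints: (cn=Babs \28Jensen\29)
        System.out.println(ldap.splice("(cn=${})", "Babs (Jensen)"));
    }
}
```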
Future?
The paper we wrote about this is titled "Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach" and is now available as a technical report. Last week, we submitted this paper to the USENIX Security Symposium. We won't know if the paper is accepted until April 4, but I would be flabbergasted if it got rejected ;) . Our prototype implementation, called StringBorg, is available as well. I'm looking forward to your feedback and opinions. I'll add some examples to the webpage later this week, so make sure to come back!
Peter Ahé already has a general solution for embedding foreign languages on his wish list (as opposed to an XML specific solution), so could this actually be happening in the near future?