Strings in Java

At my first big programming job, we were building web applications using Servlets and EJBs. This was before Java web application frameworks had been invented: Servlets and EJBs were still in their infancy, and JSPs were not even on the map. Everybody was feeling out how to use the technology properly, and a lot of mistakes were made along the way.

The first thing I was tasked with was helping the existing team to improve performance in an application that had just been launched. Looking back, I cringe at how badly this application was written, but at the time it was the best the team could come up with. While there were many problems with this app, performance was the showstopper: loading a page took an average of 60 seconds.

The app was pulling large amounts of data from an Oracle database using stored procedures, and building presentation HTML right there in the servlet layer. We couldn’t afford software to properly analyze the app, so we wrote a quick timer and started doing simple start/stop events throughout the loading of particularly slow pages, tracking the time it took to perform specific actions.

We slowly figured out that no one part of the app was egregiously slow; the whole thing was just uniformly slow. Obviously loops through result sets were taking up the bulk of the time, but inside the loops each statement ran at the same snail’s pace.

Searching the web didn’t help. I’m not sure if my google-fu was just in its infancy back then, or if there really was no information out there about this problem. Eventually I started writing simple test servlets and tried to do the same task in different ways, hoping to flush out the piece of code that was causing the slowdown. As it turned out, I stumbled upon the solution in Sun’s Javadocs.

The Problem

In Java, Strings are immutable (they cannot be altered after they are created). One thing this makes possible is safe sharing. Let’s say you’re going to display an invoice to a customer. You have their name in a string, pulled from the customer table. You read in the invoice, and you have their name again, twice, from the billing and shipping information. Now you have three string variables all storing the same static bit of information. Because a String can never change, Java can let all three variables point to the same chunk of memory rather than waste 3x the space. It does this automatically for identical string literals, and for other strings when you copy a reference or call intern(); values read separately from a database are not shared on their own.

Now you might think this would be a bad thing: what if the customer changes their billing address name? Because strings cannot be changed after they are created, every time you think you’re changing a string, you’re actually creating a brand new string and pointing your variable to the new string. Any other variable that pointed to the old string still sees the old value. In the event that the old string has no more variables pointing to it, it will be garbage collected.
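Here is a minimal sketch of that behaviour. The class name, the addSuffix helper, and the sample name are made up for illustration; new String(...) simply stands in for a value that arrives at run time, such as one read from a database.

```java
class ImmutableStringDemo {
    // Returns a "changed" name; the String passed in is left untouched.
    static String addSuffix(String name) {
        return name + " Jr.";
    }

    public static void main(String[] args) {
        String billingName = "Jane Doe";
        String shippingName = billingName; // two variables, one String object

        // This builds a new String; only billingName is pointed at it.
        billingName = addSuffix(billingName);

        System.out.println(billingName);  // Jane Doe Jr.
        System.out.println(shippingName); // Jane Doe

        // Identical literals share one object. A String created at run time
        // does not, unless it is interned explicitly.
        String literal = "Jane Doe";
        String fromDatabase = new String("Jane Doe"); // hypothetical run-time value
        System.out.println(literal == shippingName);          // true
        System.out.println(literal == fromDatabase);          // false
        System.out.println(literal == fromDatabase.intern()); // true
    }
}
```

The == comparisons are there only to show object identity; normal code should compare string contents with equals().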

To understand how this whole mess affected our application, you need to see how we were building our HTML. Building a page went something like this:

...
String body = new String();
body += "<table border='1'>";
body += "<tr>";
body += "    <th>Column 1</th>";
body += "    <th>Column 2</th>";
body += "</tr>";

while( rs.next() )
{
    body += "<tr>";
    body += "    <td>"+rs.getString(1)+"</td>";
    body += "    <td>"+rs.getFloat(2)+"</td>";
    body += "</tr>";
}
body += "</table>";
...
return header + body + footer;

That’s right, every line of HTML for every page was built by string concatenation. With strings, when body += "html here" is used, it creates a new string containing the concatenated contents of both strings, then points body to the new string. This means that the old string body pointed to, along with the HTML that was concatenated with it, is now floating around waiting for garbage collection. Each and every line of HTML created using this method has the side effect of creating one or more new String objects, and each new object holds a copy of everything already in body, so every line costs more than the one before it. All that allocation and copying is expensive, and the garbage collection for these excess strings was nearly as expensive.
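To get a feel for how quickly this adds up, here is a rough sketch that builds a body with += and totals the length of every intermediate string. The class name, method name, and the 10-character row fragment are invented for the illustration; it counts characters rather than measuring time, so treat it as a model of the cost and not a benchmark.

```java
class ConcatCostDemo {
    // Builds a body with += and totals the length of every intermediate String,
    // i.e. the number of characters that had to be written into new objects.
    static long charsCopiedByConcat(int rows, String fragment) {
        String body = "";
        long copied = 0;
        for (int i = 0; i < rows; i++) {
            body += fragment;        // allocates a new String: old body plus fragment
            copied += body.length(); // every character of the new String was copied
        }
        return copied;
    }

    public static void main(String[] args) {
        String fragment = "<tr></tr>\n"; // 10 characters
        System.out.println(charsCopiedByConcat(1000, fragment)); // 5005000
        System.out.println(charsCopiedByConcat(2000, fragment)); // 20010000
    }
}
```

Doubling the number of rows roughly quadruples the characters copied, even though the finished page is only twice as long. Appending the same fragments to a single mutable buffer copies each character about once.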

The Solution

The solution was actually pretty trivial, but time-consuming to implement. Java has a StringBuffer class which is specifically designed for this type of problem. It stores the same information as a String, but in a mutable fashion. This allows us to alter its contents without the hassle of object creation and garbage collection. The fixed code looked like so:

...
StringBuffer body = new StringBuffer();
body.append( "<table border='1'>" );
body.append( "<tr>" );
body.append( "    <th>Column 1</th>" );
body.append( "    <th>Column 2</th>" );
body.append( "</tr>" );

while( rs.next() )
{
    body.append( "<tr>" );
    body.append( "    <td>").append( rs.getString(1) ).append( "</td>" );
    body.append( "    <td>").append( rs.getFloat(2)  ).append( "</td>" );
    body.append( "</tr>" );
}
body.append( "</table>" );
...
return (new StringBuffer( header.toString() )
                 .append( body )
                 .append( footer )
       ).toString();
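If you want something you can compile and run, here is a self-contained sketch of the same idea. It is not the original application code: the class name, the renderTable method, and the two arrays standing in for the ResultSet columns are assumptions made for the example. It also uses StringBuilder, which has been available since Java 5 and offers the same append calls as StringBuffer without the synchronization; the code above used StringBuffer.

```java
class TableRenderer {
    // names and amounts are hypothetical stand-ins for the two ResultSet columns.
    static String renderTable(String[] names, float[] amounts) {
        StringBuilder body = new StringBuilder();
        body.append("<table border='1'>");
        body.append("<tr><th>Column 1</th><th>Column 2</th></tr>");
        for (int i = 0; i < names.length; i++) {
            body.append("<tr>");
            body.append("<td>").append(names[i]).append("</td>");
            body.append("<td>").append(amounts[i]).append("</td>");
            body.append("</tr>");
        }
        body.append("</table>");
        return body.toString(); // the only String created for the whole table
    }

    public static void main(String[] args) {
        String html = renderTable(
                new String[] {"Widget", "Gadget"},
                new float[] {9.5f, 12.25f});
        System.out.println(html);
    }
}
```

Every append writes into the same buffer, so the page is turned into a String exactly once, at the end.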

For anybody doing Java web apps now, you just won’t run into this problem as often. JSPs allow for much more readable presentation code, and you rarely run around appending to your html like this. That said, the misuse of strings in Java is in no way limited to this example here. The problem crops up more than you’d imagine, with experienced and inexperienced developers alike.

New Job, Same Problem

At my second big computer job, I was surrounded by some very smart people. They had been using Java for years, and were currently maintaining a huge, high performance web app using JSPs, Servlets, and EJBs. They were handling exclusively backend work, and passing off finished code to webmonkeys to design UIs around. I remember sitting in my little cubicle, poring over code rapidly, trying to get a grasp of how the application worked. I overheard three of the senior developers discussing a problem a few cubicles over. Much to my surprise, they were having performance problems with a new piece of code that sounded suspiciously like it was an immutable string issue.

I listened for a minute or two, then swaggered over to solve their problem. What took days to figure out at my first job was related and then fixed in mere minutes here, and the performance issue disappeared. I’m always reminded of this moment, and how even experienced developers don’t know everything. There’s always something more to learn.
