Joe Walnes
  Blog



About Joe Walnes

I am a software engineer for Google, based in London.

Open Source

WebStuff (coming soon)

XStream

ActiveMQ

SiteMesh

QDox

nMock

jMock

Pico Container

Nano Container

OpenSymphony

Squiggle

MockDoclet

MockObjects

Jelly

Groovy

PatternStitcher

XJB

Books

Java Open Source Programming, Wiley

JSP Site Design, Wrox

Talks

Mock Roles, not Objects
October 26 2004, Vancouver, Canada. OOPSLA'04

Personal Development Practices Map
June 24 2004, Salt Lake City, Utah. Agile Development Conference

SiteMesh.NET and ASP.NET MasterPages
May 20 2004, Bangalore, India. Bangalore .NET User Group

Mock Objects: Driving Top Down Development
March 29 2004, St Neots, UK. OT2004

Mock Objects
December 2 2003, London, UK. XP Day 3


Creative uses of Hamcrest matchers

The matcher API of Hamcrest is typically associated with assertThat() or mocks. I always knew other people would find good uses for it, but I never really knew what.

I particularly like these:

Collection processing

Håkan Råberg blogged about how Hamcrest can be used with iterators:

List<Integer> numbers = Arrays.asList(-1, 0, 1, 2);
List<Integer> positiveNumbers = detect(numbers, greaterThan(0));

List<String> words = Arrays.asList("cheese", "lemon", "spoon");
List<String> wordsWithoutE = reject(words, containingString("e"));

Nothing rocket-sciencey about it. But it's simple and useful because it reduces boilerplate code and you get to use the ever-growing library of Hamcrest matchers.
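To make the idea concrete, here's a minimal sketch of matcher-driven filtering. It is not Håkan's code: `SimpleMatcher` is a hypothetical stand-in for a Hamcrest-style matcher, and `select`, `reject` and the two hand-written matchers are names I've assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hamcrest-style matcher; not the real org.hamcrest.Matcher.
interface SimpleMatcher<T> {
    boolean matches(T item);
}

class MatcherCollections {

    // Keeps the items the matcher accepts.
    static <T> List<T> select(List<T> items, SimpleMatcher<? super T> matcher) {
        List<T> result = new ArrayList<T>();
        for (T item : items) {
            if (matcher.matches(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Keeps the items the matcher refuses.
    static <T> List<T> reject(List<T> items, SimpleMatcher<? super T> matcher) {
        List<T> result = new ArrayList<T>();
        for (T item : items) {
            if (!matcher.matches(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Illustrative matchers, written by hand for this sketch.
    static SimpleMatcher<Integer> greaterThan(final int bound) {
        return new SimpleMatcher<Integer>() {
            public boolean matches(Integer item) {
                return item > bound;
            }
        };
    }

    static SimpleMatcher<String> containingString(final String part) {
        return new SimpleMatcher<String>() {
            public boolean matches(String item) {
                return item.contains(part);
            }
        };
    }
}
```

The loop is written once and the interesting part, the rule, is passed in. Any matcher you already have can be reused without touching the collection code.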

On top of that, combining Hamcrest with a CGLib generated proxy, he has built a statically typed query API:

List<Person> employees = ...;
List<Integer> allAges 
        = collect(from(employees).getAge());
List<Person> allBosses 
        = collect(from(employees).getDepartment().getBoss());
List<Person> allAccountants
        = select(from(employees).getDepartment().getName(), 
                 containingString("Accounts"));

This is a nice alternative to a string based query language as you get your IDE completions, refactoring, compile time checking etc, without the noise of boilerplate code.

Web testing

Robert Chatley has taken some of the concepts of his LiFT framework and reimplemented them using Hamcrest and WebDriver for performing web testing.

public void testHasLotsOfLinks() {
  goTo("http://some/url");
  assertPresenceOf(greaterThan(15), links());
  assertPresenceOf(atLeast(1), link().with(text(containingString("Sign in"))));

  clickOn(link().with(text(containingString("Sign in"))));
  assertPresenceOf(exactly(1), title().with(text(equalTo("Sign in page"))));
}

Now initially this seems a bit wordy and strange. Robert has designed this as a literate API. If you adjust the syntax highlighting of your IDE to make the Java keywords and syntax less visible, you get this:

goTo "http://some/url"
assertPresenceOf greaterThan 15 links
assertPresenceOf atLeast 1 link with text containingString "Sign in"

clickOn link with text containingString "Sign in"
assertPresenceOf exactly 1 title with text equalTo "Sign in page"

The motivation here is that the API usage is self documenting and could be useful to non-programmers. The flip-side to this is that it's actually quite hard to write APIs like this and the usage can take quite a bit of getting used to.

Robert also introduced a Finder interface (the link() and title() methods return Finder implementations). This allows you to factor out your own UI specific components:

assertPresenceOf(atLeast(1), signInLink());
clickOn(signInLink());
assertPresenceOf(exactly(1), 
  blogLink().with(urlParameter("name", containingString("joe"))));

This is the bit I really like.

It allows abstractions of components and matching rules to be combined in many different ways, so tests can check exactly what they need to, resulting in less brittle tests that are easier to maintain.

Other uses

As I hear of other uses I'm listing them on the Hamcrest wiki.

When it goes bad

Of course, like any technology, it's easy to get carried away.

Here's an example of Hamcrest gone bad:

assertThat(myNumber, anyOf(equalTo(0), allOf(greaterThan(5), lessThan(10))));

I'm not a LISP programmer, so I find that really hard to understand. Just because we have an assertTHAT() method, we don't have to use it all the time. In this case it's much simpler to use plain old assertTRUE():

assertTrue("myNumber should be 0 or between 5 and 10", 
        myNumber == 0 || (myNumber > 5 && myNumber < 10));

Even though the non-Matcher version is longer (it could be shortened by leaving out the message and using a shorter variable name, but that would make it harder to understand), I find it much easier to understand.

But, what if you actually needed to use a matcher (e.g. for the web testing or collection processing examples above)?

One approach is to use higher level matchers that are composed of other matchers:

matcher = anyOf(equalTo(0), allOf(greaterThan(5), lessThan(10)));
// simplifies to
matcher = anyOf(equalTo(0), between(5, 10));

Complete tangent: An alternative to between(5, 10) is between(5).and(10). The latter makes for more literate code, but is harder to implement - again a design tradeoff.
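To show why the literate form costs more to implement, here's a rough sketch of both styles side by side. `RangeMatcher` and `Between` are hypothetical classes made up for this example (they are not part of Hamcrest), and I've assumed exclusive bounds.

```java
// Hypothetical helper classes for illustration; not part of Hamcrest.
class RangeMatcher {
    private final int lower;
    private final int upper;

    RangeMatcher(int lower, int upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // Exclusive bounds, matching the n > 5 && n < 10 check used above.
    boolean matches(int n) {
        return n > lower && n < upper;
    }
}

class Between {
    private final int lower;

    private Between(int lower) {
        this.lower = lower;
    }

    // Plain form: between(5, 10) builds the matcher in one call.
    static RangeMatcher between(int lower, int upper) {
        return new RangeMatcher(lower, upper);
    }

    // Literate form, step one: between(5) returns an intermediate object...
    static Between between(int lower) {
        return new Between(lower);
    }

    // ...step two: and(10) completes the matcher.
    RangeMatcher and(int upper) {
        return new RangeMatcher(lower, upper);
    }
}
```

The plain form is a single factory method. The literate form needs an extra intermediate type whose only job is to carry the first argument until and() is called, which is where the additional design effort goes.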

Another approach is to create a one-off anonymous matcher implementation:

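matcher = new TypeSafeMatcher<Integer>() {
  public boolean matchesSafely(Integer n) {
    return n == 0 || (n > 5 && n < 10);
  }
  public void describeTo(Description description) {
    description.appendText("0 or a number between 5 and 10");
  }
};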

What are you doing with Hamcrest?

Updates:

  1. JUnit 4.4 now comes with Hamcrest and assertThat().

Hamcrest 1.1 released

http://code.google.com/p/hamcrest

Testing on the Toilet

At Google we have pretty good internal documentation, tutorials and places to find good tips. If you know you don't know something, it won't take long to find the answer.

However, it's slightly tougher to place something to be read, when the target readers don't know they don't know it. This was a problem the testing group were finding, as they wanted to improve sharing of practical testing techniques.

So, Testing on the Toilet was started. A regular weekly(ish) tip posted in toilet cubicles and above urinals. Short enough to be read whilst doing your business.

Soon after, many visitors started noticing these postings and we got requests to make these available to put up in offices of other development teams.

So, we have.

http://googletesting.blogspot.com/

Each episode will be made available as a toilet friendly PDF.

Building testable AJAX apps (Does my button look big in this?)

Last week, Adam Connors and I presented "Does my button look big in this? Building testable AJAX applications." at the Google London Test Automation Conference. Unfortunately the code is unclear on the video, so you can also download the slides separately (13mb!).

QDox is back - 1.6 released

QDox history

QDox is a fast JavaDoc/Java parser built in 2002. It was originally intended as a stop gap until Java supported annotations by allowing tools to easily get access to JavaDoc attributes. Essentially it provided nothing more than a stripped down version of the JavaDoc Doclet tool, with performance suitable for using in continual build cycles (what would take JavaDoc over ten minutes to process would typically take QDox less than ten seconds). It served its purpose well.

The death of QDox

Then came along Java 5 and I stopped actively working on QDox. The first reason was that with the new annotations support, QDox wasn't necessary. The other reason was that it would take a lot of effort to update the parser to support Java 5 syntax (not just for annotations, but generics, enums, etc).

And so QDox went quiet. The dev team lost interest and the releases stopped.

QDox is reborn

It turned out, I was wrong. Even with Java supporting annotations, QDox in a Java 5 world has some benefits:


  • Some Java 5 projects still want to use JavaDoc attributes (as well as annotations). Maybe for legacy reasons.

  • QDox acts on source code, rather than byte code. This can be useful in chicken and egg situations where you need to generate source from existing source, but you can't compile until you've generated the code.

  • QDox exposes information that isn't exposed by reflection, such as names of parameters or JavaDoc comments, which are useful for building tools to help visualize code.

So, by popular demand, I'm resurrecting the project. Yay.

1.6 released

This new release is a stop-gap release. Highlights include:

  • Switched to Apache 2.0 license.
  • Parser can now deal with Java 5 source code (annotations, generics, enums, var args, etc).
  • Numerous bugfixes.

This should be enough for existing projects to carry on using it with Java 5 code.

The next release will focus on making Java 5 specific features available in the API. Stay tuned.

Java and .NET RESTful interoperability with XStream

My ex-colleagues Paul Hammant and Ian Cartwright have written an article on their experiences of building SOA applications using RESTful services in .NET and Java that could interoperate over web services and message queues. XStream made this possible.

Buzzwordtastic.

http://www.infoq.com/articles/REST-INTEROP

I've joined Google

Woooo..... this looks fun.

OSCon: SiteMesh, SiteMesh, SiteMesh, SiteMesh

Just got back from the O'Reilly Open Source Convention in Portland. Fantastic conference - met lots of really interesting people (and the odd nutter).

It was a good conference for SiteMesh. It opened my eyes to two things:

  1. SiteMesh rocks. People who have tried SiteMesh, love it and don't turn back. Their preferred choice for web framework changes, but SiteMesh remains constant.
  2. Our marketing sucks. Despite it being around for 5 years, most of the Java web app community have never felt the need to try it.

I was there to present a session on SiteMesh but a lot of other speakers beat me to it. It kept slipping into other sessions...

Using AppFuse for Test driven Web Development, Matt Raible (details)

Matt gave an overview of the technology stack used in his AppFuse application. Despite having 5 versions of his app that use different frameworks (Struts, WebWork, Tapestry, Spring MVC and JavaServer Faces), all used SiteMesh. Good!

Integrate: Building a Site from Open Source Gems, Erik Hatcher (details)

Erik walked us through the open source products he used to build his Lucene Book website and what customizations he made. The focus, of course, was Lucene and I learned a lot of great tricks about Lucene that hadn't occurred to me before - such as using "sounds like" queries with soundex and indexing images by colors. I continue to love Lucene.

A great point that Erik mentioned was the need to become intimate with the projects you use. If you truly want to make the most of your frameworks, understand how they work, join the community and extend them.

Erik chose Tapestry to build the site but he also had Blojsom and some static content, so SiteMesh was useful to integrate these and he created some custom code to build SiteMesh decorators with Tapestry.

He pointed out that despite submitting this useful Tapestry integration to the SiteMesh project, nothing had made it into the SiteMesh release. Feeling embarrassed, I committed his changes immediately, inadvertently breaking the build and providing great ammunition for Eric Pugh's session on the importance of continuous integration.

WebWork vs Spring MVC Smackdown, Matthew Porter and Matt Raible (details)

The basic plot was this... Matthew Porter was arguing why Spring MVC sucks and WebWork rocks. Matt Raible was arguing why Spring MVC rocks and WebWork sucks. The only thing they both agreed on was SiteMesh rocked. A fairly heated and passionate debate - great fun to watch. I would have opted for more violence though.

Matthew Porter got the final laugh when he pointed out that he compared the Spring MVC and WebWork versions of Matt Raible's AppFuse framework and the Spring MVC version had about 25% (I think) more code, not including comments.

(more)

The Evolution of Web Application Architectures, Craig McClanahan (details)

This was an interesting session where Craig compared the approaches taken by Struts, WebWork, Spring MVC, Tapestry and JavaServer Faces. He had done detailed research and, despite his heavy involvement with Struts and JSF, gave a very fair and objective view of the pros and cons of each.

This work could be useful for people evaluating which frameworks to choose and possibly could be overlaid with a guide based on values. The bottom line is there's no single 'ultimate' web framework and depending on your needs and values you should choose the most suitable. I think it would be beneficial to all to have a guide indicating which values each of these frameworks is suited/not-suited for.

So, my question is this: Which values are more important to you when choosing a web framework and in which priority?

These are some example values that spring to mind: commercial support, testability (unit and functional), popularity, extensibility/customization, integration with other frameworks, rich widget support, REST friendliness, simplicity vs magicness, AJAX friendliness, learning curve, configuration, etc.

Anyhoo, SiteMesh was probably mentioned enough times to attract another load of people to the SiteMesh session.


I'm really glad these people mentioned SiteMesh and said such kind words about it - it resulted in a lot of interest and a full house for the SiteMesh session. I hope to get these presentations online shortly and write a bit more about how Subversion, Microsoft Word and SiteMesh can be combined to create a rich Content Management System.

Improving SiteMesh's marketing

The fact still remains that SiteMesh has terrible marketing. I'd love some ideas of how to spread the word more and encourage more people to try it but I honestly have no idea what to do. Any suggestions?

Flexible JUnit assertions with assertThat()

Over time I've found I end up with a gazillion permutations of assertion methods in JUnit: assertEquals, assertNotEquals, assertStringContains, assertArraysEqual, assertInRange, assertIn, etc.

Here's a nicer way. jMock contains a constraint library for specifying precise expectations on mocks that can be reused in your own assertion method (and that's the last time I'm going to mention mocks today, I promise - despite the frequent references to the jMock library).

By making a simple JUnit assertion method that takes a Constraint, it provides a replacement for all the other assert methods.

I call mine assertThat() because I think it reads well. Combined with the jMock syntactic sugar, you can use it like this:

assertThat(something, eq("Hello"));
assertThat(something, eq(true));
assertThat(something, isA(Color.class));
assertThat(something, contains("World"));
assertThat(something, same(Food.CHEESE));
assertThat(something, NULL);
assertThat(something, NOT_NULL);

Okay, that's nice but nothing radical. A bunch of assert methods have been replaced with different methods that return constraint objects. But there's more...

Combining constraints

Constraints can be chained making it possible to combine them in different permutations. For instance, for virtually every assertion I do, I usually find that I need to test the negative equivalent at some point:

assertThat(something, not(eq("Hello")));
assertThat(something, not(contains("Cheese")));

Or maybe combinations of assertions:

assertThat(something, or(contains("color"), contains("colour")));

Readable failure messages

The previous example can be written using the vanilla JUnit assert methods like this:

assertTrue(something.indexOf("color") > -1 || something.indexOf("colour") > -1);

Fine, the constraint based one is easier to read. But the real beauty is the failure message.

The vanilla JUnit assert fails with:

junit.framework.AssertionFailedError:

Useless! Means you have to put an explicit error message in the assertion:

assertTrue("Expected a string containing 'color' or 'colour'",
            something.indexOf("color") > -1 || something.indexOf("colour") > -1);

But the jMock constraint objects are self describing. So with this assertion:

assertThat(something, or(contains("color"), contains("colour")));

I get this useful failure message, for free:

junit.framework.AssertionFailedError:
Expected: (a string containing "color" or a string containing "colour")
but got : hello world
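If you're curious how a message like that gets assembled, here's a minimal sketch that runs on the JDK alone. `SelfDescribing` is a hypothetical stand-in that mirrors the shape of a jMock constraint (it is not the real org.jmock.core.Constraint), and `check()` is an assumed helper that returns the failure text instead of throwing, so the result is easy to inspect.

```java
// Hypothetical stand-in mirroring the shape of a jMock constraint;
// not the real org.jmock.core.Constraint.
interface SelfDescribing {
    boolean eval(Object o);
    StringBuffer describeTo(StringBuffer buffer);
}

class Constraints {

    static SelfDescribing contains(final String substring) {
        return new SelfDescribing() {
            public boolean eval(Object o) {
                return o instanceof String && ((String) o).indexOf(substring) > -1;
            }
            public StringBuffer describeTo(StringBuffer buffer) {
                return buffer.append("a string containing \"").append(substring).append('"');
            }
        };
    }

    // Combines two constraints and also combines their descriptions.
    static SelfDescribing or(final SelfDescribing left, final SelfDescribing right) {
        return new SelfDescribing() {
            public boolean eval(Object o) {
                return left.eval(o) || right.eval(o);
            }
            public StringBuffer describeTo(StringBuffer buffer) {
                buffer.append('(');
                left.describeTo(buffer);
                buffer.append(" or ");
                right.describeTo(buffer);
                return buffer.append(')');
            }
        };
    }

    // Returns null when the value is accepted, otherwise the failure text.
    static String check(Object something, SelfDescribing constraint) {
        if (constraint.eval(something)) {
            return null;
        }
        StringBuffer message = new StringBuffer("Expected: ");
        constraint.describeTo(message);
        return message.append("\nbut got : ").append(something).toString();
    }
}
```

Because each constraint describes itself, a composite such as or() only has to stitch together the descriptions of its parts. That is why new combinations get a readable message without any extra work.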

Implementing it

The simplest way is to grab jMock and create your own base test class that extends MockObjectTestCase. This brings in convenience methods for free (I'm still not talking about mocks, honest). If you don't want to extend this class, you can easily reimplement these methods yourself - it's no biggie.

import org.jmock.MockObjectTestCase;
import org.jmock.core.Constraint;

public abstract class MyTestCase extends MockObjectTestCase {

  protected void assertThat(Object something, Constraint matches) {
    if (!matches.eval(something)) {
      StringBuffer message = new StringBuffer("\nExpected: ");
      matches.describeTo(message);
      message.append("\nbut got : ").append(something).append('\n');
      fail(message.toString());
    }
  }
  
}

Now ensure all your test cases extend this instead of junit.framework.TestCase and you're done.

Defining custom constraints

Creating new constraints is easy. Let's say I want something like:

assertThat(something, between(10, 20));

To do that I need to create a method that returns a Constraint object, requiring two methods; eval() for performing the actual assertion, and describeTo() for the self describing error message. This is something that can live in the base test class.

public Constraint between(final int min, final int max) {
  return new Constraint() {
    public boolean eval(Object object) {
      // Only Integers can be in range; anything else fails the constraint.
      if (!(object instanceof Integer)) {
        return false;
      }
      int value = ((Integer)object).intValue();
      return value > min && value < max;
    }
    public StringBuffer describeTo(StringBuffer buffer) {
      return buffer.append("an int between ").append(min).append(" and ").append(max);
    }
  };
}

This can be combined with other constraints and still generate decent failure messages.

assertThat(something, or(eq(50), between(10, 20)));

junit.framework.AssertionFailedError:
Expected: (50 or an int between 10 and 20)
but got : 43

In practice I find I only need to create a few of these constraints as the different combinations give me nearly everything I need.

More about this in the jMock documentation.

Summary

Since using this one assert method I've found my tests to be much easier to understand because of lack of noise and I've spent a lot less time creating 'yet another assertion' method for specific cases. And in most cases I never need to write a custom failure message as the failures are self describing.

Updates

  1. The matchers from jMock have been pulled out into a new project, Hamcrest.
  2. A follow up to this post shows some creative uses of matchers, and talks a bit about when you shouldn't use them.
  3. JUnit 4.4 now comes with assertThat()!

SiteMesh and Content Management @ O'Reilly OpenSource Conference

I'm talking at the O'Reilly OpenSource Conference (OSCON) - Wednesday Aug 3, Portland, Oregon.

Come and say hi.

A problem faced in every web application is how to separate style from content. SiteMesh is a framework that provides an elegant solution to this, resulting in a clean separation that is straightforward to work with, complements other web frameworks, and is easily applied to existing applications.

The first part of this session introduces SiteMesh, including an overview of the architecture and patterns, comparisons with other approaches, and how it can complement existing web frameworks (such as WebWork, Spring, and Struts).

The second part of this session demonstrates how SiteMesh can be blended with other technologies to form the foundation of a rich content management system that distinguishes between the specialized roles of users, their skills, and the most suitable tools. Content writers can use a word processor, web designers can use a WYSIWYG web development tool, and developers can use their IDE.

Allowing these different roles and tools to come together to produce one website is a trivial task with SiteMesh--allowing content management to be easily introduced to existing applications.

Finally, some of the advanced features of SiteMesh are discussed, such as real world tips and tricks, how to create custom strategies for which look and feel to apply, assembling pages from components and building portal style applications.

And for the first time, new features in SiteMesh 3 will be demonstrated, including extending the HTML processor, using it outside of SiteMesh, and offline support.

http://conferences.oreillynet.com/os2005/

XStream 1.1.2 released. Java 5 Enums, JavaBeans, field aliasing, StAX, and more...

New features:

  • Java 5 Enum support.
  • JavaBeanConverter for serialization using getters and setters.
  • Aliasing of fields.
  • StAX integration, with namespaces.
  • Improved support on JDK 1.3 and IBM JDK.

Changelog:
http://xstream.codehaus.org/changes.html

Full download:
http://dist.codehaus.org/xstream/distributions/xstream-1.1.2.zip

Jar only:
http://dist.codehaus.org/xstream/jars/xstream-1.1.2.jar

VB.Net is the bestest

I was happily coding away in VB.Net today (grrr) when I noticed a little weirdity in the intellisense popup.

[Image: stupid.gif, a screenshot of the IntelliSense popup]

Documentation says:

The NotOverridable modifier defines a method of a base class that cannot be overridden in derived classes. All methods are NotOverridable unless marked with the Overridable modifier. You can use the NotOverridable modifier when you do not want to allow an overridden method to be overridden again in a derived class.

Makes C++ look simple.

XStream 1.1.1 released

I'm pleased to announce the release of XStream 1.1.1 - the powerful, yet easy to use Java to XML serialization library.

Some of the improvements in this release:


  • Converters can be registered with a priority, allowing more generic filters to handle classes that don't have more specific converters.

  • Converters can now access underlying HierarchicalStreamReader/Writer implementations to make implementation specific calls.

  • Improved support for classes using ObjectInputFields and ObjectInputValidation to follow the serialization specification.

  • Default ClassLoader may be changed using XStream.setClassLoader().

  • Loads of bugfixes and performance enhancements.

Full change log: http://xstream.codehaus.org/changes.html
Download: http://xstream.codehaus.org/download.html

Accessing generic type information at runtime

A common misconception about generics in Java 5 is that you can't access them at runtime.

What you can't find out at runtime is which generic type is associated with an instance of an object. However you can use reflection to look at which types have been statically associated with a member of a class.

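import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.Map;
import junit.framework.TestCase;

public class GenericsTest extends TestCase {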

    class Thing {
        public Map<String,Integer> stuff;
    }

    public void test() throws Exception {
        Field field = Thing.class.getField("stuff");
        ParameterizedType type = (ParameterizedType) field.getGenericType();
        assertEquals(Map.class, type.getRawType());
        assertEquals(String.class, type.getActualTypeArguments()[0]);
        assertEquals(Integer.class, type.getActualTypeArguments()[1]);
    }

}

Just wanted to clear that up.

(This is something that I'll probably exploit in XStream for J5 users to further simplify the XML.)
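The same distinction holds for method signatures. Here's a small sketch, with a hypothetical class and method invented purely so there is something to inspect: the declared return type keeps its type argument, while an instance returned from the method only knows its raw class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class GenericReturnTypeDemo {

    // Hypothetical method, declared only so there is a signature to inspect.
    public List<String> names() {
        return new ArrayList<String>();
    }

    // Reads the type arguments declared on the return type of names().
    static Type[] declaredTypeArguments() {
        try {
            Method method = GenericReturnTypeDemo.class.getMethod("names");
            ParameterizedType type = (ParameterizedType) method.getGenericReturnType();
            return type.getActualTypeArguments();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // An instance only knows its raw class; the String argument is erased.
    static Class<?> runtimeClassOfInstance() {
        return new GenericReturnTypeDemo().names().getClass();
    }
}
```

Asking the method for its generic return type yields String as the single type argument. Asking the returned list for its class yields plain ArrayList, with no trace of String.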

XStream 1.1 released

I'm pleased to announce the release of XStream 1.1. New features include:
  • Improved support for serializing objects following the Java Serialization Specification:
    • Calls custom serialization methods, readObject(), writeObject(), readResolve() and writeReplace() in class, if defined.
    • Supports ObjectInputStream.getFields() and ObjectOutputStream.putFields() in custom serialization.
  • Provides implementations of ObjectInputStream and ObjectOutputStream, allowing drop in replacements for standard serialization, including support for streams of objects. [More...]
  • Reads and writes directly to most XML Java APIs: DOM, DOM4J, JDOM, XOM, Electric XML, StAX, Trax (write only), SAX (write only). [More...]
View the complete change log and download.

JUnit tip: Setting the default timezone with a TestDecorator

A problem I was finding when testing XStream is that many of the tests were timezone dependent. Initially I had some code in each setUp()/tearDown() method to set the default timezone and reset it afterwards.

This led to a lot of duplication. Putting the common code in a super class was an option but this led to a fragile base class.

Using composition allows this commonality to be put in one place and applied to the relevant test cases. This gets around the 'single inheritance' issue.

JUnit provides an (often overlooked) decorator class to help with this. Here's my TimeZoneTestSuite:

  public class TimeZoneTestSuite extends TestDecorator {

    private final TimeZone timeZone;
    private final TimeZone originalTimeZone;

    public TimeZoneTestSuite(String timeZone, Test test) {
      super(test);
      this.timeZone = TimeZone.getTimeZone(timeZone);
      this.originalTimeZone = TimeZone.getDefault();
    }

    public void run(TestResult testResult) {
      try {
        TimeZone.setDefault(timeZone); 
        super.run(testResult);
      } finally {
        TimeZone.setDefault(originalTimeZone); // cleanup
      }
    }

  }

To use it, you need to override the default test suite behavior by adding a method to your TestCase:

  public class MyTest extends TestCase {
    public static Test suite() {
      Test result = new TestSuite(MyTest.class); // default behavior
      result = new TimeZoneTestSuite("EST", result); // ensure it runs in EST timezone
      return result;
    }
  }

TestDecorators are a very powerful feature of JUnit - don't forget about them.
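The wrap-and-restore idea isn't tied to JUnit. As a rough sketch that runs on the JDK alone, here's the same composition applied to a plain Runnable. `TimeZoneRunnable` is a hypothetical class standing in for the decorator, and unlike the version above it captures the original timezone at run time rather than at construction.

```java
import java.util.TimeZone;

// Hypothetical wrapper, standing in for a JUnit TestDecorator
// so the sketch runs on the JDK alone.
class TimeZoneRunnable implements Runnable {
    private final TimeZone timeZone;
    private final Runnable delegate;

    TimeZoneRunnable(String timeZoneId, Runnable delegate) {
        this.timeZone = TimeZone.getTimeZone(timeZoneId);
        this.delegate = delegate;
    }

    public void run() {
        TimeZone original = TimeZone.getDefault();
        try {
            TimeZone.setDefault(timeZone);
            delegate.run();
        } finally {
            TimeZone.setDefault(original); // always restore, even if the delegate throws
        }
    }
}
```

Wrappers like this can be nested, so a second decorator (say, one that sets the default locale) can be layered on without either one knowing about the other.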

XStream: how to serialize objects to non XML formats

As you know, XStream makes it easy to serialize objects to XML:

Person person = ...;
xstream.toXML(person, out);

Producing:

<com.blah.Person>
  <firstName>Joe</firstName>
  <lastName>Walnes</lastName>
  <homePhone>
    <areaCode>123</areaCode>
    <number>433535</number>
  </homePhone>
  <cellPhone>
    <areaCode>4545</areaCode>
    <number>4534</number>
  </cellPhone>
</com.blah.Person>

I often use this approach whilst debugging to dump out the contents of an object. It works, but my eyes just aren't that good at parsing XML.

By creating an alternative writer implementation, XStream can be used to serialize objects in other formats:

Person person = ...;
xstream.marshal(person, new AnAlternativeWriter(out));

Producing the slightly more digestible:

com.blah.Person
  firstName = Joe
  lastName = Walnes
  homePhone
    areaCode = 123
    number = 433535
  cellPhone
    areaCode = 4545
    number = 4534

To gain roundtrip serialization/deserialization support in alternative formats to XML, you need to provide your own implementations of both HierarchicalStreamWriter and HierarchicalStreamReader.
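To give a feel for what such a writer involves, here's a minimal sketch that produces the indented format shown above. It does not implement the real XStream interface: `NodeWriter` is a hypothetical, trimmed-down stand-in with just three calls (the real HierarchicalStreamWriter has more methods), and `IndentedTextWriter` is a name I've made up.

```java
// Hypothetical, trimmed-down stand-in for XStream's HierarchicalStreamWriter,
// reduced to three calls for this sketch.
interface NodeWriter {
    void startNode(String name);
    void setValue(String text);
    void endNode();
}

class IndentedTextWriter implements NodeWriter {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    public void startNode(String name) {
        if (out.length() > 0) {
            out.append('\n'); // every node after the first starts on its own line
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(name);
        depth++;
    }

    public void setValue(String text) {
        out.append(" = ").append(text);
    }

    public void endNode() {
        depth--;
    }

    String result() {
        return out.toString();
    }
}
```

The writer only has to track nesting depth. The serializer drives it with start/value/end events, so the output format is decided entirely in one small class.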

How my backflip went...

Well, I'm still alive :)

This is a brief entry as I'm currently enjoying a two week vacation in Thailand. I'll give you the full details when I get back (assuming my inbox hasn't burst by then).

I achieved a single back handspring, and eventually three back handsprings. I failed to do a complete back somersault after four attempts. Two out of three ain't bad.

I'm happy though. Thanks to an amazing number of sponsors, I've raised 1400 GBP (approx 2500 USD) that is to be donated to Autistic Research.

I've got some photos and videos to upload when I get back. Jez has also uploaded some.

Backflippin' in 4 hours.

Gulp. Six weeks went quickly.

The story so far.

If anyone has a video camera (or one of those digital cameras that can record video snippets), please bring it along... I'm having a bit of a camera shortage.

Is 100% test coverage a BAD thing?

I'm a huuuge advocate of TDD and high test coverage, and I will often go to great lengths to ensure this, but is 100% such a good thing?

I recently heard Tim Lister talking about risk in software projects and the CMM (powerpoint slides).

The 'ultimate' level of CMM ensures that everything is documented, everything goes through a rigorous procedure, blah blah blah. Amusingly, Tim pointed out that no CEO in their right mind would ever want their organization to be like that as they would not be effectively managing risk. You only need this extra stuff when you actually need this extra stuff. If there's little risk, then this added process adds a lot of cost with no real value - you're just pissing away money.

This also applies for test coverage. There are always going to be untested parts of your system but when increasing the coverage you have to balance the cost with the value.

With test coverage, you get the value of higher quality software that's easier to change, but it follows the Law of diminishing returns. The effort required to get from 99% to 100% is huge... couldn't that be spent on something more valuable like adding business functionality or simplifying the system?

Personally, I'm most comfortable with coverage in the 80-90% region, but your mileage may vary.

Looking back at the SiteMesh HTML parser

Before talking about how the new SiteMesh HTML processor works (to be released in SiteMesh 3), I thought I'd write a bit about how the current parser has evolved since its first attempt in 1999 - purely in the interest of nostalgia.

The original version used a bunch of regular expressions to extract the necessary chunks of text from the document. This was easy to get running, but very error prone as the matches had no context about where they were in a document. For example, a <title> element in a <head> block is very important to SiteMesh, however sometimes they appear elsewhere, such as in a comment, <script> or <xml> block.
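To illustrate the problem, here's a small sketch of a context-free regular expression picking up the wrong title. This is not the original SiteMesh code; the class name and the pattern are assumptions made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveTitleExtractor {
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the first <title> text found anywhere in the page, or null.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Given a page whose head contains a commented-out title followed by the real one, this returns the commented-out text, because the pattern has no idea it is looking inside a comment.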

This was dumped, in favour of a DOM based parser, which initially used JTidy to convert HTML to XHTML so it could be traversed as a standard DOM tree. Much nicer, but very slooow. Too slow, so I switched to OpenXML, an XML parser that was tolerant to nasty HTML, giving a slight boost to performance. I was much happier with OpenXML - even though it still added a fair amount of overhead and rewrote bits of HTML that I didn't want it to.

Annoyingly, not long after that, the OpenXML project merged with the IBM XML4J parser project, rebranded itself as the mighty Apache Xerces and promptly dropped support for HTML parsing. So now I was dependent on a library that no longer existed.

By this time, SiteMesh had been open-sourced, and along came Victor Salaman, who was the third user to discover it (after Mike Cannon-Brookes and Joseph Ottinger). He saw the potential but hated the parser. About three hours later, he'd produced his own version that used low-level string manipulation. It wasn't pretty, but it went like the clappers - twelve times faster than the OpenXML one, with the bonus feature of not rewriting great chunks of the document. This brought SiteMesh into the mainstream as it was now ready for use on high-traffic sites. 1.0 was released.

This parser really is the core of SiteMesh. It's been our friend thanks to its speed and reliability. It's been our enemy because of its awkwardness to understand and change. For a couple of years it remained virtually untouched, except when we occasionally poked at it from afar with a long pointy stick for the odd change. Three years later, Chris Miller and Hani Suleiman took the plunge and gave its guts an overhaul - making it six times faster! Very brave.

Despite its awkwardness, it proudly lived on and is still the primary ingredient of SiteMesh today. It's even been ported to VB.Net!

I've kept my eye on other HTML parsers, such as HotSAX, NekoHTML and TagSoup, always with the intention of implementing an easier to maintain parser, but I just couldn't get the performance to be anything like what Victor, Chris and Hani achieved.

The problem is that most HTML parsers try to represent an HTML document as a tree of nodes, like XML. This makes sense, as that's what HTML is meant to be. However, to do this, every single tag in a document must be analysed and balanced accordingly. This is hard, error-prone and adds a lot of overhead.

There's another approach though. The new parser focusses on ease-of-use and ability to customize, without compromising on performance and robustness. I hope you'll like it...

Update: Sorry, I forgot to mention Hani in the original posting of this. How could I forget!

The road ahead for SiteMesh 3

Here's an update of what's in store for the upcoming SiteMesh releases and how they benefit you.

Firstly, there are a number of accumulated bugs that we're steadily working our way through. The recent 2.1 and 2.2 releases have been mostly bugfixes, and this will continue for the 2.x series, including those related to using MVC frameworks such as Struts and WebWork.

Meanwhile, SiteMesh 3 has been brewing. It's been four years since SiteMesh was first open-sourced (it existed for two years before that as closed-source) and in that time it hasn't really changed significantly. SiteMesh 3 is going to see the largest set of improvements since it was initially released.

Flexible HTML processing

The core of SiteMesh is based around an HTML parser that is very fast and tolerant of badly formed HTML, but at the cost of being extremely hard to extend.

SiteMesh 3 will contain a new parser, which is easy to customize, without compromising on performance and tolerance to malformed HTML. This will allow extensions to be written that can:

  • Extract user-defined properties from the page beyond the predefined ones from <title>, <meta>, <content>, etc.
  • Remove blocks of content from the page.
  • Transform HTML as the page is parsed.

SiteMesh will come bundled with extensions for popular tasks and it will be trivial to add your own. More on this in a follow-up entry.

Improved Velocity integration

This follows on from some work done by Atlassian and will allow a page to be generated using the Velocity API as an alternative to calling Servlet RequestDispatchers and the Filter.

This offers significant performance improvements for applications that don't use JSP and allows more of SiteMesh to be used in environments outside of the Servlet container, which leads nicely on to the next feature.

Offline support with StaticMesh

There has been a lot of demand for using SiteMesh to generate web-sites in an offline manner. A common case for this is a simpler alternative to DocBook style tools, allowing documents to be authored in standard HTML capable word-processing tools (such as MS Word, OpenOffice and Mozilla Composer), giving you the full capabilities of a rich-text word-processor and without the need to learn a special markup/schema.

SiteMesh can then process these raw HTML files and generate another set of static HTML files with the appropriate presentation and navigation added.

Building upon the extended HTML processing capabilities, it will also be possible to do things like generate a table of contents, footnotes, and diagrams from inline syntax.

There have been at least three separate incarnations of StaticMesh over the last few years. We hope to bring the best bits from each of these into the final version.

StaticMesh will have a simple API for configuration, bundled with a command-line wrapper and Ant task.

Backwards compatibility

Just to ease your minds, you're not going to have to rewrite your applications to use SiteMesh 3. Great effort will be taken to ensure that backwards compatibility is preserved. The library will have more features, but at the same time a lot of the old stuff can be simplified. Dependencies will be minimized and optional - for example, you will only need velocity.jar if you're actually using the Velocity stuff.

I'll be posting more information later.

Watch this space...

Joe's Backflipping for Autistic Research - time is nearly up...

A reminder of my challenge:

I've had six weeks (I'm now on week five) to learn to do backflips to raise money for the International Autistic Research Organization.

This includes:

1. A single back handspring
2. Three back handsprings in a row
3. A single back somersault

Full details, including descriptions of the moves, can be found in my previous blog entry.

When/Where?

I'm going to attempt this on Friday 8th October at 3pm. If you're in London and want to come and watch, it'll be at Finsbury Pavement - nearest tube Moorgate or Liverpool Street.

If you can't make it, there will also be a video available on the net, not long afterwards.

To those who have sponsored me

I've been completely overwhelmed by the generosity of the sponsors. I genuinely never expected to raise this amount of money. I'll reveal the total amount on the day. Of course, it's now down to me to actually do the stuff so you pay up :).

So, once again, THANK YOU!

After the challenge, I'll send out details of how to get the money to me.

To those who haven't sponsored me yet

There's still time! Just email me saying how much you'll sponsor for
1) A single back handspring
2) Three back handsprings in a row
3) A single back somersault

For example: "Joe, I will sponsor you £10, £20, £50" (or currency of your choosing).

Note that if I achieve more than one of these, you only pay the maximum amount.

My progress

My shoulder is slightly injured and I'm still feeling dizzy from the time I landed on the ground head-first. Other than that it's been going well. I learned the technique from some tutorials I found round the net, and have been practising every weekend on my bouncy castle.

The biggest problem I have at the moment is fear. No matter how many times I do it, I always get really nervous before launching and usually bottle out. I hope this doesn't happen on the day.

For safety reasons, I've decided that I will attempt the complete back somersault (i.e. with no hands) with an inflatable mattress behind me to break my fall, rather than my neck. The back handsprings shall be on grass. I will also be wearing wrist supports.

SiteMesh 2.2 Released

I've just released SiteMesh 2.2. This release fixes a number of minor bugs. No code changes are required if migrating from 2.1.

The following improvements have been made:

  • The <excludes> tag in decorators.xml now takes into account ServletPath, PathInfo and QueryString.
  • Overhaul of the main Servlet Filter to remove unnecessary complexity and to handle more gracefully situations where calls on the ServletResponse, PrintWriter and ServletOutputStream occur in an awkward order.

Links:

Stay tuned for news on the cool new features coming up in SiteMesh 3!...

Advanced SiteMesh

Sunil Patil has written an article about SiteMesh for ONJava.com:

Read Advanced SiteMesh.

ThoughtBloggers

Martin Fowler

Dan North

Aslak Hellesoy

Darren Hobbs

Geoff Oliphant

Mike Roberts

Chris Stevenson

Jon Tirsen

Loads More...

Agile Bloggers

Ken Arnold

Ward Cunningham

Brian Marick

Robert Martin

Bret Pettichord

Java Bloggers

Ara Abrahamian

Mike Cannon-Brookes

Vincent Massol

Bob McWhirter

Rickard Oberg

Joseph Ottinger

James Strachan

Hani Suleiman

Communities

eXtreme Tuesday Club (XTC)

Thursday GeekSpeek

ThoughtWorks GeekNight

London Java Meetup

The Codehaus

© 2001-2004, Joe Walnes

Powered by SiteMesh and Movable Type.