As far as I can tell, available tools for performing functional testing on web applications do not make it easy to reliably reproduce what happens when a real user interacts with the site. Admittedly, it's hard to get a comprehensive list of what tools are available and the relevant feature matrix, so there may be a killer solution that I haven't come across yet, and I hope that's the case. (For once, even Wikipedia comes up short on the subject.)

I argue that a solution for performing functional testing on modern web applications should have the following properties:

  1. User input should be able to be simulated using native key and mouse events.
  2. Tests should be able to be written in JavaScript.
  3. Test code should be able to suspend execution while waiting for network or other events.
  4. Tests should be able to run in multiple browsers and on multiple platforms.
  5. Tests should not hardcode XPaths, element ids, or CSS class names.
  6. Test suites should be able to be run via a cron job.
First, I will explain why I believe each of these properties is important. Then I will describe a functional testing tool I built using Chickenfoot that exhibits most (but not all) of these properties. Finally, I will discuss what I think browser vendors should do to make it easier for third parties to build better test frameworks.

User input should be able to be simulated using native key and mouse events

Most testing frameworks that I have looked at do something analogous to selenium-browserbot.js to emulate user input. JavaScript libraries like these use the browser's built-in API to create JavaScript Event objects and dispatch them. To this end, Internet Explorer offers createEventObject() and fireEvent() whereas other browsers (that comply with the DOM Level 2 Events Specification) provide createEvent() and dispatchEvent(). This is a sensible, clean solution for automating input events that can be wrapped in a uniform API that works across browsers.
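To make the contrast concrete, the core of such a library looks roughly like this (el is an assumed reference to the target element):

// Dispatch a synthetic click on el, branching on the two event models.
if (document.createEvent) {
  // DOM Level 2 Events (Firefox, Safari, Opera, ...).
  var evt = document.createEvent('MouseEvents');
  evt.initMouseEvent('click', true, true, window, 1,
                     0, 0, 0, 0, false, false, false, false, 0, null);
  el.dispatchEvent(evt);
} else if (el.fireEvent) {
  // Internet Explorer.
  var evt = document.createEventObject();
  el.fireEvent('onclick', evt);
}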

Unfortunately, this does not accurately recreate what happens when a user clicks the mouse or presses a key. For example, on Firefox, when automating a mouse event, the event that is dispatched has null values for Firefox's custom rangeParent and rangeOffset properties. If a user performs a real click with the mouse, those properties identify a collapsed Range that corresponds to the point in the document where the user clicked. (This is invaluable if you are trying to create a click-to-edit interface and want to easily determine where to position the cursor.) Although such differences may seem small, they make a click-to-edit interface like this impossible to test with a framework that automates its events through the browser's JavaScript API.
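For illustration, a Firefox click-to-edit handler might lean on those properties roughly as follows (the handler and the positionCursorAt() helper are hypothetical, not code from any real application):

document.addEventListener('mousedown', function(e) {
  if (e.rangeParent) {
    // Real click: build a collapsed Range at the exact point clicked and
    // position the editing cursor there.
    var range = document.createRange();
    range.setStart(e.rangeParent, e.rangeOffset);
    range.collapse(true);
    positionCursorAt(range);  // hypothetical helper
  } else {
    // A click dispatched via createEvent()/dispatchEvent() ends up here:
    // rangeParent is null, so there is no way to tell where to put the cursor.
  }
}, false);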

To be fair, building a framework that supports native events on all browsers and platforms is not easy. It requires writing low-level native code for multiple platforms, which is not something many individual developers know how to do. The most accessible cross-platform solution that I have found for doing this sort of thing is to use java.awt.Robot. I tried to create a trusted applet that could be embedded in any web page and scripted via LiveConnect so that JavaScript test code could drive it. Unfortunately, it did not have the same level of fidelity on all browsers.

While I was at Google, I spent a bit of time talking to Simon Stewart who owns the WebDriver project. Simon is a smart guy who really gets it when it comes to web application testing, and he took on the inglorious task of writing the native code for multiple platforms to build a test framework that did honest automation of input events. It appears that since WebDriver was introduced on the Google Open Source blog, its code has started to be migrated into the Selenium repository, also hosted on Google Code.

Although I was disappointed with Selenium's original approach to automating input events, I now have high hopes for Selenium 2, which will include Simon's work! By comparison, any framework that advertises its ability to work with forms as its main strength likely does things such as document.getElementById('input_name').value = 'simulated input', which is not at all what happens when a user types in a field. Starting with a framework that provides true emulation for low-level events makes it possible to simulate everything else.

Tests should be able to be written in JavaScript

Unless your web application is written in Flash or runs as a Java applet, at some level, it is executing JavaScript code. It seems pretty logical that the application code and test code should be able to be written in the same language, so being able to write tests in JavaScript is a must. This guarantees that any features of the language exercised by the application code can be verified in the test code. It also makes it easier for developers to write tests since they will already be comfortable with the testing language.

Though there are many promising things about Selenium 2, the current version of Selenium has support for just about every popular programming language except JavaScript. What gives? Was it really more important to add support for Perl bindings than JavaScript ones? No, but JavaScript bindings are generally harder to write because the most convenient JavaScript interpreter to use is the one built into the browser.

JavaScript executed in a web page by a browser is sandboxed and has no access to things such as the native event queue. JsUnit leverages the browser's interpreter as I describe, and so long as your tests behave within the constraints of the sandbox, JsUnit works quite well and no special work needs to be done to port tests to other web browsers. However, as the next section will demonstrate, writing tests often requires functionality that the sandbox cannot provide.

Test code should be able to suspend execution while waiting for network or other events

Even though JavaScript does not support threads, that does not mean that all logic will be executed synchronously. For example, an XMLHttpRequest is often configured to call a function asynchronously once it has finished loading data over the network. Consider the following application code:
/** Redraws a menu. Returns true if successful; otherwise, returns false. */
var updateMenu = function() {
  var menu = document.getElementById('menu');
  if (menu) {
    redrawMenu(menu);
  }
  return !!menu;
};

menuButton.addEventListener('click', function(e) {
    if (!updateMenu()) {
      // If the menu element was not available to be updated, then another
      // event handler in the current chain will add it to the DOM, so
      // defer updateMenu() until that happens.
      setTimeout(updateMenu, 0);
    }
  }, false);
It is common to use setTimeout() to work around timing bugs in browser rendering. Subtleties like these are particularly important to test because it is so easy for them to regress. Ideally, the following would be a suitable test:
assertFalse(isMenuUpdated());
menuButton.dispatchEvent(e /* a click event */);
assertTrue(isMenuUpdated());
If the browser being tested relies on the code path that uses setTimeout(), then this JavaScript test will likely fail because dispatchEvent() will synchronously call menuButton's event listener which will schedule the timeout and return immediately. Then assertTrue(isMenuUpdated()) will be called, the assertion will fail, and now that the testing code's thread of execution has terminated, the callback scheduled by the timeout will run. Unfortunately, it is too late because the test has already failed.

Ideally, it would be possible to write the test as follows:

assertFalse(isMenuUpdated());
menuButton.dispatchEvent(e /* a click event */);
sleep(10); // suspend execution for 10ms and yield to the browser's JS thread
assertTrue(isMenuUpdated());
This would temporarily give control back to the browser, giving the callback scheduled by the timeout a chance to run. When the test resumes, the assertion will succeed as expected.

Unfortunately, the existence of a sleep() function is antithetical to JavaScript's thread-free nature. If your JavaScript is running in the context of a trusted Firefox extension, there is support for working with threads from JavaScript. Chickenfoot uses this API to implement a sleep() command. I am not aware of equivalent solutions on other browsers.
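For reference, in Firefox's privileged (chrome) JavaScript a blocking sleep() can be built by spinning the event loop; this is a sketch of the general technique, not necessarily Chickenfoot's exact implementation:

function sleep(ms) {
  var thread = Components.classes['@mozilla.org/thread-manager;1']
      .getService(Components.interfaces.nsIThreadManager)
      .currentThread;
  var deadline = Date.now() + ms;
  while (Date.now() < deadline) {
    // Yield to the browser so pending timeouts and network callbacks can run.
    thread.processNextEvent(true);
  }
}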

Tests should be able to run in multiple browsers and on multiple platforms

Anyone who does web development knows that each web browser behaves differently -- just because a test passes on one browser does not mean that it is guaranteed to pass on the others. Considering that most large JavaScript codebases do some branching based on the browser in which they are running, it is impossible to get 100% code coverage if a test is only executed on one browser because only one of the browser-based branches will be followed.

For these reasons, it is imperative that a web application test framework be able to run the test suite on any configuration in which the web application itself may be run (this includes mobile web browsers!). For frameworks that run inside the browser sandbox, like JsUnit, this is trivially achieved. But for frameworks that need to access the chrome of the browser, like Selenium 2, this can be a lot of work.

Tests should not hardcode XPaths, element ids, or CSS class names

Tests that know about XPaths, element ids, or CSS class names are not functional tests. Ideally, a functional test should continue to pass if a developer changes any of these things in such a way that does not alter the behavior of the application from the user's perspective. Hardcoding these values makes tests brittle and harder to maintain.

If, for whatever reason, you cannot avoid this, I recommend honoring the Don't Repeat Yourself (DRY) principle as explained in The Pragmatic Programmer. Instead of hardcoding an id across your tests, create a utility function available to all tests and give it a declarative name:

function getSaveButtonElement() {
  // By giving this function a declarative name, it makes it easier to change
  // the implementation to use an XPath or other accessor, if need be, while
  // still maintaining the contract of the function.
  return document.getElementById('save-button');
}
Consistently using the accessor in your tests will make them more readable and less brittle. (Note that small design decisions like this are good for software development in general, not just tests.)

This style also helps make tests more reusable, which can be particularly useful if you are maintaining multiple UIs for the same application (commonly, one designed for desktop browsers and another optimized for mobile ones). As the functionality of these UIs will likely overlap, if designed correctly, the same functional tests can be used to test both interfaces.

For example, suppose getSaveButtonElement() were part of some sort of TestDriver object and all tests in a suite were written to use TestDriver.getSaveButtonElement(). It should be possible to inject the appropriate TestDriver object based on the environment (desktop versus mobile) so that a test written as follows would apply to both interfaces:

assertFalse(isSaved());
var buttonEl = TestDriver.getSaveButtonElement();
click(buttonEl);
assertTrue(isSaved());
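A hypothetical sketch of that injection (DesktopTestDriver, MobileTestDriver, and isMobileUi() are illustrative names, not an existing API):

// Each driver knows how to find UI elements in its own interface.
var DesktopTestDriver = {
  getSaveButtonElement: function() {
    return document.getElementById('save-button');
  }
};

var MobileTestDriver = {
  getSaveButtonElement: function() {
    // The mobile UI may render the button as a different element entirely.
    return document.getElementById('m-save-button');
  }
};

// Choose the driver once, based on the environment under test.
var TestDriver = isMobileUi() ? MobileTestDriver : DesktopTestDriver;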
Ideally, test engineers who are not developers of the application under test should be able to write functional tests using only the API made available through a TestDriver like the one used in the above example.

Test suites should be able to run via a cron job

If a test suite can be run via a cron job, then it means that all of the setup required to run the suite has been encapsulated in some sort of script. Anyone on the development team should be able to run that script so that the overhead of running tests is as low as possible.
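For example, if that script lives at a known path, the crontab entry is a one-liner (the path and schedule here are hypothetical):

# Run the functional test suite every night at 2am and keep a log of the results.
0 2 * * * /home/build/run_functional_tests.sh >> /var/log/functional-tests.log 2>&1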

But despite your best efforts, that overhead can never be quite low enough, which is why running the suite automatically via a cron job at regular intervals is the only way to be absolutely sure that tests will get run and catch errors. The policy for how often tests should be run and how to handle test failures when they happen will vary from team to team, but at least having the ability to run tests from cron will empower your team to experiment and make those decisions.

More importantly, developers are more likely to write tests if they know that they will be run regularly (and their results will be published to other team members) because they can see the impact of their tests. When tests are not run regularly, developers may only write tests as a token gesture to claim that they tested their code before submitting it. Such tests do not get run regularly, go stale, and never pass again.

Functional testing using Chickenfoot

Chickenfoot is a Firefox extension I built for my Master's Thesis at MIT that enables end-user programmers to customize web pages and automate tasks they perform on the web. The research goal was to make it powerful yet accessible to end-user programmers, but whenever we presented it to people from industry, as soon as they heard the word "automation," they would always immediately ask if it could be used for testing.

At Google, I spent a lot of time developing a cross-browser click-to-edit solution for Google Tasks. Before Firefox 3 came out, Firefox lacked support for content-editable elements, so I made each task a DIV and would shimmy a textarea on top of it whenever it was clicked, putting the cursor in the appropriate place. (You can still observe this behavior on Firefox 2 today, though I would not be surprised if it gets dropped at some point. Google Wave did not even attempt to support Firefox 2, presumably because of its lack of support for content-editable elements.)

Without getting into the details, this was a complicated feature to implement, so coming up with a comprehensive regression test suite was critical. I tried to use Selenium, but discovered that it could not faithfully automate pressing the up and down arrow keys. That is, the key event would fire, triggering the appropriate JavaScript listeners, but because it was not a native event, the cursor did not actually move through the textarea. I could have used JavaScript to position the cursor myself, but the movement of the cursor was the thing I was trying to test!

It became apparent that native input events were a necessity, so I used LiveConnect in conjunction with Java's java.awt.Robot. As mentioned in the first section, I could not get Robot to work reliably via an applet in all browsers, but it worked well in Firefox, and since that was the only place Chickenfoot was available, it was good enough for me.
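From Chickenfoot's privileged JavaScript, driving the Robot through LiveConnect boils down to something like this sketch (illustrative, not Chickenfoot's actual implementation):

// Inject a genuine down-arrow key press through the operating system's event
// queue instead of synthesizing a DOM event.
var robot = new java.awt.Robot();
robot.keyPress(java.awt.event.KeyEvent.VK_DOWN);
robot.keyRelease(java.awt.event.KeyEvent.VK_DOWN);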

At first, using Robot was somewhat comical because it actually took over your mouse and keyboard, so it was impossible to use either of them while the tests were running. My life turned into the "Compiling" xkcd comic because I would take a break to play Guitar Hero whenever I kicked off the Chickenfoot test suite. Months later, a colleague wrote a Python script to run the tests in VNC so I could use my machine to do other things while the tests were running. My Guitar Hero skills suffered dramatically.

Because we were using Chickenfoot, tests were written in JavaScript and could access objects in the page directly. Because the test code was executed in the privileged environment of Firefox's browser chrome, special functions such as sleep() as described in the third section were available.

We adopted the design pattern described at the end of section five, where the application exposed an object named TestDriver and the test code restricted itself to TestDriver's methods. While developing tests, it was not always clear what sorts of methods we would need to add to TestDriver, but using Chickenfoot as a REPL made it efficient to experiment with the page and to determine the best way to implement a new TestDriver method. The test suite evolved to include its own meta-language where TestDriver.setTasks('A > B C* _ D'); meant "clear the task list and create a task with two sub-tasks, skip a line, create a fourth task, and put the cursor at the end of the third task." The implementation of setTasks() fired off all of the appropriate input events to make that happen, so whatever the test did could easily be reproduced by a developer using a real keyboard and mouse.

As you may have guessed, the one important property that this system lacked was the ability to run tests in other browsers. A critical data-loss bug that existed only on Safari almost made it into production for exactly this reason (fortunately, the QA team caught it in time). After talking to Simon Stewart, I started working on a parallel implementation of TestDriver in Java that could be injected if the JavaScript was evaluated using Rhino while running the test on WebDriver. Unfortunately, I did not complete that work before I left, so I don't know how well such a solution would have worked in practice. However, I still find the idea of being able to develop tests quickly using the REPL in Chickenfoot and then having them automatically work in other browsers particularly attractive.

How browser vendors can help

If I were a browser vendor, I would be thinking hard about how to make my browser the best one for doing functional testing. Think about what Firebug has done for Firefox. No matter how much faster Chrome gets, I'm still going to have to use Firefox at least some of the time because its web developer tools are so far superior to everyone else's. Even though the full Firebug extension is only available in Firefox (and therefore cannot help me debug IE-specific issues), it is still incredibly valuable to me. Similarly, if Firefox had an extension that was exceptionally better than all other functional testing tools, I would use it even though it would not help me with cross-browser testing. Ideally, it would challenge the other browser vendors to make similar offerings so that ultimately using the best tool would not be at odds with testing multiple browsers.

In terms of building a tool that satisfies all of my properties, Firefox is likely in the best position because it already uses JavaScript as its scripting language, so creating JavaScript bindings for its test tool should be straightforward. As demonstrated by Chickenfoot, it already exposes appropriate APIs for suspending execution while other events, such as network activity, continue.

Although it appears that LiveConnect will continue to be supported, it would be better if a test framework did not require the Java plugin at all; the browser should support an API equivalent to Robot's natively. Ideally, the API would funnel events into the native queue without taking over the user's mouse and keyboard, avoiding the issue I ran into that could only be solved with VNC.

Finally, it would be great if the browser could be kicked off from the command line in some sort of headless mode that would just run the test and print the results to standard out:

./firefox --test test_file.js --output=json -Xusername=foo -Xpassword=bar
This would start Firefox with a clean profile and run test_file.js in the privileged JavaScript context that would have access to all of the testing utilities and other browser chrome. Command-line arguments specified by -X would be available to the test, as well, so things like usernames and passwords would not have to be hardcoded into the tests. Ideally, it would also be possible to run a test that can manage multiple instances of Firefox with different profiles so that applications such as real-time chat can also be tested. (Multiple profiles are needed for applications such as Gmail that only enable one user to be logged in per browser because of how cookies are used.)

The list of feature requirements for such a tool may be endless, but if Mozilla put the basics in place (built-in support for injecting native events and running the browser headless with a clean profile), it is likely that third parties would rally around it and build testing tools on top of it. Arguably, by providing an API for a scriptable debugger, Firefox (intentionally or not) positioned itself as the most attractive platform for a tool like Firebug. Just like that dead guy in Field of Dreams said, "If you build it, he will come."

Despite the perceived dominance of Selenium in the area of functional web testing, I think the space is still wide open and that browser vendors are in the best position to move things forward. By building a framework with the properties I describe, developers will be able to emulate the user experience more precisely, ultimately enabling them to write better tests and deliver higher-quality web applications. It is in the vendors' interest to become the optimal platform for testing because the browser with the best tools is likely to be the browser on which web applications are tested the most. This in turn increases the perception that the browser provides the best overall user experience because the majority of applications will have been tested (and are therefore guaranteed to work) with that browser.