This allows global setup and teardown hooks for tests.
For example, we can automatically clear caches after each test, so that state
left over from one test cannot cause intermittent failures like #1879 and #1886.
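Here is a minimal plain-Ruby sketch of the idea. It uses a hypothetical `GlobalHooks` registry and a hypothetical `CACHE` constant that stands in for whatever caches the suite keeps. In an RSpec suite the same effect would normally come from `RSpec.configure` with `config.after`. The sketch only makes the mechanism explicit.

```ruby
# Hypothetical global hook registry (illustration only; a real suite
# would use its test framework's hooks instead).
module GlobalHooks
  @befores = []
  @afters = []

  def self.before(&block)
    @befores << block
  end

  def self.after(&block)
    @afters << block
  end

  # Run one test body surrounded by every registered hook. The after
  # hooks run in an ensure clause, so cleanup happens even if the test raises.
  def self.run(test)
    @befores.each(&:call)
    test.call
  ensure
    @afters.each(&:call)
  end
end

# Hypothetical cache that would leak state between tests if never cleared.
CACHE = {}

# Register the teardown once; it then applies to every test.
GlobalHooks.after { CACHE.clear }

GlobalHooks.run(-> { CACHE[:spec] = "rack-1.0" })
GlobalHooks.run(-> { raise "stale cache" unless CACHE.empty? })
```

Because the hook is registered once, individual tests don't need to remember to clean up, and a test that fails halfway still leaves a clean cache for the next one.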
Instead of `puts`ing the command's output as soon as a failure occurs, save
it for the error message, which now gives a prose description of what failed
followed by the output from the command. This makes the output from failing
tests significantly easier to read.
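Here is a rough sketch of deferring a command's output to the failure message. It uses `Open3.capture3` from Ruby's standard library. `run_command` and `CommandFailed` are hypothetical names chosen for illustration, not the project's actual helpers.

```ruby
require "open3"

# Hypothetical error type: carries a prose description plus the captured
# output, so everything about the failure shows up in one report.
class CommandFailed < StandardError
  def initialize(command, status, stdout, stderr)
    super(<<~MSG)
      Command `#{command}` failed with exit status #{status.exitstatus}.
      --- stdout ---
      #{stdout}
      --- stderr ---
      #{stderr}
    MSG
  end
end

# Hypothetical helper: run a command and keep its output instead of
# printing it immediately. Output is only shown if the command fails.
def run_command(*command)
  stdout, stderr, status = Open3.capture3(*command)
  unless status.success?
    raise CommandFailed.new(command.join(" "), status, stdout, stderr)
  end
  stdout
end
```

A passing command produces no noise. A failing one raises a single error whose message reads as a description followed by the command's stdout and stderr, rather than output scattered through the test log.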