Testing [Technology Manifesto Series, Part 2] / by Chris Shaffer

Introduction / Unit Tests

These are the best-covered layer of testing. We love them because they can be easily automated and run every time someone pushes code. They're great because they can also serve as documentation of how a bit of code is supposed to work. You can write them before you write your code and they'll guide you. And so on.

You'd be hard-pressed to find someone out there who's going to argue that unit tests are a bad idea, so I'll refrain from beating that dead horse, other than to say: make sure early on that you've got an automated testing framework in place, and write unit tests, especially for complex, contained bits of logic.

However, they're not perfect. Plenty of applications have unit tests covering something that ends up breaking anyway. I'm here to shoot down the overconfidence that comes from high unit-test code coverage, and to focus on some of these other forms of testing:

From least to most expensive, we've got:

  1. QA by design
    • Compilers/Linters
    • Other patterns
  2. Automated integration tests
    • Testing bigger pieces of code end-to-end
      • Including the parts that read or write to the database
      • Including configuration
    • Realistic environments
    • Bulk testing
  3. Manual testing, including
    • User-acceptance testing
    • Testing with a friend
    • Just open it up real quick and look at it before merging a change

QA By Design

The principle here is that, rather than your code and quality living in somewhat separate universes, you build certain controls into the architecture. Quality (and security) is a process, rather than something you slap on at the end.

Classic OOP patterns like abstract classes and access modifiers exist largely for this reason - make this function private so you don't accidentally call it from the wrong context later. Documentation and unit tests could handle that, but a compiler error is near foolproof - someone has to at least stop to think about what they're doing when they change a "private" to a "public".
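
As a minimal sketch (the class and method names here are hypothetical), the access modifier turns "don't call this directly" from a comment into a compiler error:

```typescript
// QA by design via access modifiers: the unvalidated write path is
// unreachable from outside the class unless someone deliberately
// changes "private" to "public".
class AccountService {
  // Public API: validates before writing.
  transfer(fromId: string, toId: string, amount: number): void {
    if (amount <= 0) throw new Error("amount must be positive");
    this.writeLedgerEntry(fromId, toId, amount);
  }

  // Private: skips validation, so it can only be called after the
  // public method has already checked the inputs.
  private writeLedgerEntry(fromId: string, toId: string, amount: number): void {
    console.log(`ledger: ${fromId} -> ${toId}: ${amount}`);
  }
}

const svc = new AccountService();
svc.transfer("alice", "bob", 100); // fine
// svc.writeLedgerEntry("alice", "bob", -100); // compile error: method is private
```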

NoSQL databases are great because you can just use the same models for your UI as you store in the database… but sometimes you still want that conversion layer. Sure, you could have your validation code make sure a user doesn't set the 'approved' flag on their own submission, but having a separate database model and view model makes that class of bug categorically harder to introduce. Copying a property that the user shouldn't be allowed to alter from their form input to a database model becomes a mistake you have to actively make, whereas forgetting to write validation code to filter it out leaves room for a passive mistake. Unit tests won't save you here: if you forgot to write that validation code, you almost certainly forgot to write the unit test, too.
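
As a sketch (the models and property names are hypothetical), the mapping function is the only place 'approved' ever gets set, and it never reads it from user input:

```typescript
// What the user is allowed to send us.
interface SubmissionViewModel {
  title: string;
  body: string;
}

// What we actually store.
interface SubmissionDbModel {
  title: string;
  body: string;
  approved: boolean;
  authorId: string;
}

// Copying 'approved' from user input would have to be done on purpose:
// the view model doesn't even have the property.
function toDbModel(input: SubmissionViewModel, authorId: string): SubmissionDbModel {
  return {
    title: input.title,
    body: input.body,
    approved: false, // always starts unapproved, regardless of input
    authorId,
  };
}
```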

Compilers, type safety, and linters play a role here, too. Sure, you could write unit tests for all of your arithmetic operations, but that's certainly less foolproof than doing them in a language where 1+1 always equals 2 and never 11. Compilers are bad (but free) unit tests.
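
That "11" isn't hypothetical - in untyped JavaScript, string concatenation happily masquerades as addition; with type annotations, the mistake can't compile:

```typescript
// In plain JavaScript, "1" + 1 evaluates to "11" - silent string
// concatenation. TypeScript turns that into a compile error.
function add(a: number, b: number): number {
  return a + b;
}

add(1, 1);      // 2
// add("1", 1); // compile error: 'string' is not assignable to 'number'
```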

Should I write this test?

Not every test you can think of needs to be written. Consider:

  1. The cost of writing the test, weighed against
  2. The likelihood that it'll break, times
  3. The cost of it breaking at the next layer

And, yes, the "next layer" includes the end user. It's far less important to test an internal or low-usage application where a breakage means someone compiling a quarterly report has a bad morning than it is to test an external one where a breakage could affect revenue or compromise security. A display bug is less expensive (when it's fixed, it's fixed) than a bug that causes data to get saved incorrectly (when it's fixed, you still need to migrate/fix the corrupt data that was produced in the interim).

Automated End-to-End Tests

Even if you have unit tests covering a piece of code, you're still liable to have bugs when it comes to integration, deployment, and usability. Once you have an end-to-end integration test covering something, you're in a much stronger position. We still want unit tests because they let us discover issues faster than those slower-running, harder-to-automate end-to-end tests. We still want unit tests because it's prohibitively expensive to write integration tests for everything.

But the fact remains: a perfect integration test obviates the need for a unit test. If you have an automated test that creates an HTTP client, posts data to an API, and inspects the response, you don't really need a unit test at the controller level with the same inputs/outputs. (You might still find that unit test useful, however, if the code breaks regularly and the unit test shaves a minute off of each iteration.)
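
Here's a sketch of what that looks like (the endpoint, payload, and port are hypothetical; any test runner works - this uses Node's built-in one):

```typescript
// An end-to-end API test: exercises routing, serialization, validation,
// and the real database write in one pass - not just the controller.
import assert from "node:assert";
import { test } from "node:test";

const BASE_URL = process.env.TEST_API_URL ?? "http://localhost:5000";

test("POST /api/submissions round-trips through the real stack", async () => {
  const res = await fetch(`${BASE_URL}/api/submissions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: "hello", body: "world" }),
  });
  assert.strictEqual(res.status, 201);

  // Read it back through the API rather than trusting the response.
  const created = await res.json();
  const fetched = await fetch(`${BASE_URL}/api/submissions/${created.id}`);
  assert.strictEqual(fetched.status, 200);
});
```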

Yes, you do need to test the new code paths and cases that are unlocked by differing configuration files. XML and JSON are code, too, when they influence how your code runs. Ref data doesn't cease to require testing just because you moved it from a case statement to a database table.
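
One cheap way to get that coverage - sketched here with hypothetical config file names, a hypothetical createApp factory, and a hypothetical handle method - is to parameterize a smoke test over every config variant you ship:

```typescript
// Run the same smoke test under every configuration variant we ship;
// each config file can unlock different code paths.
import { test } from "node:test";
import assert from "node:assert";
import { readFileSync } from "node:fs";
import { createApp } from "./app"; // hypothetical app factory

for (const file of ["config.default.json", "config.enterprise.json"]) {
  test(`app boots and answers health checks under ${file}`, async () => {
    const config = JSON.parse(readFileSync(file, "utf8"));
    const app = createApp(config);
    const res = await app.handle({ path: "/health" }); // hypothetical
    assert.strictEqual(res.status, 200);
  });
}
```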

One fairly common practice that I've never seen work and never understood is writing unit tests around database CRUD functionality, using some mock data store in place of a real database. What exactly are you testing? Use an actual database. A lot of CI tools these days have this functionality built in, but at its most basic, you can create an empty/dummy database somewhere and give your CI the credentials to it. Your first CI step might have to be to refresh that database, but that's fine. If your application depends on reading and writing to a database, you can't skip testing that - or test it only against an in-memory "database" you wrote yourself and use only in the unit tests - and expect not to miss things.
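
At its most basic, that can look like this sketch (assuming a throwaway Postgres database whose connection string CI supplies via an environment variable, and a hypothetical submissions table):

```typescript
// CRUD tests against a real database, not a mock data store.
import { test, before, after } from "node:test";
import assert from "node:assert";
import { Client } from "pg";

const db = new Client({ connectionString: process.env.TEST_DATABASE_URL });

before(async () => {
  await db.connect();
  // "Refresh the database" as a first step: start from a known state.
  await db.query("TRUNCATE TABLE submissions RESTART IDENTITY CASCADE");
});
after(async () => db.end());

test("insert and read back a submission", async () => {
  const { rows } = await db.query(
    "INSERT INTO submissions (title) VALUES ($1) RETURNING id",
    ["hello"]
  );
  const found = await db.query(
    "SELECT title FROM submissions WHERE id = $1",
    [rows[0].id]
  );
  assert.strictEqual(found.rows[0].title, "hello");
});
```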

Realistic Environments

Ideally, your end-to-end tests will run in environments that mirror production as closely as possible. In reality, though, most of us don't have the scratch to set up things like load balancers in our CI environments to catch issues that arise from caching, replication, and race conditions. Those might be things you have to catch later down the line - just understand what gaps any departure from production is going to produce, and be prepared to test for them when the time comes.

These terms tend to differ a bit from org to org, but these are the environments you need. The further down you get, the more important it is that your infrastructure closely resemble production's.

  • Development - this is unreleased code, pointed at fake data: Everything - code and data - is always changing, and you expect that some things won't work perfectly, because this is an active work zone.
  • QA - this is unreleased code, pointed at fake data: It's similar to the development environment, but adds the expectation that there's some level of stability, such that if something is broken, it's probably going to be broken in production as well.
  • CI - the process for your automated builds and tests. This usually looks like (or runs in) the QA/Development environment(s).
  • Test - this is unreleased code, pointed at a copy of production data: This is your opportunity to catch those things that don't break on idealized data, but will when they get to the messy real world, before imposing consequences on your users.
  • Staging - this is unreleased code, pointed at production data: Typically a short step in the life cycle, right before releasing to production. This may even be accomplished by taking production servers out of the production "pool" and rotating them back in after testing.
  • Production - this is released code, pointed at real data: Real life, what your customers use.

If everyone on your technology team is allowed access to production data - either because your organization is small, because your data is not very sensitive, or some combination of the two - you can get away without that QA environment.

If your application is read-only, you can skip Test and just spend more time in Staging. You might also be able to get away without Test for a while and use QA instead, if you're confident in your ability to generate the same kind of entropy as real users - that is, if you're building something relatively simple and self-contained. Once you start integrating less predictable third-party APIs, though, or have to deal with legacy formats of your own data, the messiness of real data becomes hard to anticipate, and you'll want Test as a dedicated environment.

The key point with Test is that it needs to have the same security as production - both technology-wise and policy-wise - and ideally an additional layer (IP filtering, Active Directory login), in case a code change introduces a vulnerability in your own authentication.

Personal Branches

This is my answer to the infamous "works on my machine" conundrum. Before merging into master, a developer's branch is (in addition to being required to pass unit tests in CI) deployed to either the QA or Test environment (or both) for testing and experimentation. This is especially useful for a team member who isn't running their own development environment, for usability testing, or for anything else that requires a manual look.

This started as a relatively complicated bit of custom code to build a new Docker image and change a load balancer's config file to point mybranch.mycompany-test.com at said container - and it has gradually simplified. Azure has made it trivially simple with the idea of "deployment slots" - you can tie a branch to one in a few clicks, or just have everyone push to a pre-determined branch (chris.mycompany-test.com) when they want to QA or demo something in a stable environment. Some build tools, like BitBucket Pipelines, even let you run bash scripts on wildcard branch names - so you can deploy any branch to an author-specific location on every git push.

Bulk Testing

Also known as testing to exhaustion… Let's say you have a complex process that happens to… every user, or every record, or something along those lines. You want to regression test, but the historical data is loaded with more permutations than you can plan for in your unit tests. Let's also say that "correct" behavior may be ambiguously defined, or is up to you to define. You might build a harness that runs your code on every record (or a sufficiently large random subset), takes a few key features, and outputs them as JSON - then run both the master and feature-branch code against that universe and diff the results.
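
A sketch of such a harness (loadRecords, evaluateOld, and evaluateNew are hypothetical stand-ins for your data access and for the master- and feature-branch versions of the logic under test):

```typescript
// Bulk regression harness: run two versions of the same logic over
// every historical record and diff a few key output features.
import { loadRecords, evaluateOld, evaluateNew } from "./underwriting";

async function bulkDiff(): Promise<void> {
  const diffs: Array<{ id: string; master: string; feature: string }> = [];
  for (const record of await loadRecords()) {
    // Reduce each outcome to a few key features so the diff stays readable.
    const master = JSON.stringify({ approved: evaluateOld(record).approved });
    const feature = JSON.stringify({ approved: evaluateNew(record).approved });
    if (master !== feature) diffs.push({ id: record.id, master, feature });
  }
  console.log(`${diffs.length} records changed outcome`);
  console.log(JSON.stringify(diffs, null, 2));
}

bulkDiff();
```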

The impetus for this was a loan application which had previously been underwritten manually, with an incomplete set of formalized business rules - if we make this change to our underwriting algorithm, how many people who'd gotten loans under the old methodology would we deny under the new one, and vice versa? The diff says it's different here… so should this person have gotten a loan? What's the rule that applied here that's not in my spec document? When you layer in the question of "did this person pay back their loan or fall behind?", it becomes a BI tool as much as a QA one.

I might also use a similar approach for things like a data or document format migration, or a computer vision application that's looking for a specific thing (street signs, etc.). It's basically A/B testing for situations in which you have the data to A/B test without taking the inferior version out of the lab.

Is he really going to defend manual testing?

The point I'm making isn't that automated tests shouldn't replace manual tests - they should. The point is that an automated test should be a replacement for a manual one, and that's often not what happens: too often, nobody ever runs the manual test the automation was supposed to replace.

The most extreme example - and everyone who's worked for a few years in software has probably seen this - is the person who never bothers compiling and running their code on their own machine, and, when it doesn't work at all, defends their approach with, "of course it works; it passed all of the tests." Bonus points if they then ask, "do you not believe in test-driven development?" or otherwise imply that it's everyone else's job to write tests that anticipate every mistake they might make while they Leroy Jenkins their way around the code base.

Automated tests should replace manual tests, and that means you need to start with a test plan that you run through and understand manually, then automate as much of it as possible. As with any other process you might automate, the first step is a checklist.

Users

Unit tests are important, but users don't care whether they pass or not. They don't care what percentage of code they cover. Users only care whether the product works; your integration, end-to-end, and manual tests mimic this best. While manual testing should never be where you spend the bulk of your resources, it is the final thing that absolutely has to be right in order to have a successful product.

You can't choose not to do manual testing. Not shouldn't - can't. If you don't do it, that means you're testing the product directly on your users. If you're not the first person to look at your finished product, your customer is. Every test up to that point is valuable only insofar as it helps you achieve that more reliably, more quickly, and more cheaply.

Friends

Unit tests are usually written by the code's author, and the test plan doesn't often enough get reviewed by anyone else. While that's good for test-driven development and regression testing, you end up missing those most pernicious of bugs - the mistakes you make twice. Of course my unit tests didn't catch that thing I didn't know about or conceptually understand, because I didn't know about or conceptually understand it when I wrote my tests, either.

Writing unit tests for everything protects you when your code fails to achieve the expected result, but it's not enough to save you from failing to expect the correct result. It helps to look over not just code, but tests, test plans, and the finished software with someone else to catch these - and ideally include someone with domain knowledge in that review.

Just Run the Code and Look Real Quick

Plenty of companies out there do continuous deployment, and there are numerous examples of it working nearly flawlessly. But the reason it works isn't that those companies never do manual tests. At the very least, they look with their eyes when they first write an automated test, to make sure it's testing what it's supposed to. The tester can be a robot, but at some point, a person needs to test the tester. Eventually, the surface of code that you need to manually test gets to be a smaller and smaller share of the code base, and that's a worthy goal.

The medical field learned long ago that passing unit tests (working in a petri dish) doesn't mean a treatment is going to work in humans. It has more recently learned that passing integration tests (shrinking a tumor) doesn't necessarily mean a treatment prolongs life or improves its quality. It's starting to learn that replicating environments is important, because what works on graduate student volunteers doesn't always work across all demographics. The software field needs to come to the same understanding.