Deciding to have lots of children and lots of tests is still fun later on

I recently saw a paraphrased quote by James Bach from a testing meetup in Sydney.

Deciding to have lots of (automated) checks [sic: tests] is like deciding to have lots of children. It’s fun at first, but later…

I read it a number of times, and each time I disagreed with it a little more.

As a proud father of three beautiful boys, I truly believe having lots of children is fun at first AND fun later on. Sure, having lots of kids is the hardest thing you'll ever do and continues to be hard as each day goes by, but hard and fun aren't opposites or mutually exclusive whatsoever[1]; I've actually found them to be strongly correlated (think of your funnest job: was it easy?). So don't ever let anybody put you off having lots of kids, because they are still loads of fun later on (assuming you're not scared of hard work). I love my boys: they're the funnest people I know and they get funner every day.

As a developer of software, I also believe having lots of automated tests is fun later on, on the proviso that you've put thought into them upfront. I truly believe the only way to make sustainable software that you can change and refactor with confidence is to develop it using self-testing code. Sure, having too many automated e2e tests can be a PITA[2], but I'd choose lots of automated tests over few or no automated tests any day of the week[3]. Again, don't let someone put you off having lots of automated tests: just do them right!


Addendum

I asked James Bach on Twitter about his quote (and how many children he has; the answer is one), and in typical self-righteous context-driven testing 'community' style I was called 'reckless' for choosing to have three beautiful boys with my lovely wife.

It didn't end there, with other members of the 'community' doing what they do[4]: taking the opportunity to jump in uninvited, attack me for even wondering how someone with only one child can comment on having lots of children, and try to intimidate me by accusing me of using 'ad hominem' fallacies/attacks against James Bach (they like big words).

This entire episode reaffirms my choice to have nothing whatsoever to do with the context-driven testing 'community' or anyone who associates themselves with it (starting with deleting my Twitter account so they can't attack me or have anything to do with me).

My final word of warning to those of you who still consider yourselves part of that 'community' is this comment about 'context-driven testing':

“I chose not to engage in those dogmatic discussions. I once had a job interview where the term context-driven led one of the devs to do some googling. I had to defend myself for affiliating as he’d found some right contentious and dogmatic stuff and wondered if I were some kind of extremist for including that term in my resume. It’s no longer in my resume, FWIW.”

[source]

Footnotes

[1] I recently read that happiness and unhappiness aren’t actually the opposite of one another: you can be both happy and unhappy at the same time.

[2] In case you didn't know: PITA means 'pain in the ass', and lots of end-to-end tests are a pain in the ass. There are lots of articles on here about why, the most recent one being about Salesforce.com and its 100,000 e2e tests.

[3] FWIW, most codebases I have worked on have had few to no automated tests, so I don't think having too many automated tests is a common problem in our industry.

[4] It's not hard to find examples of members of this 'community' rallying against and intimidating a particular person they disagree with on Twitter, for example: here, here, here, here, here, etc. I personally know a fellow tester who had a very similar negative experience to mine a couple of years ago and has since distanced herself also.

100,000 e2e Selenium tests? Sounds like a nightmare!

This story begins with a promo email I received from Sauce Labs…

“Ever wondered how an Enterprise company like Salesforce runs their QA tests? Learn about Salesforce’s inventory of 100,000 Selenium tests, how they run them at scale, and how to architect your test harness for success”

Sauce Labs email

100,000 end-to-end Selenium tests and success in the same sentence? WTF? Sounds like a nightmare to me!

I dug further and got burnt by the molten lava: the slides confirmed my nightmare was indeed real:

Salesforce Selenium Slide

“We test end to end on almost every action.”

Ouch! (and yes, that is an uncredited image from my blog used in the completely wrong context)

But it gets worse. Salesforce have 7,500 unique end-to-end WebDriver tests which are run on 10 browsers (IE6, IE7, IE8, IE9, IE10, IE11, Chrome, Firefox, Safari & PhantomJS) across 50,000 client VMs that cost multiple millions of dollars, totaling 1 million browser tests executed per day (which works out to 20 Selenium tests per machine per day, or over an hour to execute each test).
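It's worth spelling out the arithmetic behind those figures. A back-of-the-envelope sketch, using only the numbers quoted above:

```ruby
# Back-of-the-envelope arithmetic using only the figures quoted above.
unique_tests = 7_500      # unique end-to-end WebDriver tests
browsers     = 10         # IE6-IE11, Chrome, Firefox, Safari & PhantomJS
vms          = 50_000     # client VMs
daily_runs   = 1_000_000  # browser tests executed per day

tests_per_vm_per_day = daily_runs / vms            # => 20
hours_per_test       = 24.0 / tests_per_vm_per_day # => 1.2 hours each

full_matrix     = unique_tests * browsers  # => 75,000 test/browser combos
matrix_runs_day = daily_runs / full_matrix # => ~13 full suite runs a day
```

Over an hour per end-to-end test, on average, even with 50,000 machines running around the clock.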

Salesforce UI Testing Portfolio

My head explodes! (and yes, another uncredited image from this blog used out of context and with my title removed).

But surely that's only one place, right? Not everyone does this?

A few weeks later I watched David Heinemeier Hansson say this:

“We recently had a really bad bug in Basecamp where we actually lost some data for real customers and it was incredibly well tested at the unit level, and all the tests passed, and we still lost data. How the f*#% did this happen? It happened because we were so focused on driving our design from the unit test level we didn’t have any system tests for this particular thing.
…And after that, we sort of thought, wait a minute, all these unit tests are just focusing on these core objects in the system, these individual unit pieces, it doesn’t say anything about whether the whole system works.”

~ David Heinemeier Hansson – Ruby on Rails creator

and read that he had written this:

“…layered on top is currently a set of controller tests, but I’d much rather replace those with even higher level system tests through Capybara or similar. I think that’s the direction we’re heading. Less emphasis on unit tests, because we’re no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests (Which btw do not need to be so slow any more, thanks to advances in parallelization and cloud runner infrastructure).”

~ David Heinemeier Hansson – Ruby on Rails creator

I started to get very worried. David is the creator of Ruby on Rails and very well respected within the Ruby community (despite being known to be very provocative and anti-intellectual: the 'Fox News' of the Ruby world).

But here is dhh telling us to replace lower-level tests with higher-level 'system' (end-to-end) tests that use something like Capybara to drive a browser, because unit tests didn't catch one particular bug and because it's now possible to parallelize these 'slow' tests? Seriously?

Speed has always been seen as the Achilles' heel of end-to-end tests, because everyone knows that fast feedback is good. But parallelization solves this, right? We just need 50,000 VMs like Salesforce?

No.

Firstly, parallelization of end-to-end tests actually introduces its own problems, such as what to do with tests that you can't run in parallel (for example, ones that change the global state of a system, such as a system message that appears to all users), and it definitely makes test data management trickier. You'll be surprised the first time you run an existing suite of sequential e2e tests in parallel: a lot will fail for unknown reasons.
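As an illustration, here's a minimal Minitest sketch (the class names and test bodies are hypothetical) of one way to cope: opt the safe tests in to parallel execution and leave anything that mutates global state to run serially:

```ruby
require 'minitest/autorun'

class SearchTest < Minitest::Test
  parallelize_me! # these tests only read data, so they can run concurrently

  def test_search_returns_results
    # ...drive the browser against an isolated, per-test dataset...
  end
end

class SystemMessageTest < Minitest::Test
  # No parallelize_me! here: this test changes a system-wide message that
  # every other browser session would see, so it must run on its own.
  def test_message_is_shown_to_all_users
    # ...set the global system message, assert it appears, then clear it...
  end
end
```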

Secondly, the feedback to someone who has made a change still isn't fast enough to give them confidence in that change: by the time your app has been deployed and the parallel end-to-end tests have run, the person who made the change has most likely moved on to something else.

But the real problem with end-to-end tests isn't actually speed. The real problem is that when end-to-end tests fail, most of the time you have no idea what went wrong, so you spend a lot of time finding out why. Was it the server? Was it the deployment? Was it the data? Was it the actual test? Maybe a browser update broke Selenium? Was the test flaky (non-deterministic or non-hermetic)?

Rachel Laycock and Chirag Doshi from ThoughtWorks explain this really well in their recent post on broken UI tests:

“…unlike unit tests, the functional tests don’t tell you what is broken or where to locate the failure in the code base. They just tell you something is broken. That something could be the test, the browser, or a race condition. There is no way to tell because functional tests, by definition of being end-to-end, test everything.”

So what's the answer? On one hand you have David's FUD about unit testing not catching a major bug in Basecamp. On the other hand, you face the issue that a large suite of end-to-end tests will most likely see you spending all your time investigating test failures instead of delivering new features quickly.

If I had to choose just one, I would definitely choose a comprehensive suite of automated unit tests over a comprehensive suite of end-to-end/system tests any day of the week.

Why? Because it's much easier to supplement comprehensive unit testing with human exploratory end-to-end system testing (and you should anyway!) than to verify from the system level that individual units work, and, as explained above, it's much easier to work out why a unit test is broken. It's also much easier to add automated end-to-end tests later than to retrofit unit tests (because your code probably won't be testable, and making it testable after the fact can introduce bugs).

To answer that question, let's imagine for a minute that you're responsible for designing and building a new plane. You obviously need to test that your new plane works. You build a plane by creating parts (units), putting these together into components, and then putting all the components together to build the (hopefully) working plane (system).

If you only focused on unit tests, like David mentioned in his Basecamp example, you could be pretty confident that each piece of the plane had been tested well and works correctly, but you wouldn't be confident that it would fly!

If you only focused on end-to-end tests, you'd need to fly the plane to check that the individual units and components actually work (which is expensive and slow), and even then, if/when it crashed, you'd need to examine the black box to (hopefully) understand which unit or component didn't work, just as we currently do when end-to-end tests fail.

But obviously we don't need to choose just one. And that's exactly what Airbus does when designing and building the new Airbus A350:

As with any new plane, the early design phases were riddled with uncertainty. Would the materials be light enough and strong enough? Would the components perform as Airbus desired? Would parts fit together? Would it fly the way simulations predicted? To produce a working aircraft, Airbus had to systematically eliminate those risks using a process it calls a “testing pyramid.” The fat end of the pyramid represents the beginning, when everything is unknown. By testing materials, then components, then systems, then the aircraft as a whole, ever-greater levels of complexity can be tamed. “The idea is to answer the big questions early and the little questions later,” says Stefan Schaffrath, Airbus’s vice president for media relations.

The answer, which has been the answer all along, is to have a balanced set of automated tests across all levels, with a disciplined approach: a larger number of smaller, specific automated unit/component tests, and a smaller number of larger, general end-to-end automated tests to ensure all the units and components work together. (My diagram below, with attribution.)

Automated Testing Pyramid

Having just one level of tests, as the stories above show, doesn't work (but if it did, I would rather have automated unit tests). Just like a diet of only chocolate doesn't work, nor does a diet that deprives you of anything sweet or enjoyable (but if I had to choose, I would rather a diet of only healthy food than one of only chocolate).
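To make that concrete, here's a minimal Ruby sketch of the two ends of the pyramid (PriceCalculator, the page contents and the Capybara setup are all hypothetical, for illustration only): many fast unit tests like the first, and a handful of broad end-to-end tests like the second to check that the pieces work together.

```ruby
require 'minitest/autorun'
require 'capybara/dsl'

# A hypothetical class under test, defined inline so the sketch runs.
class PriceCalculator
  def initialize(discount:)
    @discount = discount
  end

  def total(price)
    price * (1 - @discount)
  end
end

# Base of the pyramid: many small, fast unit tests like this one.
class PriceCalculatorTest < Minitest::Test
  def test_applies_ten_percent_discount
    assert_equal 90.0, PriceCalculator.new(discount: 0.10).total(100.0)
  end
end

# Top of the pyramid: a few broad end-to-end tests driving a real browser
# (assumes Capybara has been configured with a driver and an app host).
class CheckoutFlowTest < Minitest::Test
  include Capybara::DSL

  def test_customer_can_buy_a_discounted_item
    visit '/products/1'
    click_button 'Add to cart'
    click_button 'Checkout'
    assert page.has_content?('Total: $90.00')
  end
end
```

The unit test pins down the discount logic in milliseconds; the end-to-end test only has to prove the pieces are wired together.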

Now, if only we could convince Salesforce to be more like Airbus and not fly a complete plane (or 50,000 planes) to test every change they make, and stop David from continuing his anti-unit, pro-system-testing, anti-intellectual rampage before it does more damage to our industry.

Free yourself from your filters

One of the most interesting articles I have read recently was ‘It’s time to engineer some filter failure’ by Jon Udell:

“The problem isn’t information overload, Clay Shirky famously said, it’s filter failure. Lately, though, I’m more worried about filter success. Increasingly my filters are being defined for me by systems that watch my behavior and suggest More Like This. More things to read, people to follow, songs to hear. These filters do a great job of hiding things that are dissimilar and surprising. But that’s the very definition of information! Formally it’s the one thing that’s not like the others, the one that surprises you.”

Our sophisticated community-based filters have created echo chambers around the software testing profession.

“An echo chamber is a situation in which information, ideas, or beliefs are amplified or reinforced by transmission and repetition inside an “enclosed” system, often drowning out different or competing views.” ~ Wikipedia

I've seen a few echo chambers evolve:

  • The context-driven testing echo chamber, where the thoughts of a couple of the leaders are amplified and reinforced by the followers (eg. checking isn't testing)
  • The broader software testing echo chamber, where testers define themselves as testers and are only interested in hearing things from other testers (eg. developers are evil and can't test)
  • The agile echo chamber, where anything agile is good and anything waterfall is bad (eg. if you're not doing continuous delivery you're not agile)

So how do we break free of these echo chambers we’ve built using our sophisticated filters? We break those filters!

Jon has some great suggestions in his article (eg. dump all your regular news sources and view the world through a different lens for a week) and I have some specific to software testing:

  • attend a user group or meetup that isn’t about software testing – maybe a programming user group or one for business analysts: I attend programming user groups here in Brisbane;
  • learn to program, or manage a project, or write CSS;
  • attend a conference that isn’t about context driven testing: I’m attending two conferences this year, neither are context driven testing conferences (ANZTB Sydney and JSConf Melbourne);
  • follow people on Twitter who you don't agree with;
  • read blogs from people who you don’t agree with or have different approaches;
  • don't immediately agree with (or retweet, or 'like') something a 'leader' says until you validate that it actually makes sense and you agree with it;
  • don’t be afraid to change your mind about something and publicize that you’ve changed your mind; and
  • avoid the 'daily me' apps like the plague.

You’ll soon be able to break yourself free from your filters and start thinking for yourself. Good luck.

Checking IS testing

The 'testing vs checking' topic has been debated for many years in the software testing community. Two very vocal participants are James Bach[1] and Michael Bolton[2].

“…we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do.”

“One common problem in our industry is that checking is confused with testing.”

~ James Bach & Michael Bolton [1]

The issue I have with the checking vs testing topic is that it dogmatically implies that almost everyone around the world confuses checking with testing. Apparently unit testing is actually unit checking, the test pyramid is a check pyramid, test-driven development is check-driven development, and there is no such thing as automated testing, only automated fact checking.

“The “testing pyramid” is a simple heuristic that has little to do with testing. It’s called the testing pyramid because whomever created it probably confuses testing with checking. That’s a very common problem and we as an industry should clean up our language.”

~ James Bach [3]

We don’t need to clean up our language: we need to adapt, invent new language and move on.

The meanings of words aren't static. 'Literally' originally meant in a literal way or sense, but many people now use it to stress a point[4]. 'Awful' used to mean inspiring wonder but now has strong negative connotations[4]. Testing now means checking. Checking now means testing.

So perhaps, instead of accusing everyone of confusing 'testing' and 'checking', we move on, accept that people call checking 'testing', and come up with another term to describe the value-added human stuff we do on projects: you know, the questioning, studying, exploring, evaluating etc.

It'll be much easier to educate everyone on some new terminology for pure human testing (exploratory testing based on intuition) than to get them to split their current view of testing in half and admit confusion on their part.

[1] Testing and Checking Refined: James Bach – 26 March 2013
[2] On Testing and Checking Refined: Michael Bolton – 29 March 2013
[3] Disruptive Testing Part 1: James Bach – 6 Jan 2014
[4] From abandon to nice… Words that have literally changed meaning through the years

Improving your agile flow

I've noticed two counterforces to flow on an agile team: rework and human multitasking. It's common knowledge that rework is wasted effort, and that human multitasking should be avoided: it reduces velocity through inefficient context-switching and can introduce further errors through insufficient attention to the tasks at hand.

But luckily there are two simple things I have found that increase flow and reduce rework and multitasking.

User Story Kickoffs

It is essential that a kickoff discussion occurs just before development work begins on every user story. This is a casual discussion around a computer between the business analyst, the tester and any programmers who are working on the user story.

In my experience this takes about ten minutes standing around someone's desk, where we read the acceptance criteria aloud from Trello and discuss any ambiguities. We ensure that everything needed for the story to be considered complete and ready for testing is listed, and that the story is neither too large nor likely to take too long to complete.

We have special children's stickers on our story wall which we put onto each story card that has been properly kicked off.

User story test handovers/shoulder checks

shoulder checks are essential

It's also essential that, as soon as development is complete, the tester and any programmers who worked on the story gather for a quick 'shoulder check' or test handover. This often involves letting the tester 'play' with the functionality on the programmer's machine and running through the now-completed Trello acceptance criteria. Any misunderstandings or bugs can be discussed and resolved before the card becomes ready for testing.

We have special children's stickers on our story wall which are then added to each story card that has been handed over/shoulder checked. The aim is to have two stickers on every story card in the 'ready for test' column.

How these two simple activities improve flow

Conducting a user story kickoff every time means that everyone working on developing the functionality has a common understanding of what is required, so there is far less chance of developing something unneeded or misunderstood that requires subsequent rework.

Conducting a story test handover/shoulder check every time means that obvious bugs and misunderstandings are raised immediately, so they can be fixed quickly before the programmer(s) move on to new user stories. If discovered later, these bugs force the programmer to multitask and context-switch between bug fixes and new functionality.

But I’m too busy testing stories…

I used to think that, but now I have a personal rule: regardless of what I am working on, I will drop it to attend a story kickoff or test handover. The benefits of these activities outweigh the cost of resuming whatever I was doing afterwards.

Bonus Time… is it essential your bugs are fixed?

The great thing about agile software development is that developing something and testing something are a lot closer together… but they’re still apart. It’s more efficient to get someone to fix a bug whilst it’s fresh in their memory, but it’s even more efficient to not fix it at all.

What I am proposing is that, instead of raising medium/minor bugs against a story being tested, you raise them as bugs in the backlog to be prioritized. Depending on your organization, the business may not consider them important enough to fix, which saves you both rework and context-switching so you can continue developing new functionality.

Software testing as a career

This post is part of the Pride & Paradev series.


What do I think of software testing as a career?


Software Testing is the Worst Career on the Planet

It's amazing how quickly you tire of testing the same thing over again in Internet Explorer 7, because the programmers don't use Internet Explorer and never thought to test in it.

The harder you work at finding bugs, the lazier the developers become about letting them through.

People constantly question why you're still a software tester and haven't turned into a programmer yet, as though that technical specialism were the natural career progression.

Lots of people call themselves software testers because they've played with software for a couple of years and attended a testing certification course over a couple of days. You get lumped in with those people.

Just when you think you've got a user story tested in three different operating systems, four devices and eight browsers, the programmer decides to 'refactor' their code, or switch to a more in-vogue JavaScript framework, rendering all your testing work void because every screen you have tested no longer functions.

And they expect you to test it by the end of the iteration which happens to be today.

Despite what iterative development brings, testing always gets squeezed, and you're expected to constantly go above and beyond to get things done.

Career progression means becoming either a specialist 'automated tester' or a test manager: one involves writing code that no one ever sees; the other usually involves writing wordy, template-driven test strategies that, again, no one ever sees.

But the absolute worst thing about being a software tester is the distrust you develop in software. You constantly see software at its worst: it's hard to believe any software can be developed that actually works without issues. This means you hold your breath every time you hit submit on a credit card form, praying that it will actually work and not crash and charge your card three times.

Software Testing is the Best Career on the Planet

Some days I am amazed at how much fun my job is. I get to play with cool gadgets: I have four smartphones and an iPad on my desk, and I use three operating systems and eight browsers on a daily basis.

I get to look at software from all different angles: from a user's point of view, from a business/marketing point of view, from a technical viewpoint, and I get to try all kinds of crazy things on it.

I get to really know and understand how a system works from end-to-end, and get to know its quirks and pitfalls. Finding bugs prevents them from being released into Production and causing someone else a great inconvenience.

I develop great relationships with programmers, who like the feedback I give, and with business people, with whom I develop acceptance criteria and discuss issues in business terms and how they will be affected.

I get to understand code, database schemas, servers and browsers. I am involved in automating acceptance tests. I get to go to awesome software testing conferences around the world and meet other testers.

I get to tell my family about all the cool things I’ve tested and they get excited to occasionally see things I have worked on in the media etc.

It’s a really cool career.

Is test management wrong?

I was somewhat confused by the recent article entitled "Test Management is Wrong". I couldn't quite work out whether the author meant Test Management (the activity) is wrong, Test Managers (the people) are wrong, or Test Management Tools (the things) are wrong, but here's my view of all three:

  • Test Management (the activity): now embedded in agile teams;
  • Test Managers (the people): on the way out; and
  • Test Management Tools (the things): gathering dust.

Let me explain with an example. Most organizations see the benefit of agile 'iterative' development and have restructured, or are in the process of restructuring, their teams to work in this way. A typical transformation looks like this:

Agile Transformation

Instead of having three separate larger 'analysis', 'development' and 'test' teams, the organization may move to four smaller cross-functional teams, each consisting of, say, one tech lead, one analyst, one tester and four programmers.

Previously, a test manager managed the testing process (and the testing team), probably using a test management tool such as Quality Center.

Now each agile team is responsible for its own quality: the tester advocates quality and encourages activities that build quality in, such as accurate acceptance criteria, unit testing, automated acceptance testing, story testing and exploratory testing. These activities aren't managed in a test management tool but against each user story in a lightweight story management tool (such as Trello). The tester is responsible for managing his/her own testing.

Business value is defined and measured an iteration at a time by the team.

So what happens to the Analysis, Development and Test Managers in the previous structure? Depending on the size of the organization, there may be a need for a 'center of excellence' or 'community of practice' in each area to ensure that new ideas and approaches are seeded across the cross-functional teams. The Test Manager may be responsible for working with each tester in the teams to make this happen. But depending on the organization and the testers, this might not be needed. The same goes for the Analysis Manager and, to a lesser extent, the Development Manager.

Step-by-step test cases (such as those in Quality Center) are no longer needed: each user story has acceptance criteria, and each team writes automated acceptance tests for the functionality it develops, which act as both automated regression tests and living documentation.

So, to answer the author's original question: no, I don't think test management is wrong; we just do it in a different way now.