Category Archives: Software Testing

100,000 e2e selenium tests? Sounds like a nightmare!

This story begins with a promo email I received from Sauce Labs…

“Ever wondered how an Enterprise company like Salesforce runs their QA tests? Learn about Salesforce’s inventory of 100,000 Selenium tests, how they run them at scale, and how to architect your test harness for success”

saucelabs email

100,000 end-to-end selenium tests and success in the same sentence? WTF? Sounds like a nightmare to me!

I dug further and got burnt by the molten lava: the slides confirmed my nightmare was indeed real:

Salesforce Selenium Slide

“We test end to end on almost every action.”

Ouch! (and yes, that is an uncredited image from my blog used in the completely wrong context)

But it gets worse. Salesforce have 7500 unique end-to-end WebDriver tests which are run on 10 browsers (IE6, IE7, IE8, IE9, IE10, IE11, Chrome, Firefox, Safari & PhantomJS) on 50,000 client VMs that cost multiple millions of dollars, totaling 1 million browser tests executed per day (which equals 20 selenium tests per day, per machine, or over 1 hour to execute each test).

Salesforce UI Testing Portfolio

My head explodes! (and yes, another uncredited image from this blog used out of context and with my title removed).

But surely that’s only one place right? Not everyone does this?

A few weeks later I watched David Heinemeier Hansson say this:

“We recently had a really bad bug in Basecamp where we actually lost some data for real customers and it was incredibly well tested at the unit level, and all the tests passed, and we still lost data. How the f*#% did this happen? It happened because we were so focused on driving our design from the unit test level we didn’t have any system tests for this particular thing.
…And after that, we sort of thought, wait a minute, all these unit tests are just focusing on these core objects in the system, these individual unit pieces, it doesn’t say anything about whether the whole system works.”

~ David Heinemeier Hansson – Ruby on Rails creator

and read that he had written this:

“…layered on top is currently a set of controller tests, but I’d much rather replace those with even higher level system tests through Capybara or similar. I think that’s the direction we’re heading. Less emphasis on unit tests, because we’re no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests (Which btw do not need to be so slow any more, thanks to advances in parallelization and cloud runner infrastructure).”

~ David Heinemeier Hansson – Ruby on Rails creator

I started to get very worried. David is the creator of Ruby on Rails and very well respected within the ruby community (despite being known to be very provocative and anti-intellectual: the ‘Fox News’ of the ruby world).

But here is dhh telling us to replace lower level tests with higher level ‘system’ (end to end) tests that use something like Capybara to drive a browser because unit tests didn’t find a bug and because it’s now possible to parallelize these ‘slow’ tests? Seriously?
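For anyone who hasn’t seen one, here is roughly what a Capybara-driven ‘system’ test looks like. This is only a minimal sketch: the sign-up page, field labels, welcome message and localhost URL are all hypothetical, and the headless Chrome driver assumes a recent Capybara plus the selenium-webdriver gem.

```ruby
# A minimal sketch of a Capybara 'system' test (hypothetical sign-up flow).
require 'capybara/rspec'

# Assumes a recent Capybara, the selenium-webdriver gem and a local Chrome install.
Capybara.default_driver = :selenium_chrome_headless
Capybara.run_server = false
Capybara.app_host = 'http://localhost:3000' # hypothetical app already running locally

RSpec.feature 'Signing up' do
  scenario 'a visitor signs up with valid details' do
    visit '/signups/new'
    fill_in 'Email', with: 'jane@example.com'
    fill_in 'Password', with: 'secret123'
    click_button 'Sign up'
    # Everything below the browser gets exercised: routing, controllers,
    # models, the database and the rendered page.
    expect(page).to have_content('Welcome, jane@example.com')
  end
end
```

A single test like this touches the whole stack, which is exactly why dhh likes it, and exactly why a suite of thousands of them becomes the problem described below.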

Speed has always been seen as the Achilles’ heel of end-to-end tests because everyone knows that fast feedback is good. But parallelization solves this, right? We just need 50,000 VMs like Salesforce?

No.

Firstly, parallelization of end-to-end tests actually introduces its own problems, such as what to do with tests that you can’t run in parallel (for example, ones that change the global state of a system, such as a system message that appears to all users), and it definitely makes test data management trickier. You’ll be surprised the first time you run an existing suite of sequential e2e tests in parallel, as a lot will fail for unknown reasons.
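One common workaround (a sketch only, not a silver bullet) is to tag the tests that mutate global state, keep them out of the parallel pool, and run them in a single sequential pass afterwards. The :serial tag name below is arbitrary, and the commands assume the parallel_tests gem.

```ruby
# Sketch: quarantine tests that can't safely run in parallel behind a tag.
RSpec.describe 'Global system message', :serial do
  it 'is shown to all logged-in users' do
    # This test changes global state (a banner every user sees),
    # so it must not run alongside other tests.
  end
end

# Run everything else in parallel (assumes the parallel_tests gem):
#   parallel_rspec spec --test-options '--tag ~serial'
# Then run the quarantined tests in one sequential pass:
#   rspec --tag serial
```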

Secondly, the test feedback to someone who’s made a change still isn’t fast enough to enable confidence in making a change (by the time your app has been deployed and the parallel end-to-end tests have run, the person who made the change has most likely moved on to something else).

But the real problem with end to end tests isn’t actually speed. The real problem with end to end tests is that when end to end tests fail, most of the time you have no idea what went wrong so you spend a lot of time trying to find out why. Was it the server? Was it the deployment? Was it the data? Was it the actual test? Maybe a browser update that broke Selenium? Was the test flaky (non-deterministic or non-hermetic)?

Rachel Laycock and Chirag Doshi from ThoughtWorks explain this really well in their recent post on broken UI tests:

“…unlike unit tests, the functional tests don’t tell you what is broken or where to locate the failure in the code base. They just tell you something is broken. That something could be the test, the browser, or a race condition. There is no way to tell because functional tests, by definition of being end-to-end, test everything.”

So what’s the answer? On one hand you have David’s FUD about unit testing not catching a major bug in Basecamp. On the other hand you face the issue that a large suite of end-to-end tests will most likely result in you spending all your time investigating test failures instead of delivering new features quickly.

If I had to choose just one, I would definitely choose a comprehensive suite of automated unit tests over a comprehensive suite of end-to-end/system tests any day of the week.

Why? Because it’s much easier to supplement comprehensive unit testing with human exploratory end-to-end system testing (and you should anyway!) than to manually verify that units function correctly from the higher system level, and it’s much easier to know why a unit test is broken, as explained above. It’s also much easier to add automated end-to-end tests later than to retrofit unit tests (because your code probably won’t be testable, and making it testable after the fact can introduce bugs).
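To make that contrast concrete, here is the kind of small, specific unit test I mean; the PriceCalculator class and its behaviour are hypothetical. When a test like this fails, the failure names the exact object, method and expectation, so there is no archaeology required.

```ruby
# Hypothetical unit test: small, fast, and specific about what broke.
require 'rspec'
require_relative 'price_calculator' # hypothetical class under test

RSpec.describe PriceCalculator do
  describe '#total' do
    it 'applies a 10% discount to orders over $100' do
      calculator = PriceCalculator.new(discount_threshold: 100, discount_rate: 0.1)
      # If this fails, the message points straight at PriceCalculator#total,
      # not at a browser, a deployment, the test data or a flaky driver.
      expect(calculator.total(120)).to eq(108)
    end
  end
end
```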

To answer our question, let’s imagine for a minute that you were responsible for designing and building a new plane. You obviously need to test that your new plane works. You build a plane by creating parts (units), putting these together into components, and then putting all the components together to build the (hopefully) working plane (system).

If you only focused on unit tests, like David mentioned in his Basecamp example, you could be pretty confident that each piece of the plane had been tested well and works correctly, but you wouldn’t be confident it would fly!

If you only focused on end-to-end tests, you’d need to fly the plane to check that the individual units and components actually work (which is expensive and slow), and even then, if/when it crashed, you’d need to examine the black box to work out which unit or component failed, just as we currently do when end-to-end tests fail.

But, obviously we don’t need to choose just one. And that’s exactly what Airbus does when it’s designing and building the new Airbus A350:

As with any new plane, the early design phases were riddled with uncertainty. Would the materials be light enough and strong enough? Would the components perform as Airbus desired? Would parts fit together? Would it fly the way simulations predicted? To produce a working aircraft, Airbus had to systematically eliminate those risks using a process it calls a “testing pyramid.” The fat end of the pyramid represents the beginning, when everything is unknown. By testing materials, then components, then systems, then the aircraft as a whole, ever-greater levels of complexity can be tamed. “The idea is to answer the big questions early and the little questions later,” says Stefan Schaffrath, Airbus’s vice president for media relations.

The answer, which has been the answer all along, is to have a balanced set of automated tests across all levels, with a disciplined approach to having a larger number of smaller specific automated unit/component tests and a smaller number of larger general end-to-end automated tests to ensure all the units and components work together. (My diagram below with attribution)

Automated Testing Pyramid

Having just one level of tests, as shown by the stories above, doesn’t work (but if it did, I would rather have automated unit tests). Just like a diet of nothing but chocolate doesn’t work, nor does a diet that deprives you of anything sweet or enjoyable (but if I had to choose, I would rather have a diet of only healthy food than a diet of just chocolate).

Now if we could just convince Salesforce to be more like Airbus and stop flying a complete plane (or 50,000 planes) to test everything every time they make a change, and convince David to call off his anti-unit, pro-system-testing, anti-intellectual rampage, which will do our industry more damage than it’s worth.

Free yourself from your filters

One of the most interesting articles I have read recently was ‘It’s time to engineer some filter failure’ by Jon Udell:

“The problem isn’t information overload, Clay Shirky famously said, it’s filter failure. Lately, though, I’m more worried about filter success. Increasingly my filters are being defined for me by systems that watch my behavior and suggest More Like This. More things to read, people to follow, songs to hear. These filters do a great job of hiding things that are dissimilar and surprising. But that’s the very definition of information! Formally it’s the one thing that’s not like the others, the one that surprises you.”

Our sophisticated community based filters have created echo chambers around the software testing profession.

“An echo chamber is a situation in which information, ideas, or beliefs are amplified or reinforced by transmission and repetition inside an “enclosed” system, often drowning out different or competing views.” ~ Wikipedia

I’ve seen a few echo chambers evolve:

  • The context driven testing echo chamber where the thoughts of a couple of the leaders are amplified and reinforced by the followers (eg. checking isn’t testing)
  • The broader software testing echo chamber where testers define themselves as testers and are only interested in hearing things from other testers (eg. developers are evil and can’t test)
  • The agile echo chamber where anything agile is good and anything waterfall is bad (eg. if you’re not doing continuous delivery you’re not agile)

So how do we break free of these echo chambers we’ve built using our sophisticated filters? We break those filters!

Jon has some great suggestions in his article (eg. dump all your regular news sources and view the world through a different lens for a week) and I have some specific to software testing:

  • attend a user group or meetup that isn’t about software testing – maybe a programming user group or one for business analysts: I attend programming user groups here in Brisbane;
  • learn to program, or manage a project, or write CSS;
  • attend a conference that isn’t about context driven testing: I’m attending two conferences this year, neither are context driven testing conferences (ANZTB Sydney and JSConf Melbourne);
  • follow people on twitter who you don’t agree with;
  • read blogs from people who you don’t agree with or have different approaches;
  • don’t immediately agree with (or retweet, or ‘like’) something a ‘leader’ says until you’ve checked it actually makes sense and you agree with it;
  • don’t be afraid to change your mind about something and publicize that you’ve changed your mind; and
  • avoid the ‘daily me‘ apps like the plague.

You’ll soon be able to break yourself free from your filters and start thinking for yourself. Good luck.

Checking IS testing

The ‘testing vs checking’ topic has been in discussion for many years in the software testing community. Two very vocal participants are James Bach[1] and Michael Bolton[2].

“…we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do.”

“One common problem in our industry is that checking is confused with testing.”

~ James Bach & Michael Bolton [1]

The issue I have with the checking vs testing topic is that it is dogmatic in implying that almost everyone around the world confuses checking with testing. Apparently unit testing is actually unit checking, the test pyramid is a check pyramid, test driven development is check driven development, and there is no such thing as automated testing, only automated fact checking.

“The “testing pyramid” is a simple heuristic that has little to do with testing. It’s called the testing pyramid because whomever created it probably confuses testing with checking. That’s a very common problem and we as an industry should clean up our language.”

~ James Bach [3]

We don’t need to clean up our language: we need to adapt, invent new language and move on.

The meanings of words aren’t static. ‘Literally’ originally meant in a literal way or sense, but many people now use it to stress a point[4]. ‘Awful’ used to mean inspiring wonder but now has strong negative connotations[4]. Testing now means checking. Checking now means testing.

So perhaps instead of accusing everyone of confusing ‘testing’ and ‘checking’, we move on, accept that people call checking ‘testing’, and come up with another term to describe the value-added human stuff we do on projects: you know, the questioning, studying, exploring, evaluating etc.

It’ll be much easier to educate everyone on some new terminology for pure human testing (exploratory testing based on intuition) than to get them to split their current view of testing in half and admit confusion on their part.

[1] Testing and Checking Refined: James Bach – 26 March 2013
[2] On Testing and Checking Refined: Michael Bolton – 29 March 2013
[3] Disruptive Testing Part 1: James Bach – 6 Jan 2014
[4] From abandon to nice… Words that have literally changed meaning through the years

Improving your agile flow

I’ve noticed two counterforces to flow on an agile team: rework and human multitasking. It’s common knowledge that rework is wasted effort, and that human multitasking should be avoided as it reduces velocity through inefficient context-switching and can introduce further errors through insufficient attention to the tasks at hand.

But luckily there are two simple things I have found that increase flow and reduce rework and multitasking.

User Story Kickoffs

It is essential that a kickoff discussion occurs just before development work begins on every user story. This is a casual discussion around a computer between the business analyst, the tester and any programmers working on the user story.

In my experience this takes about ten minutes standing around someone’s desk, where we read aloud the acceptance criteria from Trello and discuss any ambiguities. We ensure that everything needed for the story to be considered complete and ready for testing is listed, and that the story is not too large and won’t take too long to complete.

We have special children’s stickers on our story wall which we put onto a story card that has been properly kicked off.

User story test handovers/shoulder checks

shoulder checks are essential

It’s also essential that, as soon as development is complete, the tester and any programmers who worked on the story gather for a quick ‘shoulder check’ or test handover. This often involves letting the tester ‘play’ with the functionality on the programmer’s machine, and running through the now completed Trello acceptance criteria. Any misunderstandings or bugs can be discussed and resolved before the card becomes ready for testing.

We have special children’s stickers which are then added to a story card that has been handed over/shoulder checked. The aim is to have two stickers on every story card in ‘ready for test’.

How these two simple activities improve flow

By conducting a user story kickoff every time, everyone working on the functionality has a common understanding of what is required, so there is far less chance of developing something unneeded or misunderstood that requires subsequent rework.

By conducting a story test handover/shoulder check every time, obvious bugs and misunderstandings are raised immediately, so they can be fixed quickly before the programmer(s) move on to new user stories. If discovered later, these cause the programmer to multitask and context-switch between bug fixes and new functionality.

But I’m too busy testing stories…

I used to think that, but now I’ve got a personal rule that regardless of what I am doing or working on, I will drop it to attend a story kickoff or test handover. The benefits of me conducting these activities outweigh any work that I need to resume after these activities are complete.

Bonus Time… is it essential your bugs are fixed?

The great thing about agile software development is that developing something and testing something are a lot closer together… but they’re still apart. It’s more efficient to get someone to fix a bug whilst it’s fresh in their memory, but it’s even more efficient to not fix it at all.

What I am proposing is that instead of raising medium/minor bugs against a story being tested, you raise them as bugs in the backlog to be prioritized. Depending on your organization, your business may not consider these important enough to fix, and therefore this saves you both the rework and the context-switching, so you can continue developing new functionality.

Software testing as a career

This post is part of the Pride & Paradev series.


What do I think of software testing as a career?


Software Testing is the Worst Career on the Planet

It’s amazing how quickly you tire of testing the same thing over and over again in Internet Explorer 7, because the programmers don’t use Internet Explorer and haven’t thought to test in it.

The harder you work at finding bugs, the lazier the developers become about letting them through.

People constantly question you about why you’re still a software tester and haven’t turned into a programmer yet as though technical specialism is a natural career progression.

Lots of people call themselves software testers because they’ve played with software over a couple of years and attended a testing certification course over a couple of days. You’re grouped into the same group as those people.

Just when you think you’ve got a user story tested in three different operating systems, four devices and eight browsers, the programmer decides to ‘refactor’ their code, or switch to a more in vogue JavaScript framework, rendering all your testing work void because every screen you have tested no longer functions.

And they expect you to test it by the end of the iteration which happens to be today.

Despite what iterative development promises, testing always gets squeezed and you’re expected to constantly go above and beyond to get things done.

Career progression means either becoming a specialist ‘automated tester’ or a test manager: one involves writing code that no one ever sees; the other usually involves writing wordy, template-driven test strategies that, again, no one ever sees.

But the absolute worst thing about being a software tester is the distrust you develop in software. You constantly see software at its worst: it’s hard to believe that any software can be developed that actually works without any issues. This means you take a deep breath every time you hit submit on a credit card form, praying that it will actually work and not crash and charge your credit card three times.

Software Testing is the Best Career on the Planet

Some days I am amazed at how much fun my job is. I get to play with cool gadgets: I have four smartphones and an iPad on my desk, and use three operating systems and eight browsers on a daily basis.

I get to look at software from all different angles: from a user’s point of view, from the business/marketing view, from a technical viewpoint and try all kinds of crazy things on it.

I get to really know and understand how a system works from end-to-end, and get to know its quirks and pitfalls. Finding bugs prevents them from being released into Production and causing someone else a great inconvenience.

I develop great relationships with programmers who like the feedback I give, and with business people, who I work with to develop acceptance criteria and discuss issues in business terms and how they will be affected.

I get to understand code, database schema, servers and browsers. I am involved in automating acceptance tests. I get to go to awesome software testing conferences around the world to meet other testers.

I get to tell my family about all the cool things I’ve tested and they get excited to occasionally see things I have worked on in the media etc.

It’s a really cool career.

Is test management wrong?

I was somewhat confused about what was meant by the recent article entitled “Test Management is Wrong”. I couldn’t quite work out whether the author meant Test Management (the activity) is wrong, Test Managers (the people) are wrong or Test Management Tools (the things) are wrong, but here’s my view of these three things:

Test Management (the activity): now embedded in agile teams;
Test Managers (the people): on the way out; and
Test Management Tools (the things): gathering dust.

Let me explain with an example. Most organizations see the benefit of agile ‘iterative’ development and have restructured, or are in the process of restructuring, their teams to work in this way. A typical transformation looks like this:

Agile Transformation

Instead of having three separate larger ‘analysis’, ‘development’ and ‘test’ teams, the organization may move to four smaller cross-functional teams, each consisting of, say, one tech lead, one analyst, one tester and four programmers.

Previously, a test manager managed the testing process (and the testing team), probably using a test management tool such as Quality Center.

Now each agile team is responsible for its own quality: the tester advocates quality and encourages activities that build quality in, such as accurate acceptance criteria, unit testing, automated acceptance testing, story testing and exploratory testing. These activities aren’t managed in a test management tool, but against each user story in a lightweight story management tool (such as Trello). The tester is responsible for managing his/her own testing.

Business value is defined and measured an iteration at a time by the team.

So what happens to the Analysis, Development and Test Managers in the previous structure? Depending on the size of the organization, there may be a need for a ‘center of excellence’ or ‘community of practice’ in each of the areas to ensure that new ideas and approaches are seeded across the cross-functional teams. The Test Manager may be responsible for working with each tester in the teams to ensure this happens. But depending on the organization and the testers, this might not be needed. The same goes for the Analysis Manager and, to a lesser extent, the Development Manager.

Step-by-step test cases (such as those in Quality Center) are no longer needed, as each user story has acceptance criteria, and each team writes automated acceptance tests for the functionality it develops, which act as both automated regression tests and living documentation.
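As a rough illustration of what ‘living documentation’ means here: if the acceptance criteria are written as executable specs with readable names, running them with RSpec’s documentation formatter prints the criteria as prose, and a failing criterion is a failing regression test. The feature and criteria below are made up for the example.

```ruby
# Hypothetical acceptance criteria written as an executable feature spec.
# Running `rspec --format documentation` prints these names as readable prose.
require 'capybara/rspec'

RSpec.feature 'Resetting a password' do
  scenario 'a registered address receives an email with a reset link' do
    visit '/passwords/new'
    fill_in 'Email', with: 'jane@example.com'
    click_button 'Send reset link'
    expect(page).to have_content('Check your email for a reset link')
  end

  # A scenario without a body is reported as pending: a criterion that has
  # been agreed with the team but not yet automated.
  scenario 'a reset link expires after 24 hours'
end
```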

So to answer the author’s original question: no, I don’t think test management is wrong, we just do it in a different way now.

The new QA: the Quality Advocate

“Quality is not an act, it is a habit.”
~ Aristotle

When I was recently writing about Quality Guardians/Gatekeepers for Pride and Paradev, I asked myself: what does the term Quality Assurance actually mean?

In the world of software testing, there’s a lot of contention over the use of the term QA instead of testing. Often testers consider themselves to be Quality Assurance (QA) even though they don’t ensure quality, they just test it.

It actually makes more sense to call independent software testing after development Quality Control (QC) as it is simply verifying quality rather than ensuring it.

As previously stated, I believe that it is no longer sufficient to test quality in; agile software delivery teams need to build it in. A person performing a Quality Assurance role in an agile team can work closely with the team to strive to build quality in, but still can’t ensure quality actually exists (programmers can still check in bad code).

So, I propose we rename QA to mean Quality Advocate.

A Quality Advocate (QA) in an agile team advocates quality. Whilst their responsibilities include testing, they aren’t limited to just that. They work closely with other team members to build quality in, whether that be through clear, consistent acceptance criteria, ensuring adequate automated test coverage at the unit/integration level, asking for dev box walkthroughs, or encouraging collaboration and discussion about doing better testing within the team.

Whilst the Quality Advocate promotes quality as part of their role, quality is everyone’s responsibility.

The benefit of doing this is that testing becomes more efficient for the Quality Advocate and a better product is produced overall as quality has been built in from the beginning.

“Quality means doing it right when no one is looking.”
~ Henry Ford

Addendum: Whilst writing this post, a colleague of mine suggested I use the term Quality Activist instead of Advocate. Whilst I like the sound of it, I think it’s a little too extreme: I want to promote quality, just not tie myself to office furniture over it.

Reactions to my technical testers post

I was surprised at some reactions to my ‘Do software testers need technical skills?’ post.

I was told that including quotes from Joel Spolsky undermined the contrast in my article, as Joel apparently thinks tester positions are entry-level. That view is supported by a 13-year-old blog post of Joel’s, not the much more recent article I quoted from.

I don’t believe Joel Spolsky does consider testing entry level; for instance, if you have a look at current QA openings at his company, you’ll see that they require ‘top notch testing skills‘.

But the most surprising reaction was this one:

I liken it more to asking whether paramedics should study medicine.

What surprised me the most was it being retweeted by James Bach, especially considering how I enjoyed the article he wrote recently about how useful he found his non-technical sister as a tester in developing a personal computer program.

“A tester of any kind can contribute early in a development process, and become better able to test, by pairing with a programmer regardless of his own ability to code.”

~ James Bach

Anyone who knows me will know that I am a technical tester myself. So if I was hiring a tester to be part of my agile team, I would much prefer a technical tester over a non-technical one.

But if I had to choose between an intelligent technical tester who wanted to do nothing but code automated test scripts, and an intelligent, curious tester without technical skills, I would choose the non-technical tester every time.

Visualising software quality: using ink

Not so recently, Gojko Adzic wrote a blog post asking for readers’ suggestions on techniques to visualise the software quality of a system in development. I’ve recently been giving this some thought and came up with the following idea.

A story

Early last year at a client site, I met a genuinely lovely person working as a tester. She did traditional manual testing of a large complex system being developed in a non-iterative (big bang) manner.

I noticed her clear red pen had a small label on it: a piece of paper with a date sticky-taped on, so I asked her what it meant. She told me how she hates waste and loves to use a pen in its entirety, and the date was her way of keeping track of how long she’d been using that particular pen.

We went on talking about what she used the red pen for. What she’d do was create lots of manual test cases in a template on her computer, and then print them all to create a large pile of paper when it came time to execute the tests. As she executed these tests, she’d use her red pen to mark failures on the test case printouts and write notes about the defects behind those failures. As the pen was clear and you could see how much red ink remained, she joked that the pen was an indicator of how good the system she worked on was. She’d used lots of red ink from her red pen since the start of the year, so the system wasn’t good! Aha!

An Idea

I started reading some suggestions for visualising software quality. I see two problems with most of them: firstly, most are far too complex, and secondly, most rely on capturing detailed metrics, which creates overhead of its own.

What if you could have a lo-fidelity way to visualise software quality without creating any overhead? Perfect. Enter red and green ink.

A proposal for visualising software quality using red and green ink

Let me start by saying that this idea is freshly baked, possibly half cooked: I haven’t even tried it and I don’t know if it’ll work at all. But I think it’s cool and that’s why I am sharing it.

Imagine you’re working in a small cross-functional team developing a piece of software. You work as the tester on the team and have varied responsibilities: work with the business analyst and SME to define acceptance criteria, work with a developer to automate these acceptance criteria, and conduct exploratory (session-based) testing on individual user stories as they are completed.

At the start of the project, you’ll need three additional things:

  • Two brand new matching red and green pens with clear barrels (so you can see the ink)
  • A ream of blank white paper: roughly A4 or A3 sized (or whatever you can get your hands on)

Now you’re ready to visualise software quality

Each story has a set amount of time allocated to it for exploratory (session-based) testing. When you are about to start an exploratory testing session, grab the two pens and a couple of blank white sheets of paper. As you test, write your thoughts on the paper in either ink: good thoughts (me likey) in green, bad thoughts (bugs, crashes, poor design etc.) in red.

Instant feedback on software quality

As soon as the session is complete, stick these sheets of paper on your wall and talk to the team about them, explaining each red and green thought. The paper will instantly show what you think of the quality of the system: a predominantly green sheet is good, a predominantly red one is bad.

Longer term feedback on software quality

Over time, the ink remaining in each pen will paint a picture (excuse the pun) about the quality of your system. Are you using loads of red ink and not much green?

Thoughts?

As I mentioned, this is just an idea I recently had and I have no idea whether it’d be successful in visualising software quality. But I reckon it’d be fun.

Introducing the software testing ice-cream cone (anti-pattern)

As previously explained, I like using the software testing pyramid as a visual way to represent where you should be focusing your testing effort, and often switch between using a cloud or an Eye of Providence to represent the manual session-based tests at the top of the pyramid that you should use to supplement and test your automated tests.

I often see organizations fall into the trap of creating ‘inverted’ pyramids of software testing, and only yesterday a colleague pointed out to me that if you invert my pyramid with the cloud, you end up with an ice-cream cone! So, introducing the software testing ice-cream cone (anti-pattern)!