I read a LinkedIn blog post from 2015 by Keqiu Hu from LinkedIn about flaky UI tests. He explains how they fixed their flaky UI tests for the LinkedIn app. Among other things they implemented what they called the “Trunk Guardian service” which runs automated UI tests on the last known good build twice and if the test passes on the first run but fails on the second it is marked as ‘flaky’ and disabled and the owner is notified to fix it or get rid of it. I wondered what your thoughts were on such a “Trunk Guardian service” – if the culture / process was in place to solve the other issues that create flaky tests, could such a thing be worth the effort to implement? Article: Test Stability – How We Make UI Tests Stable
Continue reading “AMA: Trunk Guardian Service?”
If you saw my talk at GTAC last year, ‘your tests aren’t flaky‘, then you’re probably aware of my view on flaky tests actually being indicative of broader application/systems problems that we should address over making our tests less flaky.
But what if you’re in a situation where you work with a system where you can’t feasibly improve the reliability? Say you’ve got a domains page that should show you a list of available domains but since it’s using an external third-party service it sometimes just shows nothing?
Continue reading “Prioritising Test Reliability over Perfection”
I enjoyed your GTAC presentation on “your tests are not flaky“. How did you pick your topic?
Firstly thanks for the compliment, you’re very kind.
When I came up for the topic I was working on a system where we were practicing continuous delivery by frequently doing production releases. As we began releasing more frequently the business expected this and so the reliability of our automated tests became more important. We wouldn’t release on a failed build since we were working on a high volume eCommerce site where a small bug could cause an outage costing a very large amount of revenue. We didn’t have a team of testers to fall back onto for any manual regression testing, so we were 100% dependent on our automated tests.
Even though we were clever about building testability into our system, we still had too many full-stack automated tests which would create non-deterministic results.
I believe everyone looks at the same thing slightly differently as we each have a unique lens that we see our world through and everyone’s lens has varying degrees of difference:
“Each of us tends to think we see things as they are, that we are objective. But this is not the case. We see the world, not as it is, but as we are—or, as we are conditioned to see it. When we open our mouths to describe what we see, we in effect describe ourselves, our perceptions, our paradigms.”
~ Stephen R. Covey
As people who were developing and maintaining tests, we were looking at our non-deterministic tests as the test’s fault. What we didn’t do was look through another lens to see that it could actually be the fault of our system as we had built it instead.
This aha! moment struck me when we released a bad build to production that had passed all automated QA by someone re-running our automated tests a number of times (to get them to pass).
We were blinded by perceived ‘test flakiness’: we refused to believe our problems were something else, so I thought it would be a good topic to present. From the feedback I received both at and after the event, it seems I am very much not alone.
I am pleased to announce I will be speaking at the Google Test Automation Conference (GTAC) in Boston on the 10th of November.
My talk is titled ‘Your tests aren’t flaky’
Flaky tests are the bugbear of any automated test engineer; as Alister says “insanity is running the same tests over and over again and getting different results”. Flaky tests cause no end of despair, but perhaps there’s no such thing as a flaky or non-flaky test, perhaps we need to look at this problem through a different lens. We should spend more time building more deterministic, more testable systems than spending time building resilient and persistent tests. Alister will share some examples of when test flakiness hid real problems underneath the system, and how it’s possible to solve test flakiness by building better systems.
I hope to see some of you there!