I read a 2015 LinkedIn blog post by Keqiu Hu about flaky UI tests, in which he explains how they fixed them for the LinkedIn app. Among other things, they implemented what they called the “Trunk Guardian service”, which runs the automated UI tests twice on the last known good build. If a test passes on the first run but fails on the second, it is marked as ‘flaky’ and disabled, and its owner is notified to fix it or get rid of it. I wondered what your thoughts were on such a service: if the culture and process were in place to solve the other issues that create flaky tests, could it be worth the effort to implement? Article: Test Stability – How We Make UI Tests Stable
Thanks for the link to the article.
I’d be interested to read an updated version of that article, since it’s nearly two years old. Often nothing changes in two years, but everything can change too, especially in the world of mobile test automation.
If you’ve seen my GTAC talk or read my article you’d know I don’t really believe in any such thing as flaky tests (instead I believe we have flaky systems that we write tests for). So, in my experience, fixing systemic issues results in more stable tests.
LinkedIn’s approach of running tests twice against the last known ‘good’ build (and disabling as ‘flaky’ any test that fails on the second run) is possibly needed when you have 250 automated UI tests like they do (even after reducing this number from 700).
But I think this could be misleading unless every single one of these tests has zero dependencies on anything else, like a network connection, a database server, a filesystem, an operating system update, or an emulator/physical device. I doubt that’s the case, otherwise they’d be called unit tests. What if, for example, the database is legitimately running out of memory on the second run? That is a genuine problem, but because everything was fine during the first run it gets recorded as a problem with your ‘flaky’ test. Or what if an operating system patch is automatically applied to your server during the second run, causing half the tests to fail and be disabled in your test suite?
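To make the concern concrete, here is a minimal Python sketch of the run-twice rule as the article describes it. It is not LinkedIn’s implementation: `classify_tests`, `FakeDatabase` and the test names are all hypothetical, made up for illustration. The fake database stands in for a real dependency that runs out of capacity between the two runs.

```python
from typing import Callable, Dict


def classify_tests(tests: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every test twice against the same build and label the outcome."""
    labels = {}
    for name, run in tests.items():
        first = run()
        second = run()
        if first and second:
            labels[name] = "stable"
        elif first and not second:
            # Under the run-twice rule this test would be disabled
            # and its owner notified.
            labels[name] = "flaky"
        else:
            labels[name] = "failing"
    return labels


class FakeDatabase:
    """Hypothetical dependency that stops answering after `capacity` queries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queries = 0

    def query(self) -> bool:
        self.queries += 1
        return self.queries <= self.capacity


db = FakeDatabase(capacity=1)
labels = classify_tests({
    "test_pure_ui": lambda: True,   # no external dependencies
    "test_feed_loads": db.query,    # depends on the database
})
print(labels)
```

Here `test_feed_loads` ends up labelled ‘flaky’ even though the test itself did nothing wrong: the dependency degraded between the two runs, and that signal is lost once the test is disabled.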
My approach would be to replace these 250 automated UI tests with fully stubbed unit tests that test the user interface without any reliance on external dependencies, so they naturally aren’t flaky. To test that the application works in real-life conditions, I’d write a dozen or fewer true end-to-end automated tests that check all the units work together as one. Real-life circumstances can be flaky: networks go down, emails get lost, mobile phones lose signal. These end-to-end tests can help point this out to you, and when they do, you can listen to these signals and work out where the flaky parts of your system are.
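As a rough illustration of what I mean by a fully stubbed test, here is a small Python sketch using the standard library’s `unittest.mock`. The `ProfileScreen` class and its `fetch_user` client method are hypothetical names invented for this example; the point is only that the UI logic talks to an injected dependency which the test replaces with a stub.

```python
from unittest.mock import Mock


class ProfileScreen:
    """Hypothetical UI presenter that only talks to an injected API client."""

    def __init__(self, api_client) -> None:
        self.api_client = api_client

    def render_title(self, user_id: int) -> str:
        try:
            user = self.api_client.fetch_user(user_id)
        except ConnectionError:
            return "Unable to load profile"
        return f"{user['name']} | {user['headline']}"


def test_title_shows_name_and_headline() -> None:
    api = Mock()
    api.fetch_user.return_value = {"name": "Ada", "headline": "Engineer"}
    assert ProfileScreen(api).render_title(42) == "Ada | Engineer"
    api.fetch_user.assert_called_once_with(42)


def test_title_handles_network_failure() -> None:
    # The network failure is simulated on purpose, so the test result
    # never depends on a real network being up or down.
    api = Mock()
    api.fetch_user.side_effect = ConnectionError("network down")
    assert ProfileScreen(api).render_title(42) == "Unable to load profile"
```

Because nothing here touches a real network, database or device, these tests give the same result on every run. The handful of end-to-end tests are then the only place where real-world flakiness can show up.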
I hope this helps.