At Automattic we’re always dogfooding; we’re always testing as we develop and therefore we’re always finding bugs.
When I come across one of these bugs I try to capture as much as I can about what I was doing when it happened, along with some evidence of it happening (a screenshot, a screen recording, error messages, the browser console). But even then, on occasion, I can’t work out how to reproduce a bug.
This doesn’t mean the bug isn’t reproducible; it’s just that there are so many factors at play in the highly distributed, multi-level systems we develop and test that I don’t know how to recreate the exact circumstances, and therefore I don’t know how to reproduce the bug.
The exact set of circumstances that causes a bug is never entirely reproducible, just as you can never live the exact same moment of your life again.
So, when this bug occurred, perhaps the server was under a particular level of load at that exact moment in time. Or one of the load-balancers was restarting, or maybe even my home broadband speed was affected by my children streaming The Octonauts in the next room.
So it can seem almost like a miracle when you do know how to reproduce a bug.
But what do you do with all of those bugs that you haven’t been able to reproduce?
Software quality metrics are an interesting topic, and in my experience there isn’t a widely used or accepted list of metrics for measuring software quality. After many years of thinking about the topic, and many years of trialing different metrics, I believe the number one metric that accurately measures software quality is defects in production. Quality software won’t include defects in production, so that’s the metric we should use to measure whether testing has been done successfully.
Various organizations I have worked in have used this metric in different ways. One organization called each production defect a ‘quality spill’. Another used mean time to failure, a metric often used to measure the reliability of a production system or machine: for your car, for example, it would be how long it runs before it breaks down.
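As a rough illustration, here is a minimal Python sketch of how the average interval between production failures might be computed (strictly speaking this is closer to mean time *between* failures, since each failure here is repaired and the system keeps running). The failure dates are made up for the example:

```python
from datetime import datetime


def mean_time_to_failure(failure_times):
    """Average number of hours between consecutive production failures.

    failure_times: a chronologically sorted list of datetimes at which
    production failures occurred.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to compute an interval")
    # Hours elapsed between each failure and the next one.
    gaps = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(failure_times, failure_times[1:])
    ]
    return sum(gaps) / len(gaps)


# Illustrative failure dates only: 30 days apart, then 18 days apart.
failures = [
    datetime(2009, 11, 16),
    datetime(2009, 12, 16),
    datetime(2010, 1, 3),
]
print(mean_time_to_failure(failures))  # → 576.0 hours between failures
```

A higher number is better: the longer the system runs between production failures, the more reliable it is.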
The issue I have with some other software quality metrics is that they motivate people the wrong way. For example, a metric based on bug count encourages testers to report bugs. But it can also encourage them to report things that aren’t bugs, or to split one major bug into multiple bug tickets, so that the numbers look good. And is a high bug count (in test) even a bad thing? Doesn’t it mean you caught all the bugs? Or does a low bug count mean the developers are doing a good job, or perhaps that you didn’t catch all the bugs? That’s why defects in production are a true measure of software quality: no one wants bugs in production, and they cause all sorts of headaches.

In the last few days there have been numerous embarrassing, public computer glitches, some related to the beginning of the year 2010. Have we become complacent after Y2K?
- 3 Jan 2010: “Businesses stung by BOQ computer bug” (link)
- 3 Jan 2010: “Bank of Queensland’s (BOQ) Eftpos terminals go down costing retailers thousands” (link)
- 3 Jan 2010: “Chaos as check-in problems affect Qantas” (link)
- 3 Jan 2010: “Flights delayed after check-in system malfunction” (link)
- 10 Dec 2009: “Computer glitch brings Brisbane trains to a standstill” (link)
- 16 Dec 2009: “Check-in failure sparks Brisbane Airport delays” (link)
- 16 Nov 2009: “Computer glitch delays Qantas flights” (link)
What’s interesting is that the Amadeus system Qantas uses failed in November and failed again today. The lesson here: if you do discover bugs in production, make sure you fix them.