AMA: JS vs Ruby

Butch Mayhew asks…

I have noticed you blogging more about JS frameworks. How do these compare to Watir/Ruby? Would you recommend one over the other?

My response…

I had a discussion recently with Chuck van der Linden about this same topic as he has a lot of experience with Watir and is now looking at JavaScript testing frameworks like I have done.

Some background: an entirely new UI for managing sites was built using 100% JavaScript, with React for the main UI components. I am responsible for the e2e automated tests across this UI. Whilst I originally contemplated (and even trialled) using Ruby, this didn’t make long-term sense in an organization where the original WordPress developers mostly work in PHP and the newer UI developers all work in JavaScript.

Whilst I see merit in both views, I still think having your automated acceptance tests in the same language as your application leads to better maintainability and adoption.

I still think writing automated acceptance tests in Ruby is much cleaner and nicer than JavaScript Node tests, particularly as Ruby allows meta-programming which means page objects can be implemented really neatly.
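As a rough illustration of why Ruby’s meta-programming suits page objects, here is a minimal, self-contained sketch. A class-level `text_field` macro uses `define_method` to generate a reader and a writer for each declared field. `FakeBrowser`, `BasePage`, `SignupPage` and the field ids are hypothetical stand-ins; a real page object would delegate to something like a Watir browser instead.

```ruby
# Hypothetical stand-in for a real browser: it just remembers field values by id.
class FakeBrowser
  def initialize
    @fields = {}
  end

  def read_field(id)
    @fields[id]
  end

  def write_field(id, value)
    @fields[id] = value
  end
end

class BasePage
  # Class macro: declaring a field generates `name` and `name=` instance methods.
  def self.text_field(name, id:)
    define_method(name) { @browser.read_field(id) }
    define_method("#{name}=") { |value| @browser.write_field(id, value) }
  end

  def initialize(browser)
    @browser = browser
  end
end

# Hypothetical page: each field is declared once, in a single line.
class SignupPage < BasePage
  text_field :email, id: 'entry_0'
  text_field :name,  id: 'entry_1'
end

page = SignupPage.new(FakeBrowser.new)
page.email = 'someone@example.com'
page.name = 'Test User'
puts page.email # prints someone@example.com
```

Adding a field to a page is a one-line declaration, and the accessor methods are written for you.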

The JavaScript/NodeJS landscape is still very immature. People use various tools, frameworks and compilers, and common patterns or de facto standards haven’t really emerged yet. The whole ES6/ES2015/ES2016 thing is very confusing to newcomers like me, especially on NodeJS, where some ES6+ features are supported natively but others require something like Babel to compile your code.

But generally with the direction ES is going, writing page objects as classes is much nicer than using functions for everything as in ES5.

Whilst I haven’t found anything in JavaScript/Mocha/WebDriverJS that is better than (or even as good as) Ruby/RSpec/Watir-WebDriver, I still think using the JavaScript NodeJS stack for our e2e tests is the better long-term decision for us.

On why I left the Watir project

“You will find that it is necessary to let things go; simply for the reason that they are heavy. So let them go, let go of them. I tie no weights to my ankles.”

~ C. Joybell C.

A number of people have asked me about why I left the Watir Project last year, and up until now I haven’t been comfortable explaining why. But that was then and this is now.

There were two reasons why I left the Watir Project. The first is a particular member of the Watir team who likes to call himself the ‘Project Director’. I co-organized a conference in Austin with this person last year, for which I organized a Minesweeper Contest that was advertised as part of the conference. I wrote a presentation on my robot, which I developed with a colleague here in Brisbane, and I even had some entries from other attendees. I rehearsed the presentation here in Brisbane, and my colleagues and I were excited for me to be presenting it in Austin.

Whilst I made it clear numerous times that I wanted to present this, the co-organizer never made room for it. He ‘directed’ the schedule, and when it came to the end of the conference and there wasn’t any time left, he said it was my fault for not proposing it as an ‘open space’ topic, even though it was a long-advertised component of the conference. He gave me no opportunity whatsoever to present it.

I was so embarrassed I still haven’t told anybody here in Australia that I didn’t actually present my Minesweeper Robot in Austin. When people asked me how it went, I had to lie and tell them it went well because I was so embarrassed. All because of one person controlling the agenda.

I hate being on bad terms with someone; it’s just not who I am. So I recently made an effort to contact this person to see whether a year had made him willing to talk about the situation and how we could move forward. He rudely dismissed me and didn’t want to talk, so I take it that it hasn’t. That’s why I am finally comfortable writing this post.

Letting things go

“Some people believe holding on and hanging in there are signs of great strength. However, there are times when it takes much more strength to know when to let go and then do it.”

~ Ann Landers

The second reason I left Watir was that I believe things have a time and place, and when that time and place is up, it’s time to let go. Think of your favorite pair of jeans, the ones you wear until they are faded almost white and have holes in the crotch: eventually it’s time to let them go.

The same applies to open source projects. You can’t keep contributing to an open source project forever. New, more enthusiastic people come along, and although it’s hard, you need to let go and let them take over the reins. You must. That’s how you avoid having an open source ‘Project Director’ who hasn’t sent a project-related email or written a blog post or line of code for almost a year.

I’ve let the Watir project go, I’ve let this person go, and I am a much happier person for it.

As a bonus, I presented my Minesweeper presentation locally here in Brisbane and it was very well received. Austin missed out.

Watir-WebDriver with GhostDriver on OSX: headless browser testing

GhostDriver has been released which means it is now easy to run reliable headless WebDriver tests on Mac OSX.

Steps to get working on OSX

  1. First make sure you have Homebrew installed
  2. Run
       brew update
       brew install phantomjs
     which should install PhantomJS 1.8.1 or newer
  3. Run irb and start using GhostDriver!
       require 'watir-webdriver'
       b = Watir::Browser.new :phantomjs
       b.goto ""
       b.url #""
       b.title #"Google"

I’ve tested it on a large test suite (123 scenarios) and it behaves the same as other browsers with full JavaScript support. It took 8m13s in total: surprisingly it is slightly slower than ChromeDriver (7m30s) in my testing, but a little faster than the Firefox Driver (9m33s).
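To put "slightly slower" and "a little faster" into numbers, here is the arithmetic on the three totals reported above, converted to seconds. This is just a throwaway Ruby sketch. The durations are its only inputs, and they come straight from the paragraph above.

```ruby
# Suite durations reported above, as [minutes, seconds].
timings = {
  'GhostDriver' => [8, 13],
  'ChromeDriver' => [7, 30],
  'Firefox Driver' => [9, 33]
}

seconds = timings.map { |name, (min, sec)| [name, min * 60 + sec] }.to_h
ghost = seconds['GhostDriver']

seconds.each do |name, secs|
  # A ratio above 1 means GhostDriver took longer than this driver.
  printf("%-15s %3d s   GhostDriver / %s = %.2f\n", name, secs, name, ghost.to_f / secs)
end
```

On this one suite, GhostDriver took roughly 10% longer than ChromeDriver (493 s vs 450 s). It took roughly 14% less time than the Firefox Driver (493 s vs 573 s).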

Well done to all involved in this project. It’s great to see a reliable, realistic headless browser with full JavaScript support for WebDriver finally released.

And yes, in case you’re wondering, it does screenshots!

The sacred cow that is an open source project

Before I start, I want to make it clear that I am a huge believer in and supporter of open source software, and I appreciate all the time and effort that people put in to make it so. I want to make open source software better, and that is the only reason I am writing this post.


It all started a couple of months ago when I made what I considered an offhand remark that I felt the Internet Explorer WebDriver driver was unreliable. Unfortunately, this wasn’t taken well at all by the core IE WebDriver driver developer, who found my view unexpected and unfounded. I was surprised, but I took steps to rectify the issue I had created. I emailed and tweeted the developer directly with apologies, and I apologized on the mailing list where I made the remark. There’s only one Watir developer left working on the Watir IE driver. At the time, I even asked this individual whether they would work on improving the WebDriver IE driver, but they didn’t feel confident doing so, since it’s written in C++ and they’re a Ruby developer.

I thought it was all okay, as I hadn’t heard anything more about it. I didn’t raise any criticisms again (given how they were taken), but then yesterday, out of the blue, there was this:

And then this one surprised me even more, considering what I did originally:

Update: 19 June 2012: Apparently these comments had nothing to do with me.

Oh boy, Groundhog Day, and I hadn’t even dared to mention the unreliability again. But that wasn’t the end of it. The Selenium team caught on and started retweeting the original tweet, adding accusations of trolling. It was as if anyone who raises an issue without a reproducible test case, even if they don’t know the underlying technology or language, is a troll.

Oh boy++

Some background on the original issue

I’m developing a reasonably big set of automated web tests against a large web application behind a corporate firewall. I am writing them for a client, and since theirs is a Microsoft .NET environment, I am using the C# 4.0 Selenium WebDriver bindings and developing the tests on Windows.

I have 30-odd end-to-end scenarios that run perfectly (100% reliably, again and again) against Firefox with native events disabled. Then I try to run them in IE. Half of them fail. I rerun them and about a quarter fail, but different ones from the original run. I run them again and they fail in different places. What is going on?

The error messages don’t make any sense. The screenshots don’t line up with where the code is failing, as though navigation is failing. After lots of frustration, I realize that the driver is failing silently. It thinks it has clicked an element (but actually hasn’t). It then fails to locate the next element it expects to be there, because the click that was meant to change the page never happened. What makes matters worse is that it’s inconsistent: one run it works fine, the next it fails, or it fails in a different spot.

I believe it’s related to native events. The IE driver uses native events to send mouse clicks, and if the browser doesn’t have perfect focus, these events may fail while reporting that they succeeded. So I make some inquiries to see whether I can disable native events, or use synthetic JavaScript events in Internet Explorer as I can in Firefox, but no such support exists.

I look on StackOverflow, where I find dozens of questions about this issue dating back over the last year or so (some examples: here, here, here, here and here). There doesn’t seem to be a really solid solution or workaround. The suggestions I find include:

  • firing every click twice in case one doesn’t work (which obviously introduces other concurrency issues)
  • using sendKeys("\n") on elements (strange)
  • refocusing the WebDriver window every time (which seems very inefficient to me)
  • finding the parent of each element using an XPath expression and clicking that (urrrgh)

Not really what I was looking for.
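One way to at least *detect* the silent failures described above, rather than blindly clicking twice, is to check a postcondition after each click and retry only when it did not hold. The sketch below is a hypothetical helper, not part of any WebDriver binding. `FlakyElement` simulates an element whose first click is silently dropped. Retrying is only safe for idempotent clicks, so treat this as a diagnostic aid rather than a fix for the driver.

```ruby
# Raised when a click never produces its expected effect.
class ClickNotRegisteredError < StandardError; end

# Hypothetical helper: click, then check the caller's postcondition
# (e.g. "the next page has loaded"), retrying a bounded number of times.
def click_and_verify(element, attempts: 3)
  attempts.times do
    element.click
    return true if yield
  end
  raise ClickNotRegisteredError, "click had no effect after #{attempts} attempts"
end

# Simulated element: the first `drop` clicks are silently ignored,
# mimicking a driver that reports success without actually clicking.
class FlakyElement
  attr_reader :clicks

  def initialize(drop: 1)
    @drop = drop
    @clicks = 0
  end

  def click
    @clicks += 1
  end

  def page_changed?
    @clicks > @drop
  end
end

button = FlakyElement.new
click_and_verify(button) { button.page_changed? }
puts "clicked #{button.clicks} times" # prints "clicked 2 times"
```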

I raise it as a criticism: the IE driver is unreliable, but it isn’t taken well. I am very surprised it is the first time the Selenium team have heard of the issue considering all the references (see links above) and workarounds (hacks) I manage to find across the Internet. But it’s very hard to consistently reproduce inconsistent reliability issues.

The problem with reliability issues

The problem with reliability issues is that they’re not easily reproducible. Imagine you’ve got a Volkswagen Passat, and every so often, while you’re driving down the highway, the engine cuts out. It’s happened to you a dozen times over a few months, but each time has been different (sometimes uphill, sometimes downhill, sometimes when braking), so you take it into the Volkswagen dealership. They ask you to take them for a drive to show them the problem, but guess what: it doesn’t happen. Does this mean the problem doesn’t exist, just because you can’t reproduce it? No.

Sporadic reliability issues are notoriously hard to reproduce, but it doesn’t mean they don’t exist, and it certainly doesn’t mean anyone who notices one, or mentions one, is a troll.
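When a failure can’t be reproduced on demand, it can help to rerun the same scenario many times and tally where it fails. That way "sometimes" becomes a failure rate you can report. This is a minimal sketch under stated assumptions. `tally_failures` is a hypothetical helper, and the "flaky scenario" is simulated with a seeded random number generator standing in for a real test run.

```ruby
# Hypothetical harness: run `scenario` (anything that raises on failure)
# `runs` times and count the outcomes by failure message.
def tally_failures(runs, scenario)
  tally = Hash.new(0)
  runs.times do
    begin
      scenario.call
      tally['passed'] += 1
    rescue StandardError => e
      tally[e.message] += 1
    end
  end
  tally
end

# Simulated flaky scenario: each step independently "drops" 5% of the time.
rng = Random.new(1234)
steps = ['open form', 'click Next', 'fill in details', 'click Submit']
flaky_scenario = lambda do
  steps.each { |step| raise "failed at: #{step}" if rng.rand < 0.05 }
end

result = tally_failures(100, flaky_scenario)
result.sort_by { |_, count| -count }.each { |outcome, count| puts "#{outcome}: #{count}" }
```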

The problem with reproducing webdriver issues on client sites

I don’t want to seem to be making excuses for not reproducing issues, but you must realize it’s often very hard to do so. Reproducing inconsistent behavior is difficult to begin with, and it’s a lot more difficult when working on a corporate application:

  • As a consultant, you are legally bound by a non-disclosure agreement not to discuss or share the work or code you are writing. So you must take any issue you’re working on and reproduce it in an external environment completely different from your work at hand. This includes both the client-side WebDriver code and the web application you’re testing, which you’ll have to recreate yourself.
  • The code you’re working on is behind a firewall, often running on a customized Microsoft Windows desktop SOE, which is not something that can be easily reproduced on a MacBook.
  • In my current case, I don’t have a Visual Studio license or configuration to even work on reproducing the issue.

We need to move past treating open source projects as sacred cows

I’ve copped a lot of flak over my criticisms of open source projects in the past. I don’t think that’s entirely fair, considering I’ve received a lot of criticism myself. I have previously shared my views on Capybara, which got a lot of people upset. But how are open source projects ever going to get better without criticism? Are open source projects really the sacred cows they make themselves out to be?

It’s quite a sport to criticize commercial software. Think about Microsoft Windows and Internet Explorer: there are hate sites on the Internet devoted to them. Imagine you worked on the IE core team as a developer. Software developers have also put huge amounts of personal and collective effort into these commercial products; the only difference is that companies charge for their use. Think about QuickTest Professional (QTP). I lost count of how many shots were fired at QTP at the Selenium Conference, including those much-recited job ad statistics (‘Selenium’ vs ‘QTP’ on Indeed). Selenium was even named as an antidote to Mercury poisoning (Mercury Interactive originally made QTP before selling it to HP).

So, does not charging for, or open sourcing software, make it not open for criticism? What does the open in open-source stand for?

We need to recognize that, yes, a lot of hard work goes into an open source project, whether it be a simple rubygem or a large automated testing tool such as Selenium or Watir. But we also need to recognize that just because someone spends their spare time on such a tool doesn’t instantly make it great and immune to criticism. Greatness comes from usage and feedback.

Sure, we can’t tolerate blatantly disrespectful criticism (I believe I may have come across this way initially about the IE driver issues, and I subsequently apologized for it). But neither can we refuse to accept criticism or reports of issues unless they come with a self-contained reproducible test case and an offer to dive into unfamiliar technology and fix the problem. I often fix issues in open source software I’ve written that aren’t accompanied by a reproducible test case. I’m the expert who wrote the thing, so chances are I’ll know what’s going on. An example is this defect someone raised last week, which I fixed the same day.

I would personally rather hear about an issue on one of my open source projects without a test case than not hear about it at all, because the person was too afraid of being called a troll to speak up or raise an issue without it being reproducible.

I understand the Watir/Selenium teams have a long, rather non-amicable history together, but I thought that it was getting better with the fantastic Watir-WebDriver library and Selenium Conference, but it seems from this incident we have a long way to go. Here’s hoping we can get over our minor differences and work together to create the best open source projects for ourselves and for our users.

A tale of three ruby automated testing APIs (redux)

Redux Note: I originally wrote a similar article to this before going on parental leave about six weeks ago. Whilst I didn’t intend to offend, it seemed that a few people took my article the wrong way. I understand that a lot of effort goes into creating a web testing API, but that doesn’t mean that everyone will agree with what you’ve made.

Sadly, an anonymous coward attacked me and the company I work for (even though I don’t mention that company on this blog). So, for the first time in this blog’s history, I have had to turn comment moderation on. I am sorry to the other genuine commenters whose comments were lost in the transition, and who now have to wait for their new comments to be approved.

Since then I have received numerous emails asking where my article went, and commenting that people found it interesting and worthwhile. So I have decided to repost this article, hopefully with a little less contention this time around, making it clear, this is my opinion and experience: YMMV.


As a consultant I get to see and work on a lot of automated testing solutions using different automated web testing APIs. Lately I’ve been thinking about how these APIs are different and what makes them so.

My main interest is in ruby, and fortunately ruby has three solid examples of three different kinds of web testing APIs, two of which extend the lowest level API: selenium-webdriver.

I’ll (try to) explain here what I consider to be the three kinds of automated web testing APIs, where I consider the sweet spot to be, and why.

A meaty example

As a carnivore, I thought I would explain my concept in terms I can relate to. If you’re a beef eater, there are many different kinds of beef you can use to make some tasty food. I’ll use three for my example. The first (rawest) kind involves getting a beef carcass and filleting it yourself to eventually make some edible food. The second kind is beef that is already partly prepared, which you then use to make your own food: for example, you can buy minced beef from a butcher and make your own hamburger patties, taco filling, etc. The final kind is beef that has already been prepared so you can consume it directly: for example, sausages, which can be cooked and eaten as is.

I consider these three kinds of beef to roughly correspond to the three kinds of automated web testing APIs.

The first is a Web Driver API, which is the rawest form of an API, its job is to drive a browser by issuing it commands. It provides a high level of user control, but like filleting a beef carcass it’s more ‘work’. An example in ruby of this API is the selenium-webdriver API, which controls the browser using the webdriver drivers.

The second kind of automated web testing API is the Browser API, which is a higher level API but still provides user control. This is the minced beef of APIs. It’s in a more usable form than a carcass, but you still have a lot of control over (and potential in) what you can do with it. An example in ruby of this API is the watir-webdriver API, which uses the underlying selenium-webdriver carcass to control the browser.

The final kind of automated web testing API is the Web Form DSL (Domain Specific Language). This is a very high level API that provides users with specific methods to automate web forms and their elements. It is the beef sausages of APIs: ready to cook, but sometimes you feel like eating something other than sausages, and it’s difficult to make anything edible but sausages from sausages. An example in ruby of this Web Form DSL is the Capybara DSL.
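To make the three layers concrete, here is a deliberately tiny, self-contained Ruby sketch of how each kind of API can build on the one below it. None of these classes are the real selenium-webdriver, watir-webdriver or Capybara code. `RawDriver`, `Browser`, `FormDsl` and `Session` are hypothetical stand-ins, and the DSL layer here only locates fields by id to keep the example short.

```ruby
# Layer 1, a "web driver API": raw element lookup plus primitive commands.
class RawDriver
  Element = Struct.new(:id, :value) do
    def clear
      self.value = ''
    end

    def send_keys(text)
      self.value += text
    end
  end

  def initialize(*ids)
    @elements = ids.map { |id| Element.new(id, '') }
  end

  def find_element(how, what)
    raise ArgumentError, "unsupported locator: #{how}" unless how == :id
    element = @elements.find { |e| e.id == what }
    raise KeyError, "no element with id #{what}" unless element
    element
  end
end

# Layer 2, a "browser API": typed elements with convenience methods,
# still located by an attribute the caller chooses explicitly.
class Browser
  TextField = Struct.new(:element) do
    def set(text)
      element.clear
      element.send_keys(text)
    end
  end

  def initialize(driver)
    @driver = driver
  end

  def text_field(id:)
    TextField.new(@driver.find_element(:id, id))
  end
end

# Layer 3, a "web form DSL": form-filling verbs, with the locator strategy
# chosen for you (in this toy, always id).
module FormDsl
  def fill_in(locator, with:)
    browser.text_field(id: locator).set(with)
  end
end

class Session
  include FormDsl
  attr_reader :browser

  def initialize(browser)
    @browser = browser
  end
end

driver = RawDriver.new('entry_0', 'entry_1')
session = Session.new(Browser.new(driver))
session.fill_in 'entry_0', with: '1'
session.fill_in 'entry_0', with: '2'
puts driver.find_element(:id, 'entry_0').value # prints 2
```

Each layer hides more of the one below. That is convenient, but it also decides more on your behalf.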

Visually, this looks something like this:

Show me the code™

So exactly what do these APIs look like?

I knew you’d ask, that’s why I came prepared.

Say I want to accomplish a fairly basic scenario on my example Google Doc form:

  • Start a browser
  • Navigate to the watir-webdriver-demo form
  • Check whether text field with id ‘entry_0’ exists (this should exist)
  • Check whether text field with id ‘entry_99’ exists (this shouldn’t exist)
  • Set a text field with id ‘entry_0’ to ‘1’
  • Set a text field with id ‘entry_0’ to ‘2’
  • Select ‘Ruby’ from select list with id ‘entry_1’
  • Click the Submit button

This is how I would do it in the three different APIs:

# * Start browser
# * Navigate to watir-webdriver-demo form
# * Check whether text field with id 'entry_0' exists
# * Check whether text field with id 'entry_99' exists
# * Set text field with id 'entry_0' to '1'
# * Set text field with id 'entry_0' to '2'
# * Select 'Ruby' from select list with id 'entry_1'
# * Click the Submit button

require 'bench'

benchmark 'selenium-webdriver' do
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for :firefox
  driver.navigate.to ''
  begin
    driver.find_element(:id, 'entry_0')
  rescue Selenium::WebDriver::Error::NoSuchElementError
    # doesn't exist
  end
  begin
    driver.find_element(:id, 'entry_99').displayed?
  rescue Selenium::WebDriver::Error::NoSuchElementError
    # doesn't exist
  end
  driver.find_element(:id, 'entry_0').clear
  driver.find_element(:id, 'entry_0').send_keys '1'
  driver.find_element(:id, 'entry_0').clear
  driver.find_element(:id, 'entry_0').send_keys '2'
  # find_element takes a single locator, so match the option by its value using CSS
  driver.find_element(:id, 'entry_1').find_element(:css, "option[value='Ruby']").click
  driver.find_element(:name, 'submit').click
  driver.quit
end

benchmark 'watir-webdriver' do
  require 'watir-webdriver'
  b = Watir::Browser.start '', :firefox
  b.text_field(:id => 'entry_0').exists?
  b.text_field(:id => 'entry_99').exists?
  b.text_field(:id => 'entry_0').set '1'
  b.text_field(:id => 'entry_0').set '2'
  b.select_list(:id => 'entry_1').select 'Ruby'
  b.button(:name => 'submit').click
  b.close
end

benchmark 'capybara' do
  require 'capybara'
  session = Capybara::Session.new(:selenium)
  session.visit ''
  session.has_field?('entry_0') # => true
  session.has_no_field?('entry_99') # => true
  session.fill_in('entry_0', :with => '1')
  session.fill_in('entry_0', :with => '2')
  session.select('Ruby', :from => 'entry_1')
  session.click_button 'Submit'
  session.driver.quit
end

run 10

This is how long they took for me to run:

                        user     system      total        real
selenium-webdriver  1.810000   0.840000  22.130000 ( 73.123340)
watir-webdriver     1.940000   0.870000  24.380000 ( 79.388494)
capybara            1.950000   0.890000  24.080000 ( 79.920051)
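Since `run 10` repeats each benchmark ten times, dividing the `real` column by ten gives the approximate wall-clock time per run. Dividing by the selenium-webdriver figure gives the relative overhead. Here is a small sketch of that arithmetic, using only the numbers in the table above:

```ruby
# Wall-clock ("real") totals from the table above, for 10 runs each.
real_totals = {
  'selenium-webdriver' => 73.123340,
  'watir-webdriver' => 79.388494,
  'capybara' => 79.920051
}
runs = 10
baseline = real_totals['selenium-webdriver']

real_totals.each do |api, total|
  printf("%-19s %.2f s per run, %.2fx selenium-webdriver\n", api, total / runs, total / baseline)
end
```

On these measurements, both higher-level APIs added well under a second per run (roughly 9%) on top of raw selenium-webdriver.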

Note: Capybara doesn’t always require an explicit session; it’s only needed for non-Rack applications. My example (a Google form) is not a Rack application, and neither are most of the applications I test, so my example must use a session.

When using ruby, why Watir-WebDriver is my sweet spot

I personally find Watir-WebDriver to be the most elegant solution, as the API is high enough for me to be highly readable/usable, but low enough to be powerful and for me to feel like I’m in control.

For example, being able to select an element by an explicit identifier (name, class name, id, anything) is a huge deal to me. I personally don’t like relying on the API to determine which selector to use. For example, Capybara’s fill_in only supports name, id and label, and you can’t tell it which specific one to use. It appears to try each selector one by one until it finds a match.
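To illustrate why an explicit locator matters to me, here is a toy model of the two approaches. It is not Capybara’s actual matching code, just the behaviour described above. One lookup lets the caller name the attribute to match. The other tries id, then name, then label, and returns the first hit. The in-memory `FIELDS` "page" is hypothetical.

```ruby
# Hypothetical in-memory "page": each field has an id, a name and a label.
FIELDS = [
  { id: 'entry_0', name: 'email', label: 'Contact' },
  { id: 'entry_1', name: 'contact_email', label: 'email' }
].freeze

# Explicit lookup: the caller says which attribute to match on.
def find_field_by(how, what)
  FIELDS.find { |field| field[how] == what }
end

# Implicit lookup: try id, then name, then label; return the first match.
def find_field_guessing(locator)
  %i[id name label].each do |how|
    field = find_field_by(how, locator)
    return field if field
  end
  nil
end

puts find_field_by(:label, 'email')[:id] # prints entry_1
puts find_field_guessing('email')[:id]   # prints entry_0 (matched by name first)
```

Asking explicitly for the field labelled "email" gives `entry_1`. The guessing lookup silently returns `entry_0` instead, because it finds a name match before it ever considers labels.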

I have also found that Watir-WebDriver provides lots of flexibility and neatness. For example, it’s the only API shown here that allows URLs without an ‘http://’ prefix (how many people do you know who type http:// into a browser?).

In my opinion, high level APIs like Capybara don’t provide enough control (for example, being able to specify an explicit selector), but low level APIs like WebDriver don’t provide enough functionality. This is evident when I use a language other than ruby (like C#), where I find myself writing a large number of web element extension methods because WebDriver doesn’t provide them. A .set method is a classic example. Even Simon Stewart, who wrote WebDriver, writes a clearAndType method in his examples, because WebDriver sadly lacks one (you must call .clear and then .send_keys).
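In Ruby, the missing convenience method takes only a few lines. Here is a minimal sketch, assuming an element that responds to `clear` and `send_keys` (as selenium-webdriver elements do). `Settable` and `FakeTextField` are hypothetical names, and the fake field lets the example run without a browser.

```ruby
# Adds a watir-style #set to anything that responds to #clear and #send_keys.
module Settable
  def set(*keys)
    clear
    send_keys(*keys)
    self
  end
end

# Hypothetical stand-in for a text field element, for illustration only.
class FakeTextField
  include Settable
  attr_reader :value

  def initialize
    @value = ''
  end

  def clear
    @value = ''
  end

  def send_keys(*keys)
    @value += keys.join
  end
end

field = FakeTextField.new
field.set '1'
field.set '2'
puts field.value # prints 2
```

You could mix such a module into selenium-webdriver’s element class yourself, but reopening a library’s classes carries its own maintenance cost. This is the kind of extension a browser API like Watir-WebDriver saves you from writing.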

My biggest concern about high level field APIs

But my biggest issue with the high level APIs is that I’ve frequently seen them used to write test scripts that are step-by-step interactions with a web form. Instead of thinking of a business application in terms of what the business does, people see it as a series of forms that you ‘fill in’. This leads people to create scenarios like the one Aslak Hellesøy included in his recent post about Cucumber web steps (which use Capybara) and the problems they have created.

Scenario: Successful login
  Given a user "Aslak" with password "xyz"
  And I am on the login page
  And I fill in "User name" with "Aslak"
  And I fill in "Password" with "xyz"
  When I press "Log in"
  Then I should see "Welcome, Aslak"

I’m not saying it’s impossible to end up with something as ugly as the above using other APIs. But I am saying the web form DSL style naturally leads to it: the DSL’s methods look so similar to these steps because that’s what the DSL was designed for, filling in forms. I’ve frequently seen people write generic, reusable Cucumber steps that mirror the web form DSL, like:

When /^I fill in "([^"]*)" with "([^"]*)"$/ do |field, value|
  fill_in field, :with => value
end

But this means you end up with less readable, less maintainable test scripts rather than business readable executable specifications.
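By contrast, a business-level step keeps the form details out of the scenario. Here is a minimal sketch, assuming a hypothetical `LoginPage` page object and a `FakeSession` stand-in for a Capybara-style session. The real Capybara session has `fill_in` and `click_button`, but this fake only records the calls. The scenario would say something like `When I log in as "Aslak"`, and only the page object knows which fields get filled in.

```ruby
# Hypothetical stand-in for a browser session: it records what was done.
class FakeSession
  attr_reader :filled, :pressed

  def initialize
    @filled = {}
    @pressed = []
  end

  def fill_in(field, with:)
    @filled[field] = with
  end

  def click_button(label)
    @pressed << label
  end
end

# Hypothetical page object: one business-level action instead of
# three form-filling steps in the scenario.
class LoginPage
  def initialize(session)
    @session = session
  end

  def login_as(username, password)
    @session.fill_in 'User name', with: username
    @session.fill_in 'Password', with: password
    @session.click_button 'Log in'
  end
end

# In a Cucumber step definition this might be used like so (not run here):
#   When /^I log in as "([^"]*)" with password "([^"]*)"$/ do |user, password|
#     LoginPage.new(page).login_as(user, password)
#   end

session = FakeSession.new
LoginPage.new(session).login_as('Aslak', 'xyz')
puts session.pressed.first # prints Log in
```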


Ultimately what I am looking for in an automated web testing API is simplicity and full control. I personally find browser APIs like Watir-WebDriver and Watir give me this, and this is why I love them so. Your mileage may vary, you may like different styles of APIs better, but I’ve seen other APIs so badly abused by people not even thinking about it, so it makes sense to think about what you’re trying to achieve and whether what you’re doing is the right way.

Introducing the most elegant way to use webdriver with ruby

I have recently launched the most elegant way to use webdriver with ruby. It’s a collection of succinct examples of how to use watir-webdriver. I found it very enjoyable piecing together this information from various blog posts I have written, and organizing it in a structured, logical manner. I hope you find it useful.