Watir-WebDriver with GhostDriver on OSX: headless browser testing

GhostDriver has been released, which means it is now easy to run reliable headless WebDriver tests on Mac OSX.

Steps to get working on OSX

  1. First make sure you have homebrew installed
  2. Run
    brew update


    brew install phantomjs

    which should install PhantomJS 1.8.1 or newer

  3. Run irb and start using GhostDriver!
    require 'watir-webdriver'
    b = Watir::Browser.new :phantomjs
    b.goto "www.google.com"
    b.url #"http://www.google.com.au/"
    b.title #"Google"

I’ve tested it on a large test suite (123 scenarios) and it behaves the same as other browsers with full JavaScript support. It took 8m13s in total: surprisingly it is slightly slower than ChromeDriver (7m30s) in my testing, but a little faster than the Firefox Driver (9m33s).
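To put those timings in proportion, here is a quick back-of-the-envelope calculation in Ruby. The helper names are made up for illustration; only the three durations come from the test run above.

```ruby
# Convert a duration given as minutes and seconds into seconds.
def to_seconds(minutes, seconds)
  minutes * 60 + seconds
end

# Percentage by which `time` differs from `baseline` (positive means slower).
def percent_slower(time, baseline)
  ((time - baseline) * 100.0 / baseline).round(1)
end

PHANTOMJS = to_seconds(8, 13) # 493 seconds
CHROME    = to_seconds(7, 30) # 450 seconds
FIREFOX   = to_seconds(9, 33) # 573 seconds

puts percent_slower(PHANTOMJS, CHROME)  # 9.6   (roughly 10% slower than ChromeDriver)
puts percent_slower(PHANTOMJS, FIREFOX) # -14.0 (roughly 14% less time than the Firefox Driver)
```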

Well done to all involved in this project. It’s great to see a reliable, realistic headless browser with full JavaScript support for WebDriver finally released.

And yes, in case you’re wondering, it does screenshots!
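As a minimal sketch of the screenshot support, the helper below visits a page and saves a screenshot using watir-webdriver's `browser.screenshot.save`. The helper name `capture_page` and the file name are hypothetical, chosen only for this example.

```ruby
# Hypothetical helper: visit `url` and save a screenshot of the page to `path`.
# `browser` is expected to behave like a Watir::Browser.
def capture_page(browser, url, path)
  browser.goto url
  browser.screenshot.save path
  path
end

# Usage with a real headless browser (needs the watir-webdriver gem and PhantomJS):
#   require 'watir-webdriver'
#   b = Watir::Browser.new :phantomjs
#   capture_page(b, 'www.google.com', 'google.png')
```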

Getting the WebDriver driver from a WebDriver element

In watir-webdriver it is really easy to access both the webdriver driver and the watir browser objects from an element:

require 'watir-webdriver'
b = Watir::Browser.new
b.goto 'www.google.com'
button = b.button(name: 'btnK')
button.driver #webdriver
button.browser #watir browser

I never knew how to do this in C# until someone named Robert left a comment yesterday on an old blog post with instructions on how to do so.

You can get the webdriver by using:
var driver = ((IWrapsDriver)webElement).WrappedDriver;

Neat. This means the C# check image present extension method I wrote about previously can be implemented directly from the element itself:

public static bool IsImageVisible(this IWebElement image)
{
  // Recover the driver from the element itself
  var driver = ((IWrapsDriver)image).WrappedDriver;
  var script = TestConfig.DriverType == "ie"
             ? "return arguments[0].complete"
             : "return (typeof arguments[0].naturalWidth!=\"undefined\"" +
             " && arguments[0].naturalWidth>0)";
  return (bool)((IJavaScriptExecutor)driver).ExecuteScript(script, image);
}

This means it can be simply called like:

image.IsImageVisible();

instead of the cumbersome:

driver.IsImageVisible(image);

Thanks Robert!

Checking an image is actually visible using WebDriver

I didn’t realize it’s actually a little tricky to check that an image is loaded when using WebDriver. WebDriver will only complain if the image tag you’re looking for isn’t in the DOM, not if the image link is broken and not actually visible.

For example, in watir-webdriver (ruby), this doesn’t really work as I would expect as the image isn’t actually visible on the ‘brokenimage’ page.

require 'watir-webdriver'
b = Watir::Browser.new :firefox
b.goto 'https://dl.dropbox.com/u/18859962/brokenimage.html'
puts b.image(id: 'watermelon').visible? #true but is not visible

The way to check that it is actually visible is to check that the JavaScript property ‘naturalWidth’ is greater than 0.

b = Watir::Browser.new :firefox
b.goto 'https://dl.dropbox.com/u/18859962/brokenimage.html'
puts b.execute_script("return (typeof arguments[0].naturalWidth!=\"undefined\" && arguments[0].naturalWidth>0)", b.image(id: 'watermelon'))

Unfortunately this doesn’t work in IE, so you should use the ‘complete’ JavaScript property in IE (which doesn’t work in other browsers):

b = Watir::Browser.new :ie
b.goto 'https://dl.dropbox.com/u/18859962/brokenimage.html'
puts b.execute_script("return arguments[0].complete", b.image(id: 'watermelon'))

In C#, you can wrap this up into a WebDriver extension method so you can call it directly from the driver, passing in the image element.

public static bool IsImageVisible(this IWebDriver driver, IWebElement image)
{
    // IE needs the 'complete' property; other browsers use 'naturalWidth'
    var script = TestConfig.DriverType == "ie"
                ? "return arguments[0].complete"
                : "return (typeof arguments[0].naturalWidth!=\"undefined\"" +
                  " && arguments[0].naturalWidth > 0)";
    return (bool) ((IJavaScriptExecutor) driver).ExecuteScript(script, image);
}

// Usage

If it’s important that images load correctly in your application, you should probably start putting some of these in your WebDriver page objects. It’s simple to write a verify images method on a page that iterates through each image in the DOM and checks that it’s visible using the techniques above. Have fun.
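A verify images method along those lines might look like the sketch below in watir-webdriver. The helper name `broken_image_sources` is hypothetical, and it only uses the ‘naturalWidth’ check, so under the caveat above it would need the ‘complete’ variant for IE.

```ruby
# The non-IE check from this post: a loaded image has a naturalWidth above 0.
IMAGE_LOADED_SCRIPT =
  'return (typeof arguments[0].naturalWidth != "undefined" && arguments[0].naturalWidth > 0)'

# Hypothetical helper: returns the src of every image on the page that failed to load.
# `browser` is expected to respond to #images and #execute_script, as Watir::Browser does.
def broken_image_sources(browser)
  broken = browser.images.reject do |image|
    browser.execute_script(IMAGE_LOADED_SCRIPT, image)
  end
  broken.map { |image| image.src }
end

# Usage with a real browser (needs the watir-webdriver gem):
#   require 'watir-webdriver'
#   b = Watir::Browser.new :firefox
#   b.goto 'https://dl.dropbox.com/u/18859962/brokenimage.html'
#   puts broken_image_sources(b)
```

A page object could call this helper and fail the test whenever the returned list is not empty.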

Update: 30 November
I wrote about a slightly more elegant C# approach to do this directly from the element.

The webdriver-user-agent gem now supports random user agents

My webdriver-user-agent gem now supports random user agents. This idea belonged to Christoph Pilka, who released the webdriver-user-agent-randomizer gem and suggested that we merge this feature back into the original gem.

Well, I have done it and now you can access this functionality like so:

require 'selenium-webdriver'
require 'webdriver-user-agent'
driver = UserAgent.driver(:agent => :random)
driver.execute_script('return navigator.userAgent')
# random agent like "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20010726 Netscape6/6.1"

See README for full details.

Webdriver select lists in ruby

Selecting an option from a select list using the selenium-webdriver gem:

require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
driver.navigate.to 'http://www.shino.de/parkcalc/'
Selenium::WebDriver::Support::Select.new(driver.find_element(:id => 'ParkingLot')).select_by :text, 'Economy Parking'

is much simpler in watir-webdriver

require 'watir-webdriver'
browser = Watir::Browser.new
browser.goto 'http://www.shino.de/parkcalc/'
browser.select_list(:id => 'ParkingLot').select 'Economy Parking'

That is all.

Automatic Firefox authentication when using Selenium-WebDriver with AutoAuth

I came across a particularly challenging problem today automating a web app for a client that runs behind a corporate proxy on a different Windows domain. The corporate proxy used NTLM authentication, but since I was on a different domain, I couldn’t get Firefox to send this information automatically so an authentication dialog would always appear that looked similar to this (IE worked fine):

Normally with browser authentication it is fairly straightforward to embed the username and password into the URL and Firefox will pass this to the web application without any problems (it’ll even ignore the confirmation normally displayed to the user), but in this case it didn’t work as it was the proxy that was requesting the information, not the application.

require 'watir-webdriver'
b = Watir::Browser.new :firefox
b.goto 'http://admin:password@'

I could manually get Firefox to store the credentials, but every time the WebDriver tests would run, this darn ‘Authentication Required’ dialog would appear (without credentials if using the standard new WebDriver profile for each test run). I tried setting all sorts of Firefox about:config settings to do with NTLM but nothing would work. After lots of trial and error, and finding nothing useful on the Internet about this issue, a colleague pointed out a Firefox add-on called AutoAuth that automatically submits these dialogs using stored Firefox credentials. Voila!

Example using Watir-WebDriver (the quick way)

The easiest way is to install the AutoAuth add-on on your default Firefox profile (the one that Firefox uses when launched manually), and store the credentials needed in the default Firefox password manager. All you then need to do is tell Watir-WebDriver to use the default profile:

require 'watir-webdriver'
b = Watir::Browser.new :firefox, :profile => 'default'
b.goto ''

The issue with the above code is that it’s not repeatable across machines, as the machine’s default profile must have AutoAuth installed, and the username and password in the password manager.

Example using Watir-WebDriver (the most repeatable way)

To make this more repeatable, first you need to create a Firefox profile by following the instructions here (we’ll call it WatirWebDriver).

Manually launch this profile and visit the site you need to authenticate to, enter the username and password and make sure you save the credentials in Firefox when prompted.

The script is then pretty simple: create a profile as a copy of the one you made, add the AutoAuth extension (download it and place the xpi file in your project directory), and visit the site:

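require 'watir-webdriver'
# Start from a copy of the named profile that already holds the saved credentials
profile = Selenium::WebDriver::Firefox::Profile.from_name 'WatirWebDriver'
profile.add_extension 'autoauth-2.1-fx+fn.xpi'
b = Watir::Browser.new :firefox, :profile => profile
b.goto ''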

This script should visit the site and AutoAuth should kick in and automatically submit that pesky ‘Authentication Required’ dialog: take that!


Whilst this NTLM proxy authentication issue was a bit of a headache to begin with, we found a reasonable way to work around it. I don’t really like the dependency on an existing Firefox profile with the proxy credentials, but until I work out how to store credentials in a Firefox profile I create at runtime using Selenium (which I don’t believe is possible), I think that it’s necessary.

I’ve also updated the Watir-WebDriver Basic Browser Authentication page.

The sacred cow that is an open source project

Before I start, I want to make it clear that I am a huge believer in and supporter of open source software, and I appreciate all the time and effort that people put in to make it so. I want to make open source software better and that is the only reason I write this post.


It all started a couple of months ago when I made what I considered an offhand remark about how I felt the Internet Explorer WebDriver driver was unreliable. Unfortunately this wasn’t taken very well at all by the core IE WebDriver driver developer who found my view unexpected and unfounded. I was surprised but took actions to rectify the issue I had created, such as emailing/tweeting the developer directly with apologies and apologizing to the mailing list where I made the remark. There’s only one Watir developer left working on the Watir IE driver, and I even asked this individual at the time if they would work on improving the WebDriver IE driver but this person didn’t feel confident considering it’s written in C++, and they’re a Ruby developer.

I thought it was all okay, as I hadn’t heard of it again. I didn’t raise any criticisms again (after how they were taken), but then yesterday out of the blue there was this:

And then this one surprised me even more, considering what I did originally:

Update: 19 June 2012: Apparently these comments had nothing to do with me.

Oh boy, Groundhog Day, and I hadn’t even dared to speak of the unreliability again. But that wasn’t the end of it. The Selenium Team caught on and started retweeting the original tweet, and then adding to the message accusations of trolling, as if anyone who raises any issue, without a reproducible test case, even if they don’t know the underlying technology/language, is apparently a troll.

Oh boy + +

Some background on the original issue

I’m developing a reasonably big set of automated web tests against a large web application behind a corporate firewall. I am writing them for a client, and since they are a Microsoft .NET environment, I am using the C# 4.0 Selenium WebDriver bindings, developing the tests on Windows.

I have 30 odd end to end scenarios that run perfectly (100% reliably again and again) against Firefox with native events disabled. And then I try to run them in IE. Half of them fail. I rerun them and about a quarter fail, but different ones to the original run. I run them another time and again they fail in different places. What is going on?

The error messages don’t make any sense. The screenshots don’t line up with where the code is failing, as though navigation is failing. After lots of frustration I realize that the driver is silently failing. It thinks it has clicked (but actually hasn’t) so it fails on locating an element that it thinks should be there, but isn’t, because it didn’t click the previous element which was meant to change the page. But what makes matters worse, is it’s inconsistent, one run it works fine, the next it fails, or it fails in a different spot.

I believe it’s native events related, as the IE Driver uses native events to send mouse clicks, and if the browser doesn’t have perfect focus, these events may fail (but think they succeeded). So I make some inquiries to see if I can disable native events, or use synthetic JavaScript events like Firefox in Internet Explorer, but no such support exists.

I look on StackOverflow where I find dozens of questions about this issue dating back over the last year or so (some examples: here, here, here, here and here). There doesn’t seem to be a really solid solution or workaround, but some mentioned include firing every click twice in case one doesn’t work (which obviously introduces other concurrency issues), using sendKeys(“\n”) on elements (strange), refocusing to the webDriver window every time (seems very inefficient to me) and finding the parent of each element using an xpath expression and clicking that (urrrgh). Not really what I was looking for.

I raise it as a criticism: the IE driver is unreliable, but it isn’t taken well. I am very surprised it is the first time the Selenium team have heard of the issue considering all the references (see links above) and workarounds (hacks) I manage to find across the Internet. But it’s very hard to consistently reproduce inconsistent reliability issues.

The problem with reliability issues

The problem with reliability issues is that they’re not easily reproducible. Imagine you’ve got a Volkswagen Passat, and every so often you’re driving down the highway and the engine cuts out. It’s happened to you a dozen times over a few months, but each time has been different (sometimes uphill, sometimes downhill, sometimes when braking) so you take it into the Volkswagen dealership. They ask you to take them for a drive to show them the problem, but guess what, it doesn’t happen. Does this mean the problem doesn’t exist? Because you can’t reproduce it? No.

Sporadic reliability issues are notoriously hard to reproduce, but it doesn’t mean they don’t exist, and it certainly doesn’t mean anyone who notices one, or mentions one, is a troll.

The problem with reproducing webdriver issues on client sites

I don’t want to seem to be providing excuses for not reproducing issues, but you must realize it’s often very hard to do so. Reproducing inconsistent behavior is not only difficult to begin with, it’s a lot more difficult when working on a corporate application. Firstly, as a consultant, you are legally bound by a non-disclosure agreement to not discuss/share the work/code you are writing, so you must take any issue you’re working on and reproduce it in an external environment completely different from your work at hand. This includes both the client side webdriver code, and the web application code you’re testing, which you’ll have to reproduce yourself. Secondly, the code you’re working on is behind a firewall, often running on a customized desktop Microsoft Windows SOE, not something that can be easily reproduced on a MacBook. And thirdly, in my current case, I don’t have a Visual Studio license or configuration to even work on reproducing the issue.

We need to move past treating open source projects as sacred cows

I’ve copped a lot of flak over my criticisms of open source projects in the past. I don’t think it’s that fair considering I’ve received a lot of criticisms myself. I have previously shared my views on Capybara which got a lot of people upset. But how are open source projects ever going to get better without criticism? Are open source projects really the sacred cows they make themselves out to be?

It’s quite a sport to criticize commercial software. Think about Microsoft Windows and Internet Explorer: there are hate sites on the Internet devoted to them. Imagine you worked on the IE core team as a developer. Software developers have also put huge amounts of personal and collective effort into all these commercial products; the only difference is that companies have charged for the usage of them. Think about QuickTest Professional (QTP). I lost count of how many times I heard shots against QTP being fired at the Selenium Conference (including those much recited job ad statistics: ‘Selenium’ vs ‘QTP’ on Indeed) and Selenium was even named as an antidote to Mercury poisoning (Mercury Interactive originally made QTP before selling it to HP).

So, does not charging for, or open sourcing software, make it not open for criticism? What does the open in open-source stand for?

We need to recognize that yes, a lot of hard work goes into an open source project, whether it be a simple rubygem, or a large automated testing tool such as Selenium or Watir. But we also need to recognize that just because someone spends their spare time on such a tool doesn’t instantly make it great and immune to criticism. Greatness comes from usage and feedback.

Sure, we can’t tolerate blatant disrespectful criticism (I believe I may have come across this way initially about the IE Driver issues and I subsequently apologized for it), but we also can’t accept criticisms or mentions of issues only when they are accompanied by a self-contained reproducible test case and an offer to dive into unfamiliar technology and fix the problem. I often fix issues with open source software that I write that aren’t accompanied by a reproducible test case: because I’m the expert who wrote the thing and chances are I’ll know what’s going on. An example is this defect someone raised, and I fixed on the same day, last week.

I would personally rather hear about an issue on one of my open source projects without a test case than not hear about it at all, because the person was too afraid of being called a troll to speak up or raise an issue without it being reproducible.

I understand the Watir/Selenium teams have a long, rather non-amicable history together, but I thought that it was getting better with the fantastic Watir-WebDriver library and Selenium Conference, but it seems from this incident we have a long way to go. Here’s hoping we can get over our minor differences and work together to create the best open source projects for ourselves and for our users.

Waiting for watir-webdriver downloads (and determining the file name)

There was an interesting question on Stackoverflow recently about waiting for a file download whilst running a watir-webdriver test. Whilst it’s happy to download files, watir-webdriver won’t wait until they have downloaded, so essentially this is up to you to check.

The easiest way I have found to do this is, just before you download the file, check the contents of your download directory, then commence the download, and then wait until there’s an additional file in your downloads directory.

Once you have the additional file, you can read its file name to determine what was actually downloaded. This is useful for dynamic file names, or even static file names when you’re downloading multiples (as Firefox will add a (1), (2), (3) etc. to the file name).

The code to do this is as follows:

require 'watir-webdriver'

file_name = nil
download_directory = "#{Dir.pwd}/downloads"
download_directory.gsub!("/", "\\") if Selenium::WebDriver::Platform.windows?
downloads_before = Dir.entries download_directory
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.folderList'] = 2 # custom location
profile['browser.download.dir'] = download_directory
profile['browser.helperApps.neverAsk.saveToDisk'] = "text/csv,application/pdf"
b = Watir::Browser.new :firefox, :profile => profile

b.goto 'https://dl.dropbox.com/u/18859962/hello.csv'

30.times do
  difference = Dir.entries(download_directory) - downloads_before
  if difference.size == 1
    file_name = difference.first
    break # stop polling once the new file has appeared
  end
  sleep 1
end
raise "Could not locate a new file in the directory '#{download_directory}' within 30 seconds" unless file_name
puts file_name
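
The polling step can also be pulled out into a small reusable method. The following is only a sketch: the name `wait_for_new_file` and its parameters are hypothetical, and the `.part` check assumes the browser marks in-progress downloads with that suffix (as I believe Firefox does).

```ruby
# Hypothetical helper: poll `directory` until exactly one entry appears that was
# not in `entries_before`, then return its name. Raises after `timeout` seconds.
def wait_for_new_file(directory, entries_before, timeout = 30, interval = 1)
  deadline = Time.now + timeout
  loop do
    new_entries = Dir.entries(directory) - entries_before
    # Assumption: a '.part' suffix means the download is still in progress.
    if new_entries.size == 1 && !new_entries.first.end_with?('.part')
      return new_entries.first
    end
    if Time.now >= deadline
      raise "Could not locate a new file in the directory '#{directory}' within #{timeout} seconds"
    end
    sleep interval
  end
end

# Usage, reusing the variables from the script above:
#   downloads_before = Dir.entries download_directory
#   b.goto 'https://dl.dropbox.com/u/18859962/hello.csv'
#   puts wait_for_new_file(download_directory, downloads_before)
```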

As usual, let me know if you know of a better way to do this.

webdriver-user-agent gem now supports ruby 1.8.x

I have just released version 0.1.0 of my webdriver-user-agent gem that makes it easy to run your watir-webdriver/selenium-webdriver tests against mobile device user agents.

The latest version supports ruby 1.8.x which previously generated an error because I was trying to use the .downcase method on a symbol (which was introduced in ruby 1.9.x).
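One way to avoid that kind of error is to convert the symbol to a string before changing its case. This is only an illustration of the idea, not necessarily how the gem does it; `normalize_agent` is a hypothetical name.

```ruby
# Hypothetical helper: normalise an agent name given as a Symbol or a String.
# Going through to_s avoids calling downcase on a Symbol, which ruby 1.8.x lacks.
def normalize_agent(agent)
  agent.to_s.downcase.to_sym
end

puts normalize_agent(:iPhone).inspect # :iphone
puts normalize_agent('IPAD').inspect  # :ipad
```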

Its usage remains the same. Enjoy.

Example using selenium-webdriver

require 'selenium-webdriver'
require 'webdriver-user-agent'

driver = UserAgent.driver(:browser => :chrome, :agent => :iphone, :orientation => :landscape)
driver.get 'http://tiffany.com'
driver.current_url.should == 'http://m.tiffany.com/International.aspx'

Example using watir-webdriver

require 'watir-webdriver'
require 'webdriver-user-agent'

driver = UserAgent.driver(:browser => :chrome, :agent => :iphone, :orientation => :landscape)
browser = Watir::Browser.new driver
browser.goto 'tiffany.com'
browser.url.should == 'http://m.tiffany.com/International.aspx'