AMA: R.Y.O. Page Objects 2.0

Michael Karlovich asks…

Do you have any updated thoughts on rolling your own page objects with Watir? The original post is almost 4 years old but is still the basis (loosely) of every page object framework I’ve built since then.

My response…

Wow, I can’t believe that post is almost four years old. I have also used it as the basis of every page object framework I have built since then.

I recently had a look at our JavaScript (ES2015) page object code, and despite ES2015 not having meta-programming support like Ruby, our classes are remarkably similar to what I was proposing ~4 years ago.

I believe this is because some patterns are classic and therefore almost timeless: they can be applied over and over again in different contexts. There’s a huge amount of negativity towards best practices of late, but I could seriously say that page objects are a best practice for test automation of UI systems. That isn’t saying they will be exactly the same in every context, but there’s a common pattern there which you most likely should be using.

Page objects, as a pattern, typically:

  • Inherit from a base page object/container which provides common behaviour such as:
    • instantiating the object by looking for a known element that defines that page’s existence;
    • optionally allowing a ‘visit’ to the page during instantiation using some defined URL/path; and
    • providing actions and properties common to all pages, for example: waiting for the page, checking the page is displayed, getting the title/url of the page, and checking cookies and local storage items for that page;
  • Define actions as methods which are ways of interacting with that page (such as logging in);
  • Do not expose internals about the page outside the page – for example they typically don’t expose elements or element selectors which should only be used within actions/methods for that page which are exposed; and
  • Can also be modelled as components for user interfaces that are built using components to give greater reusability of the same components across different pages.

The biggest benefit I have found from using page objects as a pattern is more deterministic end-to-end tests: by instantiating a page I know I am on that page, so my tests fail more reliably and with a better understanding of what went wrong.
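As a rough illustration of the attributes above, here is a minimal Ruby sketch. It is not taken from any real framework: FakeBrowser, BasePage, NotOnPageError and the example.test URLs are all hypothetical stand-ins, chosen so that the example runs without a real browser.

```ruby
# Stand-in for a real browser driver; purely illustrative.
class FakeBrowser
  attr_reader :url

  def initialize(pages)
    @pages = pages # maps a URL to the element ids present on that page
    @url = nil
  end

  def goto(url)
    @url = url
  end

  def element_present?(id)
    @pages.fetch(@url, []).include?(id)
  end
end

# Base page: optionally visits the page, then checks for a known element.
class BasePage
  class NotOnPageError < StandardError; end

  def initialize(browser, visit: false)
    @browser = browser
    @browser.goto(self.class::URL) if visit
    return if displayed?

    raise NotOnPageError, "#{self.class.name} not displayed at #{@browser.url.inspect}"
  end

  def displayed?
    @browser.element_present?(self.class::EXPECTED_ELEMENT)
  end

  def url
    @browser.url
  end
end

class LoginPage < BasePage
  URL = 'https://example.test/login'
  EXPECTED_ELEMENT = 'login-form'

  # An action exposed to tests; a real page would fill in and submit
  # the form here, keeping its selectors private to this class.
  def login_with(username, password)
    @browser.goto(HomePage::URL)
    HomePage.new(@browser)
  end
end

class HomePage < BasePage
  URL = 'https://example.test/home'
  EXPECTED_ELEMENT = 'welcome-banner'
end

browser = FakeBrowser.new({
  LoginPage::URL => ['login-form'],
  HomePage::URL => ['welcome-banner']
})
home = LoginPage.new(browser, visit: true).login_with('user', 'secret')
puts home.url
```

Because each page checks for its expected element in the constructor, a test that ends up on the wrong page fails at the point of instantiation rather than several steps later.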

Are there any other pattern attributes you would consider vital for page objects?

AMA: JS vs Ruby

Butch Mayhew asks…

I have noticed you blogging more about JS frameworks. How do these compare to Watir/Ruby? Would you recommend one over the other?

My response…

I had a discussion recently with Chuck van der Linden about this same topic as he has a lot of experience with Watir and is now looking at JavaScript testing frameworks like I have done.

Some background: an entirely new UI for managing sites was built using 100% JavaScript, with React for the main UI components. I am responsible for e2e automated tests across this UI, and whilst I originally contemplated, and even trialled, using Ruby, this didn’t make long-term sense in an environment where the original WordPress developers are mostly PHP and the newer UI developers are all JavaScript.

Whilst I see merit in both views, I still think having your automated acceptance tests in the same language as your application leads to better maintainability and adoptability.

I still think writing automated acceptance tests in Ruby is much cleaner and nicer than JavaScript Node tests, particularly as Ruby allows meta-programming which means page objects can be implemented really neatly.

The JavaScript/NodeJS landscape is still very immature: people are using various tools/frameworks/compilers, and patterns or de facto standards haven’t really emerged yet. The whole ES6/ES2015/ES2016 thing is very confusing to newcomers like me, especially on NodeJS where some ES6+ features are supported, but others require something like Babel to compile your code.

But generally with the direction ES is going, writing page objects as classes is much nicer than using functions for everything as in ES5.

Whilst there’s nothing I have found that is better (or even as good) in JavaScript/Mocha/WebDriverJS than Ruby/RSpec/Watir-WebDriver, I still think it’s a better long-term decision to use the JavaScript NodeJS stack for our e2e tests.

Using appium in Ruby for iOS automated functional testing

As previously explained, I recently started on an iOS project and have spent a bit of time comparing iOS automation tools and chose Appium as the superior tool.

The things I really like about Appium are that it is language/framework agnostic (it uses the standard WebDriver wire protocol), it doesn’t require any modifications to your app, it supports testing web views (as used in hybrid apps), and it supports Android, which matters since we are concurrently developing an Android application (it also supports OSX and Firefox OS but we aren’t developing for those, yet). There isn’t another iOS automated testing tool, that I know of, that ticks that many boxes for me.

Getting Started

The first thing to do is download the package from the appium website. I had an issue with the latest version (0.11.2) launching the server which can be resolved by opening the preferences and checking “Override existing sessions”.

You run the server from inside the downloaded application, and it takes your commands and relays them to the iOS simulator. There’s also a very neat ‘inspector’ tool which shows you all the information you need to know about your app and how to identify elements.

Note: there’s currently a problem with XCode 5.0.1 (the latest version as I write) which means Instruments/UIAutomation won’t work at all. You’ll need to downgrade (uninstall/reinstall) to XCode 5.0 to get appium to work.

Two Ruby Approaches

This confused me a little to start, but there are actually two vastly different ways to use appium in Ruby.

1) Use the standard selenium-webdriver gem

If you’re used to using WebDriver, like me, this will be the most straightforward approach (this is the approach I have taken). Appium extends the API to add different gestures by calling execute_script from the driver, so all other commands stay the same (for example, find_element).

2) Use the appium_lib library

There is a Ruby gem appium_lib that has a different API to the selenium-webdriver gem to control appium. I don’t see any massive benefits to this approach besides having an API that is more specific to app testing.

Using Selenium-WebDriver to start appium in ruby

Launching an appium app is as simple as defining some capabilities with a path to your .app file you have generated using XCode (this gets put into a deep folder so you can write the location to a file and read it from that file).

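require 'selenium-webdriver'

capabilities = {
  'browserName' => 'iOS',
  'platform' => 'Mac',
  'version' => '6.1',
  'app' => appPath # location of the .app file generated by XCode
}

driver = Selenium::WebDriver.for :remote,
  desired_capabilities: capabilities,
  url: "" # URL of the running appium server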
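The idea of writing the .app location to a file and reading it back could look something like the sketch below. The file name app_path.txt, the helper names and the example path are all hypothetical; in practice the build step would write the file and the test run would only read it.

```ruby
require 'tmpdir'

# Hypothetical location of the file that holds the .app path.
APP_PATH_FILE = File.join(Dir.tmpdir, 'app_path.txt')

# Called by the build step: records where the .app file ended up.
def write_app_path(path, file = APP_PATH_FILE)
  File.write(file, path)
end

# Called by the tests: reads the recorded path, dropping any trailing newline.
def read_app_path(file = APP_PATH_FILE)
  File.read(file).strip
end

write_app_path("/path/to/build/MyApp.app\n")
app_path = read_app_path
puts app_path
```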

Locating elements

Once you’ve launched your app, you’ll be able to use the appium inspector to see element attributes you can use in appium. Name is a common attribute, and if you find that it’s not being shown, you can add a property AccessibilityIdentifier in your Objective-C view code which will flow through to appium. This makes for much more robust tests than relying on labels or xpath expressions.

driver.find_element(:name, "ourMap").displayed?

Enabling location services for appium testing

This got me stuck for a while as there’s quite a bit of conflicting information about appium on how to handle the location services dialog. Whilst you should be able to interact with it as a normal dialog in the latest version of appium, I would rather not see it at all, so I wrote a method to copy a plist file with location services enabled in it to the simulator at the beginning of the test run. It’s quite simple (you can manually copy the clients.plist after manually enabling location services):

require 'fileutils'

def copy_location_services_authentication_to_sim
  source = "#{File.expand_path(File.dirname(__FILE__))}/clients.plist"
  destination = "#{File.expand_path('~')}/Library/Application Support/iPhone Simulator/7.0/Library/Caches/locationd"
  FileUtils.cp_r(source, destination, :remove_destination => true)
end

Waiting during appium tests

This is exactly the same as selenium-webdriver. There’s an implicit wait, or you can explicitly wait like so:

driver.manage.timeouts.implicit_wait = 10 # implicit wait, in seconds

wait = Selenium::WebDriver::Wait.new(:timeout => 30) # explicit wait, in seconds
wait.until { driver.find_element(:name, 'monkeys').displayed? }
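To show what an explicit wait is doing conceptually, here is a small polling sketch using only the Ruby standard library. It is not Selenium’s implementation: wait_until and WaitTimedOut are hypothetical names, and the sketch leaves out any handling of errors raised while polling.

```ruby
# Raised when the condition is not met in time (hypothetical name).
class WaitTimedOut < StandardError; end

# Calls the block repeatedly until it returns a truthy value,
# or raises once the timeout (in seconds) has elapsed.
def wait_until(timeout: 5, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimedOut, "condition not met within #{timeout} seconds"
    end

    sleep interval
  end
end

# The condition becomes true on the third check.
attempts = 0
wait_until(timeout: 2, interval: 0.01) { (attempts += 1) >= 3 }
puts attempts
```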

Mobile gestures

The obvious difference between a desktop web browser and a mobile app is gestures. Appium adds gestures to WebDriver using execute_script. I recommend using the percentage method (0.5 etc) instead of pixel method as it is more resilient to UI change.

For example:

driver.execute_script 'mobile: tap', :x => 0.5, :y => 0.5


b = driver.find_element :name, 'Sign In'
driver.execute_script 'mobile: tap', :element => b.ref

Testing Embedded Web Views

The native and web views seamlessly combine so you can use the same find_element method to find either. The inspector displays the appropriate attributes.

Note: I can’t seem to be able to execute a gesture (eg. swipe) over a Web View. I don’t know whether this is a bug or a limitation of Appium.


I have found that using the familiar selenium-webdriver gem with appium has been very powerful and efficient. Being able to open an interactive prompt (pry or irb) and explore your app using the selenium-webdriver library and the inspector is very powerful as you can script on the fly. Whilst appium still seems relatively immature, it seems a very promising approach to iOS automation.

Now to get watir-webdriver to work with appium.

Packaging a ruby script as a Windows exe using OCRA

I recently wrote a watir-webdriver ruby script that I needed to be able to distribute to others to run on Windows machines that don’t have ruby installed.

I came across the OCRA gem that allows you to easily generate a windows executable from a ruby script. This packages the ruby interpreter and all dependencies into an executable file.

It was quite straightforward to get it working: you simply install the gem on Windows and run the ocra command with the name of your ruby script.

The only (minor) issues I had were:

  • if you wish to access external files from your executable (such as a config file) just add ‘$:.unshift File.dirname($0)‘ at the start of your ruby file
  • if you are using ruby logging, then for some reason you can’t call logger.close as it crashes OCRA, but you can just not close the logger which is fine
  • for some reason on Windows I needed to explicitly require ‘securerandom’ to use SecureRandom.uuid whereas it just worked on Mac OSX
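Put together, the top of a script prepared for packaging might look like the sketch below. The logging lines are only a hypothetical example of script content; the first two statements are the workarounds from the list above.

```ruby
# Add the directory of the running script (or executable) to the load path.
$:.unshift File.dirname($0)

# Required explicitly, since SecureRandom.uuid did not work without it on Windows.
require 'securerandom'
require 'logger'

logger = Logger.new($stdout)
logger.info("run id: #{SecureRandom.uuid}")
# logger.close is deliberately not called, as it crashed under OCRA.
```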

Once I resolved these it quickly generated an executable which was runnable without any version of ruby installed. Neat.

The webdriver-user-agent gem now supports random user agents

My webdriver-user-agent gem now supports random user agents. This idea belonged to Christoph Pilka who released the webdriver-user-agent-randomizer gem and suggested that we merge this feature back into the original gem.

Well, I have done it and now you can access this functionality like so:

require 'selenium-webdriver'
require 'webdriver-user-agent'
driver = UserAgent.driver(:agent => :random)
driver.execute_script('return navigator.userAgent')
# random agent like "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20010726 Netscape6/6.1"

See README for full details.

Rspec, page objects and user flows

I love it when people challenge my views.

Recently, Chris McMahon from WMF challenged my view: is Cucumber best for end-to-end testing when RSpec will do? Especially in an environment where non-technical people aren’t involved in writing the specifications.

Admittedly, Cucumber does create some overhead (those pesky step definitions with pesky regular expressions (that I love)), but it does give you plain English (or LOLCAT) written, human readable, executable specifications. But the overhead gives you something more. It gives you a way to create reusable chunks of code that you can call wherever you want when you want to test something (or set something up). For example: Given I am logged into Wikipedia. But what do you do if you’re using RSpec?

What is needed is something to sit in between RSpec code and page objects that provides common functionality, such as logging in, to reduce repetition of code in each RSpec specification.

I’ve been trying to come up with a good term to describe these and the best I can come up with is ‘user flows‘, as essentially they are a flow that a user follows throughout the system, spanning multiple pages.

Adding user flows to the Wikimedia custom page object project

I thought I would experiment with user flows in the Wikimedia custom page object project that I recently created. First I replicated my Cucumber features as RSpec specs, which was easy enough, but I started to notice a lot of duplication of code.

For example: this was repeated at the beginning of most specs:

visit Wikipedia::LoginPage do |page|
  page.login_with USERNAME, PASSWORD
  page.should be_logged_in
end

Whilst it isn’t a huge amount of repetition (thanks to the useful login_with method on the LoginPage), it’s still not ideal. Enter ruby modules.

Using modules to store pages and flows

As far as I know, modules perform two main functions in Ruby: first as a namespace for classes (such as the Wikipedia::LoginPage class), and secondly as a way to group methods that don’t belong in a class, which are often ‘mixed into’ other classes. Perfect! A spot to store our flows.
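The two roles of a module described above can be sketched in a few lines. The names Wiki, Greeter and Visitor are made up for this illustration and are not part of the Wikimedia project.

```ruby
# 1) A module as a namespace for classes.
module Wiki
  class LoginPage
    def name
      'login page'
    end
  end
end

# 2) A module as a group of methods that get mixed in elsewhere.
module Greeter
  def greet(who)
    "Hello, #{who}"
  end
end

class Visitor
  include Greeter # adds greet as an instance method
end

module Wiki
  extend Greeter # adds greet as a module-level method
end

puts Wiki::LoginPage.new.name
puts Visitor.new.greet('reader')
puts Wiki.greet('editor')
```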

So, since I already had both a Wikipedia and Commons module, I could simply add module methods to these modules to represent our user flows.

module Wikipedia

  extend PageHelper
  extend RSpec::Expectations
  extend RSpec::Matchers

  def self.ensure_logged_in
    visit Wikipedia::LoginPage do |page|
      page.login_with USERNAME, PASSWORD
      page.should be_logged_in
    end
  end
end

Wiring these up to RSpec

I needed to do a little wiring to ensure these user flows can be easily used in RSpec. In my spec_helper.rb file (which is the equivalent to cucumber.rb in Cucumberland), I added the following to ensure that my browser object I created in RSpec is available to use in the flows:

RSpec.configure do |config|
  config.include PageHelper
  config.before(:each) do
    @browser = browser
    Commons.browser = @browser
    Wikipedia.browser = @browser
  end
  config.after(:suite) { browser.close }
end

and that was all that was needed to start using my user flows in my RSpec specifications.
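The assignments in the configuration above imply that each module exposes a writable browser attribute. One way to do that is sketched below, assuming nothing about how the real project implements it: Site, RecordingBrowser and the paths are hypothetical, and the browser is a stand-in that only records where it was sent.

```ruby
# Stand-in for a real browser object; it only records visited paths.
RecordingBrowser = Struct.new(:visited) do
  def goto(path)
    visited << path
  end
end

module Site
  class << self
    # Allows a spec helper to assign the shared browser: Site.browser = browser
    attr_accessor :browser
  end

  # A user flow: a reusable sequence of steps spanning several pages.
  def self.ensure_logged_in(username)
    browser.goto('/login')
    browser.goto("/user/#{username}")
    username
  end
end

browser = RecordingBrowser.new([])
Site.browser = browser
Site.ensure_logged_in('tester')
p browser.visited
```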

A completed RSpec specification in my WMF suite looks something like this:

describe 'Editing a Wikipedia Page' do

  context 'Logged on user' do

    it 'should be able to edit own user page' do
      content, edit_message = Wikipedia::edit_user_page_by_adding_new_content
      visit Wikipedia::UserPage do |page|
        page.page_content.should include content
        page.history_content.should include edit_message
        page.history_content.should include Wikipedia::USERNAME
      end
    end
  end
end

where Wikipedia::ensure_logged_in and Wikipedia::edit_user_page_by_adding_new_content are two user flows that I defined in the Wikipedia module.


I found using page objects directly in RSpec lacking, so I created a concept of user flows in modules that can be easily used from RSpec to reduce repetition and increase readability of specs. If I were to use RSpec for end to end tests, I would find this incredibly useful as a replacement for what Cucumber steps provide.

Three ways to generate test data for your ruby automated tests

I like generating test data that is varied, but still is realistic looking and fun. These are my three favourite ways to generate test data for my ruby automated tests.


Whenever I need some form of fake data, whether it be names, company names or email addresses etc, I use the brilliant faker gem. This gem makes it super easy to generate random fake data that still looks realistic (unlike a randomly generated word you can generate yourself like ‘HSKHJKUWG’). My favourite method is the one which, as the name implies, generates some great BS!

require 'faker'

# Nathanael Botsford
# Labadie, Marvin and Kassulke
puts Faker::Company.catch_phrase
# Self-enabling bottom-line project
# grow B2C platforms


Whenever I need to input a piece of data that needs to be uniquely identifiable, I use the UUID (universally unique identifier) capability built into Ruby 1.9.3 to generate a UUID. I prefer this to using the current date/time as it requires less formatting to make it unique.

Ruby 1.9.3 has this built in; otherwise there’s the UUID gem.

require 'securerandom'

puts SecureRandom.uuid
# ffe71bd2-2650-4135-b366-f8da08b4b708
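As an example of putting this to use, a UUID can be embedded in otherwise readable test data. The helper name unique_email and the example.com address are just assumptions for this sketch.

```ruby
require 'securerandom'

# Builds a unique but still readable email address for a test user.
def unique_email(prefix = 'test')
  "#{prefix}+#{SecureRandom.uuid}@example.com"
end

first = unique_email
second = unique_email
puts first
puts second
```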


A relative newcomer (it was released last week) is my quoth gem to generate random wikiquotes. I used this in the Wikimedia example tests to append interesting content to my test user page. You could use this in tests that need to insert blocks of content where you may like something with variety and that is interesting.

require 'quoth'

puts Quoth.get
# If I have ventured wrongly, very well, life then helps me with its penalty.
# But if I haven't ventured at all, who helps me then? ~ Søren Kierkegaard


I find these three methods useful. What do you use to generate test data? Or do you use hard coded test data?