Relenium

by Lluis Ramon
and Aleix Ruiz de Villa

Aleix Ruiz de Villa
TSS - Transport Simulation Systems
RugBcn - Barcelona R Users Group
V Jornadas
December 12th, 2013

Introduction

The most popular R tools for web scraping:
  • RCurl: typically manages the connection with the web page
  • XML: parses the HTML document and extracts the information
Limitation: they cannot interact with JavaScript, so content that a page generates in the browser is out of reach.
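To make the limitation concrete, here is a minimal sketch using the XML package on a made-up page (the markup and the element ids are assumptions for illustration only). The parser reads the script as text and never runs it, so the paragraph that JavaScript would fill in stays empty.

```r
library(XML)

# Hypothetical page: the second paragraph is only filled in by JavaScript.
html <- paste0(
  "<html><body>",
  "<p id='static'>Served in the HTML</p>",
  "<p id='dynamic'></p>",
  "<script>document.getElementById('dynamic').innerHTML = 'Added by JavaScript';</script>",
  "</body></html>"
)

doc <- htmlParse(html, asText = TRUE)

staticText  <- xpathSApply(doc, "//p[@id='static']", xmlValue)
dynamicText <- xpathSApply(doc, "//p[@id='dynamic']", xmlValue)

staticText   # text that is present in the markup itself
dynamicText  # no 'Added by JavaScript' here: the script was parsed, not executed
```

A real browser would run the script before the text could be read, which is the gap Selenium is meant to fill.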


Selenium is a library, available for Java, C#, Python, Ruby, PHP, Perl and JavaScript, that drives a real browser to perform automated web testing.


Relenium is an R package that imports Selenium via rJava. It is a reduced version of Selenium intended for web scraping.

Installation

install.packages("rJava")
install.packages("devtools")
library(devtools)
# Current devtools versions expect the "user/repo" form
install_github("LluisRamon/seleniumJars")
install_github("LluisRamon/relenium")
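Because relenium talks to Selenium through rJava, it can help to confirm that R is able to start a Java virtual machine before going further. The following is a small check, assuming a Java runtime is already installed on the machine; it uses only rJava itself and is not part of relenium.

```r
library(rJava)

# Start a Java virtual machine inside the R session.
.jinit()

# Ask the JVM which Java version it is running; "S" means a String is returned.
javaVersion <- .jcall("java/lang/System", "S", "getProperty", "java.version")
javaVersion
```

If `.jinit()` fails, the problem lies in the Java or rJava setup rather than in relenium.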

Starting

Open a Firefox browser:
library(relenium)
firefox <- firefoxClass$new()

Methods available via tab completion
firefox$ # and press the TAB key

Go to a webpage
firefox$get("http://lluisramon.github.io/relenium/toyPageExample.html")

Page source
firefox$getPageSource()
firefox$printHtml()

Input Box


Search element and send information
inputElement <-
  firefox$findElementByXPath("//*[@id='main_content']/div[1]/form/input")
inputElement$sendKeys("R Project")
inputElement$sendKeys(key = "ENTER")

inputElement is a webElement object. To inspect it and list the special keys it accepts:
inputElement
inputElement$keys
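The XPath expression used to find the input box can be tried out without a browser. The sketch below runs the same expression with the XML package against made-up markup that mimics the shape implied by the path (the HTML and the `name` attribute are assumptions, not the real toy page).

```r
library(XML)

# Hypothetical markup shaped like the XPath used above; not the real toy page.
html <- paste0(
  "<html><body>",
  "<div id='main_content'>",
  "<div><form><input type='text' name='query'/></form></div>",
  "<div><p>Another block</p></div>",
  "</div>",
  "</body></html>"
)

doc <- htmlParse(html, asText = TRUE)

# Same expression as in the findElementByXPath() call:
# any element with id 'main_content', its first <div> child, then form/input.
nodes <- getNodeSet(doc, "//*[@id='main_content']/div[1]/form/input")

length(nodes)                   # how many elements matched
xmlGetAttr(nodes[[1]], "name")  # an attribute of the matched <input>
```

Testing an expression this way is a quick check that it matches exactly one element before handing it to the browser.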

Button


Click the button
buttonElement <- firefox$findElementByXPath("//*[@id='main_content']/a")
buttonElement$click()

Get the table information using readHTMLTable (from the XML package).
library(XML)
infoTable <- firefox$findElementByXPath("//*[@id='myModal']/div/div/div/table")
readHTMLTable(infoTable$getHtml(), header = TRUE)[[1]]

Or read every table in the page source:
readHTMLTable(firefox$getPageSource(), header = TRUE)
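As a browser-free illustration of what readHTMLTable does with the HTML it receives, here is a sketch on a small made-up table (the column names and values are assumptions chosen for the example, not the contents of the toy page).

```r
library(XML)

# Hypothetical table markup standing in for the table inside the modal.
html <- paste0(
  "<table>",
  "<tr><th>Fruit</th><th>Colour</th></tr>",
  "<tr><td>Mango</td><td>Orange</td></tr>",
  "<tr><td>Nectarine</td><td>Red</td></tr>",
  "</table>"
)

doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() returns a list with one element per <table> in the document.
tables <- readHTMLTable(doc, header = TRUE)
fruitTable <- tables[[1]]

fruitTable
```

The `[[1]]` is needed because the result is always a list, even when the document holds a single table.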

Close the window
buttonElement <- firefox$findElementByXPath("//*[@id='myModal']/div/div/div/button")
buttonElement$click()

Select


Select the 'select' element
selectElement <- firefox$findElementByXPath("//*[@id='main_content']/select")
selectElement$printHtml()

Show options
optsList <- selectElement$getOptions()
sapply(optsList, function(optEle) {
  optEle$getText()
})

Select items
selectElement$selectByValue("Mango")
selectElement$selectByValue("Nectarine")
optsSel <- selectElement$getAllSelectedOptions()
sapply(optsSel, function(optEle) {
  optEle$getText()
})
selectElement$deselectAll()
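The same sapply pattern appears twice above, so it could be wrapped in a small helper. This is only a sketch: `elementTexts` and `fakeOption` are hypothetical names, and the helper assumes each element's getText() returns a single string. The stand-in objects let it run without a browser.

```r
# Hypothetical helper: collect the text of every element in a list.
# vapply() guarantees a character vector, even for an empty list.
elementTexts <- function(elements) {
  vapply(elements, function(el) el$getText(), character(1))
}

# Stand-in objects exposing getText(), so the sketch runs without a browser.
fakeOption <- function(text) list(getText = function() text)
opts <- list(fakeOption("Mango"), fakeOption("Nectarine"))

elementTexts(opts)
```

With a live session, the intended calls would be `elementTexts(selectElement$getOptions())` and `elementTexts(selectElement$getAllSelectedOptions())`.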

Navigation

firefox$get("http://lluisramon.github.io/relenium/")
firefox$back()
firefox$close()
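In a longer script an error can leave the browser window open. One possible pattern, sketched here with a hypothetical helper `withBrowser`, registers the close() call with on.exit so that it runs whether or not the scraping code fails. It relies only on the close() method shown above.

```r
# Hypothetical helper: create a browser, run `action` on it, always close it.
withBrowser <- function(newBrowser, action) {
  drv <- newBrowser()
  on.exit(drv$close(), add = TRUE)  # runs on normal exit and on error
  action(drv)
}

# With relenium loaded, usage might look like this (not run here):
# pageSource <- withBrowser(
#   function() firefoxClass$new(),
#   function(firefox) {
#     firefox$get("http://lluisramon.github.io/relenium/")
#     firefox$getPageSource()
#   }
# )
```

Passing the constructor as a function keeps the helper independent of the browser class being used.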
