XML package taster

Aleix Ruiz de Villa
TSS - Transport Simulation Systems
RugBcn - Barcelona R Users Group
V Jornadas
December 12th, 2013

Introduction

XML generalizes HTML and is more flexible.
Web page of the XML package
Approaches:
  • Read the whole document and parse it.
  • For big files, avoid reading the whole document: use event-driven (SAX) parsing (not explained here).
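
The two approaches can be contrasted on a small inline document. This is only a rough sketch: the `catalog` XML string and the `bookCount` counter are made up for illustration, and the event-driven half merely hints at how handlers passed to `xmlEventParse` are called, one element at a time, without building a tree.

```r
library(XML)

# Hypothetical document, not taken from the talk.
xmlText <- paste0(
  "<catalog>",
  "<book id='1'><title>A</title></book>",
  "<book id='2'><title>B</title></book>",
  "</catalog>"
)

# Approach 1: read the whole document into a tree, then query it.
doc <- xmlParse(xmlText, asText = TRUE)
titles <- xpathSApply(doc, "//book/title", xmlValue)

# Approach 2: event-driven (SAX) parsing. No tree is kept in memory;
# the startElement handler runs once per opening tag.
bookCount <- 0
handlers <- list(
  startElement = function(name, attrs, ...) {
    if (name == "book") bookCount <<- bookCount + 1
  }
)
invisible(xmlEventParse(xmlText, handlers = handlers, asText = TRUE))
```

For a file that fits comfortably in memory, the tree approach is usually simpler, since XPath queries can be run after parsing.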

Example

Getting the html tree
require(XML)
require(relenium)  # provides firefoxClass
firefox <- firefoxClass$new()
firefox$get("http://lluisramon.github.io/relenium/toyPageExample.html")
doc <- htmlParse(firefox$getPageSource())
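
When no browser session is available, the same parsing step can be tried on a literal string. A minimal sketch, assuming a made-up page (the `html` string below is hypothetical, not the real toyPageExample.html) and using `asText = TRUE` so that `htmlParse` treats its argument as content rather than as a file name or URL.

```r
library(XML)

# Hypothetical page source standing in for firefox$getPageSource().
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '<option value="b">Banana</option>',
  '<option value="c">Cherry</option>',
  '</select>',
  '</body></html>'
)

# Parse the string into an internal (C-level) document tree.
doc <- htmlParse(html, asText = TRUE)
rootName <- xmlName(xmlRoot(doc))
```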

Navigating through the tree
rootNode <- xmlRoot(doc)
selectNode <- getNodeSet(rootNode, "//select")[[1]]
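
`getNodeSet` returns a list of matching nodes, so `[[1]]` fails when the XPath expression matches nothing. A small sketch of checking the result first, again on a hypothetical page (the `html` string and the `//table` query are assumptions for illustration):

```r
library(XML)

# Hypothetical page source, not the talk's toy page.
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '<option value="b">Banana</option>',
  '</select>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
rootNode <- xmlRoot(doc)

# One <select> in the page: a list of length 1.
selects <- getNodeSet(rootNode, "//select")

# No <table> in the page: an empty list, so guard before indexing.
tables <- getNodeSet(rootNode, "//table")
firstTable <- if (length(tables) > 0) tables[[1]] else NULL
```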

Navigating through the tree
xmlName(selectNode)
xmlSize(selectNode)
xmlChildren(selectNode)
xmlName(xmlParent(selectNode))
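
To see what these accessors return, here is a sketch on a hypothetical page (the `html` string is made up, with no whitespace between tags so that only element children are counted):

```r
library(XML)

# Hypothetical page source, not the talk's toy page.
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '<option value="b">Banana</option>',
  '<option value="c">Cherry</option>',
  '</select>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
selectNode <- getNodeSet(doc, "//select")[[1]]

nodeName   <- xmlName(selectNode)             # tag name of the node
nChildren  <- xmlSize(selectNode)             # number of child nodes
children   <- xmlChildren(selectNode)         # list of the child nodes
parentName <- xmlName(xmlParent(selectNode))  # tag name one level up
```

In a page with line breaks or indentation between tags, text nodes may also appear among the children, so `xmlSize` can be larger than the number of `option` elements.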

Node attributes
xmlAttrs(selectNode)
xmlGetAttr(selectNode, "multiple")
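
A short sketch of the difference between the two calls, on a hypothetical `select` element (the `html` string and the `size` attribute lookup are assumptions): `xmlAttrs` returns all attributes as a named character vector, while `xmlGetAttr` fetches one and returns its `default` (`NULL` unless specified) when the attribute is absent.

```r
library(XML)

# Hypothetical page source, not the talk's toy page.
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '</select>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
selectNode <- getNodeSet(doc, "//select")[[1]]

attrs    <- xmlAttrs(selectNode)                # all attributes at once
idValue  <- attrs[["id"]]                       # index by attribute name
multiple <- xmlGetAttr(selectNode, "multiple")  # a single attribute
size     <- xmlGetAttr(selectNode, "size")      # absent attribute: NULL
sizeOr1  <- xmlGetAttr(selectNode, "size", default = "1")
```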

Selecting children
names(selectNode)
selectNode[[1]]
selectNode['option']
selectNode[1:2]
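
The indexing operators follow the usual R list conventions: `[[` gives a single node, `[` gives a list of nodes. A sketch on a hypothetical `select` element (the `html` string is made up for illustration):

```r
library(XML)

# Hypothetical page source, not the talk's toy page.
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '<option value="b">Banana</option>',
  '<option value="c">Cherry</option>',
  '</select>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
selectNode <- getNodeSet(doc, "//select")[[1]]

childNames <- unname(names(selectNode))  # tag names of the children
first      <- selectNode[[1]]            # a single node
allOptions <- selectNode["option"]       # list of children named "option"
firstTwo   <- selectNode[1:2]            # list of the first two children
firstText  <- xmlValue(first)
```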

Apply type family functions
xmlApply(selectNode, xmlValue)
xmlSApply(selectNode, xmlValue)
xpathSApply(rootNode, '//option', xmlValue)
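
These functions mirror `lapply` and `sapply`: `xmlApply` returns a list, `xmlSApply` tries to simplify to a vector, and `xpathSApply` combines an XPath query with the apply step. A sketch on a hypothetical page (the `html` string is an assumption), which also passes an extra argument through to the applied function:

```r
library(XML)

# Hypothetical page source, not the talk's toy page.
html <- paste0(
  '<html><body>',
  '<select id="fruits" multiple="multiple">',
  '<option value="a">Apple</option>',
  '<option value="b">Banana</option>',
  '<option value="c">Cherry</option>',
  '</select>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
rootNode <- xmlRoot(doc)
selectNode <- getNodeSet(rootNode, "//select")[[1]]

asList   <- xmlApply(selectNode, xmlValue)           # one entry per child
asVector <- unname(xmlSApply(selectNode, xmlValue))  # simplified to a vector
labels   <- xpathSApply(rootNode, "//option", xmlValue)

# Extra arguments after the function are forwarded to it,
# here the attribute name for xmlGetAttr.
values <- xpathSApply(rootNode, "//option", xmlGetAttr, "value")
```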
