Reading Tabular data using Selenium

Akash Yadav
Feb 22, 2022

Introduction

Selenium is a powerful tool for automating browsers. It can simulate browser interactions and complete tasks that would otherwise take a lot of manual effort. Testers use it widely to validate functionality across multiple browsers and to assert multiple scenarios.

Browser vendors provide drivers: programs that control a browser instance of a given type and perform actions on it (for example, clicking an HTML element). Selenium achieves its functionality by communicating with a driver server, which can be local or remote.

Selenium offers a common abstract API across browsers, so the same user program can run on any supported browser of choice. The API provides methods to read the properties of a selected element and to read data from web pages. For example, getText reads the visible text of an element such as a div or span. At a high level, reading one attribute of an element (its text, class, etc.) is an interaction between the program and the driver.

Problem

Often we need to read unstructured data from a web page, such as all rows and columns of a table. A simple approach is to fetch the page source, or read the table as one unit, and then parse the data in the program. When the data in the table is nested across multiple non-homogeneous elements, you need to pick up each row, scan it for columns, and read the data present in each column.

<table id="reportTable">
<thead>
<tr>
<td>Sno</td>
<td>Name</td>
<td>Dummy1</td>
</tr>
</thead>
<tbody>
<tr>
<td id="tdsno"><span>1</span></td>
<td id="tdname"><div>Alpha</div></td>
<td id="tddummy"><span>Val1</span></td>
</tr>

<tr>
<td id="tdsno"><span>2</span></td>
<td id="tdname"><div>Beta</div></td>
<td id="tddummy"><span>Val2</span></td>
</tr>
</tbody>
</table>

A program that reads these values would iterate over each row and then over each cell within that row. This is time-consuming because every read of a div or span triggers an interaction between Selenium and the WebDriver.

Sample code for reading elements one by one
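
A minimal sketch of this row-by-row reading, written so that it runs without a browser. The `PageElement` interface is a hypothetical stand-in for Selenium's `WebElement`: `find` plays the role of `findElements(By.cssSelector(...))` and `text` the role of `getText()`. The class and method names are assumptions made for illustration, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Selenium's WebElement (illustration only). */
interface PageElement {
    /** Plays the role of WebElement.findElements(By.cssSelector(css)). */
    List<PageElement> find(String css);

    /** Plays the role of WebElement.getText(). */
    String text();
}

final class RowByRowReader {
    /** Reads every body cell of the table; each call below is one driver command. */
    static List<List<String>> read(PageElement table) {
        List<List<String>> data = new ArrayList<>();
        for (PageElement row : table.find("tbody tr")) {   // one command for all rows
            List<String> values = new ArrayList<>();
            for (PageElement cell : row.find("td")) {      // one command per row
                values.add(cell.text());                   // one command per cell
            }
            data.add(values);
        }
        return data;
    }
}
```

With a real driver, `table` would be the element located by `By.id("reportTable")`, and every `find` and `text` call in the loops would travel to the driver and back.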

Solution

The performance cost of reading the table comes from the fact that each element.getText() call is an interaction along the chain User Program -> Selenium -> WebDriver. Even when the driver runs on the local machine, it is external to the program, so every call requires communication between Selenium and the driver.

The solution builds on the following facts:

  1. Selenium lets you execute ad hoc JavaScript code against the current state of the page.
  2. Selenium translates data types between Java (or another client language) and JavaScript seamlessly.
  3. Executing a script is a single command to Selenium and the driver.
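
To put rough numbers on the third fact, the sketch below counts driver commands for both approaches. It assumes the element-by-element loop issues one lookup for all rows, one lookup per row for its cells, and one getText() call per cell. The `CommandCount` class is a hypothetical helper for this arithmetic only.

```java
final class CommandCount {
    /** Element-by-element: 1 row lookup + 1 cell lookup per row + 1 getText per cell. */
    static long elementByElement(long rows, long cols) {
        return 1 + rows + rows * cols;
    }

    /** Script-based: the whole table comes back from one script execution. */
    static long singleScript() {
        return 1;
    }
}
```

Under these assumptions, the sample table (2 rows, 3 columns) needs 1 + 2 + 6 = 9 commands element by element, and a table of 100 rows and 10 columns needs 1,101, while the script-based read stays at 1 in both cases.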

Blueprint of the proposed solution

Prepare a script to extract the data:
    get the rows as HTML elements
    for each row, get all of its columns
        append each column value to a row array
        append the row array to a list
    return the prepared list
Execute the script using JavascriptExecutor
Process the results

Sample implementation

Executing JavaScript to read the table as one unit
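
A minimal sketch of the blueprint, under a few assumptions. The script targets the `reportTable` id from the sample markup and uses `textContent`, which can differ from the visible text that getText() returns. To keep the example runnable without a browser, the script executor is passed in as a `Function<String, Object>`. The `TableScriptReader` class and its `read` method are hypothetical names, not part of Selenium.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class TableScriptReader {
    /** JavaScript that collects the text of every body cell into nested arrays. */
    static final String SCRIPT =
          "var data = [];"
        + "var rows = document.querySelectorAll('#reportTable tbody tr');"
        + "for (var i = 0; i < rows.length; i++) {"
        + "  var cells = rows[i].querySelectorAll('td');"
        + "  var values = [];"
        + "  for (var j = 0; j < cells.length; j++) {"
        + "    values.push(cells[j].textContent.trim());"
        + "  }"
        + "  data.push(values);"
        + "}"
        + "return data;";

    /** Runs SCRIPT once through the given executor and converts the result. */
    static List<List<String>> read(Function<String, Object> executor) {
        // A JavaScript array of arrays comes back as a List of Lists.
        Object raw = executor.apply(SCRIPT);
        List<List<String>> table = new ArrayList<>();
        for (Object row : (List<?>) raw) {
            List<String> values = new ArrayList<>();
            for (Object cell : (List<?>) row) {
                values.add(String.valueOf(cell));
            }
            table.add(values);
        }
        return table;
    }
}
```

With a live session, the executor would be something like `script -> ((JavascriptExecutor) driver).executeScript(script)`, where `driver` is an already opened `WebDriver` (assumed here). The whole table then arrives in a single command, and the remaining processing is plain Java.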

Conclusion

Selenium is a powerful and flexible tool that often offers several ways to accomplish the same task. With the approach above, a program can read a whole set of data from a web page in one call rather than reading it one element at a time.
