Reading
HTMLTables.readtable — Function
readtable(
    source,
    sink=nothing;
    header::Bool=true,
    id::Union{Nothing,String}=nothing,
    class::Union{Nothing,String,Vector{String}}=nothing,
    index::Int=1,
    number_type::Type=Any
)
Reads an HTML table into a sink function such as DataFrame.
Arguments
source: URL, file path, or HTML string containing the table.
sink: the function that materializes the table data.
Keyword Arguments
header::Bool: whether to include the table header.
id::String: the id of the HTML table in the HTML document.
class::Union{String,Vector{String}}: the class of the HTML table.
index::Int: the index of the HTML table in the HTML document.
number_type::Type: the return type of the numeric table data.
Returns
sink: the output of the sink function, such as a DataFrame holding the HTML table data, if sink is specified.
tuples::Vector: the table data if sink is not specified and the header keyword argument is false.
headers::Vector: the table headers if sink is not specified and the header keyword argument is true.
Examples
reading an HTML table from a website into a DataFrame:
using HTMLTables, DataFrames
url = "https://www.w3schools.com/html/html_tables.asp"
df = readtable(url, DataFrame)
println(df)
6×3 DataFrame
 Row │ Company                       Contact           Country
     │ String                        String            String
─────┼─────────────────────────────────────────────────────────
   1 │ Alfreds Futterkiste           Maria Anders      Germany
   2 │ Centro comercial Moctezuma    Francisco Chang   Mexico
   3 │ Ernst Handel                  Roland Mendel     Austria
   4 │ Island Trading                Helen Bennett     UK
   5 │ Laughing Bacchus Winecellars  Yoshi Tannamuri   Canada
   6 │ Magazzini Alimentari Riuniti  Giovanni Rovelli  Italy

reading the second HTML table from a file into a DataFrame:
using HTMLTables, DataFrames
url = "tables.html"
df = readtable(url, DataFrame, index=2)
println(df)
4×2 DataFrame
 Row │ Name     Age
     │ String   String
─────┼─────────────────
   1 │ Bob      25
   2 │ Charlie  35
   3 │ Alice    30
   4 │ David    40

reading an HTML table with the id "htmltable" from a string into a DataFrame:
using HTMLTables, DataFrames
html_str = """
<table id="htmltable">
<tr>
<th>Name</th>
<th>Age</th>
</tr>
<tr>
<td>Bob</td>
<td>25</td>
</tr>
<tr>
<td>Charlie</td>
<td>35</td>
</tr>
<tr>
<td>Alice</td>
<td>30</td>
</tr>
<tr>
<td>David</td>
<td>40</td>
</tr>
</table>
"""
df = DataFrame(readtable(html_str, id="htmltable", number_type=Int64))
println(df)
4×2 DataFrame
 Row │ Name     Age
     │ String   Int64
─────┼─────────────────
   1 │ Bob         25
   2 │ Charlie     35
   3 │ Alice       30
   4 │ David       40

read the data from the HTML table as a vector of tuples:
using HTMLTables
url = "tables.html"
tuples = readtable(url, number_type=Int64, header=false)
println(tuples)
[("Bob", 25), ("Charlie", 35), ("Alice", 30), ("David", 40)]

read the data from the HTML table as a matrix:
using HTMLTables, Tables
url = "tables.html"
mtx = readtable(url, Tables.matrix; number_type=Int64, header=false)
println(mtx)
Any["Bob" 25; "Charlie" 35; "Alice" 30; "David" 40]
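
selecting a table by its class (a sketch, not an official example): the class keyword argument is documented above but does not appear in the examples. The snippet below assumes that class matches the class attribute of the table element. The HTML content, the class names "prices" and "people", and the variable names are made up for illustration.

```julia
using HTMLTables, DataFrames

# Hypothetical document with two tables; only the second has class="people".
html_str = """
<table class="prices">
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Pen</td><td>2</td></tr>
</table>
<table class="people">
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Bob</td><td>25</td></tr>
<tr><td>Alice</td><td>30</td></tr>
</table>
"""

# Assumption: `class` restricts the search to tables whose class attribute matches,
# so the "prices" table is skipped and the "people" table is read.
df = readtable(html_str, DataFrame; class="people", number_type=Int64)
println(df)
```

If the assumption holds, this reaches the same table as index=2 would, without depending on where the table sits in the document.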