Reading
HTMLTables.readtable (Function)
readtable(
source,
sink=nothing;
header::Bool=true,
id::Union{Nothing,String}=nothing,
class::Union{Nothing,String,Vector{String}}=nothing,
index::Int=1,
number_type::Type=Any
)
Reads an HTML table into a sink function such as DataFrame.
Arguments
source: URL, file path, or HTML string containing the table.
sink: the function that materializes the table data.
Keyword Arguments
header::Bool: whether to include the table header.
id::String: the id of the HTML table in the HTML document.
class::Union{String,Vector{String}}: the class of the HTML table.
index::Int: the index of the HTML table in the HTML document.
number_type::Type: the return type of the numeric table data.
Returns
sink: the output of the sink function, such as a DataFrame, populated with the HTML table data, if sink is specified.
tuples::Vector: the table data, if sink is not specified and the header keyword argument is false.
headers::Vector: the table headers, if sink is not specified and the header keyword argument is true.
Examples
Reading an HTML table from a website into a DataFrame:
using HTMLTables, DataFrames
url = "https://www.w3schools.com/html/html_tables.asp"
df = readtable(url, DataFrame)
println(df)
6×3 DataFrame
 Row │ Company                       Contact           Country
     │ String                        String            String
─────┼─────────────────────────────────────────────────────────
   1 │ Alfreds Futterkiste           Maria Anders      Germany
   2 │ Centro comercial Moctezuma    Francisco Chang   Mexico
   3 │ Ernst Handel                  Roland Mendel     Austria
   4 │ Island Trading                Helen Bennett     UK
   5 │ Laughing Bacchus Winecellars  Yoshi Tannamuri   Canada
   6 │ Magazzini Alimentari Riuniti  Giovanni Rovelli  Italy
Reading the second HTML table from a file into a DataFrame:
using HTMLTables, DataFrames
url = "tables.html"
df = readtable(url, DataFrame, index=2)
println(df)
4×2 DataFrame
 Row │ Name     Age
     │ String   String
─────┼─────────────────
   1 │ Bob      25
   2 │ Charlie  35
   3 │ Alice    30
   4 │ David    40
Reading an HTML table with the id "htmltable" from a string into a DataFrame:
using HTMLTables, DataFrames
html_str = """
<table id="htmltable">
<tr>
<th>Name</th>
<th>Age</th>
</tr>
<tr>
<td>Bob</td>
<td>25</td>
</tr>
<tr>
<td>Charlie</td>
<td>35</td>
</tr>
<tr>
<td>Alice</td>
<td>30</td>
</tr>
<tr>
<td>David</td>
<td>40</td>
</tr>
</table>
"""
df = DataFrame(readtable(html_str, id="htmltable", number_type=Int64))
println(df)
4×2 DataFrame
 Row │ Name     Age
     │ String   Int64
─────┼─────────────────
   1 │ Bob         25
   2 │ Charlie     35
   3 │ Alice       30
   4 │ David       40
Reading the data from an HTML table as a vector of tuples:
using HTMLTables
url = "tables.html"
tuples = readtable(url, number_type=Int64, header=false)
println(tuples)
[("Bob", 25), ("Charlie", 35), ("Alice", 30), ("David", 40)]
Reading the data from an HTML table as a matrix:
using HTMLTables, Tables
url = "tables.html"
mtx = readtable(url, Tables.matrix; number_type=Int64, header=false)
println(mtx)
Any["Bob" 25; "Charlie" 35; "Alice" 30; "David" 40]
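Selecting an HTML table by class from a string into a DataFrame. The examples above cover the id and index keywords but not class, so this is a minimal sketch of how class might be used. It assumes that class matches against the table's class attribute in the same way id matches the id attribute. The HTML document and the class names "summary" and "people" are hypothetical, and no output is shown because it has not been verified:

```julia
using HTMLTables, DataFrames

# Hypothetical document with two tables; only the second has class "people".
html_str = """
<table class="summary">
<tr><th>Total</th></tr>
<tr><td>4</td></tr>
</table>
<table class="people">
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Bob</td><td>25</td></tr>
<tr><td>Charlie</td><td>35</td></tr>
</table>
"""

# Select the table by its class rather than by its position in the document
# (assumption: class is matched against the table's class attribute).
df = DataFrame(readtable(html_str, class="people", number_type=Int64))
println(df)
```

Selecting by class or id tends to be more robust than index when the page layout may change, since inserting another table earlier in the document shifts every index.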