Camelot

Posted by neverset on November 15, 2020

Camelot lets you tweak how tables are extracted from PDFs. Each extracted table is available as a pandas DataFrame and can also be exported to other formats, including CSV, JSON, Excel, HTML, and SQLite.

Installation and usage

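!pip install "camelot-py[cv]"
import camelot

filepath = "example.pdf"  # placeholder: path to your own PDF

# read_pdf always returns a TableList, even when a single page is requested
tables = camelot.read_pdf(
    filepath,
    pages='1',  # also accepts lists and ranges, e.g. '1,3-5', '1-end' or 'all'
    password=None,  # password for encrypted PDFs
    flavor='lattice',  # 'lattice' for tables with ruling lines, 'stream' for whitespace-separated ones
    suppress_stdout=False,  # set to True to silence warnings
    layout_kwargs={},  # forwarded to PDFMiner's LAParams
    # any further keyword arguments are passed on to the chosen parser
)
# pick one table from the list
table = tables[0]
# check the extraction report (accuracy, whitespace, order, page)
table.parsing_report
# convert to a pandas DataFrame
df = table.df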
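
The parsing report is a plain dict with the keys accuracy, whitespace, order and page, so it can be used to drop poorly extracted tables before working with them. Below is a minimal sketch of that idea. The helper name keep_reliable and both thresholds are assumptions chosen for illustration, not part of Camelot, and the stand-in objects with made-up report values exist only so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def keep_reliable(tables, min_accuracy=90.0, max_whitespace=50.0):
    """Return the tables whose parsing report passes both thresholds.

    Hypothetical helper: the threshold defaults are arbitrary and should be
    tuned for your own documents.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            kept.append(table)
    return kept


# Stand-ins with made-up report values; with Camelot you would pass the
# TableList returned by camelot.read_pdf instead.
fake_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1}),
    SimpleNamespace(parsing_report={"accuracy": 61.5, "whitespace": 70.0, "order": 2, "page": 1}),
]

good = keep_reliable(fake_tables)
print([t.parsing_report["order"] for t in good])  # prints [1]
```

With a real extraction, calling keep_reliable(tables) on the result of camelot.read_pdf should behave the same way, and the tables that remain can then be converted with .df as usual.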