Camelot lets users tweak how tables are extracted from PDFs. Each extracted table can be accessed as a pandas DataFrame or exported to other formats, including JSON, Excel, HTML, and SQLite.
Installation and usage
!pip install "camelot-py[cv]"
import camelot
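# read_pdf always returns a TableList, even when a single page is requested
tables = camelot.read_pdf(
    filepath,  # path or URL of the PDF
    pages='1',  # comma-separated pages or ranges, e.g. '1,3-5' or 'all'
    password=None,
    flavor='lattice',  # 'lattice' for tables with ruling lines, 'stream' for whitespace-separated ones
    suppress_stdout=False,
    layout_kwargs={},
    # any further keyword arguments are passed on to the chosen flavor's parser
)
# check the extraction report of the first table
tables[0].parsing_report
# convert the first table to a DataFrame
df = tables[0].df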
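
Because the result depends on the chosen flavor, one possible pattern is to try 'lattice' first and fall back to 'stream' when nothing is found. The sketch below is only an illustration: the helper name read_tables_with_fallback and its reader parameter are hypothetical and not part of Camelot. They exist so the logic can be tried with a stub instead of a real PDF.

```python
def read_tables_with_fallback(filepath, pages="1", reader=None):
    """Try the 'lattice' flavor first, then 'stream' if no tables were found.

    `reader` is a hypothetical hook (not a Camelot feature); by default
    camelot.read_pdf is used.
    """
    if reader is None:
        import camelot  # imported lazily so a stub reader works without Camelot installed
        reader = camelot.read_pdf

    for flavor in ("lattice", "stream"):
        tables = reader(filepath, pages=pages, flavor=flavor)
        if len(tables) > 0:
            return flavor, tables

    # neither flavor detected a table
    return None, []
```

With a real file this could be called as flavor, tables = read_tables_with_fallback(filepath), and the returned list exported afterwards, for example with tables.export('out.csv', f='csv').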