Describe the bug
Via #1087 (comment), it seems that there's a bug in how pdfplumber joins lines.
Have you tried repairing the PDF?
Yes.
Code to reproduce the problem
Download the PDF in the linked comment. Then:
```python
import pdfplumber

pdf = pdfplumber.open("2022.Sustainability.Report_NYSE_WM_2022.pdf")
page = pdf.pages[41]
im = page.to_image()
im.reset().debug_tablefinder({"join_x_tolerance": 0})
```
And compare to:
```python
(
    im.reset()
    .draw_lines(
        pdfplumber.table.merge_edges(
            pdfplumber.utils.filter_edges(page.edges, "h"),
            snap_x_tolerance=0,
            snap_y_tolerance=0,
            join_x_tolerance=-1,
            join_y_tolerance=0,
        )
    )
)
```
PDF file
See linked issue.
Expected behavior
pdfplumber's table-finding approach should merge all the sub-lines in each visual line into a single line.
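To make the expectation concrete, here is a minimal sketch of the joining behavior I have in mind. It is not pdfplumber's implementation. `join_collinear` is a hypothetical helper, and the edges are simplified dicts carrying only `x0`, `x1`, and `top`. It assumes that segments sharing the same `top` should be joined whenever the next segment starts at or before the end of the previous one, plus a tolerance.

```python
from typing import Dict, List

Edge = Dict[str, float]


def join_collinear(edges: List[Edge], tolerance: float = 0.0) -> List[Edge]:
    """Join horizontal segments that share a `top` and touch or overlap in x.

    Hypothetical helper for illustration only; not part of pdfplumber.
    """
    # Group segments by their vertical position.
    by_top: Dict[float, List[Edge]] = {}
    for edge in edges:
        by_top.setdefault(edge["top"], []).append(edge)

    joined: List[Edge] = []
    for top in sorted(by_top):
        # Walk each group left to right, extending the current line
        # while the next segment starts within `tolerance` of its end.
        group = sorted(by_top[top], key=lambda e: e["x0"])
        current = dict(group[0])
        for edge in group[1:]:
            if edge["x0"] <= current["x1"] + tolerance:
                current["x1"] = max(current["x1"], edge["x1"])
            else:
                joined.append(current)
                current = dict(edge)
        joined.append(current)
    return joined


# Three touching sub-lines plus one segment separated by a gap.
segments = [
    {"x0": 0.0, "x1": 10.0, "top": 5.0},
    {"x0": 10.0, "x1": 20.0, "top": 5.0},
    {"x0": 20.0, "x1": 35.0, "top": 5.0},
    {"x0": 50.0, "x1": 60.0, "top": 5.0},
]
result = join_collinear(segments)
```

With a tolerance of 0, the three touching sub-lines collapse into one line from 0 to 35, and the segment starting at 50 stays separate. That is the result I would expect for each visual line in the PDF.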
Actual behavior
The method appears to do something strange with the lines, "finding" only certain portions of them.
Screenshots
See above
Environment
0.11.0
jsvine