In PyMuPDF4LLM, setting ignore_graphics to True disables table detection, even though the documentation states that “Vector graphics are still used for table detection.”
Hi @Adel, thanks for the info! @HaraldLieder, is this expected functionality? @Adel, can you confirm what value you use for table_strategy when you call the method?
Sorry - we forgot to update the documentation.
There is no realistic way for PyMuPDF’s table finder to detect tables without access to any vector elements.
@HaraldLieder Thanks for the clarification. My aim in setting ignore_graphics to True is to improve text detection on crowded slides, not necessarily to speed up processing. It would be ideal if graphics were still processed solely for the purpose of table detection.
I suggest using a moderate value for graphics_limit instead, e.g. start with 1000 or so. That should be enough for table detection while still limiting most of the crazy cases.
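
For anyone landing here later, a minimal sketch of that suggestion might look like the following. The helper names build_markdown_options and slides_to_markdown are made up for illustration, "slides.pdf" is a placeholder path, and the "lines_strict" value for table_strategy is assumed to be the library default; check the parameters against the PyMuPDF4LLM version you have installed.

```python
def build_markdown_options(graphics_limit=1000, table_strategy="lines_strict"):
    """Build keyword arguments for pymupdf4llm.to_markdown.

    Hypothetical helper: keeps vector graphics enabled so the table
    finder can still see them, and caps the per-page graphics count.
    """
    return {
        # Leave graphics on; ignore_graphics=True also disables table detection.
        "ignore_graphics": False,
        # Starting value suggested in this thread; tune it for your documents.
        "graphics_limit": graphics_limit,
        # Assumed default strategy; pass your own value if you use another one.
        "table_strategy": table_strategy,
    }


def slides_to_markdown(path, **overrides):
    """Convert a PDF to Markdown with the options above (hypothetical wrapper)."""
    import pymupdf4llm  # imported lazily so the options helper works without it

    options = build_markdown_options(**overrides)
    return pymupdf4llm.to_markdown(path, **options)


if __name__ == "__main__":
    # Placeholder file name; replace with a real slide deck.
    markdown_text = slides_to_markdown("slides.pdf", graphics_limit=1000)
    print(markdown_text[:500])
```

If crowded slides still give poor text output, lowering graphics_limit step by step and comparing the results per page seems like a reasonable way to find a workable threshold.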