[Bug] [PyMuPDF4LLM] ignore_graphics disables table detection

In PyMuPDF4LLM, setting ignore_graphics=True disables table detection, even though the documentation states that “Vector graphics are still used for table detection.”
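One way to reproduce this is to convert the same file twice, once with each ignore_graphics setting, and count the Markdown tables in each output. The sketch below assumes a recent pymupdf4llm release in which `to_markdown()` accepts `ignore_graphics`. The helpers `count_markdown_tables` and `compare` are hypothetical, and counting `|---|` separator rows is only a rough proxy for "tables detected".

```python
import re

# A Markdown table header separator row, e.g. "|---|---|" or "| :---: | --- |".
_SEPARATOR = re.compile(r"^\|(?:\s*:?-{3,}:?\s*\|)+\s*$")


def count_markdown_tables(md: str) -> int:
    """Hypothetical helper: count tables by counting header separator rows."""
    return sum(1 for line in md.splitlines() if _SEPARATOR.match(line.strip()))


def compare(path: str) -> dict:
    """Convert `path` with ignore_graphics off and on; return table counts."""
    import pymupdf4llm  # imported lazily so the helper above works without it

    results = {}
    for flag in (False, True):
        md = pymupdf4llm.to_markdown(path, ignore_graphics=flag)
        results[flag] = count_markdown_tables(md)
    return results


if __name__ == "__main__":
    import sys

    print(compare(sys.argv[1]))
```

With the behavior described in this report, the count for `True` would drop to zero on slides where the `False` run finds tables.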

Hi @Adel, thanks for the info! @HaraldLieder, is this expected behavior? @Adel, can you confirm which value you pass for table_strategy when you call the method?

Sorry, we forgot to update the documentation.
There is no realistic way for PyMuPDF’s table finder to detect tables without access to any vector elements.
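To illustrate why line-based table detection needs vector elements, here is a toy sketch of cell detection from ruling lines. It is a deliberately simplified, hypothetical `count_cells` function, not PyMuPDF's actual algorithm: a cell is counted only when all four of its edges are covered by horizontal or vertical lines, so with no lines there is nothing to build cells from.

```python
HLine = tuple[float, float, float]  # (y, x0, x1): a horizontal ruling line
VLine = tuple[float, float, float]  # (x, y0, y1): a vertical ruling line


def count_cells(h_lines: list[HLine], v_lines: list[VLine]) -> int:
    """Toy detector: count grid rectangles whose four edges are all ruled."""
    ys = sorted({y for y, _, _ in h_lines})  # candidate row boundaries
    xs = sorted({x for x, _, _ in v_lines})  # candidate column boundaries

    def h_covered(y: float, x0: float, x1: float) -> bool:
        # Some horizontal line at height y spans the whole edge [x0, x1].
        return any(hy == y and hx0 <= x0 and hx1 >= x1 for hy, hx0, hx1 in h_lines)

    def v_covered(x: float, y0: float, y1: float) -> bool:
        # Some vertical line at position x spans the whole edge [y0, y1].
        return any(vx == x and vy0 <= y0 and vy1 >= y1 for vx, vy0, vy1 in v_lines)

    cells = 0
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            if (h_covered(top, left, right) and h_covered(bottom, left, right)
                    and v_covered(left, top, bottom) and v_covered(right, top, bottom)):
                cells += 1
    return cells
```

Three horizontal and three vertical lines forming a 2-by-2 ruled grid yield 4 cells; drop either set of lines and the count falls to 0, which mirrors what happens when graphics are ignored.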

@HaraldLieder Thanks for the clarification. My aim in setting ignore_graphics to True is to improve text detection on crowded slides, not necessarily to speed up processing. Ideally, graphics would still be processed, but only for table detection.

I suggest using a moderate value for graphics_limit instead; start with something like 1000. That should be enough for table detection while still limiting most of the extreme cases.
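A minimal sketch of this suggestion: keep graphics enabled so the table finder still sees ruling lines, and pass `graphics_limit` to `pymupdf4llm.to_markdown()`. The hypothetical helper `pages_over_limit` reports which pages exceed the chosen limit. Two assumptions: `len(page.get_drawings())` is only an approximation of how pymupdf4llm counts vector graphics, and how pages above the limit are handled may differ between pymupdf4llm versions, so check the documentation for your release.

```python
def pages_over_limit(drawing_counts: list[int], limit: int) -> list[int]:
    """Hypothetical helper: 0-based page numbers whose count exceeds `limit`."""
    return [pno for pno, n in enumerate(drawing_counts) if n > limit]


def convert(path: str, limit: int = 1000) -> str:
    import pymupdf  # PyMuPDF; older releases are imported as "fitz"
    import pymupdf4llm

    doc = pymupdf.open(path)
    # Approximation only: pymupdf4llm may count vector paths differently.
    counts = [len(page.get_drawings()) for page in doc]
    heavy = pages_over_limit(counts, limit)
    if heavy:
        print(f"pages above graphics_limit={limit}: {heavy}")
    # Graphics stay enabled (ignore_graphics is left at False), so table
    # detection can still use vector lines on normal pages.
    return pymupdf4llm.to_markdown(doc, graphics_limit=limit)


if __name__ == "__main__":
    import sys

    print(convert(sys.argv[1]))
```

Listing the heavy pages first makes it easier to tune the limit: raise it if slides with real tables show up in that list, lower it if conversion is still slow.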