Text not fitting into TextBox using "insert_textbox()"

PyMuPDF is great for replacing text in a PDF with the following workflow, which has worked fine so far:

  1. Find Placeholder-Text
  2. Mark Text-Area (bbox)
  3. Redact Placeholder-Text
  4. Draw new Text

In step 4, insert_textbox() is used instead of insert_text() to make use of the alignment features. This works fine for Arial, but not for e.g. Avenir Next or other fonts. In these cases the height of the textbox is always too small to render the text, even if the replacement text is shorter than the placeholder! chars_written always returns a negative value, meaning there is not enough space to render the text.

Here are some screenshots to visualize the issue (left: placeholder, right: replacement):

Legend: black dot is text-origin, blue rect is text-height/fontsize, red rect is bbox and green rect is textbox, rounded corners for better visibility.


When measuring text-height and box-height, the text is always slightly taller than the box (only in the decimal places):

Font-Height: 35.96999979019165 (bigger)
Box-Height: 35.969970703125 (smaller)

So my assumption was that increasing the box-height by 0.1 should be sufficient to fit the text in.

But the result of chars_written is -2.13, meaning no text is drawn (negative value).
Does the box-height need some (undocumented?) extra margin of +2.14???

When doing so (box-height + 2.14), the text is rendered, but this cannot be the intended behaviour? Every change in font size, font type, etc. changes this required margin unpredictably. Just adding a fixed margin of +5 or so is not an option.

Looking closer at the last screenshot, it's remarkable that the text does not start at the origin. Might that be a hint to the culprit?

Has anyone else experienced this, or has an idea what's going on, or how to handle this?
BTW: The problem only concerns height. If the widths of text and box match, it works (no margin needed).

This is how I calculate font-height:

font_size = span.get("size")
font_asc = span.get("ascender")
font_desc = span.get("descender")
line_height = font_asc + (-font_desc)
font_height = line_height * font_size

Here is how box-height is determined:

def get_box_height(rect: fitz.Rect) -> float:
    return rect.y1 - rect.y0

bbox = fitz.Rect(span.get("bbox"))  # the span's "bbox" is a plain tuple
box_height = get_box_height(bbox)

Here is how the text is drawn. The embedded font is extracted from the PDF and reused as the font file:

chars_written = page.insert_textbox(
    replacement.box,     # FontBox: (87.50, 484.63, 209.10, 520.60)
    replacement.text,    # replaced
    color=color,         # (0,0,0,1)
    align=alignment,     # 1 (centered)
    fontsize=font.size,  # 22.0
    fontname=font.name,  # AvenirNext-Medium
    fontfile=font_path,  # /tmpfile/AvenirNext-Medium.cff
)

If the value is negative, it means the following:
If negative: no execution. The value returned is the space deficit to store text lines. Enlarge rectangle, decrease fontsize, decrease text amount, etc.

Is the replacement text much larger in terms of the number of characters than the previous text?

It happens with fewer characters too. I added some screenshots in the previous post. I think the character count is irrelevant here, because it is the height, not the width, that is causing the problem! Maybe a textbox isn't the correct tool for this case and alignment is better done by myself???

Needs some investigation by our expert @HaraldLieder I think.
In the meantime, what about using Story to place your text, e.g. Stories - PyMuPDF 1.26.3 documentation. My understanding is that the layout methods it uses are more sophisticated and you can sprinkle some basic HTML into the mix which might help.


As I aim for print and therefore need to work in CMYK, I am not sure whether HTML text is the right answer here…

Appreciate that - I don’t think using stories will convert your document color space or anything, but I get why you may not want to go there.

There is this convenience method too: Page - PyMuPDF 1.26.3 documentation and it should automatically scale the text content to fit whatever box you define I believe. Your text parameter can still just be a plain string so you don’t have to define HTML.

Fonts come with their own specific line height computed as (font.ascender - font.descender) * font-size. Both values may be missing or be specified with crazy values.
As a general rule, ascender must be positive, typical values are 0.8 to 1.3.
Descender must be negative with typical values ranging from -0.01 to -0.3.
The insert_textbox method starts writing text at distance font-size * ascender below the top of the textbox. Because we need enough space to also fit the font-size * descender part into the height of the textbox, you should make sure to choose the right (small enough) font size to ensure this.
As per your other comment: Do not expect that the replacement text will be as long as the original one - even when it is the same text and the same font size! Fonts have different character widths.

An additional comment:
There is no need to use insert_textbox just to profit from its text alignment capabilities, unless you want justified alignment. Left, centered and right alignment are easy to achieve with the more elementary insert_text, which has the additional advantage that the text always appears on the page: no hideous logic suppressing output if it won't fit in the textbox.

  1. create font = pymupdf.Font(fontfile="myfont.ttf").
  2. compute tlength = font.text_length(text, fontsize=fontsize).
  3. for centered alignment compute shift = (bbox.width - tlength)/2.
  4. for right alignment compute shift = bbox.width - tlength.
  5. for left obviously shift = 0.
  6. Insert text at insertion point bbox.bl + (shift, 0).

A special case is tlength > bbox.width. To fit the text into the width of the bbox nevertheless, we must squeeze it, and the scaling factor is bbox.width / tlength. We make a matrix = pymupdf.Matrix(factor, 1) and insert the text via insert_text(bbox.bl, text, morph=(bbox.bl, matrix), ...).

The same could be used also if tlength is smaller than bbox.width, in which case the text is shown stretched.
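The steps above can be sketched roughly as follows. This is only an illustration, not library code: the helper names alignment_shift, squeeze_factor and insert_aligned are made up for this post, and it assumes that the Font object used for measuring is the same font that is passed to insert_text via fontname/fontfile.

```python
def alignment_shift(box_width: float, text_length: float, align: str) -> float:
    """Horizontal offset from the box's left edge for the given alignment."""
    if align == "left":
        return 0.0
    if align == "center":
        return (box_width - text_length) / 2
    if align == "right":
        return box_width - text_length
    raise ValueError(f"unsupported alignment: {align!r}")


def squeeze_factor(box_width: float, text_length: float) -> float:
    """Horizontal scale that makes over-long text exactly fill the box width."""
    return box_width / text_length if text_length > box_width else 1.0


def insert_aligned(page, bbox, text, font, fontsize, align="center", **kwargs):
    """Insert one line of text with its baseline on the bottom edge of bbox.

    page, bbox and font are expected to be pymupdf Page, Rect and Font objects.
    Extra keyword arguments (fontname, fontfile, color, ...) are passed
    through to page.insert_text().
    """
    import pymupdf  # imported lazily so the pure helpers work without it

    tlength = font.text_length(text, fontsize=fontsize)
    factor = squeeze_factor(bbox.width, tlength)
    if factor < 1.0:
        # Text is too wide: start at the bottom-left corner and squeeze it.
        matrix = pymupdf.Matrix(factor, 1)
        return page.insert_text(bbox.bl, text, fontsize=fontsize,
                                morph=(bbox.bl, matrix), **kwargs)
    shift = alignment_shift(bbox.width, tlength, align)
    return page.insert_text(bbox.bl + (shift, 0), text,
                            fontsize=fontsize, **kwargs)
```

With the numbers from this thread (box width 209.10 - 87.50 = 121.6, text length 89.61), centered alignment gives a shift of about 16. Note that the baseline sits on the bottom edge of the bbox here, so descenders extend below it.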

@HaraldLieder thanks for the explanations. In the given example using AvenirNext-Medium at font size 22, the ascender is 1.199 and the descender is -0.436. So I would assume they are still roughly in the typical range, although the descender's magnitude is somewhat above 0.3.

Nevertheless it doesn’t explain, why the text doesn’t fit into the textbox. ONCE AGAIN IT IS NOT ABOUT TEXT-WIDTH, BUT TEXT-HEIGHT. Let’s do the calculations for the required height:

  • font-size: 22
  • ascender: 1.199
  • descender: 0.436 (using positive value for calculations)

According to your explanation, the box-height must be 35.97 at minimum:

  • (font-size * ascender) + (font-size * descender) = (22 * 1.199) + (22 * 0.436) = 35.97
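The same arithmetic as a small Python sketch, using the line-height formula quoted earlier in the thread. The function name required_height is made up for illustration, and the descender is passed with its real (negative) sign:

```python
def required_height(font_size: float, ascender: float, descender: float) -> float:
    """Line height = (ascender - descender) * font size (descender is negative)."""
    # The parentheses matter: ascender - descender * font_size would scale
    # only the descender and give a much smaller number.
    return (ascender - descender) * font_size


height = required_height(22, 1.199, -0.436)
print(round(height, 2))  # 35.97
```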

For the text height, we have the same calculation.

Let's create a Rect with these boundaries for the text "replace" and try to fit it in:

text_height = (ascender + descender) * font_size  # 35.97 (descender as positive value)
text_width = font_obj.text_length("replace", font_size)  # 89.61

rect = fitz.Rect(
    0,  # left border
    0,  # top border
    text_width,  # right border
    text_height,  # bottom border
)

The result of chars_written is -2.13, and the text is only displayed after adding 2.14 of additional height:

rect = fitz.Rect(
    0,  # left border
    0,  # top border
    text_width,  # right border
    text_height + 2.14,  # bottom border
)

For now I am reverting to insert_text() with the suggested self-calculated alignment, but there is no reasonable explanation for this behaviour yet…

I looked again into the code and saw that I missed a detail:
When checking whether the text will fit in the box, we add descender * fontsize to the bottom of the occupied area.
This once was necessary for some awkward fonts (with incorrect descender values) which would lap outside the textbox without this precaution.
So you should be successful if you always increase the y1 value by this amount and use
textbox + (0, 0, 0, descender * fontsize).
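A minimal sketch of that workaround on plain coordinate tuples. The helper name enlarge_for_descender is made up, and taking abs() of the descender is an assumption on my side so that the box always grows, whatever sign the descender value carries:

```python
def enlarge_for_descender(box, descender, fontsize):
    """Return box with its bottom edge (y1) moved down by |descender| * fontsize."""
    x0, y0, x1, y1 = box
    # Assumption: use the magnitude, so a negative descender still enlarges the box.
    return (x0, y0, x1, y1 + abs(descender) * fontsize)


box = (87.50, 484.63, 209.10, 520.60)            # textbox from the example above
padded = enlarge_for_descender(box, -0.436, 22)  # y1 grows by 0.436 * 22
```

The resulting tuple can be passed to insert_textbox() in place of the original rectangle.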

This textbox fitting is an ever-recurring nuisance. I am seriously considering auto-decreasing the font size until we have a fit …
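Until something like that exists in the library, a user-side loop could look roughly like this. It is only a sketch: the function name, the step of 0.5 and the minimum size of 4 are arbitrary choices, and it relies on the documented behaviour that a negative return value means nothing was drawn.

```python
def insert_textbox_shrinking(page, rect, text, fontsize,
                             min_fontsize=4.0, step=0.5, **kwargs):
    """Retry page.insert_textbox() with smaller font sizes until the text fits.

    Returns (used_fontsize, return_value), or (None, last_return_value) if the
    text did not fit even at min_fontsize.
    """
    rc = None
    size = fontsize
    while size >= min_fontsize:
        rc = page.insert_textbox(rect, text, fontsize=size, **kwargs)
        if rc >= 0:
            # Non-negative result: the text was written at this size.
            return size, rc
        size -= step  # negative result: nothing was drawn, try a smaller size
    return None, rc
```

Other keyword arguments (fontname, fontfile, align, color, ...) are passed through unchanged.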

Thanks @HaraldLieder for further investigation. This means 0.436 * 22 = 9.6 of additional height is added, which is much more than the 2.14 additional space needed to draw the text:

It would be nice to have a deterministic behaviour here (reproduce the 2.14 value) or the option to disable this additional space to do calculations on your own.