Thought I’d open up a discussion on techniques for adding watermarks. There are text-based and image-based watermarks; I generally use something like this to add an image watermark to each page of a document:
import pymupdf


def add_image_watermark(input_pdf, output_pdf, watermark_image):
    doc = pymupdf.open(input_pdf)
    # Create a pixmap from the image
    pixmap = pymupdf.Pixmap(watermark_image)
    for page_num in range(doc.page_count):
        page = doc[page_num]
        page_rect = page.rect
        # Calculate scaling to fit image appropriately
        scale_x = page_rect.width * 0.3 / pixmap.width  # 30% of page width
        scale_y = page_rect.height * 0.3 / pixmap.height  # 30% of page height
        scale = min(scale_x, scale_y)  # Maintain aspect ratio
        # Calculate position (center of page)
        img_width = pixmap.width * scale
        img_height = pixmap.height * scale
        x = (page_rect.width - img_width) / 2
        y = (page_rect.height - img_height) / 2
        # Define the rectangle where image will be placed
        target_rect = pymupdf.Rect(x, y, x + img_width, y + img_height)
        # Insert the pixmap image at the back of the page
        page.insert_image(target_rect, pixmap=pixmap, overlay=False)
    doc.save(output_pdf)
    doc.close()


# Usage
add_image_watermark("test.pdf", "logo_watermarked.pdf", "logo.png")
This works fine, but what if I want to control the image’s opacity, or rotate it? I’m sure this can be improved …
When inserting an image on a page, you can supply it as a Pixmap (as you did), as a filename, or as an in-memory image (i.e. a bytes object).
The insertion method insert_image() returns the object number (“xref”) it has used for the image.
For watermarks specifically, the image is the same on all pages. For maximum speed, the insertion method accepts “xref=xref” as a parameter. A positive xref value overrides everything else (Pixmap, filename, bytes object). Therefore, the following loop is the fastest way to put the same image (company logo, watermark, …) on every page:
obj_num = 0
for page in doc:
    obj_num = page.insert_image(
        page.rect,
        filename="image-file.png",
        xref=obj_num,
    )
When processing the first page, xref is 0, so the image is inserted into the PDF. For subsequent pages, obj_num is positive, so only a reference to the already existing image is added, which is extremely fast.
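The loop above can be wrapped in a small helper. This is only a sketch: the function name stamp_all_pages and the image_path argument are made up for illustration, and doc is assumed to be an already opened pymupdf.Document. It uses only the filename, xref and overlay parameters of insert_image() mentioned in this thread.

```python
def stamp_all_pages(doc, image_path, overlay=True):
    """Place the same image on every page, embedding the image data only once.

    `doc` is assumed to be an open pymupdf.Document and `image_path` a
    (hypothetical) path to the watermark image.
    Returns the xref of the shared image, or 0 if the document has no pages.
    """
    xref = 0  # 0 means "no image stored in the PDF yet"
    for page in doc:
        # First page: xref is 0, so the file is read and embedded.
        # Later pages: xref is positive, so only a reference is added.
        xref = page.insert_image(
            page.rect,
            filename=image_path,
            xref=xref,
            overlay=overlay,
        )
    return xref
```

Usage would look something like: doc = pymupdf.open("test.pdf"), then stamp_all_pages(doc, "logo.png"), then doc.save("logo_watermarked.pdf").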
For image positioning on the page, by default the method will determine the maximum possible size that fits in the specified rectangle. This means that the displayed image’s bounding box will match the rectangle’s width, its height, or both. In any case, the image is centered: the center points of the rectangle and the image bbox will always be the same.
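To make the default placement concrete, here is a small pure-Python illustration of the rule just described. It is not PyMuPDF's internal code; the helper name fitted_bbox and the plain (x0, y0, x1, y1) tuples are assumptions chosen for this sketch.

```python
def fitted_bbox(rect, img_width, img_height):
    """Return the bbox of an image scaled to fit `rect`, keeping its aspect
    ratio and sharing the rectangle's center point.

    `rect` is an (x0, y0, x1, y1) tuple; the image size is in pixels.
    """
    x0, y0, x1, y1 = rect
    rect_w = x1 - x0
    rect_h = y1 - y0
    # The largest scale at which both dimensions still fit in the rectangle
    scale = min(rect_w / img_width, rect_h / img_height)
    w = img_width * scale
    h = img_height * scale
    # Center the scaled image on the rectangle's center
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A square 50x50 image in a 200x100 rectangle fills the full height
# and is centered horizontally.
print(fitted_bbox((0, 0, 200, 100), 50, 50))  # (50.0, 0.0, 150.0, 100.0)
```

So to cover only part of the page, as in the first post, it should be enough to pass a smaller rectangle and let the method do the scaling and centering.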
Currently, image insertion via the Page method insert_image() supports rotation by integer multiples of 90° only.
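Within that restriction, a rotated insertion could be sketched as follows. The helper names normalize_rotation and insert_rotated_image are hypothetical, page is assumed to be a pymupdf.Page, and the rotate parameter of insert_image() is assumed to take the angle in degrees; the up-front check just makes the 90° rule explicit.

```python
def normalize_rotation(angle):
    """Return `angle` reduced to 0, 90, 180 or 270.

    Raises ValueError if `angle` is not an integer multiple of 90 degrees.
    """
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90 degrees, got {angle}")
    return int(angle) % 360


def insert_rotated_image(page, rect, image_path, angle):
    """Insert the image file into `rect` on `page`, rotated by `angle` degrees."""
    return page.insert_image(
        rect,
        filename=image_path,
        rotate=normalize_rotation(angle),
    )
```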
Transparent images are supported, but transparency cannot be influenced or created: non-transparent images will hide the content underneath. The only way out is putting images in the background, which is not a good solution, particularly for watermarks.
We are considering removing these restrictions and supporting both arbitrary rotation angles and adding transparency (without modifying the image itself).
Aha! Yes, makes sense to optimize like this - thanks!