A pdfwrite device for MuPDF.js?

The colorspace options for mutool recolor are rather limited (gray, rgb, cmyk). To do my own recoloring, I tried to write a JavaScript script for use with mutool run. The script should read a PDF file, apply some changes (recoloring) to each page, and write the modified PDF file.

Creating a script that copies the objects on each page into a new PDF document was not too difficult. My next step, I thought, would be to use Page.prototype.run(device, transform) with my own Device object that does the recoloring (or anything else I need). This means writing several callback functions like fillText(text, ctm, colorspace, color, alpha).

This is where I get stuck. I can change the color or colorspace in fillText, but then what am I supposed to do? I think I want to call the fillText callback of a device that creates PDF, like the pdfwrite device in ghostscript. However, such a device does not exist in MuPDF, it seems.

My question is, what is the best way to create a pdfwrite device, and is that even possible?

I did try to do what I want in ghostscript, but I have a PDF document with complicated figures that ghostscript does not copy correctly. Also, having a pdfwrite device for MuPDF.js would open up many possibilities for PDF transformations in javascript.

Agreed, a PDF write device in MuPDF.js would be nice to have. I can see various ways of creating a PDF and adding content here: Advanced - MuPDF 1.26.3
But this seems to be fairly low level. DrawDevice is okay, but I think it just creates images, not PDF. I don’t know whether the Trace Device section on that same page is helpful to you.

I looked at the Draw and Trace devices, and came to the same conclusions.
Making a pdfwrite device based on the Trace device seems possible, but requires more in-depth knowledge of PDF than I have at the moment.

Yes. There is this insertText method for MuPDF.js, where you can see how much in-depth knowledge is required for defining resources, populating content streams, and so on: Deprecated Modules - MuPDF.js documentation


@nverwer I have just been made aware of DocumentWriter in MuPDF.js. You can do this kind of thing:



import * as mupdf from "mupdf"

let buffer = new mupdf.Buffer()
let writer = new mupdf.DocumentWriter(buffer, "pdf", "compress")

let doc = new mupdf.PDFDocument()

let pageObj = doc.addPage([0,0,300,350], 0, "", "");

// Insert page object at the end of the document.
doc.insertPage(-1, pageObj)

for (let i = 0; i < doc.countPages(); ++i) {
    let page = doc.loadPage(i)
    let device = writer.beginPage(page.getBounds())

    let text = new mupdf.Text()

    text.showString(new mupdf.Font("Times-Roman"), [ 16, 0, 0, -16, 100, 30 ], "Hello, world!")
    device.fillText(text, mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, [0, 0, 0], 1)

    page.run(device, mupdf.Matrix.identity)
    writer.endPage()
}
writer.close()
buffer.save("output.pdf")

@Jamie_Lemon Ah, I see that DocumentWriter.prototype.beginPage returns a Device. It looks like that is exactly what I needed. Tomorrow I will give it a try. If it works, I will make an example and a merge request for MuPDF.js.
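
The idea in the post above (change the color in a callback, then hand the call on to the device returned by beginPage) could be sketched roughly as follows. This is an untested illustration, not part of MuPDF: makeRecolorDevice and mapColor are hypothetical names, and the argument positions are taken from the Device callback signatures as I understand them from the documentation (fillText is the one quoted earlier in this thread). It is written in 'classic' JavaScript so that it does not depend on let.

```javascript
// Sketch only: wrap a target device (for example the one returned by
// DocumentWriter.prototype.beginPage) and rewrite colors before forwarding.
// makeRecolorDevice and mapColor are hypothetical names, not MuPDF API.
function makeRecolorDevice(target, mapColor) {
    // Index of the colorspace argument in each color-carrying callback.
    // The color array is assumed to follow directly after the colorspace.
    var colorArgIndex = {
        fillPath: 3,    // (path, evenOdd, ctm, colorspace, color, alpha)
        strokePath: 3,  // (path, stroke, ctm, colorspace, color, alpha)
        fillText: 2,    // (text, ctm, colorspace, color, alpha)
        strokeText: 3   // (text, stroke, ctm, colorspace, color, alpha)
    };
    var wrapper = {};
    Object.keys(colorArgIndex).forEach(function (name) {
        var i = colorArgIndex[name];
        wrapper[name] = function () {
            var args = Array.prototype.slice.call(arguments);
            // mapColor returns { colorspace: ..., color: [...] }.
            var mapped = mapColor(args[i], args[i + 1]);
            args[i] = mapped.colorspace;
            args[i + 1] = mapped.color;
            return target[name].apply(target, args);
        };
    });
    return wrapper;
}
```

The wrapper would then be passed to Page.prototype.run in place of the writer's device. Note that this sketch forwards only four callbacks; every other callback (clipping, images, shadings, groups, and so on) would also have to be forwarded unchanged, otherwise that content is lost. Depending on the MuPDF.js version, the callbacks object may first need to be wrapped in a Device.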

Fantastic!

@Jamie_Lemon
It turned out that using the DocumentWriter was not as easy as I had hoped.

My version of mutool run does not support modern javascript, so things like let are not allowed. Fortunately, I am old enough to remember ‘classic’ javascript.

It also seems that doc = new mupdf.PDFDocument() and everything that uses doc is not necessary when using the DocumentWriter.

I first tried to just copy a document using DocumentWriter, using this code:

import * as mupdf from "mupdf";

var srcDoc = mupdf.Document.openDocument(scriptArgs[0]);
var dstDocName = scriptArgs[1];
var buffer = new mupdf.Buffer()
var writer = new mupdf.DocumentWriter(buffer, "pdf", "compress")

for (var page = 0; page < srcDoc.countPages(); page++) {
    var srcPage = srcDoc.loadPage(page);
    var pageRect = srcPage.getBounds();
    var device = writer.beginPage(pageRect);
    srcPage.run(device, mupdf.Matrix.identity);
    writer.endPage();
}

writer.close();
buffer.save(dstDocName);

This does indeed copy the source document. However, running this gives a warning:

warning: the pdf device does not support image masks; output may be incomplete

Indeed, the output document is incomplete. Some of the figures are mutilated.
This means that I now have the same problem I had with ghostscript (figures are not copied correctly), except that I have no recoloring yet.

It seems I am now back where I started.
Of course, mutool recolor almost worked, and my next step could be to program my modified recolor function in C (I need an option like ghostscript’s UseFastColor). However, the last time I programmed in C was more than 20 years ago, so that would be quite an adventure.

The MuPDF pdf write device (as used by the Document Writer) does have limitations; not supporting image masks is one of them.

If all you want to do is recolor a PDF, then working at the recolor level is absolutely a better thing to do. The structure of the PDF remains intact, rather than being thrown away as with a Document Writer.

At the C level, the underlying recolor filter is capable of doing the rewrite to pretty much any colorspace that MuPDF understands. It’s just only hooked up for grey/rgb/cmyk at the moment, because that’s all we needed.

If you can tell us more about your exact desired use case, we can consider improving the functionality.

Thank you, @Robin_Watts , for your response.

The mutool recolor works well for the use case of my client, except for one thing:
Some of the articles that we need to convert to grayscale (as part of a larger process) contain black text, specified as CMYK = (0, 0, 0, 100). When converted using mutool recolor -c gray, this text becomes a very dark shade of gray (RGB = (24, 24, 24), or something similar). That is because the underlying colorspace does not treat CMYK = (0, 0, 0, 100) as the darkest possible black.

We want the black text to become black, which means mapping CMYK = (0, 0, 0, 100) to RGB = (0, 0, 0).
In ghostscript we do this using the parameters -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dUseFastColor. However, ghostscript (with -sDEVICE=pdfwrite) does not support image masks, and cannot copy some of the pictures we need to convert.
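
To make the desired mapping concrete, here is a minimal sketch of a naive, non-ICC conversion that sends pure K black to pure black. It is only an illustration of the arithmetic, not the code used by ghostscript or MuPDF: the function names are hypothetical, components are assumed to be in the range 0..1 (so CMYK = (0, 0, 0, 100) becomes (0, 0, 0, 1)), and the gray weights are the common 0.3/0.59/0.11 luminance approximation.

```javascript
// Sketch only: naive "one minus" CMYK -> RGB conversion without ICC profiles.
// All components are assumed to be in the range 0..1.
function clamp01(x) {
    return Math.min(1, Math.max(0, x));
}

// Hypothetical helper: each RGB channel is the complement of (ink + black).
function cmykToRgbNaive(c, m, y, k) {
    return [
        1 - clamp01(c + k),
        1 - clamp01(m + k),
        1 - clamp01(y + k)
    ];
}

// Hypothetical helper: weighted sum of the RGB channels as a gray level.
function rgbToGrayNaive(rgb) {
    return 0.3 * rgb[0] + 0.59 * rgb[1] + 0.11 * rgb[2];
}

// Pure K black, CMYK = (0, 0, 0, 1), maps to RGB = (0, 0, 0) and gray = 0.
var blackRgb = cmykToRgbNaive(0, 0, 0, 1);
var blackGray = rgbToGrayNaive(blackRgb);
```

With a conversion like this, black text stays black, at the cost of colors that are not color-managed.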

Ghostscript preserves black text with the -dUseFastColor parameter. From the documentation:

In certain cases, it may be desired to not perform ICC color management on DeviceGray, DeviceRGB and DeviceCMYK source colors. This can occur in particular if one is attempting to create an ICC profile for a target device and needed to print pure colorants. In this case, one may want instead to use the traditional Postscript 255 minus operations to convert between RGB and CMYK with black generation and undercolor removal mappings.

If I were to modify the C code, this is what I would try to do for mutool recolor. The code for mutool recolor does not look very complicated, but my C skills are quite rusty.

If you would consider adding something like UseFastColor to mutool recolor, that would be awesome!
Please let me know if I can help. I might be able to provide a sample PDF, but my client probably will not allow me to send the whole book that needs to be converted. (It is an academic textbook on translation and interpreting.)