
Collating a collection of images into one PDF without holding them all in memory at once

Posted: 2018-10-15T17:15:50-07:00
by wbn
One of the things about `magick`/`convert` that I find most useful is the ability to transform a collection of images into a single PDF. However, I've found that for large collections of images, the utility tends to hang indefinitely. I tested it under a debugger and confirmed that yes, it is attempting to load all of the image data into memory before writing it out, making it infeasible to collate arbitrarily large collections of images.

Is it possible to write the PDF without holding all the image data in memory at any given time? I poked the source and found `PingImage`/`PingImages`, which looks like it should be useful for this purpose (since it doesn't load the image data, only metadata and a reference to the on-disk data). However, when I test the utility with the `-ping` flag it doesn't write the image data out to the resulting PDF, only the dimensions.

I understand that in many situations - for instance, when you need to perform multiple transformations on a number of images before writing them - it's much more efficient to hold all the image data in memory rather than reading and writing it to disk multiple times. However, I'm wondering if it's possible to ask the utility (or the API) to optimize for memory efficiency in this case.

Cheers.
-wbn

Re: Collating a collection of images into one PDF without holding them all in memory at once

Posted: 2018-10-15T17:59:38-07:00
by fmw42
I believe you would have to write a script that loops over the input images and adds each one to your PDF, one at a time. You could do it like this:

convert firstinputimage image.pdf
for each inputimage besides the first:
    convert image.pdf inputimage image.pdf

Each pass appends the next image to the end of the PDF.
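The loop above could be sketched as a small shell function. This is only an illustration: `append_to_pdf` is a made-up helper name, and writing to a temporary file before replacing the output is my own assumption (to avoid reading and writing the same file in one `convert` call), not something suggested in the thread.

```shell
#!/bin/sh
# Sketch of the append-one-image-at-a-time loop.
# append_to_pdf is a hypothetical helper name, not part of ImageMagick.
# Usage: append_to_pdf output.pdf first.png second.png ...
append_to_pdf() (
    out=$1; shift
    first=$1; shift

    # Start the PDF from the first image.
    convert "$first" "$out" || exit 1

    # Append each remaining image. The temporary file is an assumption:
    # it keeps convert from reading and writing the same path at once.
    tmp="${out%.pdf}.tmp.pdf"
    for img in "$@"; do
        convert "$out" "$img" "$tmp" && mv "$tmp" "$out" || exit 1
    done
)
```

One caveat worth checking before relying on this: each pass re-reads the growing PDF, and ImageMagick rasterizes PDF input through its Ghostscript delegate, so the later passes may still need a lot of memory and the earlier pages may lose quality.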

Re: Collating a collection of images into one PDF without holding them all in memory at once

Posted: 2018-10-16T04:30:05-07:00
by snibgo
I would convert each image to its own PDF, then use "pdfunite" to unite the PDFs into one.
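A rough sketch of that approach, assuming `convert` and Poppler's `pdfunite` are both on the PATH. The function name `unite_images` and the zero-padded `page-NNNNNN.pdf` naming are illustrative assumptions; the padding is only there so the shell glob returns the pages in input order.

```shell
#!/bin/sh
# Sketch: one single-page PDF per image, then merge with pdfunite.
# unite_images is a hypothetical helper name.
# Usage: unite_images output.pdf img1 img2 ...
unite_images() (
    out=$1; shift
    [ "$#" -ge 1 ] || exit 1
    workdir=$(mktemp -d) || exit 1

    n=0
    for img in "$@"; do
        n=$((n + 1))
        # Zero-padded names keep glob order equal to input order.
        page=$(printf '%s/page-%06d.pdf' "$workdir" "$n")
        # Only one image is handed to convert at a time.
        convert "$img" "$page" || { rm -rf "$workdir"; exit 1; }
    done

    # pdfunite takes the source PDFs first and the destination last.
    pdfunite "$workdir"/page-*.pdf "$out"
    status=$?
    rm -rf "$workdir"
    exit "$status"
)
```

Called as `unite_images album.pdf *.png`, this never passes more than one image to a single `convert` invocation. How much memory `pdfunite` itself needs for a very large merge is something I haven't measured.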