Converting PDF to image results in hidden differences

discretiongrove
Posts: 13
Joined: 2013-02-19T01:20:09-07:00

Post by discretiongrove »

Hi there. I'm new to ImageMagick and have been using it to convert PDFs to JPGs so I can compare them with the Python Imaging Library (PIL). However, when I run the comparison, two supposedly identical images don't match. PIL has an option to show the difference as white on a black image, but what appears is an entirely black image, which should mean there is no difference. Then I tried Beyond Compare and, lo and behold, some sections show small discrepancies when I use Picture Compare with Tolerance Mode on.

My question: when ImageMagick converts a PDF file into an image, shouldn't white be pure white, with nothing hidden behind it? How can I convert the PDFs so that the parts hidden in white are rendered as pure white?

If you like, I can post sample images to show what I'm talking about.
I'm not really Stephen Malkmus.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Post by fmw42 »

Yes, it would be best to give us an example. Please note that IM uses Ghostscript to handle PDF conversions to other formats, so be sure you have the latest Ghostscript installed. Note also that if your image is CMYK or has transparency, that could be a factor. It would be best to have one of your PDF files to examine.

Does the PDF have an embedded image, or is it a true vector PDF? That may also have some effect, since the format of the embedded image may be a factor.

You should post your PDF to a free hosting service such as Dropbox and then put a link to it in your next post here.

Post by discretiongrove »

The installed Ghostscript seems to be 9.05. How can one tell whether an image is indeed CMYK? I tried saving the images in different file formats such as BMP, PNG and GIF; with some of them the discrepancies seem to be reduced, but they never go away completely.

The PDF does have an embedded image, but that part doesn't get flagged as a discrepancy. It seems that some elements in the PDF, such as text, have an invisible coating wrapped around each letter. How would one know whether a PDF is a true vector one?

I'm very sorry, kind sir, but I can't upload PDF samples because they contain highly confidential text.

Here are images taken from Beyond Compare:

Image 1 (not preserved):
These are three images: the first and third are being compared, and the center image shows the difference. The blue is the supposed difference. Notice that the two images are identical, and that the differences are located inside the black box. The images show a colon and a black box. The box is drawn using PIL.

Image 2 (not preserved):
This image shows an underline and a black box beneath it. You can see that there are two differences on the whites and three on the blacks.

These images are supposed to be identical, and yet Beyond Compare's Tolerance Mode shows that they are not. In Binary Mode they appear to be identical. The problem is that I have no idea how to compare that way using PIL or ImageMagick.

I would like to add that I've tried the simple tutorials for comparing images with ImageMagick, but the images still aren't identical no matter which method I use.

Post by fmw42 »

I have no idea what Beyond Compare is doing. Can you not create or get some non-proprietary PDF that shows this issue?

Note that PNG now has a rendering intent, and that could cause differences with other formats. GIF has limited colors. JPG is lossy. So it is hard to say what differences you might see between formats.

IM has its own compare function. So perhaps you should check that out. See

http://www.imagemagick.org/Usage/compare/
http://www.imagemagick.org/Usage/compare/#statistics

To see if an image is CMYK, just look at the verbose information of that file.

identify -verbose yourimage

The colorspace will say whether the image is CMYK. But if you have some image embedded in the PDF, I cannot say for sure that IM will report the colorspace of the embedded image as opposed to that of the PDF itself. You can look at the verbose information and see if there is other information particular to the embedded image, such as profiles.
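For scripted checks, the relevant fields can be pulled out of the verbose text. The following is a minimal Python sketch, assuming the output keeps the two-space-indented "Key: value" layout seen in the listings in this thread. The helper name parse_identify_fields is hypothetical, and it only parses text; running identify -verbose and capturing its output is left to the caller.

```python
def parse_identify_fields(verbose_text, wanted=("Colorspace", "Type", "Format")):
    """Return selected top-level fields from `identify -verbose` output.

    Hypothetical helper: assumes top-level fields are indented by exactly
    two spaces and written as `Key: value`.
    """
    fields = {}
    for line in verbose_text.splitlines():
        # Top-level fields have exactly two leading spaces; deeper
        # indentation belongs to nested sections such as channel statistics.
        if not line.startswith("  ") or line.startswith("   "):
            continue
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            fields[key] = value.strip()
    return fields


sample = """Image: 1A.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Class: DirectClass
  Type: TrueColor
  Colorspace: sRGB
  Channel statistics:
    Red:
      min: 0 (0)
"""
info = parse_identify_fields(sample)
print(info["Colorspace"])
```

A regression script could then flag any page whose Colorspace field is not the expected one before comparing pixels.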

Profiles may also be an issue if there are any.

As a start, get the verbose info and report it back here.

What version of IM are you using, and on what platform?

Post by discretiongrove »

I tried comparing the two images using ImageMagick and specifying a difference image output. Here are the results, which are the same as Beyond Compare's:

[Two difference images, not preserved]

Here's the verbose information for both images being compared. Neither reports CMYK:
Image 1

Image: 1A.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Units: Undefined
  Type: TrueColor
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 255 (1)
      mean: 218.127 (0.855401)
      standard deviation: 87.3755 (0.342649)
      kurtosis: 2.21227
      skewness: -2.04777
    Green:
      min: 0 (0)
      max: 255 (1)
      mean: 218.145 (0.855472)
      standard deviation: 87.3578 (0.342579)
      kurtosis: 2.21479
      skewness: -2.04827
    Blue:
      min: 0 (0)
      max: 255 (1)
      mean: 218.024 (0.854996)
      standard deviation: 87.3708 (0.342631)
      kurtosis: 2.19984
      skewness: -2.04355
  Image statistics:
    Overall:
      min: 0 (0)
      max: 255 (1)
      mean: 218.099 (0.85529)
      standard deviation: 87.368 (0.34262)
      kurtosis: 2.20896
      skewness: -2.04653
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgb(223,223,223)
  Matte color: grey74
  Transparent color: black
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 75
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T13:15:32+08:00
    date:modify: 2013-02-21T12:57:09+08:00
    jpeg:colorspace: 2
    jpeg:sampling-factor: 2x2,1x1,1x1
    signature: a834a61bf31084219c8623c0b71f8490d250846a9ef6256b8c2ccd8872667e90
  Artifacts:
    filename: 1A.jpg
    verbose: true
  Tainted: False
  Filesize: 120KB
  Number pixels: 485K
  Pixels per second: 34.62MB
  User time: 0.016u
  Elapsed time: 0:01.013
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
Image 2

Image: 1B.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Units: Undefined
  Type: TrueColor
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 255 (1)
      mean: 218.127 (0.855401)
      standard deviation: 87.3754 (0.342649)
      kurtosis: 2.21226
      skewness: -2.04777
    Green:
      min: 0 (0)
      max: 255 (1)
      mean: 218.145 (0.855472)
      standard deviation: 87.3577 (0.342579)
      kurtosis: 2.21479
      skewness: -2.04827
    Blue:
      min: 0 (0)
      max: 255 (1)
      mean: 218.024 (0.854996)
      standard deviation: 87.3707 (0.34263)
      kurtosis: 2.19984
      skewness: -2.04355
  Image statistics:
    Overall:
      min: 0 (0)
      max: 255 (1)
      mean: 218.099 (0.85529)
      standard deviation: 87.3679 (0.342619)
      kurtosis: 2.20896
      skewness: -2.04653
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgb(223,223,223)
  Matte color: grey74
  Transparent color: black
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 75
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T13:15:32+08:00
    date:modify: 2013-02-21T12:57:09+08:00
    jpeg:colorspace: 2
    jpeg:sampling-factor: 2x2,1x1,1x1
    signature: b11f55a5997d769ee240141c41f2e290de8571831c1f36d2a2173e587d95319c
  Artifacts:
    filename: 1B.jpg
    verbose: true
  Tainted: False
  Filesize: 120KB
  Number pixels: 485K
  Pixels per second: 34.62MB
  User time: 0.016u
  Elapsed time: 0:01.013
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
The version of ImageMagick seems to be the latest according to the verbose information, and I'm using Windows 7 to do the conversion. I'm having a hard time finding a way to replicate the PDFs. I'm thinking of creating a Word document and converting it to PDF with some program. Any suggestions on how I could replicate the PDF? PrimoPDF, PDFCreator and CutePDF are not available to me.

Post by discretiongrove »

Hey there, this is quite an update. I don't know how but it seems that if I use ImageMagick to compare and do this:

compare -metric AE -fuzz 5% 1A.jpg 1B.jpg output.jpg
The value returned is 0 meaning there are no differences! :)

Post by fmw42 »

discretiongrove wrote:Hey there, this is quite an update. I don't know how but it seems that if I use ImageMagick to compare and do this:

compare -metric AE -fuzz 5% 1A.jpg 1B.jpg output.jpg
The value returned is 0 meaning there are no differences! :)
That is probably due to the -fuzz 5%, which tells compare not to count pixels as different if their colors are within 5% of each other.
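To make the effect of a fuzz threshold concrete, here is a small Python sketch. It is a simplified model and an assumption on my part, not ImageMagick's exact formula: it treats two pixels as equal when every channel differs by no more than the given percentage of the full 0-255 range. The function name within_fuzz is hypothetical.

```python
def within_fuzz(pixel_a, pixel_b, fuzz_percent, max_value=255):
    """Simplified model of a fuzz threshold (an assumption, not ImageMagick's
    exact formula): two pixels match when every channel differs by no more
    than fuzz_percent of the full channel range."""
    limit = max_value * fuzz_percent / 100.0
    return all(abs(a - b) <= limit for a, b in zip(pixel_a, pixel_b))


# A faint compression artifact: near-white versus pure white.
print(within_fuzz((250, 250, 250), (255, 255, 255), 5))
# A stray black dot on a white page.
print(within_fuzz((0, 0, 0), (255, 255, 255), 5))
```

Under this model a faint artifact is ignored, while a black mark on white is a full-range difference and is still counted. ImageMagick's own color-distance computation combines the channels and may differ in detail, and lightly antialiased edge pixels could still fall under the threshold.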

But there is perhaps a misunderstanding. I am not sure what you are comparing. Are you converting the same PDF to JPG with two different applications? Or are you converting two different PDF files? If two applications write the JPGs, they can compress quite differently or use different compression code.

Try converting to PNG or TIF so that you do not get any lossy-compression differences.

Also, I was asking for the verbose information from the PDF files, not the JPG files. I want to see what you are starting with.

Please clarify the process you are using and what the starting image or images are.

Post by discretiongrove »

I'm sorry if I wasn't clear. Now that you mention it, the fuzz factor could affect my comparisons. Would a fuzz factor of 5% then fail to flag a stray dot or comma as a difference?

Here's the verbose information for the PDFs that I got using ImageMagick:
PDF1

Image: 1.pdf
  Format: PDF (Portable Document Format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Type: TrueColorAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
    alpha: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 65535 (1)
      mean: 57546.6 (0.878105)
      standard deviation: 21415.6 (0.326781)
      kurtosis: 3.35258
      skewness: -2.31303
    Green:
      min: 0 (0)
      max: 65535 (1)
      mean: 57548.9 (0.87814)
      standard deviation: 21401.1 (0.32656)
      kurtosis: 3.358
      skewness: -2.31375
    Blue:
      min: 0 (0)
      max: 65535 (1)
      mean: 57524.3 (0.877765)
      standard deviation: 21415.5 (0.32678)
      kurtosis: 3.33692
      skewness: -2.30845
    Alpha:
      min: 0 (0)
      max: 65535 (1)
      mean: 4667.68 (0.0712242)
      standard deviation: 13904.9 (0.212175)
      kurtosis: 8.74319
      skewness: -3.10793
  Image statistics:
    Overall:
      min: 0 (0)
      max: 65535 (1)
      mean: 58371.8 (0.890697)
      standard deviation: 19802.8 (0.302171)
      kurtosis: 4.4746
      skewness: -2.5238
  Alpha: srgba(255,255,255,0)   #FFFFFFFFFFFF0000
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Scene: 0 of 6
  Compression: Undefined
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T14:02:25+08:00
    date:modify: 2013-02-21T14:02:25+08:00
    pdf:HiResBoundingBox: 612x792+0+0
    pdf:Version: PDF-1.2 
    signature: 7238be2ec5fa3f2ef83a75db908770232293229cfa1efc1461663070e81a28ff
  Profiles:
    Profile-icc: 2576 bytes
      Description: Artifex Software sRGB ICC Profile
      Manufacturer: Artifex Software sRGB ICC Profile
      Model: Artifex Software sRGB ICC Profile
      Copyright: Copyright Artifex Software 2011
  Artifacts:
    filename: 1.pdf
    verbose: true
  Tainted: False
  Filesize: 66.5KB
  Number pixels: 485K
  Pixels per second: 4.53MB
  User time: 0.109u
  Elapsed time: 0:01.107
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
PDF2

Image: 2.pdf
  Format: PDF (Portable Document Format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Type: TrueColorAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16/8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
    alpha: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 65535 (1)
      mean: 57543.3 (0.878054)
      standard deviation: 21419.5 (0.326841)
      kurtosis: 3.34917
      skewness: -2.3123
    Green:
      min: 0 (0)
      max: 65535 (1)
      mean: 57545.5 (0.878089)
      standard deviation: 21405 (0.326619)
      kurtosis: 3.35459
      skewness: -2.31301
    Blue:
      min: 0 (0)
      max: 65535 (1)
      mean: 57521 (0.877714)
      standard deviation: 21419.4 (0.326839)
      kurtosis: 3.33352
      skewness: -2.30772
    Alpha:
      min: 0 (0)
      max: 65535 (1)
      mean: 4668.28 (0.0712334)
      standard deviation: 13905.2 (0.21218)
      kurtosis: 8.74354
      skewness: -3.10795
  Image statistics:
    Overall:
      min: 0 (0)
      max: 65535 (1)
      mean: 58369.1 (0.890656)
      standard deviation: 19806 (0.302221)
      kurtosis: 4.47138
      skewness: -2.52318
  Alpha: srgba(255,255,255,0)   #FFFFFFFFFFFF0000
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Scene: 0 of 6
  Compression: Undefined
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T14:02:34+08:00
    date:modify: 2013-02-21T14:02:34+08:00
    pdf:HiResBoundingBox: 612x792+0+0
    pdf:Version: PDF-1.2 
    signature: 9c96c4a131515698a79a0bf90c594ebbabaab89767a16980ee4e156b3000956b
  Profiles:
    Profile-icc: 2576 bytes
      Description: Artifex Software sRGB ICC Profile
      Manufacturer: Artifex Software sRGB ICC Profile
      Model: Artifex Software sRGB ICC Profile
      Copyright: Copyright Artifex Software 2011
  Artifacts:
    filename: 2.pdf
    verbose: true
  Tainted: False
  Filesize: 66.5KB
  Number pixels: 485K
  Pixels per second: 4.039MB
  User time: 0.109u
  Elapsed time: 0:01.119
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
A little background on what I'm doing: I'm creating an automated regression testing suite that involves PDFs. I generate PDFs, and once they're approved as error-free I store them as the baseline for checking whether anything has changed in the next batch of generated PDFs. The same system generates all of these PDFs.

The automated PDF testing goes like this:
1. I take two PDFs: the error-free PDF (PDF A) and the newly generated PDF (PDF B).
2. I convert them both to JPEGs using ImageMagick. This produces two folders, each holding the pages of one PDF.
3. Using the Python Imaging Library (PIL), I draw black or white boxes over some parts of the page images from both PDFs to hide the dynamically changing elements, because they shouldn't be compared.
4. I compare the pages using PIL. I'm using Python because it takes care of everything: getting the PDFs, invoking the ImageMagick command to convert them, comparing the images, counting the matching and non-matching pages, and generating a report at the end.
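One alternative to painting boxes in step 3 is to skip the dynamic regions during the comparison in step 4, so the pixel data is never modified or re-encoded. The sketch below is a pure-Python illustration under that assumption, not the method actually used in this suite. Images are plain lists of pixel rows (which an imaging library would normally supply), and the name count_differences is hypothetical.

```python
def count_differences(image_a, image_b, ignore_boxes=()):
    """Count pixels that differ between two equal-sized images.

    Images are lists of rows of pixel values. Each ignore box is
    (left, top, right, bottom) with inclusive coordinates; pixels inside
    any box are skipped.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same height")
    differing = 0
    for y, (row_a, row_b) in enumerate(zip(image_a, image_b)):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            ignored = any(left <= x <= right and top <= y <= bottom
                          for (left, top, right, bottom) in ignore_boxes)
            if not ignored and pixel_a != pixel_b:
                differing += 1
    return differing


page_a = [[255, 255, 255, 255],
          [255,   0,   0, 255],
          [255, 255, 255, 255]]
page_b = [[255, 255, 255, 255],
          [255,   0, 128, 255],
          [255, 255, 255, 254]]

print(count_differences(page_a, page_b))
print(count_differences(page_a, page_b, [(1, 1, 2, 1)]))
```

The first call counts every differing pixel; the second skips the masked box, so only the difference outside it remains. A per-page count like this can feed the matching and non-matching totals in the final report.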

I did some research, and it seems there's no library yet that compares PDFs while ignoring regions. That's why I'm doing it with images, since there are lots of image comparison applications now. DiffPDF actually compares PDFs and even shows the differing regions, but it doesn't have a developer library available.

Post by discretiongrove »

Update:

I tried converting to PNG and TIF, and here are the results:

PNG - converting to this format and comparing with the ImageMagick command I specified above resulted in a value of 64. It is also worth noting that the difference output is the same as for the JPG version.
TIF - converting and comparing using this format did reduce the difference, but the first and last sets of horizontal lines still show up as discrepancies. See the second image in the post with the ImageMagick outputs. The comparison value is now 32.

I think I found a way to create new PDFs and replicate the error. I'll post the PDFs once I have them.

Post by discretiongrove »

Hey there, I've finally reproduced the errors. I now think I'm correct in assuming that there are phantom colors surrounding the text that are invisible to the naked eye.

Here are the PDFs:
http://totaldrench.files.wordpress.com/2013/02/pdf1.pdf
http://totaldrench.files.wordpress.com/2013/02/pdf2.pdf

Here are the images produced (not preserved here):
Image1 converted from PDF1
Image1 with a black box
Image2 converted from PDF2
Image2 with a black box
Output of comparison through ImageMagick
Output of comparison through Beyond Compare

I'm using the black boxes to cover the last lines, which as you can see are different. Yet the comparison shows differences inside the black boxes even though they are all black. Any thoughts on how to make black stay black? Let me know if you can't download any of the files posted here.

Update - Here's the verbose information of the PDF files:
PDF1

Image: PDF1.pdf
  Format: PDF (Portable Document Format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Type: PaletteAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16/8-bit
  Channel depth:
    red: 1-bit
    green: 1-bit
    blue: 1-bit
    alpha: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 65535 (1)
      mean: 64776.2 (0.988422)
      standard deviation: 7010.76 (0.106977)
      kurtosis: 81.3809
      skewness: -9.13132
    Green:
      min: 0 (0)
      max: 65535 (1)
      mean: 64776.2 (0.988422)
      standard deviation: 7010.76 (0.106977)
      kurtosis: 81.3809
      skewness: -9.13132
    Blue:
      min: 0 (0)
      max: 65535 (1)
      mean: 64776.2 (0.988422)
      standard deviation: 7010.76 (0.106977)
      kurtosis: 81.3809
      skewness: -9.13132
    Alpha:
      min: 0 (0)
      max: 65535 (1)
      mean: 442.002 (0.00674452)
      standard deviation: 4634.13 (0.0707123)
      kurtosis: 139.837
      skewness: -11.563
  Image statistics:
    Overall:
      min: 0 (0)
      max: 65535 (1)
      mean: 64855.4 (0.98963)
      standard deviation: 6498.6 (0.0991623)
      kurtosis: 92.222
      skewness: -9.66482
  Alpha: srgba(255,255,255,0)   #FFFFFFFFFFFF0000
  Colors: 26
  Histogram:
      1050: (    0,    0,    0,65535) #000000000000 black
       877: (    0,    0,    0,34952) #0000000000008888 srgba(0,0,0,0.533333)
       496: (    0,    0,    0, 4369) #0000000000001111 srgba(0,0,0,0.0666667)
       402: (    0,    0,    0,48059) #000000000000BBBB srgba(0,0,0,0.733333)
       379: (    0,    0,    0,17476) #0000000000004444 srgba(0,0,0,0.266667)
       329: (    0,    0,    0,30583) #0000000000007777 srgba(0,0,0,0.466667)
       313: (    0,    0,    0,56797) #000000000000DDDD srgba(0,0,0,0.866667)
       301: (    0,    0,    0,52428) #000000000000CCCC srgba(0,0,0,0.8)
       291: (    0,    0,    0,13107) #0000000000003333 srgba(0,0,0,0.2)
       242: (    0,    0,    0, 8738) #0000000000002222 srgba(0,0,0,0.133333)
       236: (    0,    0,    0,39321) #0000000000009999 srgba(0,0,0,0.6)
       232: (    0,    0,    0,26214) #0000000000006666 srgba(0,0,0,0.4)
       216: (    0,    0,    0,61166) #000000000000EEEE srgba(0,0,0,0.933333)
       137: (    0,    0,    0,43690) #000000000000AAAA srgba(0,0,0,0.666667)
        97: (    0,    0,    0,21845) #0000000000005555 srgba(0,0,0,0.333333)
         3: (    0,    0,    0,12336) #0000000000003030 srgba(0,0,0,0.188235)
         2: (    0,    0,    0,32896) #0000000000008080 srgba(0,0,0,0.501961)
         2: (    0,    0,    0,42919) #000000000000A7A7 srgba(0,0,0,0.654902)
         1: (    0,    0,    0,20046) #0000000000004E4E srgba(0,0,0,0.305882)
         1: (    0,    0,    0,36751) #0000000000008F8F srgba(0,0,0,0.560784)
         1: (    0,    0,    0,36494) #0000000000008E8E srgba(0,0,0,0.556863)
         1: (    0,    0,    0,38807) #0000000000009797 srgba(0,0,0,0.592157)
         1: (    0,    0,    0,39835) #0000000000009B9B srgba(0,0,0,0.607843)
         1: (    0,    0,    0,46774) #000000000000B6B6 srgba(0,0,0,0.713725)
         1: (    0,    0,    0,31354) #0000000000007A7A srgba(0,0,0,0.478431)
    479092: (65535,65535,65535,    0) #FFFFFFFFFFFF0000 srgba(255,255,255,0)
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Undefined
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T16:39:03+08:00
    date:modify: 2013-02-21T16:39:03+08:00
    pdf:HiResBoundingBox: 612x792+0+0
    pdf:Version: PDF-1.2 
    signature: 6a5d37e948627efd81ba7a32ce337e489b19727eab07366d9c9c45316e0d5d96
  Profiles:
    Profile-icc: 2576 bytes
      Description: Artifex Software sRGB ICC Profile
      Manufacturer: Artifex Software sRGB ICC Profile
      Model: Artifex Software sRGB ICC Profile
      Copyright: Copyright Artifex Software 2011
  Artifacts:
    filename: PDF1.pdf
    verbose: true
  Tainted: False
  Filesize: 14.9KB
  Number pixels: 485K
  Pixels per second: 7.694MB
  User time: 0.016u
  Elapsed time: 0:01.062
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
PDF2

Image: PDF2.pdf
  Format: PDF (Portable Document Format)
  Class: DirectClass
  Geometry: 612x792+0+0
  Resolution: 72x72
  Print size: 8.5x11
  Units: Undefined
  Type: PaletteAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 16/8-bit
  Channel depth:
    red: 1-bit
    green: 1-bit
    blue: 1-bit
    alpha: 8-bit
  Channel statistics:
    Red:
      min: 0 (0)
      max: 65535 (1)
      mean: 64769.2 (0.988315)
      standard deviation: 7042.78 (0.107466)
      kurtosis: 80.5881
      skewness: -9.0878
    Green:
      min: 0 (0)
      max: 65535 (1)
      mean: 64769.2 (0.988315)
      standard deviation: 7042.78 (0.107466)
      kurtosis: 80.5881
      skewness: -9.0878
    Blue:
      min: 0 (0)
      max: 65535 (1)
      mean: 64769.2 (0.988315)
      standard deviation: 7042.78 (0.107466)
      kurtosis: 80.5881
      skewness: -9.0878
    Alpha:
      min: 0 (0)
      max: 65535 (1)
      mean: 446.799 (0.00681771)
      standard deviation: 4662.76 (0.0711492)
      kurtosis: 138.422
      skewness: -11.5066
  Image statistics:
    Overall:
      min: 0 (0)
      max: 65535 (1)
      mean: 64848.9 (0.989531)
      standard deviation: 6529.62 (0.0996356)
      kurtosis: 91.3056
      skewness: -9.61787
  Alpha: srgba(255,255,255,0)   #FFFFFFFFFFFF0000
  Colors: 26
  Histogram:
      1073: (    0,    0,    0,65535) #000000000000 black
       881: (    0,    0,    0,34952) #0000000000008888 srgba(0,0,0,0.533333)
       500: (    0,    0,    0, 4369) #0000000000001111 srgba(0,0,0,0.0666667)
       413: (    0,    0,    0,48059) #000000000000BBBB srgba(0,0,0,0.733333)
       383: (    0,    0,    0,17476) #0000000000004444 srgba(0,0,0,0.266667)
       331: (    0,    0,    0,30583) #0000000000007777 srgba(0,0,0,0.466667)
       317: (    0,    0,    0,56797) #000000000000DDDD srgba(0,0,0,0.866667)
       298: (    0,    0,    0,13107) #0000000000003333 srgba(0,0,0,0.2)
       298: (    0,    0,    0,52428) #000000000000CCCC srgba(0,0,0,0.8)
       242: (    0,    0,    0, 8738) #0000000000002222 srgba(0,0,0,0.133333)
       235: (    0,    0,    0,26214) #0000000000006666 srgba(0,0,0,0.4)
       233: (    0,    0,    0,39321) #0000000000009999 srgba(0,0,0,0.6)
       216: (    0,    0,    0,61166) #000000000000EEEE srgba(0,0,0,0.933333)
       135: (    0,    0,    0,43690) #000000000000AAAA srgba(0,0,0,0.666667)
        96: (    0,    0,    0,21845) #0000000000005555 srgba(0,0,0,0.333333)
         2: (    0,    0,    0,32896) #0000000000008080 srgba(0,0,0,0.501961)
         2: (    0,    0,    0,12336) #0000000000003030 srgba(0,0,0,0.188235)
         2: (    0,    0,    0,42919) #000000000000A7A7 srgba(0,0,0,0.654902)
         1: (    0,    0,    0,20046) #0000000000004E4E srgba(0,0,0,0.305882)
         1: (    0,    0,    0,36751) #0000000000008F8F srgba(0,0,0,0.560784)
         1: (    0,    0,    0,36494) #0000000000008E8E srgba(0,0,0,0.556863)
         1: (    0,    0,    0,38807) #0000000000009797 srgba(0,0,0,0.592157)
         1: (    0,    0,    0,39835) #0000000000009B9B srgba(0,0,0,0.607843)
         1: (    0,    0,    0,46774) #000000000000B6B6 srgba(0,0,0,0.713725)
         1: (    0,    0,    0,31354) #0000000000007A7A srgba(0,0,0,0.478431)
    479040: (65535,65535,65535,    0) #FFFFFFFFFFFF0000 srgba(255,255,255,0)
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Interlace: None
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Compose: Over
  Page geometry: 612x792+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Undefined
  Orientation: Undefined
  Properties:
    date:create: 2013-02-21T16:39:13+08:00
    date:modify: 2013-02-21T16:39:13+08:00
    pdf:HiResBoundingBox: 612x792+0+0
    pdf:Version: PDF-1.2 
    signature: 29e98677c90f1c5a14a0dd2721c1dd965d70bd835bbb9428c94075479d84a36a
  Profiles:
    Profile-icc: 2576 bytes
      Description: Artifex Software sRGB ICC Profile
      Manufacturer: Artifex Software sRGB ICC Profile
      Model: Artifex Software sRGB ICC Profile
      Copyright: Copyright Artifex Software 2011
  Artifacts:
    filename: PDF2.pdf
    verbose: true
  Tainted: False
  Filesize: 14.9KB
  Number pixels: 485K
  Pixels per second: 30.29MB
  User time: 0.016u
  Elapsed time: 0:01.016
  Version: ImageMagick 6.8.2-9 2013-02-10 Q16 http://www.imagemagick.org
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Post by snibgo »

I did the following, on Windows 7, IM v6.7.9:

"%IMG%convert" pdf1.pdf pdf1.png
"%IMG%convert" pdf2.pdf pdf2.png

"%IMG%compare" -metric AE pdf1.png pdf2.png pdfDiff.png

"%IMG%convert" pdf1.png -draw "rectangle 115,177 293,189" pdf1a.png
"%IMG%convert" pdf2.png -draw "rectangle 115,177 293,189" pdf2a.png

"%IMG%compare" -metric AE pdf1a.png pdf2a.png pdfDiffa.png
The first compare gave "791", so 791 pixels were different. The second gave 0; no pixels were different. As expected.

I suspect your black boxes are not entirely covering the text that changes.

The PDFs have a transparent background. This may cause a complication; how are you drawing the black boxes?
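For use from a Python test harness, the batch commands above could be wrapped roughly as follows. This is a sketch under assumptions: the function names are hypothetical, ImageMagick's convert and compare must be the first matches on PATH (Windows also ships an unrelated convert.exe, which is presumably why the commands above use a %IMG% prefix), and the AE count is read from stderr, where compare reports its metric.

```python
import subprocess


def draw_box_command(src, dst, box):
    """Build a `convert` command that paints a filled rectangle.

    box is (x0, y0, x1, y1), as in `-draw "rectangle x0,y0 x1,y1"`.
    """
    x0, y0, x1, y1 = box
    return ["convert", src, "-draw",
            "rectangle %d,%d %d,%d" % (x0, y0, x1, y1), dst]


def compare_command(image_a, image_b, diff_image):
    """Build a `compare` command using the absolute-error (AE) metric."""
    return ["compare", "-metric", "AE", image_a, image_b, diff_image]


def parse_ae(metric_text):
    """Parse the pixel count reported for -metric AE (first token)."""
    return int(float(metric_text.split()[0]))


def differing_pixels(image_a, image_b, diff_image):
    """Run compare and return the AE count (requires ImageMagick on PATH)."""
    result = subprocess.run(compare_command(image_a, image_b, diff_image),
                            stderr=subprocess.PIPE, universal_newlines=True)
    # compare writes the metric to stderr.
    return parse_ae(result.stderr)


print(draw_box_command("pdf1.png", "pdf1a.png", (115, 177, 293, 189)))
print(parse_ae("791"))
```

Only the command builders and the parser run here; differing_pixels is defined but not called, since it needs ImageMagick and the image files to be present.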
snibgo's IM pages: im.snibgo.com

Post by fmw42 »

If you can block out the white areas that seem to be different and get no errors using PNG, then I would have to assume the issue is in the created PDF files. So that goes back to whatever tool is creating the PDFs, assuming I am following your descriptions correctly.

Or the PDF to PNG conversion is introducing differences. That would then presumably be a Ghostscript issue, even if the conversion is done through IM.

If the PDF files have a transparent background, then Ghostscript needs to be using -sDEVICE=pngalpha rather than pnmraw, assuming there is just one page in each PDF. If there is more than one page, Ghostscript cannot correctly handle multiple transparent pages. See your delegates.xml file:

<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pnmraw&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>

If using pnmraw, it is possible that GS discards the alpha channel, leaving behind the color data that was hidden under the transparency.

Furthermore, compare is ignoring the alpha channel. So if one image has something underneath that is made transparent by the alpha channel, compare will notice it anyway.

Perhaps you should flatten the PDF files against a white background when converting to 24-bit PNG. Converting to 8-bit PNG could also cause differences.
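To illustrate what flattening against white does to the pixel values, here is a short Python sketch of the standard "over" blend applied to a single pixel. The function name flatten_on_white is hypothetical, and the sample values are taken from the histogram posted earlier in this thread, where black appears at many different alpha levels (typical of antialiased text edges).

```python
def flatten_on_white(rgba, max_value=255):
    """Composite one RGBA pixel over an opaque white background.

    rgba holds red, green, blue in 0..max_value and alpha in 0.0..1.0.
    Standard "over" blend: out = alpha * foreground + (1 - alpha) * white.
    """
    r, g, b, alpha = rgba
    return tuple(int(round(alpha * c + (1.0 - alpha) * max_value))
                 for c in (r, g, b))


# Values taken from the histogram posted earlier in the thread.
print(flatten_on_white((255, 255, 255, 0.0)))    # transparent background
print(flatten_on_white((0, 0, 0, 1.0)))          # opaque black text
print(flatten_on_white((0, 0, 0, 0.533333)))     # antialiased glyph edge
```

After flattening, the partially transparent black pixels become ordinary grays, so the alpha channel no longer hides anything from a comparison. On the command line, -background white -flatten should do this for a single-page input. For multi-page PDFs, -background white -alpha remove may be the safer choice as far as I know, since -flatten merges all pages into one image.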

Post by discretiongrove »

Hi snibgo, I'm drawing the black boxes through PIL. Now that you've given an example of it, I realize I haven't tried using ImageMagick to draw the black boxes.

Hi fmw42, well, I did just that, but the problem is that if some of the elements are really close to each other, then the whites show differences too. See the difference image with three sets of horizontal lines? Those elements are close to each other, so I think there is some overlap. What did you mean about Ghostscript not being able to handle multiple pages? Can't it convert a multi-page PDF into JPG with pngalpha on? I'm converting the images to JPG; should I switch to PNG? I thought of converting to PNG because I remembered that it supports transparent backgrounds. Still, the difference persisted.

By flatten, do you mean I should make a separate white JPG and then use it as a background for the converted PNG image?

By the way, the PDFs are generated by iText, a developer library for generating PDFs.

Post by fmw42 »

Ghostscript, so I am told, cannot process multiple pages with transparency using device=pngalpha. device=pnmraw can process multiple pages, but not with transparency. I am not sure whether the transparency is lost or what happens. With pngalpha, you will likely get only the first page if there are multiple pages with transparency.

Part of your problem may be that you are converting to the lossy JPG format. Two image conversions to JPG (from different PDFs) may compress different parts of the image differently, and that may be the difference you are seeing.

PNG and TIF are not lossy, and both will keep transparency. But with GS you are limited to converting single-page PDFs when transparency is involved.

Post by discretiongrove »

Well, I tried changing that configuration to pngraw, and when I tried to convert the PDF to PNG I got this error:

Unknown device: pngraw
Unrecoverable error: undefined in .uninstallpagedevice
As for my conversion to JPG, I just use the simplest conversion method in ImageMagick:

convert PDF.pdf PDF.jpg
I'll try doing that white on transparent thing and see if it helps.

By the way, I tried converting to PNG using pngalpha and all the pages kept their transparency.