convert and gs results difference

Posted: 2019-10-07T16:53:02-07:00
by wwubcn
Hello,

I'm working on an OCR project and currently use the convert command to convert PDF to JPG.

For various reasons, I now have to use Ghostscript.

From what I can see, the image quality from gs is much better.

However, OCR seems to give better results on the output of the convert command.

convert result
https://cl.ly/48d7aa1a6ced

gs result
https://cl.ly/eaeb7b32df1e

I wonder what causes this difference. And if I have to use gs now, how can I get a result similar to convert's?

Commands I use
> convert -quality 100 ~/sample.pdf jpgs/a-%02d.jpg
> gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 "-sDEVICE=png16m" "-r72x72" -sOutputFile=./jpgs/a-%02d.jpg ~/sample.pdf

Thank you very much

Re: convert and gs results difference

Posted: 2019-10-07T17:31:19-07:00
by fmw42
ImageMagick uses Ghostscript to process PDF. The only issue would be what arguments are set in the delegates.xml file for reading PS/PDF/etc.

Re: convert and gs results difference

Posted: 2019-10-08T03:56:16-07:00
by wwubcn
fmw42 wrote: 2019-10-07T17:31:19-07:00 ImageMagick uses Ghostscript to process PDF. The only issue would be what arguments are set in the delegates.xml file for reading PS/PDF/etc.
Thanks. I wonder where the "bold" font comes from in the convert result.

I ran the command from delegates.xml for PDF, but it output PS, not JPG.

Re: convert and gs results difference

Posted: 2019-10-08T04:42:40-07:00
by snibgo
The two linked images are clearly from different documents, or different parts of the same document, so we don't know why they have different quality. They seem to be rasterized at different densities.

"-verbose" is useful to see what Ghostscript command IM is using.
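A minimal sketch of pulling just the Ghostscript invocation out of that output. It assumes the delegate line starts with `gs` (optionally quoted); `extract_gs_line` is a hypothetical helper name and `sample.pdf` / `a.jpg` are placeholder file names.

```shell
# Keep only lines that look like a Ghostscript invocation,
# i.e. lines starting with gs or 'gs' followed by a space.
extract_gs_line() {
  grep -E "^'?gs'? "
}

# Typical use (both streams are merged because the delegate command
# may be printed on stderr):
#   convert -verbose sample.pdf a.jpg 2>&1 | extract_gs_line
```

The flags printed this way can then be compared with a hand-written gs command.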

Re: convert and gs results difference

Posted: 2019-10-08T05:02:31-07:00
by wwubcn
snibgo wrote: 2019-10-08T04:42:40-07:00 The two linked images are clearly from different documents, or different parts of the same document, so we don't know why they have different quality. They seem to be rasterized at different densities.

"-verbose" is useful to see what Ghostscript command IM is using.
You are correct, they are from different pages of the same PDF.

Here is the log with -verbose

```
convert -verbose sample.pdf a.jpg
'gs' -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 '-sDEVICE=pngalpha' -dTextAlphaBits=4 -dGraphicsAlphaBits=4 '-r72x72' '-sOutputFile=/var/folders/pb/tbjqx4_56w1cv1htzdm5yh4c0000gn/T/magick-15645ycTMnExb7xla%d' '-f/var/folders/pb/tbjqx4_56w1cv1htzdm5yh4c0000gn/T/magick-156456UMILWLqIH8a' '-f/var/folders/pb/tbjqx4_56w1cv1htzdm5yh4c0000gn/T/magick-15645KqScxlvU4x6G'
/var/folders/pb/tbjqx4_56w1cv1htzdm5yh4c0000gn/T/magick-15645ycTMnExb7xla1 PNG 595x842 595x842+0+0 8-bit sRGB 35364B 0.020u 0:00.011
/var/folders/pb/tbjqx4_56w1cv1htzdm5yh4c0000gn/T/magick-15645ycTMnExb7xla2 PNG 595x842 595x842+0+0 8-bit sRGB 41345B 0.030u 0:00.008
sample.pdf[0] PDF 595x842 595x842+0+0 16-bit sRGB 35364B 0.040u 0:00.008
sample.pdf[1] PDF 595x842 595x842+0+0 16-bit sRGB 41345B 0.010u 0:00.000
sample.pdf=>a-0.jpg[0] PDF 595x842 595x842+0+0 16-bit GrayscaleAlpha Gray 66963B 0.020u 0:00.010
sample.pdf=>a-1.jpg[1] PDF 595x842 595x842+0+0 16-bit GrayscaleAlpha Gray 83686B 0.050u 0:00.021
```

left from gs, right from convert
https://cl.ly/1db1d521d222

Any idea?

Re: convert and gs results difference

Posted: 2019-10-08T05:47:27-07:00
by snibgo
wwubcn wrote:left from gs, right from convert
The resolution seems to be the same (caps height about 7 pixels) but the IM result is monochrome, black and white with no grays, hence the letters are aliased. The command you show "convert -verbose sample.pdf a.jpg" shouldn't remove grays.

Can you link to a sample input PDF?
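Without the PDF, one thing that can still be checked is how the two Ghostscript invocations differ. The sketch below compares the flags from the hand-written gs command with those in the -verbose log posted earlier in the thread. The flag lists are copied from those posts (the `-sstdout`, `-sOutputFile` and `-f` arguments are left out on purpose); `only_in` is a hypothetical helper.

```shell
# Flags from the hand-written gs command and from convert's -verbose log.
manual="-dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 -sDEVICE=png16m -r72x72"
im="-dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72"

# Print every flag of the first list that is missing from the second,
# one per line. Word splitting of $1 is intentional.
only_in() {
  for flag in $1; do
    case " $2 " in
      *" $flag "*) ;;
      *) printf '%s\n' "$flag" ;;
    esac
  done
}

echo "only in the manual gs command:"
only_in "$manual" "$im"
echo "only in convert's gs command:"
only_in "$im" "$manual"
```

The differences are the output device (png16m vs pngalpha), the AlignToPixels value, and the two AlphaBits options that only convert passes.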

Re: convert and gs results difference

Posted: 2019-10-08T11:10:01-07:00
by wwubcn
snibgo wrote: 2019-10-08T05:47:27-07:00
wwubcn wrote:left from gs, right from convert
The resolution seems to be the same (caps height about 7 pixels) but the IM result is monochrome, black and white with no grays, hence the letters are aliased. The command you show "convert -verbose sample.pdf a.jpg" shouldn't remove grays.

Can you link to a sample input PDF?
I'm very sorry, but I cannot share the PDF.

From gs I now get a PNG file. Then I use convert on the PNG to create a JPG. That JPG is what I need (bold).

I guess my request is a little strange because I am looking for a 'worse' result :D
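For anyone following along, here is a rough sketch of that two-step workflow: gs with the flags from the -verbose log, then convert on each PNG. The input path, output directory and the `a-%02d` naming are placeholders, and the script skips the conversion if gs, convert or the PDF is missing. A likely but unconfirmed explanation for the "bold" look is that pngalpha stores the anti-aliasing in the alpha channel, which is lost when the image is written as JPG.

```shell
pdf="${PDF:-sample.pdf}"     # placeholder input path
outdir="${OUTDIR:-jpgs}"     # placeholder output directory

# Rasterization flags copied from convert's -verbose log.
GS_FLAGS="-dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72"

if command -v gs >/dev/null 2>&1 && command -v convert >/dev/null 2>&1 && [ -f "$pdf" ]; then
  mkdir -p "$outdir"
  # Step 1: one PNG per page (word splitting of $GS_FLAGS is intentional).
  gs $GS_FLAGS "-sOutputFile=$outdir/a-%02d.png" "$pdf"
  # Step 2: convert each PNG to JPG, as in the original convert command.
  for png in "$outdir"/a-*.png; do
    [ -e "$png" ] || continue
    convert "$png" -quality 100 "${png%.png}.jpg"
  done
else
  echo "skipped: need gs, convert and $pdf" >&2
fi
```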