Trimming and cleaning up large number of photographed checks

Post by Masejoer »

I have been trying to crop personal checks to their edges and clean them up for OCR. I have a few checks working, but I need help improving the image processing so it stays accurate across a wider sample. All checks are digitized with cameras, not scanners, so the exposure will always aim for 50% gray rather than bright white. Some photos are also taken from further away than others, so I can't use a fixed crop value.

I am currently working on the image linked below, and I am having trouble getting it to trim correctly. Nothing seems to work better than the two trim steps below. They leave a block at the bottom of the image.

Image: http://masejoer.com/Images/blackened.jpg
Output: http://masejoer.com/Images/blackened_trimmed.jpg

Where I'm at so far for all the image processing:

Code:

convert -fuzz 35% -trim -trim in.jpg out.jpg (crop to border)
convert -gamma 0.25 -auto-level -negate -lat 30x30+10% -negate in.jpg out.jpg (helps with cleanup of background noise on darker checks)
convert -morphology close Diamond:2 in.jpg out.jpg (remove noise)
convert -blur 1x1 in.jpg out.jpg (slight blur improves threshold)
convert -threshold 5% in.jpg out.jpg (binarize)


One last thing that I've been unable to figure out - removing everything that is not close to a grayscale pixel. If it has color hues (dark color check backgrounds or non-black pen strokes), turn them white. Closest I've gotten is finding color codes for various colors and using small fuzz values to burn them white. Is there an accurate way to turn non-grayscale pixels white?

Thank you for any help. I've gone about as far as I can after tweaking and testing for over a dozen hours.



Edit: Windows version - ImageMagick-7.0.1-6-portable-Q16-x64. Currently doing all my testing through a batch file.

I currently have ImageMagick set up to take the commands first, then the input file and output file, just for easier readability and tweaking. It's easier to see what I have going on per line. I know some commands don't work unless the input file is first, but all of the ones I'm currently using work fine as "COMMAND ARGS IN OUT". My batch file actually looks more like this:

Code:

call :Profiler %magick%\convert -fuzz 35%%%% -trim -trim check_step^^!stepold^^!.jpg check_step^^!stepnew^^!.jpg
call :Profiler %magick%\convert -gamma 0.25 -auto-level -negate -lat 30x30+10% -negate check_step^^!stepold^^!.jpg check_step^^!stepnew^^!.jpg
call :Profiler %magick%\convert -morphology close Diamond:2 check_step^^!stepold^^!.jpg check_step^^!stepnew^^!.jpg
call :Profiler %magick%\convert -blur 1x1 check_step^^!stepold^^!.jpg check_step^^!stepnew^^!.jpg
call :Profiler %magick%\convert -threshold 5%%%% check_step^^!stepold^^!.jpg check_step^^!stepnew^^!.jpg
Last edited by Masejoer on 2016-06-23T12:42:26-07:00, edited 2 times in total.

Post by fmw42 »

What is your IM version and platform? If you are on a Unix system (Linux, Mac OS X, or Windows with Cygwin), see my script, textcleaner, at the link below. It might help you get more OCR-readable text. It also calls -lat.

Post by fmw42 »

P.S. Proper IM syntax is to read the input first and then apply operators to it. See http://www.imagemagick.org/Usage/basics/#syntax

Post by snibgo »

I don't understand your commands. You have five commands, all operating on the same input file and making the same output. So only the last has any effect. Is that right?

I suggest:

Don't deliberately use unsupported syntax.

Don't use JPEG for storing intermediate results. In fact, don't ever use it for anything.
snibgo's IM pages: im.snibgo.com

Post by Masejoer »

The profiler is checking performance per command to see how long each stage takes. It increments on each line.

This is a simplified version of what is happening:

Code:

convert original.jpg -fuzz 35% -trim -trim post1.bmp
convert post1.bmp -gamma 0.25 -auto-level -negate -lat 30x30+10% -negate post2.bmp
convert post2.bmp -morphology close Diamond:2 post3.bmp
convert post3.bmp -blur 1x1 post4.bmp
convert post4.bmp -threshold 5% post5.bmp
I'm trying to get trimming to work more consistently across a few dozen test checks. I'm also trying to get rid of anything that isn't the routing and account number - colors need to be turned to white. The rest I can crop out if I can get trim working better.

Post by snibgo »

Masejoer wrote: -trim -trim
What is the second trim for? Does it do anything?

The background colour is almost the same as the cheque, with no good boundary top or left. The lighting is uneven, so bottom-right is darker than top-left. Your "-fuzz" is large to compensate for this. A better method would be to adjust the image so the four corners have the same colour values. Then your trim won't need such a large fuzz, and will probably work better.
Masejoer wrote: I'm also trying to get rid of anything that isn't the routing and account number
Are those the near-black numbers top-right? Then you can use code like this:

Code:

convert blackened.jpg -fuzz 5% -fill White +opaque gray(13%) -fill Black +opaque White out.png
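
The suggestion in this post to "adjust the image so the four corners have the same colour values" can be read in several ways. The sketch below is one possible interpretation in plain Python, not an ImageMagick command and not necessarily what snibgo had in mind: it fits a bilinear surface through the four corner pixels of a grayscale grid and subtracts it, so the backdrop ends up at one level. The function name `level_corners`, the subtractive (rather than multiplicative) correction, and the use of single corner pixels are all assumptions made for illustration.

```python
def level_corners(gray):
    """Remove a bilinear brightness gradient defined by the four corner pixels.

    gray is a list of rows of floats in 0..1. Hypothetical helper: a real
    photo would average a small patch at each corner instead of one pixel.
    """
    h, w = len(gray), len(gray[0])
    tl, tr = gray[0][0], gray[0][-1]
    bl, br = gray[-1][0], gray[-1][-1]
    target = (tl + tr + bl + br) / 4.0  # level every corner is moved to
    out = []
    for y, row in enumerate(gray):
        fy = y / (h - 1) if h > 1 else 0.0
        new_row = []
        for x, v in enumerate(row):
            fx = x / (w - 1) if w > 1 else 0.0
            top = tl + (tr - tl) * fx          # interpolate along top edge
            bottom = bl + (br - bl) * fx       # interpolate along bottom edge
            background = top + (bottom - top) * fy
            new_row.append(v - background + target)
        out.append(new_row)
    return out


# Backdrop that gets brighter toward the bottom-right, as in uneven lighting.
uneven = [[0.2 + 0.1 * x + 0.2 * y for x in range(3)] for y in range(3)]
levelled = level_corners(uneven)
```

After a correction like this, the four corners share one value, which is the condition under which a much smaller `-fuzz` could be enough for `-trim`.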

Post by Masejoer »

The double trim is probably only borderline effective. It let the original test checks crop down well, but it is likely not reliable enough for good accuracy across a larger sample.

If I had control over the quality of all the checks, I would have better pictures. These are some examples of the worst cases.

The routing and account numbers are the magnetic-ink characters along the bottom and bottom left of checks. They are always grayscale. They should be black, but cameras render them poorly because of how they expose a shot in full automatic mode. This is all I'm trying to read with OCR: I don't care about the rest of the data on the check, anything with color, or handwriting. It seems I'll never have much luck with checks that weren't photographed against a solid background, preferably white.

Post by snibgo »

I prefer black backgrounds because paper is rarely black so we can easily remove the background. In addition, shadows don't show well on black, so we don't need to worry about them.

Post by Masejoer »

I'm still having issues with trim. How can I get ImageMagick to trim to the document border more consistently? Uneven lighting will always exist, whether from the environment or from the camera flash. This test check was shot with the built-in flash against a black paper backdrop. I need something that works both for black backdrops with white documents and for white backdrops with lighter-shaded documents. A much higher fuzz works in this one case, but not on other photos.

Starting image: http://masejoer.com/Images/IM/check_step1.jpg

Code:

Run convert check_step1.tiff -fuzz 38%%%% -trim -trim check_step2.tiff
Results: http://masejoer.com/Images/IM/check_step2.jpg

Code:

convert check_step2.tiff -fuzz 45%%%% -trim -trim check_step3.tiff
Results: http://masejoer.com/Images/IM/check_step3.jpg

Post by snibgo »

For the black background, this gives the crop parameter:

Code:

convert check_step1.jpg -blur 0x3 -threshold 50% -format %@ info:

4666x2142+377+480
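
To make the output format concrete, here is a minimal Python sketch of the idea behind the command above: once the image has been thresholded to black and white, the geometry is simply the bounding box of the pixels that remain white. This is only an illustration under stated assumptions, not ImageMagick's implementation. The function name `bounding_box` is hypothetical, the blur step is omitted, and the image is a small list of grayscale floats rather than a real photo.

```python
def bounding_box(gray, threshold=0.5):
    """Return 'WxH+X+Y' for the pixels brighter than threshold, or None.

    gray is a list of rows of floats in 0..1 (a stand-in for a real image).
    """
    if not gray or not gray[0]:
        return None
    rows = [y for y, row in enumerate(gray) if any(v > threshold for v in row)]
    cols = [x for x in range(len(gray[0]))
            if any(row[x] > threshold for row in gray)]
    if not rows or not cols:
        return None  # nothing above threshold, so there is no box
    x0, y0 = cols[0], rows[0]
    width = cols[-1] - x0 + 1
    height = rows[-1] - y0 + 1
    return f"{width}x{height}+{x0}+{y0}"


# A 6x4 "photo": dark backdrop (0.1) with a bright 3x2 document (0.9).
photo = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
print(bounding_box(photo))  # 3x2+2+1
```

The string has the same width x height + x offset + y offset layout as the 4666x2142+377+480 result, which is why it can be passed straight to `-crop`.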

Post by Masejoer »

Nice to know I can do that. Not sure if you can nest that within a larger command. Right now I am doing the below and getting good results across all checks:

Code:

FOR /F "tokens=* USEBACKQ" %F IN (`convert image_original -blur 0x8 -threshold 45% -format %@ info:`) DO (SET var=%F)
convert image_original -crop !var! image_step1

convert image_step1 -colorspace gray -negate -lat 80x80+10% -negate image_step2
convert image_step2 -resize 3000 image_step3
convert image_step3 -gravity South -chop 0x8% image_step4
convert image_step4 -chop 0x89% image_step5
convert image_step5 -gravity East -chop 20%x0 image_step6
convert image_step6 -gravity West -chop 4%x0 image_step7
convert image_step7 -morphology close Diamond:1 image_step8
convert image_step8 -gamma 0.25 -auto-level image_step9
convert image_step9 -blur 1x1 image_step10
convert image_step10 -threshold 60% image_step11
Not sure if possible to do something like this:

Code:

convert image_original -crop (convert image_original -blur 0x8 -threshold 45% -format %@ info:) image_step1
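
If a scripting language is an option, the two-step pattern (ask for the geometry, then crop with it) can be wrapped in one function. The following is a rough Python sketch, assuming an ImageMagick `convert` executable is reachable under the name passed as `exe`. The function name `crop_to_document` and its parameters are hypothetical, and `+repage` is an addition that was not in the commands above.

```python
import subprocess


def crop_to_document(src, dst, blur="0x8", threshold="45%",
                     exe="convert", run=subprocess.run):
    """Probe the trim box of a blurred, thresholded copy, then crop src to it.

    exe is assumed to be ImageMagick's convert; on Windows a full path is
    safer, because a different convert.exe ships with the system.
    """
    probe = [exe, src, "-blur", blur, "-threshold", threshold,
             "-format", "%@", "info:"]
    result = run(probe, capture_output=True, text=True, check=True)
    geometry = result.stdout.strip()  # for example "4666x2142+377+480"
    # +repage (an addition here) drops the leftover virtual-canvas offset.
    crop = [exe, src, "-crop", geometry, "+repage", dst]
    run(crop, check=True)
    return geometry


# Usage, not executed here because it needs ImageMagick and a real file:
# crop_to_document("image_original.png", "image_step1.png")
```

Because the arguments are passed as a list rather than through a batch file, the percent signs do not need to be doubled.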
One remaining issue - is there a way to completely remove pixels that aren't near grayscale?

All purple strokes should be removed from this (along with red, blue, etc, no matter how dark the color - I want to keep shades of gray only, turning everything else white). Is there a way to say remove all pixels that aren't already "within 20% of grayscale", or whatever terminology to use for such a task?
Image

Processed - purple becomes black stroke that causes OCR issues:
Image

Won't help for black penstrokes, but such a filter would greatly improve overall sample accuracy.

This type of thing sorta works, but seems like there must be a better way:

Code:

convert input -fuzz 20% -fill white -opaque "#FF0000" output
convert input -fuzz 20% -fill white -opaque "#00FF00" output
convert input -fuzz 20% -fill white -opaque "#0000FF" output
convert input -fuzz 20% -fill white -opaque "#FFFF00" output
convert input -fuzz 20% -fill white -opaque "#00FFFF" output
convert input -fuzz 20% -fill white -opaque "#FF00FF" output
convert input -fuzz 20% -fill white -opaque "#880000" output
convert input -fuzz 20% -fill white -opaque "#008800" output
convert input -fuzz 20% -fill white -opaque "#000088" output

Post by snibgo »

Masejoer wrote: convert image_original -crop (convert image_original -blur 0x8 -threshold 45% -format %@ info:) image_step1
The syntax is wrong. If you want to trim, just use "-trim":
convert image_original -crop -blur 0x8 -threshold 45% -trim image_step1
Masejoer wrote: All purple strokes should be removed from this (along with red, blue, etc, no matter how dark the color - I want to keep shades of gray only, turning everything else white).
You want to turn all pixels that are above a given threshold of saturation into white? For example:

Code:

convert check_color.jpg ( +clone -colorspace HCL -channel G -separate +channel -auto-level -threshold 40% +write x.png ) -compose Lighten -composite s.png

Post by Masejoer »

snibgo wrote: The syntax is wrong. If you want to trim, just use "-trim":
convert image_original -crop -blur 0x8 -threshold 45% -trim image_step1

Code:

convert input -crop -blur 0x8 -threshold 45%% -trim output

Code:

ERROR: convert: InvalidArgument '-crop': -blur @ error/convert.c/ConvertImageCommand/1215.
snibgo wrote: You want to turn all pixels that are above a given threshold of saturation into white? For example:

Code:

convert check_color.jpg ( +clone -colorspace HCL -channel G -separate +channel -auto-level -threshold 40% +write x.png ) -compose Lighten -composite s.png
Non-grayscale pixels, not just saturation. Purples, blues, reds, pinks, greens, yellows, oranges, etc. are far from grayscale, and I'm simply trying to white them out. The above command makes nearly the entire image white.

Here is an example of what I'm trying to do. Start with the following color palette:
Image

End up with the following (colors are turned white, but grayscale is left intact):
Image

I tried to dissect the above command to see how I could tweak it to do what I need, but was unsuccessful.

Post by snibgo »

snibgo wrote: convert image_original -crop -blur 0x8 -threshold 45% -trim image_step1
Yeah, sorry, remove the "-crop".

Code:

convert c:pictures\palette.png ( +clone -colorspace HCL -channel G -separate +channel -auto-level -threshold 40% +write x.png ) -compose Lighten -composite grey_pal.png
This is my result, with v6.9.2-5:
Image
Lower the threshold, eg to 10%, to turn more colours to white.
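
For readers dissecting the command, this Python sketch mirrors its three stages on a handful of pixels: measure how colourful each pixel is, stretch and threshold that measure into a black/white mask, then combine mask and image with a per-channel maximum (which is what a Lighten composite does). It is a simplification under stated assumptions, not the command's exact arithmetic: chroma is approximated as max(R,G,B) minus min(R,G,B) instead of the HCL chroma channel, and the name `whiten_colored` is hypothetical.

```python
def whiten_colored(pixels, threshold=0.4):
    """Turn colourful pixels white and leave near-gray pixels untouched.

    pixels is a list of (r, g, b) floats in 0..1. Chroma here is max - min
    of the channels, a stand-in for the HCL chroma used in the command.
    """
    chroma = [max(p) - min(p) for p in pixels]
    lo, hi = min(chroma), max(chroma)
    span = hi - lo
    out = []
    for p, c in zip(pixels, chroma):
        stretched = (c - lo) / span if span else 0.0  # role of -auto-level
        mask = 1.0 if stretched > threshold else 0.0  # role of -threshold
        out.append(tuple(max(v, mask) for v in p))    # role of Lighten
    return out


sample = [
    (0.1, 0.1, 0.1),  # near-black ink: kept
    (0.5, 0.5, 0.5),  # mid gray: kept
    (0.5, 0.1, 0.6),  # purple pen stroke: whitened
    (0.9, 0.2, 0.2),  # red: whitened
]
print(whiten_colored(sample))
```

One thing the sketch makes visible: because of the stretch step, the threshold is relative to the most colourful pixel in the image, so the same percentage can behave differently from one photo to the next.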

If you are using BAT files, you have doubled the %, haven't you?