Race condition in exception handling

Post any defects you find in the released or beta versions of the ImageMagick software here. Include the ImageMagick version, OS, and any command-line required to reproduce the problem. Got a patch for a bug? Post it here.
Post Reply
egb

Race condition in exception handling

Post by egb »

I've seen a lot of failed asserts in a multithreaded environment like this one:
  • assertion=0xf797f128 "list_info != (LinkedListInfo *) ((void *)0)",
    file=0xf797f115 "magick/hashmap.c", line=1972,
    function=0xf797f1f8 "ResetLinkedListIterator"
which happen during function calls like this one:

Code: Select all

#1  0xf78d3e86 in ResetLinkedListIterator (list_info=0x0)
    at magick/hashmap.c:1972
#2  0xf78bf85c in InheritException (exception=0xd84fff04, relative=0xb17144c)
    at magick/exception.c:599
#3  0xf78db082 in CloneImage (image=0xb16e290, columns=122, rows=152, 
    orphan=MagickTrue, exception=0xe3efd670) at magick/image.c:1100
#4  0xf78a9f0a in DespeckleImage (image=0xb16e290, exception=0xe3efd670)
    at magick/effect.c:1429
#5  0xf7ad2fc0 in Magick::Image::despeckle () from /usr/lib/libMagick++.so.2
My opinion is that the locking of the ExceptionInfo during the InheritException call is insufficient:
It only locks the semaphore for the destination ExceptionInfo, but does nothing to prevent that the source ExceptionInfo is deleted during the run.
But since the invalid ExceptionInfo is a member of the source image struct, the problem could as well be that the source image is deleted during the "despeckle" run.
Also possible: the function invoked just before the despeckle (Zoom or Scale) somehow destroyed the image, and the next CloneImage call is not guilty at all, it just happens to notice it first.

In the backtrace above, the souce image object is only used inside one thread, and can't be concurrently accessed by a different one.

Unfortunatly, I couldn't reproduce this crash with a small test programm only executing the despeckle-method, so I don't have more detailed information. Except maybe that at the moment of the crash two other threads were also inside "Despeckle", but were working on different Images.
User avatar
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Race condition in exception handling

Post by magick »

This one has us perplexed. The exception list should be valid except if one thread destroys an image why another thread is accessing it. We'll investigate and see if we can spot the problem.
egb

Re: Race condition in exception handling

Post by egb »

Possibly related:
When using helgrind on a threaded program using imagemagick, it finds an unlocked read in DeleteNodeByValueFromSplayTree:

Code: Select all

==11048== Possible data race during read of size 4 at 0x88a0368 by thread #4
==11048==    at 0x60F2D57: DeleteNodeByValueFromSplayTree (splay-tree.c:525)
==11048==    by 0x5FFF8E6: DestroyPixelCacheInfo (cache.c:1499)
==11048==    by 0x5FFFBDD: DestroyPixelCache (cache.c:1414)
==11048==    by 0x5FFEDC8: DestroyImagePixels (cache.c:1381)
==11048==    by 0x6085D5A: DestroyImage (image.c:1623)
==11048==    by 0x60E1109: ResizeImage (resize.c:2237)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
==11048==  This conflicts with a previous write of size 4 by thread #27
==11048==    at 0x60F3050: AddValueToSplayTree (splay-tree.c:211)
==11048==    by 0x5FFFF19: AcquirePixelCacheInfo (cache.c:258)
==11048==    by 0x6005D00: GetImagePixelCache (cache.c:2093)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60DD162: VerticalFilter (resize.c:2023)
==11048==    by 0x60E11C7: ResizeImage (resize.c:2229)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
But thats only the check for splay_tree->root == NULL, and probably harmless.

And another one here:

Code: Select all

==11048== Possible data race during read of size 4 at 0x88e2040 by thread #3
==11048==    at 0x6005CA7: GetImagePixelCache (cache.c:2078)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60E0243: HorizontalFilter (resize.c:1782)
==11048==    by 0x60E11FA: ResizeImage (resize.c:2231)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
==11048==  This conflicts with a previous write of size 4 by thread #25
==11048==    at 0x6005D74: GetImagePixelCache (cache.c:2105)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60DD162: VerticalFilter (resize.c:2023)
==11048==    by 0x60E11C7: ResizeImage (resize.c:2229)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
Here, the cache_info->reference_count is read without lock on cache_info->semaphore, and it looks like two threads could enter with cache_info->reference_count==1, and both skip the Copy-on-Write part.

Maybe a few asserts on the reference_count variable could help to catch this sort of problem earlier if it really exist and isn't just a false report by helgrind.
For Example, GetImagePixelCache could assert (image->cache->reference_count==1) before leaving the lock and returning.
And ReferencePixelCache could assert (cache_info->reference_count!=0) before incrementing it.
User avatar
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Race condition in exception handling

Post by magick »

Good catch. We'll have a patch by sometime tomorrow.
Armani
Posts: 11
Joined: 2003-11-07T16:21:16-07:00

Re: Race condition in exception handling

Post by Armani »

during my tests I git this helgrind output, that could belong to the assertion Ernst already reported.

I think there is a lock missing in GetNextValueInLinkedList
  • ==6135== Possible data race during write of size 4 at 0x6e8e9a0 by thread #21
    ==6135== at 0x5F3A593: ResetLinkedListIterator (hashmap.c:1977)
    ==6135== by 0x5F25FA2: InheritException (exception.c:605)
    ==6135== by 0x5F417D1: CloneImage (image.c:1100)
    ==6135== by 0x5F9A1CD: ScaleImage (resize.c:2485)
    ==6135== by 0x5BE7AD3: Magick::Image::scale(Magick::Geometry const&) (in /usr/lib/libMagick++.so.2.0.0)
    ==6135== by 0x80493D1: testthread(void*) (in /root/magick_test/test)
    ==6135== by 0xB5: ???
    ==6135== This conflicts with a previous read of size 4 by thread #4
    ==6135== at 0x5F3A9A9: GetNextValueInLinkedList (hashmap.c:635)
    ==6135== by 0x5F25FAD: InheritException (exception.c:606)
    ==6135== by 0x5F417D1: CloneImage (image.c:1100)
    ==6135== by 0x5F9A1CD: ScaleImage (resize.c:2485)
    ==6135== by 0x5BE7AD3: Magick::Image::scale(Magick::Geometry const&) (in /usr/lib/libMagick++.so.2.0.0)
    ==6135== by 0x80493D1: testthread(void*) (in /root/magick_test/test)
    ==6135== by 0xAE: ???
User avatar
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Race condition in exception handling

Post by magick »

We have a patch for the problem you reported. Look for it in the ImageMAgick 6.5.0-4 release later on this evening. Thanks.
Post Reply