multiple parallel Convert commands collide

Spacetracker
Posts: 2
Joined: 2018-10-24T06:11:32-07:00

Post by Spacetracker »

System: Red Hat Enterprise Linux EL7 cluster

Versions: ImageMagick 7.0.7-8 , ffmpeg version 4.0.2

Call: convert (mp4), ffmpeg delegate, h.264 codec

Observation: Running the convert commands serially succeeds. Running the same commands at the same time, in parallel, on different CPUs, with different input and output data and separate TMP directories, fails. Each run produces a zero-length mp4 for a different, seemingly random set of videos.

Hypothesis: The convert utility and/or the ffmpeg delegate share a global disk, memory, or daemon resource, and parallel instances collide over it and interfere with each other. It is difficult to blame the convert setup or the ffmpeg codec specifically, because other codecs fail too.

Data:
Convert error when the commands collide (only one error):
convert: delegate failed `"ffmpeg" -v -1 -mbd rd -trellis 2 -cmp 2 -subcmp 2 -g 300 -i "%F%%d.jpg" "%u.%m" 2> "%Z"' @ error/delegate.c/InvokeDelegate/1065.

With exceptions turned on, many errors like the following occur on every call, whether it succeeds or fails:
2018-10-22T15:16:07-04:00 0:21.250 36.810u 7.0.7 Exception convert[190290]: utility.c/ShredFile/1845/Exception
Failed to remove: /****/******/****/magick-190290mQcUkCufgR4O94.cache

Summary: I have run many tests, both serially and in parallel, with separate TMP directories. I do not have permission to run other parallel utilities on our large cluster system. I use SLURM.

Thanks
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Post by snibgo »

What is your "convert" command?

IM software assumes it can use all your memory, unless you tell it otherwise. Perhaps one instance has hogged all the memory, so the other fails due to lack of memory.

IM memory use can be limited, eg "-limit memory 2GiB -limit map 2GiB"

I don't know ffmpeg well, but that may have the same problem.
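
A minimal sketch of how the limits above might be applied per job from Python. It assumes the `-limit memory` and `-limit map` values quoted above, and it additionally assumes that each job should get its own scratch directory through ImageMagick's `MAGICK_TEMPORARY_PATH` environment variable. The function name `build_limited_convert` and its parameters are hypothetical, not part of the original script.

```python
import os
import tempfile


def build_limited_convert(inputs_glob, output_path, memory="2GiB", scratch_root=None):
    """Return (argv, env) for one convert job with capped memory and a private temp dir.

    Hypothetical helper: the names and defaults are illustrative assumptions.
    """
    # A fresh directory per call, so parallel jobs never share scratch files.
    tmp_dir = tempfile.mkdtemp(prefix="magick-job-", dir=scratch_root)

    env = dict(os.environ)
    # ImageMagick writes its temporary files under this path.
    env["MAGICK_TEMPORARY_PATH"] = tmp_dir

    argv = [
        "convert",
        "-limit", "memory", memory,  # cap heap use for the pixel cache
        "-limit", "map", memory,     # cap memory-mapped use for the pixel cache
        inputs_glob,
        output_path,
    ]
    return argv, env
```

The pair can then be handed to `subprocess.run(argv, env=env)`. Returning the command instead of running it keeps the sketch easy to log and inspect on a cluster node.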
snibgo's IM pages: im.snibgo.com
Spacetracker
Posts: 2
Joined: 2018-10-24T06:11:32-07:00

Post by Spacetracker »

Thanks. My convert command is formed in a piece of Python code. Montage creates the ccmm*.png series of images.

convert -layers OptimizeTransparency -sampling-factor 4:2:0 -delay 0 "/****/*****/staging/job525057/1/ccmm*.png" "/***/******/results/job525057/1.mp4"

Additional info:
1. Indeed, I have seen this command fail for lack of memory previously and that was my first thought. But I am giving each command 8GB, or even 16GB, in separate threads on separate cores, whether they are running serially or parallel. Only when they are parallel does the Convert fail.

2. I admit that I don't know what happens when using the -limit settings. Will IM then swap memory to disk when it needs more than the limit? I used separate directories for the TMP and it made no difference. I think some threads are asking for a common resource that is being used by another thread. Maybe in the ImageMagick directory or daemon structure?

3. Because I am spending a lot of time using Montage beforehand, different Convert commands become active at the exact same time, randomly. I already turned on exceptions. How can I pinpoint the error further besides building it with debug? That would be difficult in my environment, especially for ffmpeg. I can't isolate to Convert or ffmpeg, but it seems to point to ffmpeg not seeing the data it expects from Convert.

PS: I have tried it with and without OpenMP for each thread. It does not make a difference, other than there being fewer opportunities for collision on a single node/blade.
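
One way to narrow this down without a debug build might be to record, per job, the exit status and stderr of each command and to flag zero-length outputs. The sketch below assumes nothing about ImageMagick itself; `run_and_check` and its arguments are hypothetical names introduced for illustration.

```python
import os
import subprocess


def run_and_check(argv, output_path, log_path):
    """Run one command, save its stderr, and report whether it left a non-empty output.

    Hypothetical helper: returns True only if the command exited with status 0
    and output_path exists with a size greater than zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True)

    # One log file per job, so parallel jobs do not interleave their messages.
    with open(log_path, "w") as log:
        log.write("command: %r\n" % (argv,))
        log.write("exit status: %d\n" % result.returncode)
        log.write(result.stderr)

    size = os.path.getsize(output_path) if os.path.exists(output_path) else 0
    return result.returncode == 0 and size > 0
```

Comparing the logs of failed and successful jobs that ran at the same time on the same node may show whether the failures cluster around a shared path or a particular moment.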
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Post by snibgo »

You are using IM v7, but with "convert" which uses v6 syntax. I suggest you migrate to v7, and use "magick" (not "convert" or "magick convert"). "magick" has a tighter syntax that doesn't allow image operations such as "-layers OptimizeTransparency" before images are read.

But that's a side issue, with probably no impact on your problem.

Your convert command reads all images /****/*****/staging/job525057/1/ccmm*.png, and makes an mp4. IM will read all those images into memory, then call ffmpeg to do the work. Roughly how many pixels per png, and how many pngs? Each pixel needs 8 bytes of memory.
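
The arithmetic behind that question can be sketched as follows, taking the 8 bytes quoted above as the cost per pixel (an assumption that depends on how IM was built). The function names and the frame dimensions in the usage note are illustrative, not taken from the job in question.

```python
BYTES_PER_PIXEL = 8  # figure quoted in the post; assumed, depends on the IM build


def estimate_pixel_memory(width, height, frame_count, bytes_per_pixel=BYTES_PER_PIXEL):
    """Estimate bytes needed to hold every frame in memory at the same time."""
    return width * height * frame_count * bytes_per_pixel


def to_gib(byte_count):
    """Convert a byte count to GiB (2**30 bytes)."""
    return byte_count / 2**30
```

For example, `estimate_pixel_memory(2000, 2000, 1000)` gives 32,000,000,000 bytes, about 29.8 GiB, which is well above an 8GB or 16GB allocation.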

For small jobs, doing this is reasonable. For production work (eg thousands of images, each with 4 million pixels) it is very wasteful of memory. I don't use IM to extract frames from videos, or pack them into videos. Instead, I invoke ffmpeg directly. That way, all the frames don't need to be in memory at the same time.
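
A hedged sketch of what invoking ffmpeg directly on the PNG series could look like when built from Python. It assumes an ffmpeg build with glob pattern support and the libx264 encoder; the frame rate of 25 and the name `build_ffmpeg_command` are illustrative assumptions, not values from the thread.

```python
def build_ffmpeg_command(frames_glob, output_path, framerate=25):
    """Return an ffmpeg argv that encodes a PNG series straight to H.264.

    Hypothetical helper. ffmpeg reads the frames one at a time, so the whole
    series does not have to sit in memory together.
    """
    return [
        "ffmpeg",
        "-framerate", str(framerate),  # input frame rate (assumed value)
        "-pattern_type", "glob",       # let ffmpeg expand the wildcard itself
        "-i", frames_glob,
        "-c:v", "libx264",             # H.264 encoder, if compiled into ffmpeg
        "-pix_fmt", "yuv420p",         # 4:2:0 chroma subsampling
        output_path,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, the wildcard reaches ffmpeg unexpanded, which is what `-pattern_type glob` expects.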

IM can certainly be used in parallel tasks each with their own inputs and outputs with no problem.

As far as I know, the same is true of ffmpeg.

So I suspect a resource conflict. Either two tasks accidentally writing the same files, or out-of-memory. Less likely, but worth checking: out of temporary disk space.
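
Two of these checks can be sketched before any job is launched: whether two jobs name the same output or temporary path, and how much free space the temporary location has. The job dictionary layout (`output` and `tmp_dir` keys) and the function names are assumptions made for illustration.

```python
import os
import shutil
from collections import Counter


def find_shared_paths(jobs):
    """Return the paths that more than one job would use.

    jobs: list of dicts with hypothetical 'output' and 'tmp_dir' keys.
    """
    counts = Counter()
    for job in jobs:
        counts[os.path.abspath(job["output"])] += 1
        counts[os.path.abspath(job["tmp_dir"])] += 1
    return sorted(path for path, uses in counts.items() if uses > 1)


def free_bytes(path):
    """Return the free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free
```

An empty list from `find_shared_paths` rules out the accidental-overwrite case for the paths the script controls. It says nothing about files the ffmpeg delegate creates on its own.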

Incidentally, a limit on IM's memory usage won't get passed down to ffmpeg.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Post by fmw42 »

I will add one more thing to snibgo's information. IM 6 syntax is forgiving, but IM 7 is not. Your command syntax is in the wrong order, especially for IM 7 and even for IM 6. With raster input images, you should read all the input images (if you decide to read them all) before any other settings or operators. Proper syntax would be:

convert -delay 0 "/****/*****/staging/job525057/1/ccmm*.png" -layers OptimizeTransparency -sampling-factor 4:2:0 "/***/******/results/job525057/1.mp4"

Note, however, that the -delay 0 needs to come before the input images, as it sets the delay when the input is read. Alternatively, you could put -set delay 0 after reading the input.

See https://imagemagick.org/Usage/basics/#why
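
Since the command is assembled in Python, the ordering described above could be kept in one place with a small builder. This is a sketch only; `build_ordered_command` and its `program` parameter are hypothetical names, and the options are the ones from the corrected command.

```python
def build_ordered_command(frames_glob, output_path, program="convert"):
    """Return an argv with the input setting first, then images, then operators.

    Hypothetical helper; pass program="magick" to use the IM 7 entry point.
    """
    return [
        program,
        "-delay", "0",                      # applies to images read after it
        frames_glob,                        # IM expands the wildcard itself
        "-layers", "OptimizeTransparency",  # operator: runs on the images already read
        "-sampling-factor", "4:2:0",
        output_path,
    ]
```

Building the list in a fixed order makes it harder for a later change to the script to move an operator ahead of the input images again.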