I’ve been working on a project that involves processing tens of thousands of tiny files. I’ve written a number of scripts to automate this, but by far the most impressive (or so I thought) was a PHP script that could launch several rendering jobs at the same time.

This pays off because most computers now have several cores per processor, and this particular job was bottlenecked by tools that weren’t optimised to use more than one core at a time. Each job runs this small shell script, which takes a filename and a job number:

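#!/bin/sh
# Arguments are filename & job number
# Stop at the first failing command so a failed crush never overwrites the original.
set -e
echo "Optimising $1 on thread $2"
/usr/bin/mogrify -colors 32 -dither none "$1"
pngcrush "$1" "/tmp/pngcrush$2"
mv "/tmp/pngcrush$2" "$1"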

The solution was to launch two jobs at a time so that they could run side by side and use the full speed of both my cores. (Note: this first version is wrong; the corrected script is further down.)

$processes = array();

foreach($tiles as $tile){

	echo $tile."\n";

	// This is our mogrification/pngcrushing script.
	// Arguments are filename & process number
	$processes[] = popen('/home/ash/Misc/Scripts/tileCrush.sh '.escapeshellarg($tile).' '.count($processes),'r');

	while(count($processes) > 2){

		// Loop through each process
		for($i=0; $i < count($processes) ; $i++){

			// Read the output
			fgets($processes[$i]);

			// Check if the output is finished, if so the process is finished.
			if(feof($processes[$i])){
				pclose($processes[$i]);

				// Remove the process from the array.
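				// Remove the process from the array. (Call-time pass-by-reference
				// such as &$processes is a fatal error in modern PHP.)
				array_splice($processes,$i,1);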

				// This is so we can print lots of numbers. :)
				$completedTile = $i;
			}
		}
	}
}
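What makes the jobs run side by side is that popen() returns as soon as the child process has started; only pclose() waits for it to exit. A minimal sketch of that behaviour, assuming a Unix-like system, with `sleep 1` as a hypothetical stand-in for a real rendering job and runSideBySide() as an illustrative helper name:

```php
<?php
// popen() starts each job and returns immediately, so all jobs overlap.
// pclose() then waits for each job in turn to exit.
function runSideBySide(array $commands): float
{
    $start = microtime(true);
    $handles = [];
    foreach ($commands as $cmd) {
        $handles[] = popen($cmd, 'r'); // starts the job, does not wait
    }
    foreach ($handles as $handle) {
        pclose($handle); // blocks until this job has exited
    }
    return microtime(true) - $start;
}

// Hypothetical stand-in jobs; the real script launches tileCrush.sh instead.
$elapsed = runSideBySide(['sleep 1', 'sleep 1']);
printf("Two 1-second jobs took %.1f seconds\n", $elapsed);
```

This reports roughly one second rather than two. Since `sleep` uses no CPU, it only shows that the jobs overlap; the real speed-up for CPU-bound tools like mogrify depends on having a free core for each job.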

This seemed to work fine, except that a lot of files were failing with a “file not found” error. I couldn’t work out what was going wrong until I looked at the output and saw that the script was getting the files muddled up.

Through a shameful oversight, I was passing a non-unique job ID, count($processes), to the shell script. That number changes as jobs start and finish, so a new job could be handed the same ID as one that was still running. Both jobs then used the same /tmp/pngcrush file, and one would overwrite or move away the other’s work, which is where the missing files came from.
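The collision is easy to reproduce without launching anything. This is a simulation only: a “job” is just the ID it was given, clashingIds() is a hypothetical helper, and it assumes the oldest running job always finishes first, which real processes need not do. It compares handing out count() of the running jobs against handing out the loop index:

```php
<?php
// Simulation only: no processes are launched. Returns every ID that was handed
// to a new job while another still-running job already held it.
function clashingIds(int $jobs, int $limit, bool $useCount): array
{
    $running = []; // IDs held by jobs that are still running
    $clashes = [];
    for ($n = 0; $n < $jobs; $n++) {
        // Buggy scheme: the number of running jobs. Fixed scheme: the loop index.
        $id = $useCount ? count($running) : $n;
        if (in_array($id, $running, true)) {
            $clashes[] = $id;
        }
        $running[] = $id;
        // Same wait condition as the script: wait while more than $limit are running.
        while (count($running) > $limit) {
            array_shift($running); // assume the oldest job finishes first
        }
    }
    return $clashes;
}

echo count(clashingIds(5, 2, true)), "\n";  // 2
echo count(clashingIds(5, 2, false)), "\n"; // 0
```

With count(), the fourth and fifth jobs are both given ID 2 while a job with ID 2 is still running; the loop index never repeats.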

I’ve reworked the script to pass a value that is unique for the whole session: the tile’s index in $tiles. I could have fixed this in the shell script instead, but it was nice to add a progress counter at the same time.

Multi-Process PHP Scripting

Here’s the final script, which launches several jobs, waits for them to finish, and carries on until there are no parameters left.

Feel free to use it for your own nefarious purposes if you’d like.

$tiles: an array of filenames (or other parameters).
$processes: an array holding a handle for each launched process. Handles are created and destroyed at run time.
while(count($processes) > 2): the number in this condition limits how many processes run at once. The script only starts waiting once the count exceeds it, so with 2 up to three jobs are alive at the same moment; use >= instead of > for a strict cap.
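To check what the wait condition actually allows, here is a counter-only simulation of the launch-then-wait pattern (no real processes; peakRunning() is a hypothetical helper). It returns the largest number of jobs alive at once for a given condition:

```php
<?php
// Launch a job, then "finish" jobs while the wait condition holds.
// Returns the peak number of jobs that were alive at the same moment.
function peakRunning(int $jobs, callable $mustWait): int
{
    $running = 0;
    $peak = 0;
    for ($n = 0; $n < $jobs; $n++) {
        $running++;                   // popen() starts a new job
        $peak = max($peak, $running);
        while ($mustWait($running)) {
            $running--;               // one job finishes
        }
    }
    return $peak;
}

echo peakRunning(10, fn (int $c): bool => $c > 2), "\n";  // 3
echo peakRunning(10, fn (int $c): bool => $c >= 2), "\n"; // 2
```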
$processes = array();

$tilesTotal = count($tiles);

foreach($tiles as $tilesDone => $tile){

	// If this is a blank tile, let's just discard it.
	if(md5_file($tile) == '08108e1c241870ecbc5fe798ccd1229d'){
		unlink($tile);
		echo "Blank tile, discarding: $tile\n";
	}else{
		// This is our mogrification/pngcrushing script.
		// Arguments are filename & a job ID ($tilesDone) that is unique for the whole run.
		$process = popen('/home/ash/Misc/Scripts/tileCrush.sh '.escapeshellarg($tile).' '.$tilesDone,'r');
		// Non-blocking reads let the loop below poll every job instead of stalling on one.
		stream_set_blocking($process, false);
		$processes[] = $process;

		while(count($processes) > 2){
			usleep(50000); // Sleep for 0.05 seconds.

			// Loop through each process, backwards so removing one doesn't skip the next.
			for($i = count($processes) - 1; $i >= 0; $i--){

				// Drain the output. Note, fgets reads only one line and performs terribly here.
				fread($processes[$i],1024);

				// Once the output reaches EOF, the process is finished.
				if(feof($processes[$i])){
					pclose($processes[$i]);

					// Remove the process from the array.
					array_splice($processes,$i,1);
				}
			}
		}

		echo ($tilesDone + 1)."/$tilesTotal: $tile\n";
	}
}

// Wait for the jobs still running after the last tile has been launched.
foreach($processes as $process){
	stream_set_blocking($process, true);
	stream_get_contents($process); // read until the job closes its output
	pclose($process);
}
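The script above polls: it wakes every 50 ms and checks each pipe. A sketch of an alternative, assuming PHP 7.4 or later on a Unix-like system, swaps popen() for proc_open() and lets stream_select() put the script to sleep until some job prints output or exits. It never has more than $maxJobs processes alive, and each job’s array index is unique for the whole run. runPool() and the echo/sleep commands are hypothetical stand-ins, not the tileCrush.sh setup:

```php
<?php
// Runs every command, with at most $maxJobs processes alive at once.
// Returns each job's stdout, keyed by the job's index in $commands.
function runPool(array $commands, int $maxJobs): array
{
    if ($maxJobs < 1) {
        throw new InvalidArgumentException('maxJobs must be at least 1');
    }
    $commands = array_values($commands);
    $next = 0;       // index of the next command to start
    $running = [];   // job index => [process resource, stdout pipe]
    $output = [];    // job index => everything the job printed

    while ($next < count($commands) || $running) {
        // Fill free slots. $next is unique for the whole run, so a caller
        // could safely build it into temporary file names.
        while ($next < count($commands) && count($running) < $maxJobs) {
            $proc = proc_open($commands[$next], [1 => ['pipe', 'w']], $pipes);
            if (!is_resource($proc)) {
                throw new RuntimeException("Could not start: {$commands[$next]}");
            }
            $running[$next] = [$proc, $pipes[1]];
            $output[$next] = '';
            $next++;
        }

        // Sleep until at least one pipe has data or has reached EOF.
        $read = array_column($running, 1);
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, null) === false) {
            throw new RuntimeException('stream_select failed');
        }

        foreach ($running as $index => [$proc, $pipe]) {
            if (!in_array($pipe, $read, true)) {
                continue;
            }
            $output[$index] .= fread($pipe, 8192);
            if (feof($pipe)) {
                // The job closed its output: reap it and free the slot.
                fclose($pipe);
                proc_close($proc);
                unset($running[$index]);
            }
        }
    }
    ksort($output);
    return $output;
}

// Stand-in commands; a real run would call the tile script for each file.
$results = runPool(['echo one', 'sleep 0.2; echo two', 'echo three'], 2);
print_r($results);
```

The trade-off is a little more bookkeeping in exchange for no fixed sleep interval: a finished job is noticed as soon as it exits, and a new one starts in its place straight away.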

I used the forum post “Multithreaded PHP” as a reference for this project.

