Thursday, 12 September 2013

multiprocessing.Pool with a global variable

I am using the Pool class from Python's multiprocessing library to do some
shared-memory processing on an HPC cluster.
Here is an abstraction of what I am trying to do:
    from multiprocessing import Pool

    def myFunction(x):
        # the object is a global variable in this case
        return myFunction2(x, object)

    def myFunction2(x, object):
        return object.f(x)

    # the worker functions are defined before the pool is created
    poolVar = Pool(processes=numThreads)
    argsArray = [ARGS ARRAY GOES HERE]
    output = poolVar.map(myFunction, argsArray)
The problem is that the value of output differs each time I run my program,
even though object.f() is a deterministic function. If numThreads = 1, output
is the same on every run. When numThreads > 1, output changes between runs,
but not drastically.
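One way to narrow this down is to compare a parallel run against a serial run, element by element. The following is only a minimal sketch. The body of myFunction is a hypothetical stand-in (the real one would call object.f(x)), differing_indices is a helper introduced here for illustration, and the values of numThreads and argsArray are assumed.

```python
from multiprocessing import Pool


def myFunction(x):
    # Hypothetical stand-in; the real function would call object.f(x).
    return x * x


def differing_indices(a, b):
    """Return the positions at which two equal-length result lists disagree."""
    return [i for i, (u, v) in enumerate(zip(a, b)) if u != v]


if __name__ == "__main__":
    numThreads = 4               # assumed value
    argsArray = list(range(20))  # placeholder arguments

    # Reference result computed in the parent process only.
    serial = [myFunction(x) for x in argsArray]

    with Pool(processes=numThreads) as poolVar:
        parallel = poolVar.map(myFunction, argsArray)

    # Pool.map returns results in input order, so positions line up.
    print(differing_indices(serial, parallel))
```

If the reported positions change from run to run, that would suggest the result depends on which worker handled which argument, but this is a diagnostic under the stated assumptions and not a confirmed explanation.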
I have tried creating the object rather than storing it as a global variable:
    def myFunction(x):
        object = createObject()
        return myFunction2(x, object)
However, in my program the object creation is expensive, so I would
like to avoid creating the object on every call.
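A minimal sketch of one way to build the object once per worker process rather than once per call uses the initializer argument of Pool. ExpensiveObject, init_worker and _worker_object are hypothetical names introduced for illustration. The real class and createObject() are not shown in this post, so simple stand-ins are used here.

```python
from multiprocessing import Pool


class ExpensiveObject:
    """Hypothetical stand-in for the real object, whose class is not shown."""

    def __init__(self):
        self.scale = 3  # pretend this took a long time to compute

    def f(self, x):
        return self.scale * x


def createObject():
    # Stand-in for the expensive construction step.
    return ExpensiveObject()


_worker_object = None  # set once in each worker process by init_worker


def init_worker():
    """Runs once when a worker process starts, not once per task."""
    global _worker_object
    _worker_object = createObject()


def myFunction(x):
    # Uses the object that belongs to this worker process.
    return _worker_object.f(x)


if __name__ == "__main__":
    numThreads = 4               # assumed value
    argsArray = list(range(10))  # placeholder arguments
    with Pool(processes=numThreads, initializer=init_worker) as poolVar:
        output = poolVar.map(myFunction, argsArray)
    print(output)
```

With this layout the construction cost is paid numThreads times in total. Whether it also removes the run-to-run differences depends on what object.f() does internally, which is not shown here.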
Do you have any tips? I am very new to parallel programming so I could be
going about this all wrong. I decided to use the Pool class since I wanted
to start with something simple. But I am willing to try a better way of
doing it.
