I have read every blog post and thread I can find on multiprocessing with arcpy and none of the fixes in them have fully addressed my problem.
I'm trying to do a relatively simple watershed calculation using multiprocessing.
The 'worker' function looks like this:
Code:
import os
import tempfile

import arcpy
from arcpy import sa

def multi_watershed(pnts, branchID, flowdir, flowacc, scratchWks):
    # When called in a parallel process, each call needs to write to a separate directory
    direc = tempfile.mkdtemp(dir=scratchWks)
    arcpy.env.scratchWorkspace = direc
    polylist = []
    for i, p in enumerate(pnts):
        pnt = arcpy.PointGeometry(arcpy.Point(p.x, p.y, ID=i))  # Convert the shapely point to an arcpy point
        pourpt = sa.SnapPourPoint(pnt, flowacc, 1000)
        ws = sa.Watershed(flowdir, pourpt)
        out = os.path.join(direc, "pol_%i" % i)  # Generate a filename for the output polygon
        arcpy.RasterToPolygon_conversion(ws, out)
        polylist.append(out)  # Append the output file to the list to be returned
    res = (branchID, polylist)
    return res
Given a list of points, it snaps each point to high flow accumulation, calculates the watershed and converts it to a polygon.
Using one process, this works fine.
I have a dictionary where each value is a list of points, and I am trying to do the multiprocessing over this dictionary. The multiprocessing function looks like this:
Code:
from multiprocessing import Pool

def watershed_pll(data, flowdir, flowacc, tempfolder, proc=4):
    """Calculate the watershed for each station point using parallel processing."""
    pool = Pool(processes=proc)
    jobs = []
    for key, val in data.items():
        # One asynchronous job per dictionary entry
        jobs.append(pool.apply_async(multi_watershed, (val, key, flowdir, flowacc, tempfolder)))
    pool.close()
    pool.join()
    return jobs
It is as simple as can be and just returns the list of 'Apply_Result' objects. I then run this function from a script.
When using multiprocessing, sometimes it works, but more often than not I get one of these errors:
ERROR 010088: Invalid input geodataset (Layer, Tin, etc.).
or
Unable to remove directory. Possible causes:
1- Not owner of the directory
2- Another person or application is accessing this directory
or even
FATAL ERROR(INFADI)
MISSING FILE OR DIRECTORY
There seems to be no pattern to whether or when these errors occur, or to which one appears.
Any ideas?
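For reference, each 'Apply_Result' object holds either the worker's return value or the exception it raised, and calling .get() on it retrieves that value or re-raises the exception. Below is a minimal sketch of unpacking such a list so that the failure from each branch is visible. The names `collect_results`, `fake_worker` and `demo` are hypothetical and not part of the script above. A thread pool with a stand-in worker replaces the arcpy calls only so that the sketch runs without ArcGIS.

```python
from multiprocessing.pool import ThreadPool


def collect_results(jobs):
    """Split finished async jobs into successes and failures.

    `jobs` is a list of objects with a blocking .get() method, such as the
    objects returned by apply_async.
    """
    results = {}
    failures = []
    for job in jobs:
        try:
            # .get() returns the worker's result or re-raises its exception
            branch_id, polygons = job.get()
        except Exception as exc:
            failures.append(exc)
        else:
            results[branch_id] = polygons
    return results, failures


def fake_worker(branch_id, n):
    """Hypothetical stand-in for multi_watershed; no arcpy involved."""
    if n < 0:
        raise ValueError("bad input for branch %s" % branch_id)
    return branch_id, ["pol_%i" % i for i in range(n)]


def demo():
    # A thread pool is an assumption made for portability; its apply_async
    # returns the same kind of result object as a process pool's.
    pool = ThreadPool(2)
    jobs = [pool.apply_async(fake_worker, (key, n))
            for key, n in [("a", 2), ("b", -1), ("c", 1)]]
    pool.close()
    pool.join()
    return collect_results(jobs)
```

With the real functions, the same helper would be applied to the list returned by watershed_pll, which shows which branch IDs succeeded and which exception each failed job raised.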