First, let us see how map works internally.
map(func, iterable[, chunksize])
It is the parallel equivalent of the built-in map function. It applies the function to every item of the iterable and returns the results as a list. The chunksize parameter causes the iterable to be split into pieces of approximately that size, and each piece is submitted as a separate task.
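If you provide the chunk size, map converts the iterable to a list (unless it already has a known length), divides it into chunks of that size, and submits these chunks to the worker processes. If you do not provide it, map picks a chunk size itself based on the length of the iterable and the number of processes in the pool.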
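The splitting step described above can be sketched in a few lines. This is only an illustration of the idea, not the library's own code: split_into_chunks is a hypothetical helper that does not exist in multiprocessing, and it assumes that the last chunk may simply be shorter when the items do not divide evenly.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    """Hypothetical helper: yield tuples of at most `chunksize` items."""
    it = iter(iterable)
    while True:
        # Take the next `chunksize` items; fewer if the iterable runs out.
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


tasks = list(range(20))
chunks = list(split_into_chunks(tasks, 4))
for number, chunk in enumerate(chunks, start=1):
    print("Chunk", number, "->", chunk)
```

With 20 items and a chunk size of 4 this gives 5 chunks of 4 items each. With 10 items it would give chunks of 4, 4 and 2 items, which is why the documentation says "approximately that size".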
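Here is an example that uses map with a chunk size:

    from multiprocessing import Pool
    from time import sleep
    import os


    def process(task):
        # Each task sleeps for `task` seconds to simulate work of varying length.
        print("Started task ", task, " PID :", os.getpid())
        sleep(task)
        return str(task) + " Finished"


    if __name__ == "__main__":
        myPool = Pool(5)  # pool of 5 worker processes
        tasks = []
        for i in range(20):
            tasks.append(i)
        print("Submitted tasks to pool")
        # map blocks until all 20 tasks are done; the chunk size is 4.
        results = myPool.map(process, tasks, 4)
        print("Got the results")
        for result in results:
            print(result)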
Output
    Submitted tasks to pool
    Started task 0 PID : 91427
    Started task 1 PID : 91427
    Started task 4 PID : 91428
    Started task 8 PID : 91429
    Started task 12 PID : 91430
    Started task 16 PID : 91431
    Started task 2 PID : 91427
    Started task 3 PID : 91427
    Started task 5 PID : 91428
    Started task 9 PID : 91429
    Started task 6 PID : 91428
    Started task 13 PID : 91430
    Started task 7 PID : 91428
    Started task 17 PID : 91431
    Started task 10 PID : 91429
    Started task 14 PID : 91430
    Started task 11 PID : 91429
    Started task 18 PID : 91431
    Started task 15 PID : 91430
    Started task 19 PID : 91431
    Got the results
    0 Finished
    1 Finished
    2 Finished
    3 Finished
    4 Finished
    5 Finished
    6 Finished
    7 Finished
    8 Finished
    9 Finished
    10 Finished
    11 Finished
    12 Finished
    13 Finished
    14 Finished
    15 Finished
    16 Finished
    17 Finished
    18 Finished
    19 Finished
Main problems with map
a. First, map has to split the iterable into chunks, so it loads the entire iterable into memory and converts it to a list. If the iterable is large, this can have a very high memory cost, since the entire list needs to be kept in memory.
b. You get the results only after all the tasks have finished execution. There are no partial results.
c. Another problem is that processes which finish their tasks early sit idle, which hurts performance. In our case each task sleeps for as many seconds as its task number, so Process 1 finishes tasks 0, 1, 2, 3 much earlier than Process 4 finishes its chunk. Process 1 then sits idle after completing its 4 tasks (0, 1, 2, 3). In this kind of scenario, we are not using the available processors effectively.
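Problems a and b can be observed with a small experiment. The sketch below makes a few assumptions: it uses ThreadPool from multiprocessing.pool as a stand-in for Pool, because it offers the same map method and lets the worker read a shared list without pickling, and the names numbers, work and consumed are made up for this illustration. It relies on the behavior that map turns an iterable without a length into a list before it submits anything.

```python
import time
from multiprocessing.pool import ThreadPool

consumed = []  # items pulled from the generator so far


def numbers(n):
    """Hypothetical generator standing in for a large, lazy iterable."""
    for i in range(n):
        consumed.append(i)
        yield i


def work(x):
    # Only the last task is slow, so every other result is ready early.
    time.sleep(0.3 if x == 7 else 0.0)
    # Report how many items were already consumed and when this task finished.
    return len(consumed), time.perf_counter()


with ThreadPool(4) as pool:
    results = pool.map(work, numbers(8), 2)
returned_at = time.perf_counter()

seen_counts = [count for count, _ in results]
waited = [returned_at - finished for _, finished in results]

print("Items consumed before each task ran:", seen_counts)
print("Seconds the first result waited for map to return:", round(waited[0], 2))
```

Every task reports that all 8 items had already been pulled from the generator, which is problem a. The first result was ready almost immediately but was handed back only after the slow task finished, which is problem b.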
Please go through the code once more. I defined a pool of 5 processes and a chunk size of 4. When I submitted 20 tasks, map divided these 20 tasks into 5 chunks of 4 tasks each:
Process 1 gets the chunk with tasks 0, 1, 2, 3
Process 2 gets the chunk with tasks 4, 5, 6, 7
Process 3 gets the chunk with tasks 8, 9, 10, 11
Process 4 gets the chunk with tasks 12, 13, 14, 15
Process 5 gets the chunk with tasks 16, 17, 18, 19
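To put rough numbers on the idle time, here is a back-of-the-envelope calculation based on the assignment above. It assumes that each process handles exactly one chunk, that a task takes exactly as many seconds as its number (as with sleep(task) in the example), and that startup and scheduling overhead can be ignored. The variable names are only for illustration.

```python
tasks = list(range(20))
chunksize = 4

# One chunk per process, as in the assignment listed above.
chunks = [tasks[i:i + chunksize] for i in range(0, len(tasks), chunksize)]

# A chunk keeps its process busy for the sum of its task durations.
busy = [sum(chunk) for chunk in chunks]
# The map call is done only when the slowest chunk is done.
idle = [max(busy) - seconds for seconds in busy]

for number, (b, i) in enumerate(zip(busy, idle), start=1):
    print(f"Process {number}: busy {b}s, idle {i}s")

# Prints:
# Process 1: busy 6s, idle 64s
# Process 2: busy 22s, idle 48s
# Process 3: busy 38s, idle 32s
# Process 4: busy 54s, idle 16s
# Process 5: busy 70s, idle 0s
```

Under these assumptions the five processes are idle for 160 of the 350 process-seconds that the map call takes, so almost half of the available capacity goes unused.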