Nov 13

I recently got my hands on Python while working at MusicMetric. I am a Python newbie and want to write down what I have learned so that I can review it any time (in a way I can understand quickly); it might also help other people who are new to Python.

First, I want to answer the question “why multi-processing rather than multi-threading?”. The reason is that the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time, so multi-threading in Python is not truly parallel and is much less efficient for CPU-bound work.

Thanks to the multiprocessing module (a process-based “threading” interface), which is available in Python 2.6+, multi-processing in Python is now easier than ever.
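To make the difference concrete, here is a minimal sketch that runs the same CPU-bound function first in threads and then in processes, using the near-identical Thread and Process interfaces. The function names, the worker count and the input size are my own illustrative choices, and the code is written to run on both Python 2.6+ and Python 3. On a multi-core machine with a standard CPython build I would expect the process version to finish sooner, but the timings depend on the machine, so none are shown.

```python
import multiprocessing
import threading
import time


def sum_of_squares(n):
    # CPU-bound work: pure Python arithmetic, which holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_all(worker_class, n_workers, n):
    # Thread and Process accept the same target/args arguments and
    # both provide start() and join(), so one helper can drive either.
    workers = [worker_class(target=sum_of_squares, args=(n,))
               for _ in range(n_workers)]
    started = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - started


if __name__ == "__main__":
    n_workers, n = 4, 2000000  # illustrative values, not tuned
    print("threads:   %.2f s" % run_all(threading.Thread, n_workers, n))
    print("processes: %.2f s" % run_all(multiprocessing.Process, n_workers, n))
```

Note that the results of sum_of_squares are thrown away here; the sketch only measures elapsed time.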

1. Using Pool - a quick approach:

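import multiprocessing

# Number of worker processes; one per CPU core is an arbitrary choice
N_PROCESSES = multiprocessing.cpu_count()

def testing(arg):
    # CPU-bound work: sum of the squares of 0 .. arg-1
    x = 0
    for i in xrange(arg):
        x += i * i
    return x

if __name__ == "__main__":
    pool = multiprocessing.Pool(N_PROCESSES)
    print "processing..."
    # map() blocks until every input has been processed and returns a list.
    # Each call loops arg times, so the input range is kept modest here.
    results = pool.map(testing, range(2000))
    print results
    # No more tasks will be submitted; wait for the workers to exit
    pool.close()
    pool.join()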
  • Main advantages:

    • Quick implementation
  • Main disadvantages:

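    • Inefficient memory usage: map() turns its whole input into a list and holds every result in memory until all of them are ready
    • Inflexible: every worker runs the same function over a single iterable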
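One way to soften the memory cost, as far as I understand it, is Pool.imap() with a chunksize: it returns an iterator, so results can be consumed one at a time instead of being collected into one big list. The sketch below is only an illustration under that assumption; the function name, the 4 workers, the input range and the chunk size are all made up for the example, and it runs on both Python 2.6+ and Python 3.

```python
import multiprocessing


def sum_of_squares(n):
    # Same kind of CPU-bound work as in the Pool example.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    pool = multiprocessing.Pool(4)  # 4 workers: an illustrative choice
    grand_total = 0
    # imap() yields results in input order as they become available;
    # chunksize sends the inputs to the workers in batches of 100.
    for value in pool.imap(sum_of_squares, range(2000), chunksize=100):
        grand_total += value  # only a running total is kept
    print(grand_total)
    pool.close()
    pool.join()
```

If the order of the results does not matter, Pool.imap_unordered() takes the same arguments.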

2. Using Process:

import multiprocessing
import time

def event_func(event):
    # Each child blocks here until the parent sets the event
    print '\t%r is waiting' % multiprocessing.current_process()
    event.wait()
    print '\t%r has woken up' % multiprocessing.current_process()

def test_event():
    event = multiprocessing.Event()

    processes = [multiprocessing.Process(target=event_func, args=(event,))
                 for i in range(5)]

    for p in processes:
        p.start()

    print 'main is sleeping'
    time.sleep(2)

    print 'main is setting event'
    event.set()

    # wait until all processes stop
    for p in processes:
        p.join()

if __name__ == "__main__":
    test_event()

Advantages:

  • Flexible
  • More efficient in memory usage than Pool
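As a small illustration of the flexibility, here is a sketch in which each Process gets its own target function, arguments and name, and the parent inspects the exit codes afterwards. The worker functions and their inputs are hypothetical, chosen only for the example; the code runs on both Python 2.6+ and Python 3.

```python
import multiprocessing


def count_words(text):
    # Plain helper so the real work is easy to check on its own.
    return len(text.split())


def word_worker(text):
    name = multiprocessing.current_process().name
    print("%s counted %d words" % (name, count_words(text)))


def square_worker(n):
    name = multiprocessing.current_process().name
    print("%s squared %d to get %d" % (name, n, n * n))


if __name__ == "__main__":
    # Unlike Pool.map(), every process can run a different function.
    jobs = [
        multiprocessing.Process(target=word_worker,
                                args=("to be or not to be",), name="words"),
        multiprocessing.Process(target=square_worker,
                                args=(12,), name="square"),
    ]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    for p in jobs:
        # exitcode is 0 when the target returned normally
        print("%s exited with code %r" % (p.name, p.exitcode))
```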

3. Sharing data between processes:

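Using multiprocessing.Pipe() and multiprocessing.Queue():

- Pipe() is used for one-to-one communication: it returns a pair of connection objects, one for each end.
- Queue() is used for many-to-many communication: any number of processes can put items on it and get items from it.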
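Here is a minimal sketch of both: a Pipe between the parent and one child, then a pair of Queues shared by three workers. The worker functions, the None stop marker and the numbers are my own assumptions for the example, and the code runs on both Python 2.6+ and Python 3. The results are read from the queue before the workers are joined.

```python
import multiprocessing


def pipe_worker(conn):
    # One end of a Pipe: receive a number, send back its square.
    n = conn.recv()
    conn.send(n * n)
    conn.close()


def queue_worker(tasks, results):
    # Several workers can read from the same task queue.
    # None is used here as a "no more work" marker.
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put((n, n * n))


if __name__ == "__main__":
    # Pipe: one parent talking to one child
    parent_conn, child_conn = multiprocessing.Pipe()
    child = multiprocessing.Process(target=pipe_worker, args=(child_conn,))
    child.start()
    parent_conn.send(7)
    print(parent_conn.recv())
    child.join()

    # Queue: one producer (the parent) and three consumers
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=queue_worker,
                                       args=(tasks, results))
               for _ in range(3)]
    for w in workers:
        w.start()
    for n in range(6):
        tasks.put(n)
    for _ in workers:
        tasks.put(None)  # one stop marker per worker
    # Collect every result before joining the workers
    squares = sorted(results.get() for _ in range(6))
    for w in workers:
        w.join()
    print(squares)
```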
Note:

  • If a process is killed with Process.terminate() or os.kill() while it is using a Queue, the data in the queue is likely to become corrupted, and other processes may get an exception when they try to use the queue later.
  • A process that has put items on a Queue won’t terminate until all buffered items have been flushed to the underlying pipe. If you join such a process, you are likely to get a deadlock unless you are sure that all items put on the queue have been consumed. Use multiprocessing.Manager().Queue() instead.
  • Memory leaks (I don’t know why) - use multiprocessing.Manager().Queue() instead.
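The join deadlock can be avoided in two ways, sketched below: either read everything from a plain Queue before calling join(), or use a queue created by a Manager, which lives in a separate server process so the producer does not keep data buffered. The producer function, the item count and the item size are hypothetical values picked so that the data is too large to sit in a pipe buffer; the code runs on both Python 2.6+ and Python 3.

```python
import multiprocessing


def make_payload(n_items, item_size):
    # Build a list of n_items strings, each item_size characters long.
    return ["x" * item_size for _ in range(n_items)]


def producer(queue, n_items, item_size):
    for item in make_payload(n_items, item_size):
        queue.put(item)


if __name__ == "__main__":
    n_items, item_size = 100, 100000  # illustrative sizes

    # Option 1: plain Queue, consume everything BEFORE joining.
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer,
                                args=(q, n_items, item_size))
    p.start()
    received = [q.get() for _ in range(n_items)]
    p.join()  # nothing is left buffered in the producer
    print(len(received))

    # Option 2: Manager queue, joining before reading.
    manager = multiprocessing.Manager()
    mq = manager.Queue()
    p = multiprocessing.Process(target=producer,
                                args=(mq, n_items, item_size))
    p.start()
    p.join()  # each put() has already been handed to the manager process
    print(mq.qsize())
    manager.shutdown()
```

The trade-off, as far as I can tell, is speed: every operation on a Manager queue goes through the manager process, so it is slower than a plain Queue.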

Even God makes mistakes, so please let me know about any mistakes I have made.
