Parallel Python
Interactive parallel computing
Research Computing @ CU Boulder

Why Python?
• Relatively easy to learn
• Interactive
• Cross-platform
  • Linux
  • Windows
  • OS X
• Large number of packages
  • NumPy – numerical computation
  • Matplotlib – plotting
  • IPython – interactive shell

IPython
• Supports different styles of parallelism
  • Single program, multiple data (SPMD)
  • Multiple program, multiple data (MPMD)
  • Message passing using MPI
  • Task farming
  • Combinations of the above approaches
  • Custom user-defined approaches
• Interactive approach to
  • Development
  • Execution
  • Debugging
  • Monitoring

Overview
• How it works
• Direct interface
  • Executing remote commands
  • Data management
  • Blocking and non-blocking execution
• Task interface
  • Comparison with pbsdsh
  • Simple, robust, and load-balanced

How it works

Architecture
• Engine
  • Takes Python commands over a network connection
  • One engine per core
  • Blocks while code runs; asynchronous behavior comes from the controller's API
• Controller / client
  • Interface for working with a set of engines
  • Composed of a Hub and a collection of schedulers
• View objects represent a subset of engines
  • A Direct interface, where engines are addressed explicitly
  • A LoadBalanced interface, where the scheduler is trusted with assigning work to appropriate engines

Parallel architecture
• Hub
  • Keeps track of engine connections, schedulers, and clients
  • Holds cluster state
• Scheduler
  • All actions performed on an engine go through a scheduler

Start your engines
• Simplest method
    ipcluster start --n=4
• MPIEXEC/MPIRUN mode
• SSH mode
• PBS mode for batch processing
• Other methods
  • ipcontroller
  • ipengine
• Don't forget to turn off your engines!

JANUS
• Load the required dotkits and create an MPI-enabled profile:
    use .epd-7.1-2
    use .openmpi-1.4.3_gcc-4.5.2_torque-2.5.8_ib
    ipython profile create --parallel --profile=mpi
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/ipython_config.py'
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/ipython_qtconsole_config.py'
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/ipcontroller_config.py'
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/ipengine_config.py'
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/ipcluster_config.py'
    Generating default config file: u'/home/molu8455/.ipython/profile_mpi/iplogger_config.py'
• Edit the file ipcluster_config.py:
    c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'

Steps
• Request a resource with qsub (interactive or batch)
• Start the engines
    ipcluster start --profile=mpi &
• Do the parallel Python work ...
• Stop the engines
    ipcluster stop --profile=mpi
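Once the engines are started, it is worth confirming that they have all registered with the controller before submitting work. The following is a minimal sketch, assuming the mpi profile created above and the 2-node, 12-cores-per-node allocation used in the example below (so 24 engines are expected; adjust for your own job):

    from IPython.parallel import Client

    # Connect to the controller started by: ipcluster start --profile=mpi
    rc = Client(profile='mpi')

    # One id per registered engine; with nodes=2:ppn=12 there should be 24
    print len(rc.ids)

If the count is lower than expected, the engines may still be registering; this is why the batch example below sleeps for 90 seconds between starting the cluster and running the work script.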
Example
    #PBS -N example
    #PBS -j oe
    #PBS -l walltime=00:45:00
    #PBS -l nodes=2:ppn=12
    #PBS -q janus-debug

    . /curc/tools/utils/dkinit
    reuse .epd-7.1-2
    reuse .openmpi-1.4.3_intel-12.0_ib

    cd $PBS_O_WORKDIR
    ipcluster start --profile=mpi &
    sleep 90
    python work.py
    ipcluster stop --profile=mpi

A few simple concepts
• The client: a lightweight handle on all engines of a cluster
    from IPython.parallel import Client
    rc = Client(profile='mpi')
• Views "slice" the client with specific execution semantics
  • DirectView: direct execution on all engines
      dview = rc.direct_view()
  • LoadBalancedView: run on any one available engine
      lview = rc.load_balanced_view()

Direct interface
• The capabilities of each engine are directly and explicitly exposed to the user
• Blocking and non-blocking execution
  • AsyncResult
• Remote execution
  • apply
  • execute
  • map
• Remote function decorators
• Managing data
  • push, pull
  • scatter, gather

Blocking vs. non-blocking
• Blocking: waits until all engines are done executing the command
    dview.block = True
• Non-blocking: returns an AsyncResult immediately
  • Get the data later with get
  • Check the status with ready
  • Wait with wait
    dview.block = False

Apply example
• Serial
    import os
    pid = os.getpid()
    print pid
• Parallel
    with dview.sync_imports():
        import os

    # blocking
    dview.block = True
    pids = dview.apply(os.getpid)

    # non-blocking
    dview.block = False
    ar = dview.apply(os.getpid)
    dview.wait(ar)
    pids = ar.get()

Apply
• For convenience
  • apply_sync (block=True)
  • apply_async (block=False)

    def wait(t):
        import time
        tic = time.time()
        time.sleep(t)
        return time.time() - tic

    ar = dview.apply_async(wait, 1)
    if ar.ready():
        print "ready"
    else:
        print "not ready"
    dview.wait(ar)

Execute
• Commands can be executed as strings, on all engines or on specific ones
    dview['a'] = 10
    dview['b'] = 20
    dview.execute('c=a*b')
    rc[1].execute('c=a+b')
    print dview['c']
    [200, 30, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]

Map
• Applies a function to a sequence, element by element
• Does not do dynamic load balancing
• map_sync, map_async

    def fsquare(x):
        return x**2

    xvals = range(32)

    # serial
    s_result = map(fsquare, xvals)

    # parallel
    dview.block = True
    p_result = dview.map(fsquare, xvals)

Push and pull
• push is limited to dictionaries (names mapped to values)
    dview.block = True

    values = dict(a=1.03234, b=3453)
    dview.push(values)
    a_values = dview.pull('a')
    print a_values

    dview.push(dict(c='speed'))
    print dview.pull('c')

    # dictionary-style access is equivalent
    dview['c'] = 'speed'
    print dview.pull('c')

Scatter and gather
• Partition a sequence across the engines with push/pull semantics
    dview.scatter('a', range(24))
    print dview['a']
    all_a = dview.gather('a')
    print all_a

    # use scatter/gather around a remote computation
    # (assumes fsquare has been defined on the engines)
    dview.scatter('x', range(64))
    dview.execute('y = map(fsquare, x)')
    y = dview.gather('y')
    print y

Task interface
• Presents the engines as a fault-tolerant, dynamically load-balanced system of workers
• No direct access to individual engines
• Simple and powerful
    lview = rc.load_balanced_view()
• Components
  • map
  • Parallel function decorators (see the sketch below)
  • Dependencies
    • functional
    • graph
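The direct and task interfaces both list function decorators without showing them. The sketch below is a minimal illustration, assuming the rc, dview, and lview handles created earlier; the hostname and double functions are placeholders:

    # Remote function: each call runs once on every engine in the direct view
    @dview.remote(block=True)
    def hostname():
        import socket
        return socket.gethostname()

    print hostname()              # one result per engine

    # Parallel function: map-style calls are chunked and handed to the
    # load-balanced scheduler
    @lview.parallel(block=True)
    def double(x):
        return 2 * x

    print double.map(range(32))

double.map behaves much like dview.map above, except that the chunks are assigned to whichever engines are free rather than being distributed statically.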
Example: parameter study with pbsdsh
• Submission script
    #!/bin/bash
    #PBS -N example_mn_1
    #PBS -q janus-debug
    #PBS -l walltime=0:01:30
    #PBS -l nodes=2:ppn=12

    cd $PBS_O_WORKDIR
    pbsdsh $PBS_O_WORKDIR/wrapper.sh 0
• wrapper.sh, run once per virtual processor
    #!/bin/bash
    PATH=$PBS_O_WORKDIR:$PBS_O_PATH
    TRIAL=$(($PBS_VNODENUM + $1))
    simulator $TRIAL > $PBS_O_WORKDIR/sim.$TRIAL
• Running more trials than cores by calling pbsdsh in a loop
    #!/bin/bash
    #PBS -N example_mn_3
    #PBS -q janus-debug
    #PBS -l walltime=0:01:30
    #PBS -l nodes=2:ppn=12

    cd $PBS_O_WORKDIR
    count=0
    np=`wc -l < $PBS_NODEFILE`
    for i in {1..5}
    do
        pbsdsh $PBS_O_WORKDIR/wrapper.sh $count
        count=$(( $count + $np ))
    done

Problems
• A core that fails to run its task spoils the sweep
• No load balancing: each pbsdsh pass waits for its slowest trial

[Figures: "Weaknesses of batch" and "Load balancing" – per-core task timelines, time on the horizontal axis]

simulator.py
    def simulator(x):
        # import inside the function so it is available on the engines
        import subprocess
        cmd = "./simulator " + str(x) + " 2 4 > trial." + str(x)
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        (out, err) = p.communicate()
        status = p.returncode
        return "%s, %s" % (out, err)

    print simulator(1)

    from IPython.parallel import Client
    rc = Client(profile='mpi')
    lview = rc.load_balanced_view()

    r = lview.map_async(simulator, range(100))
    print r.get()

Batch script
    #!/bin/bash
    #PBS -N example_mn_3
    #PBS -q janus-debug
    #PBS -l walltime=0:01:30
    #PBS -l nodes=2:ppn=12

    cd $PBS_O_WORKDIR
    ipcluster start --profile=mpi &
    sleep 90
    python simulator.py
    ipcluster stop --profile=mpi

Interactive MPI
• Create MPI functions
    def psum(a):
        s = np.sum(a)
        rcvBuf = np.array(0.0, 'd')
        MPI.COMM_WORLD.Allreduce([s, MPI.DOUBLE],
                                 [rcvBuf, MPI.DOUBLE],
                                 op=MPI.SUM)
        return rcvBuf
• Call interactively from the direct view (see the sketch below)
• Not currently available on JANUS as a dotkit
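The slide above defines psum on the engines but does not show the interactive call. The following is a minimal sketch of driving it from the direct view; it assumes mpi4py and NumPy are importable on the engines (per the slide, an mpi4py dotkit is not yet available on JANUS) and that a hypothetical psum.py file holds the function from the slide together with its "import numpy as np" and "from mpi4py import MPI" lines:

    from IPython.parallel import Client
    import numpy as np

    rc = Client(profile='mpi')
    dview = rc[:]
    dview.block = True

    # Define psum (and its imports) in the namespace of every engine
    dview.run('psum.py')

    # Partition an array across the engines, reduce, and collect
    dview.scatter('a', np.arange(16, dtype='float'))
    dview.execute('total = psum(a)')
    print dview['total']          # the same global sum reported by each engine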
Conclusions
• Direct interface
  • Quickly parallelize Python code
  • Less control than MPI
• Task interface
  • Fault-tolerant, load-balanced, simple
• Interactive MPI
  • Great for developing and debugging
• Tutorials
  • http://ipython.org/ipython-doc/dev
  • http://minrk.github.com/scipy-tutorial-2011