Parallel Computing

DM818, Autumn 2011

Daniel Merkle

DM818 Mandatory Assignment 2: Parallelize Particle Simulation

Overview

The purpose of this assignment is introduction to programming in shared and distributed memory models. (Particle Simulations will also be covered in the lecture).

Your goal is to parallelize a toy particle simulator (similar particle simulators are used in mechanics, biology, astronomy, etc.) that reproduces the behaviour shown in the following animation:

The range of interaction forces is limited as shown in grey for a selected particle. Density is set sufficiently low so that given n particles, only O(n) interactions are expected.

Suppose we have a code that runs in time T = O(n) on a single processor. Then we'd hope to run in time T/p when using p processors. We'd like you to write parallel codes that approach these expectations.

Source Code

You may start with the serial and parallel implementations supplied below. All of them run in O(n2) time, which is unacceptably inefficient.

serial.cpp
a serial implementation,
openmp.cpp
a shared memory parallel implementation done using OpenMP,
pthreads.cpp
a shared memory parallel implementation done using pthreads (if you prefer it over OpenMP),
mpi.cpp
a distributed memory parallel implementation done using MPI,
common.cpp, common.h
an implementation of common functionality, such as I/O, numerics and timing,
Makefile
a makefile that should work on all NERSC clusters if you uncomment appropriate lines,
job-franklin-serial, job-franklin-pthreads4, job-franklin-openmp4, job-franklin-mpi4,
job-hopper-serial, job-hopper-pthreads24, job-hopper-openmp24, job-hopper-mpi24
sample batch files to launch jobs on Franklin and Hopper. Use qsub to submit on Franklin and Hopper.
particles.tar
all above files in one tarball.

You are welcome to use any NERSC cluster in this assignment. If you wish to build it on other systems, you might need a custom implementation of pthread barrier, such as: pthread_barrier.c, pthread_barrier.h.

You should use the following visualization program to check the correctness of the result produced by your code: Linux/Mac version (requires SDL, compiles in the IMADA pool), Windows version.

Resources

  • Programming in shared and distributed memory models have been introduced in the Lectures in week 38 and 39, the slides are available in the Blackboard System.
  • Shared memory implementations may require using locks that are availabale as omp_lock_t in OpenMP (requires omp.h) and pthread_mutex_t in pthreads (requires pthread.h).
  • Alternatively, you may consider using atomic operations such as __fetch_and_swap in the IBM compiler (requires builtins.h) and __sync_lock_test_and_set in the newer versions of the GNU compiler.
  • Distributed memory implementation may benefit from overlapping communication and computation that is provided by nonblocking MPI routines such as MPI_Isend and MPI_Irecv.
  • Other useful resources: pthreads tutorial, OpenMP tutorial, OpenMP specifications and MPI specifications.

Report / Submission

The assignment can be done alone, or in groups of two people. The submission of the source code has to be supplemented by a write-up (max. 10 pages), one write-up for the whole group. The write-up should contain

  • The names of the people in your group.
  • A statement of who contributed with what.
  • A plot in log-log scale that shows that your serial and parallel codes run in O(n) time and a description of the data structures that you used to achieve it.
  • A description of the synchronization you used in the shared memory implementation.
  • A description of the communication you used in the distributed memory implementation.
  • A description of the design choices that you tried and how did they affect the performance.
  • Speedup plots that show how closely your parallel codes approach the idealized p-times speedup and a discussion on whether it is possible to do better.
  • Where does the time go? Consider breaking down the runtime into computation time, synchronization time and/or communication time. How do they scale with p?
  • A discussion on using pthreads, OpenMP and MPI for this assignment.

For submitting the report and the sources proceed as follows: Create the following directory structure one of IMADAs pool machines (of course you have to copy the data from the NERSC filesystem to the IMADA system before doing that, and you have to modify the directory names below if you use different systems than Hopper for OpenMP and pthreads implementation, and Franklin for the MPI implementation).

mandatory2/
mandatory2/report/
mandatory2/openmp-hopper/sources/
mandatory2/openmp-hopper/results/
mandatory2/pthreads-hopper/sources/
mandatory2/pthreads-hopper/results/
mandatory2/mpi-franklin/sources/
mandatory2/mpi-franklin/results/

Put your report, sources, and results in the corresponding directory. The source directories should include all files that are necessary to produce an executable code on the corresponing NERSC resources. Each of the three sources directories and each of the three results directories should also contain a README file that describes details. It basically has to be possible to copy the complete sources to the file system of the system of your choice, and to produce executable files via a simple make command. The report directory, the sources directories, and the results directories should not contain more than 10MB of data each. You should consider to include a webpage (please use the filename index.html) in the results directories that includes animations to show the correctness of your simulation for reasonably chosen number of particles.

After putting all the data to the correct location, change to the directory mandatory2 and type aflever DM818 your@email.com.

Additionally, you shall hand in a printout of your report and of your source code (10 pages limit applies only to the report, not to the source code). (Department secretaries office).

The strict deadline for submission (electronically and printouts) is Monday, November, 7th, 10:00am.

Frequently Asked Questions (FAQ)

  • Are we allowed to work in groups?
  • Yes, but you should clearly state who contributed with what in the report. The mandatory assignments are part of your final exam, i.e., you have to fully understand all parts of what you have done.