DM818 Mandatory Assignment 2: Parallelize Particle Simulation
The purpose of this assignment is introduction to programming in shared and distributed memory models. (Particle Simulations will also be covered in the lecture).
The range of interaction forces is limited as shown in grey for a selected particle. Density is set sufficiently low so that given n particles, only O(n) interactions are expected.
Suppose we have a code that runs in time T = O(n) on a single processor. Then we'd hope to run in time T/p when using p processors. We'd like you to write parallel codes that approach these expectations.
You may start with the serial and parallel implementations supplied below. All of them run in O(n2) time, which is unacceptably inefficient.
You are welcome to use any NERSC cluster in this assignment. If you wish to build it on other systems, you might need a custom implementation of pthread barrier, such as: pthread_barrier.c, pthread_barrier.h.
- Programming in shared and distributed memory models have been introduced in the Lectures in week 38 and 39, the slides are available in the Blackboard System.
- Shared memory implementations may require using locks that are availabale as omp_lock_t in OpenMP (requires omp.h) and pthread_mutex_t in pthreads (requires pthread.h).
- Alternatively, you may consider using atomic operations such as __fetch_and_swap in the IBM compiler (requires builtins.h) and __sync_lock_test_and_set in the newer versions of the GNU compiler.
- Distributed memory implementation may benefit from overlapping communication and computation that is provided by nonblocking MPI routines such as MPI_Isend and MPI_Irecv.
- Other useful resources: pthreads tutorial, OpenMP tutorial, OpenMP specifications and MPI specifications.
Report / Submission
The assignment can be done alone, or in groups of two people. The submission of the source code has to be supplemented by a write-up (max. 10 pages), one write-up for the whole group. The write-up should contain
- The names of the people in your group.
- A statement of who contributed with what.
- A plot in log-log scale that shows that your serial and parallel codes run in O(n) time and a description of the data structures that you used to achieve it.
- A description of the synchronization you used in the shared memory implementation.
- A description of the communication you used in the distributed memory implementation.
- A description of the design choices that you tried and how did they affect the performance.
- Speedup plots that show how closely your parallel codes approach the idealized p-times speedup and a discussion on whether it is possible to do better.
- Where does the time go? Consider breaking down the runtime into computation time, synchronization time and/or communication time. How do they scale with p?
- A discussion on using pthreads, OpenMP and MPI for this assignment.
For submitting the report and the sources proceed as follows: Create the following directory structure one of IMADAs pool machines (of course you have to copy the data from the NERSC filesystem to the IMADA system before doing that, and you have to modify the directory names below if you use different systems than Hopper for OpenMP and pthreads implementation, and Franklin for the MPI implementation).
mandatory2/ mandatory2/report/ mandatory2/openmp-hopper/sources/ mandatory2/openmp-hopper/results/ mandatory2/pthreads-hopper/sources/ mandatory2/pthreads-hopper/results/ mandatory2/mpi-franklin/sources/ mandatory2/mpi-franklin/results/
Put your report, sources, and results in the corresponding
directory. The source directories should include all files that are
necessary to produce an executable code on the corresponing NERSC
resources. Each of the three
sources directories and each
of the three
results directories should also contain
README file that describes details. It basically has to
be possible to copy the complete sources to the file system of the
system of your choice, and to produce executable files via a
make command. The
sources directories, and the
directories should not contain more than 10MB of data each. You should consider to
include a webpage (please use the filename
index.html) in the
directories that includes animations to show the correctness of your simulation for reasonably chosen
number of particles.
After putting all the data to the correct location, change to the directory
mandatory2 and type
aflever DM818 firstname.lastname@example.org.
Additionally, you shall hand in a printout of your report and of your source code (10 pages limit applies only to the report, not to the source code). (Department secretaries office).
The strict deadline for submission (electronically and printouts) is Monday, November, 7th, 10:00am.
Frequently Asked Questions (FAQ)
- Are we allowed to work in groups?
Yes, but you should clearly state who contributed with what in the report. The mandatory assignments are part of your final exam, i.e., you have to fully understand all parts of what you have done.