Parallel computation
In parallel computation, multiple code instructions are executed at the same time. This parallelism can range from coarse-grained to fine-grained. An example of coarse-grained parallelism is a ray tracer rendering a separate output pixel on each available CPU. An example of fine-grained parallelism is multiplying an n-dimensional vector by a scalar: the vector is cut into chunks, a chunk is distributed to each CPU, the multiplication is performed on each chunk, and the result is reassembled.
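As a concrete illustration of the fine-grained case, here is a minimal sketch in C, assuming POSIX threads are available; the thread count, vector size, and names such as scale_chunk are illustrative choices, not part of the original text.

    /* Sketch of fine-grained parallelism: scaling a vector by a scalar,
       one chunk per thread (POSIX threads). */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4          /* illustrative thread count */

    static double vec[N];
    static double scalar = 2.5;

    struct chunk { size_t begin, end; };

    static void *scale_chunk(void *arg)
    {
        struct chunk *c = arg;
        for (size_t i = c->begin; i < c->end; i++)
            vec[i] *= scalar;   /* each thread touches only its own slice */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct chunk chunks[NTHREADS];
        size_t per = N / NTHREADS;

        for (size_t i = 0; i < N; i++)
            vec[i] = (double)i;

        /* distribute: hand each thread a contiguous slice of the vector */
        for (int t = 0; t < NTHREADS; t++) {
            chunks[t].begin = t * per;
            chunks[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * per;
            pthread_create(&tid[t], NULL, scale_chunk, &chunks[t]);
        }

        /* reassemble: wait for every chunk before using the result */
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("vec[42] = %f\n", vec[42]);
        return 0;
    }

Each thread works on a disjoint slice, so no locking is needed; the joins at the end correspond to the reassembly overhead discussed below.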
These examples illustrate key limitations of parallel computing. First, there is the overhead of distributing and reassembling the computation. Second, there is the challenge of developing parallel algorithms. In the ideal case, such as the ray tracer, elements of the problem (pixels to be rendered) do not depend on one another: computing one output pixel neither requires information from nor affects any other output pixel. Such uncoupled problems are referred to as embarrassingly parallel.
Parallel algorithms and programs are significantly more difficult to write and debug. However, as multicore processors (many armed with SIMD instructions) become increasingly common, parallel computation is finding its way into more mass-consumer code (e.g. MP3 encoders using SIMD, Photoshop multiprocessor support).
Note that it is generally possible to run parallel code on a single processor. Multiple processors are simulated through context switching. When running on a single processor, parallel algorithms typically have worse performance than their serial analogues, mainly because of the overhead of the distribution/communication logic.
Hardware Support
Typically the CPU must offer some atomic operations so that correct parallel code can be written. There is also a famous algorithm, Dekker's algorithm, which achieves mutual exclusion using only ordinary memory reads and writes, relying on the memory system to serialize simultaneous accesses.
- Test and Set (TAS); a spinlock built on this primitive is sketched after the list
- Compare and Swap (CAS), used in lock-free programming
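A minimal sketch of such a test-and-set spinlock, using the C11 atomic_flag type from <stdatomic.h>; the acquire and release function names are illustrative, not a standard API.

    /* Sketch: a spinlock built on the atomic test-and-set primitive (C11). */
    #include <stdatomic.h>

    static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

    void acquire(void)
    {
        /* Atomically set the flag and check its previous value;
           if it was already set, another thread holds the lock, so spin. */
        while (atomic_flag_test_and_set(&lock_flag))
            ;   /* busy-wait */
    }

    void release(void)
    {
        atomic_flag_clear(&lock_flag);   /* clear the flag so another thread may enter */
    }

Compare and Swap generalizes this idea: it updates a memory word only if it still holds an expected value, which is the basic building block of most lock-free data structures.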
Programming Language Support
- Semaphores
- Mutexes (a usage sketch follows this list)
- Monitors
- Special-purpose languages for parallel computation (Occam)
- Pure functional languages
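As a minimal sketch of mutex usage with POSIX threads, the following serializes access to a shared counter; the counter and worker names are illustrative.

    /* Sketch: a mutex serializing access to a shared counter (POSIX threads). */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&counter_lock);    /* only one thread increments at a time */
            counter++;
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);       /* 200000 with the mutex in place */
        return 0;
    }

Without the lock, the two threads' increments would race and the final count would generally be wrong.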
Library Support
- OpenMP
  - Open standard.
  - Can be used to implement fine-grained parallelism.
  - Built-in support in many compilers (Intel C/C++/FORTRAN, GCC v4, Sun CC).
  - Uses compiler directives to annotate which parts of a C or FORTRAN program can be parallelized.
  - In C this is accomplished through preprocessor pragmas.
  - Pragmas indicate things like "execute this loop in parallel" or "this variable has no dependency between iterations" (a short example follows below).
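A minimal sketch of the pragma style described above: the vector-scaling example again, parallelized with an OpenMP work-sharing pragma; the sizes and values are illustrative.

    /* Sketch: the vector-scaling example, parallelized with an OpenMP pragma.
       Compile with an OpenMP flag, e.g. gcc -fopenmp. */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double vec[N];
        double scalar = 2.5;

        for (int i = 0; i < N; i++)
            vec[i] = (double)i;

        /* The pragma tells the compiler that the loop iterations are
           independent and may be split across the available processors. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            vec[i] *= scalar;

        printf("vec[42] = %f\n", vec[42]);
        return 0;
    }

If the compiler does not support OpenMP, the pragma is simply ignored and the loop runs serially, which is one reason this annotation style is convenient.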
- Message Passing Interface (MPI) (a minimal example is sketched below)
  - Well suited to cluster computing.
  - Major library is LAM/MPI. Some vendors (e.g. Sun) also roll their own optimized versions.
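For comparison, here is a minimal sketch of an MPI program in which each process reports its rank. It is typically compiled with an MPI wrapper compiler such as mpicc and launched with mpirun, though exact commands vary by implementation.

    /* Sketch: a minimal MPI program.  Each process reports its rank. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        printf("process %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down the runtime cleanly */
        return 0;
    }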