ABSTRACT
Parallel processing has been proposed as a means of improving network protocol throughput. Several different strategies have been taken towards parallelizing protocols. A relatively popular approach is packet-level parallelism, where packets are distributed across processors.
This paper provides an experimental performance study of packet-level parallelism on a contemporary shared-memory multiprocessor. We examine several unexplored areas in packet-level parallelism and investigate how various protocol structuring and implementation techniques can affect performance. We study TCP/IP and UDP/IP protocol stacks, implemented with a parallel version of the x-kernel running in user space on Silicon Graphics multiprocessors.
Our results show that only limited packet-level parallelism can be achieved within a single connection under TCP, but that using multiple connections can increase the available parallelism. We also demonstrate that packet ordering plays a key role in determining single-connection TCP performance, that careful use of locks is a necessity, and that selective exploitation of caching can improve throughput. Finally, we describe experiments that compare parallel protocol performance on two generations of a parallel machine and show how computer architectural trends can influence performance.