Differential equations arise in many fields of application, such as the simulation of phenomena in chemistry, physics, mathematics, engineering, biology, medicine, and so forth. The models are generally posed as initial value problems (IVPs), which can be extremely costly to solve when they are stiff, because implicit methods are then required. It is widely believed that such computationally intensive problems can be solved effectively using parallel computation. Several authors have designed and implemented parallel ODE solvers. Their work has mainly focused on constructing new integration formulas that accommodate parallelism. Common to these algorithms is the nested structure in which the stepsize iteration uses a nonlinear iteration to solve the nonlinear system, and the nonlinear iteration in turn uses some linear solver. However, none of these works investigate the effect of parallelism inside the linear solver loop on the nonlinear solver loop, and further on the stepsize loop, nor do they analyze the performance of the linear solver in terms of execution time.

In this dissertation we investigate parallel approaches to a semi-implicit Runge-Kutta method and analyze the performance in terms of speedup. The parallelization is performed at two levels: parallelization across the method, in which the two nonlinear systems are solved simultaneously, and parallelization across the system, in which the associated linear systems are solved in parallel. We also construct an analytical model for the execution time of our linear solver and compare the measured execution time with the execution time computed using the model. The experiments were performed on a cluster of PCs under the PVM message-passing environment.

We observe that parallelization of the linear solver provides good performance in terms of speedup. In addition, there is no significant difference between the execution time computed with our analytical model and the experimental execution time, so the model can be considered appropriate for extrapolating the performance of the linear solver as the problem size and system size increase. Since our system consists of two decoupled blocks, the parallelization in the nonlinear solver is applied at the block level when only two processors are available; if more processors are involved in the computation, the parallelization is also applied in solving the linear systems. We observe that the speedup of the nonlinear solver for the Brusselator problem increases with the number of processors, whereas for the Dense problem the speedup is bounded by two, which is equal to the number of decoupled blocks. In the ODE implementation we again note that for the Brusselator problem better performance can be achieved if more processors are available, whereas for the Dense problem the maximum achievable performance is equal to the number of decoupled blocks. We conclude that parallelization inside the linear solver contributes to better performance for the Brusselator problem, but not for the Dense problem. In addition, we observe that the speedup of the linear solver is larger than the speedup of the nonlinear solver, which in turn is larger than the speedup of the ODE solver. This is reasonable, since the amount of sequential work tends to increase toward the outer levels, and that sequential work degrades performance.
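As a rough illustration of the nested solver structure described above (stepsize loop, nonlinear iteration, linear solver) and of where the two parallelization levels apply, the following serial Python sketch performs one semi-implicit Runge-Kutta step with two independent stage systems, each solved by a simplified Newton iteration around a dense linear solve. The stage layout, coefficients, tolerances, and test problem are illustrative assumptions only; they are not the dissertation's actual method or code, which was implemented on a PC cluster with PVM.

```python
# Minimal serial sketch of the nested loops:
# stepsize loop -> nonlinear (Newton) iteration per stage -> linear solve.
# All coefficients and the toy problem are placeholder assumptions.
import numpy as np

A = np.array([[-1.0, 0.0], [0.0, -1000.0]])  # toy stiff linear ODE y' = A y

def f(t, y):
    return A @ y          # autonomous toy right-hand side (t unused)

def jac(t, y):
    return A              # its Jacobian

def newton_stage(t, y, h, gamma, tol=1e-10, maxit=20):
    # Solve the stage equation  k = f(t, y + h*gamma*k)  by Newton's method.
    # Each Newton correction needs one linear solve; this inner solve is where
    # "parallelization across the system" would be applied.
    n = y.size
    k = f(t, y)                              # initial guess
    for _ in range(maxit):
        z = y + h * gamma * k
        r = k - f(t, z)                      # nonlinear residual
        J = np.eye(n) - h * gamma * jac(t, z)
        dk = np.linalg.solve(J, -r)          # dense linear solve (serial here)
        k = k + dk
        if np.linalg.norm(dk) < tol:
            break
    return k

def step(t, y, h, gamma=0.5):
    # The two stage (nonlinear) systems are independent, so in the parallel
    # version they would be solved simultaneously on two processors
    # ("parallelization across the method").
    k1 = newton_stage(t + gamma * h, y, h, gamma)
    k2 = newton_stage(t + h, y, h, gamma)
    return y + 0.5 * h * (k1 + k2)

if __name__ == "__main__":
    t, y, h = 0.0, np.array([1.0, 1.0]), 0.01
    for _ in range(100):                     # fixed-stepsize outer loop
        y = step(t, y, h)
        t += h
    print(t, y)
```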
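The speedup figures discussed above are assumed to follow the usual definitions (the abstract itself does not spell them out): with $T_1$ the serial execution time and $T_p$ the execution time on $p$ processors,

\[
  S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}.
\]

Under these definitions, a nonlinear-solver speedup bounded by two for the Dense problem corresponds to the two decoupled blocks being the only source of parallelism at that level.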