The Fast Fourier Trasform (FFT) is a challenging algorithm to implement efficiently on a parallel computer. Recent algorithm advances have led to greatly improved FFT performance on parallel vector cmputers such as the CRAY-2 and CRAY Y-MP . Variations on these techniques can be used to extend this improved performance to other parallel architectures. A simple evalution seveals that eht per data word, or computer itensity. This high compute intensity is lost when the FFT computation is reduced to simple vector operations. Viewing the algorithm from a high level and exploiting compute itensity is the key to archieving high perrformance on parallel such as the CRAY PAP. This paper descrives how high compute intensity programming techniques combined with algoriths in the literature can result in efficient single and multidimensional FFTs on large numbers of processors on the CRAY APP. the CRAY APP is a shared -memory parallel computer based on the interl i860 microprocessor, It incorporates up to 84 i860s in an architecture which allows for very efficient gang scheduling and barrier synchronization. FFT performance figures for various data set sizes and processor configurations are included