FFTE: A Fast Fourier Transform Package

Description:
    A package to compute Discrete Fourier Transforms of
    1-, 2- and 3-dimensional sequences of length (2^p)*(3^q)*(5^r),
    where p, q and r are non-negative integers.
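Every transform length has to factor completely into 2, 3 and 5. The Python sketch below shows one way to check that and recover p, q and r. factor_235 is a hypothetical helper written for illustration; it is not the interface of factor.f. For 2-D and 3-D transforms the check would presumably be applied to each dimension separately.

```python
def factor_235(n):
    """Return (p, q, r) such that n == 2**p * 3**q * 5**r,
    or None if n < 1 or n has a prime factor other than 2, 3 or 5."""
    if n < 1:
        return None
    exponents = []
    for radix in (2, 3, 5):
        e = 0
        # Strip out every factor of this radix and count them.
        while n % radix == 0:
            n //= radix
            e += 1
        exponents.append(e)
    # Any remainder other than 1 is an unsupported prime factor.
    return tuple(exponents) if n == 1 else None


for length in (64, 1000, 1536, 2310):
    print(length, factor_235(length))
# 64 (6, 0, 0)
# 1000 (3, 0, 3)
# 1536 (9, 1, 0)
# 2310 None        (2310 = 2 * 3 * 5 * 7 * 11)
```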

Files:
    cuzfft1d.f  : 1-D complex FFT routine (for NVIDIA GPUs)
    cuzfft2d.f  : 2-D complex FFT routine (for NVIDIA GPUs)
    cuzfft3d.f  : 3-D complex FFT routine (for NVIDIA GPUs)
    cuztrans.f  : Transposition routine (for NVIDIA GPUs)
    dzfft2d.f   : 2-D real-to-complex FFT routine
    dzfft3d.f   : 3-D real-to-complex FFT routine
    factor.f    : Factorization routine
    fft235.f    : Radix-2, 3, 4, 5 and 8 FFT routine
    kernel.f    : Radix-2, 3, 4, 5 and 8 FFT kernel routine
    mfft235.f   : Radix-2, 3, 4, 5 and 8 multiple FFT routine
    param.h     : Header file for parameters
    readme.txt  : Readme file
    vzfft1d.f   : 1-D complex FFT routine (for vector machines)
    vzfft2d.f   : 2-D complex FFT routine (for vector machines)
    vzfft3d.f   : 3-D complex FFT routine (for vector machines)
    zdfft2d.f   : 2-D complex-to-real FFT routine
    zdfft3d.f   : 3-D complex-to-real FFT routine
    zfft1d.f    : 1-D complex FFT routine
    zfft2d.f    : 2-D complex FFT routine
    zfft3d.f    : 3-D complex FFT routine
    tests/      : Test directory
        Makefile      : Makefile for test programs
        Makefile.cuda : Makefile for test programs (for NVIDIA GPUs)
        Makefile.vec  : Makefile for test programs (for vector machines)
        cuspeed1d.f   : Speed test program for zfft1d (for NVIDIA GPUs)
        cuspeed2d.f   : Speed test program for zfft2d (for NVIDIA GPUs)
        cuspeed3d.f   : Speed test program for zfft3d (for NVIDIA GPUs)
        cutest1d.f    : Test program for zfft1d (for NVIDIA GPUs)
        cutest2d.f    : Test program for zfft2d (for NVIDIA GPUs)
        cutest3d.f    : Test program for zfft3d (for NVIDIA GPUs)
        rspeed2d.f    : Speed test program for dzfft2d
        rspeed3d.f    : Speed test program for dzfft3d
        rtest2d.f     : Test program for dzfft2d and zdfft2d
        rtest3d.f     : Test program for dzfft3d and zdfft3d
        speed1d.f     : Speed test program for zfft1d
        speed2d.f     : Speed test program for zfft2d
        speed3d.f     : Speed test program for zfft3d
        test1d.f      : Test program for zfft1d
        test2d.f      : Test program for zfft2d
        test3d.f      : Test program for zfft3d
    mpi/        : MPI version directory
        pcuzfft1d.f   : Parallel 1-D complex FFT routine (for NVIDIA GPUs)
        pcuzfft2d.f   : Parallel 2-D complex FFT routine (for NVIDIA GPUs)
        pcuzfft3d.f   : Parallel 3-D complex FFT routine (for NVIDIA GPUs)
        pcuzfft3dv.f  : Parallel 3-D complex FFT routine (with 2-D decomposition, for NVIDIA GPUs)
        pdzfft2d.f    : Parallel 2-D real-to-complex FFT routine
        pdzfft3d.f    : Parallel 3-D real-to-complex FFT routine
        pdzfft3dv.f   : Parallel 3-D real-to-complex FFT routine (with 2-D decomposition)
        pfactor.f     : Parallel factorization routine
        pvzfft1d.f    : Parallel 1-D complex FFT routine (for vector machines)
        pvzfft2d.f    : Parallel 2-D complex FFT routine (for vector machines)
        pvzfft3d.f    : Parallel 3-D complex FFT routine (for vector machines)
        pzdfft2d.f    : Parallel 2-D complex-to-real FFT routine
        pzdfft3d.f    : Parallel 3-D complex-to-real FFT routine
        pzdfft3dv.f   : Parallel 3-D complex-to-real FFT routine (with 2-D decomposition)
        pzfft1d.f     : Parallel 1-D complex FFT routine
        pzfft2d.f     : Parallel 2-D complex FFT routine
        pzfft3d.f     : Parallel 3-D complex FFT routine
        pzfft3dv.f    : Parallel 3-D complex FFT routine (with 2-D decomposition)
        tests/        : Test directory
            Makefile      : Makefile for test programs
            Makefile.cuda : Makefile for test programs (for NVIDIA GPUs)
            Makefile.vec  : Makefile for test programs (for vector machines)
            pcuspeed1d.f  : Speed test program for pzfft1d (for NVIDIA GPUs)
            pcuspeed2d.f  : Speed test program for pzfft2d (for NVIDIA GPUs)
            pcuspeed3d.f  : Speed test program for pzfft3d (for NVIDIA GPUs)
            pcuspeed3dv.f : Speed test program for pzfft3dv (for NVIDIA GPUs)
            pcutest1d.f   : Test program for pzfft1d (for NVIDIA GPUs)
            pcutest2d.f   : Test program for pzfft2d (for NVIDIA GPUs)
            pcutest3d.f   : Test program for pzfft3d (for NVIDIA GPUs)
            pcutest3dv.f  : Test program for pzfft3dv (for NVIDIA GPUs)
            prspeed2d.f   : Speed test program for pdzfft2d
            prspeed3d.f   : Speed test program for pdzfft3d
            prspeed3dv.f  : Speed test program for pdzfft3dv
            prtest2d.f    : Test program for pdzfft2d and pzdfft2d
            prtest3d.f    : Test program for pdzfft3d and pzdfft3d
            prtest3dv.f   : Test program for pdzfft3dv and pzdfft3dv
            pspeed1d.f    : Speed test program for pzfft1d
            pspeed2d.f    : Speed test program for pzfft2d
            pspeed3d.f    : Speed test program for pzfft3d
            pspeed3dv.f   : Speed test program for pzfft3dv
            ptest1d.f     : Test program for pzfft1d
            ptest2d.f     : Test program for pzfft2d
            ptest3d.f     : Test program for pzfft3d
            ptest3dv.f    : Test program for pzfft3dv
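The routines above are built from small-radix FFT steps (fft235.f, kernel.f, mfft235.f). As an illustration of why any length 2^p*3^q*5^r can be handled entirely by such steps, here is a minimal recursive mixed-radix FFT sketch in Python, checked against a naive DFT. It is not a translation of the Fortran code: mixed_radix_fft and dft are hypothetical names, only radices 2, 3 and 5 are used (the radix-4 and radix-8 kernels the package also provides are omitted), and the combine step is written for clarity rather than speed.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for len(x) = 2^p * 3^q * 5^r.

    Same sign convention as dft(): X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).
    """
    n = len(x)
    if n <= 1:
        return list(x)
    # Use the smallest radix among 2, 3 and 5 that divides n.
    radix = next((r for r in (2, 3, 5) if n % r == 0), None)
    if radix is None:
        raise ValueError("length must be of the form 2^p * 3^q * 5^r")
    m = n // radix
    # Transform the radix interleaved subsequences x[s], x[s+radix], ...
    subs = [mixed_radix_fft(x[s::radix]) for s in range(radix)]
    # Combine: X[k] = sum_s exp(-2*pi*i*s*k/n) * Sub_s[k mod m].
    return [sum(cmath.exp(-2j * cmath.pi * s * k / n) * subs[s][k % m]
                for s in range(radix))
            for k in range(n)]


x = [complex(i % 7, (3 * i) % 5) for i in range(60)]  # 60 = 2^2 * 3 * 5
err = max(abs(a - b) for a, b in zip(mixed_radix_fft(x), dft(x)))
print(f"max |mixed_radix_fft - dft| = {err:.2e}")
```

Taking the smallest radix first is an arbitrary choice in this sketch; any ordering of the factors gives the same transform, and production codes pick the order and the radices for performance.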

References:
    1. Daisuke Takahashi: A Blocking Algorithm for FFT on Cache-Based
       Processors, Proc. 9th International Conference on High
       Performance Computing and Networking Europe (HPCN Europe 2001),
       Lecture Notes in Computer Science, Vol. 2110, pp. 551-554,
       Springer (2001).
    2. Daisuke Takahashi: A Blocking Algorithm for Parallel 1-D FFT on
       Shared-Memory Parallel Computers, Proc. 6th International
       Conference on Applied Parallel Computing (PARA 2002),
       Lecture Notes in Computer Science, Vol. 2367, pp. 380-389,
       Springer (2002).
    3. Daisuke Takahashi: Efficient implementation of parallel
       three-dimensional FFT on clusters of PCs, Computer Physics
       Communications, Vol. 152, pp. 144-150 (2003).
    4. Daisuke Takahashi: A parallel 1-D FFT algorithm for the Hitachi
       SR8000, Parallel Computing, Vol. 29, pp. 679-690 (2003).
    5. Daisuke Takahashi: A Hybrid MPI/OpenMP Implementation of a
       Parallel 3-D FFT on SMP Clusters, Proc. 6th International
       Conference on Parallel Processing and Applied Mathematics
       (PPAM 2005), Lecture Notes in Computer Science, Vol. 3911,
       pp. 970-977, Springer (2006).
    6. Daisuke Takahashi: An Implementation of Parallel 3-D FFT with
       2-D Decomposition on a Massively Parallel Cluster of Multi-core
       Processors, Proc. 8th International Conference on Parallel
       Processing and Applied Mathematics (PPAM 2009), Part I,
       Workshop on Memory Issues on Multi- and Manycore Platforms,
       Lecture Notes in Computer Science, Vol. 6067, pp. 606-614,
       Springer (2010).
    7. Daisuke Takahashi: Implementation of Parallel 1-D FFT on GPU
       Clusters, Proc. 2013 IEEE 16th International Conference on
       Computational Science and Engineering (CSE 2013), pp. 174-180
       (2013).
    8. Daisuke Takahashi: Implementation of Parallel 3-D Real FFT with
       2-D Decomposition on Intel Xeon Phi Clusters, Proc. 13th
       International Conference on Parallel Processing and Applied
       Mathematics (PPAM 2019), Part I, Lecture Notes in Computer
       Science, Vol. 12043, pp. 151-161, Springer (2020).
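References 1, 2 and 4 concern blocked and parallel 1-D FFTs. As far as we understand them, such algorithms build on the four-step (or, with explicit transposes, six-step) decomposition, in which a transform of length n = n1*n2 becomes n1 short DFTs, a twiddle-factor multiplication, n2 more short DFTs and a final transpose. The Python sketch below shows only that index arithmetic, using a naive DFT for the short transforms. The names four_step_fft and dft and the index conventions are our own assumptions; this is not code from the package and makes no attempt at the cache blocking the papers describe.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """Length-n DFT with n = n1 * n2, computed as many short DFTs.

    Input index  j = j1 + n1 * j2 (0 <= j1 < n1, 0 <= j2 < n2)
    Output index k = k2 + n2 * k1 (0 <= k1 < n1, 0 <= k2 < n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n1 independent length-n2 DFTs over the strided rows x[j1::n1].
    y = [dft(x[j1::n1]) for j1 in range(n1)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*j1*k2/n).
    for j1 in range(n1):
        for k2 in range(n2):
            y[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent length-n1 DFTs, one per k2, taken across j1.
    z = [dft([y[j1][k2] for j1 in range(n1)]) for k2 in range(n2)]
    # Step 4: transpose into natural order, X[k2 + n2 * k1] = z[k2][k1].
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[k2 + n2 * k1] = z[k2][k1]
    return out


x = [complex(i % 5, -(i % 3)) for i in range(24)]
err = max(abs(a - b) for a, b in zip(four_step_fft(x, 4, 6), dft(x)))
print(f"max |four_step_fft - dft| = {err:.2e}")
```

In a cache-oriented or distributed implementation, the short DFTs in steps 1 and 3 would run on contiguous blocks after explicit transposes, which is where the memory-access and communication costs the papers address arise; this sketch ignores data layout entirely.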

Copyright:
    Copyright (C) 2000-2004, 2008-2014, 2020, Daisuke Takahashi
    Center for Computational Sciences
    University of Tsukuba
    1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
    e-mail: [email protected]

    You may use, copy and modify this code for any purpose (including
    commercial use) without fee.
    You may distribute this ORIGINAL package.