
E-book: Automatic Parallelization For A Class Of Regular Computations

(Univ Of Reading, UK)
  • Format: 272 pages
  • Publication date: 04-Jan-1997
  • Publisher: World Scientific Publishing Co Pte Ltd
  • Language: eng
  • ISBN-13: 9789814498418


The automatic generation of parallel code from high-level sequential descriptions is of key importance to the widespread use of high-performance machine architectures. This text considers in detail the theory and practical realization of automatically mapping algorithms generated from systems of uniform recurrence equations (do-loops) onto fixed-size architectures with defined communication primitives. Experimental results of the mapping scheme and its implementation are given.
Terminology
1 Introduction
  1.1 Automatic Parallelization
  1.2 Processor Array Structures
  1.3 Nested Loops and Data Dependencies
  1.4 Space-Time Mapping
  1.5 About the Book
2 Survey and Analysis of Partitioning and Mapping
  2.1 The Problems
  2.2 Independent Partitioning
  2.3 General Partitioning and Projecting for Space-Time Mapping
    2.3.1 Projecting-Partitioning Methods
    2.3.2 Partitioning-Projecting Methods
    2.3.3 Comparison
  2.4 Lower-Dimensional Mapping
  2.5 Summary
3 Improvements to Some Existing Partitioning and Mapping Methods
  3.1 On the Maximal Independent Partitioning
    3.1.1 Introduction
    3.1.2 Improved Methods of Independent Partitioning
    3.1.3 Producing HSNF
    3.1.4 Algorithm Generation
  3.2 A Synthesis Method Using LSGP Partitioning for Given-Shape Regular Arrays
    3.2.1 Introduction
    3.2.2 A New LSGP Synthesis Method
    3.2.3 The Conditions for Valid Q and t
    3.2.4 An Example
    3.2.5 A Particular Area of Application
  3.3 Bouncing LPGS Method
    3.3.1 Introduction
    3.3.2 Bouncing LPGS
  3.4 Summary
4 A Methodology of Partitioning and Mapping for Given (N-1)-D Regular Arrays
  4.1 A New Methodology
  4.2 Transforming to Canonical Dependencies
    4.2.1 A Cone Including Dependencies
    4.2.2 Forming Canonical Dependencies in Supernode Space
  4.3 Space-Time Mapping
    4.3.1 S-T Transformation and Interconnection Primitives
    4.3.2 Choosing S and t
    4.3.3 Permuting the Space Projection Matrix
  4.4 Further LSGP Partitioning for SBC
  4.5 Scaling and Optimisation
    4.5.1 SUC Cases
    4.5.2 SBC Case
    4.5.3 Optimising
    4.5.4 Integralization of the Quasi-Supernode Transformation Matrix
  4.6 Examples and Discussions
    4.6.1 A 3-D Example
    4.6.2 More Results and Discussions
  4.7 Summary
5 Optimal Mapping onto Lower-Dimensional Regular Arrays
  5.1 A Methodology for Partitioning and Mapping onto Lower-Dimensional Arrays
  5.2 The Transformations into K-D Time Domain and M-D Processor Array
    5.2.1 Selecting a Family of T
    5.2.2 Scaling the Supernode Parallelepiped
  5.3 Maximum Local K-D Time Domain
    5.3.1 Intersecting a Polyhedron with a Hyperplane
    5.3.2 Maximum Local Supernode Domain
    5.3.3 Mapping to K-D Time Domain
  5.4 Valid Minimum Projecting Vector
  5.5 A General Methodology
    5.5.1 The First and Last Nodes of Polyhedrons
    5.5.2 Deriving P
  5.6 Optimisation
    5.6.1 Method
    5.6.2 An Example
  5.7 Special Example: Partitioning and Mapping a Knapsack Problem onto a Linear Array
    5.7.1 Description of Computational Structure
    5.7.2 Supernode Polyhedron
    5.7.3 Transformation onto a Time-Processor Domain
  5.8 Summary
6 The Structure of Parallel Programs
  6.1 Introduction
  6.2 On Supernodes
    6.2.1 Boundary of the Supernode Polyhedron
    6.2.2 Vertices of the Enlarged Quasi-Supernode Polyhedron
    6.2.3 Boundary of a Single Supernode
  6.3 Data Flows
    6.3.1 Outgoing Data of a Single Supernode
    6.3.2 Outgoing Data Packets of a Processor
  6.4 Algorithm Generation for a Lower-Dimensional Processor Array
    6.4.1 K-D Parallel Algorithms
    6.4.2 K-D Time Domain to 1-D Time Domain
  6.5 Algorithm Generation for LSGP Case
    6.5.1 Rectangular Boundary in Virtual Processor-Time Domain
    6.5.2 Algorithm Generation Involving LSGP
    6.5.3 Outgoing Data after LSGP
    6.5.4 Example
    6.5.5 Algorithm Generation for (N-1)-D SUC Case
  6.6 From Inequalities to Boundaries
    6.6.1 Finding All Possible Lower and Upper Bounds
    6.6.2 Deleting Redundant Bounds
  6.7 Summary
7 Parallel Code Generation
  7.1 Introduction
  7.2 Processor Array
  7.3 Supernode Storage
    7.3.1 From Supernode Domain to Local Supernode Domains
    7.3.2 Supernode Storage and In/Out Data Storage
  7.4 Data Flow and Relay
    7.4.1 LSGP Case
    7.4.2 Non-LSGP Case and Direct Data Flows
  7.5 Outline of the Parallel Code
    7.5.1 Parallel Codes for Non-LSGP
    7.5.2 Parallel Codes with LSGP
  7.6 Summary
8 Experimental Results and Discussions
  8.1 Experimental Results
    8.1.1 Test of the Correctness
    8.1.2 Measuring Performance
  8.2 Discussions on Factors Affecting Efficiency
    8.2.1 Space and Time Fitness of the Mapping
    8.2.2 Data Communication
    8.2.3 Single Supernode Loops
    8.2.4 The Effect of Granularity
  8.3 Summary
9 Conclusion
  9.1 Summary of the Whole Design Procedure
  9.2 Evaluations and Comparisons
    9.2.1 Theory
    9.2.2 Practice
    9.2.3 Results
  9.3 Closing Remarks
Bibliography
A Scaling SBC
  A.1 Initial Wui's and Wli's
  A.2 SF as a Function of fk
    A.2.1 Determining fk with a Known Set of Wui and Wli
    A.2.2 Determining the Turning Points
    A.2.3 Selecting Wui and Wli from Multi-Candidates
  A.3 Delimit fk According to Dependencies
B Collection of Algorithms for (N-1)-D Partitioning and Mapping
  B.1 Pre-Compilation Work
  B.2 Compiling Work
C Collection of Algorithms for Lower-Dimensional Mapping
D Parallel Algorithm for Pure LSGP Method
E Generating Data Flows and Relays for LSGP
F The Collections of Experimental Results
G Examples of Automatically Generated Parallel Codes
  G.1 Parallel Codes for Non-LSGP
    G.1.1 h.file
    G.1.2 Parallel Code
  G.2 Parallel Codes for LSGP
    G.2.1 h.file
    G.2.2 Parallel Codes
Index