DEVELOPMENT OF A PARALLEL 3D FINITE ELEMENT POWER SEMICONDUCTOR DEVICE SIMULATOR
A.R. Brown, A. Asenov, S. Roy and J.R. Barker
Introduction

Due to the complicated behaviour of power semiconductor devices, numerical modelling has become an important and widely accepted approach for device analysis and optimisation [1,2]. The design of appropriate simulation tools, however, is confronted with several intricacies that are not usually found in conventional simulators. Complicated physical phenomena such as carrier-carrier scattering, lifetime engineering and surface recombination govern the behaviour of power devices [3]. High power dissipation density requires the inclusion of heat flow in the simulations [4]. The circuit environment has a drastic influence on the switching behaviour [5] and should be introduced through more complicated boundary conditions or mixed-mode simulations. The cellular structure of most power devices requires a 3D solution of the basic semiconductor equations. The octagonal or hexagonal shape of typical power MOSFET, IGBT or MCT cells [6] and their complex doping distributions require a finite element (FE) discretization. In many cases more than one cell should be included in the simulations in order to obtain an adequate description of the device behaviour.

The size and computational complexity of the problem make it a prime candidate for massively parallel implementation. Attempts to port existing serial FE device simulators to massively parallel systems often result in disappointingly low efficiency. Only recently has a more systematic approach been applied to the design of parallel device simulation codes [7,8]. To achieve better results, the design of the parallel simulation software should reflect the architecture of the parallel platforms.

Here we report on the development of a parallel, scalable and portable 3D finite element power semiconductor device simulator. The simulator is implemented, in both 3L and Parix Fortran, on a Parsytec Supercluster Model 64, a medium-class 64-transputer MIMD machine, and on a four-node PowerPC-based Parsytec X-plorer.
It is based on a spatial decomposition of the simulation domain over an array of processors [9]. This approach minimises the interprocessor communication by reducing the ratio between the surface and the bulk of the partition subdomains. The emphasis is placed on the generation of topologically rectangular FE grids amenable to the domain decomposition approach, on the optimised parallel generation and assembly of the discretization matrices, and on the development of suitable, scalable linear solvers. The advantages of the 3D simulation are highlighted in comparison with 2D simulation results.

A.R. Brown, A. Asenov, S. Roy and J.R. Barker are with the Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow G12 8QQ.

Spatial Device Decomposition

The numerical simulation of power semiconductor devices is based on the simultaneous solution of the basic set of semiconductor equations, comprising the Poisson equation and the current continuity equations for electrons and holes in the drift-diffusion approximation. The heat flow equation is attached to the system to account for thermal effects. Analytical or table look-up models represent the electron and hole mobilities, the generation-recombination term and the thermal conductivity. The set of partial differential equations is solved, with appropriate boundary conditions, in a solution domain which realistically represents the structure of the simulated device.

Our parallel power semiconductor device simulator is designed for Multiple Instruction Multiple Data (MIMD) parallel computers with distributed memory. It is based on a spatial device decomposition [9]. In this approach the solution domain is physically divided into subdomains, each one placed on a different processor. There is a one-boundary-layer overlap between the subdomains on adjacent processors, necessary for the parallel discretization and the proper functioning of the parallel linear solvers.
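For the reader's convenience, the basic set can be written out in standard drift-diffusion notation (a sketch using the conventional symbols, which are not taken from the paper):

```latex
\begin{align}
\nabla\!\cdot(\varepsilon\nabla\psi) &= -q\,(p - n + N_D^+ - N_A^-)
  && \text{(Poisson)}\\
\nabla\!\cdot\mathbf{J}_n = q\,(R - G),
  &\quad \mathbf{J}_n = q\mu_n n\mathbf{E} + qD_n\nabla n
  && \text{(electron continuity)}\\
\nabla\!\cdot\mathbf{J}_p = -q\,(R - G),
  &\quad \mathbf{J}_p = q\mu_p p\mathbf{E} - qD_p\nabla p
  && \text{(hole continuity)}\\
\rho c\,\frac{\partial T}{\partial t} &= \nabla\!\cdot(\kappa\nabla T) + H
  && \text{(heat flow)}
\end{align}
```

Here $\psi$ is the electrostatic potential, $n$ and $p$ the carrier densities, $R-G$ the net recombination, and $H$ the local heat generation; the mobilities $\mu_{n,p}$ and thermal conductivity $\kappa$ are supplied by the analytical or table look-up models mentioned above.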
To enhance the portability of the simulator, the code is split into two distinct parts: a hardware-dependent communications harness and the simulation engine. The communications harness provides all communications between the processors necessary for the proper operation of the simulation engine. Two types of communications are necessary: global and local. The global communications are used at the beginning of the simulation to distribute the portions of the solution domain among the processors and to initiate the simulation. During the simulation they provide the necessary synchronisation and form part of the linear solver. At the end of the simulation the global communications are used to collect the portions of the solution from the different processors in order to save or display the results. The local communications exchange the overlapping boundaries between adjacent solution subdomains. Boundary exchange is necessary at the beginning of each new iteration and is also part of our linear solver. The simulation engine is designed to operate on an arbitrary 1D, 2D or 3D array of processors, including a single processor. This means that the solvers are virtually independent of the processor topology, provided the necessary global and local communications are supplied.
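The local boundary exchange can be sketched in a few lines (a hypothetical 1D illustration, not the paper's Fortran harness): each subdomain stores one overlap layer mirroring the neighbour's adjacent interior point, and after every local update the layers are refreshed.

```python
import numpy as np

def exchange_overlap(left, right):
    """One-layer overlap exchange between two adjacent 1D subdomains
    (hypothetical helper; on a real machine this is a message pair)."""
    left[-1] = right[1]    # left's overlap cell <- right's first owned cell
    right[0] = left[-2]    # right's overlap cell <- left's last owned cell

# Global 1D field of 8 points split as [0..4] and [3..7]; the shared
# layer is duplicated on both "processors", as described in the text.
global_field = np.arange(8, dtype=float)
left = global_field[0:5].copy()    # owns points 0..3, overlap cell at index 4
right = global_field[3:8].copy()   # owns points 4..7, overlap cell at index 0

left[:4] += 10.0                   # a local "iteration" on owned points only
right[1:] += 20.0
exchange_overlap(left, right)      # both sides now see fresh neighbour data
```

After the exchange, each subdomain's overlap cell holds the neighbour's freshest interior value, exactly what the parallel discretization and solver need at the start of the next iteration.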
[Figure labels: Cathode contact; Gate contact; Oxide layer; n-type regions; p-type regions]
Figure 1. Theoretical speed-up of a hypothetical 3D linear solver based on a 1D, 2D and 3D array of mesh-connected processors
Figure 2. Partition of the 3D device simulation domain on a 2D array of mesh-connected processors.
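The partitioning of Fig 2 can be sketched as follows (hypothetical helper names; the paper's partitioner is not shown). Each grid direction is split as evenly as possible, and every interior subdomain boundary carries a one-layer overlap:

```python
def partition_1d(n_nodes, n_procs):
    """Return (lo, hi) index ranges (hi exclusive) of the grid lines stored
    on each processor, including a one-layer overlap on interior boundaries."""
    base, extra = divmod(n_nodes, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)   # spread the remainder
        lo = max(start - 1, 0)                  # overlap with left neighbour
        hi = min(start + size + 1, n_nodes)     # overlap with right neighbour
        ranges.append((lo, hi))
        start += size
    return ranges

def partition_2d(ni, nj, N, M):
    """(i,j)-plane subdomains for an N x M processor array; all points in
    the third direction k of a subdomain stay on the same processor."""
    return [(ri, rj) for ri in partition_1d(ni, N)
                     for rj in partition_1d(nj, M)]
```

For example, `partition_1d(8, 3)` yields `[(0, 4), (2, 7), (5, 8)]`: processors own 3, 3 and 2 lines respectively, and each stored range includes the neighbour's boundary layer.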
The simulator can work with both a rectangular finite difference grid and a topologically rectangular [10] finite element grid. This significantly simplifies the partitioning of the solution domain on the array of processors and the design of the communications harness. It is clear that the best processor configuration for spatial device decomposition of a topologically rectangular 3D grid is a 3D array of mesh-connected processors. This is illustrated in Fig 1, where the theoretical speed-up of a hypothetical linear solver [7] is plotted as a function of the size of an n×n×n rectangular grid. The grid is partitioned over 64 processors organised in three different configurations: a 64-processor pipeline, an 8×8 2D array and a 4×4×4 3D array. However, we are restricted to a 2D array of processors on our Parsytec parallel computers, where one processor may be connected only to its four nearest neighbours. Because of this, the current version of our communications harness is designed to run on an arbitrary N×M array of mesh-connected processors. The spatial device decomposition of a 3D device on a 2D array of processors is illustrated in Fig 2. The solution domain is automatically decomposed into N×M subdomains in one grid plane (i,j). All corresponding grid points in the third grid direction k lie on the same processor. To achieve better speed-up, each subdomain in the (i,j) plane should be as square as possible.

Solution Domain and Grid Generation

The solution domain and the generated grid depend on the structure of the simulated device and the doping concentrations inside it. Templates for generating the grid for different types of cellular power semiconductor devices, including IGBTs, MOSFETs, MCTs and diodes, will be provided. The grid generation in the solution domain proceeds on a single processor.
After the grid generation the doping profile is assigned to the grid points, the material type is assigned to each finite element and the boundary conditions are identified. The generated grid with the doping, material and boundary conditions information is then distributed over the processor array. The distributed semiconductor solver is universal and independent of the particular device structure.
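The assignment of an analytical doping profile to the grid points can be sketched with the generic 1D Gaussian used by the device templates; the function and parameter names below are illustrative assumptions. The characteristic length is chosen so that the profile falls to the background concentration exactly at the junction depth:

```python
import math

def gaussian_doping(y, n_surface, n_background, junction_depth):
    """N(y) = N_s * exp(-(y/L)^2), with L fixed by requiring
    N(junction_depth) = N_background (the metallurgical junction).
    Units are the caller's choice; only the ratios matter here."""
    L = junction_depth / math.sqrt(math.log(n_surface / n_background))
    return n_surface * math.exp(-(y / L) ** 2)
```

For instance, a p-body with a surface concentration of 1e18 cm⁻³ diffused into 1e14 cm⁻³ material with a 4 µm junction depth evaluates to the background concentration at y = 4 µm, as required.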
We will illustrate this approach with an IGBT example. The layout of a typical octagonal IGBT, with cells arranged in a Cartesian grid, is outlined in Fig 3(a). The cross-sectional view of a single cell is given in Fig 3(b). Because of the symmetry, the simulation domain involves only one quarter of the cell. The two possible grids for our simulator, a finite difference grid and a finite element grid, are presented in Fig 4(a,b) respectively. In the case of cellular devices the disadvantages of the finite difference grid are obvious: small steps and large numbers of grid points are necessary to represent properly the octagonal shape of the cell. It is even more difficult to keep a high local grid density near the metallurgical p-n junctions.
Figure 3. Octagonal cell IGBT: (a) cell layout; (b) vertical cross section
Figure 4. Discretization of the solution domain in the simulation of an octagonal cell IGBT: (a) finite difference grid; (b) finite element grid based on distorted bricks

The finite element grid is topologically rectangular: it keeps the number of grid points along the grid lines constant in each of the index directions i and j. The grid generation is based on distorted bricks, whose sides in the (i,j) plane are quadrilateral elements. The grid generation process is guided by specified contours in the solution domain; in this particular example the guiding contours are the shape of the gate electrode and the metallurgical p-n junctions in the device. An example of the partitioning of the (i,j) plane of the solution domain on an array of 3×3 processors is given in Fig 5. Optimum load balancing for such a topologically rectangular grid occurs when the numbers of grid nodes in the i and j directions are divisible by the numbers of processors in those directions. To avoid oscillations in performance when the number of grid nodes is not divisible by the number of processors, an optimisation procedure for grid partitioning based on simulated annealing is being developed.
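The load-balancing condition can be quantified with a small sketch (an assumed metric, not the paper's): with a naive block split the parallel time follows the largest subdomain, so efficiency oscillates as the grid-line count crosses multiples of the processor count:

```python
import math

def block_imbalance(n_nodes, n_procs):
    """Ratio of the largest block to the mean block for a naive split of
    n_nodes grid lines over n_procs processors; 1.0 means perfect balance
    (n_nodes divisible by n_procs)."""
    largest = math.ceil(n_nodes / n_procs)
    return largest * n_procs / n_nodes

# 64 lines on 8 processors balance perfectly; 65 lines leave one
# processor carrying 9 lines while the rest carry 8, so the whole
# array waits on the overloaded processor.
```

Every time the grid is refined past a multiple of the processor count, the imbalance jumps and then decays; this is the oscillation that the simulated-annealing partitioner is intended to smooth out.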
Figure 5. Partition of the (i,j) plane of a topologically rectangular grid in the simulation of an octagonal cell IGBT on an array of 3×3 processors
Figure 6. Mapping of a distorted brick finite element into a unit cube
The 3D doping profile in the current version of the IGBT template is introduced by a simple analytical approximation. The generic one-dimensional doping distribution is a Gaussian profile whose parameters are determined by the junction depth and the surface doping concentration. The boron distribution in the example IGBT is outlined in Fig 4.

Parallel Discretization and Solution

We have adopted the decoupled Gummel procedure for the solution of steady-state problems, and a modification of the decoupled Mock procedure based on a time-dependent modification of the Poisson equation. Both schemes are simple and amenable to parallelization. The finite difference discretization is trivial and will not be discussed. The Galerkin finite element approach has been adopted to solve the Poisson equation on a finite element grid. The integration over the distorted brick finite elements during the discretization is carried out by a linear isoparametric mapping of each element onto a unit cube (Fig 6).

For the parallel matrix generation and assembly we use a node-based partition of the grid, in which all elements which condense in a particular node are on the same processor and no communications between processors are required. In such a partition there is element overlap between the processors: each element of the subdomain boundary shell appears at the same time on two processors. If standard element-by-element matrix generation and assembly were used, the contribution of each boundary element would be calculated twice, on two different processors, which reduces the parallel efficiency. It is natural in this case to use a node-based assembly approach in which the solution subdomain on each processor is scanned not element by element but node by node. For all elements which condense in a given node, only the contributions to the matrix coefficients attached to this node are calculated.
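The isoparametric integration over a distorted brick (Fig 6) can be sketched with the standard trilinear construction; this is a generic illustration under textbook conventions, not the paper's code. Each element is mapped onto the reference cube [-1,1]³ and integrated with 2×2×2 Gauss quadrature of the Jacobian determinant:

```python
import numpy as np
from itertools import product

# The 8 reference-cube corners; element nodes must use the same ordering.
CORNERS = np.array(list(product([-1, 1], repeat=3)), dtype=float)

def shape_grads(xi, eta, zeta):
    """Gradients dN_a/d(xi,eta,zeta) of the 8 trilinear shape functions
    N_a = (1/8)(1 + xi*x_a)(1 + eta*y_a)(1 + zeta*z_a)."""
    g = np.empty((8, 3))
    for a, (xa, ya, za) in enumerate(CORNERS):
        g[a] = 0.125 * np.array([
            xa * (1 + eta * ya) * (1 + zeta * za),
            (1 + xi * xa) * ya * (1 + zeta * za),
            (1 + xi * xa) * (1 + eta * ya) * za,
        ])
    return g

def element_volume(nodes):
    """Volume of a distorted brick (nodes: 8x3 array ordered like CORNERS)
    by 2x2x2 Gauss quadrature of det J; all 8 weights are 1."""
    gp = 1.0 / np.sqrt(3.0)                       # Gauss points on [-1,1]
    vol = 0.0
    for xi, eta, zeta in product([-gp, gp], repeat=3):
        J = shape_grads(xi, eta, zeta).T @ nodes  # 3x3 Jacobian dx/dxi
        vol += np.linalg.det(J)
    return vol
```

The same Jacobian is what the Galerkin discretization needs to transform shape-function gradients and integration measures from the reference cube to the physical element.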
This node-based assembly leads to almost 100% efficiency when the numbers of nodes in the i and j directions are divisible by the corresponding numbers of processors [10].

To solve the nonlinear system arising from the discretization of the Poisson equation we have adopted a Block Newton SOR scheme. After linearisation, the finite element discretization on the topologically rectangular grid leads to a 27-diagonal matrix with a regular structure. A four-colour variant of the SOR method in the (i,j) plane is required to preserve the uniform convergence of the SOR method in the parallel implementation [11]. In the third index direction all grid points lie on the same processor, which favours the use of a block approach. Only one SOR step is performed per Newton iteration. Although such an approach increases the number of Newton iterations, it drastically reduces the overall solution time. The parallel performance of the method is illustrated in Fig 7.

The discretization of the current continuity equations on the distorted brick finite element grid is more complicated. We are examining three possible approaches. The simplest one is to divide each distorted brick into six tetrahedral elements and to carry out a standard 3D control-volume Gummel-type discretization. The second approach is to use a 3D analogue of the 2D Gummel-like discretization developed for quadrilateral finite elements. Finally, shape functions exponentially fitted to the potential distribution could be used. A parallel implementation of the BiCGSTAB(2) method has been adopted for the current continuity equations.
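The in-plane colouring can be sketched on a model problem (a 2D 5-point Laplacian here, for brevity; the paper's operator is the 27-diagonal 3D FE matrix, for whose 9-point in-plane stencil four colours are genuinely needed, whereas two would suffice for the 5-point stencil). Points of one colour have no same-colour neighbours, so each colour class can be updated concurrently across processors:

```python
import numpy as np

def sor_four_colour(u, f, h, omega=1.5, sweeps=300):
    """In-place four-colour SOR for -Laplace(u) = f on a uniform grid with
    spacing h and Dirichlet u = 0 boundary. Colours are (i mod 2, j mod 2):
    within one colour, points are at least two grid lines apart, so a whole
    colour class can be relaxed in parallel before the next class starts."""
    n, m = u.shape
    for _ in range(sweeps):
        for a in (0, 1):                 # colour offset in i
            for b in (0, 1):             # colour offset in j
                for i in range(1 + a, n - 1, 2):
                    for j in range(1 + b, m - 1, 2):
                        gs = 0.25 * (u[i - 1, j] + u[i + 1, j]
                                     + u[i, j - 1] + u[i, j + 1]
                                     + h * h * f[i, j])
                        u[i, j] += omega * (gs - u[i, j])
    return u
```

In the full solver this sweep is the single in-plane SOR step taken per Newton iteration, with a direct block solve along the third index direction held on-processor.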
Figure 7. Speed-up of the Block Newton SOR method for a cubic n×n×n problem on an 8×8 array of transputers.
Figure 8. Electric field distribution in an octagonal cell IGBT at 600 V anode voltage
An example of the electric field distribution in a cellular IGBT at 600 V anode voltage illustrates the importance of 3D simulation (Fig 8). 2D simulations of the same device, both with our in-house 2D IGBT simulator and with MEDICI, show a 30% lower electric field. Only the constant-field contours above the maximum electric field predicted by the 2D simulations are presented in the 3D simulation picture.

Conclusions

In this work we have presented our systematic approach to the design of a parallel finite element 3D power semiconductor device simulator. Our aim was to build a portable and scalable parallel code which runs with high efficiency on our departmental parallel computers. To achieve this goal, special measures were undertaken at each stage of the software development. A specially developed performance theory was used at each development stage to study the potential of the different possible approaches and to choose the optimal variant. A special type of finite element grid, topologically rectangular, was used to simplify the solution domain decomposition. The node-based matrix generation and assembly provides almost 100% efficiency in the case of optimal partitioning. The coloured Newton-SOR solver, in combination with an appropriate damping procedure, guarantees the convergence of the Poisson equation at the high voltages typical of power semiconductor devices, and has good parallel performance.
Acknowledgements

This work is funded by the EPSRC under grant GR/H23085 as part of a LINK/PEDDS project. We would like to thank Peter Waind and David Crees of GEC Plessey Semiconductors for their information and helpful discussions.

References

1. Brown, A R, Asenov, A, Barker, J R, Waind, P, Crees, D E: "Calibration of the numerical simulations in the design of high temperature IGBTs", Proc. International Seminar on Power Semiconductors, Ed. Vitezslav Benda, Czech Technical University in Prague, 1994, pp 151-157
2. Baliga, B J, Adler, M S, Love, R P, Gray, P V, Zommer, N D: "The Insulated Gate Transistor: A New Three-Terminal MOS-Controlled Bipolar Power Device", IEEE Trans. Electron Dev., Vol. ED-31, No. 6, pp 821-828, 1984
3. Baliga, B J, Krishna, S: "Optimization of recombination levels and their capture cross section in power rectifiers and thyristors", Solid State Elec., Vol 20, pp 225-232, 1977
4. Brunner, H, Gertenmaier, Y C, Mattausch, H-J: "Impact of cell geometries and electrothermal effects on IGBT latch-up in 2D simulation", Simulation of Semiconductor Devices and Processes Vol 5, Eds. S. Selberherr, H. Stipel & E. Strasser, Springer-Verlag, Vienna, 1993
5. Litsios, J, Müller, S, Fichtner, W: "Mixed-mode multi-dimensional device and circuit simulation", Technical Report No. 93/28, Swiss Federal Institute of Technology, Zurich (Integrated Systems Laboratory)
6. Brown, A R, Asenov, A, Barker, J R, Jones, S, Waind, P: "Numerical simulation of IGBTs at elevated temperatures", Proc. Int. Workshop on Computational Electronics, Ed. C. M. Snowden, University of Leeds Press, pp 50-55, 1993
7. Asenov, A, Reid, D, Barker, J R: "Speed-up of scalable iterative linear solvers implemented on an array of transputers", Parallel Computing, Vol 20, pp 375-387, 1994
8. Dutton, R W, Goosens, R J G: "Technology CAD at Stanford University: Physics, algorithms, software, and applications", Technology CAD Systems, Eds. F. Fasching, S. Halama, S. Selberherr, Springer-Verlag, Vienna, 1993, pp 113-130
9. Reid, D, Asenov, A, Barker, J R, Beaumont, S P: "Parallel simulation of semiconductor devices on MIMD machines", Proc. Int. Workshop on Computational Electronics, Ed. C. M. Snowden, University of Leeds Press, pp 161-165, 1993
10. Asenov, A, Barker, J R, Brown, A R, Lee, G L: "Scalable parallel 3D finite element nonlinear Poisson solver", Massively Parallel Processing Applications and Development, Eds. L. Dekker, W. Smit and J.C. Zuidervaart, Elsevier, 1994, pp 665-672
11. Asenov, A, Reid, D, Brown, A R, Barker, J R: "The implementation and speed-up of coloured SOR methods solving 3D problems on arrays of transputers", Transputer Applications and Systems '93, Eds. R. Grebe et al, IOS Press, Amsterdam, 1993, pp 578-587