# A 64Kb × 32 DRAM for graphics applications

by T. Sunaga, K. Hosokawa, S. H. Dhong, and K. Kitamura

A high-speed 2Mb CMOS DRAM with 32 data I/Os is described. A 0.6-$\mu$m CMOS process with a single polysilicon layer, two levels of metal, and substrate-plate trench-capacitor (SPT) memory cells is used to fabricate the chip. It is designed to provide the wide data bandwidth required by high-performance graphics applications. A 35-ns access time with an 80-ns cycle time has been demonstrated. The 32-bit data bus and the high-speed design achieve more than two times better graphics performance than conventional dual-port memories. A sensing method with a 2/3 $V_{\rm DD}$ bit-line precharge voltage and a limited bit-line voltage swing is exploited to optimize speed and power. The chip, which operates on a 5-V power supply, draws 140 mA at the 80-ns cycle time.

## Introduction

A remarkable advance in graphics display systems has been achieved in recent years. It is seen in a wide range of products from portable systems to high-end workstations. As a frame buffer memory for these display systems, DRAMs or DRAM-based memory chips have been used extensively because of density and cost advantages. As the number of colors and the screen size of a display increase, the memory density necessary for the frame buffer becomes large. For example, one and two megabytes of memory are very common memory sizes required in a personal computer system; multiple-megabyte frame buffer memories are not unusual in high-end workstations. In addition to a large memory size, graphics applications also require a very high data rate for frame buffer memories. The data rate is the most important performance factor in maintaining a fast screen change. It plays a crucial role as the number of bits per pixel becomes large. Frame buffer memories have two functions: screen refresh and data update. To display the memory contents on the screen, all data bits required for one screen must be read within a specific time, which is determined by the horizontal and vertical refresh periods. The read operation for the screen refresh is done periodically. On the other hand, the memory contents must be updated by the graphics controller or the main processor. Since the update operation happens randomly, it is done in the time slots between the periodic read operations for the screen refresh.

These performance and functional requirements are unique to graphics applications, and conventional DRAMs are not an optimal solution for them. Dual-port video memories (VRAMs) designed specifically for these applications have been announced in the 256Kb to 4Mb generations [1-4]. They have a serial access memory (SAM) port for screen-refresh operations in addition to a random access memory (RAM) port. A single read access obtains either a full page or a half page of data and places it in the shift registers. A serial read operation through the SAM port is available for screen refreshing. It is done in parallel with, and independently of, any access through the RAM port. Therefore, for memory content updates, almost the full time is available in VRAM, while only a small fraction of the full time can be allocated in a conventional DRAM because the latter must share its data I/Os between screen refresh and update operations. However, the RAM port in VRAM is basically the same as the data I/Os of a conventional DRAM, and there is no advantage in normal and page-mode cycle times for memory-content update operations. The difference in the time windows available for updates causes differences in performance. This suggests that a simpler single-port DRAM can outperform VRAM if it achieves a data rate high enough to compensate for the time lost to screen-refresh operations.

\*\*Copyright 1995 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

0018-8646/95/\$3.00 © 1995 IBM

Table 1 Process features.

| Design rule      | 0.6 μm                                              |
|------------------|-----------------------------------------------------|
| CMOS process     | Retrograde n-well, p− epitaxial on p+ substrate     |
| Cell transistor  | p-MOS                                               |
| Storage cell     | Substrate-plate trench capacitor (SPT)              |
| Cell capacitance | 80 fF                                               |
| Cell size        | $1.9 \times 4.35\ \mu\text{m}^2$                    |
| Polysilicon      | Ti polycide (3.0 $\Omega/\Box$)                     |
| Diffusion        | Ti salicide (2.0 $\Omega/\Box$)                     |
| Metal M1         | Tungsten (0.16 $\Omega/\Box$)                       |
| Metal M2         | Aluminum (0.03 $\Omega/\Box$)                       |

Table 2 Chip features and functions.

| Chip size         | $5.85 \times 8.70 \text{ mm}^2$                                    |
|-------------------|--------------------------------------------------------------------|
| Organization      | 64Kb × 32<br>Two sets of 8-bit address inputs                      |
| Function          | Fast page<br>Read-modify-write<br>Write-per-bit                    |
| Refresh mode      | RAS only<br>CAS before RAS<br>Hidden refresh                       |
| Refresh           | 256 cycles/4 ms                                                    |
| Package           | 80-pin PFP                                                         |
| Pin configuration | 32 I/Os, 16 addresses,<br>RAS, CAS, WE, OE,<br>12 power and ground |

A 2Mb CMOS DRAM has been designed to realize high graphics system performance with this single-port architecture. Key features needed to obtain the high data rate are a 32-bit-wide data bus and an 80-ns cycle time for normal RAS (row address strobe) access. Increasing the number of data I/Os is an effective way of enhancing the memory bandwidth; however, it also creates some difficult circuit design problems. In particular, when the chip is operated at a fast switching speed, active power becomes a serious concern. This paper describes some CMOS circuit design techniques that solve these problems and enable high-speed operation of the 32-I/O DRAM. A detailed circuit design approach to optimize DRAM speed and active power is shown, in which the DRAM architecture is tuned specifically for graphics system flexibility. Some performance advantages for this specific application are also explained.

## Process technology

The CMOS process used in fabricating the 2Mb DRAM features a 0.6-$\mu$m design rule, a single layer of polysilicon, and two levels of metal. A p− epitaxial layer is grown on a p+ substrate, and low-resistivity retrograde n-wells are implemented by ion implantation. It is a scaled-down version of the 0.8-$\mu$m process with substrate-plate trench-capacitor (SPT) memory cells that is used for the 4Mb DRAM [5]. The $1.9 \times 4.35$-$\mu$m$^2$ SPT cell consists of an 80-fF trench capacitor and a p-MOS FET transfer gate. The first level of metal, M1, provides bit lines. Word lines consist of the polycide layer and the second level of metal, M2. Self-aligned silicide is formed on source and drain diffusion regions of both p- and n-channel devices. The process features are summarized in **Table 1**.

## Chip features and functions

For graphics application flexibility, the 64Kb × 32 chip consists of two 64Kb × 16 arrays. There are two sets of 8-bit address receivers, one for each 1Mb array. Each set accepts independent 8-bit row and 8-bit column addresses in a conventional address multiplex method. One set of control inputs, consisting of a row address strobe (RAS), a column address strobe (CAS), a write enable (WE), and an output enable (OE), is provided to control both 1Mb arrays. The chip has 72 pads around its periphery. However, eight of them are monitoring pads for wafer tests, and only 64 pads are actually used: 32 data I/Os, 16 address pads, four pads for RAS/CAS/WE/OE, and 12 $V_{\rm CC}/V_{\rm SS}$ pads. To reduce switching noise caused by simultaneous activation of the data I/Os, four pairs of $V_{\rm CC}/V_{\rm SS}$ pads are used for power and ground of the 32 data I/Os. The two other pairs are for internal circuits only; they have no on-chip connection to the $V_{\rm CC}/V_{\rm SS}$ of the data I/Os, which protects receiver circuits from power and ground line noise induced by the data I/Os. The chip is housed in an 80-pin plastic flat package (PFP).
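The organization and pad counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures given in the text; the variable names are illustrative and do not come from the paper.

```python
# Organization and pad budget, using only figures quoted in the text.
# All names here are illustrative, not taken from the paper.

K = 1024

ROW_ADDRESS_BITS = 8      # multiplexed row address per array
COLUMN_ADDRESS_BITS = 8   # multiplexed column address per array
IOS_PER_ARRAY = 16        # data I/Os of each 64Kb x 16 array
ARRAYS = 2                # top-half and bottom-half arrays

words_per_array = 2 ** (ROW_ADDRESS_BITS + COLUMN_ADDRESS_BITS)
bits_per_array = words_per_array * IOS_PER_ARRAY
total_bits = bits_per_array * ARRAYS
total_ios = IOS_PER_ARRAY * ARRAYS

# Four VCC/VSS pairs serve the data I/Os, two pairs serve internal circuits.
power_ground_pads = 2 * (4 + 2)

active_pads = {
    "data I/O": total_ios,
    "address": ARRAYS * ROW_ADDRESS_BITS,  # two independent 8-bit sets
    "RAS/CAS/WE/OE": 4,
    "VCC/VSS": power_ground_pads,
}
wafer_test_pads = 8
active_pad_count = sum(active_pads.values())
total_pads = active_pad_count + wafer_test_pads

print(f"words per array : {words_per_array // K}K")
print(f"bits per array  : {bits_per_array // (K * K)}Mb")
print(f"total capacity  : {total_bits // (K * K)}Mb, {total_ios} I/Os")
print(f"pads            : {active_pad_count} active + {wafer_test_pads} test = {total_pads}")
```

The totals agree with the 2Mb capacity, the 64 active pads, and the 72 pads stated in the text.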

#### Figure 1

(a) Microphotograph of the 64Kb $\times$ 32 DRAM chip. (b) Floor plan of the 64Kb $\times$ 32 DRAM chip. Two 1Mb arrays are placed in the top and the bottom halves of the chip. All 72 pads (64 active and eight for wafer test) are placed around the chip periphery. Pad identifications: W = WE, R = RAS, A = address, O = OE, C = CAS, V = internal $V_{\rm CC}$, G = internal circuit ground, D = data I/O, VO = off-chip driver $V_{\rm CC}$, and GO = off-chip driver ground.

In addition to conventional operational modes such as fast-page and read-modify-write, a write-per-bit function is implemented. Write-mask operation for 32 individual bits is defined by driving selected data I/O pins when RAS and WE are turned low. The chip is designed for high-speed operation, and 80 ns is a typical cycle time for its normal read and write modes. It has CAS-before-RAS and hidden refresh functions; the refresh requirement is 256 cycles per 4 ms. Features and chip information are summarized in **Table 2**.
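To give a feel for what the refresh requirement of 256 cycles per 4 ms costs in memory time, the sketch below estimates the fraction of time spent refreshing. It assumes, purely for illustration, that each refresh cycle occupies one normal 80-ns cycle; the paper does not state the refresh cycle time separately.

```python
# Rough refresh-overhead estimate.
# Assumption (not from the paper): one refresh cycle takes one 80-ns cycle.

REFRESH_CYCLES = 256
REFRESH_PERIOD_NS = 4_000_000   # 4 ms
CYCLE_TIME_NS = 80              # typical normal cycle time

refresh_busy_ns = REFRESH_CYCLES * CYCLE_TIME_NS
refresh_overhead = refresh_busy_ns / REFRESH_PERIOD_NS
distributed_interval_ns = REFRESH_PERIOD_NS / REFRESH_CYCLES

print(f"time spent refreshing per period: {refresh_busy_ns / 1000:.2f} us")
print(f"fraction of memory time         : {refresh_overhead:.3%}")
print(f"interval if evenly distributed  : {distributed_interval_ns / 1000:.3f} us")
```

Under this assumption the DRAM refresh takes roughly half a percent of the available memory time, which is small compared with the screen-refresh traffic discussed later in the paper.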

## Chip organization

#### • Floor plan

Figure 1(a) shows a microphotograph of the 2Mb CMOS DRAM chip (actual chip size is $5.85 \times 8.70\ \text{mm}^2$). The chip consists of two 1Mb arrays, which are placed in the top and bottom halves of the floor plan shown in Figure 1(b). The floor plan and circuit block placements are optimized to obtain the shortest signal lines from address receivers to data I/Os. Each 1Mb array, organized as 64Kb × 16, has its own eight address receivers and 16 data I/Os in the peripheral areas. Eight address pads for the top-half arrays are placed in the top peripheral area. True and complement pairs for the output bus of row and column addresses run vertically in the center portion of the chip, where redundancy and predecoder/decoder circuits are also placed. This scheme realizes effective connections from the address circuits to both redundancy and decoder circuits. Write buffers and I/O sense amplifiers are placed along the sides of the sense amplifier and column decoder blocks. Since two sets of eight data I/O circuits and pads for the top-half arrays are located in the left and right top-half peripheral areas, data stream paths from sense amplifiers to I/O pads are also short. The bottom-half arrays, which mirror those of the top half, have a similar structure. Therefore, the top and bottom peripheral areas contain address receivers of eight bits each, and there are 16 data I/Os in each of the left and right peripheral areas. In addition to the address receivers, the bottom peripheral area contains all control signal input circuits for RAS, CAS, WE, and OE.

#### Figure 2

Subarray organization of the $64\text{Kb} \times 32$ DRAM chip. The chip has a total of 16 such 128Kb subarrays. Two I/O pads are placed close to their array blocks.

#### • Memory array

There are 16 subarray blocks in the chip. Figure 2 shows one subarray block which contains 128Kb of storage cells. Column decoders and sense amplifiers divide it into two 64Kb blocks, each containing 128 word lines and 512 folded bit-line pairs. There are 64 cells on each bit line. In parallel with the 512-bit-long polycide word lines, M2 lines run above them, and these two layers of lines are connected every 128 bits to minimize word-line delays. A RAS access activates one of two 64Kb arrays, and the column decoder selects two out of 512 bit switches to transfer two data bits from or to two I/Os for write or read operation. The half-array activation, short bit lines with 64 cells, and 512-bit-long M2-assisted word lines are key array design features for achieving low active power, fast access and cycle times, and a large signal margin.
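The array figures above are mutually consistent, which can be verified with a short calculation. This sketch assumes that all 16 subarrays respond to each access (which the 2 bits per subarray and 32 I/Os imply) and that, in a folded bit-line layout, each word line places a cell on only one line of each pair; the names are illustrative.

```python
# Consistency check of the memory-array geometry quoted in the text.
# Names are illustrative; the folded-bit-line reasoning is an assumption
# about how the 64-cells-per-bit-line figure arises.

SUBARRAYS = 16
BLOCKS_PER_SUBARRAY = 2          # split by column decoders and sense amplifiers
WORD_LINES_PER_BLOCK = 128
BITLINE_PAIRS_PER_BLOCK = 512
BITS_SELECTED_PER_SUBARRAY = 2   # column decoder selects 2 of 512 bit switches
STRAP_INTERVAL_BITS = 128        # polycide word line tied to M2 every 128 bits

cells_per_block = WORD_LINES_PER_BLOCK * BITLINE_PAIRS_PER_BLOCK
cells_per_subarray = cells_per_block * BLOCKS_PER_SUBARRAY
total_cells = cells_per_subarray * SUBARRAYS

# Folded pair: a word line connects a cell to one line of the pair only.
cells_per_bit_line = WORD_LINES_PER_BLOCK // 2

rows_per_subarray = WORD_LINES_PER_BLOCK * BLOCKS_PER_SUBARRAY
columns_per_block = BITLINE_PAIRS_PER_BLOCK // BITS_SELECTED_PER_SUBARRAY
data_ios = SUBARRAYS * BITS_SELECTED_PER_SUBARRAY
strap_segments = BITLINE_PAIRS_PER_BLOCK // STRAP_INTERVAL_BITS
sense_amps_per_access = SUBARRAYS * BITLINE_PAIRS_PER_BLOCK

print(f"cells per block      : {cells_per_block // 1024}Kb")
print(f"cells per subarray   : {cells_per_subarray // 1024}Kb")
print(f"total cells          : {total_cells // (1024 * 1024)}Mb")
print(f"cells per bit line   : {cells_per_bit_line}")
print(f"rows x columns       : {rows_per_subarray} x {columns_per_block}")
print(f"data I/Os            : {data_ios}")
print(f"word-line segments   : {strap_segments}")
print(f"sense amps per access: {sense_amps_per_access}")
```

The 256 rows and 256 columns match the two 8-bit multiplexed addresses, and 256 row accesses of 8192 sense amplifiers each cover all 2Mb of cells, in line with the 256-cycle refresh.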

#### • Power system

The chip is supplied by 5-V power only, but all internal circuits, including the memory arrays, operate at 3.3 V. The only exception is the off-chip drivers, which are powered directly from the external 5 V. The 3.3-V internal voltage was selected because of the 0.6-$\mu$m technology used, but it also contributes to a power reduction. For high-speed operation of the 32-data-I/O DRAM, a solid internal power system is desirable. Four on-chip voltage regulators convert the externally supplied 5 V to the internal 3.3 V. There are two internal $V_{\rm CC}$ pads, one in the top and one in the bottom peripheral area of the floor plan. To minimize voltage drop due to $V_{\rm CC}$ power wiring resistance, each $V_{\rm CC}$ pad is connected directly to two regulators on its left and right sides. Regulated 3.3-V outputs are distributed all over the chip by wide M2 wiring as an internal $V_{\rm DD}$ bus. Cell transistors are p-channel devices in an array n-well. The array n-well must be biased at about 1.1 V above the internal voltage, $V_{\rm DD}$, to minimize subthreshold leakage currents. A charge-pump voltage generator on the chip supplies this n-well potential. It generates 4.4 V from the internal 3.3 V, which shields the n-well bias from variations in the external 5 V.

## Circuits

#### • High-speed circuits

Some circuit design techniques explored in previous high-speed DRAMs are also used [6-8]. Fast RAS access times were the prime targets of these DRAMs. Since graphics operations move data to and from frame buffer memories consecutively in both normal and page modes, cycle times are also important. The high-speed design in the 2Mb DRAM is therefore focused on achieving fast access and cycle times. As the floor plan shows, simple and short signal propagation paths are used throughout, from address inputs to data outputs. Row and column circuits are physically independent of one another. They do not share true and complement output buffer lines. The separate true and complement buses eliminate an address control circuit and the delays associated with it. Because redundancy circuits are located close to row and column predecoders, they require no modification of the decode timing. Since the polycide word line is the key delay factor in RAS access time, it is strapped to M2 every 128 bits, and it connects only 512 cell transistors rather than the conventional 1024 [8, 9]. The bit-line length is also half that of typical 1Mb/4Mb DRAMs; only 64 cells are connected to each bit line. The short word and bit lines are important for the fast cycle time, because they reduce the time required for the precharge operation.

#### Figure 3

Sense circuit of the $64\text{Kb} \times 32$ DRAM.

#### • Sense circuits

The sensing scheme is the most important circuit in any DRAM design, because signal development speed and active power depend on it strongly. Since p-channel array transistors are used in the 2Mb DRAM, the fastest signal development can be obtained when bit lines are precharged at the full internal supply voltage, $V_{\rm DD}$. However, this causes a high active current, since one line of the bit-line pair must swing over the full rail-to-rail voltage. The signal development speed and the active power can be traded off against each other. As the optimum sensing scheme, 2/3 $V_{\rm DD}$ sensing with a limited bit-line swing is used [8, 10]. Figure 3 shows the sense circuit. When cross-coupled p- and n-channel transistors are activated by the p-set node and the n-set node, they latch to the full $V_{\rm DD}$ voltage. However, one line of the bit-line pair is clamped at the threshold voltage, $V_t$, because of the p-channel transistors between the bit lines and the sense amplifier. The 2/3 $V_{\rm DD}$ precharge voltage is obtained by simply shorting the bit-line pair. Since the p-set node pulls up one line of the bit-line pair from 2/3 $V_{\rm DD}$ to full $V_{\rm DD}$ when the sense amplifier is turned on, the array current is about 1/3 of that for full-$V_{\rm DD}$ precharge sensing. The 2/3 $V_{\rm DD}$ precharge voltage allows a reasonable signal development speed to meet the 35-ns access time. This sensing method is faster and uses less power than the conventional 1/2 $V_{\rm DD}$ sensing scheme.
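The voltage levels in this scheme can be illustrated with an idealized calculation. The sketch below assumes equal bit-line capacitances and ideal charge sharing when the pair is shorted, and it assumes a clamp level of $V_{\rm DD}/3$, which is the value implied if shorting is to yield exactly 2/3 $V_{\rm DD}$; the paper does not quote the clamp voltage numerically.

```python
from fractions import Fraction

# Idealized bit-line levels for 2/3 VDD sensing with a limited swing.
# Assumptions (not stated numerically in the paper): equal bit-line
# capacitances, ideal charge sharing, and a clamp level of VDD/3.

VDD = Fraction(33, 10)  # internal supply, 3.3 V


def precharge_by_shorting(v_high, v_low):
    """Voltage after shorting two equal-capacitance bit lines."""
    return (v_high + v_low) / 2


v_clamp = VDD / 3                         # assumed level of the low bit line
v_precharge = precharge_by_shorting(VDD, v_clamp)

# Swing of the line pulled up by the p-set node when sensing starts.
pull_up_swing = VDD - v_precharge

# Compared with a scheme in which one line swings the full rail-to-rail voltage.
relative_array_charge = pull_up_swing / VDD

print(f"clamp level       : {float(v_clamp):.2f} V")
print(f"precharge level   : {float(v_precharge):.2f} V")
print(f"pull-up swing     : {float(pull_up_swing):.2f} V")
print(f"charge vs. full   : {relative_array_charge}")
```

With these assumptions the precharge level comes out at two thirds of the supply and the pulled-up line moves by one third of it, which is the origin of the roughly 1/3 array current quoted in the text.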

The use of a short 64-cell bit line and 2/3 $V_{\rm DD}$ sensing with a limited bit-line swing creates an ideal combination for optimum sensing. The combination is also very effective in reducing cycle time. Because of the small capacitance ratio between bit line and cell, the word line is not boosted in the write-back operation, and the chip can keep a sufficient signal margin. The unboosted word line is a favorable feature for short cycle time. Clamping the bit-line voltage with grounded-gate p-channel devices is a simple and cost-effective sensing scheme, but it has a longer write-back time because of the source-follower operation of the clamp devices. However, the short bit line overcomes this drawback because of its small capacitance. Generation of the 2/3 $V_{\rm DD}$ precharge voltage by shorting the bit-line pair is another good feature for fast precharge because of the short bit line. Figure 4 shows simulated waveforms of key signals.

#### Figure 4

Simulated waveforms of the sense circuit. Simulation was performed at 87°C and 3.3 V internal voltage. The first half of the simulation is a read operation. When the sense amplifier is activated, the low-level bit line is clamped at $V_t$. The latter half of the simulation is a write operation, which starts at around 170 ns and writes the opposite polarity of signal into the cell.

#### • Low-power design

Reducing active power is a very important task in fast-cycle-time DRAM designs, because the active current, which accounts for most of the power in a CMOS DRAM, is inversely proportional to the cycle time. In addition, the wide-data-bus chip dissipates significantly more power in data path circuits than conventional DRAMs, and the 32 data I/Os themselves consume a considerable amount of power. Thus, low-active-power design is the primary focus of the chip architecture definition. Since a dominant part of the total power in memory arrays is consumed when sense amplifiers are fired and bit lines are restored to the precharge state, selection of a sensing scheme and a fractional array activation requires careful consideration. As shown in the section on sense circuits, 2/3 $V_{\rm DD}$ sensing with a limited bit-line swing is the lowest-power sensing method, because bit lines swing only 1/3 of $V_{\rm DD}$. The short bit line, which is half that of a conventional DRAM, is also effective in reducing array power. It allows a 1/2 fractional array activation with the standard refresh requirement of 256 cycles per 4 ms.

#### Figure 5

RAS access waveform. RAS access time is observed at 25°C, 5 V, and 80-pF load capacitance with 2-mA output current load.

#### Figure 6

Plot of $V_{\rm CC}$ vs. RAS access time. Plot taken at 25°C and 80-pF load capacitance with 2-mA output current load.

**Table 3** Chip characteristics at 25°C, 5 V, and 80-pF load capacitance.

| RAS access time $(t_{RAS})$        | 35 ns  |
|------------------------------------|--------|
| Cycle time $(t_{RC})$              | 80 ns  |
| RAS precharge time $(t_{RP})$      | 15 ns  |
| Page cycle time $(t_{PC})$         | 30 ns  |
| Active current at 80-ns cycle time | 140 mA |
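The reciprocal relation between active current and cycle time noted in the low-power discussion can be made concrete with the measured 140 mA at 80 ns. The sketch below is only a first-order illustration: it assumes that all of the active current is charge moved once per cycle and ignores any DC component, which the paper does not break out. The 120-ns and 160-ns points are hypothetical operating points, not measurements.

```python
# First-order view of active current versus cycle time.
# Assumption: the whole 140 mA is charge transferred once per cycle,
# with no DC component. The slower cycle times are hypothetical.

ACTIVE_CURRENT_A = 0.140   # measured at the 80-ns cycle time
CYCLE_TIME_S = 80e-9


def charge_per_cycle(current_a, cycle_time_s):
    """Charge drawn from the supply in one cycle (coulombs)."""
    return current_a * cycle_time_s


def average_current(charge_c, cycle_time_s):
    """Average current when the same charge is drawn every cycle."""
    return charge_c / cycle_time_s


q_cycle = charge_per_cycle(ACTIVE_CURRENT_A, CYCLE_TIME_S)
print(f"charge per cycle: {q_cycle * 1e9:.1f} nC")

for t_ns in (80, 120, 160):
    i_avg = average_current(q_cycle, t_ns * 1e-9)
    print(f"cycle {t_ns:3d} ns -> about {i_avg * 1e3:.0f} mA")
```

Seen this way, every reduction in the charge moved per cycle, such as the limited bit-line swing or the half-array activation, translates directly into lower current at a given cycle time.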

## Characteristics

Figure 5 shows waveforms of the RAS signal and the data output at 25°C and 5 V. A RAS access time of 35 ns is measured. Figure 6 shows a  $V_{\rm CC}$ -RAS access time plot. Read-write operation with an 80-ns cycle time is also demonstrated. The RAS precharge time of 15 ns is achieved by the unboosted word line and the short bit line. The typical active current is 140 mA at the 80-ns cycle time. Some performance data for a 5.0-V power supply voltage and a 25°C ambient temperature are summarized in Table 3.
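The measured values translate into supply power and peak transfer rates as follows. This is a minimal sketch using the typical figures quoted in the text and in Table 3; it assumes one 32-bit transfer per cycle, and the function name is illustrative.

```python
# Supply power and peak transfer rates from the typical measured values.
# Assumes one 32-bit transfer per cycle; names are illustrative.

SUPPLY_V = 5.0
ACTIVE_CURRENT_A = 0.140
DATA_WIDTH_BITS = 32
T_RC_NS = 80    # normal cycle time
T_PC_NS = 30    # page cycle time

active_power_w = SUPPLY_V * ACTIVE_CURRENT_A


def peak_rate_mbytes_per_s(width_bits, cycle_ns):
    """Peak transfer rate in units of 10^6 bytes per second."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer / cycle_ns * 1000.0


normal_rate = peak_rate_mbytes_per_s(DATA_WIDTH_BITS, T_RC_NS)
page_rate = peak_rate_mbytes_per_s(DATA_WIDTH_BITS, T_PC_NS)

print(f"active power     : {active_power_w * 1e3:.0f} mW")
print(f"normal-mode peak : {normal_rate:.0f} MB/s")
print(f"page-mode peak   : {page_rate:.0f} MB/s")
```

These are per-chip peak figures; a frame buffer built from several chips in parallel scales the transfer rate with the bus width.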

## Graphics applications

The overall graphics performance of the $64\text{Kb} \times 32$ DRAM can be compared with that of dual-port memories. For example, a typical 256-color, 1024 × 768-pixel screen usually needs a 1MB frame buffer, including off-screen memory. It therefore uses four 2Mb DRAMs with a 64-bit bus, or two 256Kb × 16 dual-port memories with a 32-bit bus. With a vertical screen refresh rate of 70 Hz, almost the full 14.28 ms (1/70 s) of the one-screen trace time is available for memory content update by a CPU or a graphics controller through the VRAM's RAM port. The time window for a single-port DRAM is smaller, because the time required for screen refresh must be subtracted from 14.28 ms. However, with a typical screen refresh buffer in the graphics controller, the 2Mb DRAM loses only about 20% of the 14.28 ms for screen refresh because of the wide data bus. The data rate for the memory-content update operation is determined by the product of the bandwidth and the ratio of available time to the full single-screen trace time. The 20% loss of the DRAM can be well compensated by the bandwidth. In horizontal line accesses, the page mode is extensively used. If the same page cycle time is assumed, the relative data rate for DRAM with respect to VRAM is 1.6, obtained from the double-width data bus and 80% of the available time window. In vertical accesses such as line drawing, the nominal access mode must be used. Because of the two independent address inputs, the 2Mb DRAM can access two vertical pixels in a single nominal access. A typical cycle time for the nominal access mode in VRAM is 120 ns. The 1.5× advantage in cycle time for two pixels (but only an 80% available time window) gives the 2Mb DRAM about 2.4× better bandwidth than a typical VRAM. The two independent address inputs also allow the single DRAM to access a 2 by 2-pixel box area of an 8-bit-per-pixel screen.
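The 1.6 and 2.4 ratios follow from the numbers in the comparison above. The sketch below reproduces that arithmetic with exact fractions; the bus widths, the 80% window, and the 120-ns VRAM cycle are taken from the text as given, and the variable names are illustrative.

```python
from fractions import Fraction

# Reproduces the DRAM-versus-VRAM comparison arithmetic from the text.
# All input figures are quoted from the paper; names are illustrative.

# Frame buffer sizing for a 256-color, 1024 x 768 screen.
visible_bytes = 1024 * 768 * 8 // 8          # 8 bits per pixel
frame_buffer_bytes = 2 ** 20                 # 1MB including off-screen memory
dram_bytes = 4 * (2 * 2 ** 20) // 8          # four 2Mb DRAMs
vram_bytes = 2 * (256 * 1024 * 16) // 8      # two 256Kb x 16 VRAMs

trace_time_ms = Fraction(1000, 70)           # one screen at 70 Hz

DRAM_BUS_BITS = 64
VRAM_BUS_BITS = 32
DRAM_WINDOW = Fraction(8, 10)                # 20% lost to screen refresh
VRAM_WINDOW = Fraction(1)                    # almost the full trace time

# Horizontal (page-mode) accesses, same page cycle time assumed.
horizontal_ratio = Fraction(DRAM_BUS_BITS, VRAM_BUS_BITS) * DRAM_WINDOW / VRAM_WINDOW

# Vertical accesses: 80 ns vs. 120 ns cycle, two pixels per DRAM access.
cycle_advantage = Fraction(120, 80)
pixels_per_dram_access = 2
vertical_ratio = cycle_advantage * pixels_per_dram_access * DRAM_WINDOW / VRAM_WINDOW

print(f"visible screen    : {visible_bytes // 1024} KB of {frame_buffer_bytes // 1024} KB")
print(f"trace time        : {float(trace_time_ms):.2f} ms")
print(f"horizontal ratio  : {float(horizontal_ratio)}")
print(f"vertical ratio    : {float(vertical_ratio)}")
```

Both memory configurations provide the same 1MB of storage, so the comparison isolates the effect of bus width, cycle time, and the available time window.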

## Conclusion

A 2Mb CMOS DRAM organized 64Kb × 32 has been developed for graphics applications. The 32-bit-wide data bus and an 80-ns cycle time provide the high memory bandwidth. The chip consists of two 1Mb arrays, and each 64Kb × 16 block has independent 8-bit address inputs to realize flexible memory mapping to graphics screens. The chip supports fast-page, read-modify-write, and write-per-bit modes. Its package is an 80-pin PFP with 64 active pads.

The floor plan is optimized to obtain short signal propagation paths from address receivers to data I/Os. Key circuit design features for achieving fast access time, short cycle time, and low active power consumption include M2-strapped 512-bit-long word lines, short bit lines with 64 cells, and 2/3 $V_{\rm DD}$ sensing with a limited bit-line swing. A 35-ns RAS access time is observed with an 80-ns cycle time. A 140-mA active current is measured at the short 80-ns cycle time. The single-port ×32 DRAM can provide 1.6-2.4 times better bandwidth for graphics systems than conventional dual-port video memories because of its wide data bus and high-speed design.

## Acknowledgments

The authors are indebted to many individuals in the IBM Japan Yasu manufacturing and development organization who worked on the project. We would like to thank N. Tanigaki, T. Saito, K. Fujisawa, and M. Kazusawa for process technology support. We would also like to acknowledge the test and characterization assistance by T. Yoshikawa and S. Iwamoto. We are grateful as well to L. M. Terman for his encouragement of our work and his review of an earlier version of this paper.

## References

- 1. S. Ishimoto, A. Nagami, H. Watanabe, J. Kivono, N. Hirakawa, and Y. Okuyama, "A 256K Dual Port Memory," Digest of Technical Papers, International Solid-State Circuits Conference, pp. 38-39 (1985).
- 2. K. Ohta, H. Kawai, M. Fujii, T. Nishimoto, S. Ueda, and Y. Furuta, "A 1-Mbit DRAM with 33MHz Serial I/O Ports," IEEE J. Solid-State Circuits SC-21, 649-654 (October 1986).
- 3. R. Prinkham, D. Russell, A. Balistreri, T. H. Herndon, D. Anderson, A. Metha, T. Nguyen, N. H. Hong, H. Sakurai, S. Hatakoshi, and A. Guillemaud, "A 128k × 8 70MHz Multiport Video RAM with Auto Register Reload and 8 × 8 Block Write Feature," IEEE J. Solid-State Circuits SC-23, 1133-1139 (October 1988).
- 4. 4M-bit Dual Port Graphics Buffer, NEC Data Sheet, PD482445, NEC Corporation, Tokyo, Japan, April 1993.
- 5. N. C. C. Lu, P. E. Cottrell, W. J. Craig, S. Dash, D. L. Critchlow, R. L. Mohler, B. J. Machesney, T. H. Ning, W. P. Noble, R. M. Parent, R. E. Scheuerlein, E. J. Sprogis, and L. M. Terman, "A Substrate-Plate Trench-Capacitor (SPT) Memory Cell for Dynamic RAM's," IEEE J. Solid-State Circuits SC-21, 627-634 (October 1986).
- 6. N. C. C. Lu, H. H. Chao, W. Hwang, W. H. Henkels, T. V. Rajeevakumar, H. I. Hanafi, L. M. Terman, and R. L. Franch, "A 20-ns 128-kbit × 4 High-Speed DRAM with 330-Mbit/s Data Rate," IEEE J. Solid-State Circuits SC-23, 1140-1149 (October 1988).
- 7. N. C. C. Lu, G. B. Bronner, K. Kitamura, R. E. Scheuerlein, W. H. Henkels, S. H. Dhong, Y. Katayama, T. Kirihata, H. Niijima, R. L. Franch, W. Hwang, M. Nishiwaki, F. L. Pesavento, T. V. Rajeevakumar, Y. Sakaue, Y. Suzuki, E. Yano, and Y. Iguchi, "A 22-ns 1-Mbit CMOS High-Speed DRAM with Address Multiplexing," IEEE J. Solid-State Circuits SC-24, 1198-1205 (October 1989).
- 8. T. Kirihata, S. H. Dhong, K. Kitamura, T. Sunaga, Y. Katayama, R. E. Scheuerlein, A. Satoh, Y. Sakaue, K. Tobimatsu, K. Hosokawa, T. Saitoh, T. Yoshikawa, H. Hashimoto, and M. Kazusawa, "A 14-ns 4-Mb CMOS DRAM with 300-mW Active Power," IEEE J. Solid-State Circuits SC-27, 1222-1228 (September 1992).
- 9. T. Furuyama, T. Ohsawa, Y. Watanabe, H. Ishiuchi, T. Watanabe, T. Tanaka, K. Natori, and O. Ozawa, "An Experimental 4-Mbit CMOS DRAM," IEEE J. Solid-State Circuits SC-21, 605-611 (October 1986).
- 10. S. H. Dhong, N. C. C. Lu, W. Hwang, and S. A. Parke, "High Speed Sensing Scheme for CMOS DRAM's," IEEE J. Solid-State Circuits SC-23, 34-40 (February 1988).

Received May 25, 1994; accepted for publication November 17, 1994

Toshio Sunaga IBM Japan Ltd., Yasu Technology Application Laboratory, 800 Ichimiyake, Yasu-cho, Yasu-gun, Shiga-ken 520-23, Japan (SUNAGA at YSUVM1). Mr. Sunaga received the B.S. degree in applied physics from the Science University of Tokyo, Japan, and the M.S.E. degree in electrical engineering from Princeton University. He joined IBM Japan, Ltd. at the Fujisawa plant, Kanagawa, Japan, in 1970, and worked on analog circuit design and power semiconductor devices. After educational leave from IBM Japan at Princeton University, he worked on semiconductor process technology, VLSI design methodology, 1Mb, 4Mb, and 8Mb ROM development, and DRAM circuit design at the Yasu manufacturing and development facility, Shiga, Japan, and the Japan Science Institute (JSI), Tokyo, Japan. Mr. Sunaga is currently with the IBM Japan Yasu Technology Application Laboratory; he is responsible for semiconductor product development.

Koji Hosokawa IBM Japan Ltd., Yasu Technology Application Laboratory, 800 Ichimiyake, Yasu-cho, Yasu-gun, Shiga-ken 520-23, Japan. Mr. Hosokawa received the B.S. degree in mechanical engineering from Hiroshima University, Japan, in 1983. Since joining IBM Japan Ltd. at the Yasu facility in 1983, he has been engaged in CMOS DRAM product development.

Sang H. Dhong IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598 (DHONG at YKTVMV). Dr. Dhong received the B.S.E.E. degree from the Korea University, Seoul, in 1974 and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley in 1980 and 1983, respectively. He joined the IBM Research Division in Yorktown Heights, New York, in 1983 as a research staff member involved with the research and development of silicon processing technology, mainly bipolar devices and reactive-ion etching (RIE). From 1985 to 1992, he was engaged in research and development of DRAM designs spanning many generations of IBM DRAMs, from 1 Mb to 256 Mb. After spending three years as a circuit designer in the development of one of the IBM PowerPC chips, he is currently working on low-power design aspects of the IBM PowerPC.

Koji Kitamura IBM Japan Ltd., Semiconductor Operation, 800 Ichimiyake, Yasu-cho, Yasu-gun, Shiga-ken 520-23, Japan. Mr. Kitamura received the B.S. degree from Osaka University and the M.S. degree from Kyoto University, Japan, both in chemical and solid-state physics. In 1979, he joined IBM Japan at the Yasu facility. He joined the semiconductor group in 1982, working in back-end-of-line (BEOL) processing. In 1988, he joined a process development group to work in fast-access DRAM fabrication and characterization. His technical interests include device simulation and design, DRAM characterization, nonvolatile memory design, and yield modeling. Mr. Kitamura is a manager of semiconductor process development at the IBM Yasu site.

PowerPC is a trademark of International Business Machines Corporation.