# The LT1280 for Through-the-Pins Testing of the Thermal Conduction Module

Testing the thermal conduction module (TCM), the high-density field-replaceable unit (FRU) used in the IBM 3081 processor models, and diagnosing the faults encountered to a minimal repairable set of entities posed a new problem for the IBM engineers. The requirement and the economic necessity of thoroughly exercising the entire TCM logic and random access memory (RAM) array structure through the input/output pins of the TCM are discussed. This is followed by a description of the test system alternatives and the LT1280 [logic tester having 1280 input/output (I/O) pins] as the selected TCM manufacturing test system. The TCM logic density and high I/O count required new concepts of test system organization, size, and complexity to achieve a test and diagnostic system with high flexibility and high throughput capability.

## Introduction

### • Product description

A review of some of the major attributes of the thermal conduction module (TCM) is necessary to understand the test and diagnostic requirements that guided the development of the LT1280 as the "high end" of the LT series of testers developed by IBM test engineers in the early 1970s. The initial tester models, which were used to test chips and modules (via chip-in-place tests) having 96 signal I/O pins, were conceived, architected, designed, and built at the Test Advanced Product Unit of the IBM General Technology Division in Endicott. The architecture of the tester allowed for expansion of the number of signal I/O logic pins from 96 to 1280. This made it possible to use the tester for TCM through-the-pins (TTP) testing (i.e., testing through the module pins).

The TCM is the field-replaceable unit (FRU) of IBM 3081 processor models [1]. There are six key TCM characteristics (see Fig. 1) which had to be considered:

- 1. The multi-layer ceramic substrate supports up to 133 logic and array chips.
- 2. Of the 1800 I/O pins, 1200 can be logic I/O; the balance can be service/power.
- 3. The unit dissipates 300 W.
- 4. There are 96 engineering change (EC) pads along the periphery of each chip site. In addition to EC and repair capability, these EC pads allow test point probing for both chip-in-place testing and through-the-pins diagnosis.
- 5. Each TCM has nominally 40 000 circuits, and greater than forty circuits per module signal I/O.
- 6. There are many module types and many combinations of different EC levels.

### • Through-the-pins (TTP) testing approach

The TCM design lends itself to either chip-in-place (CIP) or through-the-pins (TTP) testing. In CIP, the tester accesses the EC pads of the chips on the TCM, whereas in TTP the tester accesses only the TCM I/O pins. For CIP testing (Fig. 2), the TCM is powered through the pins while a cluster probe is indexed to each chip site on the top surface of the module. In this manner, the EC pads of each chip site are contacted and a complete stuck-fault test is made of the chip. In the TTP test, the module is powered and its logic is exercised by tests created for execution through the signal I/O pins of the TCM. A high priority was placed on the TTP test method because of its high throughput and the quality of the test. This method also ensured the integrity of the I/O connection through the substrate to the logic.

© Copyright 1983 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the *Journal* reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to *republish* any other portion of this paper must be obtained from the Editor.



Figure 1 Thermal conduction module—the product to be tested.



Figure 2 Block diagram of chip-in-place (CIP) test system. The probe is placed over the chip site, making contact with the EC pads. Tests are run. If a test fails, the analyzer determines probable cause and initiates repair action. All chip sites are handled individually.

Although some test systems with 96 signal I/Os were available commercially at the time, none projected a greater than 1000-pin capability. Thus, the decision was made to develop an in-house TCM-TTP tester.

## Logic and array design considerations

The high circuit count per module I/O required the establishment of a number of product, logic, and array design considerations to ensure that the dc stuck-fault tests could be generated for the entire TCM. The key attributes of the logic design included the incorporation of certain features specific to manufacturing testing. First, level-sensitive scan design (LSSD) [2] was to be used throughout the design for ensuring that logic could be partitioned for test generation and test execution. Second, there was to be independent control of the array chip I/O and accessibility of any array buried in the logic, thus allowing for the test of all array bits. Finally, there was to be a chip-select control for CIP testing. This was made possible by the use of a module pin to control all chip outputs and to force the outputs to a noncontrolling state. Because it used the EC pads of each chip site as test contacts, the tester was capable of performing independent CIP tests at each chip site.

## Test plan

A test plan had to be developed which took into consideration the three major factors affecting the system and TCM development cycle: testing capability, test data development turnaround time, and cost.

A period of low-volume production with high EC activity was projected for initial system design, followed by a period of high volume and stable EC levels for system production. During periods of high EC activity, throughput is not as important as the ability to respond to frequent ECs. As shown in Fig. 3, the CIP test method is able to make use of reprocessed chip test data (used in chip manufacturing) to do a CIP test on the TCM. Since these test data need only be reformatted for CIP testing, the time for their availability to the test floor is significantly shorter than that of the TTP method. This is an important advantage of the CIP method for major chip changes to the TCM (and therefore for changes which are dependent on the availability of chip test data).

The test data development cycle for the TTP tests for the TCM is relatively long compared to that for the CIP tests. As shown in Fig. 4, the engineering design system test data generation cycle typically results in a release interface tape (RIT) of greater than 100 megabytes of data. Post-RIT processing by manufacturing produces more than 400 megabytes, which results in a verified tester file of up to 33 megabytes, with approximately 5 megabytes of executable test data. The rest of the data are available for diagnostics during fault isolation. The test data cycle (EC to tester) takes a few weeks to process.

Once ECs occur less frequently and the testing volumes start to increase, the throughput advantage of the TTP-LT1280 becomes important in decreasing the manufacturing cost. The throughput capability of the LT1280 is roughly four times that of a CIP tester.

Figure 3 Test data flow for chip-in-place testing. The manufacturing release interface tape (RIT) contains both physical design and test data for each chip released to manufacturing. The manufacturing post-processor converts the RIT data into tester language and stores it in an LT verification file for callout when needed to run a tester.

## LT1280 test system description

Figure 5 is a sketch of the total system with the TCM handler enclosure removed. The primary attributes of the LT1280 hardware design are its high performance with through-the-pins testing, its capability for testing both capped and uncapped TCMs, and the availability of two diagnostic probes during uncapped testing, which leads to effective fault resolution.

Figure 6 shows the hardware system blocks: the control unit, the drive units, the I/O box, the handler, and the cooling support unit. Each drive unit contains 640 programmable I/O pin drivers, which control the one or zero levels, and a programmable reference for the voltage-level detector. Each pin driver has a 90-$\Omega$ output impedance (terminating) to connect it to the 90-$\Omega$ coaxial cable leading to the I/O box.



Figure 4 Test data flow for LT1280 through-the-pins (TTP) testing.



Figure 5 Sketch of LT1280 test system.

The I/O box hardware contains FET switches for selecting the pin source impedance (90 or 750 $\Omega$) and for facilitating the connection of drivers, level detectors, and error collection circuitry.

Figure 6 Test system block diagram for an LT1280.

Figure 7 Photograph of the LT1280 handler showing module, probes, and life support chamber.

The handler (Fig. 7) contains an 1800-pin zero-insertion-force connector, which provides an interface between the TCM and the program board. This program board allows quick changeover between module families having different I/O and power pin designations. A chamber is automatically clamped against the uncapped module flange and is filled with a low-boiling-point hydrocarbon fluid for cooling the powered chips. Within the chamber, two single-point diagnostic probes are individually moved under program control to any EC pad for use in resolving the diagnosis to the failing chip(s) or interconnection(s).

The cooling support unit controls water flow for capped module testing and the hydrocarbon fluid flow for uncapped testing in a completely closed-loop environment. The interfacing logic between the test control computer and the test drive unit is supplied by the tester control unit. It is primarily an electronic package containing a microprocessor, analog-to-digital converters, pulse generators, and recursive test control logic.

The microprocessor is a register-based machine with separate control and main stores. It receives test data from the test control computer in a highly compressed format. Microprogram routines decode and execute these data by generating a sequence of subcommands that are the primary control functions for the tester hardware. Sensor-based communications interconnect the microprocessor with the test control computer, which is presently an IBM 4341 capable of multiplexing three LT1280s concurrently.

The control unit recursive logic allows recursive data formats to be utilized for specific types of test sequences. Its primary purpose is to reduce data volumes and thus enhance testing speeds. By means of recursive test buffers, this section of the tester makes it possible to apply 6000 sequential words of instruction repetitively up to 32 768 times. It provides a means for altering the instructions in 4000 of these 6000 words by the use of data mix registers. The use of this buffer alone increases the testing speed by a factor of eight. Furthermore, the fact that data for modification are transferred in parallel and used serially reduces storage requirements by a factor of fifteen, which saves fifteen positions in direct memory access for other uses.
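The effect of recursive buffering on data volume can be illustrated with a minimal sketch. The buffer depth (6000 words), the number of alterable words (4000), and the maximum repeat count (32 768) are taken from the text; the word format, the `expand` and `words_applied` functions, and the way a data mix value is merged into a word are assumptions made purely for illustration and do not describe the actual LT1280 control logic.

```python
from typing import Callable, Iterator, List

BUFFER_WORDS = 6000      # sequential instruction words in the recursive buffer (from the text)
ALTERABLE_WORDS = 4000   # words that data mix registers may alter (from the text)
MAX_REPEATS = 32768      # maximum number of repetitions (from the text)


def expand(buffer: List[int],
           alterable: List[int],
           mix: Callable[[int, int], int],
           repeats: int) -> Iterator[int]:
    """Yield the fully expanded word stream for `repeats` passes over `buffer`.

    `alterable` lists the buffer positions whose words are rewritten on each
    pass; `mix(word, pass_no)` is a hypothetical stand-in for a data mix register.
    """
    if len(buffer) > BUFFER_WORDS or len(alterable) > ALTERABLE_WORDS:
        raise ValueError("buffer or alterable set exceeds the stated limits")
    if not 0 < repeats <= MAX_REPEATS:
        raise ValueError("repeat count out of range")
    alterable_set = set(alterable)
    for pass_no in range(repeats):
        for pos, word in enumerate(buffer):
            yield mix(word, pass_no) if pos in alterable_set else word


def words_applied(buffer_len: int, repeats: int) -> int:
    """Number of words applied to the product from one stored buffer image."""
    return buffer_len * repeats
```

Under these assumptions, a three-word buffer with one alterable position and three passes expands to nine applied words, and a full buffer repeated the maximum number of times corresponds to `words_applied(6000, 32768)` applied words from a single stored image.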

The LT1280 pin electronics (see Fig. 8) is capable of three function-independent programmable dc conditions at each I/O pin of the TCM. These include 10-V programmable drivers, a programmable level detector reference voltage, and a product load selection capability.

With the knowledge that the test programs were going to be long and would require fast test execution times, the system was designed to be capable of rapid input changes and of executing selected functions (load/unload shift-registers, apply input stimuli, set-up expected conditions, sample errors at recursive rates, etc.). Each of eighteen programmable pulse generators can be electronically multiplexed to any of the LT1280 tester I/O drivers. These pulse generators are used for basic timing and sequencing of module clock functions, LSSD controls, and any I/O control where sequential timing is desired. Because of the size and functional complexity of the LT1280, an extensive set of tester diagnostic programs has been implemented to allow efficient tester malfunction diagnosis.

## TCM testing

Even though the chips supplied for TCM fabrication have been thoroughly tested earlier, the TCM final test consists of a full stuck-fault logic and RAM test rather than a minimum-level fabrication test for detecting chip I/O and wiring defects. The requirement for a full fault module test necessitated a significant extension of the test generation methodology for very large logic structures. Manual generation was required initially to supply the test data for a dc stuck-fault test of the multi-chip memory arrays on the module.

A complete dc stuck-fault test for the average module requires approximately 2.5 min and 6000 tests to exercise every circuit on the module. Four types of test are executed, usually in order. These are pre-power-on, shift-register, RAM, and logic tests.

Pre-power-on tests check resistance, opens, and shorts at reduced bias voltages. This ensures that each I/O resistance is correct and that no hazard conditions exist to prevent bringing up full bias on the module. Shift-register tests are necessary to ensure proper shifting of 1/0 values throughout the TCM. RAM tests are a set of tests which, when expanded at the tester controller and applied at recursive speeds, perform a set of dc stuck-fault tests of all arrays on the module. Because of the special requirements of test generation, test time application, and automatic multiple-defect diagnosis (AMDD), RAM tests are handled separately from the logic tests. AMDD is fully described by Barry [3] and is briefly explained below.
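As a rough illustration of the shift-register test idea, the sketch below models a scan string as a list of latches, shifts a repeating 0011 pattern through it, and compares what emerges with what was sent. The `ScanChain` class, the single stuck-at fault model, and the 0011 pattern are assumptions chosen for illustration; the patterns actually applied by the LT1280 are not specified here.

```python
from typing import List, Optional, Tuple


class ScanChain:
    """Toy model of an LSSD shift-register string (hypothetical, for illustration)."""

    def __init__(self, length: int, stuck_at: Optional[Tuple[int, int]] = None):
        self.latches: List[int] = [0] * length
        # stuck_at = (position, value) forces one latch to a fixed value.
        self.stuck_at = stuck_at

    def shift(self, scan_in: int) -> int:
        """Apply one shift clock: return the scan-out bit and move all bits one place."""
        scan_out = self.latches[-1]
        self.latches = [scan_in] + self.latches[:-1]
        if self.stuck_at is not None:
            pos, val = self.stuck_at
            self.latches[pos] = val
        return scan_out


def shift_register_test(chain: ScanChain, length: int) -> bool:
    """Return True when the bits shifted in reappear unchanged at scan-out."""
    # The 0011 pattern exercises both 0-to-1 and 1-to-0 transitions in every latch.
    pattern = [0, 0, 1, 1] * (length // 2 + 1)
    stimulus = pattern[:2 * length]
    observed = [chain.shift(bit) for bit in stimulus]
    # A bit fed in on clock k appears at scan-out on clock k + length.
    return observed[length:] == stimulus[:length]
```

In this model a fault-free eight-latch string passes, while the same string with any one latch stuck at 0 or at 1 fails, since every bit must pass through the faulty position on its way to scan-out.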

Figure 8 LT1280 pin electronics block diagram.

Logic tests include a full stuck-fault test-pattern set, except for tests eliminated because of failures detected by previously applied tests. When one or more of the applied tests fails, that is, when the module response differs from the expected response, a diagnostic routine is automatically invoked to localize the failing circuitry. This LT1280 test system function is designated AMDD. AMDD is unique in its approach to minimizing rework cycles on the TCM by diagnosing as many defects as possible in a single test cycle with good accuracy and resolution. This process is automatically invoked by use of a software algorithm in the Test Control System (TCS), which is resident in the host IBM 4341 that drives the LT1280. Multiple defect diagnosis is achieved in four ways. First, the implications of the LSI structured-design guidelines are exploited. Second, logically independent defects are recognized. Third, a test data organization was adopted which divides the 40 000 circuits of a TCM into groups of logic partitions of approximately 5000 circuits each (called TGPs). This facilitates defect circumvention and continued testing. Fourth, two single-point diagnostic probes are used for defect verification and diagnosis resolution.
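The partitioning idea can be sketched as follows, reading the text as meaning that each TGP holds roughly 5000 circuits. The circuit counts come from the text; the function names, the representation of a TGP as a set of net names, and the rule that a TGP is skipped when it touches any suspect net are illustrative assumptions only.

```python
from typing import Dict, List, Set

CIRCUITS_PER_TCM = 40_000   # nominal, from the text
CIRCUITS_PER_TGP = 5_000    # approximate, from the text


def nominal_tgp_count(circuits: int = CIRCUITS_PER_TCM,
                      per_tgp: int = CIRCUITS_PER_TGP) -> int:
    """Ceiling division: partitions of about `per_tgp` circuits needed to cover `circuits`."""
    return -(-circuits // per_tgp)


def eligible_tgps(tgp_nets: Dict[str, Set[str]], suspect_nets: Set[str]) -> List[str]:
    """Return the TGPs that touch no suspect net, so testing can continue around a defect."""
    return sorted(name for name, nets in tgp_nets.items()
                  if not (nets & suspect_nets))
```

With the nominal figures this gives about eight partitions per module; a defect confined to the nets of one partition then leaves the remaining partitions available for continued testing in the same test cycle.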

Diagnosis for pre-power-on tests is provided by doing a straightforward interpretation of the test results. If a short to a power net is indicated, it is regarded as hazardous to continue testing, and the test is terminated. Otherwise, the failures encountered are classified into categories such as resistance problems on I/O nets, shorts between I/O nets and ground, or open I/O nets. For resistance problems, the appropriate diagnostics are called out and tests for the TGP(s) containing these nets are applied. For shorts and opens the identified nets are regarded as the cause of failure, and the TGP(s) associated with them may be excluded from further test application depending on the severity of the problem. If no hazardous conditions exist, the AMDD algorithm proceeds to apply tests that exercise the component parts allowed by LSI structured designs, namely, data, scan, and clock paths, and storage arrays.
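The decision logic described above for pre-power-on failures can be summarized in a short sketch. The failure categories follow the text; the `Failure` and `Action` types, the `severe` flag standing in for "depending on the severity of the problem", and the wording of the notes are assumptions introduced for illustration.

```python
from enum import Enum
from typing import NamedTuple


class Failure(Enum):
    SHORT_TO_POWER = "short to a power net"
    SHORT_TO_GROUND = "short between I/O net and ground"
    OPEN = "open I/O net"
    RESISTANCE = "resistance problem on I/O net"


class Action(NamedTuple):
    stop_testing: bool      # True when it is hazardous to continue
    apply_tgp_tests: bool   # True when the TGP(s) containing the net are still tested
    note: str


def pre_power_on_action(failure: Failure, severe: bool = False) -> Action:
    """Map a pre-power-on failure category to the follow-up described in the text."""
    if failure is Failure.SHORT_TO_POWER:
        return Action(True, False, "hazardous: terminate test")
    if failure is Failure.RESISTANCE:
        return Action(False, True, "call out diagnostics, then test affected TGP(s)")
    # Shorts to ground and opens: the net itself is taken as the cause of failure,
    # and its TGP(s) are excluded only when the problem is judged severe.
    return Action(False, not severe, "net regarded as cause of failure")
```

Only a short to a power net stops the test outright in this sketch; every other category lets testing proceed, with or without the affected partitions.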

Tests for the scan paths, called shift register (SR) tests, are applied first. These check the ability of the SR string to function as a data bus and to scan data in and out of the SR one bit at a time. Failures are diagnosed by probing and reapplying the SR tests to identify interchip physical net failure modes. Here, as in the logic and array tests, a clock path failure condition is checked by probing the clock paths before a physical net failure mode is deduced. This is accomplished with the application of specially provided clock tests (i.e., the clock TGP in logic tests) in which simulation has provided logic level responses rather than the clocking pulses used in the stuck-fault tests. Tests for storage arrays are fully described by Barry [3].

Tests for data paths are called logic stuck-fault tests; these tests are invoked after analyzing the results of the SR and RAM tests and after determining which TGP(s) can be applied, if any. The TGP(s) eligible are applied, and the first failing tests in each TGP are used for the final diagnosis.

The objectives of probing are to trace back from a failing latch or primary output and to test the nets along the failure propagation path until a terminal physical failing net is established using two rules. One, if the probe encounters EC pad contact problems, contact is attempted via an algorithm that incrementally jogs the probe around the target site within the EC pad until contact is made. Two, a net is probed only when its expected state is known. During the logic tests, all the relevant information about that net is collected to avoid reprobing the net.
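A minimal sketch of such a back trace is given below. It applies the second rule (a net is probed only when its expected state is known) and caches each measurement so that no net is probed twice. The netlist representation, the `trace_back` function, and the `probe` callback are hypothetical; the jogging of the probe on contact problems and the ordering of candidate nets are not modeled.

```python
from typing import Callable, Dict, List, Optional


def trace_back(start: str,
               drivers: Dict[str, List[str]],
               expected: Dict[str, int],
               probe: Callable[[str], int]) -> Optional[str]:
    """Follow failing nets upstream from `start`; return the terminal failing net.

    drivers[net] lists the nets feeding the circuit that drives `net`.
    expected[net] is the simulated good value; nets absent from it are not probed.
    probe(net) is a hypothetical stand-in for a physical probe measurement.
    """
    cache: Dict[str, int] = {}          # remember results so no net is reprobed

    def fails(net: str) -> bool:
        if net not in expected:         # rule: probe only when the expected state is known
            return False
        if net not in cache:
            cache[net] = probe(net)
        return cache[net] != expected[net]

    if not fails(start):
        return None
    current = start
    visited = {current}
    while True:
        upstream = [n for n in drivers.get(current, [])
                    if n not in visited and fails(n)]
        if not upstream:
            return current              # no failing input: the failure originates here
        current = upstream[0]
        visited.add(current)
```

The trace stops at the first net that fails while all of its probed inputs are good, which is taken here as the terminal physical failing net.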

The straightforward back-tracing procedure is dynamically enhanced by ordering the candidate nets to be probed at each stage of back tracing on the basis of observed logic and RAM test results and the fault simulation information. When the terminal physical failing net is established for all failing TGPs, the repair/rework processor is called to provide output messages to manufacturing personnel to replace the failing component(s), or to analyze the defective wire net(s). This encompasses functions such as verifying terminal nets that have switched logically but also failed certain tests, performing pairwise short tests using the two probes, or locating nets that are indeed shorted to other physical points (e.g., EC pads, adjacent chip pads, or miswired nets). The final results are displays on the analyzer's terminal and a data base copy for archival purposes. These are used to assist the analyzer in determining the repair or rework action or in calling for additional physical information, such as the rework history, for help in diagnosing or in making additional analysis decisions.

## Conclusions

Use of the LT1280 has made it possible to resolve the diagnosis of a failure within a TCM to a single defective chip, wire, or I/O. Only in the case of bussed logic and array nets is a multiple repair action designated. This test system is in production in various IBM locations worldwide and it meets all production reliability, serviceability, and quality objectives.

## Acknowledgments

The authors acknowledge key technical contributions by A. Hermann, N. Mulvenna, M. S. Narasimha, and U. Suleman for TCM testing and AMDD; by A. F. Kopec for repair action conclusion; and by F. L. Villante for the LT1280 test system hardware strategy and documentation.

## References

- 1. A. J. Blodgett and D. R. Barbour, "Thermal Conduction Module: A High-Performance Multilayer Ceramic Package," IBM J. Res. Develop. 26, 30-36 (1982).
- 2. E. B. Eichelberger and T. W. Williams, "A Logic Design Structure for LSI Testability," Proceedings of the 14th Design Automation Conference, New Orleans, LA, June 1977, pp. 462-468.
- 3. P. L. Barry, "Failure Diagnosis on the LT1280," IBM J. Res. Develop. 27, 41-49 (1983, this issue).

Received August 12, 1981; revised August 30, 1982

Roland L. Pierson IBM Data Systems Division, P.O. Box 950, Poughkeepsie, New York 12602. Mr. Pierson is a senior engineer and manager of the TCM test systems engineering project. His previous assignments since joining IBM in 1963 at East Fishkill, New York, include component test equipment engineering, product performance engineering, and work on the technical staff at System Products Division headquarters. Before joining IBM, he worked as a test equipment engineer at Raytheon Company, Boston, Massachusetts. Mr. Pierson received a B.S. in electrical engineering from Michigan State University, East Lansing, in 1960.

Thomas B. Williams IBM Data Systems Division, P.O. Box 950, Poughkeepsie, New York 12602. Mr. Williams is a development engineer, responsible for manufacturing engineering test processes for the new TCMs. After joining IBM in 1965 at Endicott, New York, he worked as manufacturing electrical engineer for the printed circuit tester. Prior to joining IBM, he served as commander of the Radio Operator School at Ft. Knox, Kentucky, for the U.S. Army. Mr. Williams received his B.E. in electrical engineering from Youngstown University, Ohio, in 1963. Mr. Williams received a DSD Division Award in 1981 for his work on TCM array test development.