MEMORANDUM Aug. 16, 1998
TO: Distribution
FROM: R. Lacasse
SUBJ: GBT Spectrometer Temperature Tests
Introduction
There is a well-known relationship between chip operating temperature and chip reliability: reliability increases as operating temperature drops. For a chip as complex as the Canaris correlator chip, every 10 °C drop in operating temperature improves the device's reliability by 60% (1). There is therefore strong motivation to minimize chip operating temperature, within reason, early in the life cycle of the spectrometer.
To this end, several tests were performed to evaluate potential modifications to the as-delivered Spectrometer. This memo summarizes these tests. Detailed data can be found in my lab notebook "GBT Spectrometer II", pages 19 to 56.
Modification of Airflow System
Several modifications of the airflow system were evaluated by measuring their effect on the digital rack temperature, as reported by the temperature sensor atop the card stack (an integral part of the correlator). Most of these modifications are listed in the table below, together with their effect on the rack operating temperature; a positive value means the rack ran warmer.
Effect of Airflow Modifications

| Modification | Change in Rack Temperature (°C) |
|---|---|
| As delivered | - |
| Delivered fans (Comair Rotron Caravel CL2T2) removed | +1.8 |
| Delivered fans replaced with similar-capacity squirrel-cage blower (McClean 2EB512D) | +0.3 |
| Remove VME chassis covers | -0.3 |
| Remove VME chassis fans | +0.2 |
| Add various deflectors to channel airflow to critical areas | 0.0 |
| Add two fans on top of rack, same type as delivered | -0.6 |
Modification of Heat Sink Configuration
The temperature of the Canaris chip can be estimated by measuring the temperature of the back of the chip package with a thermistor fastened to the chip with conductive epoxy. In Burgess's tests (1), the back of the chip package was about 10 °C cooler than the chip itself. Two types of thermistors were obtained for these tests: YSI 44008 (0.2 °C interchangeability) and YSI 44032 (0.1 °C interchangeability). Both are small enough that the chip can still be plugged into its socket with the thermistor mounted to it. There is a considerable temperature gradient across the back of the chip package: a thermistor displaced 0.3" from the center of the package measured an operating temperature 6 °C cooler than one displaced 0.1".
Another interesting sidelight of these tests was the effect of the data test pattern on chip temperature. For convenience and consistency, most testing was performed with a pseudo-random data pattern that gave the data inputs of the correlator chips a data rate of 100 Mbps/16 (6.25 Mbps). With a normal IF signal run through a sampler, the data rate is, of course, 100 Mbps. With the pseudo-random pattern, power dissipation in the rack was 20% lower than with the "normal" IF. The temperature rise of the chip package above local ambient was about 40% lower, its rise above the rack inlet air temperature about 30% lower, and the rise of the air temperature at the air stack outlet about 20% lower.
The data presented below were taken with thermistors glued to the backs of correlator chips located in quadrant 0, correlator card 1, chip locations 8C, 8D, and 8E. These are at the top of the air stack and thus should represent the hottest chips in the system.
Effect of Heat Sinks on Chip Temperature with Pseudo-Random Test Data (test t1)

| Heat Sink Type (Thermalloy Part #) | Chip Package Temperature (°C) | Air Stack Temperature, Top of Stack (°C) | Chip Temp. Rise Above Local Ambient (°C) | Improvement (°C) |
|---|---|---|---|---|
| 2287B (as delivered) | 60.9 | 24.3 | 36.6 | - |
| 2 x 2287B (stack of 2) | 57.8 | 25.7 | 32.1 | 4.5 |
| 2518B | 50.2 | 25.2 | 25.0 | 11.6 |
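The last two columns of the heat-sink table follow from the first two: the rise is the chip package temperature minus the top-of-stack air temperature, and the improvement is the as-delivered rise minus each configuration's rise. The memo does not state these formulas; they are inferred because the tabulated values check out this way. A minimal Python sketch of that arithmetic, using only the measured values from test t1:

```python
# Heat-sink results from test t1 (pseudo-random data), in degrees C:
# (heat sink, chip package temperature, top-of-stack air temperature)
MEASUREMENTS = [
    ("2287B (as delivered)", 60.9, 24.3),
    ("2 x 2287B (stack of 2)", 57.8, 25.7),
    ("2518B", 50.2, 25.2),
]


def rise_above_ambient(package_c: float, air_c: float) -> float:
    """Chip package temperature rise above the local (air stack) ambient."""
    return package_c - air_c


def summarize(rows):
    """Return (name, rise, improvement) tuples, rounded to 0.1 C.

    The improvement is measured against the first row (the as-delivered heat sink).
    """
    baseline = rise_above_ambient(rows[0][1], rows[0][2])
    out = []
    for name, package_c, air_c in rows:
        rise = rise_above_ambient(package_c, air_c)
        out.append((name, round(rise, 1), round(baseline - rise, 1)))
    return out


if __name__ == "__main__":
    for name, rise, improvement in summarize(MEASUREMENTS):
        print(f"{name:24s} rise {rise:5.1f} C  improvement {improvement:5.1f} C")
```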
Chips that Are Behind Brackets
There was some concern that chips on the leeward side of support brackets would run very hot because of airflow obstructions. To test this, a chip with a thermistor mounted to it was operated in each of locations 8C and 2F; location 2F is lower on the card but behind an air obstruction. The chip package temperature was 56 °C in location 8C and 48 °C in location 2F. Thus the chips behind the obstructions run cooler than those at the top of the stack, although the data are not entirely consistent with the expected air temperature gradient in the air stack.
Fan Capacity
The airflow out of the rack was estimated using a linear airflow meter. The average linear velocity was measured over each of the two rack air outlets and multiplied by that outlet's area to obtain the volumetric flow in cubic feet per minute (cfm). The total airflow from both outlets combined was 884 cfm, against a total rated fan capacity of 1100 cfm, so the fans are delivering about 80% of their rated capacity. This appears to be a reasonable figure.
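The flow estimate is average linear velocity times outlet area, summed over the outlets and then compared with the rated fan capacity. A minimal sketch of that calculation: the memo does not give the per-outlet velocities or areas, so the example outlet readings below are hypothetical and for illustration only; the 884 cfm total and 1100 cfm rating are the memo's figures.

```python
def outlet_cfm(velocity_fpm: float, area_sq_ft: float) -> float:
    """Volumetric flow (cfm) = average linear velocity (ft/min) x outlet area (sq ft)."""
    return velocity_fpm * area_sq_ft


def total_cfm(outlets) -> float:
    """Sum the flow over all outlets, each given as a (velocity_fpm, area_sq_ft) pair."""
    return sum(outlet_cfm(v, a) for v, a in outlets)


def fraction_of_rated(measured_cfm: float, rated_cfm: float) -> float:
    """Fraction of the fans' rated capacity actually delivered."""
    return measured_cfm / rated_cfm


if __name__ == "__main__":
    # Hypothetical outlet readings (NOT measured values), to show the arithmetic.
    example_outlets = [(400.0, 1.1), (400.0, 1.1)]
    print(f"example total: {total_cfm(example_outlets):.0f} cfm")
    # Measured total and rated capacity from the memo.
    print(f"delivered fraction: {fraction_of_rated(884, 1100):.0%}")
```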
Improvement of Thermal Coupling
An attempt was made to improve the thermal coupling between the package and the heat sink by filling the volume between the chip and the first ring of the heat sink with thermal epoxy. Any improvement was less than a few tenths of a degree.
Discussion
In general, modifications to the airflow system were unfruitful, with one exception: the modifications to the VME chassis. These simplify maintenance of the system by making it unnecessary to replace the fans in the VME chassis, which would require removing the entire chassis.
Adding heat sinks on top of the existing heat sinks is worthwhile. Consider the addition of the 2518B heat sink. It reduces the chip package temperature to 50.2 °C with the pseudo-random data input. With a normal IF input, the package temperature is about ten degrees warmer (~60.2 °C), and the chip itself is about 10 degrees warmer still (~70.2 °C, from Burgess (1)). With quadrants 3 and 4 installed, the ambient temperature would rise by about 10 degrees, raising the chip temperature to about 80 °C. If all the devices were at 80 °C, the failure rate (1) for the 256 chips in the system would be 1.74 failures/year, or 35 devices over 20 years. Without the additional heat sink, the chips would run at about 90 °C, and the failure rate would be 2.74 failures/year, or 55 devices over 20 years. Note that the devices at the bottom of the stack will run about 20 °C cooler, so their failure rate would be 1.08 failures/year (22 devices/20 years) without the additional heat sink or 0.65 failures/year (13 devices/20 years) with it. The failure rate for the system will fall somewhere between those of the top and bottom chips.
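These projections can be roughly reproduced by scaling the 80 °C system failure rate by the factor of 1.6 per 10 °C quoted in the introduction. This is a sketch under that assumption only: the memo's figures come from Burgess (1), and this simple geometric scaling lands within about 0.05 failures/year of them rather than matching them exactly.

```python
BASELINE_TEMP_C = 80.0  # all 256 chips at 80 C, per the memo
BASELINE_RATE = 1.74    # system failures/year at 80 C, per the memo
FACTOR_PER_10C = 1.6    # reliability changes by 60% per 10 C (introduction)


def failures_per_year(chip_temp_c: float) -> float:
    """Approximate system failure rate, scaled from the 80 C baseline by 1.6x per 10 C."""
    return BASELINE_RATE * FACTOR_PER_10C ** ((chip_temp_c - BASELINE_TEMP_C) / 10.0)


def failures_over(years: float, chip_temp_c: float) -> float:
    """Expected number of failed devices over the given service life."""
    return failures_per_year(chip_temp_c) * years


if __name__ == "__main__":
    cases = [
        ("top of stack, added heat sink", 80.0),
        ("top of stack, no added heat sink", 90.0),
        ("bottom of stack, no added heat sink", 70.0),
        ("bottom of stack, added heat sink", 60.0),
    ]
    for label, temp in cases:
        print(f"{label:36s} {temp:4.0f} C  {failures_per_year(temp):.2f}/yr  "
              f"{failures_over(20, temp):.0f} over 20 yr")
```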
Another option is to reduce the temperature of the air at the rack inlet, either by cooling the plenum air before it enters the rack or by running the entire plenum at a lower temperature. The inlet air temperature is presently 16 °C. This could realistically be decreased to 10 °C, improving reliability by about 33%.
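The 33% figure is consistent with the 60%-per-10 °C rule applied to a 6 °C drop as a fractional power (1.6 raised to 0.6). A minimal sketch, assuming the improvement compounds geometrically and that the chip temperature falls one-for-one with the inlet air temperature; the memo implies but does not state both assumptions.

```python
FACTOR_PER_10C = 1.6  # reliability improves by 60% per 10 C drop (introduction)


def reliability_gain(temp_drop_c: float) -> float:
    """Fractional reliability improvement for a given drop in chip temperature.

    Assumes the improvement compounds geometrically: 1.6 ** (drop / 10) - 1.
    """
    return FACTOR_PER_10C ** (temp_drop_c / 10.0) - 1.0


if __name__ == "__main__":
    inlet_now_c, inlet_target_c = 16.0, 10.0
    # Assumption: the chip temperature drops by the same amount as the inlet air.
    gain = reliability_gain(inlet_now_c - inlet_target_c)
    print(f"{gain:.0%} improvement")
```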
1. Burgess, Tom, "Quaint Chip Thermal Measurements", ACSIS Note 300, DRAO, 1997.