CALL-OUT Summaries

The Orbiting VLBI Tracking Station at NRAO Green Bank has implemented an automatic phone CALL-OUT system for problem resolution. This web page contains a dated summary of the CALL-OUTs in reverse chronological order (newest CALL-OUTs first). Each CALL-OUT should have two responses: an immediate response to solve problems causing loss of data during passes, and a long-term solution to avoid CALL-OUTs in the future.

The OVLBI station automatically runs a diagnostic test, called quickTest, to check for proper functioning of the system. This test is run 15 minutes before the scheduled start of each tracking pass and at 14 and 2 UTC each day. CALL-OUTs occur most frequently during quickTest.
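The quickTest timing rule above can be sketched as follows. This is only an illustration of the schedule as described (15 minutes before each pass, plus 2 and 14 UTC); the function name and data layout are assumptions and are not taken from the station software.

```python
from datetime import date, datetime, timedelta, timezone

# Values taken from the description above.
QUICKTEST_LEAD = timedelta(minutes=15)
DAILY_HOURS_UTC = (2, 14)


def quicktest_times(day, pass_starts):
    """Return the sorted UTC times at which quickTest would run on `day`.

    `pass_starts` is a list of timezone-aware UTC datetimes giving the
    scheduled start of each tracking pass (hypothetical input format).
    """
    times = {
        datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)
        for hour in DAILY_HOURS_UTC
    }
    for start in pass_starts:
        run = start - QUICKTEST_LEAD
        if run.date() == day:
            times.add(run)
    return sorted(times)
```

For example, a pass scheduled to start at 20h00 UTC would add a quickTest at 19h45 UTC to the two fixed daily runs.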

2000 December 27, 2h03m02; RTT RPC ERR

The "sitetime" computer by the equipment room masers crashed. This causes the OVLBI realtime system to lose data on the round trip phase monitoring.

Response: Galen received the CALL-OUT and checked the station using screens. No serious problems were found, and he pushed the reset button on "sitetime". After a few minutes, the computer rebooted and started responding to requests for phase monitor data.

Solution: The site time computer software seems to have some inherent design flaws that are being regularly exposed. In the past this system was more reliable; it has more problems now, perhaps due to increased ethernet traffic. A software review is needed.

2000 November 27, 2h04m02; TAP1 REC FI

QuickTest caused a CALL-OUT due to some sort of Tape Recorder 1 Firmware error.

Response: Galen checked the station using screens, found no serious problems and reset the CALL-OUT system.

Solution: Recorder 1 monitor bits are indicating some sort of intermittent problem within the recorder. However, there was no recording pass scheduled, so no CALL-OUT was needed. The software has been changed to only WARN of a recorder firmware anomaly, unless data are being recorded. The VLBA code comments indicate that a firmware error may occur in rare circumstances when reading MCB data.
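The changed behaviour (WARN unless data are being recorded) amounts to a small decision rule. The sketch below is only an illustration of that rule; the names are hypothetical and do not come from the station code.

```python
from enum import Enum


class Severity(Enum):
    WARN = "warn"          # log only, nobody is phoned
    CALL_OUT = "call-out"  # phone the on-call person


def recorder_firmware_severity(recording_in_progress: bool) -> Severity:
    """Severity of a recorder firmware anomaly under the revised rule."""
    return Severity.CALL_OUT if recording_in_progress else Severity.WARN
```

With no recording pass under way, as in this incident, the anomaly would be reported as a warning only.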

2000 October 21, 19h45m02; TAP1 VAC ER

Neither tape drive was mounted, because the next tracking pass was a Navigation pass. However, the command file indicated that tapes should have been mounted.

Response: John Ford was called, and he noted that no recording pass was scheduled. He simply acknowledged the Zetron call out. Later, quickTest caused the same call out, which he also acknowledged.

Solution: The real time system correctly detected that the tapes were not mounted when the command file indicated the tape drives should be mounted. There appears to be an error in "schedco", in that it should have added a "tapeDrive0.cmd" file after the end of the recording pass, to indicate that no tapes were needed.

2000 October 21, 8h32m03; OSC DET

The oscillation detector was triggered by an elevation oscillation during galaxy mapping early in the morning.

Response: The oscillation detector stopped antenna motion as designed. Operators found they could reproduce the problem for a short while. When the problem occurred, the currents on all motors were behaving similarly, indicating that the problem is in the servo system, not in an individual broken tack.

Solution: After a few tests, the problem disappeared and could not be reproduced. It is thought that this problem is due to some fault in the design of the servo system that can most easily be solved when the entire servo system is upgraded. No further action will be taken until that point.

2000 October 06, 15h02m02; RTT RMS 10

High RMS was detected in the 10 MHz input level to the LO Receiver module.

Response: The call out was acknowledged, but no action was taken. The high RMS level cleared after a while. A few days later, the new GPS 10 MHz reference was used to measure the 10 MHz in the trailer. The 10 MHz level appeared to be OK. However, the anomaly was not present at that time.

Solution: The input levels to the LO Receiver module should be carefully measured. If the 10 MHz level is dropping, action should be taken.

2000 September 25, 16h07; !SYNK and SYNK NOLOCK

Resolution of this error took two days (three passes), although the problem turned out to be simple.

Response: At first, problems with the REF LO Receiver, then the Synthesizer, were suspected. The module was pulled, tested in the Lab and found OK. When the module was returned to the system, the problem persisted. Cycling the -5.2V power supply in the vertex rack cured the problem.

Solution: We failed to notice the problem on the SCREENS, and apparently few other modules use the -5.2V. A new Monchk function monchkPower was added to raise an anomaly due to this problem. Also the screen POWERS will be changed to flag currents out of range.
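A range check of the kind described can be sketched as below. This is not the actual monchkPower function, which is not shown here; the function name, the reading format and the numeric limits (an assumed 5 percent tolerance around -5.2V) are all hypothetical.

```python
from typing import NamedTuple


class Limit(NamedTuple):
    low: float
    high: float


# Hypothetical limits: -5.2V with an assumed 5 percent tolerance.
LIMITS = {"-5.2V": Limit(low=-5.46, high=-4.94)}


def out_of_range(readings, limits):
    """Return the sorted names of supplies whose reading is outside limits.

    Readings with no configured limit are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in limits
        and not (limits[name].low <= value <= limits[name].high)
    )
```

Any name returned by such a check would be raised as an anomaly, and the same comparison could drive the out-of-range flag on the POWERS screen.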

2000 September 21, 15h33m02; RTM PRC ERR

Could not monitor the round trip between the Maser and station. Also could not confirm proper functioning of the Maser. No tracking data were lost due to this problem.

Response: We checked the sitetime computer and found it was not responding to telnet and ping from other station computers. Pushing the red button on the sitetime VxWorks board near the Maser cures this problem.

Solution: The solution is not determined, but probably involves some sort of watchdog timer on the site time computer.
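For illustration only, the general watchdog idea is sketched below in Python: the monitored software must "kick" the timer regularly, and a supervisor resets the machine if the timer expires. This is a generic sketch under stated assumptions, not a design for the sitetime VxWorks board; the class name and timeout are hypothetical.

```python
import time


class Watchdog:
    """Software watchdog: expires if not kicked within `timeout_s` seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_kick = clock()

    def kick(self):
        """Called periodically by the healthy, running software."""
        self._last_kick = self._clock()

    def expired(self):
        """True if the software has gone silent for longer than the timeout."""
        return self._clock() - self._last_kick > self._timeout
```

In a real system the expiry would trigger the same reset that is currently done by hand with the red button.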

2000 September 13, 10h42; FMT HEADERR

Since this CALLOUT is continuing to occur, it might be a real problem.

Response: Rebooted the Formatter and ran quickTest.cmd to reset all hardware and test the system.

Solution: George Peck, in Socorro, suggests we test the power supply voltages which are used by the data FIFO chips. In the past, if the voltages drifted too low, the FIFO errors occurred.

2000 September 10, 17h50; FMT HEADERR

See 2000 Sept 9.

2000 September 9, 22h40; FMT HEADERR

Error call out occurred, probably due to residual problems with cycling the power to the Formatter and S2 systems. These systems were turned off to reduce the heat load in the Tape Room, when the cooling failed.

Response: Galen used the SCREENs to RESPOND to the call out. This only solves the problem until MONCHK is restarted, usually at the next quickTest.

Solution: The Formatter and S2 systems were not properly reset after the power was restored. The problem was solved by using the front panel reset buttons on the VME racks containing the computers in both systems. After running quickTest to reset all hardware, the system ran normally.

2000 September 1, 4h06m; ACU EL UP

Emergency call out occurred, probably due to a glitch in the Elevation encoder readout. No data was lost, as this occurred at the end of the tracking pass.

Response: This anomaly was handled by the operator manually commanding the antenna back in range and then restarting the command file.

Solution: This is a hardware problem with the antenna encoders that will be solved when the Inductosyn system is replaced with the new optical encoders. Since the problem is rare, the solution will not be implemented until 2002. (A bug was exposed in the CALLOUT task: it calls out continually in the case of an Emergency, whereas the task should call out only once.)
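The intended "call out only once" behaviour can be sketched as a simple latch: a call is placed when an anomaly first appears and is not repeated until that anomaly has cleared and returned. The class below is a hypothetical illustration, not the CALLOUT task itself.

```python
class CallOutLatch:
    """Allow one call out per anomaly until that anomaly clears."""

    def __init__(self):
        self._active = set()  # anomalies already called out

    def should_call(self, anomaly: str, present: bool) -> bool:
        """Return True only on the first check at which the anomaly is present."""
        if not present:
            self._active.discard(anomaly)  # re-arm once the anomaly clears
            return False
        if anomaly in self._active:
            return False                   # already called out, stay quiet
        self._active.add(anomaly)
        return True
```

Checked repeatedly while an Emergency persists, this returns True once and False thereafter.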

2000 August 30, 5h17m; DECS XILINX

The call out was raised by the DECS XILINX anomaly. The station was idle (the pass had ended at 4h37m). No data were lost.

Response: Galen came in and ran the command file restart.cmd and the anomaly cleared. Galen also used the DECODER INIT SCREEN to STOP the DECODER. In the morning, Glen found that the DECODER had stopped and was not counting the 100 MHz ticks. This was because Galen had stopped the DECODER. Galen had to update the command file after running restart.cmd.

Solution: Galen plans to swap the DECODER boards and put the one exhibiting the problems on the bench. It will be set up with the test fixture and left until the problem re-occurs (maybe a few weeks). At that point the interrupt state will be examined. If the problem proves impossible to reproduce, Glen could modify the CALLOUT task to reset the DECODER upon a DECODER CALLOUT.

The problem could have been cleared without changing the command file by using the DECODER MODE screen. Pushing the MODE = 64 button would have completely reset the DECODER. If this problem occurs during a tracking pass, the best action is to use the DECODER MODE screen to reset the mode to 64 MHz.

glangsto@nrao.edu tminter@nrao.edu gwatts@nrao.edu ovlbiops@nrao.edu
2000 Sept 27