DECISION TO CHANGE OPERATING SYSTEM OF THE CONTROL COMPUTER

L. D'Addario and D. Varney
14 May 1992

Several things have happened that now cause us to re-examine our choice of the Venix operating system for the Earth Station control computer (see OVLBI-ES Memo No. 13). First, our own tests have shown that in one crucial respect Venix does not meet its performance specifications. Second, porting the VLBA control system to Venix has been much more difficult than we expected: the VLBA code uses methods of memory management and task control that are very different from those of AT&T System V UNIX, on which Venix is based. Third, the time remaining to complete the Earth Station within the planned funding is growing short. The following paragraphs consider each of these points.

Tests have shown that the interrupt latency of Venix is sometimes far longer than specified, and that it may not be deterministic. It is fundamental to a real-time operating system that the interrupt response time have a well-defined maximum. VenturCom documentation ("Venix Performance Metrics," version 3.2.2, March 1990) claims an average latency of 31 microsec and a maximum of 116 microsec on an 80386/25 processor. We measure an average of 21 microsec on an 80486/33, which is consistent with VenturCom's result, but we occasionally see latencies as large as 2500 microsec, more than 25 times the expected maximum. (We will write up the details of our tests separately.) We presented these data to VenturCom in February 1992. In May 1992 they finally responded that they do not have the manpower to investigate and possibly fix the problem, despite our offer to help them duplicate our measurements at their facility. Apparently no other user has complained about this. Clearly, VenturCom is a small company that cannot provide the level of support we require.
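The size of the discrepancy can be checked directly. The first line compares the worst measured latency with VenturCom's stated maximum for the 80386/25. The second and third lines rest on an assumption made here only for illustration, not a VenturCom figure: that the maximum should shrink on the faster 80486/33 in the same proportion as the average did (21 versus 31 microsec). The symbol t_max^(486) is introduced just for this estimate.

```latex
\begin{aligned}
\frac{2500\ \mu\text{s}}{116\ \mu\text{s}} &\approx 21.6 \\
t_{\max}^{(486)} &\approx 116\ \mu\text{s} \times \frac{21}{31} \approx 78.6\ \mu\text{s} \\
\frac{2500\ \mu\text{s}}{t_{\max}^{(486)}} &\approx 31.8
\end{aligned}
```

Under the scaled reading the worst case is about 32 times the expected maximum, consistent with "more than 25 times"; even against the unscaled 80386/25 specification it exceeds 20 times.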
An analysis of our needs (memo from D'Addario to Varney, 5 May 1992) shows that occasional latencies of 2500 microsec might be tolerable, but there is no guarantee that still larger latencies will not occur in rare instances, and the result casts doubt on all of the Venix performance specifications.

Our second problem with Venix is the unexpectedly great difficulty of porting the VLBA code. Even the large sections that Ron Heald has ported to SunOS (based on BSD UNIX), mainly the "screens" infrastructure, do not run correctly on the AT&T System V UNIX underlying Venix. We have worked on this for many weeks and corrected various incompatibilities, but still more corrections are needed. In addition, there are large sections of the VLBA code that we want to use and whose porting problems are still unknown.

One basic difficulty is that UNIX and vxWorks handle memory in almost opposite ways. In vxWorks, nearly everything is global, and special effort is needed to keep two tasks that execute the same code from sharing their variables. In UNIX, every process has its own data segment, and special effort is needed to make variables shared. UNIX shared-memory IPC can handle this in all cases studied so far, but we have not yet examined all of the VLBA code. Other problems include the use of semaphores, which differs between vxWorks and UNIX, and the methods of task/process control, including starting and stopping tasks, which are also quite different in the two operating systems. The porting job is made particularly difficult by the lack of an adequate source-level debugger; we have been using the UNIX tool "sdb," which is very limited. In view of our experience over the last few months, the risk that this effort will consume more than the available manpower is too great to ignore.

Our final problem is not strictly technical, but it is still crucial.
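Before turning to that problem, the memory-model difference described above can be made concrete. The C sketch below is illustrative only and is not taken from the VLBA or Earth Station code; it assumes a system that provides the System V IPC calls (`shmget`/`shmat` for shared memory, `semget`/`semop`/`semctl` for semaphores), and the names `run_shared_counter`, `sem_adjust`, `NWORKERS`, and `INCREMENTS` are hypothetical. Several processes run the same code, and a counter becomes common to all of them only because it lives in an explicitly shared segment; a semaphore serializes the updates. In vxWorks the counter would simply be a global variable, and the extra work would go the other way, into keeping per-task copies separate.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative sketch: every name below (run_shared_counter, sem_adjust,
   NWORKERS, INCREMENTS) is hypothetical, not from the VLBA code. */
#define NWORKERS 4
#define INCREMENTS 2000

/* semctl(SETVAL) takes a caller-defined union; a private name avoids a
   clash on systems whose <sys/sem.h> already declares union semun. */
union semctl_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Data every process must see. An ordinary global would not do: after
   fork() each child works on its own private copy of the data segment. */
struct shared_state {
    long counter;
};

/* Adjust semaphore 0 by delta: -1 waits for it (P), +1 releases it (V). */
static int sem_adjust(int semid, short delta) {
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}

/* Fork NWORKERS children running the same code; each increments a counter
   held in System V shared memory, serialized by a System V semaphore.
   Returns the final counter value, or -1 if the IPC setup fails. */
long run_shared_counter(void) {
    int shmid = shmget(IPC_PRIVATE, sizeof(struct shared_state), IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return -1;
    }
    struct shared_state *st = shmat(shmid, NULL, 0);
    if (st == (void *)-1) {
        perror("shmat");
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    st->counter = 0;

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    union semctl_arg arg;
    arg.val = 1; /* binary semaphore, initially available */
    if (semid == -1 || semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semget/semctl");
        if (semid != -1)
            semctl(semid, 0, IPC_RMID);
        shmdt(st);
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < NWORKERS; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: inherits the attached segment and the semaphore id. */
            for (int j = 0; j < INCREMENTS; j++) {
                if (sem_adjust(semid, -1) == -1)
                    _exit(1);
                st->counter++; /* critical section on shared data */
                if (sem_adjust(semid, +1) == -1)
                    _exit(1);
            }
            _exit(0);
        }
        started++;
    }

    /* Parent waits for (reaps) every child before reading the result. */
    for (int i = 0; i < started; i++)
        wait(NULL);

    long result = st->counter;
    shmdt(st);
    shmctl(shmid, IPC_RMID, NULL); /* remove the IPC objects explicitly */
    semctl(semid, 0, IPC_RMID);
    return result;
}
```

Called from a `main` function, `run_shared_counter()` should return `NWORKERS * INCREMENTS` (8000 with these values). If `counter` were an ordinary global instead, each child would increment its own copy and the parent would still see 0.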
The funding profile for the Earth Station project is such that we must finish near the end of 1993 in order to remain on budget. Some finish-up work during 1994 is possible, but by then we should be making the transition from construction to operation. Therefore, the control software must be essentially complete (although probably not in final form) by January 1994. At present, one of us (DV) is the only full-time person involved in the software effort. We plan to add another person to the staff for the last 18 months of the project, but no suitable candidate has yet been found.

Although other real-time operating systems might be considered, the only practical alternative for us is vxWorks. Porting the VLBA code will be greatly simplified, although not eliminated, since many of our requirements are different and our hardware configuration may differ in some respects. Furthermore, several things have changed since we chose Venix (and rejected vxWorks) in June 1991. At that time, no one in Green Bank was knowledgeable about vxWorks; now several people are learning it for use on the GBT. Also, version 5.0 had just been released and was causing problems for many users; it has since proved successful, and the NRAO groups using it report few difficulties.

These considerations must be balanced against the fact that none of us in the OVLBI-ES group has any experience with vxWorks, so progress will be delayed while we learn it. In addition, vxWorks costs much more, counting the development license, the target hardware, and a development workstation (perhaps $25k in total); on top of this, most of our expenditure on Venix (about $11k in hardware and software) will not be recoverable. All things considered, we have concluded that a switch to vxWorks is now required.
Looking three to six months ahead, we expect to be much farther along in the project if we switch to vxWorks now than if we stay with Venix. The higher cost can be covered from our contingency budget, although the contingency is starting to run somewhat low. This part of the project is important enough, and the schedule late enough, that the expense must be accepted.