As a student body, we feel that the current implementation of CWRU’s campus data network, CWRUnet, lacks one key feature: reliability. Over the years, network expansion, infrastructure changes, and deteriorating hardware have weakened the efficacy of CWRUnet. A network that cannot be depended upon is ineffective, no matter how fast or high-tech it is. This document therefore aims to establish a "Reliability Initiative" for CWRUnet: when planning for the 1998-1999 academic year, only changes that improve the reliability of CWRUnet should be approved.

Unfortunately, given the size and complexity of CWRUnet, "reliability" is a vague and difficult goal. It is therefore necessary to define clearly which areas of CWRUnet are unreliable, and to propose alternatives to the present implementation. Achieving CWRUnet reliability involves addressing two main problems: the reliability of the network itself, and the reliability of the services (servers, content, etc.) that run on top of it. The only way to achieve the kind of reliability that students expect from CWRUnet is to engineer its infrastructure and services to be reliable. A well-designed network can handle the full range of traffic placed on it, scale easily, and provide robust performance, all without being a constant drain on the engineering staff.

Reliability of Infrastructure

The CWRUnet infrastructure represents the core of CWRU’s Information Services policy. Unfortunately, unplanned expansion and a difficult transition to newer ATM technology have taken their toll, and the CWRUnet of today lacks many important and necessary features. The core values of CWRUnet are its ability to transfer large amounts of data seamlessly, quickly, with low latency, and reliably.
Unfortunately, as the transition from Ethernet to ATM LANE progresses, instabilities are causing CWRUnet's core values to be violated. Packet loss and high latency are all too common on CWRUnet circa 1998, and as the network (particularly its ATM portion) continues to expand, this trend is sure to continue unless a different architectural approach is taken.

The use of Multiprotocol Routers to migrate CWRUnet from its flat, bridged design to a hierarchical, routed design would afford several advantages, because it would allow CWRUnet to be transparently segmented. Firstly, segmentation would keep a single computer from bringing the entire network down: outages would be confined to the affected segments of CWRUnet, while the rest of the network would still function. Secondly, Multiprotocol Routers would increase security on CWRUnet by shielding important servers from certain attacks and by making activities such as packet snooping more difficult. Thirdly, Multiprotocol Routers would reduce the broadcast traffic on CWRUnet, thus lessening the load on the ATM LANE Broadcast and Unknown Server (BUS). Fourthly, Multiprotocol Routers would give CNS more control over CWRUnet, and a more tightly controlled network tends to be a more stable one.

However, Multiprotocol Routers are not without disadvantages. These stem largely from the added complexity that routing introduces, and from the planning needed to add such devices to the network successfully. For instance, CWRUnet's IP address space would have to be carefully subnetted, so that proper IP routing can occur and IP address waste is minimized. In the "old days", this added complexity made client configuration overly difficult and limited the mobility of machines on the network.
However, with the use of Dynamic Host Configuration Protocol (DHCP) technology, these issues can largely be circumvented. CNS already operates a DHCP server on CWRUnet, demonstrating the feasibility of this technology. In addition, Multiprotocol Routers generally add administrative overhead (as compared to ideal bridged networks), and as a result most CNS engineers seem to be resistant to such a change. In summary, Multiprotocol Routers have inherent advantages and disadvantages, but they are clearly necessary to create the stable, robust, and fully scalable network that CWRUnet needs to be.

Another strategic change would be to replace ATM LANE with Classical IP over ATM (CLIP, as detailed in RFC 1577). The basic idea behind CLIP is to run IP directly over ATM. This is advantageous because it is faster than LANE (no emulation is necessary). It is also less complex (i.e. easier to troubleshoot) and more proven, which makes CLIP more reliable. The chief disadvantage is that only IP services can be used in this environment, so this solution would be feasible only if all of the services that CWRUnet offers were migrated to IP. Additionally, CLIP requires IP routers to provide legacy (Ethernet) connectivity. Routing has the advantages and disadvantages described above, but here it is largely a disadvantage, because it would require large amounts of new equipment to implement properly. It should also be noted that CLIP requires ATM ARP servers, which must be managed just like any other server on CWRUnet. In summary, CLIP is a somewhat more radical approach to attaining CWRUnet reliability, since it requires all services to be IP-only, as well as the addition of IP routers to handle ATM ↔ Ethernet traffic. However, it offers the greatest gains, for it would deliver not only reliability but also a significant performance boost.
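
The subnetting step mentioned in the discussion of Multiprotocol Routers can be illustrated with a small sketch. It is not a plan for CWRUnet: the address block `10.20.0.0/16`, the segment names, and the host counts are all hypothetical placeholders, and the helper functions `prefix_for` and `allocate` are our own illustrative names. The sketch uses Python's standard `ipaddress` module and hands out the smallest subnet that fits each segment, largest segment first, so that little address space is wasted.

```python
import ipaddress

# Hypothetical campus block; the real CWRUnet allocation is not stated here.
CAMPUS_BLOCK = ipaddress.ip_network("10.20.0.0/16")

# Hypothetical segments and the number of hosts each must hold.
SEGMENT_HOSTS = {
    "residence-halls": 4000,
    "academic": 2000,
    "servers": 200,
    "printing": 50,
}


def prefix_for(hosts):
    """Return the longest IPv4 prefix whose subnet still holds `hosts` hosts.

    Two extra addresses are reserved for the network and broadcast addresses.
    """
    needed = hosts + 2
    host_bits = (needed - 1).bit_length()
    return 32 - host_bits


def allocate(block, demands):
    """Assign one subnet per segment, largest first, packed without overlap."""
    plan = {}
    cursor = int(block.network_address)
    for name, hosts in sorted(demands.items(), key=lambda item: -item[1]):
        prefix = prefix_for(hosts)
        size = 1 << (32 - prefix)
        # Round the cursor up to the next boundary that is valid for this size.
        cursor = (cursor + size - 1) // size * size
        subnet = ipaddress.ip_network((cursor, prefix))
        if subnet.broadcast_address not in block:
            raise ValueError("campus block is too small for " + name)
        plan[name] = subnet
        cursor += size
    return plan


plan = allocate(CAMPUS_BLOCK, SEGMENT_HOSTS)
for name, subnet in plan.items():
    print(name, subnet, "usable hosts:", subnet.num_addresses - 2)
```

With these made-up figures the four segments receive a /20, a /21, a /24, and a /26 respectively, leaving most of the block free for later growth. Each such subnet would sit behind a router interface, and a DHCP scope per subnet would spare end-users from configuring addresses by hand.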
Reliability of Services

The reliability of the services that CWRUnet offers must also be paramount, because the services are what students, faculty, and staff need in order to do their jobs. In truth, most people do not care whether CWRUnet is bridged or routed, only whether they can send electronic mail when they need to. Consequently, it makes sense for CNS to differentiate between its core and supplementary services, and to provide two tiers of service accordingly. The core services that CWRUnet needs to provide in order to form the Electronic Learning Environment are e-mail, World Wide Web, threaded discussion (newsgroups), and network printing.

Of all the core services, e-mail is probably the most utilized and the most vital. It is therefore unfortunate that e-mail reliability has been problematic of late. Yet it seems that, given the proper server-side infrastructure, "100% e-mail uptime" could be achieved. For example, disk mirroring could be used on the individual servers, so that in the likely event of a disk failure, no one would lose access to their mail. Under this scenario, faulty drives could be swapped out at the engineer’s convenience: the system could run until the next regularly scheduled maintenance window, at which point the repair could be made easily and safely. Furthermore, fully redundant servers could be used; with clustering, any server failure would automatically be covered by the redundant backup. These types of solutions, albeit expensive, demonstrate the seriousness with which CWRUnet’s core services should be treated.

The use of threaded discussions, such as those provided by an NNTP server, should be a common component of most classes by now. Yet the continued adoption of such a service seems to be impeded by the unreliability of CWRUnet’s current NNTP-based delivery system.
The current solution for providing Usenet news relies on two separate servers, one handling the external newsfeed and the other handling campus client access. In theory this system should work perfectly, but in practice the servers often fall out of sync. This makes it difficult to follow threads and adds unnecessary delays to new postings. When this happens, people are less inclined to use the service, because it does not meet their expectations. Two approaches could fix this. The first would be to take NNTP out of the equation and move to some other form of threaded discussion technology. Alternatively, if the current Usenet architecture is kept, it would be advisable to divide the labor between one (highly reliable) core server that maintains the cwru.* newsgroups and another (less reliable) server that provides access to all other newsgroups.

Many of the supposed "reliability" problems with the network printers are not hardware problems at all, but reflect end-user confusion. For example, when an end-user sends a PostScript file to a Wade printer and is subsequently told that it "didn’t work", the blame is often placed on the network printers when in fact the PostScript file was malformed. The best way to improve the "reliability" of the network printers is therefore to enhance the end-user experience. This could be accomplished in two ways: by providing direct printing from applications (as the Macintosh computers can), and by providing helpful feedback. The latter would inform the end-user when their document actually printed and, if it did not, whether the cause was a print services problem (server/printer failure) or a local problem (malformed PostScript). Finally, the servers that provide network printing need to be given core-level importance, and thus engineered for high uptime.
For example, the "apricot" server, which provides printing access for Macintoshes, has become somewhat unreliable of late, and a more robust solution should be considered.

Conclusion

Any one of these solutions can move CWRUnet closer to its rightful standard of excellence, bringing it more in line with its original goals and purpose. Of course, it is not up to the students to decide policy: Information Services must examine how these ideas might integrate into the current CWRUnet framework. From an end-user perspective, it is more desirable for resources to be allocated towards refining the current network infrastructure than towards building solely for the future. In fact, building a strong foundation today is essential to building the future that Information Services envisions. It is thus the fervent hope of the student body that this reliability initiative be given a most thorough and thoughtful inspection.