Topics in Advanced Network Management: Results from a workshop of R&E network architects
=========================================================================================

_November 2005_

*1. Introduction*

Network architecture, security, and management in higher education have lacked a collective forum to convene individuals from multiple institutions and varying strata of the managerial structure. The creation of a discussion body that meets occasionally to cover selected themes in institutional networking begins to fill this need. A special session was held following the Joint Techs Workshop in Vancouver as a stepping stone towards the establishment of such a body.

Over the past twenty years, the evolution of physical networks on campuses and of the number and types of devices and services using them has left significant integration and legacy issues. The workshop revealed themes in this progression that tended to be common to many schools, laying the foundation for future conversation. The following document represents some tentative conclusions and disagreements worked through during that workshop.

The workshop was structured into three sessions:

· physical convergence and logical networking: physical convergence of media/protocol and logical network issues, including the interplay of the two forces. Topics include architectural considerations, security, economic and policy issues, and IPv6 and addressing dimensions.

· network authentication: Topics include wireless authentication, including roaming between institutions, and wired. An interesting case study from Indiana is also discussed.

· packet disruption: middleboxes that affect end-to-end packet flow, often to effect policy. Topics include basic issues, sidecar selection mechanisms, intrusion detection approaches, load balancing, minimum security standards, and firewalls.

There are clear dependencies amongst these topics, and during the course of the workshop earlier points were revisited. For the purposes of this narrative, the discussions have been merged into a somewhat logical flow. In general, information is presented without attribution or institutional association. Individual names or schools are attached to comments that were particularly astute or particularly "distinctive". This document concludes with some action items that have not yet been acted on.

*2. Physical Convergence and Logical Networking*

*Basics*

Two independent and powerful forces shape much of enterprise network design today. The end goal of logical (or virtual) networking is to create closed communication systems over an arbitrary physical network that touch multiple, selected hosts in disparate locations, to provision particular isolated services with reliability and scalability. Physical convergence works at a different level, seeking to support an arbitrary set of diverse protocols, service types, and needs across a single physical layer. Voice, video, and data are the services pressing this convergence to date, with control systems such as alarms, door locks, and others looming on the horizon. At times, services have come online on common building infrastructure entirely unannounced -- one participant described an ambush deployment of security cameras on the network. These services typically come with differing expectations for uptime and reliability, and place new demands on a converged infrastructure.
Some combination of the two architectural themes of logical networking and physical convergence occurs in many deployment scenarios. Decisions have to be made based on the specific needs of a given service and the networks available. Policy and legal ramifications inevitably enter the picture, along with classic difficulties in acquiring full co-operation from other campus entities.

*Architectural Considerations*

On converged networks, delivering the accustomed look and feel to applications used to dedicated networks or to protocols other than IP may be easier with advanced routing technologies such as MPLS. The "fish problem", frequently encountered at gigapops, wherein the shortest path is not necessarily the best path, can be alleviated to some degree with the deployment of MPLS. Multi-exit routing at the campus level can also be supported more effectively.

Terry saw two major strategies emerging from the discussion: multiple VLANs projected over one single routed network, or a set of services isolated by VLANs within the layer 2 building infrastructure but trunked from building to building at layer 2.5 or 3, essentially tunnelling together VLANs that provide equivalent connectivity semantics without trying to extend broadcast domains across routed interfaces. In the end, this is an engineering task: there are no right answers, but there are sets of answers that are consistent with each other. Local conditions have a significant impact on the overall architecture. Redundancy also generally is easier the higher in the protocol stack it is provided.

Another primary concern is manageability, but there was little consensus about which sort of network structure lent itself most readily to manageability, beyond ensuring an infrastructure as homogeneous as possible. Ensuring that people connecting devices to the network(s) use the proper ports is an ongoing struggle. A common need of workshop attendees was better tools for managing information about subnets and VLANs at the wall-jack level.

Legal issues also arise rapidly surrounding the necessary HIPAA and FERPA protection of information. The requirements for network architecture to support this legislation are not fully understood, and legal interpretations can vary widely from campus to campus. Some legal departments have decided entirely separate physical networks are required for compliance.

Two cases in point, Duke and Berkeley, were described in greater detail. Duke's end goal is to build a single, strong physical infrastructure running networks that touch multiple, selected hosts in disparate locations, using firewalling and VLANs as a logical overlay. Someone mentioned that, in their instance, this approach had led to maintaining over 250 firewalls running over this unified physical backplane. Berkeley makes use of both VLANs and separate physical infrastructure as needed. Cliff made the point that this generally leads to the simplest possible implementation, and troubleshooting is an easier process with multiple physical networks. However, as Terry noted, this could limit the ability to utilize spanning trees and other redundancy approaches (not that Terry has ever advocated spanning tree for anything).

Approaches to physical systems such as networked digital door locks varied amongst schools: in situations of power or network failure, some default to open (backed by security patrols), and others default to locked.
Failsafes will need to be devised for physical and other more critical systems using campus network infrastructure; such systems highlight the need for availability.

The growing use of externally provided networks on campus is an interesting twist; most pressing today is the widespread availability of cellular services and the extension of these services to data. The general consensus was that not owning a network (e.g. cellular) was not an issue in integrating with campus management. There's a need to address these externally provided networks anyway, given the desire to integrate with them to some extent. Some examples cited are roaming from WiFi networking to cellular networking, or number portability between cellular, campus, and BlackBerry voice services. The growing set of service requirements outside institutional control limits the ability to make classic network choices, and Mark thought running IP services over cellular to be an unwise idea.

*Security*

One of the most important aspects of the convergence discussion was the extent to which sufficient security can be provided on networks serving such different devices and needs. Isolation of traffic may be necessary at layers 1, 2, 2.5, and 3 for various application needs. There are many secure protocols and deployment techniques, such as IPSec, that have been developed to deal with attacks such as MITM, spoofing, and wiretapping, but there are some attacks for which there is no real defense. Without better approaches to security, convergence may become intractable. One participant described a large struggle with facilities, who had decided to spend $100,000 pulling fiber for new security cameras right next to existing available light paths, because the digital cameras on the main network had already been hacked.

Firewalling techniques can insulate these networks to some degree from the external world, limiting this exposure. There was a suggestion that IPv6 could add some limited additional security, or that 10.x.x.x networks and NAT could be used as an alternative. On the other hand, firewalls can also serve to defeat some of the benefits of advanced routing, limiting multipathing capabilities, port-agile applications, etc. When computers are on the same physical-layer network, it is virtually impossible to insulate a distributed system on it from DDoS attacks. Protecting nodes from these attacks using logical networking alone is essentially impossible. There is also always the danger of poorly implemented software containing vulnerabilities that would be much more easily exploited on a common physical TCP/IP-based network.

*Economics*

There was a division of opinion over the best way to partition these services, particularly when limited funding is available. Some in attendance felt that throwing the entirety of the budget at a single network was the wisest policy: the more money available, the more reliable, powerful, and scalable it can be made. Others were concerned that many of the security and uptime limitations of fiber-optic IP networks are innate and favored building separate networks; Cliff suggested that even 5 or 6 physical networks could be built affordably, and that these networks could be built with fewer components while maintaining reliability. In many instances there is already legacy copper wiring in the buildings that will eventually be useful for networked systems serving critical functions that require little bandwidth.
Using redeployed copper to build physical networks for systems such as alarm and door systems or energy management, which have limited requirements but require insulation from security threats, is one example of making use of these facilities.

As mentioned above, common understanding of what degree of network-layer insulation is required by FERPA, HIPAA, and other relevant legislation, and a consistent vision of the true needs of standard application types, is lacking. A general set of guidelines about what should be provided for any given deployment would benefit the process of convergence and general network design. Care must be taken to ensure that any "HIPAA network certification" doesn't result in overly onerous processes; one campus reported that the credit card industry checklist burned up three quarters of network staff time for an entire month and a half.

*IPv6 and Address Space*

The continued slow pace of deployment of IPv6 is an ongoing concern to some in the community (but not others, who are in no hurry). One of the most pressing reasons for its development, the lack of address space in v4, has been greatly mitigated by the advent of NATs and private networks, which limit the need for publicly accessible addresses. The other benefits of IPv6 have not proven compelling thus far in most instances. One school purchased a /32 address space "because [they] thought it was cool" (but noted they had also bought namespace related to X.500). The purchase came with the stipulation that the namespace be actively used or it would be returned to the granting company, presumably to limit speculative activity. Only a small fraction of the machines at their site have IPv6 addresses at this point. Stanford mentioned that they broke IPv6 support during a backbone upgrade and were surprised to receive a complaint from someone on campus. There have been isolated requests from other quarters for IPv6, such as someone running INN, but no serious motivation. The only request the University of British Columbia had ever received was for hosting this very (Joint Techs) conference. Even the people present considered IPv6 a playground that isn't well understood.

There was great concern that insufficient attention had been given to re-engineering the security of IP when IPv6 was being developed. There are many security considerations that differ from IPv4 and are less well known, and existing security constructs such as IP-address-based ACLs will break. Deployment of separate first-hop gateways for IPv6 was suggested as an intermediate step to ease the pain on routers. One school with dual IPv4 and IPv6 support at the backbone had a router that melted regularly due to IPv6 traffic, which for some reason seems to carry much greater port-scanning activity or processing load.

The alternative -- and by far more prevalent -- solution to address space needs is the use of NAT technology. NATs have well-known security and accessibility implications, but especially with the decentralized nature of network roll-out on campuses, they tend to be a factor to consider regardless of the rest of the architecture. One difference between NAT and a stateful firewall is that NAT boxes block incoming connections by default, although stateful firewalls can be configured to do the same. Another difference is that one-to-many NAT devices can preserve public address space by allowing many hosts with private RFC-1918 addresses to share a single public address for external connections.
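To make the one-to-many translation concrete, the following is a minimal, illustrative Python sketch of the kind of translation table such a NAT box maintains; it is not any particular vendor's implementation, and the addresses and port numbers are invented for the example.

    import ipaddress
    import itertools

    # Many RFC-1918 hosts share one public address, distinguished by translated source port.
    PUBLIC_IP = ipaddress.ip_address("192.0.2.1")      # example public address
    PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")   # RFC-1918 space used inside

    _ports = itertools.count(20000)   # pool of translated source ports
    _out = {}                         # (private_ip, private_port) -> public_port
    _back = {}                        # public_port -> (private_ip, private_port)

    def outbound(private_ip, private_port):
        """Translate an outbound connection, allocating a public port on first use."""
        key = (ipaddress.ip_address(private_ip), private_port)
        assert key[0] in PRIVATE_NET, "only private hosts are translated"
        if key not in _out:
            public_port = next(_ports)
            _out[key] = public_port
            _back[public_port] = key
        return PUBLIC_IP, _out[key]

    def inbound(public_port):
        """Unsolicited inbound traffic with no existing mapping is simply dropped."""
        return _back.get(public_port)   # None means no state: block the packet

    print(outbound("10.1.2.3", 51515))  # -> (192.0.2.1, 20000)
    print(outbound("10.9.8.7", 51515))  # -> (192.0.2.1, 20001): same public IP, new port
    print(inbound(20001))               # -> mapped back to host 10.9.8.7
    print(inbound(33333))               # -> None: blocked, the default-deny behavior noted above

The default-deny handling of unmapped inbound ports is exactly the property the group noted NAT shares with a stateful firewall.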
The University of Washington routes some private address space on campus, and provides NAT-mediated access to the Internet from those private addresses. Departments can "opt in" so that DHCP returns private addresses, and static public addresses can be mixed on the same subnet. Some units have said this is the best networking service the central IT organization has ever provided.

One of the most likely use cases identified in the near term for the deployment of IPv6 is VoIP, where there is a serious need both for the expanded address space to support inbound calls and for enhanced mobility. However, a lack of support from the vendors providing end devices has been another serious barrier to development of IPv6 use in this and other applications.

*3. Network Authentication*

Network authentication can apply in several key situations: authenticating a user for network access; authenticating a device for network access; and authenticating the state of a device. These issues are often coupled, and can include authorization issues as well as authentication. This session at the workshop dealt primarily with wireless network access issues, including authentication, access point placement, VoIP, and advanced wireless technologies. The session concluded with a case study from Indiana.

*Wireless and Authentication*

The wireless space continues to see rapid development of protocols and standards to support the unique needs of that environment. At the same time, cohesive, secure production deployments are critical to campus infrastructures. The challenges posed by this confluence were discussed at length by the group.

Authentication in wireless is one of the least consistent pieces, with multiple encryption schemes and standards sometimes performing this duty at the link layer. More flexible solutions such as 802.1x and web-based sign-on, living higher in the protocol stack, may work alone or in concert with the WPAs and WEPs of the world. The most common approaches identified are MAC-based authentication, middle-box authentication (typically via a captive web portal with an authentication back-end module), 802.1x, and the authentication-less alternative, "free love." Greg observed at the end of the discussion that everything said was also applicable to wired networks, but the omnipresent nature of wireless makes it a greater concern.

Only Penn State among those present had a fully deployed and functional 802.1x authentication system, using Cisco APs; many older APs do not support AES or 802.1x at all and will need to be replaced. This is coupled with WPA-TKIP encryption and backed by a FreeRADIUS server. The Penn State wireless deployment is moving to use EAP-TTLS/PAP to authenticate against the university-wide krb5 realm; this is currently deployed in two buildings at the University Park campus and will be deployed in future wireless installations. Support is built into Mac OS X systems, but Windows users have had to install the open-source SecureW2 client, while Linux support varies by distribution, card, and patches. Pocket PC clients can use SecureW2 as well, but there is currently no access mechanism for Palms. There has been no sharp cutover, due to the end-user transition challenges. However, the user experience has been more positive than with the standard captive portal approach, and the encryption is stronger. There is unfortunately no EAP method that talks Kerberos at this point that anyone was aware of, nor any active projects to create one.
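As a rough sketch of the decision a RADIUS back end makes in an EAP-TTLS/PAP deployment like the one described above (this is illustrative only, not Penn State's implementation): the outer TLS tunnel terminates at the RADIUS server, the inner PAP credentials are verified against the Kerberos realm by some site-specific helper, and success produces an Access-Accept, here with assumed RFC 3580-style VLAN attributes. The realm, VLAN, and helper function are placeholders.

    KRB5_REALM = "EXAMPLE.EDU"   # placeholder realm
    WIRELESS_VLAN = "310"        # placeholder VLAN assigned on success

    def verify_krb5_password(principal: str, password: str) -> bool:
        """Placeholder: in practice this would wrap a Kerberos library or a kinit call."""
        raise NotImplementedError

    def handle_inner_pap(username: str, password: str) -> dict:
        """Decide what the RADIUS server should answer for the inner PAP credentials."""
        principal = f"{username}@{KRB5_REALM}"
        if not verify_krb5_password(principal, password):
            return {"code": "Access-Reject"}
        return {
            "code": "Access-Accept",
            # Dynamic VLAN assignment attributes, if the access points honor them.
            "Tunnel-Type": "VLAN",
            "Tunnel-Medium-Type": "IEEE-802",
            "Tunnel-Private-Group-Id": WIRELESS_VLAN,
        }

The point of the two-phase design is that only the RADIUS server ever sees the inner credentials; the access point merely relays EAP inside the encrypted tunnel.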
The group concurred that a Kerberos-speaking EAP method would be a long-term and challenging project, since the protocol itself would likely have to be extended first, with implementation following. Other schools are actively examining 802.1x, with the University of Michigan and Carnegie Mellon both planning on using some form of certificates issued to individual computers. This authentication supports the issuance of access to devices, while the authentication of members and guests as users for services is handled separately. The University of British Columbia has test deployments in place as well.

When no authentication is performed, which isn't entirely uncommon, there are still concerns about information leakage, since hosts on the open wireless network can sniff each other's traffic. Encryption without authentication is routinely used in these circumstances. "Lots of us have fat ACLs for a flat wireless space." This problem has resulted from a variety of limitations, including broadcast storm issues and constructs such as local wireless subnets introduced to address limitations in old hardware. Participants were anxious to have products that support layer 3 tunneling ("mobile IP"), which along with more capable devices would collectively allow for functional mobility while maintaining a routed network. One of the biggest challenges to providing roaming service and mobile IP is the handoff of users from one access point to another, which must be done in under 50 milliseconds to support glitch-free VoIP calls. Debugging the automatic tunnels everywhere would be virtually impossible. Access point vendors have finally hired expertise from the cellular community to try to provide this functionality soon.

*VoIP over Wi-Fi*

VoIP over Wi-Fi is another voice application which may be of interest to campus users. It might only be possible to do this in a scalable fashion over 802.11a, not b/g. The University of British Columbia has handed out some officially unsupported Wi-Fi phones as an experiment in functionality, and other schools are trying to use these phones over 802.11b networks. RIM recently came out with an 802.11b BlackBerry device which is SIP VoIP capable. There are many issues yet to be resolved for voice services over wireless, including QoS, prioritization, and fair/unfair channel access. UC Berkeley's representatives said that VoIP over Wi-Fi should be forgotten about entirely: "You want voice, use a cell phone." 3G devices in particular represent the evolution of the cellular phone into a more general mobile communication device. Some participants cited the greater service and coverage that could be provided with cellular networks. There was also discussion of hybrid approaches, where cell phones would cut over to VoIP over WiFi when near a WiFi access point. The advantage for users might be better coverage in basements or areas too far from a cell tower. On the other hand, carriers would seek to recoup costs by charging roaming fees to users. Whether there's even any influence campuses can exert in this marketplace is subject to debate.

*Access Point Placement*

Providing effective wireless service starts with a well-planned physical footprint. Adding closets and network drops at appropriate intervals is an easy and relatively cheap thing to do when a building is first being designed and construction hasn't begun. The participants agreed strongly on being aggressive in engaging the builders to make sure that proper plans are in place before ground is broken.
Any changes during or after the building process are prohibitively expensive. The issues in placing an access point are threefold: it must be in a location where existing network access can be supplied; it must have electricity available (either via a conventional 110VAC outlet or via DC power injection over the Ethernet cable); and it must be in a sufficiently secured location that none of the pieces involved can be tampered with. Beyond this, nobody present had any standard for structuring wireless in buildings. Strategies ranged from doing whatever was necessary to mandating the availability of an XY grid within which individual points would be activated as necessary to ensure consistent coverage. Very few people went to any great length to retrofit existing buildings intelligently for the most elegant access point placement. The costs tend to be so prohibitive that adding a handful more access points is far cheaper than trying to rewire the building to provide greater coverage.

*Advanced Deployments*

A few advanced wireless deployment techniques were discussed as well. Phased-array antennae were amongst them, which everyone considered very cool -- and very expensive -- technology. The price/performance ratio is nowhere near where it needs to be for production deployments. Mesh networks, which have typically been used in outdoor deployments, may very well apply to indoor networking as well, particularly in situations where the wired infrastructure has significant limitations. These networks utilize 802.11a as a backbone and 802.11g for distribution. The biggest advantage is that existing cabling may be sufficient, because the wireless network itself is used to expand the range where access points may be placed. As long as secured locations with power are available this will work, but in many cases electricity drops cost even more than network drops. Closets with electricity and physical security are, as always, important.

One requirement unique to the campus environment is the accommodation of professors who want to disable the wireless network in a region during exams and other specific times. Strategies have ranged from wireless jammers to putting access points on light switches. There are even concerns about students within a single lecture hall forming an ad-hoc wireless network to conspire during the test, as has been happening in other countries for some years via cell phone text-messaging. (Cell phone connectivity is the primary reason attempts to block, jam, or disable WiFi are doomed to fail.)

*Wired Network Authentication and Authorization*

In contrast to the situation with wireless network authentication, most of the representatives at the conference were still using "free love on the wired side." While this may provide a clean slate for deployment scenarios, many campuses were also unclear what sort of end state they even wanted. Assigning appropriate VLANs to systems seemed like the one common thread, and one also common to the other discussions: how does the right end machine receive the right service?

Some schools used authorization in addition to authentication at decision points when deciding which services to grant a device. Penn State performs LDAP queries against the user's directory entry when determining whether they have permission to use the wireless network. Others use similar checks for VPN, IP address assignment, or to verify that accounts are still active. Other schools continue to see all this as somewhat superfluous.
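As a concrete illustration of the directory-based authorization check described above, the following is a minimal sketch using the ldap3 Python library; the server name, base DN, and the entitlement value used to grant wireless access are invented for the example and would differ at any real site.

    from ldap3 import Server, Connection

    def may_use_wireless(uid: str) -> bool:
        """Look up the user's directory entry and check for a wireless entitlement."""
        server = Server("ldap.example.edu", use_ssl=True)
        conn = Connection(server, auto_bind=True)   # anonymous bind, for the example only
        conn.search(
            "ou=people,dc=example,dc=edu",
            f"(uid={uid})",
            attributes=["eduPersonEntitlement"],
        )
        if not conn.entries:
            return False
        values = conn.entries[0]["eduPersonEntitlement"].values
        return "urn:example:wireless" in values

The same pattern extends to the other checks mentioned: VPN access, IP address assignment, or simply verifying that an account is still active.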
The alternative philosophy is that the network should be assumed insecure all the way up to the NIC. The world has witnessed a large number of scenarios in which a compromised host was located behind the firewall, where most boxes were soft because of a reliance on the firewall's insulation for protection. One medical school even experienced a DoS attack in which a compromised server farm beat on the firewall itself from the inside, taking down all network access for their users. Network authorization could be viewed in a similar "pointless" light: it's more important to identify, track down, and quarantine machines behaving badly than to attempt to prevent illicit access entirely. Virtually every misbehaving box is a legitimate member of the community anyway, so forcing a network log-in or authorization based on user identity arguably does nothing to improve network security. Columbia uses Netflow and a large amount of other intrusion detection technology to identify hosts that are compromised or misbehaving. There is still a simple filter in their router to deal with some inherent Windows problems.

*The Indiana Experience*

Indiana's approach exemplifies the philosophy that the network is inherently evil, good boxes can go bad, and the ability to respond is as critical as the ability to prevent. The basic question boils down to how bad apples are identified and dealt with when everyone is assumed equal initially. Most requests for termination of service come from either internal monitoring systems or external bodies such as the RIAA or ISPs. They have successfully responded to every subpoena, all of which thus far have concerned file sharing by a student.

Virtually everyone must use DHCP, although a handful of static addresses are accepted. Machines granted these addresses can be captured as well if necessary. There is no requirement that any device be registered, because they can track an IP address or connection down to an individual jack, riser, floor, room, etc. DNS logging is used, and MAC and IP addresses are quickly recorded. A variety of actionable options become available once the offending machine and its physical-layer location are identified.

Once a host has been identified and quarantined, the individual's web requests are all routed to a customized help page. This page displays different information based on the reason (bandwidth allocation, malware, RIAA, drone, etc.) the individual machine was captured. The participants were taken through a quick tour of currently quarantined machines, most of which had badly exceeded extremely generous bandwidth limits, generally located in residence halls and nursing stations, along with an employee in the libraries. There have been a few instances where an individual who should not have had any network access at all did receive it, but in general this approach has been extremely successful and has been a very convenient system for honest campus users.
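A minimal sketch of the captive help-page mechanism described above follows. It is illustrative only (Indiana's actual system was not presented in code), and the addresses, reasons, and page text are invented; the idea is simply that once a host is quarantined, every web request from it is answered with a page explaining why.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical quarantine table: offending IP -> reason it was captured.
    QUARANTINED = {
        "10.20.30.40": "bandwidth",   # exceeded a residential bandwidth allocation
        "10.20.31.99": "malware",     # identified as a worm-infected drone
    }

    HELP_TEXT = {
        "bandwidth": "Your connection exceeded its bandwidth allocation.",
        "malware": "Your machine appears to be compromised. Please contact the help desk.",
    }

    class QuarantineHandler(BaseHTTPRequestHandler):
        """Answer every web request from a quarantined host with a reason-specific page."""

        def do_GET(self):
            reason = QUARANTINED.get(self.client_address[0])
            body = HELP_TEXT.get(reason, "This network address has been quarantined.")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

    if __name__ == "__main__":
        # In a real deployment, routing or DNS steers the quarantined host here in the first place.
        HTTPServer(("0.0.0.0", 8080), QuarantineHandler).serve_forever()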
*4. Packet Disruption Devices*

The afternoon session on the second day went deep into the ecosystem of packet disruption and shaping devices, including firewalls, load balancers, and intrusion prevention systems (IPSs). Deployments and philosophy amongst campuses showed more variability than was seen in other parts of the discussion. There are many deployment concerns common to all packet disruption devices, regardless of the purpose of the device. Mitigation techniques exist to address some of the problems for some devices, but in general most of these challenges will have to be the subject of future research to enhance the devices and protocols used to perform these functions, or to reduce the need for them entirely. Three concerns were dominant.

First, these devices may limit network availability, whether through outright device failure or simply through the volume of traffic these devices can handle relative to what the underlying network would otherwise be able to carry. This ratio becomes worse for systems designed to provide protection at higher levels of the protocol stack, such as intrusion prevention systems (IPSs), due to the increased processing implied. Deployments must take these limitations into account.

There are also limitations imposed on the set of architectures that can be used in conjunction with the devices, since these boxes generally operate as single points on the network. Virtual and physical networks in particular may be used to route traffic selectively around or through these devices. If the system is deployed without full awareness of the infrastructure into which it's placed, security vulnerabilities may arise. On the other hand, it is also necessary to structure the deployments such that a set of hosts on the internal network may be excluded from these devices. Only two schools present even had complete control of the network from backbone uplink down to the wall jack throughout their entire network.

Lastly, and perhaps hardest of the problems, is the loss of end-to-end transparency and diagnostic ability. Firewalls are the most notorious example, but other packet disruption devices can give misleading or imperfect information about the state of the network itself. (E.g. one copyright music detection appliance scans network traffic for music signatures; when it finds such a stream, the appliance emits a TCP reset to sever the connection, confusing the user and the diagnostician alike.) If some form of packet disruption is responsible for a service failure, very little beyond thorough knowledge of the fingerprints left by these devices, the entire network structure, and intuition can be any guide to diagnosis.

Nevertheless, these systems are important tools, and the group spent the afternoon of the second day discussing in detail the wide variety of packet disruption devices and their deployment in higher ed.

*Sidecar Selection Mechanisms*

The critical "sidecar" capability, providing a route around the device for special hosts or ranges, is generally the biggest variable in deployment. One option is to use source-based routing. On Cisco 6500s, source-based routing can't be done inside a VRF, but Juniper makes boxes capable of doing this. (Everyone present was running exclusively either Cisco or Juniper routers, with the exception of a couple of scattered Foundry boxes being phased out.) Uplinks from individual edge routers within the campus can be routed to a VLAN on a 6500 to allow for multiple devices in the data center on the routed segment. This model also allows IPv6 to be run on a second box in parallel with no need for separate fiber to the building. The interesting part of this model is the division of traffic in the most intelligent fashion feasible. While the selection mechanism is implemented using static ACLs presently, there is an unusual proposal on the table at the IETF to use policy-based routing and other functionality to make these decisions not only dynamic, but signaled.
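The static-ACL selection just described amounts to a table of source prefixes whose traffic should bypass the disruption device. The Python sketch below shows that decision in the abstract; the prefixes and path names are invented, and a real deployment expresses this in router ACLs or policy-based routing rather than in code.

    import ipaddress

    # Hypothetical sidecar table: source prefixes that bypass the inline device,
    # e.g. hosts doing very large science transfers that would crush an IPS.
    BYPASS_PREFIXES = [
        ipaddress.ip_network("192.0.2.0/27"),
        ipaddress.ip_network("198.51.100.128/25"),
    ]

    def select_path(src_ip: str) -> str:
        """Choose a forwarding path based purely on the packet's source address."""
        src = ipaddress.ip_address(src_ip)
        if any(src in prefix for prefix in BYPASS_PREFIXES):
            return "sidecar-path"     # route around the IPS/firewall
        return "inspected-path"       # default path through the device

    print(select_path("192.0.2.10"))   # sidecar-path
    print(select_path("203.0.113.5"))  # inspected-path

The FlowSpec discussion that follows is essentially about distributing and updating such a table dynamically via BGP rather than maintaining it as static configuration.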
One such proposal, a FlowSpec draft describing ways to have BGP carry richer information that could form a basis for this capability, has already received a support commitment from Juniper. Cisco finds this interesting but incomplete; a more thorough definition of the verbs is needed before a final protocol specification could be produced and coded to. One of the weaknesses of currently implemented policy-based routing is the lack of liveness detection and automatic failover: if a path is down, there is no alternative routing provisioned. Kevin speculated further about inserting policy modules into boxes to process packets according to various rules, up to layer 4 selection, before sending the traffic back to the main routing engine.

Cliff was curious what could be done with source-based routing that couldn't be done with standard VLAN deployment; UC Berkeley uses VLANs as a way to handle opt-out of firewalls. David Sinn of the University of Washington replied that their LAN hardware was such that they could not pervasively deploy VLANs. Further, in a hypothetical situation with an IPS deployed near the core while one department consistently performed very large transfers that would crush it, how could the network be architected so that some traffic would avoid the IPS in a flexible way, without having drunk the MPLS kool-aid? Terry Gray suggested a hybrid L2/L3 approach, with the capabilities of VLANs used where available to offer multiple classes of connectivity.

*Intrusion Detection/Prevention*

David Sinn gave a presentation on TippingPoint use at the University of Washington, where there are three IPS device channels used inline with router egress to the border. Several other brands have been considered, including McAfee, but TippingPoint is the market leader. If these boxes fail, there is a backup path avoiding the array entirely. They would prefer to have the TippingPoints sit as sidecars off the border router instead. Cliff was curious whether these could be connected in monitor mode, but to fulfill their prevention mission (rather than just detection) they must be somewhere in the path of traffic to be able to block the flow of malicious traffic once it is detected. However, it is possible to have an IDS insert blocking ACLs into a border router, as the LBL "Bro" system does. To avoid the IPS systems entirely, machines must be connected specifically through a special router.

The installation process was somewhat painful due to incorrect information and bad releases of software, followed by many upgrade and tuning cycles, resulting in a fairly stable system. There is currently a fear of a Slammer attack channeled into the deep inspection path, and one filter had to be disabled because it mislabeled AIM users as contaminated with Sasser. The University is working with TippingPoint to improve the handling of certain packets and overall performance. It's impossible to select policies based on user group, and the University is still trying to understand how best to extract reports and notify appropriately.

IDS systems were generally deployed using Snort on an optical tap or via span ports. These taps are also used for other security and research purposes, both potentially contentious and difficult matters. (In the case of security, the issues center around the value and pitfalls of log files in legal contexts; using the data for research raises issues around anonymization while preserving the research value of the data.)
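On the anonymization point, a common compromise is to truncate or pseudonymize addresses before captured flow data leaves the security group. The sketch below shows both ideas and is purely illustrative; no particular tool is implied, and the key is obviously a placeholder.

    import hashlib
    import hmac
    import ipaddress

    SECRET_KEY = b"replace-me"   # placeholder; real deployments manage this key carefully

    def truncate_host(addr: str, keep_prefix: int = 24) -> str:
        """Zero the host bits, keeping only the enclosing /24 -- crude but simple."""
        net = ipaddress.ip_network(f"{addr}/{keep_prefix}", strict=False)
        return str(net.network_address)

    def keyed_pseudonym(addr: str) -> str:
        """Replace an address with a stable pseudonym so flows can still be correlated."""
        digest = hmac.new(SECRET_KEY, addr.encode(), hashlib.sha256).hexdigest()
        return "anon-" + digest[:12]

    print(truncate_host("203.0.113.77"))    # 203.0.113.0
    print(keyed_pseudonym("203.0.113.77"))  # e.g. "anon-..." (stable for a given key)

The tradeoff the group noted is visible here: truncation destroys per-host research value, while pseudonyms preserve it at the cost of some residual re-identification risk.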
Schools generally use multiple taps placed at various intervals, with the data consolidated to one location where Argus is run to analyze network metrics. It's even capable of assembling bidirectional flows from multiple streams, and full use of this ability often means combining data from the commodity and Abilene uplinks for a complete picture of communications. The sheer volume of traffic flowing back out of this system to multiple endpoints can pose severe challenges. The commercial version is capable of IPv6 and has special educational pricing.

Netflow was also used by virtually everyone present, and there was broad consensus that it is highly desirable to collect every packet rather than a statistical sample -- even though that is very difficult to do at 10 Gbps and above. Several scenarios require complete assurance that this data is available; most notably, when a system is compromised it's critical to determine whether the attacker was a script kiddie, how much data was transferred, and other forensic details, as well as to support real-time diagnosis. Minnesota's new architecture allows them to span both sides of the firewall and most routers for an extremely detailed picture of the network. Other sites simply run tcpdump on the same host that serves Snort and other monitoring functions.

*Internet2-Specific Security*

There was a brief discussion of differential treatment of data flowing to and from the commodity internet and Abilene. UC Berkeley performs IDS only on the commodity connection, not on Abilene traffic, while Minnesota had seen similar problems with both. Multicast floods from Abilene caused significant problems for Indiana. Packeteer traffic shapers are often used in addition to other approaches.

*Load Balancers*

There was tremendous diversity in the approaches and products selected to handle load balancing for major campus applications such as Blackboard, SMTP, LDAP, and the web hosting environment. Duke evaluated the full set of load balancers prominent in the marketplace -- F5, NetScaler, a Cisco 6500 module, etc. -- which were all priced similarly at around $100,000 a pair, with Cisco being slightly pricier. Each of these products had seen deployment among the assembled, and others such as Nortel's Alteon were used as well. Stanford has deployed several Cisco blades directly in various routers and assigned load balancing responsibility directly to the sysadmins.

Microsoft's load balancer was considered pretty poor, and requires its own dedicated switch. Every server in the load-balancing cluster needs to see every inbound packet, resulting in a need to flood traffic to every port. Multicast mode is supported but immediately results in invalid ARP responses, which can be fixed only with static ARP and static CAM entries. Terry expanded on this idea, believing that this sort of layer-violating appliance in general introduces a technical and organizational "impedance mismatch" into network support, because it's never clear whether it should be managed by the server people or the network people. Things that should be application requirements are pushed out to the network itself. His group has used simple DNS rotaries and load-based DNS rotaries with great success and is eagerly awaiting MS SQL Server 2005, which will provide better balancing without the need for external appliances. Network-based balancers can cause a tremendous amount of pain when they fail, especially when the failure mode is subtle and hard to diagnose.
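The "simple DNS rotary" mentioned above is just round-robin answers from the nameserver. The toy sketch below (standard library only, with invented host names) shows the idea, and also hints at why it troubles stateful applications: successive lookups can land on different servers.

    import itertools

    # Hypothetical pool behind one service name; a DNS rotary hands these out in turn.
    POOL = ["web1.example.edu", "web2.example.edu", "web3.example.edu"]
    _rotary = itertools.cycle(POOL)

    def resolve(name: str) -> str:
        """Toy resolver: each lookup of the service name gets the next server in the pool."""
        if name == "www.example.edu":
            return next(_rotary)
        return name

    # Two "page loads" by the same client may hit different back ends, which is exactly
    # why stateful applications such as webmail can lose attachments mid-session.
    print(resolve("www.example.edu"))   # web1.example.edu
    print(resolve("www.example.edu"))   # web2.example.edu

A load-based rotary biases this choice by current server load rather than rotating blindly, but the statefulness problem is the same.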
However, DNS-based load balancing can cause issues for applications designed in specific ways that expect a degree of statefulness. Webmail, for example, has issues with attachments being lost when the client is switched to a different server mid-session. Kevin had pushed for DNS-based load balancing, but applications pushed back for this reason. The worst situation is when L7 switching is being performed by the applications themselves; he would rather be in a position to provision a service that enables reliable back-end server pools instead. The group felt that, to some extent, network architects should do their best to engage application developers to ensure their applications' behavior doesn't cause these approaches to fail, for example by not placing images on one server and web pages on another.

The load balancers seem to be a ready target for blame in many cases, and the approach used by Washington is to provide a method to verify failure before contacting network services. The support model requires that all back-end web servers host a simple static web page, and each of these servers is placed in its own load balancing group that contains no other servers. If this static service is broken, then it's appropriate to contact the network team.

*Packet Shaping & Bandwidth*

There are also myriad strategies to prevent single hosts from consuming excessive network resources. Frequently, the biggest bandwidth drain -- on the order of two-thirds -- comes from the residential networks and dorms. Negotiations and desperate pleas have met with no success, and most schools present had been forced to implement some sort of scheme to forcibly limit the bandwidth available to these networks. This is not always user misbehavior, however, and often results from infected hosts chewing up ghastly amounts of bandwidth as worms attempt to propagate themselves.

Packet shapers were the most common method chosen to selectively limit bandwidth use. Packeteer in particular was widely used to selectively shape traffic flowing from the dorms to prevent P2P from absorbing the entire network, though some schools used Introvert. Simple counts of total data throughput using Netflow, or microflow policers on the 6500s, were also used. Some use this technology for in-depth classification of packets to give a better picture of network utilization, in addition to providing application-specific rate limits. However, this is something of an arms race with the developers of P2P applications and protocols, as these grow stealthier in their network use to better avoid detection and categorization. Deep packet inspection is currently providing a sufficient edge in detection of P2P traffic.

Duke gives students 5 gigabits of bandwidth, and after 5 violations they're dropped into a rate-limited category for the rest of the semester. Cliff and Berkeley have extended this idea to allow the school to charge residential halls as a whole for the bandwidth they consume. These halls have proved reluctant to allow individual students to directly purchase additional bandwidth.
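The bookkeeping behind this kind of allotment-and-violation policy is simple; the sketch below is purely illustrative, with invented thresholds and period, and real enforcement lives in shapers, policers, or router configuration rather than in a script.

    from collections import defaultdict

    # Hypothetical residential bandwidth policy: a per-period usage allotment, with
    # repeat offenders dropped into a rate-limited class for the rest of the semester.
    ALLOTMENT_BYTES = 5 * 10**9     # example allotment per measurement period
    MAX_VIOLATIONS = 5              # example strike count before rate limiting

    usage = defaultdict(int)        # host -> bytes observed this period (e.g. from Netflow)
    violations = defaultdict(int)   # host -> strikes accumulated this semester
    rate_limited = set()            # hosts currently in the limited class

    def record_traffic(host: str, nbytes: int) -> None:
        """Accumulate observed traffic for a host within the current period."""
        usage[host] += nbytes

    def close_period() -> None:
        """At the end of each period, hand out strikes and apply the limited class."""
        for host, nbytes in usage.items():
            if nbytes > ALLOTMENT_BYTES:
                violations[host] += 1
                if violations[host] >= MAX_VIOLATIONS:
                    rate_limited.add(host)   # in practice: push a policer or shaper rule
        usage.clear()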
*Firewalls*

Firewalls have had a long and embattled history within higher ed, but have finally reached a point of limited deployment for specific needs. Regardless of how the central IT organization at a university may feel, specific departments and entities on campus are likely to deploy, or to opt out of, firewall services entirely. A flexible stance is the only way to deal with this decentralized deployment of a device that impacts network architecture and visibility so severely.

Mark disliked inline packet disruption as a general principle, but has moved anyway, due to internal needs, to try to support some form of departmental firewalling. This would be a simple customizable filter placed on the first-hop router; if more extensive or stateful firewalling were necessary, they would be willing to accommodate a box. "If there's a bunch of different ports, we'll argue long and hard about that." This turned out to be the norm for most schools. Lea Roberts from Stanford observed that TCAM memory on Cisco 6500s tends to be limited, and port-based filtering is the most likely thing to cause it to overload.

An idea from Mark for the management of this sort of system "filled [Cliff] with horror." He proposed a web-based protected interface to allow for delegated management of ACLs, but Cliff feels that ACLs don't scale: too many ACLs on a router will cause it to go "belly-up, and when it will die can't be quantified." There was a shared fear among all present that users would create extremely complicated ACL-based rulesets that simply wouldn't be supportable. Berkeley instead created a service based on FWSM and Netscreen. For $600 installation and $300/yr per subnet, using a hidden VLAN, central Berkeley IT will install and manage this sort of firewall for a department. This hidden VLAN allows network engineers to manage all the switches in the department. The department does have the option to manage the FWSM software themselves, but this option is rarely taken. There are some tricks to extend this capacity, such as making every statement a permit followed by a deny.

Almost everyone present used FWSM for firewalling to the extent it was done at their institution. The major problem commonly reported was that, although layer 3 failover worked well, layer 2 failover didn't always work. The general preference of the group was for separation of the administration of FWSM from IOS, to enable firewall functionality to be controlled independently. Some network protocols such as IPSec/VPN are generally impossible to block broadly for a campus due to the number of legitimate uses of the technology. This has already caused significant problems, and users frequently take advantage of these holes to tunnel through the firewall. Firewalls generally have to be deployed close to the edge so as not to adversely impact other legitimate campus applications such as VoIP. There are several tools out there that attempt either to test a set of policies against a ruleset, or to generate a ruleset to match a chosen set of policies, but tools that handle the first scenario in particular are lacking. The management tool for FWSM is "a glorified command line" and can lead to serious expansion of the configuration file. Tom mentioned Netscreen Standalone Manager as being quite good, and Lea feels the new Netscreen 2000 "is really slick."

*Minimum Device Security Standards*

Most schools have found it necessary to explicitly require a certain level of security and updating for connected boxes as an additional measure to limit exploits. A current OS, regular updates, some form of virus protection, and a firewall on the box are part of the policy, which also contains a large number of exceptions. Berkeley's policy, available at http://security.berkeley.edu/MinStds/, is the most widely known. It certainly doesn't assure compliance, but it's been sufficient to force upgrades of some bitterly entrenched Mac OS 7 systems. Indiana distributes a CD to all campus users that forces their machine into auto-update mode.
Terry suggested giving all students a CD with Knoppix on it. Other schools have imposed restrictions directly on the network, such as not allowing students to bring up servers visible to the 'net at large, although they can be used intra-campus. The security and network teams at Berkeley conduct regular scans to identify machines that are not in compliance with the policy. If these boxes are not brought into compliance after a given number of warnings, they are disconnected.

*5. Action Items*

While most of the session was spent talking about mutual concerns and issues, two distinct deliverables resulted from the discussion, and strong desires were expressed for a development process to be initiated. Ken took ownership of both follow-ups.

The group found it appalling that no good GUIs had been developed for management of VLANs, a problem compounded in systems utilizing components from multiple vendors, which is a fairly common situation. Nobody knew of any vendor or project working in this space. Development of a broad toolkit to allow for configuration, visualization, and monitoring of these networks seemed relatively straightforward and of tremendous value to the entire community. Support for some form of delegated management of this infrastructure could follow later.

*6. Participants*

Ville Aikas, University of Washington
Alan Crosswell, Columbia University
Chris Chin, University of California - Berkeley
Mike Contino, Penn State University
Steve Corbato, Internet2
Rich Cropp, Penn State University
Jeremy Dahl, Pacific Northwest National Laboratory
Matt Davy, Indiana University
David Farmer, University of Minnesota
Cliff Frost, University of California - Berkeley
Terry Gray, University of Washington
Peter Gutierrez, University of Massachusetts Amherst
Marilyn Hay, University of British Columbia
Roy Hockett, University of Michigan
Shumon Huque, University of Pennsylvania
Deke Kassabian, University of Pennsylvania
Ken Klingenstein, Internet2
Nate Klingenstein, Internet2
John Kristoff, Northwestern University
Mike LaHaye, Internet2
Kevin Miller, Duke University
Chris Misra, University of Massachusetts Amherst
Andy Palms, University of Michigan
James Pepin, University of Southern California
Mark Poepping, Carnegie Mellon University
David Richardson, University of Washington
Lea Roberts, Stanford University
Mike Sawyer, University of California - Berkeley
Jeffrey Schiller, MIT
David Sinn, University of Washington
John Streck, University of North Carolina at Chapel Hill
Greg Travis, Indiana University
Tom Zeller, Indiana University