Topics in Advanced Network Management: Results from a workshop of R&E network architects
=========================================================================================

_November 2005_

*1. Introduction*

Network architecture, security, and management in higher education have lacked a collective forum to convene individuals from multiple institutions and varying strata of the managerial structure. The creation of a discussion body that meets occasionally to cover selected themes in institutional networking begins to fill this need. A special session was held following the Joint Techs Workshop in Vancouver as a stepping stone towards the establishment of such a body.

Over the past twenty years, the evolution of physical networks on campuses and of the number and types of devices and services using them has left significant integration and legacy issues. The workshop revealed themes in this progression that tended to be common to many schools, laying the foundation for future conversation. The following document represents some tentative conclusions and disagreements worked through during that workshop.

The workshop was structured into three sessions:

· physical convergence and logical networking: physical convergence of media/protocol and logical network issues, including the interplay of the two forces. Topics include architectural considerations, security, economic and policy issues, and IPv6 and addressing dimensions.

· network authentication: Topics include wireless authentication, including roaming between institutions, and wired. An interesting case study from Indiana is also discussed.

· packet disruption: middleboxes that affect end-to-end packet flow, often to effect policy. Topics include basic issues, sidecar selection mechanisms, intrusion detection approaches, load balancing, minimum security standards, and firewalls.

There are clear dependencies amongst these topics, and during the course of the workshop earlier points were revisited. For the purposes of this narrative, the discussions have been merged into a somewhat logical flow. In general, information is presented without attribution or institutional association. Individual names or schools are attached to comments that were particularly astute or particularly "distinctive". This document concludes with some action items that have not yet been acted on.

*2. Physical Convergence and Logical Networking*

*Basics*

Two independent and powerful forces shape much of enterprise network design today. The end goal of logical (or virtual) networking is to create closed communication systems over an arbitrary physical network that touch multiple, selected hosts in disparate locations, to provision particular isolated services with reliability and scalability. Physical convergence works at a different level, seeking to support an arbitrary set of diverse protocols, service types, and needs across a single physical layer. Voice, video, and data are the services pressing this convergence to date, with control systems such as alarms, door locks, and others looming on the horizon. At times, services have come online on common building infrastructure entirely unannounced -- one participant described an ambush deployment of security cameras on the network. These services typically come with differing expectations for uptime and reliability, and place new demands on a converged infrastructure.
Some combination of the two architectural themes of logical networking and physical convergence occurs in many deployment scenarios. Decisions have to be made based on the specific needs of a given service and the networks available. Policy and legal ramifications inevitably enter the picture, along with classic difficulties in acquiring full co-operation from other campus entities.

*Architectural Considerations*

On converged networks, delivering the accustomed look and feel to applications used to dedicated networks or to protocols other than IP may be easier with advanced routing technologies such as MPLS. The "fish problem", frequently encountered at gigapops, wherein the shortest path is not necessarily the best path, can be alleviated to some degree with the deployment of MPLS. Multi-exit routing at the campus level can also be supported more effectively.

Terry saw two major strategies emerging from the discussion: multiple VLANs projected over one single routed network, or a set of services isolated by VLANs within the layer 2 building infrastructure but trunked from building to building at layer 2.5 or 3, essentially tunnelling together VLANs that provide equivalent connectivity semantics without trying to extend broadcast domains across routed interfaces. In the end, this is an engineering task: there are no right answers, but there are sets of answers that are consistent with each other. Local conditions have a significant impact on the overall architecture. Redundancy also generally is easier the higher in the protocol stack it is provided.

Another primary concern is manageability, but there was little consensus about which sort of network structure lent itself most readily to manageability, beyond ensuring an infrastructure as homogeneous as possible. Ensuring that people connecting devices to the network(s) use the proper ports is an ongoing struggle. A common need of workshop attendees was better tools for managing information about subnets and VLANs at the wall-jack level.

Legal issues also arise rapidly surrounding the necessary HIPAA and FERPA protection of information. The requirements for network architecture to support this legislation are not fully understood, and legal interpretations can vary widely from campus to campus. Some legal departments have decided entirely separate physical networks are required for compliance.

Two cases in point, Duke and Berkeley, were described in greater detail. Duke's end goal is to build a single, strong physical infrastructure running networks that touch multiple, selected hosts in disparate locations, using firewalling and VLANs as a logical overlay. Someone mentioned that, in their instance, this approach had led to maintaining over 250 firewalls running over this unified physical backplane. Berkeley makes use of both VLANs and separate physical infrastructure as needed. Cliff made the point that this generally leads to the simplest possible implementation, and troubleshooting is an easier process with multiple physical networks. However, as Terry noted, this could limit the ability to utilize spanning trees and other redundancy approaches (not that Terry has ever advocated spanning tree for anything).

Approaches to physical systems such as networked digital door locks varied amongst schools: in situations of power or network failure, some default to open (backed by security patrols), and others default to locked.
Failsafes will need to be devised for physical and other more critical systems using campus network infrastructure; such systems highlight the need for availability.

The growing use of externally provided networks on campus is an interesting twist; most pressing today is the widespread availability of cellular services and the extension of these services to data. The general consensus was that not owning a network (e.g. cellular) was not an issue in integrating with campus management. There's a need to address these externally provided networks anyway, given the desire to integrate with them to some extent. Some examples cited are roaming from WiFi networking to cellular networking, or number portability between cellular, campus, and BlackBerry voice services. The growing set of service requirements outside institutional control limits the ability to make classic network choices, and Mark thought running IP services over cellular to be an unwise idea.

*Security*

One of the most important aspects of the convergence discussion was the extent to which sufficient security can be provided on networks serving such different devices and needs. Isolation of traffic may be necessary at layers 1, 2, 2.5, and 3 for various application needs. There are many secure protocols and deployment techniques, such as IPSec, that have been developed to deal with attacks such as MITM, spoofing, and wiretapping, but there are some attacks for which there is no real defense. Without better approaches to security, convergence may become intractable. One participant described a large struggle with facilities, who had decided to spend $100,000 pulling fiber for new security cameras right next to existing available light paths, because the digital cameras on the main network had already been hacked.

Firewalling techniques can insulate these networks to some degree from the external world, limiting this exposure. There was a suggestion that IPv6 could add some limited additional security, or that 10.x.x.x networks and NAT could be used as an alternative. On the other hand, firewalls can also serve to defeat some of the benefits of advanced routing, limiting multipathing capabilities, port-agile applications, etc. When computers are on the same physical-layer network, it is virtually impossible to insulate a distributed system on it from DDoS attacks. Protecting nodes from these attacks using logical networking alone is essentially impossible. There is also always the danger of poorly implemented software containing vulnerabilities that would be much more easily exploited on a common physical TCP/IP-based network.

*Economics*

There was a division of opinion over the best way to partition these services, particularly when limited funding is available. Some in attendance felt that throwing the entirety of the budget at a single network was the wisest policy: the more money available, the more reliable, powerful, and scalable it can be made. Others were concerned that many of the security and uptime limitations of fiber-optic IP networks are innate and favored building separate networks; Cliff suggested that even 5 or 6 physical networks could be built affordably, and that these networks could be built with fewer components while maintaining reliability. In many instances there is already legacy copper wiring in the buildings that will eventually be useful for networked systems serving critical functions that require little bandwidth.
Using redeployed copper to build physical networks for systems such as alarm and door systems or energy management, which have limited requirements but require insulation from security threats, is one example of making use of these facilities.

As mentioned above, common understanding of what degree of network-layer insulation is required by FERPA, HIPAA, and other relevant legislation, and a consistent vision of the true needs of standard application types, is lacking. A general set of guidelines about what should be provided for any given deployment would benefit the process of convergence and general network design. Care must be taken to ensure that any "HIPAA network certification" doesn't result in overly onerous processes; one campus reported that the credit card industry checklist burned up three quarters of network staff time for an entire month and a half.

*IPv6 and Address Space*

The continued slow pace of deployment of IPv6 is an ongoing concern to some in the community (but not others, who are in no hurry). One of the most pressing reasons for its development, the lack of address space in v4, has been greatly mitigated by the advent of NATs and private networks, which limit the need for publicly accessible addresses. The other benefits of IPv6 have not proven compelling thus far in most instances. One school purchased a /32 address space "because [they] thought it was cool" (but noted they had also bought namespace related to X.500). The purchase came with the stipulation that the namespace be actively used or it would be returned to the granting company, presumably to limit speculative activity. Only a small fraction of the machines at their site have IPv6 addresses at this point. Stanford mentioned that they broke IPv6 support during a backbone upgrade and were surprised to receive a complaint from someone on campus. There have been isolated requests from other quarters for IPv6, such as someone running INN, but no serious motivation. The only request the University of British Columbia had ever received was for hosting this very (Joint Techs) conference. Even the people present considered IPv6 a playground that isn't well understood.

There was great concern that insufficient attention had been given to re-engineering the security of IP when IPv6 was being developed. There are many security considerations that differ from IPv4 and are less well known, and existing security constructs such as IP-address-based ACLs will break. Deployment of separate first-hop gateways for IPv6 was suggested as an intermediate step to ease the pain on routers. One school with dual IPv4 and IPv6 support at the backbone had a router that melted regularly due to IPv6 traffic, which for some reason seems to carry much greater port-scanning activity or processing load.

The alternative -- and by far more prevalent -- solution to address space needs is the use of NAT technology. NATs have well-known security and accessibility implications, but especially with the decentralized nature of network roll-out on campuses, they tend to be a factor to consider regardless of the rest of the architecture. One difference between NAT and a stateful firewall is that NAT boxes block incoming connections by default, although stateful firewalls can be configured to do the same. Another difference is that one-to-many NAT devices can preserve public address space by allowing many hosts with private RFC-1918 addresses to share a single public address for external connections.
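To make the one-to-many translation concrete, the following is a minimal, illustrative Python sketch of the kind of translation table such a NAT box maintains; it is not any particular vendor's implementation, and the addresses and port numbers are invented for the example.

    import ipaddress
    import itertools

    # Many RFC-1918 hosts share one public address, distinguished by translated source port.
    PUBLIC_IP = ipaddress.ip_address("192.0.2.1")      # example public address
    PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")   # RFC-1918 space used inside

    _ports = itertools.count(20000)   # pool of translated source ports
    _out = {}                         # (private_ip, private_port) -> public_port
    _back = {}                        # public_port -> (private_ip, private_port)

    def outbound(private_ip, private_port):
        """Translate an outbound connection, allocating a public port on first use."""
        key = (ipaddress.ip_address(private_ip), private_port)
        assert key[0] in PRIVATE_NET, "only private hosts are translated"
        if key not in _out:
            public_port = next(_ports)
            _out[key] = public_port
            _back[public_port] = key
        return PUBLIC_IP, _out[key]

    def inbound(public_port):
        """Unsolicited inbound traffic with no existing mapping is simply dropped."""
        return _back.get(public_port)   # None means no state: block the packet

    print(outbound("10.1.2.3", 51515))  # -> (192.0.2.1, 20000)
    print(outbound("10.9.8.7", 51515))  # -> (192.0.2.1, 20001): same public IP, new port
    print(inbound(20001))               # -> mapped back to host 10.9.8.7
    print(inbound(33333))               # -> None: blocked, the default-deny behavior noted above

The default-deny handling of unmapped inbound ports is exactly the property the group noted NAT shares with a stateful firewall.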
The University of Washington routes some private address space on campus, and provides NAT-mediated access to the Internet from those private addresses. Departments can "opt in" so that DHCP returns private addresses, and static public addresses can be mixed on the same subnet. Some units have said this is the best networking service the central IT organization has ever provided.

One of the most likely use cases identified in the near term for the deployment of IPv6 is VoIP, where there is a serious need both for the expanded address space to support inbound calls and for enhanced mobility. However, a lack of support from the vendors providing end devices has been another serious barrier to development of IPv6 use in this and other applications.

*3. Network Authentication*

Network authentication can apply in several key situations: authenticating a user for network access; authenticating a device for network access; and authenticating the state of a device. These issues are often coupled, and can include authorization issues as well as authentication. This session at the workshop dealt primarily with wireless network access issues, including authentication, access point placement, VoIP, and advanced wireless technologies. The session concluded with a case study from Indiana.

*Wireless and Authentication*

The wireless space continues to see rapid development of protocols and standards to support the unique needs of that environment. At the same time, cohesive, secure production deployments are critical to campus infrastructures. The challenges posed by this confluence were discussed at length by the group.

Authentication in wireless is one of the least consistent pieces, with multiple encryption schemes and standards sometimes performing this duty at the link layer. More flexible solutions such as 802.1x and web-based sign-on, living higher in the protocol stack, may work alone or in concert with the WPAs and WEPs of the world. The most common approaches identified are MAC-based authentication, middle-box authentication (typically via a captive web portal with an authentication back-end module), 802.1x, and the authentication-less alternative, "free love." Greg observed at the end of the discussion that everything said was also applicable to wired networks, but the omnipresent nature of wireless makes it a greater concern.

Only Penn State among those present had a fully deployed and functional 802.1x authentication system, using Cisco APs; many older APs do not support AES or 802.1x at all and will need to be replaced. This is coupled with WPA-TKIP encryption and backed by a FreeRADIUS server. The Penn State wireless deployment is moving to use EAP-TTLS/PAP to authenticate against the university-wide krb5 realm; this is currently deployed in two buildings at the University Park campus and will be deployed in future wireless installations. Support is built into Mac OS X systems, but Windows users have had to install the open-source SecureW2 client, while Linux support varies by distribution, card, and patches. Pocket PC clients can use SecureW2 as well, but there is currently no access mechanism for Palms. There has been no sharp cutover, due to the end-user transition challenges. However, the user experience has been more positive than with the standard captive portal approach, and the encryption is stronger. There is unfortunately no EAP method that talks Kerberos at this point that anyone was aware of, nor any active projects to create one.
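As a rough sketch of the decision a RADIUS back end makes in an EAP-TTLS/PAP deployment like the one described above (this is illustrative only, not Penn State's implementation): the outer TLS tunnel terminates at the RADIUS server, the inner PAP credentials are verified against the Kerberos realm by some site-specific helper, and success produces an Access-Accept, here with assumed RFC 3580-style VLAN attributes. The realm, VLAN, and helper function are placeholders.

    KRB5_REALM = "EXAMPLE.EDU"   # placeholder realm
    WIRELESS_VLAN = "310"        # placeholder VLAN assigned on success

    def verify_krb5_password(principal: str, password: str) -> bool:
        """Placeholder: in practice this would wrap a Kerberos library or a kinit call."""
        raise NotImplementedError

    def handle_inner_pap(username: str, password: str) -> dict:
        """Decide what the RADIUS server should answer for the inner PAP credentials."""
        principal = f"{username}@{KRB5_REALM}"
        if not verify_krb5_password(principal, password):
            return {"code": "Access-Reject"}
        return {
            "code": "Access-Accept",
            # Dynamic VLAN assignment attributes, if the access points honor them.
            "Tunnel-Type": "VLAN",
            "Tunnel-Medium-Type": "IEEE-802",
            "Tunnel-Private-Group-Id": WIRELESS_VLAN,
        }

The point of the two-phase design is that only the RADIUS server ever sees the inner credentials; the access point merely relays EAP inside the encrypted tunnel.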
The group concurred that a Kerberos-speaking EAP method would be a long-term and challenging project, since the protocol itself would likely have to be extended first, with implementation following. Other schools are actively examining 802.1x, with the University of Michigan and Carnegie Mellon both planning on using some form of certificates issued to individual computers. This authentication supports the issuance of access to devices, while the authentication of members and guests as users for services is handled separately. The University of British Columbia has test deployments in place as well.

When no authentication is performed, which isn't entirely uncommon, there are still concerns about information leakage, since hosts on the open wireless network can sniff each other's traffic. Encryption without authentication is routinely used in these circumstances. "Lots of us have fat ACLs for a flat wireless space." This problem has resulted from a variety of limitations, including broadcast storm issues and constructs such as local wireless subnets introduced to address limitations in old hardware. Participants were anxious to have products that support layer 3 tunneling ("mobile IP"), which along with more capable devices would collectively allow for functional mobility while maintaining a routed network. One of the biggest challenges to providing roaming service and mobile IP is the handoff of users from one access point to another, which must be done in under 50 milliseconds to support glitch-free VoIP calls. Debugging the automatic tunnels everywhere would be virtually impossible. Access point vendors have finally hired expertise from the cellular community to try to provide this functionality soon.

*VoIP over Wi-Fi*

VoIP over Wi-Fi is another voice application which may be of interest to campus users. It might only be possible to do this in a scalable fashion over 802.11a, not b/g. The University of British Columbia has handed out some officially unsupported Wi-Fi phones as an experiment in functionality, and other schools are trying to use these phones over 802.11b networks. RIM recently came out with an 802.11b BlackBerry device which is SIP VoIP capable. There are many issues yet to be resolved for voice services over wireless, including QoS, prioritization, and fair/unfair channel access. UC Berkeley's representatives said that VoIP over Wi-Fi should be forgotten about entirely: "You want voice, use a cell phone." 3G devices in particular represent the evolution of the cellular phone into a more general mobile communication device. Some participants cited the greater service and coverage that could be provided with cellular networks. There was also discussion of hybrid approaches, where cell phones would cut over to VoIP over WiFi when near a WiFi access point. The advantage for users might be better coverage in basements or areas too far from a cell tower. On the other hand, carriers would seek to recoup costs by charging roaming fees to users. Whether there's even any influence campuses can exert in this marketplace is subject to debate.

*Access Point Placement*

Providing effective wireless service starts with a well-planned physical footprint. Adding closets and network drops at appropriate intervals is an easy and relatively cheap thing to do when a building is first being designed and construction hasn't begun. The participants agreed strongly on being aggressive in engaging the builders to make sure that proper plans are in place before ground is broken.
Any changes during or after the building process are prohibitively expensive. The issues in placing an access point are threefold: it must be in a location where existing network access can be supplied; it must have electricity available (either via a conventional 110VAC outlet or via DC power injection over the Ethernet cable); and it must be in a sufficiently secured location that none of the pieces involved can be tampered with. Beyond this, nobody present had any standard for structuring wireless in buildings. Strategies ranged from doing whatever was necessary to mandating the availability of an XY grid within which individual points would be activated as necessary to ensure consistent coverage. Very few people went to any great length to retrofit existing buildings intelligently for the most elegant access point placement. The costs tend to be so prohibitive that adding a handful more access points is far cheaper than trying to rewire the building to provide greater coverage.

*Advanced Deployments*

A few advanced wireless deployment techniques were discussed as well. Phased-array antennae were amongst them, which everyone considered very cool -- and very expensive -- technology. The price/performance ratio is nowhere near where it needs to be for production deployments. Mesh networks, which have typically been used in outdoor deployments, may very well apply to indoor networking as well, particularly in situations where the wired infrastructure has significant limitations. These networks utilize 802.11a as a backbone and 802.11g for distribution. The biggest advantage is that existing cabling may be sufficient, because the wireless network itself is used to expand the range where access points may be placed. As long as secured locations with power are available this will work, but in many cases electricity drops cost even more than network drops. Closets with electricity and physical security are, as always, important.

One requirement unique to the campus environment is the accommodation of professors who want to disable the wireless network in a region during exams and other specific times. Strategies have ranged from wireless jammers to putting access points on light switches. There are even concerns about students within a single lecture hall forming an ad-hoc wireless network to conspire during the test, as has been happening in other countries for some years via cell phone text-messaging. (Cell phone connectivity is the primary reason attempts to block, jam, or disable WiFi are doomed to fail.)

*Wired Network Authentication and Authorization*

In contrast to the situation with wireless network authentication, most of the representatives at the conference were still using "free love on the wired side." While this may provide a clean slate for deployment scenarios, many campuses were also unclear what sort of end state they even wanted. Assigning appropriate VLANs to systems seemed like the one common thread, and one also common to the other discussions: how does the right end machine receive the right service?

Some schools used authorization in addition to authentication at decision points when deciding which services to grant a device. Penn State performs LDAP queries against the user's directory entry when determining whether they have permission to use the wireless network. Others use similar checks for VPN, IP address assignment, or to verify that accounts are still active. Other schools continue to see all this as somewhat superfluous.
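As a concrete illustration of the directory-based authorization check described above, the following is a minimal sketch using the ldap3 Python library; the server name, base DN, and the entitlement value used to grant wireless access are invented for the example and would differ at any real site.

    from ldap3 import Server, Connection

    def may_use_wireless(uid: str) -> bool:
        """Look up the user's directory entry and check for a wireless entitlement."""
        server = Server("ldap.example.edu", use_ssl=True)
        conn = Connection(server, auto_bind=True)   # anonymous bind, for the example only
        conn.search(
            "ou=people,dc=example,dc=edu",
            f"(uid={uid})",
            attributes=["eduPersonEntitlement"],
        )
        if not conn.entries:
            return False
        values = conn.entries[0]["eduPersonEntitlement"].values
        return "urn:example:wireless" in values

The same pattern extends to the other checks mentioned: VPN access, IP address assignment, or simply verifying that an account is still active.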
The alternative philosophy is that the network should be assumed insecure all the way up to the NIC. The world has witnessed a large number of scenarios in which a compromised host was located behind the firewall, where most boxes were soft because of a reliance on the firewall's insulation for protection. One medical school even experienced a DoS attack in which a compromised server farm beat on the firewall itself from the inside, taking down all network access for their users. Network authorization could be viewed in a similar "pointless" light: it's more important to identify, track down, and quarantine machines behaving badly than to attempt to prevent illicit access entirely. Virtually every misbehaving box is a legitimate member of the community anyway, so forcing a network log-in or authorization based on user identity arguably does nothing to improve network security. Columbia uses Netflow and a large amount of other intrusion detection technology to identify hosts that are compromised or misbehaving. There is still a simple filter in their router to deal with some inherent Windows problems.

*The Indiana Experience*

Indiana's approach exemplifies the philosophy that the network is inherently evil, good boxes can go bad, and the ability to respond is as critical as the ability to prevent. The basic question boils down to how bad apples are identified and dealt with when everyone is assumed equal initially. Most requests for termination of service come from either internal monitoring systems or external bodies such as the RIAA or ISPs. They have successfully responded to every subpoena, all of which thus far have concerned file sharing by a student.

Virtually everyone must use DHCP, although a handful of static addresses are accepted. Machines granted these addresses can be captured as well if necessary. There is no requirement that any device be registered, because they can track an IP address or connection down to an individual jack, riser, floor, room, etc. DNS logging is used, and MAC and IP addresses are quickly recorded. A variety of actionable options become available once the offending machine and its physical-layer location are identified.

Once a host has been identified and quarantined, the individual's web requests are all routed to a customized help page. This page displays different information based on the reason (bandwidth allocation, malware, RIAA, drone, etc.) the individual machine was captured. The participants were taken through a quick tour of currently quarantined machines, most of which had badly exceeded extremely generous bandwidth limits, generally located in residence halls and nursing stations, along with an employee in the libraries. There have been a few instances where an individual who should not have had any network access at all did receive it, but in general this approach has been extremely successful and has been a very convenient system for honest campus users.
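A minimal sketch of the captive help-page mechanism described above follows. It is illustrative only (Indiana's actual system was not presented in code), and the addresses, reasons, and page text are invented; the idea is simply that once a host is quarantined, every web request from it is answered with a page explaining why.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical quarantine table: offending IP -> reason it was captured.
    QUARANTINED = {
        "10.20.30.40": "bandwidth",   # exceeded a residential bandwidth allocation
        "10.20.31.99": "malware",     # identified as a worm-infected drone
    }

    HELP_TEXT = {
        "bandwidth": "Your connection exceeded its bandwidth allocation.",
        "malware": "Your machine appears to be compromised. Please contact the help desk.",
    }

    class QuarantineHandler(BaseHTTPRequestHandler):
        """Answer every web request from a quarantined host with a reason-specific page."""

        def do_GET(self):
            reason = QUARANTINED.get(self.client_address[0])
            body = HELP_TEXT.get(reason, "This network address has been quarantined.")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

    if __name__ == "__main__":
        # In a real deployment, routing or DNS steers the quarantined host here in the first place.
        HTTPServer(("0.0.0.0", 8080), QuarantineHandler).serve_forever()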
*4. Packet Disruption Devices*

The afternoon session on the second day went deep into the ecosystem of packet disruption and shaping devices, including firewalls, load balancers, and intrusion prevention systems (IPSs). Deployments and philosophy amongst campuses showed more variability than was seen in other parts of the discussion. There are many deployment concerns common to all packet disruption devices, regardless of the purpose of the device. Mitigation techniques exist to address some of the problems for some devices, but in general most of these challenges will have to be the subject of future research to enhance the devices and protocols used to perform these functions, or to reduce the need for them entirely. Three concerns were dominant.

First, these devices may limit network availability, whether through outright device failure or simply through the volume of traffic these devices can handle relative to what the underlying network would otherwise be able to carry. This ratio becomes worse for systems designed to provide protection at higher levels of the protocol stack, such as intrusion prevention systems (IPSs), due to the increased processing implied. Deployments must take these limitations into account.

There are also limitations imposed on the set of architectures that can be used in conjunction with the devices, since these boxes generally operate as single points on the network. Virtual and physical networks in particular may be used to route traffic selectively around or through these devices. If the system is deployed without full awareness of the infrastructure into which it's placed, security vulnerabilities may arise. On the other hand, it is also necessary to structure the deployments such that a set of hosts on the internal network may be excluded from these devices. Only two schools present even had complete control of the network from backbone uplink down to the wall jack throughout their entire network.

Lastly, and perhaps hardest of the problems, is the loss of end-to-end transparency and diagnostic ability. Firewalls are the most notorious example, but other packet disruption devices can give misleading or imperfect information about the state of the network itself. (E.g. one copyright music detection appliance scans network traffic for music signatures; when it finds such a stream, the appliance emits a TCP reset to sever the connection, confusing the user and the diagnostician alike.) If some form of packet disruption is responsible for a service failure, very little beyond thorough knowledge of the fingerprints left by these devices, the entire network structure, and intuition can be any guide to diagnosis.

Nevertheless, these systems are important tools, and the group spent the afternoon of the second day discussing in detail the wide variety of packet disruption devices and their deployment in higher ed.

*Sidecar Selection Mechanisms*

The critical "sidecar" capability, providing a route around the device for special hosts or ranges, is generally the biggest variable in deployment. One option is to use source-based routing. On Cisco 6500s, source-based routing can't be done inside a VRF, but Juniper makes boxes capable of doing this. (Everyone present was running exclusively either Cisco or Juniper routers, with the exception of a couple of scattered Foundry boxes being phased out.) Uplinks from individual edge routers within the campus can be routed to a VLAN on a 6500 to allow for multiple devices in the data center on the routed segment. This model also allows IPv6 to be run on a second box in parallel with no need for separate fiber to the building. The interesting part of this model is the division of traffic in the most intelligent fashion feasible. While the selection mechanism is implemented using static ACLs presently, there is an unusual proposal on the table at the IETF to use policy-based routing and other functionality to make these decisions not only dynamic, but signaled.
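The static-ACL selection just described amounts to a table of source prefixes whose traffic should bypass the disruption device. The Python sketch below shows that decision in the abstract; the prefixes and path names are invented, and a real deployment expresses this in router ACLs or policy-based routing rather than in code.

    import ipaddress

    # Hypothetical sidecar table: source prefixes that bypass the inline device,
    # e.g. hosts doing very large science transfers that would crush an IPS.
    BYPASS_PREFIXES = [
        ipaddress.ip_network("192.0.2.0/27"),
        ipaddress.ip_network("198.51.100.128/25"),
    ]

    def select_path(src_ip: str) -> str:
        """Choose a forwarding path based purely on the packet's source address."""
        src = ipaddress.ip_address(src_ip)
        if any(src in prefix for prefix in BYPASS_PREFIXES):
            return "sidecar-path"     # route around the IPS/firewall
        return "inspected-path"       # default path through the device

    print(select_path("192.0.2.10"))   # sidecar-path
    print(select_path("203.0.113.5"))  # inspected-path

The FlowSpec discussion that follows is essentially about distributing and updating such a table dynamically via BGP rather than maintaining it as static configuration.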
One such proposal, a FlowSpec draft describing ways to have BGP carry richer information that could form a basis for this capability, has already received a support commitment from Juniper. Cisco finds this interesting but incomplete; a more thorough definition of the verbs is needed before a final protocol specification could be produced and coded to. One of the weaknesses of currently implemented policy-based routing is the lack of liveness detection and automatic failover: if a path is down, there is no alternative routing provisioned. Kevin speculated further about inserting policy modules into boxes to process packets according to various rules, up to layer 4 selection, before sending the traffic back to the main routing engine.

Cliff was curious what could be done with source-based routing that couldn't be done with standard VLAN deployment; UC Berkeley uses VLANs as a way to handle opt-out of firewalls. David Sinn of the University of Washington replied that their LAN hardware was such that they could not pervasively deploy VLANs. Further, in a hypothetical situation with an IPS deployed near the core while one department consistently performed very large transfers that would crush it, how could the network be architected so that some traffic would avoid the IPS in a flexible way, without having drunk the MPLS kool-aid? Terry Gray suggested a hybrid L2/L3 approach, with the capabilities of VLANs used where available to offer multiple classes of connectivity.

*Intrusion Detection/Prevention*

David Sinn gave a presentation on TippingPoint use at the University of Washington, where there are three IPS device channels used inline with router egress to the border. Several other brands have been considered, including McAfee, but TippingPoint is the market leader. If these boxes fail, there is a backup path avoiding the array entirely. They would prefer to have the TippingPoints sit as sidecars off the border router instead. Cliff was curious whether these could be connected in monitor mode, but to fulfill their prevention mission (rather than just detection) they must be somewhere in the path of traffic to be able to block the flow of malicious traffic once it is detected. However, it is possible to have an IDS insert blocking ACLs into a border router, as the LBL "Bro" system does. To avoid the IPS systems entirely, machines must be connected specifically through a special router.

The installation process was somewhat painful due to incorrect information and bad releases of software, followed by many upgrade and tuning cycles, resulting in a fairly stable system. There is currently a fear of a Slammer attack channeled into the deep inspection path, and one filter had to be disabled because it mislabeled AIM users as contaminated with Sasser. The University is working with TippingPoint to improve the handling of certain packets and overall performance. It's impossible to select policies based on user group, and the University is still trying to understand how best to extract reports and notify appropriately.

IDS systems were generally deployed using Snort on an optical tap or via span ports. These taps are also used for other security and research purposes, both potentially contentious and difficult matters. (In the case of security, the issues center around the value and pitfalls of log files in legal contexts; using the data for research raises issues around anonymization while preserving the research value of the data.)
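On the anonymization point, a common compromise is to truncate or pseudonymize addresses before captured flow data leaves the security group. The sketch below shows both ideas and is purely illustrative; no particular tool is implied, and the key is obviously a placeholder.

    import hashlib
    import hmac
    import ipaddress

    SECRET_KEY = b"replace-me"   # placeholder; real deployments manage this key carefully

    def truncate_host(addr: str, keep_prefix: int = 24) -> str:
        """Zero the host bits, keeping only the enclosing /24 -- crude but simple."""
        net = ipaddress.ip_network(f"{addr}/{keep_prefix}", strict=False)
        return str(net.network_address)

    def keyed_pseudonym(addr: str) -> str:
        """Replace an address with a stable pseudonym so flows can still be correlated."""
        digest = hmac.new(SECRET_KEY, addr.encode(), hashlib.sha256).hexdigest()
        return "anon-" + digest[:12]

    print(truncate_host("203.0.113.77"))    # 203.0.113.0
    print(keyed_pseudonym("203.0.113.77"))  # e.g. "anon-..." (stable for a given key)

The tradeoff the group noted is visible here: truncation destroys per-host research value, while pseudonyms preserve it at the cost of some residual re-identification risk.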
Schools generally use multiple taps placed at various intervals, with the data consolidated to one location where Argus is run to analyze network metrics. It's even capable of assembling bidirectional flows from multiple streams, and full use of this ability often means combining data from the commodity and Abilene uplinks for a complete picture of communications. The sheer volume of traffic flowing back out of this system to multiple endpoints can pose severe challenges. The commercial version is capable of IPv6 and has special educational pricing.

Netflow was also used by virtually everyone present, and there was broad consensus that it is highly desirable to collect every packet rather than a statistical sample -- even though that is very difficult to do at 10 Gbps and above. Several scenarios require complete assurance that this data is available; most notably, when a system is compromised it's critical to determine whether the attacker was a script kiddie, how much data was transferred, and other forensic details, as well as to support real-time diagnosis. Minnesota's new architecture allows them to span both sides of the firewall and most routers for an extremely detailed picture of the network. Other sites simply run tcpdump on the same host that serves Snort and other monitoring functions.

*Internet2-Specific Security*

There was a brief discussion of differential treatment of data flowing to and from the commodity internet and Abilene. UC Berkeley performs IDS only on the commodity connection, not on Abilene traffic, while Minnesota had seen similar problems with both. Multicast floods from Abilene caused significant problems for Indiana. Packeteer traffic shapers are often used in addition to other approaches.

*Load Balancers*

There was tremendous diversity in the approaches and products selected to handle load balancing for major campus applications such as Blackboard, SMTP, LDAP, and the web hosting environment. Duke evaluated the full set of load balancers prominent in the marketplace -- F5, NetScaler, a Cisco 6500 module, etc. -- which were all priced similarly at around $100,000 a pair, with Cisco being slightly pricier. Each of these products had seen deployment among the assembled, and others such as Nortel's Alteon were used as well. Stanford has deployed several Cisco blades directly in various routers and assigned load balancing responsibility directly to the sysadmins.

Microsoft's load balancer was considered pretty poor, and requires its own dedicated switch. Every server in the load-balancing cluster needs to see every inbound packet, resulting in a need to flood traffic to every port. Multicast mode is supported but immediately results in invalid ARP responses, which can be fixed only with static ARP and static CAM entries. Terry expanded on this idea, believing that this sort of layer-violating appliance in general introduces a technical and organizational "impedance mismatch" into network support, because it's never clear whether it should be managed by the server people or the network people. Things that should be application requirements are pushed out to the network itself. His group has used simple DNS rotaries and load-based DNS rotaries with great success and is eagerly awaiting MS SQL Server 2005, which will provide better balancing without the need for external appliances. Network-based balancers can cause a tremendous amount of pain when they fail, especially when the failure mode is subtle and hard to diagnose.
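The "simple DNS rotary" mentioned above is just round-robin answers from the nameserver. The toy sketch below (standard library only, with invented host names) shows the idea, and also hints at why it troubles stateful applications: successive lookups can land on different servers.

    import itertools

    # Hypothetical pool behind one service name; a DNS rotary hands these out in turn.
    POOL = ["web1.example.edu", "web2.example.edu", "web3.example.edu"]
    _rotary = itertools.cycle(POOL)

    def resolve(name: str) -> str:
        """Toy resolver: each lookup of the service name gets the next server in the pool."""
        if name == "www.example.edu":
            return next(_rotary)
        return name

    # Two "page loads" by the same client may hit different back ends, which is exactly
    # why stateful applications such as webmail can lose attachments mid-session.
    print(resolve("www.example.edu"))   # web1.example.edu
    print(resolve("www.example.edu"))   # web2.example.edu

A load-based rotary biases this choice by current server load rather than rotating blindly, but the statefulness problem is the same.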
However, DNS-based load balancing can cause issues for applications designed in specific ways that expect a degree of statefulness. Webmail, for example, has issues with attachments being lost when the client is switched to a different server mid-session. Kevin had pushed for DNS-based load balancing, but applications pushed back for this reason. The worst situation is when L7 switching is being performed by the applications themselves; he would rather be in a position to provision a service that enables reliable back-end server pools instead. The group felt that, to some extent, network architects should do their best to engage application developers to ensure their applications' behavior doesn't cause these approaches to fail, for example by not placing images on one server and web pages on another.

The load balancers seem to be a ready target for blame in many cases, and the approach used by Washington is to provide a method to verify failure before contacting network services. The support model requires that all back-end web servers host a simple static web page, and each of these servers is placed in its own load balancing group that contains no other servers. If this static service is broken, then it's appropriate to contact the network team.

*Packet Shaping & Bandwidth*

There are also myriad strategies to prevent single hosts from consuming excessive network resources. Frequently, the biggest bandwidth drain -- on the order of two-thirds -- comes from the residential networks and dorms. Negotiations and desperate pleas have met with no success, and most schools present had been forced to implement some sort of scheme to forcibly limit the bandwidth available to these networks. This is not always user misbehavior, however, and often results from infected hosts chewing up ghastly amounts of bandwidth as worms attempt to propagate themselves.

Packet shapers were the most common method chosen to selectively limit bandwidth use. Packeteer in particular was widely used to selectively shape traffic flowing from the dorms to prevent P2P from absorbing the entire network, though some schools used Introvert. Simple counts of total data throughput using Netflow, or microflow policers on the 6500s, were also used. Some use this technology for in-depth classification of packets to give a better picture of network utilization, in addition to providing application-specific rate limits. However, this is something of an arms race with the developers of P2P applications and protocols, as these grow stealthier in their network use to better avoid detection and categorization. Deep packet inspection is currently providing a sufficient edge in detection of P2P traffic.

Duke gives students 5 gigabits of bandwidth, and after 5 violations they're dropped into a rate-limited category for the rest of the semester. Cliff and Berkeley have extended this idea to allow the school to charge residential halls as a whole for the bandwidth they consume. These halls have proved reluctant to allow individual students to directly purchase additional bandwidth.
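The bookkeeping behind this kind of allotment-and-violation policy is simple; the sketch below is purely illustrative, with invented thresholds and period, and real enforcement lives in shapers, policers, or router configuration rather than in a script.

    from collections import defaultdict

    # Hypothetical residential bandwidth policy: a per-period usage allotment, with
    # repeat offenders dropped into a rate-limited class for the rest of the semester.
    ALLOTMENT_BYTES = 5 * 10**9     # example allotment per measurement period
    MAX_VIOLATIONS = 5              # example strike count before rate limiting

    usage = defaultdict(int)        # host -> bytes observed this period (e.g. from Netflow)
    violations = defaultdict(int)   # host -> strikes accumulated this semester
    rate_limited = set()            # hosts currently in the limited class

    def record_traffic(host: str, nbytes: int) -> None:
        """Accumulate observed traffic for a host within the current period."""
        usage[host] += nbytes

    def close_period() -> None:
        """At the end of each period, hand out strikes and apply the limited class."""
        for host, nbytes in usage.items():
            if nbytes > ALLOTMENT_BYTES:
                violations[host] += 1
                if violations[host] >= MAX_VIOLATIONS:
                    rate_limited.add(host)   # in practice: push a policer or shaper rule
        usage.clear()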
*Firewalls*

Firewalls have had a long and embattled history within higher ed, but have finally reached a point of limited deployment for specific needs. Regardless of how the central IT organization at a university may feel, specific departments and entities on campus are likely to deploy, or to opt out of, firewall services entirely. A flexible stance is the only way to deal with this decentralized deployment of a device that impacts network architecture and visibility so severely.

Mark disliked inline packet disruption as a general principle, but has moved anyway, due to internal needs, to try to support some form of departmental firewalling. This would be a simple customizable filter placed on the first-hop router; if more extensive or stateful firewalling were necessary, they would be willing to accommodate a box. "If there's a bunch of different ports, we'll argue long and hard about that." This turned out to be the norm for most schools. Lea Roberts from Stanford observed that TCAM memory on Cisco 6500s tends to be limited, and port-based filtering is the most likely thing to cause it to overload.

An idea from Mark for the management of this sort of system "filled [Cliff] with horror." He proposed a web-based protected interface to allow for delegated management of ACLs, but Cliff feels that ACLs don't scale: too many ACLs on a router will cause it to go "belly-up, and when it will die can't be quantified." There was a shared fear among all present that users would create extremely complicated ACL-based rulesets that simply wouldn't be supportable. Berkeley instead created a service based on FWSM and Netscreen. For $600 installation and $300/yr per subnet, using a hidden VLAN, central Berkeley IT will install and manage this sort of firewall for a department. This hidden VLAN allows network engineers to manage all the switches in the department. The department does have the option to manage the FWSM software themselves, but this option is rarely taken. There are some tricks to extend this capacity, such as making every statement a permit followed by a deny.

Almost everyone present used FWSM for firewalling to the extent it was done at their institution. The major problem commonly reported was that, although layer 3 failover worked well, layer 2 failover didn't always work. The general preference of the group was for separation of the administration of FWSM from IOS, to enable firewall functionality to be controlled independently. Some network protocols such as IPSec/VPN are generally impossible to block broadly for a campus due to the number of legitimate uses of the technology. This has already caused significant problems, and users frequently take advantage of these holes to tunnel through the firewall. Firewalls generally have to be deployed close to the edge so as not to adversely impact other legitimate campus applications such as VoIP. There are several tools out there that attempt either to test a set of policies against a ruleset, or to generate a ruleset to match a chosen set of policies, but tools that handle the first scenario in particular are lacking. The management tool for FWSM is "a glorified command line" and can lead to serious expansion of the configuration file. Tom mentioned Netscreen Standalone Manager as being quite good, and Lea feels the new Netscreen 2000 "is really slick."

*Minimum Device Security Standards*

Most schools have found it necessary to explicitly require a certain level of security and updating for connected boxes as an additional measure to limit exploits. A current OS, regular updates, some form of virus protection, and a firewall on the box are part of the policy, which also contains a large number of exceptions. Berkeley's policy, available at http://security.berkeley.edu/MinStds/, is the most widely known. It certainly doesn't assure compliance, but it's been sufficient to force upgrades of some bitterly entrenched Mac OS 7 systems. Indiana distributes a CD to all campus users that forces their machine into auto-update mode.
Terry suggested giving all students a CD with Knoppix on it. Other schools have imposed restrictions directly on the network, such as not allowing students to bring up servers visible to the 'net at large, although they can be used intra-campus. The security and network teams at Berkeley conduct regular scans to identify machines that are not in compliance with the policy. If these boxes are not brought into compliance after a given number of warnings, they are disconnected.

*5. Action Items*

While most of the session was spent talking about mutual concerns and issues, two distinct deliverables resulted from the discussion, and strong desires were expressed for a development process to be initiated. Ken took ownership of both follow-ups.

The group found it appalling that no good GUIs had been developed for management of VLANs, a problem compounded in systems utilizing components from multiple vendors, which is a fairly common situation. Nobody knew of any vendor or project working in this space. Development of a broad toolkit to allow for configuration, visualization, and monitoring of these networks seemed relatively straightforward and of tremendous value to the entire community. Support for some form of delegated management of this infrastructure could follow later.

*6. Participants*

Ville Aikas, University of Washington
Alan Crosswell, Columbia University
Chris Chin, University of California - Berkeley
Mike Contino, Penn State University
Steve Corbato, Internet2
Rich Cropp, Penn State University
Jeremy Dahl, Pacific Northwest National Laboratory
Matt Davy, Indiana University
David Farmer, University of Minnesota
Cliff Frost, University of California - Berkeley
Terry Gray, University of Washington
Peter Gutierrez, University of Massachusetts Amherst
Marilyn Hay, University of British Columbia
Roy Hockett, University of Michigan
Shumon Huque, University of Pennsylvania
Deke Kassabian, University of Pennsylvania
Ken Klingenstein, Internet2
Nate Klingenstein, Internet2
John Kristoff, Northwestern University
Mike LaHaye, Internet2
Kevin Miller, Duke University
Chris Misra, University of Massachusetts Amherst
Andy Palms, University of Michigan
James Pepin, University of Southern California
Mark Poepping, Carnegie Mellon University
David Richardson, University of Washington
Lea Roberts, Stanford University
Mike Sawyer, University of California - Berkeley
Jeffrey Schiller, MIT
David Sinn, University of Washington
John Streck, University of North Carolina at Chapel Hill
Greg Travis, Indiana University
Tom Zeller, Indiana University