Salsa Computer Security Incidents - Internet2 (CSI2)        Phillip Deneault
Working Group                                Worcester Polytechnic Institute

draft-internet2-salsa-csi2-renoir-overview-02.html      Created: 18-Oct-2006
Copyright © 2007 by Internet2 and/or the           Last Updated: 11-Jan-2007
respective authors                              Draft Expires: 11-July-2007
Comments to: salsa-csi2-comments AT internet2 DOT edu

RENOIR: Research and Educational Networking Operational Information Retrieval
=============================================================================

*Overview*
RENOIR is a reporting system to be used for sharing information regarding
security incidents within an inter-institutional trust community - to aid
inter-institutional incident response, notification regarding compromised
systems, analysis for recognition of attack behaviors and trends, and
awareness for protection. RENOIR will handle security data from a variety of
sources - human and machine - and organize that data into individual
high-level cases which can then be used for response, analysis, and
reporting.

The system depends on a trusted third party; for the purposes of this
project, that third party is REN-ISAC (Research and Education Networking
Information Sharing and Analysis Center) [1]. Through REN-ISAC, incidents
can be centrally coordinated, reported, and mitigated.

*Background*
Prior to the formation of REN-ISAC, there was no private trust community
dedicated to security information sharing focused on the unique needs and
environments of universities. Existing communities open to university
participation didn't maintain a tightly controlled private membership, were
oriented to internet service provider rather than university needs, or had
other requirements which prevented universal participation of institutions
of higher education. Operational security incident information sharing, if
done at all, incompletely reached affected institutions, was not timely, and
required a high investment of time due to the work of identifying contact
information, multiple notifications, etc.

With the advent of REN-ISAC, institutions of higher education have the
ability to participate in an organization that is designed around their
needs, is a trust community composed of a tightly vetted private membership,
and has the ability to aggregate data from various resources. The data will
come from sensors, REN-ISAC's partnership with Internet2, REN-ISAC's
partnership with the US Department of Homeland Security through the formal
ISAC structure, other ISACs, REN-ISAC's own membership, as well as other
groups who recognize REN-ISAC's role in organizing and mitigating security
problems. This data is a mixture of machine-generated reports and events
and human-generated reporting and commentary.

As identified by the Computer Security Incidents Internet2 working group
(CSI2) [2], one of the pressing problems in operational security is
moving security reports around to sites in a timely fashion. RENOIR was
proposed as a system for accepting information in a variety of ways,
transforming it into a standard, machine-readable report format, and
outputting the report in a variety of ways which each receiving site could
specify and integrate. REN-ISAC would be the trusted third-party from which
this effort was organized.

CSI2 set the following high-level goals for RENOIR:

  · The system will accept human input along with structured data to form
    reports which are stored in an appropriate format.
  · The system will allow for input from users in a variety of roles
    (reporting party, affected site, researchers, administrators, etc.).
  · The system will use widely accepted transport mechanisms (HTTP, SMTP)
    and use encrypted channels whenever possible, in the transport layer
    and/or by encrypting message content.
  · The system will use a central repository of contact information in
    order to facilitate automated notifications of affected sites.
  · The system will be extendable to include new security problems and
    reported incident types as they occur in the future.

*Identified Problems*

*Problem A: Human vs. Automated Reporting:*
RENOIR is intended to be an organizational tool for security data, generated
both by humans and by machines. This means that the human-entered data needs
to be organized into a mechanism that can be read later by computer systems
with minimal parsing, and machine-reported data needs to be minimized and
reported in a format which is useful for reporting problems in a succinct
way (i.e. this is not a system for storing flows, but the summary results of
the flows or what they represent).
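
As a rough illustration of storing "the summary results of the flows"
rather than the flows themselves, the sketch below collapses raw flow
records into one short summary per source address. The record fields (src,
dst, dst_port, packets) and the function name are hypothetical and are not
part of any RENOIR specification.

```python
from collections import Counter


def summarize_flows(flows):
    """Collapse raw flow records into one short summary per source address."""
    packets = Counter()
    targets = {}
    for flow in flows:
        src = flow["src"]
        packets[src] += flow["packets"]
        # Track distinct (destination, port) pairs touched by this source.
        targets.setdefault(src, set()).add((flow["dst"], flow["dst_port"]))
    return [
        {
            "source": src,
            "total_packets": packets[src],
            "distinct_targets": len(targets[src]),
        }
        for src in sorted(packets)
    ]


# Hypothetical flow records using documentation-only address ranges.
flows = [
    {"src": "192.0.2.10", "dst": "198.51.100.5", "dst_port": 445, "packets": 3},
    {"src": "192.0.2.10", "dst": "198.51.100.6", "dst_port": 445, "packets": 2},
    {"src": "192.0.2.77", "dst": "198.51.100.5", "dst_port": 22, "packets": 40},
]
summary = summarize_flows(flows)
```

Only the summary would be attached to a report; the raw flows stay at the
reporting site.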

*Problem B: Data Access Levels and Encryption:*
An information sharing system which uses access levels and encryption to
hide data could be considered self-defeating, but is necessary to facilitate
various levels of data sharing. If handled properly, sensitive incidents can
be shared via RENOIR with different levels of access and encryption to
minimize the number of parties with knowledge of the report. This can help
solve the problem of how to report to other sites without a high risk of
negative public relations. It also solves the problem of storing
potentially sensitive information on a non-university server since a proper
encryption mechanism would only allow authorized users with the proper keys
and access to view the report. The problem becomes how best to implement an
encryption system for N parties. This is very closely tied to Problem C.

*Problem C: Reporting Policies and Procedures:*
RENOIR depends on the REN-ISAC membership to send data to it for analysis.
This will require sites to decide how, and how much, to participate. The
decision can be eased by adding access levels, encryption, and other
features which give a greater breadth of options for information release.
For example, one site might only wish to see the open-access, unencrypted
reports for its own informational purposes, while another site might wish
to use RENOIR's encrypted limited-access reports as a coordination
mechanism among many affected sites.

*Problem D: Data Input and Output:*
RENOIR will need to accept a structured data format for handling both human
and machine data. There will need to be mechanisms to ease data input,
either at the client or by having RENOIR fill in missing fields. On output,
the data might need to be converted from this structured format to a
human-readable format for consumption by a non-security professional. This
requires a modular interface to allow flexible access to the system.
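
The output conversion described above could be as simple as flattening the
structured report into labeled lines. The following is a minimal sketch
using Python's standard XML parser; the function name and the sample
element names (Report, Type, Summary) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def render_text(xml_report):
    """Flatten an XML report into 'Label: value' lines for human readers."""
    root = ET.fromstring(xml_report)
    lines = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            # Drop any '{namespace}' prefix so only the bare tag is shown.
            label = elem.tag.split("}")[-1]
            lines.append(f"{label}: {text}")
    return "\n".join(lines)


# Hypothetical report; these element names are illustrative only.
sample = "<Report><Type>phishing</Type><Summary>Fake login page</Summary></Report>"
plain = render_text(sample)
```

A real output agent would likely use per-audience templates, but the
separation is the same: storage holds structured data, and rendering is a
replaceable module.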

*Problem E: Data Retention:*
Although RENOIR contains much time-sensitive data, the information that is
gathered does not `expire' once critical dates have passed. Incidents stored
within RENOIR can be mined and accessed later to build an operational
knowledge base and case system for detecting trends. This data could be
kept indefinitely to build a security history of university space; however,
data should not be kept indefinitely, for both legal and technical reasons.
It will be necessary to purge data on a regular basis, but there will need
to be some criteria established, or some way to summarize data for future
use. This will require a data handling policy as well as a technical
methodology.
*Problem F: Service Disruption:*
Any system which is relied on for security purposes needs to be robust. This
problem is rarely encountered on a single site, since disruptions can be
easily found and corrected (and they usually need to be corrected before
anything else can happen anyway). In an inter-school system, RENOIR will
need to have redundancies built in and be engineered so that recovery from
any type of system failure is fairly straightforward. It will also be
necessary to structure the system in such a way that downtime means only a
delay in time of reporting and not loss of data.
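
One common way to get "delay, not loss" is a store-and-forward spool at
the sending side: reports are queued first and removed only after a
successful send. The sketch below is an assumption about how a client
might do this, not a described RENOIR component. The class name and the
send callable are hypothetical, and the queue is held in memory for
brevity; a real client would need to persist it to disk.

```python
from collections import deque


class Spool:
    """Queue reports locally and forward them once the server is reachable."""

    def __init__(self, send):
        self._send = send          # callable(report); raises ConnectionError on failure
        self._pending = deque()

    def submit(self, report):
        """Queue a report, then try to forward everything that is waiting."""
        self._pending.append(report)
        return self.flush()

    def flush(self):
        """Send queued reports in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break              # server is down: keep the report for later
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent

    def pending(self):
        return len(self._pending)
```

During an outage submit() returns 0 and the reports accumulate; the next
successful flush() delivers them in their original order.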

*Proposed Solutions:*

*XML Data with IODEF* *(Incident Object Description and Exchange Format)*:
The structured format of an XML document is exactly what is needed to solve
the human/machine reporting problem. It is structured enough to be generated
and parsed by machines with relatively little programming, and it is
flexible enough to hold any format decided on.

For the purposes of RENOIR, a good XML format to use would be the Incident
Object Description and Exchange Format (IODEF) [3]. IODEF is an IETF
proposed standard for reporting security incidents in a standardized way.
It is not a catch-all for security events (e.g. firewall logs, netflows,
syslog entries, etc) and is instead a format for describing an incident from
a human perspective. It can be used as a container format for machine
generated events, but it would be ideal if only the data related to an
incident were stored to both save space and eliminate excess processing.

IODEF is also extensible. It is flexible enough to handle most existing
incidents and already has several proposed extensions, including one for
phishing/spamming and another for web applications and grid computing.
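
To give a feel for how little programming is involved, the sketch below
builds a skeletal IODEF-style document with Python's standard library. The
namespace, element names, and attributes follow my reading of the IODEF
drafts and should be checked against the schema. The document is
deliberately incomplete, since a schema-valid report needs further elements
such as assessment and contact information, and the identifiers and values
are invented.

```python
import xml.etree.ElementTree as ET

# Assumed IODEF namespace; verify against the current schema before use.
IODEF_NS = "urn:ietf:params:xml:ns:iodef-1.0"


def build_report(incident_id, reporter, report_time, description):
    """Return a skeletal IODEF-style incident document as an XML string."""
    ET.register_namespace("", IODEF_NS)  # serialize with a default namespace

    def q(tag):
        return f"{{{IODEF_NS}}}{tag}"

    doc = ET.Element(q("IODEF-Document"), {"version": "1.00", "lang": "en"})
    incident = ET.SubElement(doc, q("Incident"), {"purpose": "reporting"})
    # The 'name' attribute identifies who assigned the incident number.
    ident = ET.SubElement(incident, q("IncidentID"), {"name": reporter})
    ident.text = incident_id
    ET.SubElement(incident, q("ReportTime")).text = report_time
    ET.SubElement(incident, q("Description")).text = description
    return ET.tostring(doc, encoding="unicode")


# Hypothetical values for illustration.
report_xml = build_report(
    "2007-0001",
    "csirt.example.edu",
    "2007-01-11T14:30:00+00:00",
    "Compromised host scanning TCP/445",
)
```

Machine-generated evidence would hang off the same incident element, which
is what lets one format carry both human and machine input.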

*Report Types:*
A small collection of report types has been brainstormed. These all store
their information in XML and are controlled in different ways. They are
outlined below:

  · Encrypted Reports
    XML reports are encrypted on a per-message basis and have access
    controls that allow access only by involved parties that require high
    security.

  · 'Limited Access' Reports
    XML reports have no encryption but have access controls for involved
    parties only. REN-ISAC is also able to view these reports.

  · Normal Reports
    XML reports which have no encryption or access controls. Member sites
    can search/access the reports, limited only by client scope or view.

  · Semi-Anonymous Reports
    XML reports produced by REN-ISAC saying "A member institution has had
    problem X..." This allows for open reporting without giving away the
    source.

  · Non-Incident Reports
    XML reports produced by REN-ISAC for informational purposes like general
    announcements.
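
The access rules implied by these report types can be written down as a
single check. The sketch below is one reading of the list above, not an
agreed policy. In particular, it assumes that REN-ISAC cannot read
Encrypted Reports and that Semi-Anonymous and Non-Incident Reports are
visible to all members. All names are hypothetical.

```python
from enum import Enum


class ReportType(Enum):
    ENCRYPTED = "encrypted"
    LIMITED = "limited_access"
    NORMAL = "normal"
    SEMI_ANONYMOUS = "semi_anonymous"
    NON_INCIDENT = "non_incident"


REN_ISAC = "ren-isac"  # hypothetical identifier for the trusted third party


def can_view(report_type, viewer, involved_parties, members):
    """Return True if 'viewer' may read a report of the given type."""
    if report_type is ReportType.ENCRYPTED:
        # Assumption: only involved parties hold keys, so REN-ISAC is excluded.
        return viewer in involved_parties
    if report_type is ReportType.LIMITED:
        return viewer in involved_parties or viewer == REN_ISAC
    # Normal, Semi-Anonymous, and Non-Incident: assumed open to all members.
    return viewer in members or viewer == REN_ISAC


members = {"site-a", "site-b", "site-c"}
involved = {"site-a", "site-b"}
```

For example, can_view(ReportType.LIMITED, "site-c", involved, members) is
False, while the same call with ReportType.NORMAL is True.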

*Encryption:*
All reports need various levels of encryption and verification. All reports
should be signed to verify that the reports have not been modified. Strong
encryption is required for Encrypted Reports. This is a problem because the
typical solution of asymmetric key encryption becomes much more difficult
with more than two parties involved. This is solved by using per-report
symmetric keys distributed using symmetric methods between RENOIR and each
site.
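
The bookkeeping side of a per-report key scheme is small: generate one
random key per report and produce one protected copy of it for each
authorized site. The sketch below shows only that bookkeeping. The wrap_key
callable is a hypothetical stand-in for whatever key-protection method
RENOIR and a site share, and the placeholder used in the demo is NOT
encryption. Encrypting the report body itself is out of scope here.

```python
import secrets


def build_key_envelope(report_id, authorized_sites, wrap_key):
    """Create one per-report key and a wrapped copy for each authorized site."""
    report_key = secrets.token_bytes(32)  # fresh 256-bit key for this report
    wrapped = {site: wrap_key(site, report_key) for site in authorized_sites}
    envelope = {"report_id": report_id, "wrapped_keys": wrapped}
    return report_key, envelope


def placeholder_wrap(site, key):
    # NOT encryption: only tags the key with the site name so the demo runs.
    # A real deployment would protect the key with the site's own key material.
    return site.encode() + b":" + key


report_key, envelope = build_key_envelope(
    "2007-0001", ["site-a", "site-b"], placeholder_wrap
)
```

Adding a party to an existing report then means wrapping the same report
key once more, rather than re-encrypting the report.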

*Data Expiration:*
Since each report is a collection of information from both the submitter and
other parties and can be used to build statistical trends of incidents, each
report can be kept for a very long time to build a very useful background of
historical information. Standard information handling procedures dictate
that this information should be purged from the system eventually. Some
middle ground must be found to try to save useful content and purge excess
data. There are currently several proposed mechanisms within the CSI2 group.
The final solution will most likely include several of these.

  · A deletion flag (which may or may not be set by default at creation
    time) which would mark the record as deletable after X number of days.
  · Segmentation of the data into record types and giving each one a
    different policy.
  · An archive which would not actually purge the data, but would move it
    into longer-term storage accessible only by REN-ISAC. This could be
    useful to store everything except encrypted data and would possibly
    involve implementation of an 'archive/do not archive' flag.
  · Possibly sending a monthly notice email to sites listing which reports
    will be purged, forcing each site to either access the reports or set
    them not to be deleted.
  · Purging only parts of messages (flows, evidence, etc.) while maintaining
    the comments, dialog, etc. This can be done for all but encrypted
    reports.
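
Several of these mechanisms combine naturally into one decision function: a
per-type retention period, a keep flag, and partial purging for anything
that is not encrypted. The sketch below is only an illustration. The
retention periods, field names, and action names are hypothetical; CSI2 has
not settled on a policy.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per record type, in days.
RETENTION_DAYS = {"encrypted": 90, "limited_access": 365, "normal": 730}


def purge_action(record, now):
    """Decide what to do with a stored report at time 'now'."""
    if record.get("keep"):
        return "keep"              # a site asked that this report not be deleted
    limit = timedelta(days=RETENTION_DAYS[record["type"]])
    if now - record["created"] < limit:
        return "keep"              # still inside its retention period
    if record["type"] == "encrypted":
        return "delete"            # cannot be partially purged, so remove it whole
    return "strip_evidence"        # drop flows/evidence, keep comments and dialog


now = datetime(2007, 7, 1)
old_normal = {"type": "normal", "created": datetime(2005, 1, 1)}
old_encrypted = {"type": "encrypted", "created": datetime(2007, 1, 1)}
```

Here purge_action(old_normal, now) yields "strip_evidence" and
purge_action(old_encrypted, now) yields "delete".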

*Key Metrics:*

*Time from Detection to Reporting:*
It is important that any system which reports data to RENOIR do so as
easily and cleanly as possible in order to minimize the time from detection
at a site to reporting to other sites. Much of the time sites don't report
problems at all, but for a system like RENOIR to succeed, those sites must
not only report, but report quickly. It is imperative that reporting be as
easy as possible.

This improvement can be implemented in any number of ways. Sites could
script reporting into existing incident workflow systems or intrusion
detection systems. Sites can input data into REN-ISAC-based security
initiatives like the shared darknet system and let REN-ISAC do the
reporting. Or, sites can do their own write-ups using client interfaces to
RENOIR with wizards or web interfaces.

*Time from Reporting to Remediation/Detection:*
Just as it is important to report accurately and quickly, it is also
important to get that information into the hands of affected parties, either
to feed that data into an intrusion detection system or for quick
remediation. Data cannot be held by REN-ISAC alone, because REN-ISAC would
quickly become the bottleneck of RENOIR.

A possible mechanism to use would be EDDY [4] (*E*nd-to-end *D*iagnostic
*D*iscover*y*). A system using EDDY would help keep this metric low, since
EDDY could automatically notify various sites via a preferred method for
various types of incidents. Also, if RENOIR automatically output items of
interest, such as IRC command-and-control servers, to pre-existing lists
that are already in regular use, this would help identify problems faster
and allow sites to report only once.
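
Both metrics in this section reduce to differences between timestamps
recorded on each case. As a minimal sketch, assuming each case carries
detected, reported, and (optionally) remediated times, the medians could be
computed as follows. The field names and the choice of the median are
assumptions made for illustration.

```python
from datetime import datetime
from statistics import median


def response_metrics(cases):
    """Median hours for detection-to-report and report-to-remediation."""
    to_report = [
        (c["reported"] - c["detected"]).total_seconds() / 3600 for c in cases
    ]
    # Cases that are still open have no remediation time yet; skip them.
    to_fix = [
        (c["remediated"] - c["reported"]).total_seconds() / 3600
        for c in cases
        if c.get("remediated")
    ]
    return {
        "median_hours_detect_to_report": median(to_report),
        "median_hours_report_to_remediate": median(to_fix) if to_fix else None,
    }


# Hypothetical cases for illustration.
cases = [
    {
        "detected": datetime(2007, 1, 10, 8, 0),
        "reported": datetime(2007, 1, 10, 10, 0),
        "remediated": datetime(2007, 1, 10, 16, 0),
    },
    {
        "detected": datetime(2007, 1, 10, 9, 0),
        "reported": datetime(2007, 1, 10, 13, 0),
        "remediated": None,
    },
]
metrics = response_metrics(cases)
```

Tracking these values over time would show whether automated input and
output agents are actually shortening the reporting loop.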

*Planned Project Stages:*

*Phase 1:*
*Step 1 - Building the Storage Engine:*
Before any further testing and experimentation can be done, there needs to
be a storage engine in place to get reports in and out of the system.

*Step 2 - Building Input/Output Agents for Automated Reporting:*

  · Input agents should be built for machine-generated events that already
    exist or are close to fruition. An example of such input agents would be
    an agent that takes denial of service reports from an Arbor system and
    reports from the REN-ISAC Shared Darknet Project and feeds those reports
    into the storage engine.
  · Output agents - starting with an SMTP output agent - should be built and
    tested using real events in a limited fashion.
  · Define an API for sites to implement their own tools.

*Step 3 - Building a Routing Agent:*

  · A routing agent should be built to move the messages around and send
    them to the output agents for reporting. EDDY could perform this task.
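
To make the routing step concrete, the sketch below dispatches a report to
each affected site using that site's preferred output method for the
incident type. It does not describe EDDY or its API. The data layout,
function name, and agent names are hypothetical.

```python
def route(report, preferences, output_agents):
    """Send a report to each affected site via its preferred output agent."""
    delivered = []
    for site in report["affected_sites"]:
        prefs = preferences.get(site, {})
        # Use the per-incident-type preference, else the site's default.
        method = prefs.get(report["incident_type"], prefs.get("default"))
        agent = output_agents.get(method)
        if agent is None:
            continue  # no usable delivery method is known for this site
        agent(site, report)
        delivered.append((site, method))
    return delivered


# Hypothetical output agents that just record what they were asked to send.
outbox = []
output_agents = {
    "smtp": lambda site, report: outbox.append(("smtp", site, report["id"])),
    "http": lambda site, report: outbox.append(("http", site, report["id"])),
}
preferences = {
    "site-a": {"default": "smtp", "ddos": "http"},
    "site-b": {"default": "smtp"},
}
report = {"id": "2007-0001", "incident_type": "ddos",
          "affected_sites": ["site-a", "site-b", "site-z"]}
delivered = route(report, preferences, output_agents)
```

Because the agents are plain callables looked up by name, new output types
can be added without touching the routing logic.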

*Step 4 - Build a Portable Client:*

  · A portable client should be built to allow flexible access to reports.

The goal for Phase 1 is to build up the system and start collecting data
which is quickly replaced so developers can see how well the system handles
rapidly expanding load. Building the system in this way will also begin to
automate the reporting process both to REN-ISAC and from REN-ISAC to
individual sites, which is a near-term goal for REN-ISAC.

*Phase 2:*

  · Work on feedback mechanisms for existing outputs to tie the user
    feedback into the reports which spawned them.
  · Work on human-generated reports. Get data entered by humans into
    normalized formats which can be used for more reporting.
  · Build out visualization mechanisms or at least agents to generate data
    for analysis. There should be enough data in the database after Phase 1
    to generate useful information.
  · Continue to build out input and output types.

The goals for Phase 2 are to use the data generated in Phase 1 and to
incorporate human-generated data, which is more difficult to deal with.

None of these phases has a set timeline; however, the steps in Phase 1
need to happen in order. Once those major pieces are done,
REN-ISAC requirements can help decide what gets built next. Some development
can also be done concurrently.

*References:*

[1] http://www.ren-isac.net

[2] http://security.internet2.edu/csi2/

[3] http://www.cert.org/ietf/inch/inch.html

[4] http://middleware.internet2.edu/e2ed/

This project was supported by Grant No. 2006-DD-BX-K271 awarded by the
Bureau of Justice Assistance. The Bureau of Justice Assistance is a
component of the Office of Justice Programs, which also includes the Bureau
of Justice Statistics, the National Institute of Justice, the Office of
Juvenile Justice and Delinquency Prevention, and Office for Victims of
Crime. Points of view or opinions in this document are those of the author
and do not represent the official position or policies of the United States
Department of Justice.