An interruption in the eduroam service occurred on the morning of 6/13/22, beginning at approximately 11:24 AM EST, and was resolved by 12:34 PM EST.

The interruption was caused by an issue in the service's rate limit logging that filled the disk. Once the disk was full, the eduroam service became unavailable. Because the disk was filled with system logs, a privileged user was needed to purge the logs and reclaim disk space. The servers are configured to disallow direct logins to privileged accounts, so the usual method is to log in to an unprivileged account and use the 'sudo' privilege escalation system, which had also become unavailable.

Resolving the issue required rebooting the server on which rate limiting was running into single-user mode, deleting the log files, and starting the server again.

Going forward, the service has been modified in several ways to remediate the issues that led to the incident. First, the relevant logs are now rotated, archived, and deleted on a periodic basis. Next, the virtual console environment has been configured so that an administrator can access a privileged account from it without needing to boot the server into single-user mode. Should the server need to be booted into single-user mode again, the virtual console and boot media are now in place. Finally, in a future service update, the service will be modified to limit the rate at which these logs are generated.
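
As an illustration only, the two logging remediations (bounded, rotating log files and a cap on the rate of log generation) can be sketched in Python with the standard library. This is not the configuration deployed on the eduroam servers, which this report does not describe. The names RateLimitFilter and build_logger, the logger name, and all numeric limits are hypothetical and chosen for the example.

```python
import logging
import logging.handlers
import time


class RateLimitFilter(logging.Filter):
    """Hypothetical token-bucket filter.

    Passes at most `rate` records per second on average, allowing
    bursts of up to `burst` records. Excess records are dropped and counted.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        super().__init__()
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock            # injectable so the filter can be tested
        self.tokens = float(burst)    # start with a full bucket
        self.last = clock()
        self.dropped = 0

    def filter(self, record):
        now = self.clock()
        # Refill the bucket for the time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False


def build_logger(path, max_bytes=1_000_000, backups=5, rate=10, burst=20):
    """Return a logger whose file output is size-bounded and rate-limited.

    All limits here are illustrative assumptions, not values from the incident.
    """
    logger = logging.getLogger("ratelimit-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Keep the active file plus `backups` rotated files; older files are deleted.
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.addFilter(RateLimitFilter(rate, burst))
    logger.addHandler(handler)
    return logger
```

With these example settings, the log files occupy roughly max_bytes × (backups + 1) bytes at most, so a burst of rate limit events can no longer fill the disk, and the filter bounds how quickly new entries are written in the first place.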
