Metadata Compression

If the size of the metadata file is causing metadata distribution to become excessively slow, one simple alternative to consider might be to compress the metadata before transferring it.

For example, the current InCommon metadata file is 7,139,737 octets in length.

If compressed with bzip2 -9 (see http://en.wikipedia.org/wiki/Bzip2 ), the current file drops to 1,238,006 octets, just 17.3% of the size of the original. All else being equal, I'd therefore expect the file transfer time to drop proportionately.

If compressed with xz -9 (see http://en.wikipedia.org/wiki/Xz ), the current file drops still further, to 1,027,212 octets, just 14.4% of the size of the original.
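
For reference, the figures above can be reproduced with something along these lines, assuming bzip2 and xz are installed and the aggregate has been saved locally (the filename is just an example):

# Compress copies of the aggregate at maximum compression (-9),
# keeping the original (-k), then compare the resulting sizes.
bzip2 -9 -k InCommon-metadata.xml
xz -9 -k InCommon-metadata.xml
ls -l InCommon-metadata.xml InCommon-metadata.xml.bz2 InCommon-metadata.xml.xz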

While manual compression could easily be built into the process of preparing the metadata file, another option to consider might be mod_deflate, as discussed at
http://www.devside.net/articles/apache-performance-tuning , which would allow compression to be negotiated between the client and the web server on the fly.
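
As a rough sketch only (the configuration path, module path, and MIME types below are assumptions rather than the actual server setup), mod_deflate might be enabled along these lines:

# Illustrative sketch: enable mod_deflate for XML responses on a
# Red Hat-style Apache layout, then reload the server gracefully.
cat > /etc/httpd/conf.d/metadata-deflate.conf <<'EOF'
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE application/xml text/xml
EOF
apachectl graceful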

5 Comments

  1. As far as I'm aware, we already have gzip enabled on the fly for any clients that support it.

    (As well as HTTP caching, of course.)

    1. Not sure the process is currently working.

      Just as a baseline, without requesting compression:

      % curl --verbose "http://wayf.incommonfederation.org/InCommon/InCommon-metadata.xml" > temp.txt
      * About to connect() to wayf.incommonfederation.org port 80 (#0)
      *   Trying 207.75.165.125... connected
      * Connected to wayf.incommonfederation.org (207.75.165.125) port 80 (#0)
      > GET /InCommon/InCommon-metadata.xml HTTP/1.1
      > User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      > Host: wayf.incommonfederation.org
      > Accept: */*
      > 
      < HTTP/1.1 200 OK
      < Date: Thu, 27 Jun 2013 18:54:35 GMT
      < Server: Apache
      < Last-Modified: Wed, 26 Jun 2013 21:44:34 GMT
      < ETag: "c80cf-6cf199-4e0158d283480"
      < Accept-Ranges: bytes
      < Content-Length: 7139737
      < Connection: close
      < Content-Type: application/xml
      < 
      { [data not shown]
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100 6972k  100 6972k    0     0   630k      0  0:00:11  0:00:11 --:--:--  687k* Closing connection #0
      

      11 seconds...

      With compression requested:

      % curl --verbose --compressed "http://wayf.incommonfederation.org/InCommon/InCommon-metadata.xml" > temp.txt
      * About to connect() to wayf.incommonfederation.org port 80 (#0)
      *   Trying 207.75.165.125... connected
      * Connected to wayf.incommonfederation.org (207.75.165.125) port 80 (#0)
      > GET /InCommon/InCommon-metadata.xml HTTP/1.1
      > User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      > Host: wayf.incommonfederation.org
      > Accept: */*
      > Accept-Encoding: deflate, gzip   <-- NOTE
      > 
      < HTTP/1.1 200 OK
      < Date: Thu, 27 Jun 2013 18:58:36 GMT
      < Server: Apache
      < Last-Modified: Wed, 26 Jun 2013 21:44:34 GMT
      < ETag: "c80cf-6cf199-4e0158d283480"
      < Accept-Ranges: bytes
      < Content-Length: 7139737
      < Connection: close
      < Content-Type: application/xml
      < 
      { [data not shown]
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100 6972k  100 6972k    0     0   625k      0  0:00:11  0:00:11 --:--:--  699k* Closing connection #0
      

      Same elapsed time/file size.

      Contrast that with what I see, for example, from NOAA. Uncompressed:

      % curl --verbose "http://radar.weather.gov/ridge/Conus/full.php" > temp.txt
      * About to connect() to radar.weather.gov port 80 (#0)
      *   Trying 23.59.191.19... connected
      * Connected to radar.weather.gov (23.59.191.19) port 80 (#0)
      > GET /ridge/Conus/full.php HTTP/1.1
      > User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      > Host: radar.weather.gov
      > Accept: */*
      > 
      < HTTP/1.1 200 OK
      < Server: Apache/2.2.15 (Red Hat)
      < Content-Type: text/html; charset=UTF-8
      < Cache-Control: max-age=1257
      < Expires: Thu, 27 Jun 2013 19:29:25 GMT
      < Date: Thu, 27 Jun 2013 19:08:28 GMT
      < Transfer-Encoding:  chunked
      < Connection: keep-alive
      < Connection: Transfer-Encoding
      < 
      { [data not shown]
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100 36009    0 36009    0     0   891k      0 --:--:-- --:--:-- --:--:-- 2150k* Connection #0 to host radar.weather.gov left intact
      
      * Closing connection #0
      

      Versus:

      % curl --verbose --compress "http://radar.weather.gov/ridge/Conus/full.php" > temp.txt
      * About to connect() to radar.weather.gov port 80 (#0)
      *   Trying 23.59.191.40... connected
      * Connected to radar.weather.gov (23.59.191.40) port 80 (#0)
      > GET /ridge/Conus/full.php HTTP/1.1
      > User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      > Host: radar.weather.gov
      > Accept: */*
      > Accept-Encoding: deflate, gzip
      > 
      < HTTP/1.1 200 OK
      < Server: Apache/2.2.15 (Red Hat)
      < Content-Type: text/html; charset=UTF-8
      < Vary: Accept-Encoding
      < Content-Encoding: gzip
      < Content-Length: 8199
      < Cache-Control: max-age=1207
      < Expires: Thu, 27 Jun 2013 19:29:04 GMT
      < Date: Thu, 27 Jun 2013 19:08:57 GMT
      < Connection: keep-alive
      < 
      { [data not shown]
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100  8199  100  8199    0     0  92768      0 --:--:-- --:--:-- --:--:--  802k* Connection #0 to host radar.weather.gov left intact
      
      * Closing connection #0
      

      Note that the file size in the NOAA case did change.

    2. > As far as I'm aware, we already have gzip enabled
      > on the fly for any clients that support it.

      No, the InC metadata server does not support HTTP compression. We considered this at one point but decided against it because 1) the expected gains in throughput were deemed marginal, and 2) AFAIK no other federation in the world is compressing metadata, so doing so would introduce unacceptable risk at the client. One or both of these considerations may have changed in the interim, so we should definitely reconsider.

      > (As well as HTTP caching, of course.)

      Yes, the InC metadata server supports HTTP Conditional GET.
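
      As a quick check from the command line, curl's --time-cond option sends an If-Modified-Since header based on a local file's timestamp (the local filename here is just an example); if the local copy is current, the server should answer 304 Not Modified with no body:

      % curl --verbose --time-cond InCommon-metadata.xml "http://wayf.incommonfederation.org/InCommon/InCommon-metadata.xml" > /dev/null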

      1. Tom mentioned:

        > compressing metadata (which introduces unacceptable risk at the client)

        Can you talk more about the risk you see in this regard? The metadata would be cryptographically signed, right? So if anything went wrong during the download (whether due to a compression hiccup or something else), the newly downloaded metadata would be flagged as corrupted and would not get used. Wouldn't that ameliorate the risk?
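
        (For what it's worth, the signature on a downloaded aggregate can be checked before the file is used, e.g. with the Shibboleth xmlsectool; the certificate filename below is just a placeholder:)

        % xmlsectool.sh --verifySignature --inFile InCommon-metadata.xml --certificate inc-md-cert.pem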

        Scott then commented, and Tom replied:

        >> (As well as HTTP caching, of course.)

        > Yes, the InC metadata server supports HTTP Conditional GET.

        I'd take it a step beyond that and suggest it would be interesting to see whether something like Varnish Cache would further improve performance (see https://www.varnish-cache.org/ ).

        1. >> compressing metadata (which introduces
          >> unacceptable risk at the client)
          >
          > Can you talk more about the risk you see
          > in this regard?

          Since no federation supports HTTP Compression, that particular Shibboleth feature has never been exercised in production. I don't want to be the first to do that (smile)

          >> the InC metadata server supports HTTP Conditional GET.
          >
          > I'd take it a step beyond that and suggest
          > that it would be interesting to see if using
          > something like Varnish Cache would further
          > improve performance

          I don't think Varnish will have any effect on metadata refresh. OTOH, the Federation Info pages might benefit from Varnish, but there's no chance the TSG will support it.