STILL BEING EDITED, NOT FINAL

Meeting Minutes from 2012 Joint Techs in Stanford

Agenda Overview

Welcome -- Ken Miller

July 16, 2012

Welcome from Ken Miller, Co-Chair of Performance Working Group

==================== Mobile Testing project - Professor Byun

Wireless Broadband Measurement in California
YoungJoon Byun, Cal State, Monterey Bay

see slides

presenter wants recommendations, suggestions, and general impressions of the activities

...

Overview of the Wireless Broadband Measurement Project

sponsor is CPUC (California Public Utilities Commission)
part of an ARRA grant, administered by NTIA

...

state-wide testing 2x/yr through 2014

...

tool developed to measure wireless performance

...

goal is to objectively evaluate major providers of mobile wireless across state of California
currently analyzing results
updating software for second field test in fall.
all data available
see http://calbroadbanddrivetest.blogspot.com/

Comments / Questions

Q: Are there any issues with server placement?

A: We

California State University, Chico
real measurements throughout state
1200 different locations

--
purpose
AT&T, T-Mobile, Sprint, and Verizon are the major carriers

objectively evaluate major providers across the state of California
want to provide Californians with info (so it is public)

at end of project: map, and summarized white paper, and code is open/available
www.broadbandmap.ca.gov

goal: put an address in, get results.
data shown is not theirs. they want to provide better info.

use two servers to measure: one in CA, one in Virginia
developed phone and data card
4 different types of data card
4 different types of android phones

click button as tester, go to west server, and east server
results pushed to database server

viewer for pc and phones to see results

Amazon server, EC2

measure:
lat/long
date/time
provider
network type (LTE, UMTS, HSDPA, etc.)
RTT
TCP up/down speed
UDP jitter and loss
traffic shaping [?] via Glasnost

put device in car, software, measure lat/long
fixed or driving?
(or get from android phone, but GPS info not always correct, was issue, have to go external)

timer to kill the app if the signal is too weak
-> because iperf otherwise goes on forever
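
The kill timer noted above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper name `run_with_deadline` and the deadline value are assumptions, and the command passed in stands in for whatever measurement tool (for example iperf) the tester runs.

```python
import subprocess


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return (finished, stdout).

    If the command is still running after deadline_s seconds it is killed
    and (False, "") is returned, so one stalled measurement cannot block
    the rest of the test sequence.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=deadline_s
        )
        return True, done.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        return False, ""
```

A caller would pass the measurement command as a list of arguments plus a deadline in seconds, and record a failed or "weak signal" result when `finished` is False.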

--
think about cost. drive 40 min, go to one location, measure one time: expensive
so each tester tests as many as possible.

test seq

connectivity, iperf TCP to 2 servers
if no response in 4 sec, give up: no signal.
ping to west server
iperf TCP testing to 2 servers
ping to east server
iperf UDP testing
upload test results to server
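
As a rough sketch of the sequence above (not the project's software): check connectivity first, give up with "no signal" after 4 seconds, otherwise run the steps in order and collect results for upload. The step names in `STEP_ORDER`, the `run_sequence` function, and the callable-based interface are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

CONNECT_TIMEOUT_S = 4  # from the notes: no response in 4 sec means "no signal"

# Hypothetical step names, in the order listed in the notes.
STEP_ORDER = ["ping_west", "iperf_tcp", "ping_east", "iperf_udp", "upload"]

Step = Tuple[str, Callable[[], object]]


def run_sequence(
    is_connected: Callable[[float], bool], steps: List[Step]
) -> Dict[str, object]:
    """Run the measurement steps in order, or stop early with "no signal"."""
    results: Dict[str, object] = {}
    if not is_connected(CONNECT_TIMEOUT_S):
        results["status"] = "no signal"
        return results
    results["status"] = "ok"
    for name, step in steps:
        try:
            results[name] = step()
        except Exception as exc:
            # One failed step should not lose the other measurements.
            results[name] = f"failed: {exc}"
    return results
```

Keeping each step as a plain callable makes it easy to drop a slow step (the notes mention Glasnost taking about 10 minutes) at most locations without changing the driver.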

what about Glasnost? takes too much time.
10 min to complete
run it at 5% of locations (60 locations)
not done at every location

q: why tcp twice
alpha test, 3 times
one test, 10 locations...
if try 3 times, takes too much time

q: test all carriers at same time?
yes
sequential and random

q: are you measuring the performance of the phone rather than of the network?
good question;
use the same phone handsets throughout
it is the user experience, though

q: impression is that a new Samsung phone, the S3, would be able to run the app with more CPU left over.

try to use very powerful devices

GUI [tester GUI, developer GUI]

unique location numbers
first version, laptop cannot automatically recognize provider
automatically identify

--
calbroadbanddrivetest.blogspot.com

from initial testing

students were testing

antenna inside car, but outdoor
not w/in a building

8 testers, 35,000 miles
4 smartphones and 4 data cards
10 sites/day
1200 randomly selected locations
urban 23%, rural 67%, tribal 11%

results...

issues
gps locations (sometimes can't)
testing difficulties... drive and measure, drive and measure
humans make mistakes
q: davis at 3, palo alto at 5, time of days have different congestion
   yes, is issue.
no effective service area
   (can't ping for 4 sec)
   is that the right thing.

server congestion
at morning, everyone test at 8am. oops.

many data cards in single laptop, hard to configure

currently analyzing results
no accidents :)
updating software, for second field test in fall.

all data available.

any issues with server placement?
looks pretty good
just used EC2, with placement in east and west coast

final results could be averaged

...

when experience server congestion, in virtual environment, spin up another one.

...

looking for more in second field trial.

q: could do one per tester.
overall capacity is fine. the problem is just at the start of the day.

Q: Are you doing anything to control who can access testers?
A: No protection today

where collecting data?
get stored files, and push back

look at any server data? just client data [less accurate?]

up/down
up then down

...

Q: Are you looking at any server data?
A: No, just client data

curious to see if you need to do east/west thing

...

Could speed up by factor of 2 if only do one.

ybyun@csumb.edu

...

Any suggestions or comments, please contact
ybyun@csumb.edu

...

want a chance to meet

ESnet lookup service - Sowmya Balasubramanian

gather requirements, look at use cases, and revamp design.

designed several years ago
but increased scale has stressed it, and the trajectory looks bad

add security.

list of requirements;
based on use cases and current load

10,000 to 100k records [next ten years easily]
query time < 1 sec [else user gives up] [200 ms?]
registration time < 1 hr [4-6 hrs to propagate today]
validate services have not been forged

q: 1 sec? recall studies saying 250 ms
as long as first results arrive in 250 ms...

=========

ESnet - Simple Lookup Service (for perfSONAR and beyond)

Sowmya Balasubramanian, ESnet

see slides

Simple Lookup Service Goals:

meet needs of growing perfSONAR community
Simple API
Extend to non-perfSONAR services
Security

Design:

REST/JSON API
Backend: MongoDB
Flexible Architecture

Q: Assume one query?
A: No, but want to make sure a simple query takes < 1 sec (heard "on average")

To simplify the API, going with REST and JSON
record management (register/edit)
query API (get stuff)

HTTP GET (pull)
pub/sub with HTTP streaming (push)
http://odev-vm-7.es.net/lookup-service-examples [dev vm right place]

design - data representation (so... change that design?)
well-defined set of key/value pairs, but users can add their own too
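
The record model described here (a defined set of keys plus arbitrary user keys, with register/edit and query operations) might look something like the in-memory sketch below. It is not the ESnet implementation or its API: `LookupStore`, `REQUIRED_KEYS`, and the example key names are hypothetical, and a real service would expose these operations over REST/JSON backed by MongoDB.

```python
import uuid

# Hypothetical "well-defined" keys every record must carry.
REQUIRED_KEYS = {"type", "name"}


class LookupStore:
    """Toy in-memory stand-in for a key/value lookup service."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Store a record and return its generated id."""
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        rid = str(uuid.uuid4())
        self._records[rid] = dict(record)  # user-defined keys are kept as-is
        return rid

    def edit(self, rid, changes):
        """Update fields of an existing record."""
        self._records[rid].update(changes)

    def query(self, **criteria):
        """Return records whose fields equal every given key/value pair."""
        return [
            dict(rec, id=rid)
            for rid, rec in self._records.items()
            if all(rec.get(k) == v for k, v in criteria.items())
        ]
```

For example, `store.query(type="bwctl")` would return only the records registered with that (illustrative) type value, including any extra keys the registrant added.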

mongo d

--
testing

...

new Lookup Service and older one are on same host

...

new Lookup Service is 95% faster than old one
1 min -> under 1 sec

...

ESnet is using the new Lookup Service
watch for alpha: http://ps4.es.net:8085/lookup/services

timeframe...
few weeks

GLS.

Q: What is needed to move a current installation?

A: Stores the same data, in a different format

How it works today:
index servers pull from lookup servers, create CSV, and use the CSV for initial location finding
can modify script to talk to both (or convert index servers to new)

...

q: how do people do it today
old is SOAP, this is JSON

pS-Toolkit, registration,
upgrade switch
since consumers don't call it directly, can move.

...

old approach is SOAP, new approach is JSON

GENI doing it the same way

...

new pushes to new; old ones pull old one

Have compatible API for GENI uses too.

=====
sFlow: data network viz and control

now what sflow is?
sflow monitoring servers and apps.

where it's evolving, and go to questions.

probably have in network, think about turning it on
space for cisco

...

sFlow Data Network Visibility and Control

Neil McKee, InMon Corporation

See Slides

http://www.sflow.org/

http://blog.sflow.com/

sFlow: widely supported industry standard
based on virtual network and switches

servers, hypervisors, virtual switches

2 mechanisms w/sflow that help
- de-synchronized, parallel push
auto push a full set of SNMP ifTable stuff

- packet/transaction sampling

monitors all protocols
captures packet path
senders all open source & free
replaces counter polling
allows you to do lots of things

...

IP address, URLs, app attributes...
things impossible to get all together, but needed for situational analysis

sFlow samples packet headers
collector decides what to analyze
hence can get new stuff really quickly
no firmware on switches, just software collector

-- captures packet path
where in and out of device
thread to find physical topology, and locate hosts to switch ports within one min [???]

--
arch
agents as simple as possible, move stuff to collector

senders all open source & free

host sflow, sends mac addrs, so can join with packets
apps: get sockets, underlying hypervisor load, and packet paths
enough stuff to join and ..

host stuff:
host-sflow.sourceforge.net

app monitoring
that's the new stuff.

NFS/CIFS (file path, bytes, how long, socket)
web requests: Apache, nginx...
memcached lookups... memcache clusters...
database queries.
some playing, but add if you're motivated

have json-api. fashionable, and easy to add
so app can add information.
fire and forget.

XenMotion bandwidth, how does it look

see response time in performance of memcache cluster

Brian T's netprobe(?)

monitoring web farm
and see transaction detail
and see correlations

carve by app response time, is way to correlate app performance & delivery, with underlying infrastructure conspiring to deliver that.

dip when app stops.
way to pull things apart, w/o overloading anything.

why monitor everything: 2 good reasons, 1 real reason
1. troubleshooting - always have context
2. putting network and server teams on same page
   so see! [cloud services!!]
3. full observability required for automated control
     control theory 101.
     to automate closed loop, have to report all
the designer of sFlow is a control engineer

...

now monitors more than just packets; there is also server instrumentation and it can be extended to applications

Why monitor everything with sFlow?
1. troubleshooting - always have context
2. putting network and server teams on same page
   (cloud services)
3. full observability required for automated control
     control theory 101.
     to automate closed loop, have to report all

sFlow and OpenFlow are complementary.
OpenFlow can control
if you have visibility at the same time, there is an opportunity to close the loop; research topic, but looks promising

Q: Is there any danger using OpenFlow features for accounting and control?
A: Better to use wildcards when possible instead of OpenFlow controls.

open standards that work well
netconf xmpl standards to set up /

...

configure
forwarding

...

OpenFlow controls make sense

...

=====

The Challenge

Ken reminded the group of the challenge regarding needs for next-generation tools:

https://spaces.at.internet2.edu/display/PerformanceWG/Challenge+of+January+2012+re+Needs+for+Next+Gen+NOC+and+End+User+tools

=====

Pennsylvania State University WAN Metrics Project

see slides

====

Next meeting of Performance Working Group

Internet2 Fall Member Meeting

Thursday, Oct 4, 7:30am - 8:30am

http://events.internet2.edu/2012/fall-mm/agenda.cfm?go=session&id=10002569&event=1149

cisco has a similar story, with a proprietary system

blog.sflow.com -> Peter Phaal's musings
sflow.org to see if equipment supports it.

BGP stuff at the border, vs NetFlow?
sFlow allows full BGP + AS paths to be sent
very high-value measurement, to look at AS paths and peering arrangements
allows you to break down by IP addr, subnets, protocols, minute by minute
pull for accounting and routing perf

if routers don't support that, can peer with router and pull AS paths in
and splice in to sflow/netflow feed, and do similar analysis

wan pov, realtime access allows for attack analysis

so a reason to find into wan routers as well as l2 switches.

PS PXE booting (brief)

     to The Challenge - Ken
    Mentions -
    - I2 description of the proposed speed test tool
    - Penn State PXE booting pS-Toolkit

Community Updates / Open Forum

any other updates
any other questions

how many people looking at sflow

how planning/getting ready for big data challenges on networks
jim/ussc
use statseeker for all counter data
gobs of netflow

ericp: q about sflow
100G, how do that. single device how fast can go

running on brocade 100G today
turned on at SC, and it worked.

much easier for a device to do sflow
sampling, decoding, aggregating, then flush out
sflow - sampling and send.

q: standard sampling rate, or all over the place
not faster than you need to
1/1000 and tweak

at a high level, 1/40000 and still get good data
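
To see why a coarse sampling rate can still give good data, here is a small sketch of how sampled counts scale back up to totals. The function names are made up for illustration. The error bound is not from the talk notes; it is the commonly cited packet-sampling approximation (at most about 196 * sqrt(1/c) percent error at 95% confidence, where c is the number of samples in the traffic class of interest), included here as an assumption.

```python
import math


def estimate_total(samples_in_class, sampling_rate):
    """Scale a sampled packet count up to an estimated total.

    sampling_rate is N for 1-in-N sampling (e.g. 1000 for 1/1000).
    """
    return samples_in_class * sampling_rate


def percent_error_bound(samples_in_class):
    """Approximate 95%-confidence error bound, in percent.

    Depends only on how many samples landed in the class,
    not on the sampling rate itself.
    """
    if samples_in_class <= 0:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / samples_in_class)
```

Under this approximation accuracy depends on the number of samples collected rather than on the rate, which fits the advice above: on a busy link even 1/40000 accumulates enough samples quickly, so there is no need to sample faster than required.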

security guys faster and faster
everyone else smooth and steady

6500s in core
10g across network
10g campuses
killing cpu when turned up more interfaces
so switched to brocade at borders, sflow
l2-7 info. before l3 only.
see more what's going on
initiative to look at core

sflow enabled dev on core, go to 100G.
can scale easily. afraid of what netflow would do if couldn't handle
10G links

turn netflow reshalls and some things
put switch inline to do sflow to make work :)

XMRs at border
XLMs in core
juniper allu

sup720s out of gas
yes. prototype some new cards too, more power but still a lot of cpu

sec group, has mirrored port off of border routers
use bro cluster
get every packet off router

use sampling to trigger for security too now
