Remote Monitoring (RMON) Network Manglement Jim Binkley 1
Outline
• general introduction
  – overview
  – rmon 1 and 2 groups
  – control theory
• rmon 1 groups (some)
• conclusion/summary
RMON – means what
• remote monitoring
  – aggregate stats for a network
  – aggregate stats for a host
  – for host X talking to host Y
  – layer 1 and layer 2 and more
• question: do we have the right information?
• related question: how are networks evolving?
• one more question: is SNMP the right approach?
bibliography
• rfc 1513, 1993 - token-ring extensions
• rfc 1757, 1995 - RMON MIB 1
• rfc 2021, 1997 - RMON MIB 2
• rfc 2074, 1997 - protocol identifiers (directory)
• David Perkins' RMON book
• SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Stallings
rmon and OID tree
iso(1)
  org(3)
    dod(6)
      internet(1)
        directory(1) - X.500
        mgmt(2)
          mib-2(1)
            system(1)
            ...
            rmon(16) - rmon 1 & 2
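The path through the tree can be written out as a dotted-decimal OID. A minimal sketch, using only the arcs shown on the slide; the names `RMON_PATH`, `dotted`, and `group_oid` are made up for illustration and are not part of any SNMP library.

```python
# Arcs from the root of the OID tree down to the rmon subtree,
# as (name, number) pairs taken from the slide.
RMON_PATH = [
    ("iso", 1), ("org", 3), ("dod", 6), ("internet", 1),
    ("mgmt", 2), ("mib-2", 1), ("rmon", 16),
]

def dotted(path):
    """Join the numeric arcs of a path into dotted-decimal form."""
    return ".".join(str(number) for _, number in path)

RMON_OID = dotted(RMON_PATH)

def group_oid(group_number):
    """OID of an rmon group, e.g. statistics(1) or history(2)."""
    return f"{RMON_OID}.{group_number}"

print(RMON_OID)      # 1.3.6.1.2.1.16
print(group_oid(1))  # 1.3.6.1.2.1.16.1
```

Both the rmon 1 groups (1 to 10) and the rmon 2 groups (11 to 19) hang off this same prefix.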
rmon intro
• rmon - remote monitoring
• rmon I - stats at ethernet layer (MAC addresses, but not upstairs)
• rmon II - stats at network and transport layers (IP addresses and tcp/udp ports)
network analysis picture (trad)
[figure: hosts A and B and a router (or switch) on a traditional 10BASE shared link; an analyzer in promiscuous mode can hear A, B, and to/from router traffic]
manager/probe
[figure: manager sends get for a database item (OID); probe sends response; the probe holds the MIBs (sampled data)]
basic idea/s:
• all kinds of stats - but gathered on a per-link basis as an aggregate
  – not by manager from every host on link
• ethernet focus (token-ring support too)
• rmon probe can run SOMEWHAT by itself and gather information
  – however manager needed for more complex functions (may have to suck out data on a periodic basis due to lack of space)
rmon 1 functions - overview
• sample stats for all devices on ethernet link
  – ethernet level - e.g., how many collisions
  – basic and history
• derived statistics
  – for each host
  – top N talkers (who sent most bytes?)
  – matrix of conversations SRC x RCV
rmon 1, cont.
• threshold events
  – look for N events in elapsed time T
  – if found, send trap to manager
  – e.g., N errors in one minute (too many)
• packet data capture
  – filtering mechanism + capture
  – must work with higher level GUI in manager
  – goal: capture packets of interest/nice decode display
rmon 1 - { mib-2 16 }
• statistics(1) - ethernet stats per interface, roughly equal to dot3 (but global)
• history(2) - snapshots based on stats(1)
• alarm(3) - ability to set threshold, generate alarm on interesting event
• host(4) - per i/f host stats (global interface)
• hostTopN(5) - store/sort by top N hosts
• matrix(6) - X talks to Y (a few stats)
rmon 1, cont.
• filter(7) - filter pkts and capture/or cause event
• capture(8) - traditional packet analyzer
• event(9) - table of events generated by probe
• tokenRing(10) - never mind, but like ethernet stats
rmon 2, still { mib-2 16 }
• protocolDir(11) - protocols understood by probe
• protocolDist(12) - per protocol stats (bytes/pktcnt)
• addressMap(13) - ip/mac mappings
• nlHost(14) - per host octet/byte counts
• nlMatrix(15) - host X talks to host Y
• alHost(16) - per host application octet/byte counts
• alMatrix(17) - application Z/X to Z/Y
• usrHistory(18) - sampling of any INT OID
• probeConfig(19) - info for manager on probe setup/config
rmon 2: notes
• application means “above the network layer”
• both matrix groups have top N functions as well
• note both protocol directory and probe configuration are there to improve the odds of manager/probe interoperability
do we need a manager?
• mostly ...
• simpler stats in rmon 1 could be gathered via net-snmp, say, but
• higher level functions require complex manager with better than average GUI
  – rmon-2 in general (you want graphical histograms)
  – packet capture facilities in probe are lower-level and need higher level manager sw function
examples:
• commercial (just one example, others exist)
  – cisco traffic director on workstation (manager)
  – cisco netscout probe on link
  – cisco mini-rmon in some switches
• freeware versions?!
  – BTNG (it’s dead Jim)
  – there aren’t any. is this a surprise?
  – ourmon ... (not SNMP-based)
software complexity notes:
• higher-level functions (e.g., rmon 2 or rmon 1 data packet capture)
  – require copious memory/CPU
  – 100 mbit ethernet link ... lots of data
• easy to ask too much of system
• probably best to not assume that manager A will interoperate with probe B
possible rmon uses
• what kind of questions might you ask?
  – how much IP vs IPX traffic?
  – how much traffic is web/news/ftp, whatever?
  – how utilized (full) is the pipe?
  – who talks to server X?
  – we have a problem with DHCP, we need to capture the packets and look
  – global ethernet errors on this link are what?
rmon control theory
• in general rmon groups (except for stats group) consist of control rows and per control row data rows
• e.g., one interface might have a control row that specifies HOW to sample data on a delta T time basis (every 30 secs make a snapshot)
• one or more data rows will be built up and stored in the probe, associated with that control row
• note control row is per i/f and it is possible to have more than one (different sample times)
control rows (tables)/data rows (tables)
abstract control row:
  index | i/f | time | owner | status
associated data samples:
  index | data #1
  index | data #2
  index | data #3
  more data, etc...
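The control row/data row split can be sketched as two tables tied together by a shared index. This is only an illustration under simplifying assumptions: `ControlRow`, `Probe`, and the interface name `if1` are hypothetical, and the status strings are a reduced form of the RMON row status values (`underCreation`, `valid`).

```python
from dataclasses import dataclass, field

@dataclass
class ControlRow:
    index: int            # ties this row to its data rows
    data_source: str      # which i/f to sample
    interval_secs: int    # delta T between samples
    owner: str
    status: str = "underCreation"  # manager must enable the row

@dataclass
class Probe:
    control_rows: dict = field(default_factory=dict)
    data_rows: dict = field(default_factory=dict)  # control index -> samples

    def insert(self, row):
        """Manager inserts a control row; no data is gathered yet."""
        self.control_rows[row.index] = row
        self.data_rows[row.index] = []

    def enable(self, index):
        """Manager sets the status field so sampling begins."""
        self.control_rows[index].status = "valid"

    def sample(self, index, value):
        """Probe stores a sample only for enabled control rows."""
        if self.control_rows[index].status == "valid":
            self.data_rows[index].append(value)

probe = Probe()
probe.insert(ControlRow(1, "if1", 30, "manager-a"))    # short period
probe.insert(ControlRow(2, "if1", 3600, "manager-a"))  # long period, same i/f
probe.enable(1)
probe.sample(1, 1234)
probe.sample(2, 9999)  # ignored: row 2 was never enabled
```

Two control rows on the same interface give two independent sample series, which is how different sample times coexist.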
notes:
• index mechanism must exist to tie together control and data rows
• in snmpv2, one may have an index that is not in the table (an array of structures, say, with an integer index and no such int in table) (true of RMON 2 groups)
• view mechanism exists in RMON to allow additional time-based table, thus
  – manager need only suck out NEW samples, plus efficient access as index is creation time
• manager must sometimes insert/enable control row (this is what status field is for)
notes, cont:
• memory needs can be quite large
• in some cases, samples will wrap
• control tables limit # of buckets (number of samples kept)
• manager may need to show up and suck out data in a timely fashion
statistics { rmon 1 }
• etherStatsTable/etherStatsEntry
• etherStatsIndex
• etherStatsDataSource - which i/f
• etherStatsDropEvents
• etherStatsOctets - byte count, includes bad pkts
• etherStatsPkts - includes bad pkts
• etherStatsBroadcastPkts
• etherStatsMulticastPkts
• etherStatsCRCAlignErrors
• etherStatsUndersizePkts (runts)
stats, cont.
• etherStatsOversizePkts (giants)
• etherStatsFragments
• etherStatsJabbers - giants with problems (e.g., CRC errs)
• etherStatsCollisions - estimate of # of collisions
• etherStatsPkts64Octets
• etherStatsPkts65to127Octets
• etherStatsPkts128to255Octets
• etherStatsPkts256to511Octets
• etherStatsPkts512to1023Octets
• etherStatsPkts1024to1518Octets
stats, cont.
• etherStatsOwner
• etherStatsStatus
statistics, notes:
• simplest rmon group
• note histogram mechanism for counts
• one entry per interface on probe
• no separate control table
• similar to dot3 in some ways, but dot3 is per interface, not per network
  – can approximate by adding values together in hub or switch (?)
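The histogram mechanism amounts to binning each packet length into one of the six size counters. A rough sketch, assuming lengths are given in octets; the function names are made up, and packets outside 64 to 1518 octets are simply skipped here (the probe counts those as undersize or oversize instead).

```python
import bisect

# Upper bound of each size bucket, paired with its counter name.
UPPER_BOUNDS = [64, 127, 255, 511, 1023, 1518]
COUNTERS = [
    "etherStatsPkts64Octets",
    "etherStatsPkts65to127Octets",
    "etherStatsPkts128to255Octets",
    "etherStatsPkts256to511Octets",
    "etherStatsPkts512to1023Octets",
    "etherStatsPkts1024to1518Octets",
]

def size_counter(length):
    """Return the counter name for a packet length, or None if out of range."""
    if length < 64 or length > 1518:
        return None
    return COUNTERS[bisect.bisect_left(UPPER_BOUNDS, length)]

def histogram(lengths):
    """Count packets per size bucket for a list of packet lengths."""
    counts = {name: 0 for name in COUNTERS}
    for length in lengths:
        name = size_counter(length)
        if name is not None:
            counts[name] += 1
    return counts

counts = histogram([64, 64, 100, 1500, 40])
print(counts["etherStatsPkts64Octets"])  # 2
```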
history { rmon 2 }
• historyControlTable (1)
  – historyControlEntry (1)
    » row entries
• etherHistoryTable (2)
  – etherHistoryEntry (1)
    » row entries
history { rmon 2 }
• historyControlTable/historyControlEntry
• historyControlIndex - 1-1 with values in data table
• historyControlDataSource - which interface
• historyControlBucketsRequested - request for data slots
• historyControlBucketsGranted - how many did you get
• historyControlInterval - per bucket sample time, seconds
• historyControlOwner
• historyControlStatus
notes:
• each row when enabled causes sampling to begin on a certain interface
  – gathering of “buckets” (samples) in associated data table
• note you can have more than one sample time on same interface (short period and long period, 1 minute, 1 hour)
• samples are stored during Interval, and then a new entry is created
• once bucketsGranted is used up, the buckets will wrap and start rewriting the oldest buckets (circular buffer scheme)
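The circular buffer scheme can be sketched with a bounded deque: once the granted number of buckets is full, each new sample pushes out the oldest one. This is an illustration only; `HistoryBuckets` and its field names are hypothetical, loosely modelled on the history control and data tables.

```python
from collections import deque

class HistoryBuckets:
    """Fixed number of sample buckets that wrap when full."""

    def __init__(self, buckets_granted):
        # deque with maxlen drops the oldest entry on overflow
        self.buckets = deque(maxlen=buckets_granted)
        self.next_sample_index = 1  # unique per sample, keeps increasing

    def add(self, interval_start, octets, pkts):
        self.buckets.append({
            "sample_index": self.next_sample_index,
            "interval_start": interval_start,
            "octets": octets,
            "pkts": pkts,
        })
        self.next_sample_index += 1

history = HistoryBuckets(buckets_granted=3)
for tick in range(5):  # five 30-second intervals
    history.add(interval_start=tick * 30, octets=1000 * tick, pkts=10 * tick)

# Only the newest three samples survive; samples 1 and 2 were overwritten.
print([b["sample_index"] for b in history.buckets])  # [3, 4, 5]
```

A manager that polls less often than buckets-granted times the interval will miss the overwritten samples.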
history data table
• etherHistoryTable/etherHistoryEntry
• etherHistoryIndex - matches control table
• etherHistorySampleIndex - unique per sample
• etherHistoryIntervalStart - sysUpTime at start of sample
• etherHistoryDropEvents
• etherHistoryOctets
• etherHistoryPkts
• etherHistoryBroadcastPkts
• etherHistoryMulticastPkts
• etherHistoryCRCAlignErrors
history data table, cont.
• etherHistoryUndersizePkts
• etherHistoryOversizePkts
• etherHistoryFragments
• etherHistoryJabbers
• etherHistoryCollisions
• etherHistoryUtilization - function of etherStatsOctets and etherStatsPkts
utilization
• this is fairly common in packet capture systems
• roughly over time T, how full was the pipe?
• utilization = (packet overhead + bits sent) / (interval * bits possible on link) * 100%
• on 10BASE, bits possible would be 10**7 per second
• packet overhead due to preamble & interframe gap
• packet overhead = packets * (96+64)
• bits sent = octets * 8
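As a worked example of the utilization formula, here is a small sketch that takes the packet and octet counts for one interval and returns a percentage. The function name and the default link rate parameter are illustrative; the 96 + 64 bits of per-packet overhead (interframe gap plus preamble) come from the slide.

```python
def utilization_percent(pkts, octets, interval_secs, link_bps=10**7):
    """Estimate link utilization over one sample interval, in percent."""
    overhead_bits = pkts * (96 + 64)  # interframe gap + preamble per packet
    data_bits = octets * 8            # octets on the wire, as bits
    return 100.0 * (overhead_bits + data_bits) / (interval_secs * link_bps)

# 1000 packets of 1000 octets each, seen in one second on a 10 Mbit/s link:
# (160,000 + 8,000,000) / 10,000,000 = 81.6%
print(utilization_percent(pkts=1000, octets=1_000_000, interval_secs=1))
```

For the same octet count, many small packets produce higher utilization than a few large ones, because the per-packet overhead term grows.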
utilization question/s:
• how long should the period be?
• how should this be interpreted with switches
  – interswitch (or switch to router)
  – servers
  – hosts
  – in light of full-duplex wires?
    » which should show NO collisions ...
hosts { rmon 4 }
• hostControlTable
  – hostControlEntry
    » control rows
• hostTable
  – hostEntry
    » data rows
• hostTimeTable
  – hostTimeEntry
    » data rows
host control table
• hostControlTable/hostControlEntry
  – hostControlIndex
  – hostControlDataSource
  – hostControlTableSize
  – hostControlLastDeleteTime - last time data deleted
  – hostControlOwner
  – hostControlStatus
hostTable (data, not time sorted)
• hostTable/hostEntry
  – hostAddress - mac address
  – hostCreationOrder - 1..N, relative creation order
  – hostIndex
  – hostInPkts
  – hostOutPkts - packet count
  – hostInOctets - byte count
  – hostOutOctets
  – hostOutErrors
  – hostOutBroadcastPkts and hostOutMulticastPkts
time table
• hostTimeTable/hostTimeEntry
  – hostTimeAddress
  – hostTimeCreationOrder
  – hostTimeIndex
  – hostTimeInPkts
  – hostTimeOutPkts
  – hostTimeInOctets
  – hostTimeOutOctets (same as data table ... from here on out)
notes:
• one entry per host (mac) per interface
• basically counts of bytes/packets in/out
• time table is a view (same data underneath) and is simply indexed by creation order
  – data table indexed by mac address
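The relationship between the data table and the time table can be sketched as one set of counters with two ways of walking it. This is a simplified illustration: `HostTable` and its method names are hypothetical, errors and broadcast/multicast counts are left out, and every source or destination MAC seen gets an entry.

```python
class HostTable:
    """Per-interface host counters, keyed by MAC address."""

    def __init__(self):
        self.by_mac = {}          # data table: indexed by MAC address
        self.creation_order = []  # MACs in the order they were first seen

    def _entry(self, mac):
        if mac not in self.by_mac:
            self.by_mac[mac] = {"in_pkts": 0, "out_pkts": 0,
                                "in_octets": 0, "out_octets": 0}
            self.creation_order.append(mac)
        return self.by_mac[mac]

    def observe(self, src, dst, octets):
        """Update counters for one packet from src to dst."""
        sender, receiver = self._entry(src), self._entry(dst)
        sender["out_pkts"] += 1
        sender["out_octets"] += octets
        receiver["in_pkts"] += 1
        receiver["in_octets"] += octets

    def time_view(self):
        """Time table: same entries, indexed by creation order 1..N."""
        return [(order, mac, self.by_mac[mac])
                for order, mac in enumerate(self.creation_order, start=1)]

table = HostTable()
table.observe("00:00:00:00:00:0a", "00:00:00:00:00:0b", 100)
table.observe("00:00:00:00:00:0c", "00:00:00:00:00:0a", 60)
```

Because the view is ordered by creation, a manager that remembers the last creation order it saw can fetch only the hosts that appeared since then.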
hostTopN { rmon 5 }
• hostTopNControlTable
  – hostTopNControlEntry
    » rows
• hostTopNTable
  – hostTopNEntry
    » rows
hostTopN control table
• hostTopNControlTable/hostTopNControlEntry
  – hostTopNControlIndex
  – hostTopNHostIndex
  – hostTopNRateBase - one of seven variables (next slide)
  – hostTopNTimeRemaining - time left in sample period
  – hostTopNDuration - absolute time of sample period
  – hostTopNRequestedSize
  – hostTopNGrantedSize
  – hostTopNStartTime - when sample time started
  – owner/status
rateBase - possible variables
• hostTopNInPkts
• hostTopNOutPkts
• hostTopNInOctets
• hostTopNOutOctets
• hostTopNOutErrors
• hostTopNOutBroadcastPkts
• hostTopNOutMulticastPkts
data table
• hostTopNTable/hostTopNEntry
  – hostTopNReport - matches hostTopNControlIndex (which report)
  – hostTopNIndex - per host in report
  – hostTopNAddress - host mac address
  – hostTopNRate - amount of change in selected variable for this report period
    » variable selected in hostTopNRateBase
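A top N report boils down to taking the change in one chosen counter per host over the report period and sorting. A minimal sketch, assuming the probe has a snapshot of the selected counter at the start and end of the period; `top_n` and the sample data are made up for illustration.

```python
def top_n(start_counters, end_counters, n):
    """Rank hosts by growth of the selected counter over the report period.

    Both arguments map MAC address -> counter value; hosts first seen
    during the period are treated as starting from zero.
    """
    rates = {mac: end - start_counters.get(mac, 0)
             for mac, end in end_counters.items()}
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Out-octet counters at the start and end of a report period.
start = {"mac-a": 1000, "mac-b": 5000}
end = {"mac-a": 9000, "mac-b": 5500, "mac-c": 3000}

report = top_n(start, end, n=2)
print(report)  # [('mac-a', 8000), ('mac-c', 3000)]
```

Note the ranking uses the change during the period, not the absolute counter: mac-b has the second-largest total but the smallest rate.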
matrix group (in brief)
• basically source by dest mac
  – count of pkts/octets (pkt count/byte count)
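The matrix idea can be sketched as counters keyed by (source, destination) pairs. This is an illustration only; `build_matrix`, `talks_to`, and the frame tuples are hypothetical and do not mirror the actual matrix table layout.

```python
from collections import defaultdict

def build_matrix(frames):
    """Accumulate pkt and octet counts per (src mac, dst mac) pair."""
    matrix = defaultdict(lambda: {"pkts": 0, "octets": 0})
    for src, dst, length in frames:
        cell = matrix[(src, dst)]
        cell["pkts"] += 1
        cell["octets"] += length
    return dict(matrix)

def talks_to(matrix, server):
    """Sources that sent at least one packet to the given destination."""
    return sorted(src for (src, dst) in matrix if dst == server)

frames = [
    ("mac-a", "mac-srv", 100),
    ("mac-a", "mac-srv", 200),
    ("mac-b", "mac-srv", 64),
    ("mac-srv", "mac-a", 1500),
]
matrix = build_matrix(frames)
print(matrix[("mac-a", "mac-srv")])  # {'pkts': 2, 'octets': 300}
print(talks_to(matrix, "mac-srv"))   # ['mac-a', 'mac-b']
```

A lookup like `talks_to` is one way to answer the "who talks to server X?" question from the matrix data.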
alarm { rmon 3 }
• alarmTable/alarmEntry
  – alarmIndex
  – alarmInterval - data sample period
  – alarmVariable - OID of variable being sampled
  – alarmSampleType - absolute or delta (previous sample)
  – alarmValue - value during last sample period
  – alarmStartupAlarm - rising/falling or both
  – alarmRisingThreshold
  – alarmFallingThreshold
alarm { rmon 3 }, cont.
• alarmTable/alarmEntry
  – ... cont.
  – alarmRisingEventIndex
  – alarmFallingEventIndex
  – alarmOwner
  – alarmStatus
how this works (overview)
• if value (counter/gauge) crosses rising threshold (and rising specified)
  – then generate alarm
• if value crosses falling threshold (and falling specified)
  – then generate alarm
• delta threshold sampled once per period
• use to look for too many errors during period X (or your idea here ...)
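The rising/falling logic can be sketched as a small state machine with hysteresis: after a rising alarm, another rising alarm is held back until the value has dropped to the falling threshold, and the reverse. This is a simplified model, assuming both alarm directions are enabled at startup; the function name, argument names, and event tuples are illustrative.

```python
def alarm_events(samples, rising, falling, sample_type="absolute"):
    """Return (sample position, kind, value) for each alarm generated.

    sample_type "absolute" compares each sample directly;
    "delta" compares the change since the previous sample.
    """
    events = []
    armed_rising = armed_falling = True  # assumes startup alarm = both
    previous = None
    for position, raw in enumerate(samples):
        if sample_type == "delta":
            if previous is None:  # first sample has no delta yet
                previous = raw
                continue
            value, previous = raw - previous, raw
        else:
            value = raw
        if value >= rising and armed_rising:
            events.append((position, "rising", value))
            armed_rising, armed_falling = False, True
        elif value <= falling and armed_falling:
            events.append((position, "falling", value))
            armed_falling, armed_rising = False, True
    return events

# Gauge samples against rising=10, falling=3.
print(alarm_events([5, 12, 15, 8, 2, 11], rising=10, falling=3))
# [(1, 'rising', 12), (4, 'falling', 2), (5, 'rising', 11)]
```

The sample of 15 produces no second rising alarm, which keeps a value hovering above the threshold from flooding the manager with traps.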
event group (summary)
• can generate
  – traps sent to manager
  – events stored in local event table (log history of events)
• both packet capture and alarm group can cause events stored here
conclusion - summary of capabilities
• remember that measurement may have two poles, relative to length of time samples:
  – 1. baseline of data over time
  – 2. measurement of what is going on NOW
• snmp focus generally on set of objects at one node - rmon focus on wire itself
• over-generalization, but rmon helps you focus on NOW and the general LINK
and the problem is: SWITCHES
• switches, of course, and the “death of promiscuous mode”
• instead of link focus, we can have all ports on switch focus, or vlan X on switch focus, or ports 1, 2, 3 on switch focus
• however we won’t be able to see all traffic on a broadcast domain
• rmon too expensive for cheaper switches at this time
• have to focus on key backbone switches
bigger cisco switches
• have mini-rmon; e.g., ethernet stats/rmon 1
• SPAN function to allow you to hook up external sniffer/rmon probe and suck down packets
  – aka port mirroring (ports/vlan, etc)
  – NOT inter-switch
keep in mind:
• rmon has LARGE # of function points
• other tools exist that may have rmon-like feature sets (but not all of it)
• e.g., packet capture freebies
  – tcpdump, snoop, etherfind (latter 2 on sun)
  – trafshow, arpwatch (show traffic of various kinds in some kind of real-time display)
some general tools in this area
• Cisco netflow
  – aggregate flow stats, UDP-based collection
• HPOV event generation
• ntop
  – open-source tool
  – like ourmon in some ways but details differ
• ourmon
  – open-source tool
  – network mgmt/anomaly detection
what is the real problem?
• too much data, not enough analysis
  – I don’t want all the flows
  – networks are evolving
    » p2p/skype/irc/games etc.
    » meaning protocols are not IETF-based
  – security problems are evolving too
    » today TCP worms rule
    » agobot/phatbot/rxbot
  – black hats have tools