Netflow Collection & Processing David Ripley
David A. J. Ripley MSc., ARCS
daripley@indiana.edu
Lead Network Security Developer, Advanced Network Management Laboratory, Indiana University
• Network security infrastructure development and research for the ANML.
• Background in physics, image processing, satellite remote sensing, system administration.
Overview
• What is a “flow”?
• What is Netflow specifically?
• Netflow collection infrastructure.
• Netflow processing, problems and issues
Netflow Recap
• Q. What is a flow?
• A. In a general sense, a flow is a series of packets with some attribute(s) in common.
Netflow Recap
• Common attributes define a flow
  • Source and/or destination of the traffic.
  • Protocol - TCP, UDP, ICMP?
  • Timing - start, end, and duration of the traffic.
  • Routing information - interfaces, AS, etc.
Netflow Recap
• Flows can be unidirectional or bidirectional - the latter adds possible information.
• Aggregated flows.
• Application flows - classify packets by inspecting their contents
• We’re not going to worry too much about these cases.
Netflow Recap
• As far as we’re concerned, a flow is a series of packets with the same:
  • IP Protocol (UDP, TCP, ICMP)
  • Source and destination ports
  • Source and destination addresses
Netflow Recap
• The recording of a flow is subject to idiosyncrasies of sampling frequency and sampling window
• Bucket timeout - systems typically consider one minute windows.
  • Flows longer than one minute will appear as two or more flow records
  • Multiple flows (with the same characteristics) within a single one minute window will appear as a single flow record
• Sampling rate
  • Router will only consider one out of every N packets; N=??? - data loss vs. expensive operations.
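
The bucket effect can be sketched in a few lines of Python. This is only an illustration, not how a router implements it: the packet tuples, the `bucket_records` helper and the fixed 60 second window are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed bucket length ("one minute windows")


def bucket_records(packets, window=WINDOW_SECONDS):
    """Group (timestamp, flow_key, size) packets into per-window flow records.

    Hypothetical helper: one record per (flow_key, window index) pair.
    """
    records = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for timestamp, flow_key, size in packets:
        bucket = int(timestamp // window)  # index of the window the packet falls in
        record = records[(flow_key, bucket)]
        record["packets"] += 1
        record["bytes"] += size
    return dict(records)


# Key "A": one long transfer that crosses a window boundary.
# Key "B": two short exchanges inside the same window.
packets = [
    (10.0, "A", 1500), (70.0, "A", 1500),
    (5.0, "B", 100), (40.0, "B", 100),
]
records = bucket_records(packets)
```

Here "A" ends up as two records (windows 0 and 1), while the two "B" exchanges collapse into a single record.
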
An example
• Host A gets a web page from Host B
• This will show up as two flows (usually)
  • Host A, port 12345 -> Host B, port 80
  • Host B, port 80 -> Host A, port 12345
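
A minimal sketch of why the web example gives two flows, assuming packets are simple dictionaries and using made-up documentation addresses for Host A and Host B. The `FlowKey` tuple and `flow_key` helper are names invented for this example.

```python
from collections import Counter, namedtuple

# The attributes that define a flow for our purposes.
FlowKey = namedtuple("FlowKey", "protocol src_addr src_port dst_addr dst_port")


def flow_key(packet):
    """Build the flow key for one packet (a dict with hypothetical field names)."""
    return FlowKey(packet["protocol"], packet["src_addr"], packet["src_port"],
                   packet["dst_addr"], packet["dst_port"])


HOST_A, HOST_B = "192.0.2.1", "192.0.2.2"  # illustrative addresses only

packets = [
    # Host A asks Host B for the page ...
    {"protocol": "TCP", "src_addr": HOST_A, "src_port": 12345,
     "dst_addr": HOST_B, "dst_port": 80},
    # ... Host B answers ...
    {"protocol": "TCP", "src_addr": HOST_B, "src_port": 80,
     "dst_addr": HOST_A, "dst_port": 12345},
    # ... and Host A acknowledges.
    {"protocol": "TCP", "src_addr": HOST_A, "src_port": 12345,
     "dst_addr": HOST_B, "dst_port": 80},
]

# Packets per flow: the two directions have different keys, so two flows.
flows = Counter(flow_key(p) for p in packets)
```
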
Why Netflow?
• What kinds of information can we gather?
  • What percentage of traffic on the network is web traffic? ssh? IRC?
  • What is the average transfer rate for network communications?
  • Who uses the network the most?
  • Have usage patterns changed over time?
  • For the Chicago region, how much of the traffic of the region is staying in the region?
  • Many others
Why Netflow?
• Historically, traffic accounting, acceptable use enforcement;
  • Researchers and engineers needed to answer all kinds of questions about network traffic.
  • Traffic accounting in the form of flow records provided that information.
Why Netflow?
• Traffic Engineering/Accounting
  • How traffic is shared with competitors; how customers are billed.
• Security/Policy monitoring
  • DoS/DDoS detection
• Research
  • Measuring the growth of networks
  • Identifying how the network is being used.
What data is there?
• It depends.
• We keep talking about “flows” - we really mean Cisco’s Version 5 flow records
  • A Cisco-defined “standard”
  • Used on Abilene - so that’s what we use.
Netflow Version 5
• Cisco-defined de-facto standard
  • Efforts are underway in the IETF to make this standard official
• Flows are exported as UDP packets
  • Each packet contains a number of flow records plus a header with information common to these records
  • Delivery is not guaranteed!
    • There are sequence numbers so we know how many packets we’ve lost.
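
A rough sketch of loss detection from the sequence numbers. It assumes that the v5 sequence number is a running count of flow records exported so far (so a gap measures lost records rather than lost packets) and that packets arrive in order; `count_lost_flows` is a name made up for the example.

```python
SEQUENCE_MODULUS = 2 ** 32  # the sequence number is a 32-bit counter


def count_lost_flows(headers):
    """Estimate lost flow records from (flow_sequence, count) header pairs.

    Assumes in-order arrival; a reordered packet would be miscounted.
    """
    lost = 0
    expected = None
    for flow_sequence, count in headers:
        if expected is not None and flow_sequence != expected:
            # Gap between what we expected and what arrived, allowing wraparound.
            lost += (flow_sequence - expected) % SEQUENCE_MODULUS
        expected = (flow_sequence + count) % SEQUENCE_MODULUS
    return lost


# Three packets of 30 records each; the one starting at sequence 60 never arrived.
lost = count_lost_flows([(0, 30), (30, 30), (90, 30)])
```
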
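Netflow V5 Header (24 bytes; each row is one 4-byte word)
Bytes 1-2: Version | Bytes 3-4: Count
Bytes 1-4: SysUpTime
Bytes 1-4: UNIX Seconds (seconds since Epoch)
Bytes 1-4: UNIX Nanoseconds (residual nanoseconds)
Bytes 1-4: Flow Sequence Number
Byte 1: Engine Type | Byte 2: Engine ID | Bytes 3-4: Reserved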
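
The header layout above maps directly onto Python's `struct` module. This is a sketch rather than a production parser: the field names and the `parse_v5_header` helper are our own, and the sample datagram is fabricated purely to exercise the code.

```python
import struct

# Network byte order: two 16-bit fields, four 32-bit fields,
# two 8-bit fields and a final 16-bit field (24 bytes in total).
V5_HEADER = struct.Struct("!HHIIIIBBH")

HEADER_FIELDS = (
    "version", "count", "sys_uptime_ms", "unix_secs", "unix_nsecs",
    "flow_sequence", "engine_type", "engine_id", "reserved",
)


def parse_v5_header(datagram):
    """Decode the header at the start of a v5 export datagram into a dict."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram too short for a v5 header")
    header = dict(zip(HEADER_FIELDS, V5_HEADER.unpack_from(datagram)))
    if header["version"] != 5:
        raise ValueError("not a version 5 export packet")
    return header


# Fabricated header: 2 records, router up for one hour, sequence number 42.
sample = V5_HEADER.pack(5, 2, 3_600_000, 1_100_000_000, 500_000_000, 42, 0, 1, 0)
header = parse_v5_header(sample)
```
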
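Netflow V5 Record (48 bytes; each row is one 4-byte word)
Bytes 1-4: Source IP Address
Bytes 1-4: Destination IP Address
Bytes 1-4: Next Hop IP Address
Bytes 1-2: Input ifIndex | Bytes 3-4: Output ifIndex
Bytes 1-4: Packets
Bytes 1-4: Bytes
Bytes 1-4: Start time of flow
Bytes 1-4: End time of flow
Bytes 1-2: Source port | Bytes 3-4: Destination port
Byte 1: Padding | Byte 2: TCP Flags | Byte 3: IP Protocol | Byte 4: TOS
Bytes 1-2: Source AS | Bytes 3-4: Destination AS
Byte 1: Source Mask Length | Byte 2: Destination Mask Length | Bytes 3-4: Padding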
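
The record layout can be decoded the same way. Again a sketch under assumptions: the field names and `parse_v5_records` are invented for the example, the records are assumed to follow a 24-byte header, and the sample record uses made-up addresses and AS numbers.

```python
import socket
import struct

# One 48-byte flow record in network byte order; "x" marks padding bytes.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")

RECORD_FIELDS = (
    "src_addr", "dst_addr", "next_hop", "input_if", "output_if",
    "packets", "bytes", "first_ms", "last_ms", "src_port", "dst_port",
    "tcp_flags", "protocol", "tos", "src_as", "dst_as", "src_mask", "dst_mask",
)

HEADER_LENGTH = 24  # records start after the export header


def parse_v5_records(datagram, count, offset=HEADER_LENGTH):
    """Decode `count` flow records from a v5 export datagram."""
    records = []
    for index in range(count):
        values = V5_RECORD.unpack_from(datagram, offset + index * V5_RECORD.size)
        record = dict(zip(RECORD_FIELDS, values))
        for name in ("src_addr", "dst_addr", "next_hop"):
            record[name] = socket.inet_ntoa(record[name])  # bytes -> dotted quad
        records.append(record)
    return records


# Fabricated datagram: a blank header followed by one TCP record to port 80.
sample_record = V5_RECORD.pack(
    socket.inet_aton("192.0.2.1"), socket.inet_aton("192.0.2.2"),
    socket.inet_aton("0.0.0.0"), 1, 2, 10, 4000, 1000, 2000,
    12345, 80, 0x1B, 6, 0, 64500, 64501, 24, 24,
)
records = parse_v5_records(b"\x00" * HEADER_LENGTH + sample_record, count=1)
```
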
Convenience, or lack of it
• Flow records are exported in a format that is convenient for the router, not for us.
  • e.g. the flow start and end times are in a form that is not immediately useful: milliseconds since system boot.
  • We have to combine data from individual flow records with header data.
    • Seconds since epoch is the Right Thing
    • Flow Start Time = Unix Seconds + Unix Nanoseconds - SysUpTime + flow_start
    • (After we’ve converted all these to the right units)
  • ICMP Type is stored in the destination port field
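
The start-time formula, with the unit conversions spelled out, might look like this. The function and parameter names are ours, and the sketch ignores wraparound of the 32-bit millisecond uptime counter. The ICMP helper additionally assumes the common encoding of type in the high byte and code in the low byte of the destination port field.

```python
def flow_time_epoch(unix_secs, unix_nsecs, sys_uptime_ms, flow_ms):
    """Convert a flow start/end time (ms since boot) to seconds since the epoch."""
    export_time = unix_secs + unix_nsecs / 1e9          # header timestamp, seconds
    boot_time = export_time - sys_uptime_ms / 1000.0    # when the router booted
    return boot_time + flow_ms / 1000.0


def icmp_type_and_code(dst_port):
    """Split the destination port field of an ICMP record.

    Assumption: type is in the high byte, code in the low byte.
    """
    return dst_port >> 8, dst_port & 0xFF


# Header exported at 1100000000.5 s with one hour of uptime;
# the flow started 59 minutes after boot.
start = flow_time_epoch(1_100_000_000, 500_000_000, 3_600_000, 3_540_000)
```

The flow in the example began one minute before the packet was exported, so `start` is 60 seconds earlier than the header timestamp.
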
Examining Netflow
• Part of our job is using netflow data to see what’s happened/is happening on the network
• We spend a significant amount of time processing the archived data looking for particular behaviors.
  • Typically in response to institutional requests
Netflow Collection
• We collect flow data from Abilene core routers.
  • Archives raw records (up to 3 months)
  • (Redirect to other lab machines)
• Primary data source for research & responses to operational issues.
Problems with Preprocessing
• We can do all kinds of pre-processing ahead of time.
  • You rarely know what kind of behaviour you’re going to be looking for ahead of time.
  • You can’t cover all the bases
  • Waste time generating products that you’ll never use.
• But there are some simple things that are very useful.
MS-RPC (Attempts)
MS-RPC Infections (Maybe)
Traffic Graphing
• Something as simple as graphing traffic volume can be a pain in the neck
  • How much traffic went to/from a given range of addresses, IP ports, etc.
  • Often done using counters on routers
    • There are serious performance issues with this; the number of counters is limited.
• It’s relatively easy if you know what you’re looking for
• But we need perspective; we have to be able to turn back the clock
  • Using counters on routers just doesn’t work for this.
Traffic Graphing
• Even with services running on known ports, there are too many in use to record all of them using routers
• “Bad” traffic has a habit of turning up on odd ports
  • It’s kind of obliged to.
Traffic Graphing
• 2^16 source ports, 2^16 destination ports;
  • A lot.
  • We can get this information from the netflow archive;
  • But it’s a lot of detailed data to plough through, takes a long time.
  • We can aggregate the data as it comes in.
• Even more hosts/networks than ports
  • It’s hard to estimate the number of hosts;
  • Somewhere around 9 or 10 million on Abilene
Traffic Graphing
• Simple aggregation of flow records
  • 15 minute intervals (convenient given archive granularity)
  • Break data into ICMP/TCP/UDP
  • Aggregate by source port, destination port, source address, destination address, and AS number
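
The aggregation step can be sketched as follows. This is an illustration in Python, not the scripts we actually run: the record dictionaries, the `aggregate` helper and the sample values are assumptions made for the example.

```python
from collections import defaultdict

INTERVAL_SECONDS = 15 * 60  # 15 minute intervals
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IP protocol numbers


def aggregate(records, field):
    """Sum flows, packets and bytes per (interval start, protocol, field value).

    `field` names the attribute to aggregate by, e.g. "dst_port" or "src_addr".
    """
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for record in records:
        protocol = PROTOCOL_NAMES.get(record["protocol"])
        if protocol is None:
            continue  # only ICMP/TCP/UDP are broken out
        # Round the start time down to the beginning of its 15 minute interval.
        interval_start = int(record["start"] // INTERVAL_SECONDS) * INTERVAL_SECONDS
        bucket = totals[(interval_start, protocol, record[field])]
        bucket["flows"] += 1
        bucket["packets"] += record["packets"]
        bucket["bytes"] += record["bytes"]
    return dict(totals)


# Made-up records; "start" is in seconds since the epoch.
records = [
    {"start": 100, "protocol": 6, "dst_port": 80, "packets": 10, "bytes": 4000},
    {"start": 800, "protocol": 6, "dst_port": 80, "packets": 5, "bytes": 1000},
    {"start": 1000, "protocol": 6, "dst_port": 80, "packets": 1, "bytes": 60},
    {"start": 120, "protocol": 17, "dst_port": 53, "packets": 2, "bytes": 150},
]
by_port = aggregate(records, "dst_port")
```

Calling `aggregate` once per field (source port, destination port, addresses, AS number) gives one small table per view, which is why plotting by port loses the address information and vice versa.
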
Traffic Graphing
• How do we go about this?
  • Some cron and some Perl scripts aggregate new flow records and put them into the database every half hour
  • There’s a web front end so we can take a look at the graphs.
Traffic Graphing
Traffic Graphing
• This is not exactly rocket science;
  • And yet not many people do this kind of thing.
• We get requests all the time
  • “Can I see the traffic on ports X, Y and Z for the last couple of weeks?”
Traffic Graphing
• Upside:
  • We can generate a historical view of traffic to or from any source or destination port; any Autonomous System; or any IP address or prefix.
• Downside:
  • Aggregation means loss of data;
    • Plot traffic to a given port, you lose IP info and vice versa.
  • It still takes a while (but only a few minutes)
Traffic Graphing
Traffic Graphing
Vague Questions
• Why is this important?
  • Perspective matters. History teaches us, even if it’s just the history of network traffic over the past couple of weeks.
• Why isn’t it more common? Why doesn’t everyone do it?
  • Because they don’t think it’s especially important
• It’s rather broad, isn’t it?
  • Macro and micro.