Hyperion: High Volume Stream Archival (Divya Muthukumaran)
- Slides: 24
Area
- Network monitoring
  - Identify problems caused by overloaded or crashed servers, network connections, or other devices
  - Example: to determine the status of a web server, monitoring software may periodically send an HTTP request to fetch a page
Live Monitoring
- Packets are examined in real time
- Traffic statistics are computed and continually updated
- Captured packet headers are discarded once examined
- Why store packet headers?
  - Example: network forensics
    - Go back and examine the root cause of a problem
    - E.g., see how an intruder gained entry or how a worm infection happened
Why is such a system needed?
- Querying and examining live data
- Data archival: capture the data at wire speed, then index and store it
- Efficiently support retrieval and processing of archived data
- Specifically designed to handle the needs of high-volume stream archival
Why not traditional databases?
- Some statistics
  - A single gigabit link can generate over 100,000 packets and tens of MB of archival data per second
  - A monitor may record from multiple links
Design Principles
- Support queries, not reads
  - Implies the need to maintain indexes
- Writes
  - Sequential and immutable
- Archive locally, summarize globally
  - Scalability vs. the need to avoid flooding
    - Scalability favors local archiving and indexing to avoid network writes
    - Answering distributed queries favors sharing information across nodes
Hyperion: Three Key Components
- Stream file system
  - High-volume archiving and querying
- Multi-level index structure
  - High update rates plus reasonable lookup performance
- Distributed index layer
  - Distributes a summary of local indices to enable distributed querying
Design Choices for the Hyperion Storage System
- Storage of multiple high-speed traffic streams without loss
- Support for concurrent read activity without loss of write performance
- Re-use of storage in a buffer-like fashion
Stream File System
- Stores streams as opposed to files
- Characteristics
  - Recycled: when storage is full, new data replaces old data
    - In a general-purpose (GP) file system, new data is lost and old data is retained
  - Immutable
  - Record-oriented: data is written in fixed- or variable-length records
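
The "recycled" property can be illustrated with a toy in-memory stream. This is only a sketch of the idea and not Hyperion's on-disk layout: the class name `RecycledStream` and its byte-capacity accounting are assumptions made for illustration.

```python
from collections import deque


class RecycledStream:
    """Toy stream (hypothetical): append-only records, oldest dropped when full."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = deque()  # oldest record sits at the left end

    def append(self, record: bytes):
        if len(record) > self.capacity_bytes:
            raise ValueError("record larger than the whole stream")
        # Recycle: evict the oldest records until the new one fits.
        while self.used_bytes + len(record) > self.capacity_bytes:
            old = self.records.popleft()
            self.used_bytes -= len(old)
        # Records are never modified after this point (immutable).
        self.records.append(record)
        self.used_bytes += len(record)


stream = RecycledStream(capacity_bytes=10)
for rec in (b"aaaa", b"bbbb", b"cccc"):
    stream.append(rec)
print(list(stream.records))  # the oldest record has been replaced
```

A general-purpose file system would instead fail the third write with a "disk full" error and keep the oldest data.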
Can we use a GP FS?
- Need to map streams <=> files
Log File Rotation
Stream FS
Stream FS Organization
- Log-structured FS
  - What problem?
    - Cleaning/garbage collection
- Stream FS solves the cleaning problem
  - Guarantee: a storage guarantee for each stream
  - Small segment size (1 MB or ½ MB)
  - Check whether the next segment is surplus; if so, overwrite it, otherwise skip it
- Advantages?
  - Storage reservation
  - Best-effort use of remaining storage
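
The "overwrite if surplus, otherwise skip" rule can be sketched as follows. The slide does not define "surplus" precisely. This sketch assumes a segment is surplus when it is free or when its owning stream currently holds more segments than its guarantee. The function names and the list-of-owners disk model are hypothetical.

```python
def next_writable(segments, guarantees, write_ptr):
    """Return the index of the first surplus segment at or after write_ptr."""
    # Count how many segments each stream currently occupies.
    counts = {}
    for owner in segments:
        if owner is not None:
            counts[owner] = counts.get(owner, 0) + 1
    n = len(segments)
    for step in range(n):
        i = (write_ptr + step) % n
        owner = segments[i]
        # Assumed definition: free, or owner is above its guarantee.
        if owner is None or counts[owner] > guarantees.get(owner, 0):
            return i
        # Otherwise the segment is protected by a guarantee: skip it.
    raise RuntimeError("no surplus segment: guarantees exceed disk size")


def write_segment(segments, guarantees, write_ptr, stream):
    """Write one segment for `stream`; return the advanced write pointer."""
    i = next_writable(segments, guarantees, write_ptr)
    segments[i] = stream
    return (i + 1) % len(segments)


segments = ["A", "A", "B", "A"]      # owner of each on-disk segment
guarantees = {"A": 2, "B": 1}        # segments reserved per stream
ptr = write_segment(segments, guarantees, 0, "B")    # A holds 3 > 2: overwrite
ptr = write_segment(segments, guarantees, ptr, "B")  # A now at 2: skipped
```

No cleaning pass is needed in this model because the writer never relocates live data. It only steps over segments that a guarantee still protects.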
Reads
- First get the index
- Use the index to get the data
- Persistent handles
  - Returned from each write operation
  - Passed to the read operation to retrieve data
- What does the handle contain?
  - Disk location and approximate length
  - Allows data to be retrieved directly
Handle Issues
- Validate the handle. How?
  - Self-certifying record header
    - ID of the stream
    - Permissions of the stream
    - Record length
    - Hash (used for validating the handle)
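
A minimal sketch of how a handle might be validated against a self-certifying header. The header fields follow the slide, but the field widths, the use of a truncated SHA-256, and what the hash covers are all assumptions. A `bytearray` stands in for the disk.

```python
import hashlib
import struct

# Assumed header layout: stream id, permissions, record length, 8-byte hash.
HEADER = struct.Struct("<IHI8s")


def _digest(stream_id, perms, payload):
    data = struct.pack("<IH", stream_id, perms) + payload
    return hashlib.sha256(data).digest()[:8]


def write_record(disk: bytearray, stream_id: int, perms: int, payload: bytes):
    """Append a record and return its persistent handle."""
    offset = len(disk)
    tag = _digest(stream_id, perms, payload)
    disk.extend(HEADER.pack(stream_id, perms, len(payload), tag) + payload)
    return (offset, len(payload), tag)  # location, approximate length, hash


def read_record(disk: bytearray, handle):
    """Return the payload, or None if the handle no longer matches the disk."""
    offset, _approx_len, tag = handle
    if offset + HEADER.size > len(disk):
        return None
    _sid, _perms, length, stored = HEADER.unpack_from(disk, offset)
    if stored != tag:
        return None  # location was recycled: the handle is stale
    start = offset + HEADER.size
    return bytes(disk[start:start + length])


disk = bytearray()
h1 = write_record(disk, stream_id=1, perms=0o600, payload=b"pkt-1")
h2 = write_record(disk, stream_id=1, perms=0o600, payload=b"pkt-22")
```

Because the handle carries the disk location, a read goes straight to the record without a directory lookup. The header check is what makes it safe to hold handles across storage recycling.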
Stream FS Organization
- Record
  - Variable length
  - On-disk record plus header
- Block
  - Fixed length
  - Holds multiple records of the same stream
- Block map
  - Every nth block
  - Holds the stream ID and in-stream sequence number for each of the preceding n-1 blocks
  - Used for easy write allocation
Stream FS Organization
Indexing
- Uses signature-based indices
- A signature for each segment
  - Can check whether a record with key k is present in the segment
  - Does not tell you where in the segment the record is
Multi-level Indices
Multi-level Indices
- Uses a Bloom filter
  - Hash(key) -> b bits
  - Of the b bits, k bits are set to 1
  - Segment signature: Hs = H(key 1) || H(key 2) … || H(key n), where || is a bitwise OR
- How to check for the presence of a record?
  - Compute the hash of its key kr, H(kr)
  - If a bit is set in H(kr) but not set in Hs, then the value is not present
- False positives are possible
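
The signature test above can be sketched in a few lines. The values of `B_BITS` and `K_BITS` and the SHA-256-based bit selection are assumptions chosen for illustration. The slides do not say which hash functions or sizes Hyperion uses.

```python
import hashlib

B_BITS = 64  # signature width b (assumed value)
K_BITS = 3   # bits set per key k (assumed value)


def key_hash(key: bytes) -> int:
    """Map a key to a b-bit word with up to k bits set."""
    word = 0
    for i in range(K_BITS):
        digest = hashlib.sha256(bytes([i]) + key).digest()
        word |= 1 << (int.from_bytes(digest[:4], "big") % B_BITS)
    return word


def segment_signature(keys) -> int:
    """Hs: bitwise OR of the hashes of every key in the segment."""
    sig = 0
    for key in keys:
        sig |= key_hash(key)
    return sig


def may_contain(sig: int, key: bytes) -> bool:
    """False means definitely absent; True means present or a false positive."""
    h = key_hash(key)
    return (h & ~sig) == 0  # no bit of H(kr) is missing from Hs


segment_keys = [b"10.0.0.1", b"10.0.0.2", b"192.168.1.7"]
hs = segment_signature(segment_keys)
```

Every key that was added passes `may_contain`, so a negative answer lets a query skip the whole segment. A positive answer still requires scanning the segment, since the signature does not say where the record is.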
Distributed Index
- How to handle distributed queries without flooding?
  - Maintain a distributed index
  - Gives an integrated view of all nodes
- A coarse-grained summary of the data at each node is needed
  - Can use the top-level index in Hyperion
- One index node per time interval
  - All nodes send their top-level indices to this node
  - Temporally distributed index
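
One way to picture "one index node per time interval" is a function from a timestamp to the node responsible for that interval. The round-robin assignment below is an assumption made for illustration, since the slides do not say how the index node is chosen. The function names are hypothetical.

```python
def index_node_for(timestamp, interval_s, nodes):
    """Node that collects every node's top-level index for this interval."""
    interval = int(timestamp // interval_s)
    return nodes[interval % len(nodes)]  # assumed: round-robin by interval


def index_nodes_for_range(t_start, t_end, interval_s, nodes):
    """Index nodes a query over [t_start, t_end] needs to contact, in order."""
    first = int(t_start // interval_s)
    last = int(t_end // interval_s)
    return [nodes[i % len(nodes)] for i in range(first, last + 1)]


nodes = ["n0", "n1", "n2"]
owner = index_node_for(125, interval_s=60, nodes=nodes)
contacts = index_nodes_for_range(50, 190, interval_s=60, nodes=nodes)
```

A query bounded in time then contacts only the index nodes for the intervals it spans, rather than flooding every monitor.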