Stanford Linear Accelerator Center Oracle Archiver Past Experience

  • Slides: 20
Download presentation
Stanford Linear Accelerator Center Oracle Archiver Past Experience Lessons Learned for Future EPICS Channel

Stanford Linear Accelerator Center Oracle Archiver Past Experience Lessons Learned for Future EPICS Channel Archiver RDB Development R. D. Hall EPICS Collaboration Mtg Oct. 13 -14, 2007 1

Topics • • • Background System Elements Key Technologies Retrieval Considerations Table Structure Summary

Topics • • • Background System Elements Key Technologies Retrieval Considerations Table Structure Summary R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 2

Background • Designed and developed from late 2001 through 2002 by Lee Ann Yasukawa

Background • Designed and developed from late 2001 through 2002 by Lee Ann Yasukawa and Bob Hall utilizing version 1 of the EPICS Channel Archiver • Thomas Birke of BESSY developed the Oracle Lib. IO C++ classes used to support retrieval of Oracle data • Was in production use at SLAC from February 2003 until the end of 2004 R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 3

Factors in Discontinuing Use • The new version 2 of the Channel Archiver became

Factors in Discontinuing Use • The new version 2 of the Channel Archiver became available with many improvements. We could also use the Archiver Viewer with this version, which gave our users an improved browser. • The retrieval speed of the file-based version 2 channel archiver was better than the Oracle archiver. • Discovered in late 2004 that the Stanford license for Oracle did not cover our use of the Oracle partitioning feature at SLAC, which would cost our group approximately $ 10 K. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 4

System Elements • Oracle Channel Archiver Engine based on the version 1 EPICS Channel

System Elements • Oracle Channel Archiver Engine based on the version 1 EPICS Channel Archiver • Channel Archiver Monitor program running every 5 minutes to check the health of the Archiver Engine by reading the Engine log. • Archive Browser written in Python and using SWIG to be able to call Lib. IO C++ routines. Capable of browsing scalar data in a plot window and storing data of interest into Matlab files for further analysis. • Data may also be viewed using the version 1 EPICS Channel Archiver CGIExport facility. • Matlab Archiver Retrieval facility written in C using Matlab MEX to retrieve Oracle Archiver data via function calls from within Matlab. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 5

Oracle Archiver SCS AFS File Server MCC Computer Room Web Browsers (for CGI Export)

Oracle Archiver SCS AFS File Server MCC Computer Room Web Browsers (for CGI Export) Web Server 2 -T 3 Disks Retrieve Oracle 9 i Client Web Browsers (for CGI Export) RAID Controller SQL Listener Oracle 9 i Instance Apache Listener Archive Browser TARF Archiver (opi 00 gtw 04) R. D. Hall PEPII Archiver (slcs 2) EPICS Collaboration Mtg 2007 8 Pack Archiver (opi 00 gtw 05) Oct. 13 -14, 6

Key Technologies • Oracle Application Direct Path Storage • Oracle Partitioning R. D. Hall

Key Technologies • Oracle Application Direct Path Storage • Oracle Partitioning R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 7

Direct Path Storage • The Oracle direct path mechanism bypasses the normal, conventional-path Oracle

Direct Path Storage • The Oracle direct path mechanism bypasses the normal, conventional-path Oracle buffering mechanism used with conventional SQL INSERT statements to write directly to Oracle data files. • Results in dramatic increases in data storage rates when loading large amounts of data into Oracle. • During scalar EPICS data performance testing over 16, 000 samples per second could be stored. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 8

Direct Path Storage • Oracle Archive Engine calls C routines that call Oracle Call

Direct Path Storage • Oracle Archive Engine calls C routines that call Oracle Call Interface (OCI) routines to store data into Oracle tables using the direct path mechanism. • OCI routines are low-level, C callable routines optimized for performance and provide the only API available for using direct path storage. • OCI routines are quite low-level and require considerable programmer time and effort to call correctly. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 9

Direct Path Storage • Blocks of data (e. g. , 2 K rows) are

Direct Path Storage • Blocks of data (e. g. , 2 K rows) are stored together, not individual rows. This means that data is not stored until a block is full or logic is added to flush partially filled blocks periodically. • Direct path loading locks the storage table for any other Oracle operations other than non-indexed SELECT statements. Therefore, active data storage tables cannot be indexed and other methods must be used to improve data retrieval performance for these tables (e. g. partitioning). • A restricted number of column datatypes are supported for direct path load operations. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 10

Partitioning • Partitioning allows active data storage and archive data storage tables to be

Partitioning • Partitioning allows active data storage and archive data storage tables to be subdivided by time ranges. • This feature can dramatically speed data retrieval requests due to partition elimination. That is, requests for data for a specified time span from a partitioned table subdivided by time ranges allows Oracle to search only those partitions that may contain data for the specified time range rather than the whole table. • This is critical to retrieval performance when data tables become very large. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 11

Partitioning • Partitioning can allow simplified data maintenance. Maintenance operations that need to be

Partitioning • Partitioning can allow simplified data maintenance. Maintenance operations that need to be performed on only a subset of partitions do not need to be applied to an entire table. • Data from old partitions may be conveniently stored to table and dropped from online storage. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 12

Partitioning • At SLAC relatively small 10 minute partitions were used for the active

Partitioning • At SLAC relatively small 10 minute partitions were used for the active storage data tables to allow fast retrieval of the most recent data, which tends to be the data most requested by users. • After midnight each night a job ran which effectively moved the data in the 10 minute partitions of the previous day for each data storage table into a day long archive partition, which was then indexed to improve data retrieval performance. • This strategy allowed reasonably fast data retrieval while keeping the number of partitions manageable. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 13

Daily Processing of Partitions • This was accomplished by copying all partitions for the

Daily Processing of Partitions • This was accomplished by copying all partitions for the previous day into a temporary table, validating the data in the table and creating indexes, and then exchanging the temporary table (along with the indexes) with a partition in the archived data table. • This method allowed the use of the Oracle PL/SQL “alter table exchange partition” command, which is very fast and does not involve the physical copying of data. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 14

Daily Processing of Partitions (Scalar Data) arch_data_f arch_data_i temp_data_f temp_data_i daily partitions ~10 min

Daily Processing of Partitions (Scalar Data) arch_data_f arch_data_i temp_data_f temp_data_i daily partitions ~10 min partitions All partitions for yesterday are copied to a temporary table R. D. Hall archive_data_f archive_data_i The data is validated and indexes are created EPICS Collaboration Mtg 2007 The temporary table, along with indexes, is exchanged with a partition in the archived data tables Oct. 13 -14, 15

Retrieval Considerations • Data retrieval is accomplished by SQL SELECT statements made with OCI

Retrieval Considerations • Data retrieval is accomplished by SQL SELECT statements made with OCI routine calls. • The SELECT statement contains the channel identification and the time range of interest. • There are two tables for each supported data type: an active data table and an archive data table. • Direct path loading is done in an active data table, where indexing is not allowed. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 16

Retrieval Considerations • Since there are two tables for each supported data type, an

Retrieval Considerations • Since there are two tables for each supported data type, an Oracle view is created to retrieve data for each data type. An Oracle view is a logical table comprised of one or more physical tables. • The retrieval SQL queries the view instead of querying the active and archive data tables directly. The views are created as read-only. • Views allow flexibility as to what data the user has access to. That is, the definition of the view may be changed to retrieve data in a different way without changes to retrieval code. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 17

Retrieval Considerations • In addition to the use of timestamp range partitioning, the use

Retrieval Considerations • In addition to the use of timestamp range partitioning, the use of indexes for archive data is crucial to retrieval performance. • The archive data tables are indexed by a combination of timestamp and channel identification for best retrieval performance. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 18

Table Structure Overview arch_pvname pv_id arch_meta arch_table_data arch_data_i arch_data_f (scalar integer) (scalar float) pv_id

Table Structure Overview arch_pvname pv_id arch_meta arch_table_data arch_data_i arch_data_f (scalar integer) (scalar float) pv_id stat value sevr timestamp ostat pv_name pv_starttime pv_id pv_endtime pv_active pv_id data_start data_table data_end archive_wave_i archive_wave_f (waveform integer) (waveform float) nanosecs R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 19

Summary • There were some limitations to the Oracle Archiver that we did not

Summary • There were some limitations to the Oracle Archiver that we did not have time to address (e. g. , the ability to store strings). • Also there were many ideas for new features that were not implemented. A database related example is the creation of an Oracle management web interface to specify the partition scheme and manage user connections. • The SLAC Oracle Archiver project was successful. However, the retrieval performance was only “acceptable” and not nearly as fast as version 2 of the EPICS file-based Channel Archiver. This was the area of greatest concern. At the time the project was abandoned, we did not run out of ideas to experiment with to improve retrieval performance, such as creating better indexes and changing the partition characteristics. R. D. Hall EPICS Collaboration Mtg 2007 Oct. 13 -14, 20