1 Indexing Innovations 14.2 Seminar Filing Procedures

2 Session Agenda
• Filing and word breaking procedures
• Indexing procedures - new features:
  – Parallel processing
  – Updating a group of indexes
• New indexing routines

3 Parallel Processing

4 Parallel Processing
Problems with indexing batch routines in the early versions of ALEPH:
• Long run time
• Computer resources not fully utilized: a single process per stage
• No recoverability: if indexing failed, the whole building process needed to be rerun

5 Parallel Processing
• In 14.2, all the index creation jobs (with the exception of p_manage_27) enable parallel processing.

6 Parallel Processing
• Optimal utilization of computer resources (large databases, multiple processors)
• Certain stages of index creation can be split into several cycles, which allows you to divide the workload among different processors
• Indexing is much quicker
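As a rough illustration of splitting a stage into cycles, the sketch below divides a document-number range into contiguous sub-ranges, one per cycle. It is not ALEPH code: the function name `split_into_cycles` and the idea of partitioning by document number are assumptions made for the example.

```python
from typing import List, Tuple


def split_into_cycles(first_doc: int, last_doc: int, cycles: int) -> List[Tuple[int, int]]:
    """Divide an inclusive document-number range into contiguous cycles.

    Hypothetical helper for illustration only; it is not part of ALEPH.
    """
    if cycles < 1 or last_doc < first_doc:
        raise ValueError("need at least one cycle and a non-empty range")
    total = last_doc - first_doc + 1
    base, extra = divmod(total, cycles)
    ranges = []
    start = first_doc
    for i in range(cycles):
        # The first `extra` cycles take one additional document each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more cycles requested than documents available
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for number, (lo, hi) in enumerate(split_into_cycles(1, 90000, 9), start=1):
        print(f"cycle {number:04d}: {lo:09d} - {hi:09d}")
```

Each resulting range could then be handed to a separate process, which is the workload division the slide describes.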

7 Parallel Processing – Tracking
• Assignment progress table: good control of indexing stages
[Progress table: cycles 0001 to 0009, each with its document-number range and a status sign]
+ success
? in process
- not processed

8 Parallel Processing - Recovery
If:
• database tables need to be enlarged
• not enough disk space - intermediate files
• not enough disk space - sort
• general disaster
You do not have to rerun the whole process!

9 Parallel Processing - Recovery
Stages:
• identify last successful section
• change “in process” signs (?) to “not processed” signs (-)
• rerun discrete stage scripts. For example:
  – p_manage_01_a
  – p_manage_01_c
  – p_manage_01_d1
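The first two recovery stages can be sketched as follows. This is only a model of the idea, assuming the progress table can be read as a mapping from cycle number to status sign; the real table format and the function names here are hypothetical. "Last successful section" is interpreted as the highest-numbered cycle marked "+", which is one possible reading.

```python
from typing import Dict, List, Optional


def last_successful(statuses: Dict[str, str]) -> Optional[str]:
    """Return the highest-numbered cycle marked '+', or None if there is none."""
    done = [cycle for cycle, sign in statuses.items() if sign == "+"]
    return max(done) if done else None


def reset_in_process(statuses: Dict[str, str]) -> Dict[str, str]:
    """Return a copy in which every '?' (in process) becomes '-' (not processed)."""
    return {cycle: ("-" if sign == "?" else sign) for cycle, sign in statuses.items()}


def cycles_to_rerun(statuses: Dict[str, str]) -> List[str]:
    """List the cycles still marked '-' in ascending order."""
    return sorted(cycle for cycle, sign in statuses.items() if sign == "-")


if __name__ == "__main__":
    table = {"0001": "+", "0002": "+", "0003": "?", "0004": "-"}
    print("last successful:", last_successful(table))
    recovered = reset_in_process(table)
    print("rerun:", cycles_to_rerun(recovered))
```

After the reset, only the cycles returned by `cycles_to_rerun` would need the discrete stage scripts to be run again.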

10 Parallel Processing – Main Features
1. Indexing is quicker
2. Tracking is easier
3. Recoverability is possible

11 Updating a Group of Indexes

12 Updating a Group of Indexes
• p_manage_01 and p_manage_02 have a new feature allowing you to update a specific group of indexes.
• Col. 8 defines a group of headings/word indexes for updating:
[Sample table lines for fields 008, LOC## and 041## with word indexes WRD, WYR, WLN and WCL]

13 Updating a Group of Indexes
• This option is only available when the program is run from the command line. It is not available from the Web Services.
• The following is an example of how the program should be run for fields that belong to group B:
csh -f p_manage_02 USM01,1,00000,99999,B,1,0,00,
csh -f p_manage_01 USM01,1,00000,99999,B,1,0,00,

14 Z0102 – COUNTERS FOR LOGICAL BASES

15 z0102
Pre-14.2 problem:
– Scanning logical bases that are less than 50% of the total database is very inefficient (slow, irrelevant unlinked headings)
Solution: There is a new index, z0102, which ‘divides’ z01 into sections in accordance with the existing logical bases.

16 z0102
Example of z0102 record:

17 z0102
When a logical base is being browsed, the system uses the Z0102 table to “decide” whether to display the heading (Z01), without having to retrieve the documents attached to the heading, read them, and then “decide”.

18 z0102
Structure:
A record is built for each Z01 and each logical base, giving the filing text and sequence (in order to make the SCAN more efficient) and a counter of the number of relevant docs. Records are built for "see" reference headings, as well as for preferred headings. The record does not include pointers to the doc records; this is still done by Z02.
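To make the structure more concrete, here is a minimal sketch of a browse that reads only per-base records and never touches the documents. The class `Z0102Row`, its field names and the sample data are all hypothetical; the sketch assumes a heading is shown for a base whenever a row exists for that heading and base, which the slides do not spell out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Z0102Row:
    """Hypothetical model of one record: one heading within one logical base."""
    logical_base: str
    filing_text: str
    sequence: int
    doc_count: int  # counter of relevant docs; no pointers to the docs themselves


def browse(rows: List[Z0102Row], logical_base: str, from_text: str,
           limit: int = 10) -> List[Tuple[str, int]]:
    """Return (filing text, doc count) pairs for one base, in filing order."""
    matching = [r for r in rows
                if r.logical_base == logical_base and r.filing_text >= from_text]
    matching.sort(key=lambda r: (r.filing_text, r.sequence))
    return [(r.filing_text, r.doc_count) for r in matching[:limit]]


ROWS = [
    Z0102Row("LAW", "contract law", 1, 12),
    Z0102Row("MED", "cardiology", 1, 40),
    Z0102Row("LAW", "civil procedure", 1, 3),
    Z0102Row("MED", "contract law", 1, 1),
]

if __name__ == "__main__":
    for text, count in browse(ROWS, "LAW", "c"):
        print(f"{text} ({count})")
```

Because the counter sits on the per-base record, the scan can list headings and their counts for a small logical base without skipping over headings that belong only to other bases.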

19 z0102
Building the table:
Run p_manage_32 to create z0102
Run p_manage_34 to update z0102
- p_manage_32 runs on all Z01 records and builds Z0102. When p_manage_02 is run, p_manage_32 should be run directly afterwards.
- p_manage_34 runs on Z01 records that have been "touched" since the last time p_manage_32 or p_manage_34 was run. It should be run on a regular basis (for example, nightly), listed in the job_list (UTIL E/15/1).

20 z0102
Z01 records that have been "touched"…
- Z01 has a new field, Z01-UPDATE-Z0102.
- p_manage_02 sets this flag to "Y".
- p_manage_32 and p_manage_34 set this flag to "N".
- An update of z01 sets Z01-UPDATE-Z0102 to "Y".
- p_manage_34 re-indexes Z01 records that have Z01-UPDATE-Z0102 = "Y".
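The flag life cycle listed above can be simulated in a few lines. This is a toy model, not the batch jobs themselves: the `Z01` class and the `sim_` functions are hypothetical stand-ins that only track the Z01-UPDATE-Z0102 flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Z01:
    """Hypothetical stand-in for a heading record with its update flag."""
    heading: str
    update_z0102: str = "N"


def sim_p_manage_02(headings: List[Z01]) -> None:
    """Model of rebuilding the headings index: every record is flagged 'Y'."""
    for h in headings:
        h.update_z0102 = "Y"


def sim_update_heading(h: Z01) -> None:
    """Model of an update to one heading: the record is flagged 'Y'."""
    h.update_z0102 = "Y"


def sim_p_manage_32(headings: List[Z01]) -> int:
    """Model of the full build: process every record, then clear its flag."""
    for h in headings:
        h.update_z0102 = "N"
    return len(headings)


def sim_p_manage_34(headings: List[Z01]) -> List[str]:
    """Model of the incremental update: process only flagged records."""
    processed = []
    for h in headings:
        if h.update_z0102 == "Y":
            processed.append(h.heading)
            h.update_z0102 = "N"
    return processed


if __name__ == "__main__":
    records = [Z01("a"), Z01("b"), Z01("c")]
    sim_p_manage_02(records)
    print("full build:", sim_p_manage_32(records))
    sim_update_heading(records[1])
    print("incremental:", sim_p_manage_34(records))
```

The model shows why the incremental job stays cheap when run regularly: it only visits records flagged since the previous run.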

21 z0102
Restrictions:
– Z0102 is used only for the WEB OPAC browse
A new switch in the WEB OPAC defines which tables are involved in BROWSE.
If TAB10-Z0102-IN-USE = ‘Y’ – browse is performed by z0102
If TAB10-Z0102-IN-USE = ‘N’ – z0102 does not participate in BROWSE
Presently, there is no online update of z0102.

22 New Batch Jobs for AUT Enrichment
Pre-14.2: AUT enrichment and correction of BIB after initial conversion or re-indexing is very time-consuming (it takes up to several days).
Solution: New batch jobs for AUT enrichment and correction of BIB libraries. These batch jobs will replace the background running of ue_08 after a re-indexing of the z01 indexes.
p_manage_102: enrich the BIB z01 index from the entire AUT library
p_manage_104: reset the Z01 created from regular indexing to "-CHK-" status
p_manage_103: send Z07 records to all potential "corrected" BIB docs.

23 New Batch Jobs for AUT Enrichment
p_manage_102: enrich the BIB z01 index from the entire AUT library
p_manage_104: reset the Z01 created from regular indexing to "-CHK-" status
p_manage_103: send Z07 records to all potential "corrected" BIB docs.