Ten More Database Mysteries Chris Lawson Database Specialists
Ten More Database Mysteries Chris Lawson Database Specialists, Inc. www. dbspecialists. com clawson@dbspecialists. com May 25, 2000
1 The Case of the Unwanted Services • DBA detects listener starting extra services NOT defined! > lsnrctl status Services Summary… database 1 has 1 service handler(s) database 2 has 1 service handler(s) databasez has 5 service handler(s) • They appear to be harmless, but what are these extra services? • Restarted Listener. At first all OK, then 5 extra services activated. • Confirmed that listener. ora file does NOT list the unwanted database. • DBA confirmed using correct listener. ora file. • Puzzle: There is no entry whatsoever for databasez.
Unwanted Services: Solution • Multiple listener services is indicative of MTS (Multi-threaded Server) Dispatchers; however, MTS was not being used! • DBA recalled that databasez was actually on a different server. • Solution: Database on different server started MTS services. • Key init. ora parameter: MTS_LISTENER_ADDRESS = "(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1526))" • Services match init. ora parameter MTS_DISPATCHERS. • How it happened: The remote init. ora file had originated on ‘our’ server, and the MTS parameter entries were not removed. • Resolution: On other server, DBA corrected the init. ora entry, bounced database. Extra services automatically stopped. • But - that’s not all. . .
Unwanted Services: The Rest of the Story • These extra services turned out to be not so “harmless” after all. • The ‘unwanted’ services are fully functional; they will intercept connection requests and redirect them to a different server! • Production users were redirected to a “clone” testing database on another server, and used it for several hours! • This occurs because each MTS service points to a particular dispatcher on a particular server. > lsnrctl services Summary. . . demo has 3 service handler(s) D 001 <machine: hohp 2, pid: 3385> (ADDRESS=(PROTOCOL=tcp)(DEV=17)(HOST=[1. 2. 3. 4)(PORT=1179)) • Perhaps could be used for rapid failover to standby database?
2 Incognito DBA Privileges • Application testers need to “refresh” the test database often. • They will need to start/shut database as the oracle user. • DBA, being suspicious (justly) by nature, is reluctant to provide oracle account. • As punishment, users frequently ask DBA to run the refresh. • Question: How can developers perform tasks as oracle, without actually having the oracle account?
Incognito DBA Privileges: Solution • Create script to refresh database, including startup, shutdown, etc. • UNIX setuid feature allows programs to run with another identity. But setuid feature does not propagate to commands within the script. • Trick: Use a ‘wrapper’ C program that runs as oracle and calls the refresh script: # include <stdio. h> main () { system (“. /refresh. ksh”); } • Activate setuid for the wrapper program: chmod 6711 refresh • Users simply enter refresh to run the refresh program as oracle. • Thanks to Brian Keating of Database Specialists for this solution.
3 NT Remote Control • NT box is running an important database. It appears that the Listener is down. • NT server is remote. • Question: How can you check the listener remotely?
NT Remote Control: Solution • Use ‘SC’ command. It is part of the NT Resource Kit. • The SC utility is very similar to the NET commands. • First enable a security "context" to the remote box; e. g. map a network drive to the remote server, and provide the administrator account/password. • The network drive may be disconnected after SC use is done. • Note: A mapped network drive is not required if you are connected locally as administrator, and the remote server uses the same administrator password. • Run SC using the format: SC \[SERVER] [CMD] SERVICE
NT Remote Control: Solution (continued) Some common SC commands are: query----Queries the status for a service start----Starts a service pause----Sends a PAUSE control request continue-----Sends a CONTINUE request stop-----Sends a STOP request to a service config-------Changes the configuration of a service qc------Queries the configuration information delete-------Deletes a service (from the registry) create-------Creates a service (add to registry)
NT Remote Control: Solution (continued) EXAMPLE: Check status of the Oracle 8 LISTENER service. NT server named ‘TSUNAMI’ SC \TSUNAMI QUERY ORACLETNSLISTENER 80 SERVICE_NAME: ORACLETNSLISTENER 80 TYPE : 10 WIN 32_OWN_PROCESS STATE : 4 RUNNING (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN) WIN 32_EXIT_CODE : 0 (0 x 0) SERVICE_EXIT_CODE : 0 (0 x 0) CHECKPOINT : 0 x 0 WAIT_HINT : 0 x 0 j
4 The Sad Case of the Failing Failover • Very critical Customer Support application for large HMO. • 24 x 7 crucial. Millions of monthly customers interactions. • IBM RS-6000 High-availability cluster (HACMP) • Scenario: Sudden hardware glitch causes failover to backup node. • Secondary node fails to start database! Database complains of missing datafile. • Critical application is down! • Objective: Early-rising DBA (from different project) not familiar with the setup needs to get database up!
The Failing Failover: Solution • Investigation shows all file systems mounted. Bad News: file in question is simply not there. • Good News: File is not supposed to be there--only a symbolic link. • Each node has symbolic links pointing from “nice-looking” paths to the actual location of raw device: /u 04/oradata/data 01. dbf > /dev/r 123 • The primary node had all the links; the backup node was missing 2 symbolic links. Simply creating the links allowed database startup.
The Failing Failover: Solution (continued) • Moral of the story: High reliability hardware only as good as the weakest link--including maintenance requirements. • This strategy was very susceptible to human error. Every new. dbf file required new symbolic link on both nodes. One omission completely destroyed the high-availability plan.
5 The Case of the Impatient Insurance Agent • Query to find details for expired insurance policies only returns 50 rows, but takes 2 minutes to finish. • Query is simple join of 2 tables: Response and Policy RESP policy_id 5 where-clause ‘filters’ result set: 20, 000 rows POL 1 ‘filter’ result set: 15, 000 rows result: only 50 rows • Puzzle: How can query ever be quick, since the ‘super filter’ to reduce set to 50 rows is spread across two tables ?
Impatient Agent: Solution The Trick: 1) Perform pre-processing that uses filters, but retrieves only the key (policy_id) that is used as the join column. 2) This field is retrieved via index read only--thus avoiding the table reads that account for the delays. 3) Use this result set as the starting point for the original join. pre-processing driving “table” RESP index reads only POL original join RESP POL table reads on small set only
6 Daffy Database Links • Version 7. 3. 4 database. Database links working normally. Database has been running straight for 3 weeks. • Suddenly, the links fail: ORA-12154: TNS: could not resolve service name • Investigation shows tnsnames. ora file unchanged. Link definition unchanged. • Listener is running Oracle 8. 0. 4. Adding entry to tnsnames. ora file in Oracle 8 directory causes the links to work again! • Tests with new links show that the tnsnames. ora file for 7. 3. 4 is being ignored! • Further, if Oracle 8 tnsnames file is removed, the one in Oracle 7 is used instead!
Database Links: Solution • Clue: Listener had been restarted recently. But why should the Listener process (a server side function) affect finding the tns alias? • Metalink analyst suggested checking TNS_ADMIN; but how is that relevant? We are not creating a UNIX session. • Oracle Note 37808. 1 clears-up the confusion: For database links, TNS_ADMIN takes its value from the value defined when the listener was started • That is, a client process (link) is influenced by a server-side process! • The listener had been started most recently with TNS_ADMIN set to 8. 0. 4, causing links to look in 8. 0. 4 tns file. • Unsetting TNS_ADMIN, then restarting listener caused all operations to run as expected.
7 All Primary Keys are Equal (But some are more equal than others) • Background: Application tuning often requires DBA to ‘lure’ optimizer into using certain indexes. Hints not always successful or possible. • Optimizer ‘likes’ certain indexes, because they typically are faster than others. The favorite: an index on Primary Key. • Scenario: 20 gigabyte insurance billing system. Oracle 8. 0. 4. • Particular query joins to a table called POLICY. Join uses the PK index (policy_ID). • Query speed-up requires that join use new index on (PK, other col). • Created new double index--but optimizer would not cooperate. It foolishly insists on using the PK index, even if hint used. • Question: How can we get the optimizer to obey the DBA?
All Primary Keys are Equal: Solution • We need a way to ‘trick’ the optimizer into not using the PK index, but instead, use our ‘extra column’ index. • This is difficult, because the PK index is the #1 choice. • Trick: ‘Disguise’ the PK index as a unique index. • Oddity: If unique index already exists, addition of a matching PK will use the existing index. • In determining execution plan, optimizer will treat the new index as if it were a unique index, not a PK index. • So, drop the PK, build a matching unique index, then rebuild the PK. • Optimizer no longer stubbornly insists on using this ‘PK’ index, because it is not a true, pedigreed PK index--merely a unique index.
8 The Sad Case of the Homeless Archive Logs • Scenario: Large medical application; hundreds of connections into database. This is a critical, 24 x 7 server. • Users suddenly complain of database ‘hanging. ’ • Alert log shows database unable to write archive log. ORA-00255: error archiving log 1 of thread 1, sequence # 200 ORA-00270: error creating archive log /demo/arch/1_200. dbf ORA-19504: failed to create file "/demo/arch/1_200. dbf" • Investigation reveals disk crash on disk housing archive logs. • No. dbf files on that disk--only archive logs. • Problem: How can database operation be resumed, with minimal disruption to hundreds of users?
Homeless Archive Logs: Solution • DBA decided to dynamically redirect the archive logs, thereby avoiding need to shutdown database. • To change destination of archive logs: alter system archive log start to ‘[new path]’ • For example: SVRMGRL> alter system archive log start to '/demo/arch 2/' • Once the ‘backlog’ of archive logs is corrected, database automatically resumes normal operation. • Alert log now shows: Fri Feb 11 09: 00: 47 2000 ARCH: Archival started Archiver process freed from errors. No longer stopped.
9 The Puzzling Affair of the Old Archive Logs • With database in Archive Mode, old archive logs must be purged. • On Unix, this is simple, using the find command (in cron). • For example, to purge archive logs greater than 6 days old: find /logdir/arch -name “arch*” -mtime +6 -exec rm {} ; • Problem: How can we similarly purge the old archive logs on NT?
Archive Logs: Solution • On NT, there is apparently no ‘find’ command, as in Unix. • The archive logs can be found indirectly, because the database ‘knows’ when/where they were written. • Logs listed in V$ARCHIVED_LOG • Example, to remove logs older than 2 days: spool del_logs. sql select 'del ‘ || name from v$archived_log where completion_time < (sysdate - 2); spool off host del_logs • Similar query could be used to copy archive logs to backup.
10 SQL*Plus Madness • Scenario: Connection ‘hangs’ upon trying to connect in SQL*Plus. • Also get error message re DBMS_APPLICATION_INFO • DBA investigates. He finds: 1) Connect via Svrmgrl ok; 2) System user connects ok • Problem: How can DBA ‘fix’ SQL*Plus?
SQL*Plus Madness: Solution • Package DBMS_APPLICATION_INFO is used to ‘register’ a running application, for tracking/analysis purposes. • Once an application is ‘registered’ it will show up in v$session in ‘module’ field. • SQL*Plus is one of few applications that try to ‘register. ’ This explains why svrmgrl still works. • This all points to problem with the ‘register’ of SQL*Plus. • The set-up script to build necessary tables is called ‘pupbld. sql’
SQL*Plus Madness: Solution (continued) • Action: Try to run pupbld. sql as SYSTEM but it hangs! • Investigation shows: product_profile table missing, but synonym still there! • Solution: drop public synonym for product_profile, then rebuild pupbld. All OK now.
Contact Information Chris Lawson clawson@dbspecialists. com http: //www. dbspecialists. com Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111
- Slides: 27