Script debugging: the shell's -n option
• Reads the script without executing it
  – Checks the script for syntax errors
• bash -n script.sh
• ksh -n script.sh
• csh/tcsh -n script.sh

Example exp01.bash (note the missing ";" before "then" on the elif line):
    #!/bin/bash
    if [ "$#" != "1" ]; then
        echo "usage : $0 filename"
    elif [ ! -f $1 ] then
        echo "$1 not exists"
    else
        tail $1
    fi

    $ bash -n ./exp01.bash
    exp02.bash: line 8: syntax error near unexpected token `else'
    exp02.bash: line 8: `else'
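To check many scripts at once, `bash -n` can be run in a loop. The sketch below assumes bash; `check_syntax` is a hypothetical helper name, not a standard command.

```shell
#!/bin/bash
# Hypothetical helper: syntax-check each script given as an argument with
# "bash -n" (parse only, nothing is executed). Prints OK/FAIL per file,
# then a count, and returns non-zero if any file failed to parse.
check_syntax() {
  local f failed=0
  for f in "$@"; do
    if bash -n "$f"; then          # on failure, bash -n prints the error to stderr
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=$((failed + 1))
    fi
  done
  echo "$failed file(s) with syntax errors"
  [ "$failed" -eq 0 ]
}
```

Usage: `check_syntax ./*.bash`. The final status is 0 only when every file parses, so the helper can gate a commit or a job submission.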
Script debugging: the shell's -x option
• Prints each command, prefixed with "+", to standard error before executing it

Example exp02.bash:
    #!/bin/bash
    echo "Hello $USER,"
    echo Today is `date +"%Y-%m-%d"`
    echo bye-bye

    $ bash -x ./exp02.bash
    + echo 'Hello jliu'
    Hello jliu
    + echo Today is `date +%Y-%m-%d`
    Today is 2015-04-26
    + echo bye-bye

Tracing only part of a script with set -x / set +x (exp03.bash):
    #!/bin/bash
    echo "Hello $USER,"
    set -x
    echo Today is `date +"%Y-%m-%d"`
    set +x
    echo bye-bye

    $ ./exp03.bash
    Hello jliu
    + echo Today is `date +%Y-%m-%d`
    Today is 2015-04-26
    + set +x
    bye-bye
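Because the -x trace goes to standard error, it can be kept apart from the script's normal output by redirecting fd 2. A minimal sketch, assuming bash and a writable temporary directory; the demo writes its own copy of exp02.bash to a temp file.

```shell
#!/bin/bash
# Write a copy of exp02.bash to a temporary directory (demo only).
tmp=$(mktemp -d)
cat > "$tmp/exp02.bash" <<'EOF'
#!/bin/bash
echo "Hello $USER,"
echo Today is `date +"%Y-%m-%d"`
echo bye-bye
EOF

# Run it under -x: normal output -> stdout.log, xtrace lines -> trace.log.
# PS4 is set explicitly so the trace uses the default "+ " prefix.
PS4='+ ' bash -x "$tmp/exp02.bash" >"$tmp/stdout.log" 2>"$tmp/trace.log"

program_output=$(cat "$tmp/stdout.log")
trace_output=$(cat "$tmp/trace.log")
rm -rf "$tmp"

echo "--- program output ---"; printf '%s\n' "$program_output"
echo "--- trace ---";          printf '%s\n' "$trace_output"
```

In practice the same idea is simply `bash -x ./exp02.bash 2> trace.log`; note that this also sends the script's own error messages to trace.log.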
Script debugging: enhancing -x output with PS4
• PS4 is the prefix printed before each traced command (default "+ ")
• BASH
  – PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
• KSH
  – PS4='+${LINENO}: '
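PS4 can also be set inside the script itself, just before `set -x`. The sketch below (bash only) captures the trace of a small function call in a variable so the effect of `${FUNCNAME[0]}` is visible; `greet` is a made-up example function.

```shell
#!/bin/bash
# Richer xtrace prefix: source file, line number and current function.
# FUNCNAME[0] is empty for commands run outside any function.
PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '

greet() {
  echo "hello $1"
}

# Trace only this block. "2>&1 >/dev/null" sends the trace (stderr) into the
# command substitution and discards greet's normal output (stdout).
trace=$( { set -x; greet world; set +x; } 2>&1 >/dev/null )
printf '%s\n' "$trace"
```

The captured lines have the form `+<file>:<line>:: greet world` for the top-level call and `+<file>:<line>:greet: echo 'hello world'` for the command inside the function.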
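Script debugging: enhancing -x output with PS4 (example)
Example exp04.bash (line 10 has a bug: no space between "[" and "$?"):
    #!/bin/bash
    isRoot () {
        if [ $UID -ne 0 ]; then
            return 1
        else
            return 0
        fi
    }
    isRoot
    if ["$?" -ne 0 ]; then
        echo "Must be root to run this script"
        exit 1
    else
        echo "welcome root user"
    fi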
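Script debugging: enhancing -x output with PS4 (example, continued)
bash -n does not catch this bug: ["$?" is a syntactically valid command word (it expands to "[1"), so the error only appears at run time. Because "[1" is not a command, the if condition fails and the else branch greets a non-root user:
    $ bash ./exp04.bash
    exp04.bash: line 10: [1: command not found
    welcome root user

Tracing with the richer PS4 shows where it goes wrong:
    $ export PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
    $ bash -x ./exp04.bash
    +exp04.bash:9:: isRoot
    +exp04.bash:3:isRoot: '[' 502 -ne 0 ']'
    +exp04.bash:4:isRoot: return 1
    +exp04.bash:10:: '[1' -ne 0 ']'
    exp04.bash: line 10: [1: command not found
    +exp04.bash:14:: echo 'welcome root user'
    welcome root user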
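One way to remove this class of bug is to test the function's exit status directly with `if`, so no `[` command has to be built from `$?`. A minimal sketch; the optional uid argument is a hypothetical addition that makes the check testable without being root.

```shell
#!/bin/bash
# Returns 0 (success) if the given uid, or the current $UID by default, is 0.
is_root() {
  local uid=${1:-$UID}
  [ "$uid" -eq 0 ]
}

# Prints a message and returns 1 for non-root users, 0 for root.
check_user() {
  if ! is_root "$@"; then
    echo "Must be root to run this script"
    return 1
  fi
  echo "welcome root user"
}
```

In the script body this would be called as `check_user || exit 1`.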
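Script debugging: the shell's -e option
• With -e, the shell exits as soon as a command fails (returns a non-zero status), except in contexts such as if conditions or the left side of && / ||

Example exp05.bash:
    #!/bin/bash
    ls dummy_$$
    echo "Hello $USER"
    echo Today is `date +"%Y-%m-%d"`
    echo bye-bye

    $ bash ./exp05.bash
    ls: cannot access dummy_25097: No such file or directory
    Hello jliu
    Today is 2015-04-26
    bye-bye

    $ bash -e ./exp05.bash
    ls: cannot access dummy_25107: No such file or directory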
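The effect of -e can be reproduced in subshells, which also shows how to let an expected failure through with `|| true`. A sketch assuming bash and that no file named dummy_<pid> exists in the current directory.

```shell
#!/bin/bash
# Each ( ... ) is a subshell, so "set -e" there does not stop this script.
missing="dummy_$$"    # same non-existent file name as in exp05.bash

without_e=$( ( ls "$missing" 2>/dev/null; echo "reached" ) )
with_e=$( ( set -e; ls "$missing" 2>/dev/null; echo "reached" ) )
# "|| true" marks the failure as expected, so -e does not abort.
tolerated=$( ( set -e; ls "$missing" 2>/dev/null || true; echo "reached" ) )

echo "without -e:    '$without_e'"
echo "with -e:       '$with_e'"
echo "with || true:  '$tolerated'"
```

Only the second subshell stops before `echo "reached"`.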
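Script debugging: the trap command (ERR)
• trap 'commands' ERR runs the commands whenever a command or function returns a non-zero status

Example exp07.bash:
    #!/bin/bash
    ERRTRAP()
    {
        echo "[LINE:$1] Error: Command or function exited with status $?"
    }
    foo()
    {
        return 1
    }
    trap 'ERRTRAP $LINENO' ERR
    abc
    foo
    echo "End"

    $ bash ./exp07.bash
    exp07.bash: line 11: abc: command not found
    [LINE:11] Error: Command or function exited with status 127
    [LINE:8] Error: Command or function exited with status 1
    End

(127 is the "command not found" status; 1 is the value returned by foo.)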
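Bash also provides `$BASH_COMMAND`, the text of the command that was running when the trap fired, so the handler can report which command failed and not only where. A minimal sketch; `on_err` and `err_log` are hypothetical names.

```shell
#!/bin/bash
err_log=""

# $1 = line number, $2 = failing command text. $? still holds the failed
# command's status on entry, so it is saved before anything else runs.
on_err() {
  local status=$?
  local msg="[LINE:$1] '$2' exited with status $status"
  err_log+="$msg"$'\n'
  echo "$msg" >&2
}
trap 'on_err "$LINENO" "$BASH_COMMAND"' ERR

false                                     # status 1
ls /nonexistent_dir_for_demo 2>/dev/null  # non-zero status (exact value depends on ls)

trap - ERR                                # remove the trap again
```

Both failures are reported on stderr and collected in `err_log`.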
Script debugging: the trap command (DEBUG)
• trap 'commands' DEBUG runs the commands before each command is executed

Example exp08.bash:
    #!/bin/bash
    trap 'echo "before execute line:$LINENO, a=$a, b=$b, c=$c"' DEBUG
    a=1
    if [ "$a" -eq 1 ]; then
        b=2
    else
        b=1
    fi
    c=3
    echo "end"

    $ bash ./exp08.bash
    before execute line:3, a=, b=, c=
    before execute line:4, a=1, b=, c=
    before execute line:5, a=1, b=, c=
    before execute line:9, a=1, b=2, c=
    before execute line:10, a=1, b=2, c=3
    end
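Instead of printing variables, the DEBUG trap can record each command's text from `$BASH_COMMAND`, which gives a compact execution log. A sketch assuming bash; the `executed` array is a hypothetical name.

```shell
#!/bin/bash
executed=()
# Before each command runs, append "line: command text" to the array.
trap 'executed+=("$LINENO: $BASH_COMMAND")' DEBUG
a=1
if [ "$a" -eq 1 ]; then
  b=2
else
  b=1
fi
trap - DEBUG    # the trap also fires once for this command before it runs
printf '%s\n' "${executed[@]}"
```

The log shows `a=1`, the `[ ... ]` test, `b=2` and `trap - DEBUG`; the untaken `b=1` branch never appears.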
Script debugging: a "debug hook"
• Commands prefixed with DEBUG run only while _DEBUG="on"; setting _DEBUG to anything else switches all of them off without deleting them

Example exp09.bash:
    #!/bin/bash
    _DEBUG="on"
    function DEBUG()
    {
        [ "$_DEBUG" == "on" ] && "$@"
    }
    fun01 () {
        echo "BUT HERE I am inside the function fun01() body"
    }
    echo "HERE I am outside the function fun01() body!"
    sleep 2
    DEBUG fun01
    echo "End"

    $ bash ./exp09.bash
    HERE I am outside the function fun01() body!
    BUT HERE I am inside the function fun01() body
    End
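A variant of the hook can be switched from the environment instead of by editing the script, so that, under this assumption, `_DEBUG=on ./exp09.bash` enables the debug-only commands and a plain run skips them. The `if` form also returns 0 when debugging is off; with the `[ ... ] && "$@"` form the hook returns 1 in that case, which would stop a script running under -e.

```shell
#!/bin/bash
# Default to "off" unless the caller set _DEBUG=on in the environment.
_DEBUG="${_DEBUG:-off}"

# Run the given command only when debugging is on. "$@" keeps arguments
# with spaces intact; the if form returns 0 when debugging is off.
DEBUG() {
  if [ "$_DEBUG" = "on" ]; then
    "$@"
  fi
}

fun01() {
  echo "inside fun01"
}

DEBUG fun01
DEBUG echo "printed only when _DEBUG=on"
echo "End"
```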
Program debugging: the standard files
• Standard input / standard output / standard error: stdin / stdout / stderr

                               stdin        stdout        stderr
    Unix file descriptor       0            1             2
    Fortran (usual units)      5            6             0
    Fortran 2003               INPUT_UNIT   OUTPUT_UNIT   ERROR_UNIT   (module ISO_FORTRAN_ENV)
    C                          stdin        stdout        stderr
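Program debugging: output redirection
Test programs: "Error" goes to stderr; "Good" and "Error Good" go to stdout (WRITE(*,*) and printf also use stdout).
    PROGRAM TEST
        WRITE(0,*) "Error"
        WRITE(6,*) "Good"
        WRITE(*,*) "Error Good"
    END PROGRAM TEST

    #include <stdio.h>
    int main() {
        fprintf(stderr, "Error\n");
        fprintf(stdout, "Good\n");
        printf("Error Good\n");
        return 0;
    }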
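Program debugging: output redirection (running the test program)
    $ ./a.out
    Error
    Good
    Error Good

bash & ksh:
    $ ./a.out 1>out.log            # stdout to out.log; only stderr reaches the terminal
    Error
    $ ./a.out > out.log            # same as 1>out.log
    Error
    $ ./a.out 2>error.log          # stderr to error.log
    Good
    Error Good
    $ ./a.out >& out.log           # both streams to out.log
    $ ./a.out > out.log 2>&1       # both streams to out.log
    $ ./a.out >out.log 2>err.log   # stdout and stderr to separate files

csh/tcsh (stdout and stderr to separate files):
    % (a.out > out.log) >& err.log
    % sh -c 'a.out > out' 2>& err.log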
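The same redirections can be tried without compiling anything, using a shell function that behaves like the a.out above. A sketch assuming bash; `emit` is a hypothetical stand-in for the program. It also shows that the order of redirections matters: `2>&1` copies wherever fd 1 points at that moment.

```shell
#!/bin/bash
# Stand-in for a.out: "Error" to stderr, "Good" and "Error Good" to stdout.
emit() {
  echo "Error" >&2
  echo "Good"
  echo "Error Good"
}

tmp=$(mktemp -d)
emit >"$tmp/out.log" 2>"$tmp/err.log"   # stdout and stderr to separate files
emit >"$tmp/both.log" 2>&1              # both streams to one file
# 2>&1 first: stderr goes to where stdout points now (the capture), then
# stdout is sent to /dev/null. Only the stderr line is captured.
only_stderr=$(emit 2>&1 >/dev/null)

out_log=$(cat "$tmp/out.log")
err_log=$(cat "$tmp/err.log")
both_lines=$(grep -c . "$tmp/both.log")
rm -rf "$tmp"

printf 'out.log:\n%s\nerr.log:\n%s\nboth.log lines: %s\nonly stderr: %s\n' \
  "$out_log" "$err_log" "$both_lines" "$only_stderr"
```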
Program debugging: finding error messages (example)
• The PBS error (.e) file contains
  – standard error from the job script that was not redirected
  – standard error of the parallel environment
  – standard error of the batch system itself

    [proxy:0:0@c01n10] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:902): assert (!closed) failed
    [proxy:0:0@c01n10] HYDT_dmxu_poll_wait_for_event (tools/demux_poll.c:76): callback returned error status
    [proxy:0:0@c01n10] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
    [mpiexec@c01n10] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
    …
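When the .e file is long, grep can pull out the lines that look like errors; the earliest match is often the most informative, since later messages (such as "terminated badly; aborting") tend to be consequences. A sketch: `scan_errors` is a hypothetical helper and the pattern list is an assumption to adapt.

```shell
#!/bin/bash
# Print matching lines with their line numbers (-n), case-insensitively (-i).
scan_errors() {
  grep -n -i -E 'error|failed|terminated badly|abort' "$1"
}
# Usage (file name hypothetical; PBS typically names it <jobname>.e<jobid>):
#   scan_errors myjob.e12345 | head -n 5
```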
Program debugging: finding error messages (example, continued)
• The application's own output files
  – WRF RSL files
    • rsl.out: what WRF writes to standard output
    • rsl.err: what WRF writes to standard error

    Timing for main: time 2010-06-09_11:48:00 on domain 1: 110.25500 elapsed …
    1 points exceeded cfl=2 in domain d01 at time 2010-06-09_11:48:00 hours
    MAX AT i,j,k: 39 16 2 vert_cfl,w,d(eta)= 2.0284483 …
    …
    6 points exceeded cfl=2 in domain d01 at time 2010-06-09_11:48:00 hours
    MAX AT i,j,k: 39 16 3 vert_cfl,w,d(eta)= 3.4704070 …
    …
    Timing for main: time 2010-06-09_11:49:00 on domain 2: 31.64400 elapsed …
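For WRF, two quick checks are how far the run got (the last "Timing for main" line) and how many CFL warnings were printed. A sketch; `rsl_summary` is a hypothetical helper that takes one rsl file and assumes the message texts shown above.

```shell
#!/bin/bash
rsl_summary() {
  local f=$1
  # Most recent timing line = how far the model has integrated.
  echo "last timing: $(grep 'Timing for main' "$f" | tail -n 1)"
  # grep -c counts matching lines, i.e. the number of CFL warning messages.
  echo "cfl warnings: $(grep -c 'points exceeded cfl' "$f")"
}
# Usage: rsl_summary rsl.out
```

A steadily growing CFL count just before the output stops is a common sign of a numerically unstable run.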
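Program debugging: compiler debugging options
• PGI
  – -g -O0 -traceback
  – -Mbounds -Mchkfpstk -Mchkptr -Ktrap=fp
  – -Minfo=all
• Intel
  – -g -O0 -traceback
  – -check all -check bounds -check pointers -check uninit -ftrapuv
  – -warn all
• gfortran
  – -g -O0 -fbacktrace -ffpe-trap=list (list: comma-separated exceptions, e.g. invalid,zero,overflow)
  – -fbounds-check -fcheck-array-temporaries
  – -Wall
• -g adds debug symbols, -O0 turns off optimization, -traceback / -fbacktrace print the call stack when the program crashes, and the check options catch invalid array and pointer use at run time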
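In a build script, the flag sets above can be selected by compiler family. A sketch; `debug_fflags` is a hypothetical helper, and `-ffpe-trap=invalid,zero,overflow` is one possible value for gfortran's list.

```shell
#!/bin/bash
# Print the debug flag set for a compiler family (pgi, intel or gnu).
debug_fflags() {
  case "$1" in
    pgi)   echo "-g -O0 -traceback -Mbounds -Mchkfpstk -Mchkptr -Ktrap=fp -Minfo=all" ;;
    intel) echo "-g -O0 -traceback -check all -check bounds -check pointers -check uninit -ftrapuv -warn all" ;;
    gnu)   echo "-g -O0 -fbacktrace -ffpe-trap=invalid,zero,overflow -fbounds-check -fcheck-array-temporaries -Wall" ;;
    *)     echo "unknown compiler family: $1" >&2; return 1 ;;
  esac
}
```

Usage, assuming gfortran is installed: `gfortran $(debug_fflags gnu) loop.f90 -o loop`. The substitution is left unquoted on purpose so the flags split into separate words.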
Program debugging: locating errors (example 1)
    program loop
        implicit none
        real, allocatable :: u(:)
        integer i
        allocate(u(10))
        do i=1,11            ! off-by-one error
            u(i) = i * 1.0
        enddo
        print *, "i=", i, "u=", u(i)
        deallocate(u)
    end

(After the loop i = 12, so the print statement also reads outside u.)

Compiled without bounds-checking options:
• PGI: no runtime error, but an incorrect result
    i= 12 u= 0.000000
• Intel, gfortran: runtime error
    *** glibc detected *** ./a.out: free(): invalid next size (fast): 0x000001ec5b40 ***

Compiled with bounds-checking options:
    At line 6 of file loop.f90
    Fortran runtime error: Array reference out of bounds for array 'u', upper bound of dimension 1 exceeded (11 > 10)
Program debugging: locating errors (example 2)
• Compiled and run without -traceback
    [jliu@log01 interpolation_grib_api]$ ./test.sh
    Interpolate product number 1
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
• Compiled and run with -traceback
    [jliu@log01 interpolation_grib_api]$ ./test.sh
    Interpolate product number 1
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine   Line  Source
    interpolation_exa  00000497AE6       hntfauh_  161   hntfauh.F
    interpolation_exa  000004979C6       hntfau_   124   hntfau.F
    interpolation_exa  00000417711       intf_     380   intf.F
    interpolation_exa  00000413BD0       Unknown
    interpolation_exa  000004106D8       Unknown
    interpolation_exa  00000406B37       MAIN__    123   interpolation_example.F
    interpolation_exa  000004063FC       Unknown
    libc.so.6          00000030AE61ECDD  Unknown
    interpolation_exa  000004062F9       Unknown
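Program debugging: locating errors (example 2, continued)
The source lines named in the traceback, from the main program inward:
    interpolation_example.F
      122  INLEN = IREC
      123  IRET = INTF2(INGRIB, INLEN, NEWFLD, NEWLEN)
    intf.F
      379  IF( LUSEHIR ) THEN
      380    IRET = HNTFAU(FLDIN, INLEN)
      381  ELSE
      382    IRET = INTFAU(FLDIN, INLEN)
    hntfau.F
      124  IRET = HNTFAUH(INGRIB, INLEN)
    hntfauh.F
      160  DO LOOP = 1, INLEN
      161    ZNFELDI( LOOP ) = FLDIN( LOOP )
      162  ENDDO

The innermost frame is the array copy at hntfauh.F line 161, so a natural first check is whether INLEN exceeds the sizes of ZNFELDI or FLDIN.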
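For long tracebacks, awk can reduce the table to file:line pairs, innermost frame first, dropping the frames without source information. A sketch assuming the five-column layout above (Image, PC, Routine, Line, Source); `traceback_frames` is a hypothetical helper.

```shell
#!/bin/bash
# Read an Intel-style traceback on stdin; for every frame whose Line column
# ($4) is a number, print "Source:Line (Routine)". The header, the error
# message and "Unknown" frames are skipped because their 4th field is not numeric.
traceback_frames() {
  awk 'NF >= 5 && $4 ~ /^[0-9]+$/ { printf "%s:%s (%s)\n", $5, $4, $3 }'
}
```

Usage, assuming the traceback is written to stderr: `./test.sh 2>&1 | traceback_frames`, then for example `sed -n '155,165p' hntfauh.F` to view the lines around the innermost frame.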
Program debugging: locating errors (example 3)
• Use head/tail to look at the beginning and end of rsl.out
• It contains a large number of 'Error in mapping flagsoap to start_ind' messages
• Use grep to find the code under chem/ that contains "mapping flagsoap" (add -n to print line numbers)
  – grep -i "mapping flagsoap" chem/*.F
    5081 ! 20130807 acd_alma_bugfix start
    5082 do iv = start_ind, ngas_ioa + ngas_soa
    5083   if (flagsoap(iv-start_ind+1) .eq. 2) then
    5084     xsumfresh(ibin) = xsumfresh(ibin) + aer(iv,jtotal,ibin)
    5085   elseif (flagsoap(iv-start_ind+1) .eq. 1) then
    5086     xsumaged(ibin) = xsumaged(ibin) + aer(iv,jtotal,ibin)
    5087   elseif (flagsoap(iv-start_ind+1) .eq. 0) then
    5088     print *, 'Error in mapping flagsoap to start_ind'     <-------
    5089   endif
    5090 enddo
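Counting identical lines is a quick way to find a message that dominates a log, as the flagsoap message does here. A sketch; `top_messages` is a hypothetical helper.

```shell
#!/bin/bash
# Show the N most frequent lines of a file (default 5), with their counts.
top_messages() {
  sort "$1" | uniq -c | sort -rn | head -n "${2:-5}"
}
# Usage:
#   top_messages rsl.out 3
#   grep -n -i "mapping flagsoap" chem/*.F   # then find where it is printed
```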
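Program debugging: fixing the error (example)
• In example 3, suppose the data really can produce this case. Can we simply comment out the print statement on line 5088?
  – No: that would hide the condition completely
  – A safe, conservative change is to print only the first few occurrences and then a summary (err_sum is an integer counter that must be declared and set to 0 before the loop):
    elseif (flagsoap(iv-start_ind+1) .eq. 0) then
      if ( err_sum .lt. 10 ) &
        print *, ' Error in mapping flagsoap to start_ind '
      err_sum = err_sum + 1
    endif
    enddo
    if ( err_sum .gt. 10 ) &
      print *, ' Too many errors in mapping flagsoap to start_ind '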
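The same report-the-first-few-then-summarise pattern carries over to shell scripts that may emit one warning many times. A sketch transposing the Fortran fix; `warn_capped`, `warn_summary` and `MAX_REPORTS` are hypothetical names, and the limit of 10 mirrors the example.

```shell
#!/bin/bash
MAX_REPORTS=10
err_sum=0

# Print the message only for the first MAX_REPORTS calls, but count every call.
warn_capped() {
  if [ "$err_sum" -lt "$MAX_REPORTS" ]; then
    echo "$1" >&2
  fi
  err_sum=$((err_sum + 1))
}

# Call once at the end: report how many messages were suppressed overall.
warn_summary() {
  if [ "$err_sum" -gt "$MAX_REPORTS" ]; then
    echo "Too many errors ($err_sum) in mapping flagsoap to start_ind" >&2
  fi
}
```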
Program debugging: other debugging tools
• TotalView
  – graphical debugger for serial and parallel programs
• idb
  – Intel Debugger for Linux
• pgdbg
  – PGI graphical symbolic debugger
• gdb
  – GNU Debugger
• Use version control to manage code changes
  – SVN, CVS, etc.
Questions or suggestions: hpc@nuist.edu.cn