Applying Regular Expression Patterns in C
Copyright by SunYoung Kim
Author: SunYoung Kim (김선영)
Email: sunyzero@gmail.com
Version: 1.04

Preface
- REGEX is the abbreviation of Regular Expression.
- A string pattern is the rule by which a string is composed.
- A meta character is a character that modifies the meaning of other characters.
- Pro*C is a trademark of Oracle.
Copyright by SunYoung Kim <sunyzero (at) gmail (dot) com>

REGEX on C
- POSIX style
  - A product of standardization
  - Highly portable
  - Intuitive, with an API similar to those of other languages
- BSD style
  - The API of early BSD (old fashioned)
- PCRE (Perl Compatible Regular Expressions)
  - Includes the Perl extensions
  - Requires a separate Perl-related library

API: regcomp
- int regcomp(regex_t *preg, const char *regex, int cflags)
  - regex_t *preg : pattern buffer
  - const char *regex : pattern string to compile
  - int cflags : compile flags
    - REG_EXTENDED : use POSIX extended regular expressions
    - REG_ICASE : ignore case
    - REG_NOSUB : do not report substring (submatch) positions
    - REG_NEWLINE : . and [^...] do not match a newline, and ^ and $ match at line boundaries (line-by-line matching)
  - return value : 0 on success, nonzero on failure
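
The compile flags listed above are combined with bitwise OR. As a minimal sketch, the helper below (the name `matches_icase` is hypothetical, not from the slides) compiles a pattern with REG_EXTENDED, REG_ICASE and REG_NOSUB to answer a plain yes/no question. It also calls regexec() and regfree(), which the following slides describe.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the slides): returns 1 if `text` matches
 * `pattern` ignoring case, 0 if it does not, -1 if the pattern is invalid. */
int matches_icase(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, so no offsets are stored. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Calling matches_icase("^hello", "HELLO world") should report a match because REG_ICASE is set, while an unbalanced pattern such as "(unclosed" is rejected at compile time.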

API: regerror
- size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size)
  - int errcode : the value returned by regcomp() when an error occurred
  - regex_t *preg : pattern buffer
  - char *errbuf : buffer that receives the error message
  - size_t errbuf_size : size of errbuf in bytes
  - return value : size of the error message written to errbuf

    if ((ret = regcomp(&re_expr, p_regex_str, REG_EXTENDED|REG_NEWLINE))) {
        regerror(ret, &re_expr, errbuf, sizeof(errbuf));
        printf("Error regcomp() : %s\n", errbuf);
        /* error handling */
    }
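
The return value of regerror() can also be used to size the buffer. POSIX specifies that when errbuf_size is 0 the errbuf argument is ignored and the required size, including the terminating NUL, is returned. A minimal sketch under that assumption (the helper name `regex_error_message` is hypothetical):

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed error message if `pattern` fails
 * to compile, or NULL if it compiles (or if memory runs out).
 * The caller frees the returned string. */
char *regex_error_message(const char *pattern, int cflags)
{
    regex_t re;
    int rc;
    size_t need;
    char *msg;

    assert(pattern != NULL);

    rc = regcomp(&re, pattern, cflags);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: size 0 asks only for the required buffer size. */
    need = regerror(rc, &re, NULL, 0);
    msg = malloc(need);
    if (msg != NULL)
        regerror(rc, &re, msg, need);   /* second call fills the buffer */
    return msg;
}
```

This avoids guessing a fixed size such as 0xff, at the cost of one extra call and a heap allocation.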

API: regexec
- int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)
  - regex_t *preg : pattern buffer
  - char *string : target string to match against the pattern
  - size_t nmatch : number of elements in the pmatch array
  - regmatch_t pmatch[] : match table that receives the offsets of the match results
  - int eflags : execution flags
    - REG_NOTBOL : (not beginning-of-line) the start of the string is not treated as the start of a line, so ^ does not match there
    - REG_NOTEOL : (not end-of-line) the end of the string is not treated as the end of a line, so $ does not match there
  - return value : 0 on a match, nonzero otherwise
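
REG_NOTBOL matters when a string is searched piece by piece. The sketch below (the helper name `count_matches` is hypothetical) finds successive matches by restarting regexec() just past the previous match, and passes REG_NOTBOL on every restart so that ^ cannot match in the middle of the original string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts successive non-overlapping matches of `pattern`
 * in `text`. Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int count = 0;
    int eflags = 0;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > m[0].rm_so) {
            p += m[0].rm_eo;            /* continue right after the match */
        } else {
            if (p[m[0].rm_eo] == '\0')  /* empty match at the end: stop */
                break;
            p += m[0].rm_eo + 1;        /* empty match: step one char ahead */
        }
        eflags = REG_NOTBOL;            /* p is no longer the line start */
    }

    regfree(&re);
    return count;
}
```

With the pattern "^ab" and the text "ababab" the count is 1. Without REG_NOTBOL each restart would look like a fresh line start and the same call would count 3.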

API: regexec
- The regmatch_t type, declared in <regex.h>

    typedef struct {
        regoff_t rm_so;   /* offset where the match starts */
        regoff_t rm_eo;   /* offset of the first character after the match */
    } regmatch_t;
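
Because rm_so and rm_eo are byte offsets into the searched string, the matched text has length rm_eo - rm_so, and an unused entry has rm_so set to -1. A minimal sketch of copying one entry into a caller-supplied buffer with bounds checking (the helper name `copy_match` is hypothetical):

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the substring described by *m out of `src`
 * into `dest` (at most destsz - 1 bytes, always NUL-terminated).
 * Returns the number of bytes copied, or -1 if the entry is unused. */
int copy_match(char *dest, size_t destsz, const char *src, const regmatch_t *m)
{
    size_t len;

    assert(dest != NULL && destsz > 0 && src != NULL && m != NULL);

    if (m->rm_so == -1) {          /* this (sub)expression did not match */
        dest[0] = '\0';
        return -1;
    }

    len = (size_t)(m->rm_eo - m->rm_so);
    if (len >= destsz)
        len = destsz - 1;          /* truncate rather than overflow */
    memcpy(dest, src + m->rm_so, len);
    dest[len] = '\0';
    return (int)len;
}
```

Truncation is a design choice made for this sketch. A caller that must not lose data could instead compare the return value against rm_eo - rm_so and treat a shorter copy as an error.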

API: regfree
- void regfree(regex_t *preg)
  - regex_t *preg : pattern buffer whose memory is to be released

Example of REGEX
- Usage: ./posix_regex [dest_string pattern_string]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>

    #define MAX_EXPR_SUB_MATCH 5
    #define DEFAULT_REGEX_STR "(</.+>).*"
    #define DEFAULT_DEST_STR "<center>align to center</center> align to left New Line <p>"

    int main(int argc, char **argv)
    {
        int i, ret;
        char *p_regex_str;   /* pattern string */
        char *p_dest_str;    /* destination string to apply pattern */
        regex_t re_expr;     /* POSIX REGEX pattern buffer */
        regmatch_t rm_matchtab[MAX_EXPR_SUB_MATCH];   /* matching table */
        char errbuf[0xff];

Example of REGEX (con't)

        if (argc != 3) {
            printf("Using default string!!\n");
            printf("* Dest str : %s\n", DEFAULT_DEST_STR);
            printf("* Regex str: %s\n", DEFAULT_REGEX_STR);
            p_dest_str = strdup(DEFAULT_DEST_STR);
            p_regex_str = strdup(DEFAULT_REGEX_STR);
        } else {
            p_dest_str = strdup(argv[1]);
            p_regex_str = strdup(argv[2]);
        }
        if ((ret = regcomp(&re_expr, p_regex_str, REG_EXTENDED|REG_NEWLINE))) {
            regerror(ret, &re_expr, errbuf, sizeof(errbuf));
            printf("Error regcomp() : %s\n", errbuf);
            exit(EXIT_FAILURE);
        }

Example of REGEX (con't)

        printf("regcomp : %s\n", p_regex_str);
        memset(rm_matchtab, 0x00, sizeof(rm_matchtab));
        if (regexec(&re_expr, p_dest_str, MAX_EXPR_SUB_MATCH, rm_matchtab, 0)) {
            printf("fail to match\n");
        } else {
            /* rm_matchtab[0] describes the whole match; %.*s takes the length first */
            printf("* All Match offset : (%d -> %d), len(%d) : %.*s\n",
                   (int)rm_matchtab[0].rm_so,
                   (int)rm_matchtab[0].rm_eo,
                   (int)(rm_matchtab[0].rm_eo - rm_matchtab[0].rm_so),
                   (int)(rm_matchtab[0].rm_eo - rm_matchtab[0].rm_so),
                   &p_dest_str[rm_matchtab[0].rm_so]);

Example of REGEX (con't)

            for (i = 1; i < MAX_EXPR_SUB_MATCH; i++) {
                if (rm_matchtab[i].rm_so == -1)
                    break;   /* no more submatches */
                printf("* Submatch[%d] offset : (%d -> %d), len(%d) : %.*s\n", i,
                       (int)rm_matchtab[i].rm_so,
                       (int)rm_matchtab[i].rm_eo,
                       (int)(rm_matchtab[i].rm_eo - rm_matchtab[i].rm_so),
                       (int)(rm_matchtab[i].rm_eo - rm_matchtab[i].rm_so),
                       &p_dest_str[rm_matchtab[i].rm_so]);
            } /* end: for */
        } /* end: else */
        regfree(&re_expr);   /* freeing pattern buffer memory */
        return 0;
    }

Example of REGEX (con't)
- Execution

    $ ./posix_regex
    Using default string!!
    * Dest str : <center>align to center</center> align to left New Line <p>
    * Regex str: (</.+>).*
    regcomp : (</.+>).*
    * All Match offset : (23 -> 66), len(43) : </center> align to left New Line
    * Sub Match offset : (23 -> 62), len(39) : </center> align to left New Line

    $ ./posix_regex.exe http://news.naver.com/news/read.php "http://([^/]+)(.*)"
    regcomp : http://([^/]+)(.*)
    * All Match offset : (0 -> 35), len(35) : http://news.naver.com/news/read.php
    * Submatch[1] offset : (7 -> 21), len(14) : news.naver.com
    * Submatch[2] offset : (21 -> 35), len(14) : /news/read.php

- Todo: practice separating the hostname and the URI from a URL.
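
One possible way to approach the hostname/URI exercise is sketched below. It reuses the pattern from the second run, http://([^/]+)(.*), so submatch 1 is the hostname and submatch 2 is the URI. The function name `split_url` and its parameters are hypothetical and not part of the slides.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits `url` into host and URI with the pattern
 * used in the run above. Returns 0 on success, -1 if the URL does not
 * match or the pattern fails to compile. Output is truncated to fit. */
int split_url(const char *url, char *host, size_t hostsz,
              char *uri, size_t urisz)
{
    regex_t re;
    regmatch_t m[3];   /* [0] whole match, [1] host, [2] URI */
    int rc = -1;

    assert(url != NULL && host != NULL && uri != NULL);
    assert(hostsz > 0 && urisz > 0);

    if (regcomp(&re, "http://([^/]+)(.*)", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, url, 3, m, 0) == 0) {
        /* %.*s prints exactly (rm_eo - rm_so) bytes starting at rm_so */
        snprintf(host, hostsz, "%.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), url + m[1].rm_so);
        snprintf(uri, urisz, "%.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), url + m[2].rm_so);
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

For http://news.naver.com/news/read.php this yields the host news.naver.com and the URI /news/read.php, matching the offsets (7 -> 21) and (21 -> 35) printed by the run above.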

apply to Pro*C: raw data
- Data to be batch processed

    # Raw data sequence: [Name] [Age] [Gender] [LOCALE]
    -------- KOR -------
    Korean staff list
    YeongHee Lee| 25| Female| Korea
    Hoon Kim, 29, Male, Pusan Korea
    -------- USA -------
    Steve, 29, Male, USA
    Ken Jacobs, 48, Male, birmingham USA
    Dave Roberts, 32, Male, New York USA
    -------- JAP -------
    Rikako, 42, Female, Nagano Japan
    Lily, 35, Female, Osaka Japan
    -------- CHI -------
    Xiangping, 32, Male , HongKong China
    Chao Jien, 41, Male|HongKong China

apply to Pro*C: schema
- DB schema

    WHENEVER SQLERROR CONTINUE;
    DROP SEQUENCE SEQ_EMPLIST;
    CREATE SEQUENCE SEQ_EMPLIST START WITH 1 INCREMENT BY 1 NOCYCLE NOCACHE;
    DROP TABLE EMPLIST CASCADE CONSTRAINTS;
    CREATE TABLE EMPLIST (
        SNO    NUMBER(5),
        NAME   VARCHAR(30),
        GENDER NUMBER(1),
        AGE    NUMBER(3),
        LOCALE VARCHAR(20)
    );

apply to Pro*C
- pr7_regex.c : (todo) the goal is to insert the filtered results into the DB
  - The example is incomplete, so complete its functionality as an exercise.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>

    typedef struct my_record {
        short sno;
        char name[30+1];
        short gender;
        unsigned short age;
        char locale[20+1];
    } MY_RECORD;

    int set_record(MY_RECORD *, const char *, const regmatch_t *);
    int insert_rec(const MY_RECORD *s);

    #define MAX_EXPR_SUB_MATCH 10
    #define DEF_FILENAME "regdata.txt"
    #define REGEX_STR "^([a-zA-Z ]+)[|,]([0-9 ]+)[|,]([a-zA-Z ]+)[|,]+([a-zA-Z ]+)"

    int main(int argc, char **argv)
    {
        char *p_filename;
        regex_t re_expr;   /* POSIX regex pattern buffer */
        regmatch_t rm_matchtab[MAX_EXPR_SUB_MATCH];   /* pattern matching table */
        int i, ret;
        FILE *fp;
        char errbuf[0xff], buf[0xff];
        MY_RECORD a_rec;

apply to Pro*C (con't)

        if (argc != 2) {
            printf("Using default filename!\n");
            p_filename = DEF_FILENAME;
        } else {
            p_filename = strdup(argv[1]);
        }
        if ((fp = fopen(p_filename, "r")) == NULL) {
            perror("FAIL: fopen");
            exit(EXIT_FAILURE);
        }
        if ((ret = regcomp(&re_expr, REGEX_STR, REG_EXTENDED|REG_NEWLINE))) {
            regerror(ret, &re_expr, errbuf, sizeof(errbuf));
            printf("Error regcomp() : %s\n", errbuf);
            exit(EXIT_FAILURE);
        }
        printf("regcomp : %s\n", REGEX_STR);
        /* (Exercise) open the DB connection here */

apply to Pro*C (con't)

        /* fgets() returns NULL at end of file or on error, which ends the loop */
        while (fgets(buf, sizeof(buf), fp) != NULL) {
            memset(rm_matchtab, 0x00, sizeof(rm_matchtab));
            if (regexec(&re_expr, buf, MAX_EXPR_SUB_MATCH, rm_matchtab, 0)) {
                printf("fail to match: (%.30s...)\n", buf);
            } else {
                if (set_record(&a_rec, buf, rm_matchtab)) {
                    fprintf(stderr, "[%s:%d] FAIL: set_record()\n", __FILE__, __LINE__);
                    break;
                }
                if (insert_rec(&a_rec)) {   /* insert into the DB */
                    fprintf(stderr, "[%s:%d] FAIL: insert_rec()\n", __FILE__, __LINE__);
                    break;
                }
                EXEC SQL COMMIT;
            } /* end: else */
        } /* end: while */

apply to Pro*C (con't)

        regfree(&re_expr);   /* free memory */
        /* Exercise: commit and release the DB connection */
        return 0;
    } /* end: main() */

    /* Macro: copy the string indicated by the match table offsets.
     * It performs no bounds checking, so dest must be large enough.
     * Wrapped in do { } while (0) so that it behaves as a single statement. */
    #define COPY_RMTAB(dest, src, matchtab) \
        do { \
            memcpy((dest), &(src)[(matchtab).rm_so], (matchtab).rm_eo - (matchtab).rm_so); \
            (dest)[(matchtab).rm_eo - (matchtab).rm_so] = 0x0; \
        } while (0)

apply to Pro*C (con't)

    int set_record(MY_RECORD *d, const char *sbuf, const regmatch_t *rmtab)
    {
        char buf[40];   /* temporary buffer */

        COPY_RMTAB(d->name, sbuf, rmtab[1]);     /* name  : 1st field => rmtab[1] */
        COPY_RMTAB(buf, sbuf, rmtab[3]);         /* gender: 3rd field => rmtab[3] */
        if (strncmp(buf, "Male", 4) == 0) {      /* with logical error! Why? */
            d->gender = 1;
        } else {
            d->gender = 2;                       /* always '2'!! */
        }
        COPY_RMTAB(buf, sbuf, rmtab[2]);
        d->age = atoi(buf);                      /* age   : 2nd field => rmtab[2] */
        COPY_RMTAB(d->locale, sbuf, rmtab[4]);   /* locale: 4th field => rmtab[4] */
        return 0;
    } /* end: set_record() */

    int insert_rec(const MY_RECORD *s)
    {
        EXEC SQL INSERT INTO EMPLIST ... (omitted) ... ;
        return SQLCODE;
    } /* end: insert_rec() */
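
One plausible cause of the logical error flagged in set_record(), offered as an assumption rather than the author's intended answer: the character classes in REGEX_STR include a space, so a captured field such as " Male" can carry leading or trailing blanks, and comparing its first four bytes against "Male" then fails. A minimal sketch that trims blanks before comparing (the helper name `parse_gender` is hypothetical; the codes 1 and 2 follow set_record()):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: maps a captured gender field to the codes used in
 * set_record() (1 = Male, 2 = Female), ignoring surrounding blanks.
 * Returns 0 for anything else. */
int parse_gender(const char *field)
{
    size_t len;

    assert(field != NULL);

    while (isspace((unsigned char)*field))   /* skip leading blanks */
        field++;

    len = strlen(field);
    while (len > 0 && isspace((unsigned char)field[len - 1]))
        len--;                                /* ignore trailing blanks */

    if (len == 4 && strncmp(field, "Male", 4) == 0)
        return 1;
    if (len == 6 && strncmp(field, "Female", 6) == 0)
        return 2;
    return 0;
}
```

Returning 0 for an unrecognized value, instead of defaulting to 2, lets the caller reject a malformed record rather than store it as female.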