Python regular expressions
http://cs.umbc.edu/courses/331/
Regular expression
• A special sequence of characters
  - Matches or finds other strings or sets of strings
  - Uses a specialized syntax held in a pattern
A simple example
Regular expression: apple
Text: Scrapple from the apple.
Matches: the "apple" inside "Scrapple" (match 1) and the word "apple" (match 2)
[from Prof. Andrzej (AJ) Bieszczad]
. regular expression
• Matches any single character
Regular expression: o.
Text: For me to poop on.
Matches highlighted on the slide: match 1, match 2
[from Prof. Andrzej (AJ) Bieszczad]
[ ] regular expression
• Matches any one character from a specific set of characters
Regular expression: b[eor]at
Text: beat a brat on a boat
Matches: beat (match 1), brat (match 2), boat (match 3)
[from Prof. Andrzej (AJ) Bieszczad]
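The three introductory patterns above can be tried directly. The following is a minimal sketch that assumes `re.findall` as a convenient way to list matches (the slides themselves only highlight the matches graphically). Note that `findall` reports every non-overlapping match, so it may list more hits than a slide marks.

```python
import re

# Literal pattern: matches the exact character sequence "apple".
literal_hits = re.findall(r"apple", "Scrapple from the apple.")

# "." matches any single character (except a newline, by default).
dot_hits = re.findall(r"o.", "For me to poop on.")

# "[eor]" matches exactly one character from the set e, o, r.
set_hits = re.findall(r"b[eor]at", "beat a brat on a boat")

print(literal_hits)
print(dot_hits)
print(set_hits)
```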
match
• The result is an instance of the match class, which holds the details of the match
>>> import re
>>> r1 = re.search("a*b", "fooaaabcde")
>>> r1.group()   # group returns the string matched
'aaab'
>>> r1.start()   # index of the match start
3
>>> r1.end()     # index of the match end
7
>>> r1.span()    # tuple of (start, end)
(3, 7)
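When the pattern is not found, `re.search` returns `None` rather than a match object, so calling `.group()` on the result would raise an error. A small sketch of guarding against that follows; the helper name `describe_match` is hypothetical and only used for illustration.

```python
import re

def describe_match(pattern, text):
    """Return (matched_text, (start, end)), or None if the pattern is not found."""
    m = re.search(pattern, text)
    if m is None:          # no match anywhere in the text
        return None
    return m.group(), m.span()

found = describe_match(r"a*b", "fooaaabcde")
missing = describe_match(r"a*b", "xyz")   # "xyz" contains no "b"

print(found)
print(missing)
```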
Email match
• Example pattern that matches simple email addresses: \w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "finin@cs.umbc.edu")
>>> r1.group()
'finin@cs.umbc.edu'
• Only the parts you want, such as the email name or the host, can be extracted
group
• Parentheses create groups
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "finin@cs.umbc.edu")
>>> r2.group(1)
'finin'
>>> r2.group(2)
'cs.umbc.edu'
>>> r2.groups()
('finin', 'cs.umbc.edu', 'umbc.', 'edu')
• Groups are numbered in order from the start of the pattern and can be referenced by number
label
• Labels (names) can be added to groups
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "finin@cs.umbc.edu")
>>> r3.group('name')
'finin'
>>> r3.group('host')
'cs.umbc.edu'
• The matched text can then be referenced by its label
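Labeled groups can also be collected in one step. The sketch below uses the match object's `groupdict()` method, which is not shown on the slides, to gather all named groups into a dictionary; the variable names `EMAIL` and `parts` are assumptions for illustration.

```python
import re

# Same labeled pattern as above, written as a raw string.
EMAIL = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

m = EMAIL.match("finin@cs.umbc.edu")
# groupdict() returns only the named groups, keyed by label.
parts = m.groupdict() if m else {}

print(parts)
# A labeled group is still reachable by its number as well.
print(m.group(1) == m.group("name"))
```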
re functions
• re.split() splits a string wherever the pattern matches
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
• re.sub() replaces the text that matches the pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'
• re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs, 11 cats, 1 egg")
['12', '11', '1']
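A few common follow-ups to these functions, as a hedged sketch: dropping the empty string that `re.split` leaves when the text ends with a separator, converting `re.findall` results to integers, and limiting `re.sub` to the first match with its `count` argument. The variable names are illustrative only.

```python
import re

text = "This... is a test, short and sweet, of split()."

# Filter out the trailing empty string produced by the final "()." separator.
words = [w for w in re.split(r"\W+", text) if w]

# findall returns strings; convert them when numbers are needed.
numbers = [int(n) for n in re.findall(r"\d+", "12 dogs, 11 cats, 1 egg")]

# count=1 replaces only the first match instead of all of them.
recolored = re.sub(r"blue|white|red", "black", "blue socks and red shoes", count=1)

print(words)
print(numbers)
print(recolored)
```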
Compiling regular expressions
• Create an re pattern object to use the same regular expression many times
• Can speed up matching when the pattern is reused
>>> cpat3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("finin@cs.umbc.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'finin@cs.umbc.edu'
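Reuse is where a compiled pattern pays off. The sketch below compiles the labeled email pattern once and applies it to several strings; the list `lines` is made-up sample input, not data from the slides.

```python
import re

# Compile once, outside the loop.
cpat = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

# Hypothetical sample input for illustration.
lines = ["finin@cs.umbc.edu", "not an address", "steve@apple.com"]

hosts = []
for line in lines:
    m = cpat.search(line)      # same compiled object reused for each line
    if m:                      # skip lines with no email address
        hosts.append(m.group("host"))

print(hosts)
```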
Pattern object methods
Pattern objects have methods that parallel the re functions (match, search, split, findall, sub). For example:
# email address
>>> p1 = re.compile(r"\w+@\w+\.(?:com|org|net|edu)")
>>> p1.match("steve@apple.com").group(0)
'steve@apple.com'
>>> p1.search("Email steve@apple.com today.").group(0)
'steve@apple.com'
>>> p1.findall("Email steve@apple.com and bill@msft.com now.")
['steve@apple.com', 'bill@msft.com']
# sentence boundary
>>> p2 = re.compile(r"[.?!]+\s+")
>>> p2.split("Tired? Go to bed! Now!! ")
['Tired', 'Go to bed', 'Now', '']
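One pitfall worth checking in email patterns of this kind: `|` has the lowest precedence, so an ungrouped alternation such as `\w+@\w+\.+com|org|net|edu` means "an address ending in com, or the bare text org, or net, or edu". The sketch below contrasts that with a grouped version; the names `loose` and `strict` and the test string are assumptions for illustration.

```python
import re

# Ungrouped: the alternatives are the whole left side, "org", "net", "edu".
loose = re.compile(r"\w+@\w+\.+com|org|net|edu")

# Grouped with a non-capturing group, so only the suffix alternates
# and findall still returns the full match.
strict = re.compile(r"\w+@\w+\.(?:com|org|net|edu)")

loose_hit = loose.search("a network")    # matches the "net" in "network"
strict_hit = strict.search("a network")  # no email address, so None

emails = strict.findall("Email steve@apple.com and bill@msft.com now.")

print(loose_hit.group())
print(strict_hit)
print(emails)
```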