

Python regular expressions
http://cs.umbc.edu/courses/331/

Regular expression
• A special sequence of characters that helps you
§ match or find other strings or sets of strings
§ using a specialized syntax held in a pattern
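Because a pattern is an ordinary Python string, backslash syntax such as \d is usually written as a raw string (r"..."), so Python does not process the backslashes before the re module sees them. A minimal sketch; the sample text is made up for illustration:

```python
import re

# In a normal string literal Python processes backslash escapes itself,
# so "\\d+" is needed to hand the characters \d+ to the re module.
escaped = "\\d+"
# A raw string leaves backslashes alone, so r"\d+" is the same pattern text.
raw = r"\d+"
print(escaped == raw)  # True

# re.search interprets the pattern's special syntax: \d+ means "one or more digits".
m = re.search(raw, "Room 331, building 7")
print(m.group())  # 331
```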

A simple example
Regular expression: apple
Text: Scrapple from the apple.
Match 1 is the "apple" inside "Scrapple"; match 2 is the standalone word "apple". [from Prof. Andrzej (AJ) Bieszczad]
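The same literal search can be sketched with re.finditer, which yields one match object per non-overlapping match, left to right, so both positions are visible:

```python
import re

text = "Scrapple from the apple."
# Each Match object records where the literal "apple" was found.
spans = [m.span() for m in re.finditer("apple", text)]
print(spans)  # [(3, 8), (18, 23)]
```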

. regular expression
• Match any single character
Regular expression: o.
Text: For me to poop on.
Each match is an "o" plus the one character after it (match 1, match 2, ...). [from Prof. Andrzej (AJ) Bieszczad]
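Running the pattern with re.findall lists every non-overlapping match. Note that . also matches a space, but by default not a newline; the re.DOTALL flag changes that. A short sketch:

```python
import re

text = "For me to poop on."
# "." matches any one character except "\n", including the space after "to".
print(re.findall("o.", text))  # ['or', 'o ', 'oo', 'on']

# Without flags "." does not match a newline; re.DOTALL makes it match.
print(re.findall("o.", "o\no"))             # []
print(re.findall("o.", "o\no", re.DOTALL))  # ['o\n']
```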

[ ] regular expression
• Match any one character from a specific set
Regular expression: b[eor]at
Text: beat a brat on a boat
Match 1: beat, match 2: brat, match 3: boat [from Prof. Andrzej (AJ) Bieszczad]
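A sketch of the same character class, plus two related forms: a leading ^ inside the brackets negates the set, and a hyphen gives a range. The extra sample strings are made up:

```python
import re

text = "beat a brat on a boat"
# [eor] matches exactly one character: e, o, or r.
print(re.findall("b[eor]at", text))  # ['beat', 'brat', 'boat']

# [^eor] matches any one character EXCEPT e, o, or r.
print(re.findall("b[^eor]at", "beat bxat boat"))  # ['bxat']

# [0-9] is a range: any single digit.
print(re.findall("[0-9]", "r2d2"))  # ['2', '2']
```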

match
• A successful search returns an instance of the match class that holds the matched result (a failed search returns None)
>>> import re
>>> r1 = re.search("a*b", "fooaaabcde")
>>> r1.group()   # group returns string matched
'aaab'
>>> r1.start()   # index of the match start
3
>>> r1.end()     # index of the match end
7
>>> r1.span()    # tuple of (start, end)
(3, 7)
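re.match only tries position 0, while re.search scans the whole string; both return None when nothing matches, so it is worth checking before calling .group(). A minimal sketch on the same string:

```python
import re

text = "fooaaabcde"
# re.match anchors at position 0: a* matches empty, then "f" is not "b".
print(re.match("a*b", text))          # None
# re.search keeps scanning and finds "aaab" starting at index 3.
print(re.search("a*b", text).span())  # (3, 7)

# Guard against None before using the match object.
m = re.search("x+", text)
if m is None:
    print("no match")
else:
    print(m.group())
```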

Email match
• An example pattern that matches a simple email address: \w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "finin@cs.umbc.edu")
>>> r1.group()
'finin@cs.umbc.edu'
• Only the parts you want, such as the user name or the host, can be extracted
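re.match anchors only at the start of the string, so trailing text is silently ignored. A short sketch, using a made-up address with an extra ".au" suffix, shows why re.fullmatch (Python 3.4+) is the safer check when the whole string must be an address:

```python
import re

pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"

# re.match succeeds on a prefix and ignores the leftover ".au".
m = re.match(pat1, "finin@cs.umbc.edu.au")
print(m.group())  # finin@cs.umbc.edu

# re.fullmatch requires the pattern to cover the entire string.
print(re.fullmatch(pat1, "finin@cs.umbc.edu.au"))      # None
print(re.fullmatch(pat1, "finin@cs.umbc.edu").group())  # finin@cs.umbc.edu
```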

group
• Parentheses create groups
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "finin@cs.umbc.edu")
>>> r2.group(1)
'finin'
>>> r2.group(2)
'cs.umbc.edu'
>>> r2.groups()
('finin', 'cs.umbc.edu', 'umbc.', 'edu')
• Groups are numbered from left to right by the position of their opening parenthesis in the pattern and can be referenced by that number
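Group 3 comes back as 'umbc.' rather than 'cs.' because a group inside a repetition such as (...)+ keeps only its last repetition. A sketch of that behavior, with one way to recover every repetition by searching the captured host again:

```python
import re

pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
m = re.match(pat2, "finin@cs.umbc.edu")

print(m.group(0))  # group 0 is the whole match: finin@cs.umbc.edu
# Group 3 is repeated by "+", so it holds only the LAST repetition.
print(m.group(3))  # umbc.
# To list every repetition, search the captured host (group 2) separately.
parts = re.findall(r"\w+\.", m.group(2))
print(parts)  # ['cs.', 'umbc.']
```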

label
• Add a label (name) to a group with (?P<label>...)
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "finin@cs.umbc.edu")
>>> r3.group('name')
'finin'
>>> r3.group('host')
'cs.umbc.edu'
• The matched text can then be referenced by its label
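Named groups can also be collected all at once with groupdict(), and referenced in a re.sub replacement string as \g<name>. A minimal sketch; the output format "host (name)" is just an illustration:

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
m = re.match(pat3, "finin@cs.umbc.edu")

# groupdict() maps each NAMED group to its text (unnamed groups are left out).
print(m.groupdict())  # {'name': 'finin', 'host': 'cs.umbc.edu'}

# \g<label> inserts a named group's text into the replacement.
swapped = re.sub(pat3, r"\g<host> (\g<name>)", "finin@cs.umbc.edu")
print(swapped)  # cs.umbc.edu (finin)
```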

re functions
• re.split() splits a string wherever the pattern matches
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
• re.sub() replaces every substring that matches the pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'
• re.findall() returns all non-overlapping matches
>>> re.findall(r"\d+", "12 dogs, 11 cats, 1 egg")
['12', '11', '1']
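A few variations on the same functions, sketched on the same sample text: re.sub accepts a function that receives each match, re.findall returns group contents (as tuples) when the pattern has groups, and re.split can be limited with maxsplit:

```python
import re

text = "12 dogs, 11 cats, 1 egg"

# The replacement can be a function: it gets each Match and returns a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)  # 24 dogs, 22 cats, 2 egg

# With groups in the pattern, findall returns the groups instead of whole matches.
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)  # [('12', 'dogs'), ('11', 'cats'), ('1', 'egg')]

# maxsplit limits how many splits are made.
first, rest = re.split(r",\s*", text, maxsplit=1)
print(first, "|", rest)  # 12 dogs | 11 cats, 1 egg
```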

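Compiling regular expressions
• re.compile() creates a pattern object so the same regular expression can be reused many times
• This avoids re-parsing the pattern on every call, which can speed up repeated matching (the module-level functions also cache recently used patterns, so the gain is often modest)
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> cpat3 = re.compile(pat3)
>>> cpat3
re.compile('(?P<name>\\w+)@(?P<host>(\\w+\\.)+(com|org|net|edu))')
>>> r3 = cpat3.search("finin@cs.umbc.edu")
>>> r3
<re.Match object; span=(0, 17), match='finin@cs.umbc.edu'>
>>> r3.group()
'finin@cs.umbc.edu'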
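A typical use of a compiled pattern is checking many strings in a loop, with flags fixed once at compile time. A sketch, assuming case-insensitive matching is wanted; the upper-case address and "not-an-email" are made-up test inputs:

```python
import re

# Compile once; re.IGNORECASE lets (com|org|net|edu) also match "EDU".
EMAIL = re.compile(r"\w+@(\w+\.)+(com|org|net|edu)", re.IGNORECASE)

candidates = ["finin@cs.umbc.edu", "FININ@CS.UMBC.EDU", "not-an-email"]
# Reuse the same pattern object for every candidate.
results = [EMAIL.fullmatch(s) is not None for s in candidates]
print(results)  # [True, True, False]
```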

Pattern object methods
Pattern objects have methods that parallel the re functions (e.g., match, search, split, findall, sub), e.g.:
>>> p1 = re.compile(r"\w+@\w+\.(?:com|org|net|edu)")   # email address
>>> p1.match("[email protected] com").group(0)
'[email protected] com'
>>> p1.search("Email [email protected] com today.").group(0)
'[email protected] com'
>>> p1.findall("Email [email protected] com and [email protected] com now.")
['[email protected] com', '[email protected] com']
>>> p2 = re.compile(r"[.?!]+\s+")   # sentence boundary
>>> p2.split("Tired? Go to bed! Now!!")
['Tired', 'Go to bed', 'Now!!']
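Two details matter when writing patterns like the email one. First, | has the lowest precedence, so an unparenthesized com|org|net|edu at the end splits the whole pattern into alternatives; a non-capturing group (?:...) limits it to the suffix and keeps findall returning whole matches. Second, unlike the module-level functions, Pattern.search accepts a start position. A sketch with hypothetical addresses:

```python
import re

# Hypothetical addresses, for illustration only.
text = "Email ann@example.com and bob@example.org now."

# Without a group this means: \w+@\w+\.com  OR  org  OR  net  OR  edu.
loose = re.compile(r"\w+@\w+\.com|org|net|edu")
print(loose.findall("org chart"))  # ['org']

# (?:...) groups the suffix alternatives without creating a capturing group.
p1 = re.compile(r"\w+@\w+\.(?:com|org|net|edu)")
print(p1.findall(text))  # ['ann@example.com', 'bob@example.org']

# Pattern.search(string, pos) starts scanning at index pos.
start = text.index(" and ")
second = p1.search(text, start).group()
print(second)  # bob@example.org
```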

Regular Expression Tools
• http://www.regextester.com
• https://regex101.com
• http://regexr.com
• http://rubular.com
• http://erik.eae.net/playground/regexp.html
• http://myregexp.com
