String Matching: Knuth-Morris-Pratt algorithm Heather Takeguchi

What is String Matching?
• Used in word find within a document, as well as in spell checkers and internet keyword searches
• Looking for an exact string match
• In reality the algorithms are more complicated: searching for ‘string’ also returns ‘String’ and ‘stringbean’
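The difference between an exact match and the looser behaviour of a real word-find can be sketched in a few lines of Python. The function names `exact_matches` and `loose_matches` are hypothetical, and the loose rule here (case-insensitive substring) is only an assumption about what such a search might do.

```python
def exact_matches(query, document):
    """Return the words in document that equal query exactly."""
    return [word for word in document.split() if word == query]


def loose_matches(query, document):
    """Return the words that contain query, ignoring case (assumed rule)."""
    needle = query.lower()
    return [word for word in document.split() if needle in word.lower()]


doc = "A String of stringbean soup on a string"
print(exact_matches("string", doc))  # only the lowercase stand-alone word
print(loose_matches("string", doc))  # also 'String' and 'stringbean'
```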

How do you match strings?
• Finite-state automata
• Brute force
• Knuth-Morris-Pratt (KMP)
• Visualization tool for brute force and KMP: www.dcc.ufmg.br/~cassia/smaa/english/
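As a rough illustration of two of the methods listed above, here is a minimal Python sketch of brute-force matching and of KMP. The function names are my own, and this is the textbook form of the technique rather than code from the presentation. Brute force re-examines text characters after a mismatch; KMP uses a precomputed prefix table so the text is scanned once, left to right.

```python
def brute_force_search(text, pattern):
    """Try every shift and compare the pattern against the text window."""
    n, m = len(text), len(pattern)
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]          # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, pattern):
    """Return all start indices of pattern in text without re-reading text."""
    if not pattern:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    matches = []
    k = 0                          # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]          # reuse what is already known to match
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = pi[k - 1]          # keep going to find overlapping matches
    return matches


print(kmp_search("CAGATAAGAGAA", "GATAA"))  # same result as brute_force_search
```

Both functions return the same list of positions; they differ only in how much work is repeated after a mismatch.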

Virus Detection
• Detecting a virus amounts to searching for a pattern string in a larger text:
  1) viral signature matching (the contagious segment)
  2) code enumeration (compare to an old known file)
  3) checksum methods (check the size of the file)
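A toy Python sketch of the first and third ideas above, signature matching and a checksum comparison. Everything specific here is an assumption made for illustration: the signature names and byte patterns are invented, `scan_for_signatures` and `has_changed` are hypothetical helpers, and SHA-256 stands in for whatever checksum a real scanner would use.

```python
import hashlib

# Hypothetical signature table: names and byte patterns are invented.
SIGNATURES = {
    "demo-virus-a": b"\xde\xad\xbe\xef",
    "demo-virus-b": b"EVIL_PAYLOAD",
}


def scan_for_signatures(data, signatures=SIGNATURES):
    """Return {signature name: offset} for every signature found in data."""
    hits = {}
    for name, pattern in signatures.items():
        offset = data.find(pattern)   # exact substring search over bytes
        if offset != -1:
            hits[name] = offset
    return hits


def has_changed(data, known_size, known_digest):
    """Compare size and checksum against values recorded for a clean copy."""
    digest = hashlib.sha256(data).hexdigest()
    return len(data) != known_size or digest != known_digest


infected = b"header" + b"EVIL_PAYLOAD" + b"tail"
print(scan_for_signatures(infected))
```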

Variation-tolerant matching
• Fast substring matching
• Approximate string matching
  – voice recognition
  – DNA sequencing

Example: x = GATAA and y = CAGATAAGAGAA and k = 1
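The slide does not say how k is counted. The Python sketch below assumes x is the pattern, y is the text, and k is the maximum number of mismatched characters allowed at each alignment (Hamming distance). If k instead counts insertions and deletions as well, a dynamic-programming method would be needed. The function name `k_mismatch_search` is hypothetical.

```python
def k_mismatch_search(text, pattern, k):
    """Return (shift, mismatches) for each alignment of pattern in text
    that differs in at most k positions (assumed meaning of k)."""
    n, m = len(text), len(pattern)
    results = []
    for shift in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[shift + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break          # too many differences, abandon this shift
        if mismatches <= k:
            results.append((shift, mismatches))
    return results


x = "GATAA"
y = "CAGATAAGAGAA"
print(k_mismatch_search(y, x, 1))  # [(2, 0), (7, 1)]
```

Under this assumption the pattern is found exactly at position 2 (GATAA) and with one mismatch at position 7 (GAGAA).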

Summary
• Exact string matching is good for grep and sed
• String matching is used in word find and in internet keyword searches
• The KMP algorithm improves on brute force: worst case O(n + m) instead of O(nm), though the gain on typical text is slight
• Approximate string matching and fast substring matching extend to a wider range of practical applications

Acknowledgements
• Virus detection: www.cse.uta.edu/~holder/courses/cse5311/lectures/applets/je/a24.html
• Speech recognition: www.kom.e-technik.tu-darmstadt.de/pr/workshop/chair/ACMMM98/electronic_proceedings/robertson/
• Approximate string matching: http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node3.html
• Cormen, chapter 34