Autocomplete Based on Ternary Search Tree
Niu Wenhao, Lu Yan, Wang Sangtian
Part 1: Motivation and Introduction (by Niu Wenhao)
Motivation • To improve the performance of the autocomplete function of Acemap.
Original autocomplete function • The original autocomplete function (based on Redis) has some problems: • Large memory cost • Very slow responses for some keywords
Our solution: Ternary Search Tree (TST)
                                  Redis                  TST
Memory cost (100,000 keywords)    210 MB                 136 MB
Response time                     Might be very long!    0.38 ms
Demo • 202.120.36.137:8123/Autocomplete_v1.5 • Returns JSON
Java Web • Environment: Java SE Development Kit 8.0 • Server: Apache Tomcat 7.0 • IDE: Eclipse
Java Web • Request → Server (JSP page, Servlet) → Response
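The request/response flow above, with the JSON reply from the demo, can be sketched without Tomcat. This is a hedged stand-in, not the project's servlet: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` in place of Tomcat, a sorted `TreeSet` in place of the TST, and a hypothetical endpoint `/autocomplete?q=<prefix>` with hypothetical sample keywords. The JSON shape (a plain array of strings) is also an assumption.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical stand-in for the Tomcat servlet, using the JDK's built-in HttpServer.
class AutocompleteServer {
    // Stand-in for the TST: a sorted set of sample keywords (hypothetical data).
    private static final TreeSet<String> KEYWORDS =
            new TreeSet<>(Arrays.asList("sam", "sad", "sap", "same", "a", "awls"));

    // Returns every keyword starting with prefix, encoded as a JSON array of strings.
    static String suggestJson(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String k : KEYWORDS.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) break; // sorted order: matches form one contiguous run
            matches.add(k);
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < matches.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(matches.get(i))).append('"');
        }
        return sb.append(']').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8123), 0);
        // Hypothetical endpoint: GET /autocomplete?q=<prefix> (assumes q is the only parameter)
        server.createContext("/autocomplete", exchange -> {
            String query = exchange.getRequestURI().getRawQuery();
            String prefix = "";
            if (query != null && query.startsWith("q=")) {
                prefix = URLDecoder.decode(query.substring(2), "UTF-8");
            }
            byte[] body = suggestJson(prefix.toLowerCase(Locale.ROOT)).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

In a Tomcat deployment the same `suggestJson` logic would sit inside a servlet's request handler; only the transport layer differs.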
Part 2: Details about TST (by Lu Yan)
Ternary Search Tree
• Assign a character to each node.
• Give each node three links:
  • Left link if the key’s next character < the node’s character
  • Middle link if the key’s next character == the node’s character
  • Right link if the key’s next character > the node’s character
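The three-link rule translates directly into a node type and a recursive insert. The following is a minimal sketch (class and method names are hypothetical, not taken from the project's code) that supports insertion and exact lookup only; a flag on the node marks where a stored key ends.

```java
// Minimal ternary search tree: insert and exact-match lookup (a sketch).
class TernarySearchTree {
    // Each node stores one character and three links.
    private static final class Node {
        final char c;
        Node left, mid, right; // key char <, ==, > this node's char
        boolean isKey;         // true if a stored key ends at this node
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) throw new IllegalArgumentException("empty key");
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int d) {
        char c = key.charAt(d);
        if (node == null) node = new Node(c);
        if (c < node.c) node.left = insert(node.left, key, d);          // same position, go left
        else if (c > node.c) node.right = insert(node.right, key, d);   // same position, go right
        else if (d < key.length() - 1) node.mid = insert(node.mid, key, d + 1); // match: next char
        else node.isKey = true;                                         // last char matched
        return node;
    }

    boolean contains(String key) {
        if (key.isEmpty()) return false;
        Node node = root;
        int d = 0;
        while (node != null) {
            char c = key.charAt(d);
            if (c < node.c) node = node.left;
            else if (c > node.c) node = node.right;
            else if (d < key.length() - 1) { node = node.mid; d++; }
            else return node.isKey;
        }
        return false;
    }
}
```

Only a middle link advances to the next character of the key; left and right links keep comparing the same character, which is what lets a TST store one character per node instead of an alphabet-sized array.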
After inserting “sam” [TST diagram]
After inserting “sam”, “sad” [TST diagram]
After inserting “sam”, “sad”, “sap” [TST diagram]
After inserting “sam”, “sad”, “sap”, “same” [TST diagram]
After inserting “sam”, “sad”, “sap”, “same”, “a” [TST diagram]
After inserting “sam”, “sad”, “sap”, “same”, “a”, and “awls” [TST diagram]
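For autocomplete, the tree built from these six words also has to list every key under a prefix. One common approach, sketched here under the assumption that results are wanted in alphabetical order and capped at K (the K of the complexity table), is to walk down to the prefix's last node and then traverse left, self, middle, right. Names are hypothetical; this is not the project's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// TST with prefix completion limited to k results (a sketch).
class AutocompleteTst {
    private static final class Node {
        final char c;
        Node left, mid, right;
        boolean isKey;
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) throw new IllegalArgumentException("empty key");
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int d) {
        char c = key.charAt(d);
        if (node == null) node = new Node(c);
        if (c < node.c) node.left = insert(node.left, key, d);
        else if (c > node.c) node.right = insert(node.right, key, d);
        else if (d < key.length() - 1) node.mid = insert(node.mid, key, d + 1);
        else node.isKey = true;
        return node;
    }

    // Returns up to k stored keys that start with prefix, in lexicographic order.
    List<String> complete(String prefix, int k) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty() || k <= 0) return out;
        Node node = find(prefix);
        if (node == null) return out;
        if (node.isKey) out.add(prefix); // the prefix itself is a stored key
        collect(node.mid, new StringBuilder(prefix), out, k);
        return out;
    }

    // Walks down to the node that matches the last character of prefix.
    private Node find(String prefix) {
        Node node = root;
        int d = 0;
        while (node != null) {
            char c = prefix.charAt(d);
            if (c < node.c) node = node.left;
            else if (c > node.c) node = node.right;
            else if (d < prefix.length() - 1) { node = node.mid; d++; }
            else return node;
        }
        return null;
    }

    // Left, self, middle, right visits keys in sorted order; stops once k keys are found.
    private void collect(Node node, StringBuilder sb, List<String> out, int k) {
        if (node == null || out.size() >= k) return;
        collect(node.left, sb, out, k);
        if (out.size() >= k) return;
        sb.append(node.c);
        if (node.isKey) out.add(sb.toString());
        collect(node.mid, sb, out, k);
        sb.deleteCharAt(sb.length() - 1);
        collect(node.right, sb, out, k);
    }
}
```

With “sam”, “sad”, “sap”, “same”, “a”, and “awls” inserted, `complete("sa", 10)` yields sad, sam, same, sap, and `complete("a", 10)` yields a, awls.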
Performance
Structure             Time complexity                                          Space complexity
Redis                 O(L) (likely worse in practice: there could be many collisions)   O(N * L)
Trie                  O(K * L)                                                 O(N * L * R)
Trie with hashmaps    O(K * L)                                                 O(N * L)
TST                   O(K * (L + log N))                                       O(N * L)
N = number of words, L = average word length, K = number of matches to return, R = alphabet size
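The space column can be checked by counting links. The derivation below assumes the worst case where no two words share a prefix (so there are about N * L nodes), and, for the last line only, a 26-letter lowercase alphabet; R = 26 is an illustrative assumption, not a figure from the slides.

```latex
\begin{aligned}
\text{Trie (array links):}\quad & \underbrace{N L}_{\text{nodes, worst case}} \times \underbrace{R}_{\text{links per node}} = O(N L R) \\
\text{TST:}\quad & N L \times 3 = O(N L) \\
\text{ratio:}\quad & \frac{N L R}{3 N L} = \frac{R}{3} \approx 8.7 \quad (R = 26)
\end{aligned}
```

A trie with hash maps also reaches O(N * L) by storing only the links that exist, but pays per-entry hash-table overhead that the TST's three fixed links avoid.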
Keep the TST in memory permanently. • Very important! • Building a TST takes a long time. • Searching a TST takes no more than 1 ms.
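One way to guarantee that the expensive build happens only once per server process is the initialization-on-demand holder idiom, sketched below. It is an assumption about how this could be done, not the project's code: a `TreeSet` stands in for the TST, the keyword loader is hypothetical, and in a Tomcat application the build could equally be triggered from a `ServletContextListener.contextInitialized` callback.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;

// Builds the keyword index once per JVM and keeps it in memory afterwards.
final class KeywordIndex {
    // Counts builds, only to make the "built once" property observable in this sketch.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private final TreeSet<String> keys; // stand-in for the TST

    private KeywordIndex(List<String> keywords) {
        BUILD_COUNT.incrementAndGet();
        keys = new TreeSet<>(keywords); // the expensive step in a real deployment
    }

    // Holder idiom: the JVM initializes Holder lazily, exactly once, and thread-safely.
    private static final class Holder {
        static final KeywordIndex INSTANCE = new KeywordIndex(loadKeywords());
    }

    static KeywordIndex get() {
        return Holder.INSTANCE;
    }

    // Hypothetical loader; a real system would read the keywords from its database.
    private static List<String> loadKeywords() {
        return Arrays.asList("sam", "sad", "sap", "same", "a", "awls");
    }

    boolean contains(String key) {
        return keys.contains(key);
    }
}
```

Every request then calls `KeywordIndex.get()` and pays only the search cost, never the build cost.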
Part 3: Improving User Experience (by Wang Sangtian)
Improving user experience • Sorry, I typed a wrong prefix! • What’s your middle name? • What about abbreviations? • They have the same name… • Useless data in the database
Sorry, I typed a wrong prefix! • Users may type a wrong prefix. • Author: Xinbing Wang • Correct: xinbing • Wrong: xinbnig, xinbiing
Sorry, I typed a wrong prefix! • Exhaustion method • Exchange adjacent characters • From the wrong input “xinbnig”: xibnnig, xinnbig, xinbing, xinbngi
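The exchange step can be sketched as generating every string that differs from the input by one adjacent swap; each candidate would then be looked up in the TST. The class and method names are hypothetical, and skipping swaps of identical characters (which would just reproduce the input) is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Candidate corrections for transposition typos (a sketch).
class SwapCandidates {
    // Returns every string obtained by swapping one pair of adjacent characters.
    static List<String> adjacentSwaps(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i + 1)) continue; // swap would not change the string
            char[] cs = s.toCharArray();
            char tmp = cs[i];
            cs[i] = cs[i + 1];
            cs[i + 1] = tmp;
            out.add(new String(cs));
        }
        return out;
    }
}
```

For “xinbnig” this produces six candidates, including xibnnig, xinnbig, xinbing, and xinbngi from the slide; only “xinbing” would match a stored name.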
Sorry, I typed a wrong prefix! • Exhaustion method • Delete one character • From the wrong input “xinbiing”: xibiing (delete “n”), xiniing (delete “b”), xinbing (delete “i”), … (delete “i”, “n”, “g”)
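The deletion step works the same way: build every string with exactly one character removed and look each one up. This hypothetical helper de-duplicates with a `LinkedHashSet`, because deleting either of two equal neighbours (the “ii” in “xinbiing”) yields the same string.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Candidate corrections for one extra typed character (a sketch).
class DeleteCandidates {
    // Returns every distinct string obtained by deleting exactly one character, in index order.
    static List<String> singleDeletions(String s) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(s.substring(0, i) + s.substring(i + 1));
        }
        return new ArrayList<>(out);
    }
}
```

An input of length n yields at most n candidates, so the exhaustive search stays cheap compared with the sub-millisecond TST lookups it feeds.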
What’s your middle name? • Some users don’t know an author’s middle name.
Author’s full name          User’s input
“Obert P. Castleberry”      “obert cas”
“Brian B. Avants”           “brian ava”
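One way to let “obert cas” reach “Obert P. Castleberry” is to index each author under a second key with the middle names dropped, both keys pointing to the same author. The sketch below assumes that the first token is the given name and the last token is the surname; that assumption and the helper name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Extra search keys so that users can skip middle names (a sketch).
class NameKeys {
    // Returns the full lowercase name, plus "first last" when there are middle names.
    static List<String> searchKeys(String fullName) {
        String[] parts = fullName.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> keys = new ArrayList<>();
        keys.add(String.join(" ", parts)); // e.g. "obert p. castleberry"
        if (parts.length > 2) {
            keys.add(parts[0] + " " + parts[parts.length - 1]); // e.g. "obert castleberry"
        }
        return keys;
    }
}
```

Because “obert castleberry” starts with “obert cas”, an ordinary prefix search on the second key finds the author.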
What about abbreviations? • Users are more familiar with the abbreviations of affiliations.
Affiliation                               Abbreviation
Shanghai Jiao Tong University             SJTU
University of California Los Angeles      UCLA
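Both abbreviations in the table follow one simple pattern: the initials of the capitalized words. The helper below is a hypothetical sketch of that pattern only; many real abbreviations do not follow it (for example “Caltech”), so a curated lookup table would likely still be needed alongside it. Each abbreviation would be inserted into the TST as an extra key for its affiliation.

```java
// Derives an abbreviation from the initials of capitalized words (a sketch).
class Abbreviations {
    // Skips lowercase words such as "of"; "University of California Los Angeles" -> "UCLA".
    static String initials(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                sb.append(word.charAt(0));
            }
        }
        return sb.toString();
    }
}
```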
They have the same name… • There are 16 authors named “Tao Wang” in our database. • Show the name + ID:
tao wang: 7D1EAAC6
tao wang: 7DAE408A
tao wang: 7DC34BA9
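Showing name + ID implies that one key can map to several authors, so the value stored under a name must be a list of IDs rather than a single ID. The sketch below shows that shape with a `HashMap`; in the TST the same list would hang off the node where the key ends. Class and method names are hypothetical; the IDs are the three from the slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// One name, many authors: each key keeps a list of author IDs (a sketch).
class AuthorIndex {
    private final Map<String, List<String>> idsByName = new HashMap<>();

    void add(String name, String authorId) {
        idsByName.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(authorId);
    }

    // Labels shown to the user, "name: ID", so that namesakes stay distinguishable.
    List<String> labels(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String id : idsByName.getOrDefault(key, Collections.emptyList())) {
            out.add(key + ": " + id);
        }
        return out;
    }
}
```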
Useless data in the database • There is some useless data in the database.
Author name    Author ID    Paper count
“Zhang”        81ED2B9C     8059
“Wang”         85AA6692     4512
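The slide does not say how such records are removed. As a purely hypothetical example of a cleaning rule that would catch both rows above, the sketch below drops author names consisting of a single token before they are inserted into the TST; the rule is an assumption and could also discard legitimate single-name authors.

```java
// Hypothetical cleaning rule: keep only author names with at least two tokens.
class AuthorFilter {
    static boolean isUsefulName(String name) {
        String trimmed = name.trim();
        return !trimmed.isEmpty() && trimmed.split("\\s+").length >= 2;
    }
}
```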
Q&A