Aligning Grass Protein Sequences Using PAM-Modified Global Alignment
Yifei Zhang, May 7, 2018
Utilizing the PAM250 Matrix
1. Obtain the table
2. Build a dictionary
3. Modify the globalAlign function
Gramineae, or the grass family
Elizabeth A. Kellogg, "Evolutionary History of the Grasses," Plant Physiology, Mar 2001, 125 (3) 1198-1205; DOI: 10.1104/pp.125.3.1198
• Indica rice: long-grained, harder after cooking
• Japonica rice: short-grained, softer after cooking
Three proteins analyzed:
• Granule-bound starch synthase, which is related to the stickiness of the seed after cooking.
• GS3 protein (seed length and weight protein), which regulates grain size.
• Betaine aldehyde dehydrogenase (badh2, fragrance protein). An allele of this gene is a major factor associated with aroma.
Finding and processing data
Sample: Betaine aldehyde dehydrogenase [Zea mays L.]
NCBI Reference Sequence: NP_001105781.2
506
mmasqamvplrqlfvdgewrppaqgrrlpvvnptteahigeipagtaedvdaavaaaraa
lkrnrgrdwarapgavrakylraiaakvierkqelaklealdcgkpydeaawdmddvagc
feyfadqaealdkrqnspvslpmetfkchlrrepigvvglitpwnypllmatwkvapala
agcaavlkpselasvtcleladickevglppgvlnivtglgpdagaplsahpdvdkvaft
gsfetgkkimaaaapmvkpvtlelggkspivvfddvdidkavewtlfgcfwtngqicsat
srllvhtkiakefnekmvawaknikvsdpleegcrlgpvvsegqyekikkfilnaksega
tiltggvrpahlekgffieptiitdittsmeiwreevfgpvlcvkefstedeaielandt
qyglagavisgdrercqrlseeidagiiwvncsqpcfcqapwggnkrsgfgrelgeggid
nylsvkqvteyisdepwgwyrspskl
Remove spaces and numbers:
    def clean(s1):
        # Drop every digit (position numbers, sequence length)
        result = ''.join(i for i in s1 if not i.isdigit())
        # Split on any whitespace, then rejoin into one continuous string
        result = result.split()
        result = ''.join(result)
        return result
Define a getPam function that builds a dictionary from the PAM250 text table (whitespace eliminated).
Global alignment follows the same procedure as HW 6.2:
1. Initialize the score table and the backtrack table
2. Fill in scores and directions
3. Follow the backtrack table to build the alignment in reverse
4. Reverse the sequences to get the alignment
Modify global alignment: a PAM250 lookup replaces the match score, e.g. int(pam[string1[a-1]][string2[b-1]])
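The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: the names get_pam and global_align, the three-residue TOY_TABLE, and the linear gap penalty of -8 are all assumptions made for the example. A real run would load the full 20 x 20 PAM250 table instead.

```python
# Toy excerpt in the layout of a substitution table (header row, then one
# row per residue). Hypothetical stand-in for the full PAM250 text table.
TOY_TABLE = """
   a  g  m
a  2  1 -1
g  1  5 -3
m -1 -3  6
"""

def get_pam(table_text):
    """Parse a whitespace-separated table into a nested dict: pam[x][y]."""
    rows = [ln.split() for ln in table_text.strip().splitlines() if ln.strip()]
    header = rows[0]
    pam = {}
    for row in rows[1:]:
        residue, scores = row[0], row[1:]
        pam[residue] = {col: int(s) for col, s in zip(header, scores)}
    return pam

def global_align(s1, s2, pam, gap=-8):
    """Needleman-Wunsch with pam[x][y] in place of a fixed match score.

    gap is an assumed linear gap penalty; the slides do not state one.
    """
    n_rows, n_cols = len(s1) + 1, len(s2) + 1
    # Step 1: initialize the score table and the backtrack table
    score = [[0] * n_cols for _ in range(n_rows)]
    back = [[None] * n_cols for _ in range(n_rows)]
    for a in range(1, n_rows):
        score[a][0], back[a][0] = a * gap, "up"
    for b in range(1, n_cols):
        score[0][b], back[0][b] = b * gap, "left"
    # Step 2: fill in scores and directions
    for a in range(1, n_rows):
        for b in range(1, n_cols):
            diag = score[a - 1][b - 1] + pam[s1[a - 1]][s2[b - 1]]
            up = score[a - 1][b] + gap
            left = score[a][b - 1] + gap
            best = max(diag, up, left)
            score[a][b] = best
            back[a][b] = "diag" if best == diag else ("up" if best == up else "left")
    # Step 3: follow the backtrack table, collecting the alignment in reverse
    a, b = len(s1), len(s2)
    out1, out2 = [], []
    while a > 0 or b > 0:
        move = back[a][b]
        if move == "diag":
            out1.append(s1[a - 1]); out2.append(s2[b - 1]); a -= 1; b -= 1
        elif move == "up":
            out1.append(s1[a - 1]); out2.append("-"); a -= 1
        else:
            out1.append("-"); out2.append(s2[b - 1]); b -= 1
    # Step 4: reverse both strings to get the alignment
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))

pam = get_pam(TOY_TABLE)
total, top, bottom = global_align("mag", "mg", pam)
print(total, top, bottom)
```

The table keys here are lowercase so they match cleaned sequences like the sample above. If the table file uses uppercase residue letters, either the keys or the sequences would need converting first.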
Results
Species: A = Zea mays L.; B = Oryza sativa indica group; C = Oryza sativa japonica group
Proteins: 1 = granule-bound starch synthase (stickiness); 2 = GS3 (grain size); 3 = badh2 (fragrance)
The score between the second and third sequences (B and C) is always higher than the score of either one against the first sequence (A): the two Oryza sativa cultivars are more closely related.
Average indels for pair 1&2: 5
Average indels for pair 4&5: 20
Average indels for pair 7&8: 3
(As expected)
GS3 is the most different of the three proteins.
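One way to obtain indel counts like these is to count the alignment columns that contain a gap. The sketch below assumes that definition and assumes "-" as the gap character. The slides do not say how indels were counted, and count_indels is a hypothetical helper name.

```python
def count_indels(aligned1, aligned2, gap_char="-"):
    """Count alignment columns where either sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned1, aligned2)
               if x == gap_char or y == gap_char)

# Toy aligned pair: one gap in each sequence
n_indels = count_indels("mag-", "m-gk")
print(n_indels)
```

Counting runs of consecutive gaps as a single indel would be an equally reasonable definition and would give lower numbers for long insertions.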
What comes next
• Analyze more species from the grass family and construct a simple phylogenetic tree using the alignment results.
• Dig into different proteins and find out more about their similarities across species.
• Develop a simple version of BLAST for protein alignment that can be applied to multiple pairs of sequences at the same time.
End • Thank you.