Tour of BioBIKE Motif Discovery
Tour of BioBIKE Motif Discovery BioBIKE (Biological Integrated Knowledge Environment) combines: • Knowledge: All known genomes of interest to a specific scientific community. • Analytical Tools: A powerful graphical language that permits creative expression to those with no programming experience. Various BioBIKEs are available through: http://biobike.csbc.vcu.edu This demonstration is best viewed as a slide show, enabling you to simulate a session and make changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View Show. Click anywhere to go on to the next slide.
Tour of BioBIKE Motif Discovery In this tour, you'll see how to:
• Log onto CyanoBIKE (slide 4)
• Find a gene from a short description of it (slides 11, 18)
• Speak BioBIKE, the language of CyanoBIKE (slide 15)
• Find orthologs of a gene (slide 29)
• Obtain upstream sequences of a gene or list of genes (slide 44)
• Search a set of sequences for common motifs (slide 49)
You can go to any slide in this tour at any time by typing the slide number and pressing Enter. Or go to the next slide by clicking the mouse.
Coming Attractions! If you like this tour, you might also try:
Sequence Analysis
• Display a sequence
• Find similar sequences amongst metagenomes, known viruses, everything in GenBank
• Make a sequence alignment from a set of similar sequences
• Construct a phylogenetic tree
Analysis of Metagenome Aggregates
• Find the number of contigs in a metagenome
• Find the average contig size in a metagenome
• Find the average GC content within a metagenome
• Visualize the distribution of GC content amongst the contigs of a metagenome
To get to CyanoBIKE, click a link to one of the public sites. To see more tours like this one, click Guided tours of BioBIKE. Access this site at http://biobike.csbc.vcu.edu
Your login name (no spaces) - Enter anything you like as a login name, but no spaces or symbols. - EMail address is optional but may be useful if you want to send in questions or complaints. - Click New Login
The BioBIKE environment is divided into three areas: the function palette, the workspace, and the results window. You'll bring functions down from the function palette to the workspace, execute them, and note the results in the results window.
The buttons of the Function Palette can be loosely categorized as follows: • Green buttons, sources of functions • Blue buttons, sources of data. (You won’t see all the buttons until you’ve used BioBIKE a while.) • Black buttons, sources of various BioBIKE actions • Help: On-line help (general)
Special emphasis on a very important button: HELP! On-line help (general) The Help search facility might be a good first choice. Type in a word or two and press Enter.
Many important actions are available from the Session menu. Here are some: • Workspace – save: Allows you to save what you have done and continue working at a later date. • Workspace – list: Shows you the sessions you’ve saved. • Execution log - current: View a record of what you have done, moment by moment, in the current session. • Execution log - all: View a list of all your sessions, any of which you can then view.
A couple of possibly useful buttons in the workspace: Undo (return to workspace before last action) Redo (Get back the workspace you undid)
Our Story The glnA gene in the cyanobacterium Anabaena PCC 7120 encodes glutamine synthetase, a critical enzyme in nitrogen metabolism. The transcription of this gene is regulated by the availability of a nitrogen source. Suppose you want to understand the molecular mechanism by which the regulation takes place.
Our Story Your strategy is to presume that this highly conserved gene possesses the same upstream regulatory sequences in related organisms. You will collect orthologs of glnA in related organisms, collect their upstream sequences, and examine them for a conserved sequence motif. The first step is to get in hand one glnA gene, the one you already know about in Anabaena. Mouse over the GENES-PROTEINS button.
Mousing over a button in the function palette causes a menu to appear. You know the unofficial name of the gene, "glnA", and from that you want to get the official name of the gene it describes. Mouse over Genes-Proteins and click GENES-DESCRIBED-BY.
A GENE-DESCRIBED-BY function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
General Syntax of BioBIKE [Diagram: a function box with its parts labeled Function-name, Argument (object), Keyword object, and Flag] The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, it produces a product.
General Syntax of BioBIKE Function boxes contain the following elements: • Function-name (e.g. SEQUENCE-OF or LENGTH-OF) • Argument: Required, acted on by function • Keyword clause: Optional, more information • Flag: Optional, more (yes/no) information
General Syntax of BioBIKE … and icons to help you work with functions: • Option icon: Brings up a menu of keywords and flags • Action icon: Brings up a menu enabling you to execute a function, copy and paste, get information, get help, etc. • Clear/Delete icon: Removes information you entered or removes the box entirely
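For readers who think more easily in text than in boxes, the anatomy of a function box can be sketched in Python. This is only an illustrative model: the `FunctionBox` class and its field names are hypothetical and are not how BioBIKE represents boxes internally. The example box borrows GENE-DESCRIBED-BY from the tour, with an IN keyword clause.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionBox:
    """Hypothetical model of a function box (not BioBIKE's own representation)."""
    name: str                                     # function-name, e.g. "LENGTH-OF"
    argument: object                              # required: what the function acts on
    keywords: dict = field(default_factory=dict)  # optional keyword clauses
    flags: set = field(default_factory=set)       # optional yes/no switches

    def describe(self) -> str:
        """Render the box as one line: name, argument, keyword clauses, flags."""
        parts = [self.name, str(self.argument)]
        parts += [f"{key} {value}" for key, value in self.keywords.items()]
        parts += sorted(self.flags)
        return " ".join(parts)


box = FunctionBox(
    name="GENE-DESCRIBED-BY",
    argument='"glnA"',
    keywords={"IN": "Anabaena PCC 7120"},
)
print(box.describe())
```

The point of the model is the division of labor: the argument is mandatory, while keyword clauses and flags only refine what the function does.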
Back to our story. Click on the Argument box to open it for entry…
…then type in the description you know, "glnA".
A very common error is to forget to close an entry box. A function can't be executed until all entry boxes are closed, either by pressing Enter or Tab. Do one or the other.
Left to its own devices, BioBIKE will search every organism it knows about for genes described by "glnA". You'll get a much faster response if you modify the function to search only Anabaena. Do this by mousing over the Option Icon…
… and clicking the IN option, and then click Apply.
Then open the IN value box for entry by clicking on it.
You could type in the official name or nickname of the organism, but if you don't happen to know it, find it by mousing over the Organism button…
Anabaena PCC 7120 is a nitrogen-fixing cyanobacterium. Mouse over that choice.
That causes the name to appear in the selected box. The function is now ready for execution. Mouse over the Action Icon…
… and click Execute.
A result now appears in the Result Window and more intelligibly in a popup window. With the name of the gene in hand, you want to find all orthologs of it in cyanobacteria, to extract their upstream sequences. Mouse over the GENES-PROTEINS button…
… and click ORTHOLOG-OF.
Open the argument box of the function for entry by clicking on it…
And type in the nickname of Anabaena's glnA gene, alr2328.
Close the entry box by pressing Enter or Tab…
… and execute the function.
You could execute this function as is, but it would take over a minute to calculate all the orthologs. Instead, mouse over Options, click Lookup-only, and click Apply. This tells the function to look up precalculated orthologs. You’ll get the answer almost instantaneously. You’ll miss out on lots of orthologs, but that’s all right. Execute the function (as usual), and…
Lots of orthologs! (You’ll also get a lot of warnings about organisms not in the lookup table, but don’t worry about that.) It would be helpful to be able to refer to them as a group. To define such a group, mouse over the DEFINITION button…
… and click the DEFINE function.
The DEFINE function asks for two things: the name of the variable to be defined and the value it is to be given. The value will be all those orthologs. The name is up to you. Click on the variable argument box to open it up for entry…
… and type a name that makes sense to you, closing the box afterwards by pressing Tab.
Tab closes the entry box and automatically opens the next one (if it exists). There are many ways of getting that list of orthologs. You could copy and paste that list from the Result pane to the open value box, but it might be more clear to cut/paste the function that produced it. Let me show you. Click on the Action icon of ORTHOLOG-OF.
Click Cut. The function box will disappear but will be retained in the Bio. BIKE clipboard.
… then mouse over the Action Icon of the value argument box and click Paste.
The definition is now complete (and reads well for future reference). But it will not take effect until the function is executed. Click the Action icon (of DEFINE, not ORTHOLOG-OF), and click Execute.
Notice that a new VARIABLES button appears (unless you’ve previously defined a variable). We'll use it later to access the newly defined list. For now, we need to get upstream regions from all those genes. Mouse over the GENES-PROTEINS button…
… then mouse over Genes-neighborhood and then click SEQUENCE-UPSTREAM-OF.
The function seems to call for a gene as the argument. However, like most Bio. BIKE functions, this one has the following useful property: - Give it a single item, it returns a single answer - Give it a list of items, it returns a list of answers. Open the argument box for input.
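The "one item in, one answer out; list in, list out" behavior can be sketched in Python. This is a minimal illustration, not BioBIKE code: the `lift` decorator and the `length_of` stand-in (named after LENGTH-OF) are hypothetical.

```python
def lift(func):
    """Wrap func so that it accepts either a single item or a list of items."""
    def wrapper(item_or_list):
        if isinstance(item_or_list, (list, tuple)):
            # A list of items gives back a list of answers, in the same order.
            return [func(item) for item in item_or_list]
        # A single item gives back a single answer.
        return func(item_or_list)
    return wrapper


@lift
def length_of(sequence):
    """Stand-in for a function written with a single sequence in mind."""
    return len(sequence)


print(length_of("ATGC"))
print(length_of(["ATG", "A", "GGCC"]))
```

Written this way, the function body only ever deals with one item; handling a whole group is taken care of around it.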
We want the function to act on the group of genes we just defined. Mouse over the VARIABLES button…
… and click the name of the group you just defined. That will bring the group into the selected box.
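To make "upstream sequence" concrete, here is a rough Python sketch of pulling out the bases that precede a gene. It rests on several assumptions that the tour does not state: a linear sequence, 0-based half-open gene coordinates, and a fixed upstream length. The function names and the toy genome are hypothetical; how SEQUENCE-UPSTREAM-OF actually decides where the upstream region ends is not described here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_of(genome, start, end, strand, length):
    """Return up to `length` bases upstream of a gene at 0-based [start, end).

    Assumes a linear genome, so a gene near either end simply gets a
    shorter upstream region.
    """
    if strand == "+":
        # Forward-strand gene: upstream lies before the start coordinate.
        return genome[max(0, start - length):start]
    if strand == "-":
        # Reverse-strand gene: upstream lies after the end coordinate and
        # must be read on the opposite strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


toy_genome = "ACGTAAATTTGACA"  # made-up sequence for illustration
print(upstream_of(toy_genome, 4, 7, "+", 4))
print(upstream_of(toy_genome, 7, 10, "-", 4))
```

The strand check matters because "upstream" is defined relative to the direction in which the gene is read, not to its position in the sequence file.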
We could execute the completed function, and then take those upstream sequences and look within them for sequence motifs. Alternatively we could skip the intermediate step and have the sequences go directly into the motif finder. To do that, we surround the function with the motif finder. To surround, mouse over the Action Icon…
… and click Surround with.
The entire function is now selected. We need to specify that we want to surround the function with a function that searches for motifs within sequences. Mouse over the STRINGS-SEQUENCES button…
… mouse over Bioinformatic-Tools and click MOTIFS-IN. (By the way, if the categories aren't sufficiently intuitive, you can always find functions alphabetically, through the ALL button on the Function Palette)
The upstream sequences returned by SEQUENCES-UPSTREAM-OF will now be given to the MOTIFS-IN function. Executing that function will execute everything inside of it. You might think it's time to go over to the Action Icon of MOTIFS-IN and execute …
… but hold that mouse! MOTIFS-IN, unless told otherwise, looks for amino-acid motifs. Eventually we'll get around to teaching it how to distinguish DNA from protein sequences automatically, but for now, mouse over the Options Icon…
… and click the DNA option and then Apply.
Now execute the function.
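Surrounding one function with another is ordinary function nesting. The Python sketch below shows that the nested form gives the same result as running the two steps separately. Everything in it is a toy stand-in: the gene names, the sequences, and the two functions are made up for illustration and do no real motif finding.

```python
# Made-up upstream sequences keyed by made-up gene names.
TOY_UPSTREAM = {
    "geneA": "TTGTAACATTTGTTAC",
    "geneB": "GGGTAACCCAGTTACT",
}


def sequence_upstream_of(genes):
    """Stand-in: look up a toy upstream sequence for each gene."""
    return [TOY_UPSTREAM[gene] for gene in genes]


def motifs_in(sequences, dna=False):
    """Stand-in: report what would be searched instead of searching."""
    alphabet = "DNA" if dna else "protein"
    return f"searching {len(sequences)} {alphabet} sequences for motifs"


genes = ["geneA", "geneB"]

# Two steps: execute the inner function, then hand its result on.
upstream = sequence_upstream_of(genes)
two_step = motifs_in(upstream, dna=True)

# One step: the outer function surrounds the inner one.
nested = motifs_in(sequence_upstream_of(genes), dna=True)

print(nested)
```

Executing the outer call forces the inner call to run first, which is why executing MOTIFS-IN executes everything inside it.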
Notice "Submitted!" in the message bar. MOTIFS-IN might take 10-20 seconds to execute. Don't try to execute any other function during that time. MOTIFS-IN formats the sequences the way a motif-finding program (Meme) expects and supplies its results in a separate window.
A new window opens, which you can save to your own computer if you like. For now just scroll down.
Meme has found a motif with a very good E-value. It provides a histogram, showing the information content of each position of the motif. The higher the bar, the more conserved the position. Scroll further.
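What the bar heights measure can be sketched with one common definition of information content for DNA: 2 bits minus the Shannon entropy of the bases seen at that position. The Python below is a minimal sketch under that assumption. The aligned sites are made up, and Meme's own calculation may include corrections that are not reproduced here.

```python
import math
from collections import Counter


def information_content(sites):
    """Bits of information for each column of equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column)
        # Shannon entropy of the base frequencies observed in this column.
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        # Four possible bases give a maximum of log2(4) = 2 bits.
        bits.append(2.0 - entropy)
    return bits


toy_sites = ["GTAAC", "GTAAC", "GTTAC", "GTCAC"]  # made-up aligned sites
profile = information_content(toy_sites)
print([round(value, 2) for value in profile])
```

In this toy alignment the four fully conserved columns score the maximum of 2 bits, while the third column, which mixes A, T, and C, scores 0.5 bits, so it would get a much shorter bar.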
You get the sequence of the motif for each upstream sequence in which it was found. Scroll further.
Meme also found a second good motif. Scroll to the end of the file.
At the end you get a map of all the motifs found and where in the upstream sequences they appear. Evidently, Motifs 2 and 3, when present, generally precede Motif 1.
BioBIKE You've seen a knowledge environment in which: • Knowledge and tools are integrated. Data conversion is seldom necessary. • The language is uniform, facilitating access to many popular tools through a common interface. • The language is as flexible as any general purpose language, permitting construction of new tools. • The programming language is easy to pick up, using graphical conventions familiar to those who don't program. • The environment is well suited for teaching the concepts of molecular biology through computational experiment.
Collaborators: Michael Chaplin, Johnny Casey (Sequoia Cons.), Sarah Cousins (now Wistar Institute), Michiko Kato (now UC Davis), Hailan Liu, JP Massar (Berkeley), James Mastros (now Philip Morris), Bogdan Mihai, John Myers (Sequoia Cons.), Nihar Sheth, Jeff Shrager (Carnegie Inst.), Arnaud Taton, Hien Truong, Andy Whittam (Washington & Jefferson) … and many participating students. Development of BioBIKE was funded by a grant from the National Science Foundation. Contact: Jeff Elhai, Center for the Study of Biological Complexity, Virginia Commonwealth University. (E-mail) ElhaiJ@VCU.Edu, (Tel) 804-828-0794