GCG vs EMBOSS
Gary Williams

Which is better, GCG or EMBOSS?
- You must decide for yourselves.
- You may find other packages that do what you want.
- Use the tools that do the job.
- This is a comparison of GCG and EMBOSS to help you decide.

Interfaces
- Web
  - W2H available for both
  - EMBOSS W2H still has rough edges
  - PISE
  - Others under development
- X-Windows
  - GCG: Seqlab
  - EMBOSS: SPIN (+ others coming)
- Telnet/xterm/Character-based
  - emnu

Command line is very similar
- The UNIX command-line interfaces of GCG and EMBOSS are very similar.
- You type the name of the program.
- You can add any options you want to the command line.
- Press the RETURN key.
- Any mandatory information that was not on the command line will be prompted for.

GCG command-line
    % name -other=thing
    This is the name program that reads a sequence and writes out something.
    NAME what sequence ? embl:hsfau1
    Begin (* 1 *) ?
    End (* 2016 *) ?
    Reverse (* No *) ?
    What should I call the output (* hsfau.name *) ?

EMBOSS command-line
    % name -other thing
    Reads in sequences and writes a thing
    Input sequence(s): embl:hsfau1
    Output data [hsfau1.name]:
- Use '-sask' to make EMBOSS programs prompt for the start and end of sequences.

Some common options
- Running in scripts: don't prompt, just fail if the command line is insufficient
  - GCG: -default
  - EMBOSS: -auto
- Help on options
  - GCG: -check
  - EMBOSS: -help or -help -verbose
- Boolean options (Yes/No, True/False)
  - GCG: -thing, -nothing
  - EMBOSS: -thing, -nothing, -thing=T, -thing=F, -thing=1, -thing=0, -thing=Y, -thing=N
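
The EMBOSS boolean spellings on this slide can be folded to a single value, which is handy when a wrapper script has to understand options before passing them on. This is a minimal sketch, not part of EMBOSS: the function name `bool_value` is hypothetical, and it only recognises the exact spellings listed above.

```shell
#!/bin/sh
# bool_value NAME ARG  (hypothetical helper, not an EMBOSS program)
# Prints "true" or "false" for one command-line argument that sets the
# boolean option NAME, using only the spellings listed on the slide.
bool_value() {
    name=$1
    arg=$2
    case "$arg" in
        "-$name"|"-$name=T"|"-$name=Y"|"-$name=1")   echo true ;;
        "-no$name"|"-$name=F"|"-$name=N"|"-$name=0") echo false ;;
        *) echo "not a setting of -$name: $arg" >&2; return 1 ;;
    esac
}
```

For example, `bool_value thing -nothing` and `bool_value thing -thing=0` both report false.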

Sequence options in EMBOSS
"-sequence" related qualifiers
    -sbegin     integer   first base used
    -send       integer   last base used, def=seq length
    -sreverse   bool      reverse (if DNA)
    -sask       bool      ask for begin/end/reverse
    -slower     bool      make lower case
    -supper     bool      make upper case
    -sformat    string    input sequence format
    -ufo        string    UFO features
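
The range qualifiers in this table can be assembled by a script instead of being typed each time. The sketch below only builds the qualifier text and does not run any EMBOSS program; the function name `seq_range_args` is hypothetical.

```shell
#!/bin/sh
# seq_range_args BEGIN END [reverse]  (hypothetical helper)
# Prints the -sbegin/-send qualifiers for a sub-sequence, adding
# -sreverse when the third argument is the word "reverse".
seq_range_args() {
    printf '%s' "-sbegin $1 -send $2"
    if [ "${3:-}" = reverse ]; then
        printf ' -sreverse'
    fi
    printf '\n'
}
```

A script might then call something like `seqret embl:hsfau1 $(seq_range_args 10 200) -auto`, assuming seqret is installed and the embl database is configured.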

Sequence options in EMBOSS
"-outseq" related qualifiers
    -osformat   string    output sequence format
    -ossingle   bool      separate file for each entry

EMBOSS general options
    -debug      bool      write debug output to program.dbg
    -auto       bool      turn off prompts
    -stdout     bool      write standard output
    -filter     bool      read standard input, write standard output
    -options    bool      prompt for required and optional values
    -verbose    bool      report some/full command line options
    -help       bool      report command line options

Data files
- GCG uses '..' to divide comments from data.
- EMBOSS does not use '..'.
- In general, EMBOSS uses '#' to mark a comment line.
- Use 'embossdata' to extract and check on data files.
- As in GCG, data files copied into the current or home directory are used in preference to the originals.
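
The preference rule for data files can be pictured as a simple search path. This is a simplified sketch, not EMBOSS's actual lookup code: the function name `find_data_file` is hypothetical, the directory holding the originals is passed in as an argument, and checking the current directory before the home directory is an assumption of the sketch.

```shell
#!/bin/sh
# find_data_file NAME ORIGINALS_DIR  (hypothetical helper)
# Prints the first copy of NAME found in: the current directory,
# the home directory, then the directory holding the originals.
find_data_file() {
    name=$1
    originals=$2
    for dir in . "$HOME" "$originals"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1   # no copy found anywhere
}
```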

List files (files of file names)
- Similar to GCG list files, but no '..'
- Comment lines start with '#'
- Can contain the names of other list files:
    # This is my list file
    embl:hsfau
    embl:ggg*
    myfile.seq:clone10
    file.seq
    @list2
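
To check what a list file will expand to, the rules on this slide (skip '#' comments, follow '@' references to other list files) can be sketched in a few lines of shell. This is an illustration, not how EMBOSS reads list files internally: `expand_list` is a hypothetical name, nested file names are resolved relative to the current directory, and there is no guard against list files that reference each other in a loop.

```shell
#!/bin/sh
# expand_list FILE  (hypothetical helper)
# Prints every entry of a list file, one per line. Blank lines and
# lines starting with '#' are skipped; '@name' pulls in another list file.
expand_list() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) ;;                        # blank line or comment
            @*)  expand_list "${line#@}" ;;    # nested list file
            *)   printf '%s\n' "$line" ;;      # ordinary entry
        esac
    done < "$1"
}
```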

File formats
- GCG
  - only GCG format, MSF and RSF
- EMBOSS
  - many formats
  - automatically recognised
  - can specify using '::' or '-osf'
  - e.g. clustal::globin.aln -osf gcg
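
The 'format::file' notation is easy to take apart in a script, for instance to log which inputs have an explicit format. A minimal sketch follows; `split_format` is a hypothetical name, and the code only handles the plain 'format::rest' shape shown on this slide.

```shell
#!/bin/sh
# split_format SPEC  (hypothetical helper)
# Prints "FORMAT<TAB>REST". FORMAT is empty when SPEC has no
# "format::" prefix, i.e. the format is left to automatic recognition.
split_format() {
    case "$1" in
        *::*) printf '%s\t%s\n' "${1%%::*}" "${1#*::}" ;;
        *)    printf '\t%s\n' "$1" ;;
    esac
}
```

With the slide's example, `split_format clustal::globin.aln` separates the format name clustal from the file name globin.aln.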

One file, many sequences
- GCG
  - Only one sequence per GCG file
- EMBOSS
  - One or more sequences per file
  - Default is to write all sequences to one file
  - -ossingle will change to writing many files
  - GCG, Staden and plain format files can only hold one sequence per file.

Features
- GCG
  - No concept of feature tables
- EMBOSS
  - Many programs now write out results as GFF
  - Soon, all programs that find things will write the results as GFF
  - GFF will become another sequence format
  - Programs to manipulate and display sets of features are planned
  - cf. showfeat, coderet, maskfeat, diffseq

Databases
- EMBOSS is poor at grouping many databases under one name.
- E.g. we need a way of referring to 'embl' and 'emblnew' as one database.
- This will be done, but currently a list file containing the following seems best:
    embl:*
    emblnew:*
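
The workaround list file can be generated from a script. In this sketch the function name `make_combined_list` is hypothetical; the quoted here-document keeps the shell from touching the '*' characters.

```shell
#!/bin/sh
# make_combined_list FILE  (hypothetical helper)
# Writes a list file that names both databases, so they can be
# referred to together through one file.
make_combined_list() {
    cat > "$1" <<'EOF'
# embl and emblnew treated as one database
embl:*
emblnew:*
EOF
}
```

After `make_combined_list allembl.list`, the file can be given to an EMBOSS program as `@allembl.list`, in the same way that list files refer to each other.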

Command line wildcards
- GCG:
  - embl:* - no problem
- EMBOSS:
  - embl:* - UNIX complains it can't find the files
  - solution is to quote it:
  - "embl:*"
  - or:
  - embl:\*
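
The effect of the quoting can be seen without EMBOSS by printing the arguments a program would receive. In this sketch `show_args` is a hypothetical stand-in for an EMBOSS program. Note that csh and tcsh report "No match" for an unquoted pattern that matches no file, while sh and bash pass it through unchanged unless a matching file exists, so quoting is the safe habit in every shell.

```shell
#!/bin/sh
# show_args: print each argument on its own line, exactly as received.
show_args() { printf '%s\n' "$@"; }

show_args "embl:*"   # double quotes: the program sees embl:*
show_args 'embl:*'   # single quotes work as well
show_args embl:\*    # a backslash protects just the star
```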

HELP
- GCG:
  - genman, genhelp
- EMBOSS:
  - tfm

What program does what?
- See David Martin's list of equivalences: http://www.no.embnet.org/Programs/SAL/EMBOSS/from.GCG.php3
- NB this doesn't list EMBOSS programs with no equivalent in GCG!

What EMBOSS does NOT do
- The major deficiencies in the EMBOSS package are: BLAST, FASTA, ASSEMBLY
- You should use the publicly available software:
  - Blast: NCBI, HGMP, many other sites
  - Fasta: HGMP
  - Assembly: Staden package

What EMBOSS does do
- Giving 'stdout' as the output file name makes output go to the screen.
- Much effort is put into removing arbitrary limits.
  - E.g. max. sequence length: 2 Gb
  - Many programs limited only by available memory
- Source code available for inspection, change and writing your own programs
- EMBOSS is FREE!
  - GNU General Public Licence
  - Open Source Software
THE END