Bioinformatics Outline What is bioinformatics? Who are bioinformaticians? Hardware Software

What is bioinformatics?

What is bioinformatics? [Word cloud of answers; legible entries include "Someone to analyze my data", "C++", and "HMM".]

Who are bioinformaticians? Scientists trying to get tenure, get grants, publish papers, and train students. Scientists trying to help others analyze their data.

Who are bioinformaticians? YOU!

Hardware

Torrent Server Recommended Torrent Server. Processors: two six-core processors; RAM: 48 GB; HDD capacity: eight 2 TB hard drives in RAID 5 with 12 TB usable; Network: quad-port gigabit NIC; GPU: NVIDIA graphics processing unit; Chassis: Dell Precision T7500 tower (no rack mount available). $12,500
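The usable-capacity figure on this slide can be checked with the standard RAID 5 rule that one drive's worth of space goes to parity. The sketch below is only an illustration: the function name `raid5_usable_tb` and the `hot_spares` parameter are hypothetical, and the slide does not say how the vendor arrives at 12 TB. Holding one drive back as a hot spare is one assumption that reproduces the number.

```python
def raid5_usable_tb(n_drives, drive_tb, hot_spares=0):
    """Usable capacity of a RAID 5 array in TB.

    RAID 5 spends one drive's worth of space on parity, so an array of
    k active drives stores (k - 1) drives' worth of data.
    hot_spares is an assumption for illustration, not from the slide.
    """
    active = n_drives - hot_spares
    if active < 3:
        raise ValueError("RAID 5 needs at least three active drives")
    return (active - 1) * drive_tb


# All eight 2 TB drives in the array.
print(raid5_usable_tb(8, 2))                # 14
# Assumed: one of the eight drives kept as a hot spare.
print(raid5_usable_tb(8, 2, hot_spares=1))  # 12
```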

Computers My cluster: 51-node cluster. Most nodes: 16 CPUs with 8 cores each, 132 GB RAM, 1 TB local storage (/usr/data), InfiniBand interconnects (6,528 cores; 6,732 GB RAM; 50 TB scratch storage). 192 TB Lustre FS connected to most nodes via InfiniBand.
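The totals in parentheses follow from the per-node figures. A minimal sketch of the arithmetic, assuming every one of the 51 nodes has the "most nodes" configuration (the slide implies a few differ); the function name `cluster_totals` is hypothetical:

```python
def cluster_totals(nodes, cpus_per_node, cores_per_cpu, ram_gb_per_node):
    """Aggregate core count and RAM, assuming identical nodes."""
    return {
        "cores": nodes * cpus_per_node * cores_per_cpu,
        "ram_gb": nodes * ram_gb_per_node,
    }


# Per-node figures taken from the slide.
totals = cluster_totals(nodes=51, cpus_per_node=16, cores_per_cpu=8,
                        ram_gb_per_node=132)
print(totals["cores"])   # 6528
print(totals["ram_gb"])  # 6732
```

The same multiplication with 1 TB of local storage per node gives 51 TB, slightly above the 50 TB of scratch storage quoted, which fits the slide's wording that only most nodes share this configuration.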

Computers rambox: 24 processors with 6 cores each, 198 MB RAM. edwards.sdsu.edu (lab web server): 24 processors with 6 cores each, 50 M RAM, 19 TB RAID 6 storage (18 TB used).

Computers File servers and backup servers: 4 secret servers! 48 TB backups and archival storage

Software

Software Locally installed software Remote (web) software

Local Software bioperl, biopython, bowtie2, cdhit, crass, diamond, fastQC, focus, FOCUS, FragGeneScan, genemark, groopm, idba_ud, jellyfish, last, masurca, mauve, metabat, metagenemark, mira, MUMmer, Muscle, PEAR, phylip, prinseq, qiime, qudaich, rapsearch, scaffold_builder, seed-servers, spades, tagcleaner, tRNAscan-SE

Metagenomics Processing [Workflow diagram; legible stage labels include Preprocessing, Binning reads, Contamination removal, and Taxonomic assignments.]

Metagenomics Quality control – Prinseq Deconseq Annotation Statistics STAMP Population genomes crAss FOCUS metabat Real time metagenomics Contig Clustering mg-rast Super FOCUS

Metagenomics Processing. Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra. Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim. Taxonomic assignment: CARMA, FOCUS, KRAKEN, LMAT, MEGAN, Metaplan, myTaxa, PhylopythiaS, phymmbl, RAIphy, TACOA, Taxy. Gene prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal. Functional assignment: CLAMS, Sequedex, DiScRIBinATE, SORTITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler.
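One way to keep a slide like this usable is a small lookup table from processing stage to tools. The sketch below is illustrative only: the names `STAGES` and `stages_for` are hypothetical, the tool lists are deliberately abbreviated to a few entries copied from the slide, and nothing here runs any of the tools.

```python
# Abbreviated stage -> tools table copied from the slide (not exhaustive).
STAGES = {
    "Contig clustering": ["concoct", "tetra"],
    "Preprocessing": ["FASTQC", "Prinseq", "QC-Chain"],
    "Taxonomic assignment": ["CARMA", "FOCUS", "KRAKEN", "MEGAN"],
    "Gene prediction": ["Orphelia", "Prodigal"],
}


def stages_for(tool):
    """Return the stages whose tool list contains `tool` (case-insensitive)."""
    wanted = tool.lower()
    return [stage for stage, tools in STAGES.items()
            if wanted in (t.lower() for t in tools)]


print(stages_for("prinseq"))  # ['Preprocessing']
print(stages_for("blast"))    # []
```

A case-insensitive match is used because the slides spell the same tool in different cases (for example "prinseq" and "Prinseq").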