
DNA STRUCTURE

DNA - Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints or a recipe, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. DNA is a long polymer made from repeating units called nucleotides.

Nucleotide = monomer that makes up DNA and RNA.
Polymer = large molecule consisting of similar units (nucleotides in this case).
There are four different nucleotides, distinguished by their four bases: adenine (A), cytosine (C), guanine (G) and thymine (T).
A single strand of DNA can be thought of as a string composed of the four letters A, C, G, T:
ctgctggaccgggtgctaggaccctgactgcc cggggccgggggtgcggggcccgctgag…
Each base has a slightly different composition, or combination of oxygen, carbon, nitrogen, and hydrogen.
The human (haploid) genome comprises ~3.2 billion base pairs, and a copy of it is present in almost every cell.
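Because a strand can be treated as a string over A, C, G, T, basic bookkeeping such as counting bases becomes ordinary string processing. A minimal Python sketch, assuming sequences are plain strings; the helper name `base_counts` is hypothetical:

```python
from collections import Counter

DNA_BASES = set("ACGT")

def base_counts(seq: str) -> Counter:
    """Count how often each base occurs in a DNA string (case-insensitive)."""
    seq = seq.upper()
    invalid = set(seq) - DNA_BASES
    if invalid:
        # Anything other than A, C, G, T is rejected rather than silently counted
        raise ValueError(f"not a DNA base: {sorted(invalid)}")
    return Counter(seq)

# First fragment of the example strand shown above
fragment = "ctgctggaccgggtgctaggaccctgactgcc"
print(base_counts(fragment))
```

Rejecting unknown letters keeps RNA (with U) or ambiguity codes from being mixed in by accident.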

Complementary base pairs from opposite strands are bound together by weak hydrogen bonds: A pairs with T (2 H-bonds), and G pairs with C (3 H-bonds). These bases are classified into two types: adenine and guanine are fused five- and six-membered heterocyclic compounds called purines, while cytosine and thymine are six-membered rings called pyrimidines. A fifth pyrimidine base, called uracil (U), usually takes the place of thymine in RNA and differs from thymine by lacking a methyl group on its ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine. The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands.
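The pairing rules above (A with T via 2 hydrogen bonds, G with C via 3, always one purine with one pyrimidine) fit in two small lookup tables. A minimal sketch for DNA only; the function names are hypothetical:

```python
# Watson-Crick partners and hydrogen bonds per pair (DNA only; RNA would use U)
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}
PURINES = {"A", "G"}  # fused five- and six-membered rings

def complement(seq: str) -> str:
    """Partner base for each position, kept in the same left-to-right order."""
    return "".join(PAIR[b] for b in seq.upper())

def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds formed when seq is paired with its complement."""
    return sum(H_BONDS[b] for b in seq.upper())

def is_purine(base: str) -> bool:
    return base.upper() in PURINES

strand = "ATGC"
print(complement(strand), hydrogen_bonds(strand))  # TACG 10
# Every pair joins one purine with one pyrimidine
print(all(is_purine(b) != is_purine(p) for b, p in zip(strand, complement(strand))))  # True
```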

[Figure: structures of the purines, adenine and guanine, and of the pyrimidines, cytosine, thymine (DNA) and uracil (RNA)]

[Figure: Base pairing, guanine and cytosine]

[Figure: Base pairing, adenine and thymine]

[Figure: Base pairing, adenine and cytosine (not a Watson-Crick pair)]

[Figure: Base pairing, guanine and thymine (not a Watson-Crick pair)]

Three components of DNA (nucleotide):
1. Pentose (5-carbon) sugar: DNA = deoxyribose, RNA = ribose (compare the 2’ carbons)
2. Nitrogenous base: purines (adenine, guanine); pyrimidines (cytosine, thymine (DNA), uracil (RNA))
3. Phosphate group, attached to the 5’ carbon

The four bases found in DNA are attached to the sugar and phosphate to form the complete nucleotide, for example adenosine monophosphate. Although each individual repeating unit is very small, DNA polymers can be enormous molecules containing millions of nucleotides. For instance, the largest human chromosome, chromosome number 1, is approximately 249 million base pairs long. Individual nucleotides are linked through the phosphate group, and it is the precise order, or sequence, of nucleotides that determines the product made from that gene.

[Figure: a nucleotide, adenosine monophosphate (AMP): phosphate group on the 5’ carbon of the sugar, sugar carbons numbered 1’ to 5’, adenine attached at the 1’ carbon; the base plus the sugar alone forms the nucleoside]

So far: chemically, DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore antiparallel. It is the sequence of the four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of amino acids within proteins.

[Figure: segment of double-stranded DNA showing the sugar (S)-phosphate (P) backbones, base pairs joined by hydrogen bonds, and the 5´ and 3´ ends of each strand (read as “5 prime” and “3 prime”)]

Nucleotides are linked by phosphodiester bonds to form polynucleotides.
Phosphodiester bond: covalent bond between the phosphate group (attached to the 5’ carbon) of one nucleotide and the 3’ carbon of the sugar of another nucleotide. This bond is very strong, and for this reason DNA is remarkably stable.
5’ and 3’: the ends of a DNA or RNA chain are not the same. One end of the chain has a 5’ carbon and the other end has a 3’ carbon.

[Figure: polynucleotide chain with the 5’ end and the 3’ end labelled]
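Because the two strands are antiparallel, the partner strand written in the usual 5’ to 3’ direction is the complement read backwards (the "reverse complement"). A minimal sketch; `reverse_complement` is a hypothetical helper name:

```python
# Translation table mapping each base to its partner
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the partner strand, also written 5' -> 3'.

    Complementing gives the partner strand read 3' -> 5'; reversing it
    puts it back into the conventional 5' -> 3' orientation.
    """
    return seq.upper().translate(_COMPLEMENT)[::-1]

top = "ATGAGCTGTACGATCGTG"            # written 5' -> 3'
bottom = reverse_complement(top)      # partner strand, 5' -> 3'
print("5'-" + top + "-3'")            # 5'-ATGAGCTGTACGATCGTG-3'
print("3'-" + bottom[::-1] + "-5'")   # 3'-TACTCGACATGCTAGCAC-5'
```

Applying the function twice returns the original strand, which is a handy sanity check.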

In living organisms, DNA does not usually exist as a single molecule, but instead as a tightly associated pair of molecules. These two long strands entwine like vines, in the shape of a double helix. The nucleotide repeats contain both the segment of the backbone of the molecule, which holds the chain together, and a base, which interacts with the other DNA strand in the helix. In general, a base linked to a sugar is called a nucleoside and a base linked to a sugar and one or more phosphate groups is called a nucleotide. If multiple nucleotides are linked together, as in DNA, this polymer is called a polynucleotide.

DNA: terminology
Nucleoside = nucleobase + pentose
Nucleotide = nucleobase + pentose + phosphate group
Free base: nucleoside; nucleotide
Adenine (A): adenosine; adenosine monophosphate (AMP)
Guanine (G): guanosine; guanosine monophosphate (GMP)
Cytosine (C): cytidine; cytidine monophosphate (CMP)
Thymine (T): thymidine; thymidine monophosphate (TMP)
[Figure: a nucleoside is a base plus a sugar; a nucleotide is a base plus a sugar plus phosphate(s). Nucleotides include the nucleoside mono-, di-, and triphosphates.]

Basis of the Watson-Crick model of DNA
[Figure: model of the DNA double helix showing the major and minor grooves]
Each strand of DNA has a “direction”: at one end, the terminal carbon atom in the backbone is the 5’ carbon atom of the terminal sugar; at the other end, the terminal carbon atom is the 3’ carbon atom of the terminal sugar. Therefore we can talk about the 5’ and the 3’ ends of a DNA strand.
In a double helix, the strands are antiparallel (arrows drawn from the 5’ end to the 3’ end go in opposite directions).
Base pairs are 0.34 nm apart. One complete turn of the helix requires 3.4 nm (10 base pairs/turn).
Sugar-phosphate backbones are not equally spaced, resulting in major and minor grooves. DNA-binding proteins can contact bases through the major groove.
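The helix dimensions above reduce to two multiplications: length along the axis is base pairs times 0.34 nm, and turns are base pairs divided by 10. A minimal sketch using the slide's B-DNA values (the constant and function names are hypothetical):

```python
RISE_NM = 0.34     # distance between adjacent base pairs, nm
BP_PER_TURN = 10   # base pairs per helical turn, as on the slide

def helix_length_nm(n_bp: int) -> float:
    """Length along the helix axis of a B-DNA segment with n_bp base pairs."""
    return n_bp * RISE_NM

def helix_turns(n_bp: int) -> float:
    """Number of helical turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN

# One full turn: 10 bp x 0.34 nm = 3.4 nm, the pitch quoted above
print(round(helix_length_nm(BP_PER_TURN), 2))  # 3.4
print(helix_turns(100), round(helix_length_nm(100), 2))
```

Swapping in ~10.5 bp/turn (the value usually given for B-DNA in solution) changes only the turn count, not the length.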

James D. Watson & Francis H. Crick - 1953 Double Helix Model of DNA
Two sources of information:
1. Base composition studies of Erwin Chargaff
• indicated double-stranded DNA consists of ~50% purines (A, G) and ~50% pyrimidines (T, C)
• amount of A = amount of T and amount of G = amount of C (Chargaff’s rules)
• %GC content varies from organism to organism
Examples (%A, %T, %G, %C, %GC):
Homo sapiens: 31.0, 31.5, 19.1, 18.4, 37.5
Zea mays: 25.6, 25.3, 24.5, 24.6, 49.1
Drosophila: 27.3, 27.6, 22.5, 22.5, 45.0
Aythya americana: 25.8, 25.8, 24.2, 24.2, 48.4
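Chargaff's rules can be checked directly against the composition table: A/T and G/C should both be close to 1, and purines (A + G) close to 50%. A minimal sketch that only re-uses the table's numbers; `chargaff_summary` is a hypothetical helper:

```python
# (%A, %T, %G, %C) for each organism, copied from the table above
COMPOSITION = {
    "Homo sapiens": (31.0, 31.5, 19.1, 18.4),
    "Zea mays": (25.6, 25.3, 24.5, 24.6),
    "Drosophila": (27.3, 27.6, 22.5, 22.5),
    "Aythya americana": (25.8, 25.8, 24.2, 24.2),
}

def chargaff_summary(a: float, t: float, g: float, c: float) -> dict:
    """Ratios that Chargaff's rules predict to be ~1, plus purine and GC percentages."""
    return {"A/T": a / t, "G/C": g / c, "purines": a + g, "GC": g + c}

for organism, values in COMPOSITION.items():
    s = chargaff_summary(*values)
    print(f"{organism:17s} A/T={s['A/T']:.2f} G/C={s['G/C']:.2f} "
          f"purines={s['purines']:.1f}% GC={s['GC']:.1f}%")
```

The ratios stay near 1 for every organism while the GC percentage varies, which is exactly the pattern the bullets describe.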

Chargaff’s Rule
• Erwin Chargaff showed that there was an approximate equality between the molar amounts of adenine and thymine as well as cytosine and guanine.
• This did not prove that A pairs with T or that G pairs with C.

James D. Watson & Francis H. Crick - 1953 Double Helix Model of DNA
Two sources of information:
2. X-ray diffraction studies - Rosalind Franklin & Maurice Wilkins
Conclusion: DNA is a helical structure with distinctive regularities, 0.34 nm and 3.4 nm.

Summary: The backbone of the DNA strand is made from alternating phosphate and sugar residues. The sugar in DNA is 2-deoxyribose, which is a pentose (five-carbon) sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These asymmetric bonds mean a strand of DNA has a direction. In a double helix the direction of the nucleotides in one strand is opposite to their direction in the other strand. This arrangement of DNA strands is called antiparallel. The asymmetric ends of DNA strands are referred to as the 5′ (five prime) and 3′ (three prime) ends, with the 5′ end being the one with a terminal phosphate group and the 3′ end the one with a terminal hydroxyl group. One of the major differences between DNA and RNA is the sugar, with 2-deoxyribose being replaced by the alternative pentose sugar ribose in RNA.

The double helix is a right-handed spiral. As the DNA strands wind around each other, they leave gaps between each set of phosphate backbones, revealing the sides of the bases inside. There are two of these grooves twisting around the surface of the double helix: one groove, the major groove, is 22 Å wide and the other, the minor groove, is 12 Å wide. The narrowness of the minor groove means that the edges of the bases are more accessible in the major groove. As a result, proteins like transcription factors that can bind to specific sequences in double-stranded DNA usually make contacts to the sides of the bases exposed in the major groove.

[Figure: chemical structure of a DNA double strand, showing the sugar-phosphate backbone, the bases, and the 5’ phosphate group and 3’ hydroxyl group at the ends of each strand]

Forms of the Double Helix
[Figure: A-, B-, and Z-DNA compared. B-DNA: 10.4 bp/turn, +34.6° rotation per bp, 0.34 nm rise per bp. A-DNA: 11 bp/turn, +32.7° rotation per bp. Z-DNA: 12 bp/turn, -30.0° rotation per bp (left-handed).]
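The rotation per base pair is simply 360° divided by the base pairs per turn, with a negative sign for the left-handed Z form, so it is a quick consistency check on the figure values. A minimal sketch (names are hypothetical):

```python
FORMS = {"A-DNA": 11, "B-DNA": 10.4, "Z-DNA": 12}      # bp per turn
HANDEDNESS = {"A-DNA": +1, "B-DNA": +1, "Z-DNA": -1}   # Z-DNA is left-handed

def twist_per_bp(bp_per_turn: float, sign: int = +1) -> float:
    """Rotation in degrees between successive base pairs."""
    return sign * 360.0 / bp_per_turn

for form, n in FORMS.items():
    print(f"{form}: {twist_per_bp(n, HANDEDNESS[form]):+.1f} degrees per bp")
# A-DNA: +32.7 degrees per bp
# B-DNA: +34.6 degrees per bp
# Z-DNA: -30.0 degrees per bp
```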

[Figure: B-DNA and Z-DNA]

Even More Forms of DNA
• C-DNA: exists only under high dehydration conditions; 9.3 bp/turn, 1.9 nm diameter and tilted bases
• D-DNA: occurs in helices lacking guanine; 8 bp/turn
• E-DNA: like D-DNA, lacks guanine; 7.5 bp/turn
• P-DNA: artificially stretched DNA with the phosphate groups inside the long, thin molecule and the bases closer to the outside surface of the helix; 2.62 bp/turn

[Figure: cross-sections of the helix forms viewed down the helix axis; legend: bases, sugar, phosphate]
A-DNA: 1. large hole in the center; 2. sugar-phosphate backbone at the edge; 3. bases displaced towards the edge.
B-DNA: 1. bases in the center (no hole); 2. phosphates at the periphery.
Z-DNA: 1. bases present throughout the matrix of the helix; 2. no exclusive domains for either bases or backbone.

More about different types of DNA you should know about:
• Centromeric DNA (CEN): specialized sequences at the centromere that function with the microtubules and spindle apparatus during mitosis/meiosis.
• Telomeric DNA: located at the extreme ends of the chromosome and consists of tandem repeats. Plays a role in DNA replication and in maintaining chromosome stability.
• Unique-sequence DNA: often referred to as single-copy DNA; usually codes for genes.
• Repetitive-sequence DNA: may be interspersed or clustered and varies in size.
SINEs: short interspersed repeated sequences (100-500 bp)
LINEs: long interspersed repeated sequences (>5,000 bp)
Microsatellites: short tandem repeats (e.g., TTA|TTA)
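Microsatellites like TTA|TTA|... can be spotted with a regular expression that captures a short unit and requires it to repeat immediately. A minimal sketch, assuming a microsatellite means a unit of 1 to 6 bases occurring at least three times in a row; these thresholds and the helper `find_tandem_repeats` are illustrative assumptions, not a standard definition, and dedicated repeat finders tolerate mismatches that this exact-match scan does not:

```python
import re

# A repeat unit of 1-6 bases (lazy, so the shortest unit is tried first)
# followed by at least two more exact copies of itself.
TANDEM = re.compile(r"(([ACGT]{1,6}?)\2{2,})")

def find_tandem_repeats(seq: str):
    """Return (start, unit, copies) for each exact tandem run, scanning left to right."""
    hits = []
    for m in TANDEM.finditer(seq.upper()):
        run, unit = m.group(1), m.group(2)
        hits.append((m.start(), unit, len(run) // len(unit)))
    return hits

print(find_tandem_repeats("CGTTATTATTATTAGC"))  # [(2, 'TTA', 4)]
```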

Denaturation and Renaturation
• Heating double-stranded DNA can overcome the hydrogen bonds holding it together and cause the strands to separate, resulting in denaturation of the DNA.
• When the DNA is cooled, the relatively weak hydrogen bonds between bases can re-form and the DNA renatures.
[Figure: double-stranded DNA (ATGAGCTGTACGATCGTG paired with TACTCGACATGCTAGCAC) is denatured into single strands, which renature into double-stranded DNA]

Denaturation and Renaturation
• DNA with a high guanine and cytosine content has relatively more hydrogen bonds between strands.
• This is because every GC base pair forms 3 hydrogen bonds, while an AT base pair forms only 2.
• Thus higher GC content is reflected in a higher melting (denaturation) temperature:
ACGAGCTGCACGAGC / TGCTCGACGTGCTCG: 67% GC content, high melting temperature
ATGATCTGTAAGATC / TACTAGACATTCTAG: 33% GC content, low melting temperature
ATGAGCTGTCCGATC / TACTCGACAGGCTAG: 53% GC content, intermediate melting temperature
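The GC percentages and hydrogen-bond totals for the three example duplexes can be recomputed from one strand each, since every pair contributes 2 bonds and each G-C pair adds one more. A minimal sketch (function names are hypothetical):

```python
def gc_percent(seq: str) -> float:
    """Percentage of positions occupied by G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def duplex_hydrogen_bonds(seq: str) -> int:
    """2 bonds for every base pair, plus 1 extra for each G-C pair."""
    seq = seq.upper()
    return 2 * len(seq) + seq.count("G") + seq.count("C")

for s in ("ACGAGCTGCACGAGC", "ATGATCTGTAAGATC", "ATGAGCTGTCCGATC"):
    print(s, f"{gc_percent(s):.0f}% GC,", duplex_hydrogen_bonds(s), "H-bonds")
# ACGAGCTGCACGAGC 67% GC, 40 H-bonds
# ATGATCTGTAAGATC 33% GC, 35 H-bonds
# ATGAGCTGTCCGATC 53% GC, 38 H-bonds
```

Only one strand is needed because the complementary strand has the same number of G + C.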

Determination of GC Content
• Comparison of melting temperatures can be used to determine the GC content of an organism’s genome.
• To do this it is necessary to be able to detect whether DNA is melted or not.
• Absorbance at 260 nm of DNA in solution provides a means of determining how much is single-stranded.
• Single-stranded DNA absorbs 260 nm ultraviolet light more strongly than double-stranded DNA does, although both absorb at this wavelength.
• Thus, increasing absorbance at 260 nm during heating indicates an increasing concentration of single-stranded DNA.
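One way to turn an absorbance-versus-temperature curve into a GC estimate is to take the melting temperature (Tm) as the midpoint of the rise in A260, then apply an empirical Tm-to-GC relation. The slide does not give such a relation; the sketch below assumes the Marmur-Doty fit, Tm = 69.3 + 0.41 × %GC, which applies to DNA in standard saline citrate (0.15 M NaCl, 0.015 M sodium citrate). The melting curve is hypothetical, generated in the code, and the function names are illustrative:

```python
import math

def melting_midpoint(temps, absorbance):
    """Temperature where A260 is halfway between its lowest and highest reading,
    found by linear interpolation between neighbouring readings."""
    lo, hi = min(absorbance), max(absorbance)
    half = (lo + hi) / 2
    for i in range(1, len(temps)):
        a0, a1 = absorbance[i - 1], absorbance[i]
        if a0 < half <= a1:  # the rising curve crosses the halfway point here
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("no rising transition found")

def gc_from_tm(tm_celsius):
    """Invert the assumed Marmur-Doty relation Tm = 69.3 + 0.41 * %GC."""
    return (tm_celsius - 69.3) / 0.41

# Hypothetical melting curve: sigmoid centred on 85 °C, 37% rise in A260 on melting
temps = list(range(70, 101))
a260 = [1.0 + 0.37 / (1 + math.exp(-(t - 85) / 1.5)) for t in temps]

tm = melting_midpoint(temps, a260)
print(f"Tm ~ {tm:.1f} °C, estimated GC ~ {gc_from_tm(tm):.0f}%")
```

With real data the readings are noisy, so smoothing or fitting a sigmoid before taking the midpoint is usually worth doing.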

Double Helix Model of DNA: Six main features
1. Two polynucleotide chains wound in a right-handed (clockwise) double helix.
2. Nucleotide chains are antiparallel, e.g.,
5’-TATTCCGA-3’
3’-ATAAGGCT-5’
3. Sugar-phosphate backbones are on the outside of the double helix, and the bases are oriented towards the central axis.

Structure of DNA/RNA Deoxyribose and Ribose are both 5 carbon sugars

A-, B-, and Z-DNA

DNA: structure
1. DNA is double-stranded.
2. DNA strands are antiparallel.
3. G-C pairs have 3 hydrogen bonds.
4. A-T pairs have 2 hydrogen bonds.
5. One strand is the complement of the other.
6. Major and minor grooves present different surfaces.
7. Cellular DNA is almost exclusively B-DNA.
8. B-DNA has ~10.5 bp/turn of the helix.

RNA (A pairs with U and C pairs with G)
Examples: mRNA (messenger RNA), tRNA (transfer RNA), rRNA (ribosomal RNA), snRNA (small nuclear RNA)
RNA is single-stranded but can fold into secondary structure [Figure: secondary structure of yeast alanine tRNA]
Functions in transcription (RNA processing) and translation

Organization of DNA/RNA in chromosomes
Genome = chromosome or set of chromosomes that contains all the DNA an organism (or organelle) possesses
Viral chromosomes (e.g., TMV, T2 bacteriophage):
1. single- or double-stranded DNA or RNA
2. circular or linear
3. surrounded by proteins
Prokaryotic chromosomes:
1. most contain one double-stranded circular DNA chromosome
2. others consist of one or more chromosomes and are either circular or linear
3. typically arranged in a dense clump in a region called the nucleoid

Problem: measured linearly, the Escherichia coli genome (4.6 Mb) would be about 1,000 times longer than the E. coli cell. The DNA of a diploid human cell (2 × 3.4 Gb) would be about 2.3 m long if stretched linearly.
Solutions:
1. Supercoiling: the DNA double helix is twisted in space about its own axis, a process controlled by enzymes called topoisomerases (occurs in both circular and linear DNA molecules).
2. Looped domains.
Fig. 2.24
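The stretched lengths follow from multiplying the number of base pairs by the 0.34 nm rise per base pair. A minimal check, assuming B-DNA throughout; the 2.3 m figure corresponds to two copies of the 3.4 Gb genome, as in a diploid cell (names are hypothetical):

```python
RISE_M = 0.34e-9  # rise per base pair in metres (B-DNA)

def stretched_length_m(n_bp: float) -> float:
    """Contour length of fully extended B-DNA with n_bp base pairs."""
    return n_bp * RISE_M

ecoli = stretched_length_m(4.6e6)              # one E. coli genome
human_diploid = stretched_length_m(2 * 3.4e9)  # two haploid sets per cell

print(f"E. coli: {ecoli * 1e3:.2f} mm")        # E. coli: 1.56 mm
print(f"human (diploid): {human_diploid:.2f} m")  # human (diploid): 2.31 m
```

A bacterial cell is only a few micrometres long, so over a millimetre of DNA has to be compacted by roughly three orders of magnitude.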

More about genome size:
C value = total amount of DNA in the haploid (1N) genome. It varies widely from species to species and shows no relationship to structural or organizational complexity.
Examples (C value, bp):
λ (bacteriophage): 48,502
T4 (bacteriophage): 168,900
HIV-1: 9,750
E. coli: 4,639,221
Lilium formosanum: 36,000,000,000
Zea mays: 5,000,000,000
Amoeba proteus: 290,000,000,000
Drosophila melanogaster: 180,000,000
Mus musculus: 3,454,200,000
Canis familiaris: 3,355,500,000
Equus caballus: 3,311,000,000
Homo sapiens: 3,400,000,000

Eukaryotic chromosome structure
Chromatin: complex of DNA and chromosomal proteins; ~twice as much protein as DNA.
Two major types of proteins:
1. Histones: abundant, basic proteins with a positive charge that bind to DNA; 5 main types: H1, H2A, H2B, H3, H4; ~equal in mass to DNA; evolutionarily conserved.
2. Non-histones: all the other proteins associated with DNA; differ markedly in type and structure; amounts vary widely (from << 50% to >> 100% of DNA mass).

Packing of DNA into chromosomes:
1. Level 1: winding of DNA around histones to create a nucleosome structure.
2. Level 2: nucleosomes connected by strands of linker DNA, like beads on a string.
3. Level 3: packaging of nucleosomes into the 30-nm chromatin fiber.
4. Level 4: formation of looped domains.
Figs. 2.25-2.29

DNA vs. RNA

[Figure: DNA dinucleotide: a thymine (pyrimidine) nucleotide and an adenine (purine) nucleotide on the sugar 2’-deoxyribose (carbons 1’ to 5’), joined through 3’ and 5’ linkages and read 5’ to 3’; note there is no 2’-hydroxyl]

Structure of DNA/RNA

RNA – Ribonucleic acid
In RNA the base thymine (T) is replaced by uracil (U). The other difference from DNA is that the sugar (pentose) is ribose instead of deoxyribose; ribose has an additional hydroxyl group (on the 2’ carbon).
Bases: cytosine (C), guanine (G), adenine (A), uracil (U).
RNA transmits genetic information from DNA (via transcription) into proteins (by translation).
RNA is usually found in the single-stranded form.
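At the sequence level, transcription swaps T for U. The sketch below uses two standard terms not on the slide: the template strand is the DNA strand read by the polymerase, and the coding strand is its partner, whose sequence the mRNA matches apart from U in place of T. Both helpers are hypothetical and ignore everything except base sequence:

```python
def transcribe_coding(coding_5to3: str) -> str:
    """mRNA (5'->3') has the coding-strand sequence, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")

def transcribe_template(template_5to3: str) -> str:
    """mRNA (5'->3') is complementary and antiparallel to the template strand."""
    pair = str.maketrans("ACGT", "UGCA")  # DNA base -> paired RNA base
    return template_5to3.upper().translate(pair)[::-1]

print(transcribe_coding("ATGGCC"))    # AUGGCC
print(transcribe_template("GGCCAT"))  # AUGGCC, same mRNA from the partner strand
```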

RNA – Ribonucleic acid
RNA plays several roles in biology:
• Messenger RNA (mRNA) is transcribed directly from a gene's DNA and is used to encode proteins.
• RNA genes are genes that encode functional RNA molecules; in contrast to mRNA, these RNAs do not code for proteins. The best-known examples of RNA genes are transfer RNA (tRNA) and ribosomal RNA (rRNA). Both forms participate in the process of translation, but many others exist.
• RNA forms the genetic material (genomes) of some kinds of viruses.
• Double-stranded RNA (dsRNA) is used as the genetic material of some RNA viruses and is involved in some cellular processes, such as RNA interference.

Proteins have a variety of roles that they must fulfil: 1. they are the enzymes that rearrange chemical bonds. 2. they carry signals to and from the outside of the cell, and within the cell. 3. they transport small molecules. 4. they form many of the cellular structures. 5. they regulate cell processes, turning them on and off and controlling their rates.

Proteins – Amino Acids
• There are 20 different types of amino acids (see below).
• Different sequences of amino acids fold into different 3-D shapes.
• Proteins can range from fewer than 20 to more than 5,000 amino acids in length.
• Each protein that an organism can produce is encoded in a piece of the DNA called a “gene”.
• The single-celled bacterium E. coli has about 4,300 different genes.
• Humans have roughly 20,000 protein-coding genes (the exact number is still being refined).

Proteins – Amino Acids
Name (1-letter code): DNA triplets
Glycine (G): GGT, GGC, GGA, GGG
Alanine (A): GCT, GCC, GCA, GCG
Valine (V): GTT, GTC, GTA, GTG
Leucine (L): TTG, TTA, CTT, CTC, CTA, CTG
Isoleucine (I): ATT, ATC, ATA
Histidine (H): CAT, CAC
Serine (S): TCT, TCC, TCA, TCG, AGT, AGC
Threonine (T): ACT, ACC, ACA, ACG
Cysteine (C): TGT, TGC
Methionine (M): ATG
Glutamic acid (E): GAA, GAG
Aspartic acid (D): GAT, GAC
Lysine (K): AAA, AAG
Arginine (R): CGT, CGC, CGA, CGG, AGA, AGG
Asparagine (N): AAT, AAC
Glutamine (Q): CAA, CAG
Phenylalanine (F): TTT, TTC
Tyrosine (Y): TAT, TAC
Tryptophan (W): TGG
Proline (P): CCT, CCC, CCA, CCG
Terminator (stop, *): TAA, TAG, TGA
Protein sequence (alphabet: ACDEFGHIKLMNPQRSTVWY): MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLK...
• A typical human cell contains about 100 million proteins of about 10,000 types.
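The triplet table above is the standard genetic code, and it can be packed into a 64-entry dictionary to translate a DNA coding sequence. The compact string below lists the amino acid for every codon in T, C, A, G order (first base varying slowest); it is a common encoding trick rather than anything from the slide, and `translate` is a hypothetical helper that stops at the first stop codon:

```python
BASES = "TCAG"
# One letter per codon, TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop codon
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # ignores an incomplete trailing codon
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGAAAACTTTTAA"))  # MENF
```

Comparing `CODON_TABLE` with the list above is a quick way to catch transcription errors in either one.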

Proteins
Primary protein structure is the sequence of amino acids in the chain.
Secondary protein structure consists of local folds, such as alpha helices and beta pleated sheets, held together by hydrogen bonds between backbone groups of the chain.
Tertiary protein structure is the overall 3-D shape of the chain, which forms when certain attractions are present between alpha helices and pleated sheets.
Quaternary protein structure is the arrangement of a protein consisting of more than one amino acid chain.

Proteins

DNA
• Protein-coding sequences make up only about 1.5% of the human genome.
• A gene is a segment of the DNA that encodes the construction plan for a protein.
• Humans have only about 20,000 protein-coding genes.