Modeling nucleic acids


The structure of nucleic acids

Nucleic acids are biological molecules at the heart of the mechanisms of life. They are found in the cells of prokaryotic and eukaryotic organisms as well as in viral capsids. Deoxyribonucleic acid (DNA) carries the genetic information used for the development, functioning and reproduction of living organisms. Ribonucleic acid (RNA) takes part in the expression of this information, that is, in the construction of proteins from the template encoded in DNA. Some viruses, however, have no DNA and store their genetic material directly as RNA. Precise knowledge of the structure of these molecules is essential for understanding how they work, and for fighting genetic diseases or the spread of viruses such as HIV. Yet nucleic acids display great structural variety. Their chemical composition and structural formula are well established and are presented in subsection 1.1. Their three-dimensional structure, i.e. the arrangement and positioning of the atoms in space, is however less completely determined; its considerable diversity is described in subsection 1.2. Although it has been widely studied with several promising methods, determining the 3D structure of nucleic acids still raises challenges, which are discussed in subsection 1.3.

Elementary building blocks of nucleic acids

Nucleic acids are polymers, that is, long molecular chains formed by the repetition of many subunits called nucleotides. The longest DNA molecule in the human body contains about 220 million nucleotide pairs (Gregory et al. 2006), for a stretched-out length of more than 7 cm. Each nucleotide consists of three parts: a phosphate group PO₄³⁻, a five-carbon sugar (pentose; C₅H₁₀O₅ for ribose) linked to the phosphate by an ester bond, and a nitrogenous base linked to the sugar by a glycosidic bond (Figure 1.1a). DNA differs from RNA by the absence of an oxygen atom bonded to the C2′ carbon of the sugar, hence the prefix “deoxy”. The nitrogenous bases are the fundamental elements of the genetic code. In DNA they come in four forms: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). The same bases are found in RNA, except Thymine, which is replaced by Uracil (U). It is the order in which these bases follow one another, the sequence, that encodes the genetic information, which is why they are often called the “building blocks” of nucleic acids. The bases of DNA are grouped into “packages”, the genes, each specialized in coding for one or more proteins. In eukaryotes, genes consist largely of introns, DNA sequences that generally do not code for protein, some of which are involved in the regulation, organization and maintenance of the genome. DNA and RNA nucleotides are linked through their phosphates, each of which forms an ester bond with the C3′ carbon of one sugar and another ester bond with the C5′ carbon of the next sugar (phosphodiester bond). This forms the sugar-phosphate chain, also called the backbone of the molecule, to which the nitrogenous bases are attached.
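Two quantitative points of this paragraph, the four-letter alphabets of DNA and RNA (T versus U) and the link between sequence length and stretched length, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the four standard bases are handled, and the rise of 0.34 nm per base pair is a commonly quoted textbook value for B-DNA, not a figure given in this chapter.

```python
DNA_ALPHABET = frozenset("ACGT")
RNA_ALPHABET = frozenset("ACGU")

# Assumed rise per base pair of B-DNA, in nanometres (common textbook value).
RISE_NM_PER_BP = 0.34


def nucleic_acid_type(sequence: str) -> str:
    """Classify a sequence as 'DNA' (contains T) or 'RNA' (contains U)."""
    bases = set(sequence.upper())
    unknown = bases - DNA_ALPHABET - RNA_ALPHABET
    if unknown:
        raise ValueError(f"unknown symbols: {sorted(unknown)}")
    if "T" in bases and "U" in bases:
        raise ValueError("sequence mixes T (DNA) and U (RNA)")
    if "T" in bases:
        return "DNA"
    if "U" in bases:
        return "RNA"
    return "ambiguous"  # only A, C and G: could be either


def contour_length_cm(n_base_pairs: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Stretched length of a double helix: number of base pairs times the rise, in cm."""
    return n_base_pairs * rise_nm * 1e-7  # 1 nm = 1e-7 cm


print(nucleic_acid_type("ATCG"))                 # DNA
print(nucleic_acid_type("AUCG"))                 # RNA
print(round(contour_length_cm(220_000_000), 2))  # 7.48
```

With this assumed rise, about 220 million base pairs give roughly 7.5 cm, consistent with the “more than 7 cm” quoted above.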
This chain is directional: it has a 5′ end and a 3′ end, whose C5′ and C3′ atoms, respectively, are not linked to any other nucleotide. By convention, a nucleic acid is written in the 5′ → 3′ direction. Figure 1.1b shows an example chain of the four bases A, U (T for DNA), C and G. The phosphates make the backbone acidic and give it its negative charge.

Figure 1.1. Structural formulas of nucleic acids: (a) RNA (ribonucleotide) and DNA (deoxyribonucleotide) nucleotides, and (b) example of an RNA chain of sequence AUCG (or of a DNA chain of sequence ATCG, with the information in parentheses). Diagrams from Voet and Voet (2011).

Gene expression, in other words the construction of proteins from “pieces of DNA sequence”, is usually described, in prokaryotes as in eukaryotes, by a schematic process known as the central dogma. An enzyme, RNA polymerase, recognizes the start of the gene, the promoter, and binds to it. It then travels along the gene and synthesizes the so-called messenger RNA: this step is called transcription.
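Because strands are written 5′ → 3′ and RNA polymerase builds the messenger RNA as the complement of the DNA template strand, the mRNA sequence equals the other (coding) strand with T replaced by U. The sketch below illustrates only this base-complementarity aspect of transcription under a deliberately simplified model: no promoter recognition, termination or RNA processing. The function and variable names are hypothetical and do not come from any library.

```python
# RNA base incorporated opposite each DNA template base (U replaces T in RNA).
RNA_PARTNER_OF_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_5to3: str) -> str:
    """mRNA (written 5'->3') made from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5' while extending the RNA 5'->3',
    so the template is walked backwards and each base is paired.
    """
    return "".join(RNA_PARTNER_OF_DNA[base] for base in reversed(template_5to3.upper()))


def transcribe_coding(coding_5to3: str) -> str:
    """Equivalent shortcut: the mRNA is the coding (non-template) strand with T -> U."""
    return coding_5to3.upper().replace("T", "U")


# The DNA chain ATCG of Figure 1.1b, used here as the coding strand,
# has the template strand CGAT; both routes give the RNA chain AUCG.
print(transcribe_template("CGAT"))  # AUCG
print(transcribe_coding("ATCG"))    # AUCG
```

The two functions agree by construction; keeping both makes explicit that the reversal in `transcribe_template` is a consequence of writing every strand 5′ → 3′.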
The messenger RNA then carries the genetic information to the ribosome, a large complex of RNA (called ribosomal RNA) and proteins. The ribosome moves along the messenger RNA, and a protein is built by chaining together amino acids brought by transfer RNA: this is translation.

1.2 Three-dimensional structure

The three-dimensional structure, or conformation, is the arrangement and positioning of the atoms in three-dimensional space. For nucleic acids it is so diverse that it is still difficult to characterize fully. It depends in particular on the sequence of the molecule and on the constraints imposed on it by its physico-chemical environment.

3D structure of DNA

Double helix

Watson and Crick (1953) discovered and characterized what is probably the best-known nucleic acid conformation: the B-DNA double helix. It consists of two polynucleotide strands whose sugar-phosphate chains wind around a common axis, forming right-handed helices about 2 nm in diameter with a pitch of 3.3 nm (Figure 1.2a). These two strands are antiparallel: they run in opposite directions, and the 3′ end of one faces the 5′ end of the other. They are related by a pseudo-dyad symmetry: the backbone of one can be obtained from that of the other by a 180-degree rotation about a (suitably chosen) axis perpendicular to the helix axis. Finally, they are held together by hydrogen bonds between their nitrogenous bases.

Figure 1.2. B-DNA double helix. (a) Van der Waals representation, image from Goodsell (1992). (b) G-C pairing in ball-and-stick representation, with the major and minor grooves indicated, diagram from Voet and Voet (2011). (c) Ribbon representation, image from Neidle (2008).
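The pseudo-dyad symmetry can be made concrete on an idealized model. The sketch below is a toy geometry, not a molecular model: each backbone is reduced to one point per base pair on a right-handed helix along z, using the 2 nm diameter, 3.3 nm pitch and 36° twist quoted in this section. The 150° offset between the two backbone points of a pair is a hypothetical value, and the names `backbone_point`, `rotate_180` and `axial_backbone_gaps` are illustrative. It shows that a half-turn about the axis bisecting that offset maps one strand onto the other, reversing its direction, and that any offset other than 180° leaves two unequal gaps between the backbones, a rough stand-in for the two grooves.

```python
import math

# Illustrative parameters (assumptions loosely based on the B-DNA values above).
RADIUS_NM = 1.0                       # backbone radius: half of the ~2 nm diameter
PITCH_NM = 3.3                        # axial rise per helical turn
BP_PER_TURN = 10                      # i.e. a twist of 36 degrees per base pair
RISE_NM = PITCH_NM / BP_PER_TURN      # axial rise per base pair
TWIST_RAD = 2 * math.pi / BP_PER_TURN
# Hypothetical azimuthal offset between the two backbone points of one base pair.
# Any value other than 180 degrees yields two unequal gaps between the backbones.
OFFSET_RAD = math.radians(150)

Point = tuple[float, float, float]


def backbone_point(i: int, strand: int) -> Point:
    """Idealized backbone point of base pair i on strand 1 or 2 (right-handed helix along z)."""
    angle = i * TWIST_RAD + (OFFSET_RAD if strand == 2 else 0.0)
    return (RADIUS_NM * math.cos(angle), RADIUS_NM * math.sin(angle), i * RISE_NM)


def rotate_180(p: Point, axis_angle: float) -> Point:
    """Half-turn about an axis lying in the xy-plane (perpendicular to the helix axis).

    With u the unit vector of that axis, the rotation is p' = 2 (u . p) u - p.
    """
    ux, uy = math.cos(axis_angle), math.sin(axis_angle)
    dot = ux * p[0] + uy * p[1]           # u has no z component
    return (2 * dot * ux - p[0], 2 * dot * uy - p[1], -p[2])


def axial_backbone_gaps(offset_rad: float = OFFSET_RAD) -> tuple[float, float]:
    """Distances along z between the two backbones, seen at a fixed azimuth."""
    fraction = offset_rad / (2 * math.pi)
    return (fraction * PITCH_NM, (1 - fraction) * PITCH_NM)


# The "suitably chosen" dyad axis bisects the offset between the two strands:
# it maps backbone point i of strand 1 onto point -i of strand 2.
DYAD_ANGLE = OFFSET_RAD / 2

for i in range(-3, 4):
    image = rotate_180(backbone_point(i, 1), DYAD_ANGLE)
    target = backbone_point(-i, 2)
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(image, target))

print([round(g, 3) for g in axial_backbone_gaps()])  # [1.375, 1.925]
```

The choice of axis matters: a half-turn about an arbitrary perpendicular axis (for instance the x axis, `axis_angle=0.0`) maps strand 1 back onto its own helix rather than onto strand 2.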
Bases A and T pair by forming two hydrogen bonds, and bases C and G likewise pair through three hydrogen bonds. The resulting base pairs are nearly planar and perpendicular to the helix axis. They are stacked on top of one another with a nearly constant angular offset of about 36 degrees. The double helix has two external grooves that wind between the sugar-phosphate chains. These grooves differ in size: the minor groove lies on the edge of the base pairs from which the C1′ carbons protrude, whereas the major groove lies on the opposite edge (Figure 1.2b). To make the structure of this double helix easier to visualize, the sugar-phosphate backbone is commonly drawn as a ribbon following its path, and the sugars and bases as blocks (Figure 1.2c). B-DNA is not, however, the only double-helical structure. Others exist, such as A-DNA, whose grooves differ more markedly in size and whose base pairs are tilted by 19 degrees relative to the plane perpendicular to the helix axis. Another example, Z-DNA, involves left-handed helices with grooves of comparable size.
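The pairing rules above (A with T through two hydrogen bonds, G with C through three) together with antiparallelism determine the partner strand completely. The following is a minimal sketch with hypothetical function names. It assumes a perfect Watson-Crick duplex with no mismatches or non-canonical pairs and counts only the inter-strand hydrogen bonds listed in the text.

```python
# Watson-Crick pairing rules for DNA and hydrogen bonds per base (2 for A-T, 3 for G-C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5to3: str) -> str:
    """Partner strand, also written 5'->3'. The reversal reflects antiparallelism."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_5to3.upper()))


def base_pairs(strand_5to3: str) -> list[tuple[str, str]]:
    """Pairs read along the first strand: position i faces position n-1-i of the partner."""
    seq = strand_5to3.upper()
    partner = complementary_strand(seq)
    return [(base, partner[len(seq) - 1 - i]) for i, base in enumerate(seq)]


def hydrogen_bond_count(strand_5to3: str) -> int:
    """Total inter-strand hydrogen bonds of the perfect duplex built on this strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand_5to3.upper())


print(complementary_strand("ATCG"))   # CGAT
print(base_pairs("ATCG"))             # [('A', 'T'), ('T', 'A'), ('C', 'G'), ('G', 'C')]
print(hydrogen_bond_count("ATCG"))    # 10
```

Reading the partner strand backwards in `base_pairs` is what pairs the 5′ end of one strand with the 3′ end of the other.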