If one thinks of the genetic material within one of our cells as a
storehouse of information, then the following analogies may be made: a
local library can be used to
represent all of the genetic information within a cell - all of the
DNA
(deoxyribonucleic acid). The books within the library can be
used to represent the
individual chromosomes within a cell. The sentences within
the books can be used to
represent the genes located along the length of each chromosome.
Using
these analogies, we'll discuss what a gene is, and how it works. The
strict definition of a gene is:
a region of DNA within a chromosome which, when expressed (transcribed), leads
to the production of RNA (ribonucleic acid). The DNA within our cells
contains the information for everything which occurs within each
cell - every action, every substance made, every event, every response
- everything! The very same thing holds true for single-cell
organisms like a bacterium - the DNA within a bacterial cell has ALL of
the information necessary for the life of the cell.
The essence of a chromosome is the DNA.
Humans have 46 chromosomes (46 books in the library), and bacteria typically have
only one (one book - maybe that is why bacteria are so small - no, not
really...). The DNA within a chromosome is double-stranded, and the two
strands pair very specifically with one another to form a kind of braid
known as the DNA double helix. Each braid is made up of individual substances
called nucleotides (which themselves are made up of a structure
called a base, a phosphate molecule,
and a kind of sugar molecule). The nucleotides are linked together in a
long line - there are only
four different kinds of DNA nucleotides, named for the bases they carry: adenine [A], thymine [T],
guanine [G], and cytosine [C]. However, these nucleotides can be arranged
within the line in any order, and repeats are also allowed. So, one
chromosome may be made of millions of each of these four
nucleotides
linked together. The opposite braid is also made of these substances,
but there is a rule which is followed: in opposing braids (strands)
A always pairs with T and C always pairs with
G across the two braids (kind of like a special zipper). Here is
another analogy: imagine a spiral staircase. Each step represents
two nucleotides which are paired across from one another - say, A with T, a
kind of tongue-in-groove arrangement. The hand-rail section between
posts on each side of
the staircase represents individual phosphate molecules bonded to a
next-in-line sugar molecule, and each of the
posts which attach the hand-rail to the steps represents a sugar
molecule. Take a look at some electron microscope pictures of DNA:
Analyses of
DNA. Lots of it in there, isn't it?
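The pairing rule described above (A with T, C with G) can be written out as a tiny program. This is only a minimal sketch: the name opposite_strand and the example sequence are made up for illustration, and the sketch ignores everything except which letter sits across from which.

```python
# Pairing rule from the text: A always pairs with T, C always pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(strand):
    """Return the letter that sits across from each letter of `strand`,
    in the same left-to-right order (one half-step of the staircase each)."""
    return "".join(PAIRS[base] for base in strand)


one_strand = "ATGCCGTA"  # made-up example sequence
print(one_strand)                   # ATGCCGTA
print(opposite_strand(one_strand))  # TACGGCAT
```

Applying the rule twice gives back the strand you started with, which is one way to see that each strand carries all the information needed to rebuild the other.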
The gene region (there may be thousands of genes on one chromosome) is accessible to proteins, and there are different kinds of RNA which can be generated from the DNA template through the action of these proteins. There are three major kinds of RNA which may be made - depending on the particular gene - these are: ribosomal RNA (rRNA), transfer RNA (tRNA), and messenger RNA (mRNA). Ribosomal RNA (many different kinds from many different rRNA genes) is part of the structure of the protein-synthesizing machinery inside a cell - called a ribosome. Transfer RNA (tRNA) consists of several different kinds also, each one of which "carries" a particular amino-acid to the ribosome for linkage with other amino-acids to form a protein. Messenger RNA (mRNA) consists of many, many different kinds, each of which contains information which allows the ribosome machinery to connect individual amino-acids in a precise order. If the order of connection of amino-acids is different, then a different protein will be made. We will focus on messenger RNA with regard to gene expression (the synthesis of RNA from a DNA template).
As was said before, the DNA of a gene is accessible - but - accessibility is usually regulated (you need a card to check out a book from the library). In order to understand this regulation, we need to discuss some details of a gene. If one looked at the sequence of the nucleotides in one of the strands where the gene is located (only one-half of each step in the staircase), the sequence would appear to be just a bunch of nucleotides linked in some apparently random order. However, such is not the case. It is precisely this order which contains the critical information - the code. A gene is therefore much like a sentence in our language - a sentence, like a gene, has a beginning (start here) and an end (stop here) - there are keys which allow recognition. A sentence is made of words - as is a gene. In a gene the words are all exactly three letters long, and there are only 4 letters in the alphabet. Each three-letter word "says" - "put this particular amino-acid right here in this position." Because there are 4 different nucleotides, and any combination of three letters is allowed, there are 4 x 4 x 4 = 64 possible three-letter words. Using our analogies, the local library is the DNA (all of the chromosomes). You go inside, show your card, and you are allowed to open a book (one of the chromosomes). You look at a sentence (gene); however, the sentence is written in French (DNA - you can recognize individual letters, but you do not understand the words). Luckily for you, there is a copy machine which miraculously converts the letters (DNA nucleotides) in the sentence to English (RNA) when the sentence is copied. Now, you (the ribosome and tRNA) can read and understand the text. The copy machine of the cell (the enzyme RNA polymerase) reads the text (DNA), and generates a copy (mRNA) which is both readable and understandable by the ribosome and the tRNA.
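The count of 64 three-letter words can be checked by simply listing them all. A minimal sketch, using the four DNA letters from the text; the variable names are made up for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # the 4 letters of the DNA alphabet

# Every ordered choice of three letters, repeats allowed.
words = ["".join(triple) for triple in product(NUCLEOTIDES, repeat=3)]

print(len(words))   # 64
print(words[:5])    # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```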
What are the parts of a gene? Let's look at a bacterial gene which
encodes mRNA as an
example. The parts are, from left to right: promoter, transcription
start, translation start, open-reading-frame, translation stop,
transcription stop. The promoter consists of: a DNA sequence 35
nucleotides prior to the beginning of transcription (-35) and a DNA
sequence
10 nucleotides prior to the beginning of transcription (-10, also known as
the Pribnow sequence in bacteria; its counterpart in eucaryotes, the
Goldberg-Hogness or TATA box, lies somewhat further upstream, at about -25 to -30).
Here is how a
protein called Catabolite Activator Protein [CAP - in blue] interacts with
DNA (about position -50) near the promoter region of the bacterial gene
which encodes the enzyme which degrades the sugar lactose into the two
sugars, glucose and galactose. CAP requires association with the
chemical cyclic-AMP in order to be able to bind to the DNA at this site
(called the CAP DNA-binding site). When certain bacteria are grown in
the absence of glucose, but the presence of lactose, one result is the
generation inside the cell of a chemical called cyclic-AMP. When this
compound binds to CAP, then CAP can bind to the DNA at position -50. As you
can see, this interaction significantly distorts the DNA structure in this
region. It is ONLY through these interactions
that the promoter is made accessible to the enzyme RNA polymerase,
and transcription of the gene can begin. Thus, expression of what is called
the Lac operon (a tandem multi-gene system under the control of a single
promoter) is regulated.
If you look at
Bigger Pictures
you will see some protein/DNA interactions.
The beginning of transcription (making a complementary base-pair copy of the DNA into RNA) is called the +1 site. After the +1 site is a sequence called the Shine/Dalgarno sequence, followed by a 3-letter translation start sequence (TAC in the DNA strand being copied, which becomes AUG in the mRNA); then, there is a long line of nucleotides in a row called the ORF (open-reading-frame) which is made up of the three-letter "words." At the end of the ORF is a 3-letter sequence (ATT, or ATC, or ACT in the DNA; UAA, UAG, or UGA in the mRNA) which is called the translation stop signal, and pretty much at the end of the gene is a sequence which causes the stop of transcription. Genes which encode rRNA or tRNA do not have a Shine/Dalgarno sequence, nor a translation start/stop sequence.
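To make the layout concrete, here is a minimal sketch which scans a DNA strand for the translation start word (TAC) and then steps along three letters at a time until it reaches one of the stop words (ATT, ATC, or ACT). The function name find_orf and the example sequence are made up for illustration, and a real gene would be far longer.

```python
START = "TAC"                  # translation start word, as written in the DNA
STOPS = {"ATT", "ATC", "ACT"}  # translation stop words, as written in the DNA


def find_orf(dna):
    """Return (start_index, stop_index) for the first start word that is
    followed by an in-frame stop word, or None if there is no such pair."""
    start = dna.find(START)
    while start != -1:
        # Step through the three-letter words that follow the start word.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOPS:
                return start, i
        start = dna.find(START, start + 1)
    return None


# Made-up example: 4 leading letters, TAC, three words, ATT, 4 trailing letters.
dna = "GCGCTACAAACCGGGTATTGCGC"
print(find_orf(dna))   # (4, 16)
print(dna[7:16])       # AAACCGGGT  (the three words between start and stop)
```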
The enzyme, RNA polymerase, has a structure which allows it to bind to the DNA, but only if the DNA sequence is appropriate - this appropriate region of the gene is called the promoter. The enzyme gets in between the two strands, and stretches them apart (yep, they are flexible). Once the enzyme binds, because of the orientation of the promoter sequence, the enzyme will "face" in the correct direction (at the beginning of the sentence), and will begin to physically move along one of the strands of the gene (the template strand - the one which gets copied), unwinding the DNA helix as it moves. As it moves, the enzyme will "read" each letter of the DNA sequence (sentence) and will begin synthesis (complementary base-pair copying) of the DNA into RNA at the correct starting place, following the same nucleotide-pairing rules with one exception: RNA uses uracil [U] in place of T, so A pairs with U and C pairs with G (T is not found in mRNA - but may be found in transfer-RNA). While the RNA is being made, behind where the enzyme is, the DNA will re-wind (each half of the DNA "steps" will re-pair) which forces the RNA to dissociate from the DNA (only behind the enzyme, though). At the end of the sentence (gene), there will be a DNA sequence which results in the dissociation of RNA polymerase enzyme from the gene. Then, the RNA will completely dissociate from the DNA as the DNA completely re-winds. Since our example is mRNA, the mRNA will travel to the ribosome where it will attach (complementary base-pair) with one of the rRNA pieces within the structure of the ribosome. The particular mRNA sequence which associates with the ribosome's rRNA is called the Shine/Dalgarno sequence - remember - this sequence is near the very beginning of the mRNA transcript.
This attachment orients the mRNA properly, and as the ribosome physically "pulls" the mRNA, one word at a time (a three-letter word called a codon), a particular tRNA carrying an amino-acid associates (complementary base-pairs) with the mRNA using its own three-letter word (called an anti-codon), and amino-acids are linearly hooked together in the proper order following the code that originated in the DNA of the gene, to form a protein of some sort. This action is called translation. The beginning of translation is at the translation start sequence, and near the end of the mRNA transcript is the translation stop sequence which, when it arrives at the ribosome (like reaching the end of a rope pulled through your hand), causes the mRNA to dissociate from the ribosome. Thus, translation stops.
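The two steps just described, copying DNA into mRNA and then reading the mRNA one codon at a time, can be sketched in a few lines. The assumptions are worth stating: the DNA string is a made-up example written in the order the polymerase reads it, the function names are hypothetical, and CODON_TABLE is only a small excerpt of the standard genetic code (the full table has 64 entries).

```python
# DNA letter on the copied strand -> RNA letter made across from it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Excerpt of the standard genetic code; None marks a translation stop codon.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "CCA": "Pro",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(dna):
    """Copy a DNA strand into mRNA by complementary base-pairing."""
    return "".join(DNA_TO_RNA[base] for base in dna)


def translate(mrna):
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    amino_acids = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if outside the excerpt
        if amino_acid is None:                   # stop codon: translation ends
            break
        amino_acids.append(amino_acid)
    return amino_acids


dna = "TACAAACCGGGTATT"   # made-up example: start word, three words, stop word
mrna = transcribe(dna)
print(mrna)               # AUGUUUGGCCCAUAA
print(translate(mrna))    # ['Met', 'Phe', 'Gly', 'Pro']
```

Changing a single DNA letter changes a codon, which is why a different order of nucleotides can lead to a different protein.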
The expression of a gene may be regulated - that is - some genes are always
"on", and some are sometimes on and sometimes off. If the gene
is always on, it will always be expressed - and therefore - one of the
kinds of RNA will always be made from it. Most genes, however, are
regulated in expression. When they are, the gene will also contain a DNA
sequence named the operator region located after the promoter region, and
prior to the beginning of the transcription start sequence, which controls
whether or not RNA can
be made. In this simple case, a certain protein, called a regulatory
protein, may be attached to the operator region. Therefore, even though
RNA polymerase binds to the promoter as described above, the regulatory
protein is in the way, and RNA polymerase cannot move it out of the way.
In this case then, the gene would not be expressed. In order for this
gene to be expressed, something would need to happen which caused the
regulatory protein to dissociate from the DNA, and unblock the staircase.
Regulatory proteins have places on their surface to which a particular
substance within the cell can bind. Sometimes the binding of a
substance with a regulatory protein leads to association of the protein
with the operator region, and sometimes this binding leads to the
dissociation of a regulatory protein from the operator region.
In this way then, signals received at the outer surface of a
cell can be transmitted internally within the cell, and lead to the
appearance of a compound which can subsequently lead to association or
dissociation of regulatory proteins with or from the DNA, respectively,
and therefore allow a gene to be expressed or to not be expressed. This
kind of delicate balance occurs all of the time in each of our cells and
in all other living cells. It is an amazing and truly elegant system -
unbelievably beautiful.
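The simple operator case described above boils down to a little piece of logic, sketched below. This is a deliberately crude yes/no model with made-up function names: it assumes RNA polymerase can already bind the promoter, and it treats the regulatory protein as either sitting on the operator or not, with nothing in between.

```python
def operator_is_blocked(binds_when_signal_present, signal_present):
    """True when the regulatory protein is sitting on the operator.

    binds_when_signal_present=True:  the signal substance makes the protein attach.
    binds_when_signal_present=False: the signal substance makes the protein let go.
    """
    return signal_present == binds_when_signal_present


def gene_is_expressed(binds_when_signal_present, signal_present):
    """RNA polymerase can move along the gene only if the operator is clear."""
    return not operator_is_blocked(binds_when_signal_present, signal_present)


for binds_when_signal_present in (True, False):
    for signal_present in (True, False):
        print(binds_when_signal_present, signal_present,
              gene_is_expressed(binds_when_signal_present, signal_present))
```

The same signal substance can therefore switch one gene on and another gene off, depending on how each gene's regulatory protein responds to it.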