~~~ What the Heck is a Gene? ~~~
Just think, every thought, every action, whether it be mental, physical or chemical that we experience, is initiated by gene expression within the cells of our brain and elsewhere, and the response of cells all over our body to externally-received and internally-transmitted signals which may also govern gene expression within these cells. Add to this scenario the three-dimensional abilities to reason, to comprehend, to read, to write, to talk, to see, to hear, to feel, to smell, to jump, to run, to walk, to cry, to laugh, to feel joy, to feel sadness, to feel hope, to love, to like, to appreciate patterns, colors, shapes, to sculpt, to draw, to sing. Wow! Life is extraordinary, unbelievable, unimaginable, a precious gift. Genes are good!

If one thinks of the genetic material within one of our cells as a storehouse of information, then the following analogies may be made: a local library can be used to represent all of the genetic information within a cell - all of the DNA (deoxyribonucleic acid). The books within the library can be used to represent the individual chromosomes within a cell. The sentences within the books can be used to represent the genes located along the length of each chromosome. Using these analogies, we'll discuss what a gene is, and how it works. The strict definition of a gene is:
a region of DNA within a chromosome which, when expressed (transcribed), leads to the production of RNA (ribonucleic acid). The DNA within our cells contains the information for everything which occurs within each cell - every action, every substance made, every event, every response - everything! The very same thing holds true for single-cell organisms like a bacterium - the DNA within a bacterial cell has ALL of the information necessary for the life of the cell. The essence of a chromosome is the DNA. Humans have 46 chromosomes (46 books in the library), and most bacteria have only one (one book - maybe that is why bacteria are so small - no, not really...). The DNA within a chromosome is double-stranded, and the two strands pair very specifically with one another to form a kind of braid known as the DNA helix. Each strand is made up of individual substances called nucleotides (which themselves are made up of a structure called a base, a phosphate molecule, and a kind of sugar molecule). The nucleotides are linked together in a long line - there are only four different kinds of DNA nucleotides, named for their bases: adenine [A], thymine [T], guanine [G], and cytosine [C]. However, these compounds can be arranged within the line in any order, and repeats are also allowed. So, one chromosome may be made of hundreds of thousands of each of these four nucleotides linked together. The opposite strand is also made of these substances, but there is a rule which is followed: in opposing braids (strands) A always pairs with T and C always pairs with G across the two braids (kind of like a special zipper). Here is another analogy: imagine a spiral staircase. Each step represents two nucleotides which are paired across from one another - say, A with T, a kind of tongue-in-groove arrangement. 
The hand-rail section between posts on each side of the staircase represents an individual phosphate molecule bonded to a next-in-line sugar molecule, and each of the posts which attach the hand-rail to the steps represents a sugar molecule. Take a look at some electron microscope pictures of DNA: Analyses of DNA. Lots of it in there, isn't it?
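
The pairing rule can be sketched in a few lines of Python. This is only an illustration: the name opposite_strand and the sample sequence are made up for the example, and the sketch pairs letters position by position without worrying about which direction each strand runs.

```python
# Pairing rule from the text: A always pairs with T, and C always pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(strand: str) -> str:
    """Return the nucleotide sitting across from each position of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


# Each letter below is one half of a "step" in the staircase.
print(opposite_strand("ATGCCGTA"))  # TACGGCAT
```

Running the function twice gives back the strand you started with, which is the zipper idea in miniature: each strand carries enough information to rebuild the other.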

The gene region (there may be thousands of genes on one chromosome) is accessible to proteins, and there are different kinds of RNA which can be generated from the DNA template through the action of these proteins. There are three major kinds of RNA which may be made - depending on the particular gene - these are: ribosomal RNA (rRNA), transfer RNA (tRNA), and messenger RNA (mRNA). Ribosomal RNA (many different kinds from many different rRNA genes) is part of the structure of the protein-synthesizing machinery inside a cell - called a ribosome. Transfer RNA (tRNA) consists of several different kinds also, each one of which "carries" a particular amino-acid to the ribosome for linkage with other amino-acids to form a protein. Messenger RNA (mRNA) consists of many, many different kinds, each of which contains information which allows the ribosome machinery to connect individual amino-acids in a precise order. If the order of connection of amino-acids is different, then a different protein will be made. We will focus on messenger RNA with regard to gene expression (the synthesis of RNA from a DNA template).

As was said before, the DNA of a gene is accessible - but - accessibility is usually regulated (you need a card to check-out a book from the library). In order to understand this regulation, we need to discuss some details of a gene. If one looked at the sequence of the nucleotides in one of the strands where the gene is located (only one-half of each step in the staircase), the sequence would appear to be just a bunch of nucleotides linked in some apparently random order. However, such is not the case. It is precisely this order which contains the critical information - the code. A gene is therefore much like a sentence in our language - a sentence, like a gene, has a beginning (start here) and an end (stop here) - there are keys which allow recognition. A sentence is made of words - as is a gene. In a gene the words are all only three letters long, and there are only 4 letters in the alphabet. Each three-letter word is a code which "says" - "put this particular amino-acid right here in this position." Because there are 4 different nucleotides, and any combination of three letters is allowed, there are 64 possible (4-cubed) sets of three letters. Using our analogies, the local library is the DNA (all of the chromosomes). You go inside, show your card, and you are allowed to open a book (one of the chromosomes). You look at a sentence (gene), but the sentence is written in French (DNA - you can recognize individual letters, but you do not understand the words). Luckily for you, there is a copy machine which miraculously converts the letters (DNA nucleotides) in the sentence to English (RNA) when the sentence is copied. Now, you (the ribosome and tRNA) can read and understand the text. The copy machine of the cell (RNA polymerase enzyme) reads the text (DNA), and generates a copy (mRNA) both readable and understandable by the ribosome, and the tRNA.
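
The count of 64 is easy to check for yourself. Here is a small Python sketch that simply writes out every possible three-letter word from the four-letter alphabet (the variable names are just for illustration):

```python
from itertools import product

LETTERS = "ATGC"  # the four DNA nucleotides

# Every ordered choice of three letters, with repeats allowed.
words = ["".join(triplet) for triplet in product(LETTERS, repeat=3)]

print(len(words))  # 64
print(words[:4])   # ['AAA', 'AAT', 'AAG', 'AAC']
```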

What are the parts of a gene? Let's look at a bacterial gene which encodes mRNA as an example. The parts are, from left to right: promoter, transcription start, translation start, open-reading-frame, translation stop, transcription stop. The promoter is composed of: a DNA sequence 35 nucleotides prior to the beginning of transcription (-35) and a DNA sequence 10 nucleotides prior to the beginning of transcription (-10, also known as the Pribnow sequence in bacteria; the comparable element in eucaryotes is the Goldberg-Hogness or TATA box, which sits a little farther from the start site).
Here is how a protein called Catabolite Activator Protein [CAP - in blue] interacts with DNA (about position -50) near the promoter region of the bacterial gene which encodes the enzyme which degrades the sugar lactose into the two sugars, glucose and galactose. CAP requires association with the chemical cyclic-AMP in order to be able to bind to the DNA at this site (called the CAP DNA-binding site). When certain bacteria are grown in the absence of glucose, but the presence of lactose, one result is the generation inside the cell of a chemical called cyclic-AMP. When this compound binds to CAP, then CAP can bind to the DNA at position -50. As you can see, this interaction significantly distorts the DNA structure in this region. It is ONLY through these interactions, that the promoter is made accessible to the enzyme RNA polymerase, and transcription of the gene can begin. Thus, expression of what is called the Lac operon (a tandem multi-gene system under the control of a single promoter) is regulated. If you look at Bigger Pictures you will see some protein/DNA interactions.

The beginning of transcription (making a complementary base-pair copy of the DNA into RNA) is called the +1 site. After the +1 site is a sequence called the Shine/Dalgarno sequence, followed by a 3-letter translation start sequence (TAC in the DNA, which is copied as AUG in the RNA); then, there is a long line of nucleotides in-a-row called the ORF (open-reading-frame) which is comprised of the three-letter "words." At the end of the ORF is a 3-letter sequence (ATT, or ATC, or ACT) which is called the translation stop signal, and pretty much at the end of the gene is a sequence which causes the stop of transcription. Genes which encode rRNA or tRNA do not have a Shine/Dalgarno sequence, nor a translation start/stop sequence.
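
To make the layout concrete, here is a rough Python sketch which picks the open-reading-frame out of a made-up stretch of DNA. The function name find_orf and the sample sequence are invented for illustration, the letters are read left to right exactly as written above, and the sketch simply takes the first TAC it finds as the start signal (a real cell relies on the Shine/Dalgarno sequence to locate the right one).

```python
START = "TAC"                  # translation start signal, as written in the DNA
STOPS = {"ATT", "ATC", "ACT"}  # translation stop signals, as written in the DNA


def find_orf(dna: str):
    """Return the three-letter words from the start signal up to, but not
    including, the stop signal. Return None if either signal is missing."""
    begin = dna.find(START)
    if begin == -1:
        return None
    words = []
    # Step through the sequence three letters at a time from the start signal.
    for i in range(begin, len(dna) - 2, 3):
        word = dna[i:i + 3]
        if word in STOPS:
            return words
        words.append(word)
    return None  # ran off the end without meeting a stop signal


# Hypothetical sequence: some leading letters, then TAC AAA CCG ATT, then GG.
print(find_orf("GGTCCTCCATACAAACCGATTGG"))  # ['TAC', 'AAA', 'CCG']
```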

The enzyme, RNA polymerase, has a structure which allows it to bind to the DNA, but only if the DNA sequence is appropriate - this appropriate region of the gene is called the promoter. The enzyme gets in-between the two strands, and stretches them apart (yep, they are flexible). Once the enzyme binds, because of the orientation of the promoter sequence, the enzyme will "face" in the correct direction (at the beginning of the sentence), and will begin to physically move along one of the strands (the template strand) of the gene, unwinding the DNA helix as it moves. As it moves, the enzyme will "read" each letter of the DNA sequence (sentence) and will begin synthesis (complementary base-pair copying) of the DNA into RNA at the correct starting place, following the same nucleotide-pairing rules, except that U (uracil) takes the place of T in RNA (A with U and C with G - for some reason, T is never found in mRNA - but may be found in transfer-RNA). While the RNA is being made, behind where the enzyme is, the DNA will re-wind (each half of the DNA "steps" will re-pair) which forces the RNA to dissociate from the DNA (only behind the enzyme, though). At the end of the sentence (gene), there will be a DNA sequence which results in the dissociation of RNA polymerase enzyme from the gene. Then, the RNA will completely dissociate from the DNA as the DNA completely re-winds. Since our example is mRNA, the mRNA will travel to the ribosome where it will attach (complementary base-pair) with one of the rRNA pieces within the structure of the ribosome. The particular mRNA sequence which associates with the ribosome's rRNA is called the Shine/Dalgarno sequence - remember - this sequence is at the very beginning of the mRNA transcript. 
This attachment orients the mRNA properly, and as the ribosome physically "pulls" the mRNA, one word-at-a-time (three-letter word called a codon), a particular tRNA carrying an amino-acid associates (complementary base-pairs) with the mRNA using its three-letter word (called an anticodon), and amino-acids are linearly hooked together in the proper order following the code that originated in the DNA of the gene, to form a protein of some sort. This action is called translation. The beginning of translation is at the translation start sequence, and at the end of the mRNA transcript is a sequence which, when it arrives at the ribosome (like reaching the end of a rope pulled through your hand), causes the mRNA to dissociate from the ribosome. Thus, translation stops.
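
The two steps, copying DNA into mRNA and then reading the mRNA three letters at a time, can be sketched in Python as follows. Treat this as a toy: the function names and the sample DNA are made up, the codon table is only a small excerpt of the standard genetic code (the full table has 64 entries), and the sketch skips the Shine/Dalgarno step and just starts at the first AUG.

```python
# Which RNA letter is made across from each DNA letter (U stands in for T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Excerpt of the standard genetic code; None marks a stop codon.
# A codon missing from this excerpt would raise a KeyError in translate().
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(dna: str) -> str:
    """Copy a DNA strand into mRNA, letter by letter (the copy machine)."""
    return "".join(DNA_TO_RNA[base] for base in dna)


def translate(mrna: str) -> list:
    """Read codons from the first AUG and collect amino-acids until a stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the ribosome lets go
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("TACAAACCGATT")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly']
```

Notice that TAC in the DNA comes out as AUG in the mRNA, and ATT comes out as UAA, which is why the start and stop signals look different depending on whether you are reading the DNA or the RNA.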

The expression of a gene may be regulated - that is - some genes are always "on", and some are sometimes on and sometimes off. If the gene is always on, it will always be expressed - and therefore - one of the kinds of RNA will always be made from it. Most genes, however, are regulated in expression. When they are, the gene will also contain a DNA sequence named the operator region located after the promoter region, and prior to the beginning of the transcription start sequence, which controls whether or not RNA can be made. In this simple case, a certain protein, called a regulatory protein, may be attached to the operator region. Therefore, even though RNA polymerase binds to the promoter as described above, the regulatory protein is in the way, and RNA polymerase cannot move it out of the way. In this case then, the gene would not be expressed. In order for this gene to be expressed, something would need to happen which caused the regulatory protein to dissociate from the DNA, and unblock the staircase. Regulatory proteins have places on their surface to which a particular substance within the cell can bind. Sometimes the binding of a substance with a regulatory protein leads to association of the protein with the operator region, and sometimes this binding leads to the dissociation of a regulatory protein from the operator region. In this way then, signals received at the outer surface of a cell can be transmitted internally within the cell, and lead to the appearance of a compound which can subsequently lead to association or dissociation of regulatory proteins with or from the DNA, respectively, and therefore allow a gene to be expressed or to not be expressed. This kind of delicate balance occurs all of the time in each of our cells and in all other living cells. It is an amazing and truly elegant system - unbelievably beautiful.
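
The on/off logic of this simple case can be written down as a tiny Python sketch. It is a yes/no cartoon of the operator idea only; the function and parameter names are invented for illustration, and real cells work in shades of gray rather than strict true and false.

```python
def protein_on_operator(signal_present: bool, signal_causes_binding: bool) -> bool:
    """Is the regulatory protein sitting on the operator region?

    If the signal substance makes the protein bind, the protein is on the DNA
    only when the signal is present. If the signal makes the protein let go,
    the protein is on the DNA only when the signal is absent.
    """
    return signal_present == signal_causes_binding


def gene_expressed(polymerase_at_promoter: bool,
                   signal_present: bool,
                   signal_causes_binding: bool) -> bool:
    """RNA is made only if polymerase has bound and the operator is clear."""
    blocked = protein_on_operator(signal_present, signal_causes_binding)
    return polymerase_at_promoter and not blocked


# A signal which knocks the regulatory protein off the DNA turns the gene on.
print(gene_expressed(True, signal_present=True, signal_causes_binding=False))   # True
# Without that signal the protein stays put and blocks the staircase.
print(gene_expressed(True, signal_present=False, signal_causes_binding=False))  # False
```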


Book: Don't Touch That Doorknob!

Copyright John C. Brown, 1995
