Part I
Foundations of Biochemistry
Facing page:  Supernova SN 1987a (the bright “star” at the lower right) resulted from the explosion of a blue supergiant star in the Large Magellanic Cloud, a galaxy near the Milky Way. Energy released by nuclear explosions in such supernovae brought about the fusion of simple atomic nuclei, forming the more complex elements of which the earth, its atmosphere, and all living things are composed.
Fifteen to twenty billion years ago the universe arose with a cataclysmic explosion that hurled hot, energy-rich subatomic particles into all space. Within seconds, the simplest elements (hydrogen and helium) were formed. As the universe expanded and cooled, galaxies condensed under the influence of gravity. Within these galaxies, enormous stars formed and later exploded as supernovae, releasing the energy needed to fuse simpler atomic nuclei into the more complex elements. Thus were produced, over billions of years, the chemical elements found on earth today. Biochemistry asks how the thousands of different biomolecules formed from these elements interact with each other to confer the remarkable properties of living organisms.
In Part I we will summarize the biological and chemical background to biochemistry. Living organisms operate within the same physical laws that apply to all natural processes, and we begin by discussing those laws and several axioms that flow from them (Chapter 1). These axioms make up the molecular logic of life. They define the means by which cells transform energy to accomplish work, catalyze the chemical transformations that typify them, assemble molecules of great complexity from simpler subunits, form supramolecular complexes that are the machinery of life, and store and pass on the instructions for the assembly of all future generations of organisms from simple, nonliving precursors.
Cells, the units of all living organisms, share certain features; but the cells of different organisms, and the various cell types within a single organism, are remarkably diverse in structure and function. Chapter 2 is a brief description of the common features and the diverse specializations of cells, and of the evolutionary processes that lead to such diversity.
Nearly all of the organic compounds from which living organisms are constructed are products of biological activity. These biomolecules were selected during the course of biological evolution for their fitness in performing specific biochemical and cellular functions. The biomolecules can be characterized and understood in the same terms that apply to the molecules of inanimate matter: the types of bonds between atoms, the factors that contribute to bond formation and bond strength, the three-dimensional structure of molecules, and chemical reactivities. Three-dimensional structure is especially important in biochemistry; the specificity of biological interactions, such as those between enzyme and substrate, antibody and antigen, hormone and receptor, is achieved by close steric complementarity between molecules. Prominent among the forces that stabilize three-dimensional structure are noncovalent interactions, individually weak but with significant cumulative effects on the structure of biological macromolecules. Chapter 3 provides the chemical basis for later discussions of the structure, catalysis, and metabolic interconversions of individual classes of biomolecules.
Water is the medium in which the first cells arose, and the solvent in which most biochemical transformations occur. The properties of water have shaped the course of evolution and exert a decisive influence on the structure of biomolecules in aqueous solution. Many of the weak interactions within and between biomolecules are strongly affected by the solvent properties of water. Even water-insoluble components of cells, such as membrane lipids, interact with each other in ways dictated by the polar properties of water. In Chapter 4 we consider the properties of water, the weak noncovalent interactions that occur in aqueous solutions of biomolecules, and the ionization of water and of solutes in aqueous solution.
These initial chapters are intended to provide a chemical backdrop for the later discussions of biochemical structures and reactions, so that whatever your background in chemistry or biology, you can immediately begin to follow, and to enjoy, the action.
Chapter 1
The Molecular Logic of Life
Living organisms are composed of lifeless molecules. When these molecules are isolated and examined individually, they conform to all the physical and chemical laws that describe the behavior of inanimate matter. Yet living organisms possess extraordinary attributes not shown by any random collection of molecules. In this chapter, we first consider the properties of living organisms that distinguish them from other collections of matter. After arriving at a broad definition of life, we can describe a set of principles that characterize all living organisms. These principles underlie the organization of organisms and the cells that make them up, and they provide the framework for this book. They will help you to keep the larger picture in mind while exploring the illustrative examples presented in the text.
What distinguishes all living organisms from all inanimate objects? First, they are structurally complicated and highly organized. They possess intricate internal structures (Fig. 1–1a) and contain many kinds of complex molecules. By contrast, the inanimate matter in our environment – clay, sand, rocks, seawater – usually consists of mixtures of relatively simple chemical compounds.
Second, living organisms extract, transform, and use energy from their environment (Fig. 1–1b), usually in the form of either chemical nutrients or the radiant energy of sunlight. This energy enables living organisms to build and maintain their own intricate structures and to do mechanical, chemical, osmotic, and other types of work. By contrast, inanimate matter does not use energy in a systematic way to maintain structure or to do work. Inanimate matter tends to decay toward a more disordered state, to come to equilibrium with its surroundings.
The third and most characteristic attribute of living organisms is the capacity for precise self-replication and self-assembly (Fig. 1–1c), a property that can be regarded as the quintessence of the living state. A single bacterial cell placed in a sterile nutrient medium can give rise to a billion identical “daughter” cells in 24 hours. Each of the cells contains thousands of different molecules, some extremely complex; yet each bacterium is a faithful copy of the original, constructed entirely from information contained within the genetic material of the original cell. By contrast, mixtures of inanimate matter show no capacity to grow and reproduce in forms identical in mass, shape, and internal structure, generation after generation.
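The arithmetic behind the "billion daughter cells in 24 hours" figure can be checked with a short calculation. The sketch below is only an illustration: it assumes idealized, uninterrupted binary fission starting from one cell, and the function names are hypothetical.

```python
import math


def doublings_needed(target_cells: float, start_cells: float = 1.0) -> int:
    """Smallest whole number of doublings that reaches target_cells."""
    return math.ceil(math.log2(target_cells / start_cells))


def implied_doubling_time_min(target_cells: float, hours: float,
                              start_cells: float = 1.0) -> float:
    """Average doubling time (minutes) needed to reach target_cells in `hours`.

    Assumes every cell divides in two at a constant rate with no lag phase
    and no cell death, which is an idealization.
    """
    doublings = math.log2(target_cells / start_cells)
    return hours * 60.0 / doublings


if __name__ == "__main__":
    n = doublings_needed(1e9)
    t = implied_doubling_time_min(1e9, 24)
    print(f"doublings needed: {n}")
    print(f"cells after {n} doublings: {2 ** n:,}")
    print(f"implied doubling time: {t:.1f} min")
```

Under these assumptions, about 30 doublings are enough, which corresponds to a doubling time of roughly 48 minutes sustained over the full 24 hours.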
Figure 1–1  Some characteristics of living matter. (a) Microscopic complexity and organization are apparent in this thin section of vertebrate muscle tissue, viewed with the electron microscope. (b) The lion uses organic compounds obtained by eating other animals to fuel intense bursts of muscular activity. The zebra derives energy from compounds in the plants it consumes; the plants derive their energy from sunlight. (c) Biological reproduction occurs with near-perfect fidelity.
The ability to self-replicate has no true analog in the nonliving world, but there is an instructive analogy in the growth of crystals in saturated solutions. Crystallization produces more material identical in lattice structure with the original “seed” crystal. Crystals are much less complex than the simplest living organisms, and their structure is static, not dynamic as are living cells. Nonetheless, the ability of crystals to “reproduce” themselves led the physicist Erwin Schrödinger to propose in his famous essay “What Is Life?” that the genetic material of cells must have some of the properties of a crystal. Schrödinger’s 1944 notion (years before the modern understanding of gene structure was achieved) describes rather accurately some of the properties of deoxyribonucleic acid, the material of genes.
Each component of a living organism has a specific function. This is true not only of macroscopic structures such as leaves and stems or hearts and lungs, but also of microscopic intracellular structures such as the nucleus or chloroplast. Even individual chemical compounds in cells have specific functions. The interplay among the chemical components of a living organism is dynamic; changes in one component cause coordinating or compensating changes in another, with the result that the whole ensemble displays a character beyond that of the individual constituents. The collection of molecules carries out a program, the end result of which is the reproduction of the program and the self-perpetuation of that collection of molecules.
The molecules of which living organisms are composed conform to all the familiar laws of chemistry, but they also interact with each other in accordance with another set of principles, which we shall refer to collectively as the molecular logic of life. These principles do not involve new or as yet undiscovered physical laws or forces. Instead, they are a set of relationships characterizing the nature, function, and interactions of biomolecules.
If living organisms are composed of molecules that are intrinsically inanimate, how do these molecules confer the remarkable combination of characteristics we call life? How is it that a living organism appears to be more than the sum of its inanimate parts? Philosophers once answered that living organisms are endowed with a mysterious and divine life force, but this doctrine (vitalism) has been firmly rejected by modern science. The basic goal of the science of biochemistry is to determine how the collections of inanimate molecules that constitute living organisms interact with each other to maintain and perpetuate life. Although biochemistry yields important insights and practical applications in medicine, agriculture, nutrition, and industry, it is ultimately concerned with the wonder of life itself.
A massive oak tree, an eagle that soars above it, and a soil bacterium that grows among its roots appear superficially to have very little in common. However, a hundred years of biochemical research has revealed that living organisms are remarkably alike at the microscopic and chemical levels (Fig. 1–2). Biochemistry seeks to describe in molecular terms those structures, mechanisms, and chemical processes shared by all organisms and to discover the organizing principles that underlie life in all of its diverse forms.
Figure 1–2  Diverse living organisms share common chemical features. The eagle, the oak tree, the soil bacterium, and the human share the same basic structural units (cells), the same kinds of macromolecules (DNA, RNA, proteins) made up of the same kinds of monomeric subunits (nucleotides, amino acids), the same pathways for synthesis of cellular components, and the same genetic code and evolutionary ancestors.
Although there is a fundamental unity to life, it is important to recognize at the outset that very few generalizations about living organisms are absolutely correct for every organism under every condition. The range of habitats in which organisms live, from hot springs to Arctic tundra, from animal intestines to college dormitories, is matched by a correspondingly wide range of specific biochemical adaptations. These adaptations are integrated within the fundamental chemical framework shared by all organisms. Although generalizations are not perfect, they remain useful. In fact, exceptions often illuminate scientific generalizations.
Most of the molecular constituents of living systems are composed of carbon atoms covalently joined with other carbon atoms and with hydrogen, oxygen, or nitrogen. The special bonding properties of carbon permit the formation of a great variety of molecules. Organic compounds of molecular weight (M_r) less than about 500, such as amino acids, nucleotides, and monosaccharides, serve as monomeric subunits of proteins, nucleic acids, and polysaccharides, respectively. A single protein molecule may have 1,000 or more amino acids, and deoxyribonucleic acid has millions of nucleotides.
Figure 1–3 labels (monomeric subunits; ordered linear sequences; number of different sequences possible for a segment of 8 subunits):
Letters of the English alphabet (26 different kinds); English words; 26^8, or 2.1 × 10^11.
Deoxyribonucleotides (4 different kinds); deoxyribonucleic acid (DNA); 4^8, or 65,536.
Amino acids (20 different kinds); protein; 20^8, or 2.56 × 10^10.
Each cell of the bacterium Escherichia coli (E. coli) contains more than 6,000 different kinds of organic compounds, including about 3,000 different proteins and a similar number of different nucleic acid molecules. In humans there may be tens of thousands of different kinds of proteins, as well as many types of polysaccharides (chains of simple sugars), a variety of lipids, and many other compounds of lower molecular weight.
To purify and to characterize thoroughly all of these molecules would be an insuperable task were it not for the fact that each class of macromolecules (proteins, nucleic acids, polysaccharides) is composed of a small, common set of monomeric subunits. These monomeric subunits can be covalently linked in a virtually limitless variety of sequences (Fig. 1–3), just as the 26 letters of the English alphabet can be arranged into a limitless number of words, sentences, or books.
Figure 1–3  Monomeric subunits in linear sequences can spell infinitely complex messages. The number of different sequences possible (S) depends on the number of different kinds of subunits (N) and the length of the linear sequence (L): S = N^L. For polymers the size of proteins (L ≈ 1,000), S is very large, and for nucleic acids, for which L may be many millions, S is astronomical.
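The relation S = N^L in the caption above is easy to evaluate directly. The following is a minimal sketch; the function names are hypothetical, and the digit count uses a logarithm so that very long sequences do not require building the full integer.

```python
import math


def sequence_count(n_kinds: int, length: int) -> int:
    """Number of distinct linear sequences: S = N^L."""
    return n_kinds ** length


def digits_in_sequence_count(n_kinds: int, length: int) -> int:
    """Number of decimal digits in S = N^L, computed from L * log10(N)."""
    return math.floor(length * math.log10(n_kinds)) + 1


if __name__ == "__main__":
    # Segments of 8 subunits, as in Figure 1-3.
    for name, n in [("English letters", 26),
                    ("deoxyribonucleotides", 4),
                    ("amino acids", 20)]:
        print(f"{name}: {n}^8 = {sequence_count(n, 8):,}")

    # A protein-sized polymer (L = 1,000 amino acids).
    print("digits in 20^1000:", digits_in_sequence_count(20, 1000))
```

For a polymer of 1,000 amino acids, S is a number with more than 1,300 digits, which is what the caption means by "very large".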
Deoxyribonucleic acids (DNA) are constructed from only four different kinds of simple monomeric subunits, the deoxyribonucleotides, and ribonucleic acids (RNA) are composed of just four types of ribonucleotides. Proteins are composed of 20 different kinds of amino acids. The eight kinds of nucleotides from which all nucleic acids are built and the 20 different kinds of amino acids from which all proteins are built are identical in all living organisms.
Most of the monomeric subunits from which all macromolecules are constructed serve more than one function in living cells. The nucleotides serve not only as subunits of nucleic acids, but also as energy-carrying molecules. The amino acids are subunits of protein molecules, and also precursors of hormones, neurotransmitters, pigments, and many other kinds of biomolecules.
From these considerations we can now set out some of the principles in the molecular logic of life:

All living organisms have the same kinds of monomeric subunits.

There are underlying patterns in the structure of biological macromolecules.

The identity of each organism is preserved by its possession of distinctive sets of nucleic acids and of proteins.

Energy is a central theme in biochemistry: cells and organisms depend upon a constant supply of energy to oppose the inexorable tendency in nature for decay to the lowest energy state. The synthetic reactions that occur within cells, like the synthetic processes in any factory, require the input of energy. Energy is consumed in the motion of a bacterium or an Olympic sprinter, in the flashing of a firefly or the electrical discharge of an eel. The storage and expression of information cost energy, without which structures rich in information inevitably become disordered and meaningless. Cells have evolved highly efficient mechanisms for capturing the energy of sunlight, or extracting the energy of oxidizable fuels, and coupling the energy thus obtained to the many energy-consuming processes they carry out.
In the course of biological evolution, one of the first developments must have been an oily membrane that enclosed the water-soluble molecules of the primitive cell, segregating them and allowing them to accumulate to relatively high concentrations. The molecules and ions contained within a living organism differ in kind and in concentration from those in the organism’s surroundings. The cells of a freshwater fish contain certain inorganic ions at concentrations far different from those in the surrounding water (Fig. 1–4). Proteins, nucleic acids, sugars, and fats are present in the fish but essentially absent from the surrounding water, which instead contains carbon, hydrogen, and oxygen atoms only in simpler molecules such as carbon dioxide and water. When the fish dies, its contents eventually come to equilibrium with those of its surroundings.
Figure 1–4 labels: in the living fish, [K+]_fish > [K+]_lake, [Na+]_fish > [Na+]_lake, and [Cl−]_fish > [Cl−]_lake; after death and decay, [K+]_fish = [K+]_lake, [Na+]_fish = [Na+]_lake, and [Cl−]_fish = [Cl−]_lake. Other labels: K+, Na+, Cl−; DNA, RNA, protein, lipids, etc.; monomeric subunits; NH3, CO2, HPO4^2−.
Figure 1–4  Living organisms are not at equilibrium with their surroundings. Death and decay restore the equilibrium. During growth, energy from food is used to build complex molecules and to concentrate ions from the surroundings. When the organism dies, it loses its ability to derive energy from food. Without energy, the dead body cannot maintain concentration gradients; ions leak out. Inexorably, macromolecular components decay to simpler compounds. These simple compounds serve as nutritional sources for phytoplankton, which are then eaten by larger organisms. (By convention, square brackets denote concentration – in this case, of ionic species.)
Figure 1–5  A dynamic steady state results when the rate of appearance of a cellular component is exactly matched by the rate of its disappearance. In (a), a protein (hemoglobin) is synthesized, then degraded. In (b), glucose derived from food (or from carbohydrate stores) enters the bloodstream in some tissues (intestine, liver), then leaves the blood to be consumed by metabolic processes in other tissues (heart, brain, skeletal muscle). In this scheme, r1, r2, etc., represent the rates of the various processes. The dynamic steady-state concentrations of hemoglobin and glucose are maintained by complex mechanisms regulating the relative rates of the processes shown here.
Although the chemical composition of an organism may be almost constant through time, the population of molecules within a cell or organism is far from static. Molecules are synthesized and then broken down by continuous chemical reactions, involving a constant flux of mass and energy through the system. The hemoglobin molecules carrying oxygen from your lungs to your brain at this moment were synthesized within the past month; by next month they will have been degraded and replaced with new molecules. The glucose you ingested with your most recent meal is now circulating in your bloodstream; before the day is over these particular glucose molecules will have been converted into something else, such as carbon dioxide or fat, and will have been replaced with a fresh supply of glucose. The amount of hemoglobin and glucose in the blood remains nearly constant because the rate of synthesis or intake of each just balances the rate of its breakdown, consumption, or conversion into some other product (Fig. 1–5). The constancy of concentration does not, therefore, reflect chemical inertness of the components, but is rather the result of a dynamic steady state.
Figure 1–5 labels: (a) precursors (20 amino acids) → synthesis (r1) → hemoglobin (in erythrocyte) → degradation (r2) → breakdown products (20 amino acids). When r1 = r2, the concentration of hemoglobin is constant. (b) food (carbohydrates) → ingestion (r1) → glucose (in blood) → utilization (r2, r3, r4) → waste CO2, storage fats, other products. When r1 = r2 + r3 + r4, the concentration of glucose in blood is constant.
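The balance of rates in Figure 1–5 can be illustrated with a small numerical sketch. The model below is an assumption made for illustration, not something stated in the text: the component is synthesized at a constant rate r1 and removed at a rate r2 proportional to its concentration. All parameter values and names are hypothetical.

```python
def simulate_concentration(r1: float, k_removal: float, x0: float,
                           dt: float, steps: int) -> float:
    """Integrate dx/dt = r1 - r2 with r2 = k_removal * x (forward Euler).

    r1        : constant rate of appearance (synthesis or intake)
    k_removal : first-order rate constant for disappearance
    x0        : starting concentration
    Returns the concentration after `steps` time steps of size `dt`.
    """
    x = x0
    for _ in range(steps):
        r2 = k_removal * x       # rate of disappearance at this instant
        x += (r1 - r2) * dt      # net change is zero only when r1 == r2
    return x


def steady_state(r1: float, k_removal: float) -> float:
    """Concentration at which r1 = r2 in this simple model."""
    return r1 / k_removal


if __name__ == "__main__":
    r1, k = 2.0, 0.1             # hypothetical units: conc/time and 1/time
    print("predicted steady state:", steady_state(r1, k))
    print("starting from zero    :", simulate_concentration(r1, k, 0.0, 0.1, 2000))
    print("starting from excess  :", simulate_concentration(r1, k, 50.0, 0.1, 2000))
```

In this sketch the concentration settles at the same value whether it starts above or below it, even though molecules are being made and removed the whole time. That is the sense in which the steady state is dynamic rather than inert.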
Living cells and organisms must perform work to stay alive and to reproduce themselves. The continual synthesis of cellular components requires chemical work; the accumulation and retention of salts and various organic compounds against a concentration gradient involves osmotic work; and the contraction of a muscle or the motion of a bacterial flagellum represents mechanical work. Biochemistry examines the processes by which energy is extracted, channeled, and consumed, so it is essential to develop an understanding of the fundamental principles of bioenergetics.
Consider the simple mechanical example shown in Figure 1–6. An object at the top of an inclined plane has a certain amount of potential energy as a result of its elevation. It tends spontaneously to slide down the plane, losing its potential energy of position as it approaches the ground. When an appropriate string-and-pulley device is attached to the object, the spontaneous downward motion can accomplish a certain amount of work, an amount never greater than the change in potential energy of position. The amount of energy actually available to do work (called the free energy) will always be somewhat less than the total change in energy, because some energy is dissipated as the heat of friction. The greater the elevation of the object relative to its final position, the greater the change in energy as it slides downward, and the greater the amount of work that can be accomplished.
Figure 1–6  (Top) The downward motion of an object releases potential energy that can do work. The potential energy made available by spontaneous downward motion (an exergonic process, represented by the pink box) can be coupled to the upward movement of another object (an endergonic process, represented by the blue box). (Bottom) A spontaneous (exergonic) chemical reaction (B→C) releases free energy, which can pull or drive an endergonic reaction (A→B) when the two reactions share a common intermediate, B. The exergonic reaction B→C has a large, negative free-energy change (ΔG_B→C), and the endergonic reaction A→B has a smaller, positive free-energy change (ΔG_A→B). The free-energy change for the overall reaction A→C is the arithmetic sum of these two values (ΔG_A→C). Because the value of ΔG_A→C is negative, the overall reaction is exergonic and proceeds spontaneously.
Figure 1–6 labels: mechanical example: work done in raising object; loss of potential energy of position. Chemical example: free energy, G (vertical axis); reaction coordinate, A → B → C (horizontal axis); ΔG_A→B (positive); ΔG_A→C (negative); ΔG_B→C (negative); endergonic; exergonic.
In the chemical analog of this mechanical example (Fig. 1–6, bottom), a reactant, B, is converted into a product, C. The compounds B and C each contain a certain amount of potential energy, related to the kind and number of bonds in each type of molecule. This energy is analogous to the potential energy in an elevated object. Some of the energy is available to do work when B is converted into C by a chemical reaction that involves no change in temperature or pressure. This portion of the energy, the free energy, is designated G (for J. Willard Gibbs, who developed much of the theory of chemical energetics), and the change in free energy during the conversion of B to C is ΔG.
We can define a system as all of the reactants and products, the solvent, and the immediate atmosphere – in short, everything within a defined region of space. The system and its surroundings together constitute the universe. If the system exchanges neither matter nor energy with its surroundings, it is said to be closed (in stricter thermodynamic usage, isolated). The magnitude of the free-energy change for a process proceeding toward equilibrium depends upon how far from equilibrium the system was in its initial state. In the mechanical example, no spontaneous sliding will occur once the object has reached the ground; the object is then at equilibrium with its surroundings, and the free-energy change for sliding along the horizontal surface is zero.
In chemical reactions in closed systems, the process also proceeds spontaneously until equilibrium is reached. The free-energy change (ΔG) for a chemical reaction is a quantitative expression of how far the system is from chemical equilibrium. Reactions that proceed with the release of free energy are exergonic, and because the products of such reactions have less free energy than the reactants, ΔG is negative. Chemical reactions in which the products have more free energy than the reactants are endergonic, and for these reactions ΔG is positive. When all of the chemical species in the system are at equilibrium, the free-energy change for the reaction is zero, and no further net conversion of reactants into products will occur without the input of energy or matter from outside the system.
As in the mechanical example, some of the energy released in a spontaneous process can accomplish work – chemical work in this case. In living systems, as in mechanical processes, part of the total energy change in the chemical reaction is unavailable to accomplish work. Some is dissipated as heat, and some is lost as entropy, a measure of energy due to randomness, which we will define more rigorously later.
How is free energy from a chemical reaction channeled into energy-requiring processes in living organisms? In the mechanical example in Figure 1–6, it is clear that if one sliding object is coupled to another object on another inclined plane, the energy released by the spontaneous downward sliding of one may be harnessed to produce upward motion of the other, a motion that cannot occur spontaneously. This is a direct analogy to a biochemical process in which the energy released in an exergonic chemical reaction can be used to drive another reaction that is endergonic and would not proceed spontaneously. The reactions in this system are coupled because the product of one (compound B) is a reactant in the other. This coupling of an exergonic reaction with an endergonic one is absolutely central to the free-energy exchanges that occur in all living systems. In biological energy coupling, the simultaneous occurrence of two reactions is not enough. The two reactions must be coupled in the sense of Figure 1–6 (bottom); the two reactions share an intermediate, B.
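The bookkeeping for coupled reactions is simple addition: the free-energy change for A→C is the sum of the values for A→B and B→C. The sketch below illustrates this with hypothetical numbers chosen only to show the signs involved; they are not taken from the text, and the function names are likewise illustrative.

```python
def classify(delta_g: float) -> str:
    """Label a reaction by the sign of its free-energy change."""
    if delta_g < 0:
        return "exergonic"
    if delta_g > 0:
        return "endergonic"
    return "at equilibrium"


def coupled_delta_g(*step_delta_gs: float) -> float:
    """Overall free-energy change for sequential steps that share intermediates."""
    return sum(step_delta_gs)


if __name__ == "__main__":
    # Hypothetical values in kJ/mol, for illustration only.
    dg_a_to_b = +13.8    # A -> B, endergonic on its own
    dg_b_to_c = -30.5    # B -> C, strongly exergonic

    dg_a_to_c = coupled_delta_g(dg_a_to_b, dg_b_to_c)

    print(f"A -> B: {dg_a_to_b:+.1f} kJ/mol ({classify(dg_a_to_b)})")
    print(f"B -> C: {dg_b_to_c:+.1f} kJ/mol ({classify(dg_b_to_c)})")
    print(f"A -> C: {dg_a_to_c:+.1f} kJ/mol ({classify(dg_a_to_c)})")
```

With these illustrative values the overall change is about −16.7 kJ/mol, so the sequence A→C is exergonic even though its first step is not. The sum is meaningful only because B is shared; two unrelated reactions occurring at the same time cannot be added this way.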
A living organism is an open system; it exchanges both matter and energy with its surroundings. Living organisms use either of two strategies to derive free energy from their surroundings: (1) they take up chemical components from the environment (fuels), extract free energy by means of exergonic reactions involving these fuels, and couple these reactions to endergonic reactions; or (2) they use energy absorbed from sunlight to bring about exergonic photochemical reactions, to which they couple endergonic reactions.

Living organisms create and maintain their complex, orderly structures at the expense of free energy from their environment.

Exergonic chemical or photochemical reactions are coupled to endergonic processes through shared chemical intermediates, channeling the free energy to do work.

Figure 1–7  During metabolic transductions, entropy increases as the potential energy of complex nutrient molecules decreases. Living organisms (a) extract energy from their environment, (b) convert some of it into useful forms of energy to produce work, and (c) return some energy to the environment as heat, together with end-product molecules that are less well organized than the starting fuel, increasing the entropy of the universe.
The first law of thermodynamics, developed from physics and chemistry but fully valid for biological systems as well, describes the energy conservation principle:

In any physical or chemical change, the total amount of energy in the universe remains constant, although the form of the energy may change.

Not until the nineteenth century did physicists discover that energy can be transduced (converted from one form to another), yet living cells have been using that principle for eons. Cells are consummate transducers of energy, capable of interconverting chemical, electromagnetic, mechanical, and osmotic energy with great efficiency (Fig. 1–7). Biological energy transducers differ from many familiar machines that depend on temperature or pressure differences. The steam engine, for example, converts the chemical energy of fuel into heat, raising the temperature of water to its boiling point to produce steam pressure that drives a mechanical device. The internal combustion engine, similarly, depends upon changes in temperature and pressure. By contrast, all parts of a living organism must operate at about the same temperature and pressure, and heat flow is therefore not a useful source of energy. Cells are isothermal, or constant-temperature, systems.

Living cells are chemical engines that function at constant temperature.

Figure 1–7 labels: potential energy: nutrients in environment (complex molecules such as sugars, fats), sunlight → energy transductions: chemical transformations within cells → cellular work: chemical synthesis, mechanical work, osmotic and electrical gradients, light production, genetic information transfer. Entropy increase: metabolic end products (simple molecules such as CO2, H2O), heat.
Figure 1–8 labels: thermonuclear fusion, 4 H → 4He + positrons + electromagnetic radiation (light); photons of visible light.
Figure 1–8  Sunlight is the ultimate source of all biological energy. Thermonuclear reactions in the sun produce energy that is transmitted to the earth as light and converted into chemical energy by plants and certain microorganisms.
Virtually all of the energy transductions in cells can be traced to a flow of electrons from one molecule to another, in the oxidation of fuel or in the trapping of light energy during photosynthesis. This electron flow is “downhill”, from higher to lower electrochemical potential; as such, it is formally analogous to the flow of electrons in an electric circuit driven by an electrical battery. Nearly all living organisms derive their energy, directly or indirectly, from the radiant energy of sunlight, which arises from the thermonuclear fusion reactions that form helium in the sun (Fig. 1–8). Photosynthetic cells absorb the sun’s radiant energy and use it to drive electrons from water to carbon dioxide, forming energy-rich products such as starch and sucrose. In doing so, most photosynthetic organisms release molecular oxygen into the atmosphere. Ultimately, nonphotosynthetic organisms obtain energy for their needs by oxidizing the energy-rich products of photosynthesis, passing electrons to atmospheric oxygen to form water, carbon dioxide, and other end products, which are recycled in the environment. All of these reactions involving electron flow are oxidation–reduction reactions. Thus, other principles of the living state emerge:

The energy needs of virtually all organisms are provided, directly or indirectly, by solar energy.

The flow of electrons in oxidation–reduction reactions underlies energy transduction and energy conservation in living cells.

All living organisms are dependent on each other through exchanges of energy and matter via the environment.

Figure 1–9 labels: free energy, G (vertical axis); reaction coordinate, A → B (horizontal axis); reactants (A); activation barrier (transition state, ‡); ΔG‡_uncat; ΔG‡_cat; ΔG; products (B).
Figure 1–9  The energetic course of a chemical reaction. A high activation barrier, representing the transition state, must be overcome in the conversion of reactants (A) into products (B), even though the products are more stable than the reactants – as indicated by a large, negative free-energy change (ΔG). The energy required to overcome the activation barrier is the activation energy (ΔG‡). Enzymes catalyze reactions by lowering the activation barrier. They bind the transition-state intermediates tightly, and the binding energy of this interaction effectively reduces the activation energy from ΔG‡_uncat to ΔG‡_cat. (Note that the activation energy is unrelated to the free-energy change of the reaction, ΔG.)
The fact that a reaction is exergonic does not mean that it will necessarily proceed rapidly. The reaction coordinate diagram in Figure 1–6 (bottom) is actually an oversimplification. The path from reactant to product almost invariably involves an energy barrier, called the activation barrier (Fig. 1–9), that must be surmounted for any reaction to occur. The breaking and joining of bonds generally requires the prior bending or stretching of existing bonds, creating a transition state of higher free energy than either reactant or product. The highest point in the reaction coordinate diagram represents the transition state.
Activation barriers are crucial to the stability of biomolecules in living systems. Although, when isolated from other cellular components, most biomolecules are stable for days or even years, inside cells they often undergo chemical transformations within milliseconds. Without activation barriers, biomolecules within cells would rapidly break down to simple, low-energy forms. The lifetime of complex molecules would be very short, and the extraordinary continuity and organization of life would be impossible.
Virtually every cellular chemical reaction occurs at a significant rate only because of enzymes – catalysts that are capable of greatly enhancing the rate of specific chemical reactions without being consumed in the process (Fig. 1–10). Enzymes, as catalysts, act by lowering this energy barrier between reactant and product. The activation energy (ΔG‡; Fig. 1–9) required to overcome this energy barrier could in principle be supplied by heating the reaction mixture, but this option is not available in living cells. Instead, during a reaction, enzymes bind reactant molecules in the transition state, thereby lowering the activation energy and enormously accelerating the rate of the reaction. The relationship between the activation energy and reaction rate is exponential; a small decrease in ΔG‡ results in a very large increase in reaction rate. Enzyme-catalyzed reactions commonly proceed at rates up to 10¹⁰- to 10¹⁴-fold greater than the uncatalyzed rates.
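The exponential dependence of rate on activation energy can be made concrete with a short calculation. The sketch below assumes, for illustration only, that the rate constant is proportional to exp(−ΔG‡/RT) and that nothing else changes when the barrier is lowered; the function names and the temperature of 298.15 K are assumptions, not values from the text.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def rate_enhancement(barrier_drop_kj, temperature_k=298.15):
    """Fold increase in rate when the activation energy falls by barrier_drop_kj (kJ/mol).

    Assumes rate is proportional to exp(-dG_act / RT), with all else unchanged.
    """
    return math.exp(barrier_drop_kj * 1000.0 / (R * temperature_k))

def barrier_drop_for(fold, temperature_k=298.15):
    """Reduction in activation energy (kJ/mol) needed for a given fold increase in rate."""
    return R * temperature_k * math.log(fold) / 1000.0

# How much must the barrier fall to reach the enhancements quoted in the text?
for fold in (1e10, 1e14):
    print(f"{fold:.0e}-fold faster: barrier lowered by about "
          f"{barrier_drop_for(fold):.0f} kJ/mol")
```

Under these assumptions each tenfold gain in rate costs the same fixed decrement in ΔG‡ (just under 6 kJ/mol near room temperature), which is why a modest change in the barrier translates into an enormous change in rate.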
amount of product formed (B), with enzyme, reaction: A → B, without enzyme (uncatalyzed), time
Figure 1–10  An enzyme increases the rate of a specific chemical reaction. In the presence of an enzyme specific for the conversion of reactant A into product B, the rate of the reaction may increase a millionfold or more over that of the uncatalyzed reaction. The enzyme is not consumed in the process; one enzyme molecule can act repeatedly to convert many molecules of A to B.
Enzymes are, with a few exceptions we will consider later, proteins. Each enzyme protein is specific for the catalysis of a specific reaction, and each reaction in a cell is catalyzed by a different enzyme. Thousands of different types of enzymes are therefore required by each cell. The multiplicity of enzymes, their high specificity for reactants, and their susceptibility to regulation give cells the capacity to lower activation barriers selectively. This selectivity is crucial in the effective regulation of cellular processes.
The thousands of enzyme-catalyzed chemical reactions in cells are functionally organized into many different sequences of consecutive reactions called pathways, in which the product of one reaction becomes the reactant in the next (Fig. 1–11). Some of these sequences of enzyme-catalyzed reactions degrade organic nutrients into simple end products, in order to extract chemical energy and convert it into a form useful to the cell. Together these degradative, free-energy-yielding reactions are designated catabolism. Other enzyme-catalyzed pathways start from small precursor molecules and convert them to progressively larger and more complex molecules, including proteins and nucleic acids; such synthetic pathways invariably require the input of energy, and taken together represent anabolism. The network of enzyme-catalyzed pathways constitutes cellular metabolism.
Figure 1–11  An example of a typical synthetic (anabolic) pathway. In the bacterium E. coli, threonine is converted to isoleucine in five steps, each catalyzed by a separate enzyme. (Only the main reactants and products are shown here.) Threonine, in turn, was synthesized from a simpler precursor. Both threonine and isoleucine are precursors of much larger and more complex molecules: the proteins. (The letters A to F correspond to those in Fig. 1–14.)
threonine (A), enzyme 1 → α-ketobutyrate (B), enzyme 2 → α-aceto-α-hydroxybutyrate (C), enzyme 3 → α,β-dihydroxy-β-methylvalerate (D), enzyme 4 → α-keto-β-methylvalerate (E), enzyme 5 → isoleucine (F)
Figure 1–12  (a) Structural formula and (b) ball-and-stick model for adenosine triphosphate (ATP). The removal of the terminal phosphate of ATP is highly exergonic, and this reaction is coupled to many endergonic reactions in the cell.
Figure 1–13  ATP is the chemical intermediate linking energy-releasing to energy-requiring cell processes. Its role in the cell is analogous to that of money in an economy: it is “earned/produced” in exergonic reactions and “spent/consumed” in endergonic ones.
Cells capture, store, and transport free energy in a chemical form. Adenosine triphosphate (ATP) (Fig. 1–12) functions as the major carrier of chemical energy in all cells. ATP carries energy between metabolic pathways by serving as the shared intermediate that couples endergonic reactions to exergonic ones. The terminal phosphate group of ATP is transferred to a variety of acceptor molecules, which are thereby activated for further chemical transformation. The adenosine diphosphate (ADP) that remains after the phosphate transfer is recycled to become ATP, at the expense of either chemical energy (during oxidative phosphorylation) or solar energy in photosynthetic cells (by the process of photophosphorylation). ATP is the major connecting link (the shared intermediate) between the catabolic and anabolic networks of enzyme-catalyzed reactions in the cell (Fig. 1–13).
These linked networks of enzyme-catalyzed reactions are virtually identical in all living organisms.
stored nutrients, ingested foods, solar photons, catabolic reaction pathways (exergonic), ADP + HPO₄²⁻, ATP, CO₂, NH₃, H₂O, simple products, precursors, anabolic reaction pathways (endergonic), osmotic work, mechanical work, complex biomolecules, other cellular work
Not only can living cells simultaneously synthesize thousands of different kinds of carbohydrate, fat, protein, and nucleic acid molecules and their simpler subunits, they can also do so in the precise proportions required by the cell. For example, when rapid cell growth occurs, the precursors of proteins and nucleic acids must be made in large quantities, whereas in nongrowing cells the requirement for these precursors is much reduced. Key enzymes in each metabolic pathway are regulated so that each type of precursor molecule is produced in a quantity appropriate to the current requirements of the cell. Consider the pathway shown in Figure 1–14 (see also Fig. 1–11), which leads to the synthesis of isoleucine (one of the amino acids, the monomeric subunits of proteins). If a cell begins to produce more isoleucine than is needed for protein synthesis, the unused isoleucine accumulates. High concentrations of isoleucine inhibit the catalytic activity of the first enzyme in the pathway, immediately slowing the production of the amino acid. Such negative feedback keeps the production and utilization of each metabolic intermediate in balance.
threonine, A, enzyme 1 → B → C → D → E → F, isoleucine
Figure 1–14  Regulation of a biosynthetic pathway by feedback inhibition. In the pathway by which isoleucine is formed in five steps from threonine (Fig. 1–11), the accumulation of the product isoleucine (F) causes inhibition of the first reaction in the pathway by binding to the enzyme catalyzing this reaction and reducing its activity. (The letters A to F represent the corresponding compounds shown in Fig. 1–11.)
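The stabilizing effect of feedback inhibition can be illustrated with a toy model. The sketch below is not a description of the real E. coli pathway: it collapses the five steps into a single production term, and the rate law, the parameter values (v_max, k_i), and the function name are all hypothetical choices made for illustration. Isoleucine is produced at a rate that falls as its own concentration rises and is consumed in proportion to the cell's demand.

```python
def simulate_isoleucine(demand, feedback=True, steps=20000, dt=0.01):
    """Integrate d[Ile]/dt = production - demand * [Ile] by simple Euler steps.

    demand   : hypothetical first-order rate constant for isoleucine use
    feedback : if True, isoleucine inhibits the first enzyme of the pathway
    Returns the isoleucine level (arbitrary units) at the end of the run.
    """
    v_max = 10.0  # hypothetical maximal flux through enzyme 1
    k_i = 1.0     # hypothetical isoleucine level giving half-maximal inhibition
    ile = 0.0
    for _ in range(steps):
        inhibition = 1.0 / (1.0 + ile / k_i) if feedback else 1.0
        production = v_max * inhibition
        ile += dt * (production - demand * ile)
    return ile

# Compare a growing cell (high demand) with a nongrowing one (low demand).
for demand in (1.0, 0.1):
    with_fb = simulate_isoleucine(demand, feedback=True)
    without_fb = simulate_isoleucine(demand, feedback=False)
    print(f"demand={demand}: with feedback {with_fb:.1f}, "
          f"without feedback {without_fb:.1f}")
```

In this toy model a tenfold drop in demand raises the isoleucine level tenfold when there is no feedback, but much less when the product inhibits the first enzyme. That is the sense in which negative feedback keeps production and utilization in balance.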
Living cells also regulate the synthesis of their own catalysts, the enzymes. Thus a cell can switch off the synthesis of an enzyme required to make a given product whenever that product is available ready-made in the environment. These self-adjusting and self-regulating properties allow cells to maintain themselves in a dynamic steady state, despite fluctuations in the external environment.

Living cells are self-regulating chemical engines, adjusted for maximum economy.

The continued existence of a biological species requires that its genetic information be maintained in a stable form and, at the same time, expressed with very few errors. Effective storage and accurate expression of the genetic message define individual species, distinguish them from one another, and assure their continuity over successive generations.
Among the seminal discoveries of twentieth-century biology are the chemical nature and the three-dimensional structure of the genetic material, DNA. The sequence of deoxyribonucleotides in this linear polymer encodes the instructions for forming all other cellular components and provides a template for the production of identical DNA molecules to be distributed to progeny when a cell divides.
Perhaps the most remarkable of all the properties of living cells and organisms is their ability to reproduce themselves with nearly perfect fidelity for countless generations. This continuity of inherited traits implies constancy, over thousands or millions of years, in the structure of the molecules that contain the genetic information. Very few historical records of civilization, even those etched in copper or carved in stone, have survived for a thousand years (Fig. 1–15). But there is good evidence that the genetic instructions in living organisms have remained nearly unchanged over very much longer periods; many bacteria have nearly the same size, shape, and internal structure and contain the same kinds of precursor molecules and enzymes as those that lived a billion years ago.
Figure 1–15  Two ancient scripts. (a) The Prism of Sennacherib, inscribed in about 700 B.C., describes in characters of the Assyrian language some historical events during the reign of King Sennacherib. The Prism contains about 20,000 characters, weighs about 50 kg, and has survived almost intact for about 2,700 years. (b) The single DNA molecule of the bacterium E. coli, seen leaking out of a disrupted cell, is hundreds of times longer than the cell itself and contains all of the encoded information necessary to specify the cell’s structure and functions. The bacterial DNA contains about 10 million characters (nucleotides), weighs less than 10⁻¹⁰ g, and has undergone only relatively minor changes during the past several million years. The black spots and white specks are artifacts of the preparation.
Hereditary information is preserved in DNA, a long, thin organic polymer so fragile that it will fragment from the shear forces arising in a solution that is stirred or pipetted. A human sperm or egg, carrying the accumulated hereditary information of millions of years of evolution, transmits these instructions in the form of DNA molecules, in which the linear sequence of covalently linked nucleotide subunits encodes the genetic message.
The capacity of living cells to preserve their genetic material and to duplicate it for the next generation results from the structural complementarity between the two halves of the DNA molecule (Fig. 1–16). The basic unit of DNA is a linear polymer of four different monomeric subunits, deoxyribonucleotides (see Fig. 1–3), arranged in a precise linear sequence. It is this linear sequence that encodes the genetic information. Two of these polymeric strands are twisted about each other to form the DNA double helix, in which each monomeric subunit in one strand pairs specifically with the complementary subunit in the opposite strand. In the enzymatic replication or repair of DNA, one of the two strands serves as a template for the assembly of another, structurally complementary DNA strand. Before a cell divides, the two DNA strands separate and each serves as a template for the synthesis of a complementary strand, generating two identical double-helical molecules, one for each daughter cell. If one strand is damaged, continuity of information is assured by the information present on the other strand.
strand 1, strand 2; old strand 1, new strand 2; new strand 1, old strand 2
Figure 1–16  The complementary structure of double-stranded DNA accounts for its accurate replication. DNA is a linear polymer of four subunits, the deoxyribonucleotides deoxyadenylate (A), deoxyguanylate (G), deoxycytidylate (C), and deoxythymidylate (T), joined covalently. Each nucleotide has the intrinsic ability, due to its precise three-dimensional structure, to associate very specifically but noncovalently with one other nucleotide: A always associates with its complement T, and G with its complement C. In the double-stranded DNA molecule, the sequence of nucleotides in one strand is complementary to the sequence in the other; wherever G occurs in strand 1, C occurs in strand 2; wherever A occurs in strand 1, T occurs in strand 2. The two strands of the DNA, held together by a large number of hydrogen bonds (represented here by vertical blue lines) between the pairs of complementary nucleotides, twist about each other to form the DNA double helix. In DNA replication, prior to cell division, the two strands of the original DNA separate and two new strands are synthesized, each with a sequence complementary to one of the original strands. The result is two double-helical DNA molecules, each identical to the original DNA.
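The template principle described in this caption is simple enough to express in a few lines. The following sketch uses only the pairing rules stated above (A with T, G with C) and pairs the strands position by position, as the figure does; it ignores strand polarity and all enzymatic detail. The function names and the example sequence are hypothetical.

```python
# Pairing rules from the caption: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the strand that pairs, position by position, with `strand`."""
    return "".join(PAIRS[base] for base in strand)

def replicate(strand_1, strand_2):
    """Separate a duplex and build a new partner on each old strand."""
    daughter_a = (strand_1, complement(strand_1))  # old strand 1 + new strand 2
    daughter_b = (complement(strand_2), strand_2)  # new strand 1 + old strand 2
    return daughter_a, daughter_b

strand_1 = "GATTACCGA"             # hypothetical example sequence
strand_2 = complement(strand_1)    # its partner in the double helix
daughter_a, daughter_b = replicate(strand_1, strand_2)
print(daughter_a)
print(daughter_b)
```

Because each strand fully specifies its partner, both daughter duplexes come out identical to the parent, and either strand alone suffices to restore the other if one is damaged.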

Genetic information is encoded in the linear sequence of four kinds of subunits of DNA.

The double-helical DNA molecule contains an internal template for its own replication and repair.

Despite the near-perfect fidelity of genetic replication, infrequent, unrepaired mistakes in the replication process produce changes in the nucleotide sequence of DNA; each such change is a genetic mutation (Fig. 1–17). Incorrectly repaired damage to one of the DNA strands has the same effect. Mutations can change the instructions for producing cellular components. Many mutations are deleterious or even lethal to the organism; they may, for example, cause the synthesis of a defective enzyme that is not able to catalyze an essential metabolic reaction.
mutation 1, mutation 2, mutation 3, mutation 4, mutation 5, mutation 6
Figure 1–17  The gradual accumulation of mutations over long periods of time results in new biological species, each with a unique DNA sequence. At top is shown a short segment of a gene in a hypothetical progenitor organism. With the passage of time, changes in nucleotide sequence (mutations, indicated here by colored boxes) occur, one at a time, resulting in progeny with different DNA sequences. These mutant progeny themselves undergo occasional mutations, yielding their own progeny differing by two or more nucleotides from the original sequence.
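The stepwise divergence shown in this figure can be imitated with a small simulation. The sketch below models mutation only, not selection: it applies one random single-nucleotide substitution per step to a hypothetical progenitor sequence and counts how far each descendant has drifted from the original. The sequence, the fixed random seed, and the function names are illustrative assumptions.

```python
import random

BASES = "ACGT"

def mutate_once(sequence, rng):
    """Replace one randomly chosen nucleotide with a different nucleotide."""
    position = rng.randrange(len(sequence))
    new_base = rng.choice([b for b in BASES if b != sequence[position]])
    return sequence[:position] + new_base + sequence[position + 1:]

def differences(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

rng = random.Random(0)                   # fixed seed so the run is repeatable
progenitor = "ATGGCCATTGTAATGGGCCGC"     # hypothetical gene segment
lineage = [progenitor]
for _ in range(6):                       # six successive mutations, one at a time
    lineage.append(mutate_once(lineage[-1], rng))

for step, seq in enumerate(lineage):
    print(step, seq, differences(progenitor, seq))
```

The difference count can never exceed the number of mutations, and it may be smaller when a later change hits a position that was already altered.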
Occasionally the mutation better equips an organism or cell to survive in its environment. The mutant enzyme might, for example, have acquired a slightly different specificity, so that it is now able to use as a reactant some compound that the cell was previously unable to metabolize. If a population of cells were to find itself in an environment where that compound was the only available source of fuel, the mutant cell would have an advantage over the other, unmutated (wild-type) cells in the population. The mutant cell and its progeny would survive in the new environment, whereas wild-type cells would starve and be eliminated.
Chance genetic variations in individuals in a population, combined with natural selection (survival of the fittest individuals in a challenging or changing environment), have resulted in the evolution of an enormous variety of organisms, each adapted to life in a particular ecological niche.
Carolus Linnaeus
Charles Darwin
Biochemistry has confirmed and greatly extended evolutionary theory. Carolus Linnaeus recognized the anatomic similarities and differences among living organisms and provided a framework for assessing the relatedness of different species. Charles Darwin gave us a unifying hypothesis to explain the phylogeny of modern organisms – the origin of different species from a common ancestor. Biochemistry has begun to reveal the molecular anatomy of cells of different species – the sequences of subunits in nucleic acids and proteins and the three-dimensional structures of individual molecules of nucleic acid and protein. There is a reasonable prospect that when the twenty-first century dawns, we will know the entire nucleotide sequence of all of the genes that make up the biological heritage of a human.
At the molecular level, evolution is the emergence over time of different sequences of nucleotides within genes. With new genetic sequences being experimentally determined almost daily, biochemists have an enormously rich treasury of evidence with which to analyze evolutionary relationships and to refine evolutionary theory. The molecular phylogeny derived from gene sequences is consistent with, but in many cases more precise than, the classical phylogeny based on macroscopic structures.
Molecular structures and mechanisms have been conserved in evolution even though organisms have continuously diverged at the level of gross anatomy. At the molecular level, the basic unity of life is readily apparent; crucial molecular structures and mechanisms are remarkably similar from the simplest to the most complex organisms. Biochemistry makes it possible to discover the unifying features common to all life. This book examines many of these features: the mechanisms for energy conservation, biosynthesis, gene replication, and gene expression.
The information in DNA is encoded as a linear (one-dimensional) sequence of the nucleotide units of DNA, but the expression of this information results in a three-dimensional cell. This change from one to three dimensions occurs in two phases. A linear sequence of deoxyribonucleotides in DNA codes (through the intermediary, RNA) for the production of a protein with a corresponding linear sequence of amino acids (Fig. 1–18). The protein folds itself into a particular three-dimensional shape, dictated by its amino acid sequence. The precise three-dimensional structure (native conformation) is crucial to the protein’s function as either catalyst or structural element. This principle emerges:
gene 1, gene 2, gene 3 → transcription of DNA sequence into RNA sequence → RNA 1, RNA 2, RNA 3 → translation on the ribosome of RNA sequence into protein sequence and folding of protein into native conformation → protein 1, protein 2, protein 3 → formation of supramolecular complex
Figure 1–18  Linear sequences of deoxyribonucleotides in DNA, arranged into units known as genes, are transcribed into ribonucleic acid (RNA) molecules with complementary ribonucleotide sequences. The RNA sequences are then translated into linear protein chains, which fold spontaneously into their native three-dimensional shapes. Individual proteins sometimes associate with other proteins to form supramolecular complexes, stabilized by numerous weak interactions.

The linear sequence of amino acids in a protein leads to the acquisition of a unique three-dimensional structure by a self-assembly process.

Once a protein has folded into its native conformation, it may associate noncovalently with other proteins, or with nucleic acids or lipids, to form supramolecular complexes such as chromosomes, ribosomes, and membranes (Fig. 1–18). These complexes are in many cases self-assembling. The individual molecules of these complexes have specific, high-affinity binding sites for each other, and within the cell they spontaneously form functional complexes.

Individual macromolecules with specific affinity for other macromolecules self-assemble into supramolecular complexes.

The forces that provide stability and specificity to the three-dimensional structures of macromolecules and supramolecular complexes are mostly noncovalent interactions. These interactions, individually weak but collectively strong, include hydrogen bonds, ionic interactions among charged groups, van der Waals interactions, and hydrophobic interactions among nonpolar groups. These weak interactions are transient; individually they form and break in small fractions of a second. The transient nature of noncovalent interactions confers a flexibility on macromolecules that is critical to their function. Furthermore, the large number of noncovalent interactions in a single macromolecule makes it unlikely that at any given moment all the interactions will be broken; thus macromolecular structures are stable over time.

Three-dimensional biological structures combine the properties of flexibility and stability.
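A rough calculation shows why many weak interactions give a stable structure even though each one is transient. The sketch below assumes, purely for illustration, that every interaction is broken at a given instant with the same probability and independently of the others; real interactions in a macromolecule are not independent, and the value of 0.5 is hypothetical.

```python
def prob_all_broken(p_broken, n_interactions):
    """Probability that n independent interactions are all broken at the same instant."""
    return p_broken ** n_interactions

# Even if each interaction is broken half of the time, the chance that
# every one of them is broken at once falls off very steeply with n.
for n in (1, 5, 20, 100):
    print(n, prob_all_broken(0.5, n))
```

With a single interaction the structure would be open half of the time; with twenty such interactions the chance of finding all of them broken at once is already below one in a million.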

The flexibility and stability of the double-helical structure of DNA are due to the complementarity of its two strands and the many weak interactions between them. The flexibility of these interactions allows strand separation during DNA replication (see Fig. 1–16); the complementarity of the double helix is essential to genetic continuity.
Noncovalent interactions are also central to the specificity and catalytic efficiency of enzymes. Enzymes bind transition-state intermediates through numerous weak but precisely oriented interactions. Because the weak interactions are flexible, the complex survives the structural distortions as the reactant is converted into product.
The formation of noncovalent interactions provides the energy for self-assembly of macromolecules by stabilizing native conformations relative to unfolded, random forms. The native conformation of a protein is that in which the energetic advantages of forming weak interactions counterbalance the tendency of the protein chain to assume random forms. Given a specific linear sequence of amino acids and a specific set of conditions (temperature, ionic conditions, pH), a protein will assume its native conformation spontaneously, without a template or scaffold to direct the folding.
We can now summarize the various principles of the molecular logic of life:

A living cell is a self-contained, self-assembling, self-adjusting, self-perpetuating isothermal system of molecules that extracts free energy and raw materials from its environment.

The cell carries out many consecutive reactions promoted by specific catalysts, called enzymes, which it produces itself.

The cell maintains itself in a dynamic steady state, far from equilibrium with its surroundings. There is great economy of parts and processes, achieved by regulation of the catalytic activity of key enzymes.

Self-replication through many generations is ensured by the self-repairing, linear information-coding system. Genetic information encoded as sequences of nucleotide subunits in DNA and RNA specifies the sequence of amino acids in each distinct protein, which ultimately determines the three-dimensional structure and function of each protein.

Many weak (noncovalent) interactions, acting cooperatively, stabilize the three-dimensional structures of biomolecules and supramolecular complexes.

At no point in our examination of the molecular logic of living cells have we encountered any violation of known physical laws; nor have we needed to define new physical laws. The organic machinery of living cells functions within the same set of laws that governs the operation of inanimate machines, but the chemical reactions and regulatory processes of cells have been highly refined during evolution.
This set of principles has been most thoroughly validated in studies of unicellular organisms (such as the bacterium E. coli), which are exceptionally amenable to biochemical and genetic study. Although multicellular organisms must solve certain problems not encountered by unicellular organisms, such as the differentiation of the fertilized egg into specialized cell types, the same principles have been found to apply. Can such simple and mechanical statements apply to humans as well, with their extraordinary capacity for thought, language, and creativity? The pace of recent biochemical progress toward understanding such processes as gene regulation, cellular differentiation, communication among cells, and neural function has been extraordinarily fast, and is accelerating. The success of biochemical methods in solving and redefining these problems justifies the hope that the most complex functions of the most highly developed organism will eventually be explicable in molecular terms.
The relevant facts of biochemistry are many; the student approaching this subject for the first time may occasionally feel overwhelmed. Perhaps the most encouraging development in twentieth-century biology is the realization that, for all of the enormous diversity in the biological world, there is a fundamental unity and simplicity to life. The organizing principles, the biochemical unity, and the evolutionary perspective of diversity, provided at the molecular level, will serve as helpful frames of reference for the study of biochemistry.
Further Reading
Asimov, I. (1962) Life and Energy: An Exploration of the Physical and Chemical Basis of Modern Biology, Doubleday & Co., Inc., New York. 
An engaging account of the role of energy transformations in biology, written for the intelligent layman.
Blum, H.F. (1968) Time’s Arrow and Evolution, 3rd edn, Princeton University Press, Princeton, NJ. 
An excellent discussion of the way the second law of thermodynamics has influenced biological evolution.
Dulbecco, R. (1987) The Design of Life, Yale University Press, New Haven, CT. 
An unusual and excellent introduction to biology.
Fruton, J.S. (1972) Molecules and Life. Historical Essays on the Interplay of Chemistry and Biology, Wiley-Interscience, New York. 
This series of essays describes the development of biochemistry from Pasteur’s studies of fermentation to the present studies of metabolism and information transfer. You may want to refer to these essays through this textbook.
Fruton, J.S. (1992) A Skeptical Biochemist, Harvard University Press, Cambridge, MA. 
Hawking, S. (1988) A Brief History of Time, Bantam Books, Inc., New York. 
Jacob, F. (1973) The Logic of Life: A History of Heredity, Pantheon Books, Inc., New York. Originally published (1970) as La logique du vivant: une histoire de l’hérédité, Editions Gallimard, Paris. 
A fascinating historical and philosophical account of the route by which we came to the present molecular understanding of life.
Kornberg, A. (1987) The two cultures: chemistry and biology. Biochemistry 26, 6888–6891.
The importance of applying chemical tools to biological problems, described by an eminent practitioner.
Monod, J. (1971) Chance and Necessity, Alfred A. Knopf, Inc., New York. [Paperback version (1972) Vintage Books, New York.] Originally published (1970) as Le hasard et la nécessité, Editions du Seuil, Paris.
An exploration of the philosophical implications of biological knowledge.
Schrödinger, E. (1944) What is Life? Cambridge University Press, New York. [Reprinted (1956) in What is Life? and Other Scientific Essays, Doubleday Anchor Books, Garden City, NY.] 
A thought-provoking look at life, written by a prominent physical chemist.
Chapter 2
nucleus (eukaryotes) or nucleoid (bacteria) contains genetic material – DNA and associated proteins, nucleus is membrane-bounded, plasma membrane, tough, flexible bilayer, selectively permeable to polar substances, includes membrane proteins that function in transport, in signal reception, and as enzymes, cytoplasm, aqueous cell contents and suspended particles and organelles, centrifuge at 150,000 g, supernatant: cytosol, concentrated solution of enzymes, RNA, building block molecules, metabolites, inorganic ions, pellet: particles and organelles, ribosomes, storage granules, mitochondria, chloroplasts, lysosomes, endoplasmic reticulum
Figure 2–1  The universal features of all living cells: a nucleus or nucleoid, a plasma membrane, and cytoplasm. The cytosol is that portion of the cytoplasm that remains in the supernatant after centrifugation of a cell extract at 150,000 g for 1 h.
Cells are the structural and functional units of all living organisms. The smallest organisms consist of single cells and are microscopic, whereas larger organisms are multicellular. The human body, for example, contains at least 10¹⁴ cells. Unicellular organisms are found in great variety throughout virtually every environment from Antarctica to hot springs to the inner recesses of larger organisms. Multicellular organisms contain many different types of cells, which vary in size, shape, and specialized function. Yet no matter how large and complex the organism, each of its cells retains some individuality and independence.
Despite their many differences, cells of all kinds share certain structural features (Fig. 2–1). The plasma membrane defines the periphery of the cell, separating its contents from the surroundings. It is composed of enormous numbers of lipids and protein molecules, held together primarily by noncovalent hydrophobic interactions (p. 18), forming a thin, tough, pliable, hydrophobic bilayer around the cell. The membrane is a barrier to the free passage of inorganic ions and most other charged or polar compounds; transport proteins in the plasma membrane allow the passage of certain ions and molecules. Other membrane proteins are receptors that transmit signals from the outside to the inside of the cell, or are enzymes that participate in membrane-associated reaction pathways.
Because the individual lipid and protein subunits of the plasma membrane are not covalently linked, the entire structure is remarkably flexible, allowing changes in the shape and size of the cell. As a cell grows, newly made lipid and protein molecules are inserted into its plasma membrane; cell division produces two cells, each with its own membrane. Growth and fission occur without loss of membrane integrity. In a reversal of the fission process, two separate membrane surfaces can fuse, also without loss of integrity. Membrane fusion and fission are central to mechanisms of transport known as endocytosis and exocytosis.
The internal volume bounded by the plasma membrane, the cytoplasm, is composed of an aqueous solution, the cytosol, and a variety of insoluble, suspended particles (Fig. 2–1). The cytosol is not simply a dilute aqueous solution; it has a complex composition and gel-like consistency. Dissolved in the cytosol are many enzymes and the RNA molecules that encode them; the monomeric subunits (amino acids and nucleotides) from which these macromolecules are assembled; hundreds of small organic molecules called metabolites, intermediates in biosynthetic and degradative pathways; coenzymes, compounds of intermediate molecular weight (Mr 200 to 1,000) that are essential participants in many enzyme-catalyzed reactions; and inorganic ions.
Among the particles suspended in the cytosol are supramolecular complexes and, in higher organisms but not in bacteria, a variety of membrane-bounded organelles in which specialized metabolic machinery is localized. Ribosomes, complexes of over 50 different protein and RNA molecules, are small particles, 18 to 22 nm in diameter. Ribosomes are the enzymatic machines on which protein synthesis occurs; they often occur in clusters called polysomes (polyribosomes) held together by a strand of messenger RNA. Also present in the cytoplasm of many cells are granules containing stored nutrients such as starch and fat. Nearly all living cells have either a nucleus or a nucleoid, in which the genome (the complete set of genes, composed of DNA) is stored and replicated. The DNA molecules are always very much longer than the cells themselves, and are tightly folded and packed within the nucleus or nucleoid as supramolecular complexes of DNA with specific proteins. The bacterial nucleoid is not separated from the cytoplasm by a membrane, but in higher organisms, the nuclear material is enclosed within a double membrane, the nuclear envelope. Cells with nuclear envelopes are called eukaryotes (Greek eu, “true”, and karyon, “nucleus”); those without nuclear envelopes – bacterial cells – are prokaryotes (Greek pro, “before”). Unlike bacteria, eukaryotes have a variety of other membrane-bounded organelles in their cytoplasm, including mitochondria, lysosomes, endoplasmic reticulum, Golgi complexes, and, in photosynthetic cells, chloroplasts.
In this chapter we review briefly the evolutionary relationships among some commonly studied cells and organisms, and the structural features that distinguish cells of various types. Our main focus is on eukaryotic cells. Also discussed in brief are the cellular parasites known as viruses.
Figure 2–2  Smaller cells have larger ratios of surface area to volume, and their interiors are therefore more accessible to substances diffusing into the cell through the surface. When the large cube (representing a large cell) is subdivided into many smaller cubes (cells), the total surface area increases greatly without a change in the total volume, and the surface-to-volume ratio increases accordingly.
Most cells are of microscopic size. Animal and plant cells are typically 10 to 30 μm in diameter, and many bacteria are only 1 to 2 μm long.
What limits the dimensions of a cell? The lower limit is probably set by the minimum number of each of the different biomolecules required by the cell. The smallest complete cells, certain bacteria known collectively as mycoplasma, are 300 nm in diameter and have a volume of about 10⁻¹⁴ mL. A single ribosome is about 20 nm in its longest dimension, so a few ribosomes take up a substantial fraction of the cell’s volume. In a cell of this size, a 1 μM solution of a compound represents only about 6 molecules.
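The arithmetic behind these figures can be sketched in a few lines. This is only an illustrative check, assuming the cell is a perfect sphere and taking Avogadro's number as 6.022 × 10²³ per mole; the function names are hypothetical. Under these assumptions a 1 μM solution in 10⁻¹⁴ mL works out to roughly 6 molecules, and a 1 mM solution to roughly 6,000.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole (assumed constant)


def sphere_volume_ml(diameter_nm: float) -> float:
    """Volume in mL (cm^3) of a sphere whose diameter is given in nm."""
    radius_cm = (diameter_nm / 2) * 1e-7  # 1 nm = 1e-7 cm
    return (4 / 3) * math.pi * radius_cm ** 3


def molecules_in_volume(concentration_molar: float, volume_ml: float) -> float:
    """Number of solute molecules at a given molar concentration in volume_ml."""
    volume_l = volume_ml * 1e-3  # 1 mL = 1e-3 L
    return concentration_molar * volume_l * AVOGADRO


# A 300 nm spherical cell, as described for mycoplasma.
cell_volume = sphere_volume_ml(300)
print(f"cell volume: {cell_volume:.2e} mL")

# Molecule counts in the rounded volume of 1e-14 mL.
print(f"1 uM: {molecules_in_volume(1e-6, 1e-14):.1f} molecules")
print(f"1 mM: {molecules_in_volume(1e-3, 1e-14):.0f} molecules")
```

The point of the exercise is the order of magnitude: in a cell this small, even ordinary concentrations correspond to very few individual molecules.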
The upper limit of cell size is set by the rate of diffusion of solute molecules in aqueous systems. The availability of fuels and essential nutrients from the surrounding medium is sometimes limited by the rate of their diffusion to all regions of the cell. A bacterial cell that depends upon oxygen-consuming reactions for energy production (an aerobic cell) must obtain molecular oxygen (O2) from the surrounding medium by diffusion through its plasma membrane. The cell is so small, and the ratio of its surface area to its volume is so large, that every part of its cytoplasm is easily reached by O2 diffusing into the cell. As the size of a cell increases, its surface-to-volume ratio decreases (Fig. 2–2), until metabolism consumes O2 faster than diffusion can supply it. Aerobic metabolism thus becomes impossible as cell size increases beyond a certain point, placing a theoretical upper limit on the size of the aerobic cell.
Figure 2–2 data:

Cube side                            20 μm     10 μm     5 μm
Number of cubes, n                   1         8         64
Length of a side, l (μm)             20        10        5
Total surface area, l² × 6n (μm²)    2,400     4,800     9,600
Total volume, l³ × n (μm³)           8,000     8,000     8,000
Surface area:volume ratio, 6/l       0.3       0.6       1.2
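The cube subdivision of Figure 2–2 can be reproduced numerically. The sketch below is illustrative only; the function name and the idea of halving each edge repeatedly are assumptions chosen to match the 20, 10, and 5 μm cubes of the figure.

```python
def cube_subdivision(side_um: float, splits: int) -> dict:
    """Cut one cube into equal smaller cubes by halving each edge `splits` times.

    Returns the number of cubes, their side length, the total surface area,
    the total volume, and the surface-to-volume ratio (which equals 6 / side).
    """
    per_edge = 2 ** splits          # pieces along each edge
    n = per_edge ** 3               # total number of small cubes
    side = side_um / per_edge       # side length of each small cube (um)
    area = 6 * side ** 2 * n        # total surface area (um^2)
    volume = side ** 3 * n          # total volume (um^3), unchanged by cutting
    return {
        "n": n,
        "side_um": side,
        "area_um2": area,
        "volume_um3": volume,
        "ratio": area / volume,
    }


# One 20 um cube, then 8 cubes of 10 um, then 64 cubes of 5 um.
for splits in range(3):
    print(cube_subdivision(20.0, splits))
```

Each halving of the edge length doubles the total surface area while leaving the total volume at 8,000 μm³, so the ratio doubles as well.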
There are interesting exceptions to this generalization that cells must be small. The giant alga Nitella has cells several centimeters long. To assure the delivery of nutrients, metabolites, and genetic information (RNA) to all of its parts, each cell is vigorously “stirred” by active cytoplasmic streaming (p. 43). The shape of a cell can also help to compensate for its large size. A smooth sphere has the smallest surface-to-volume ratio possible for a given volume. Many large cells, although roughly spherical, have highly convoluted surfaces (Fig. 2–3a), creating larger surface areas for the same volume and thus facilitating the uptake of fuels and nutrients and release of waste products to the surrounding medium. Other large cells (neurons, for example) have large surface-to-volume ratios because they are long and thin, star-shaped, or highly branched (Fig. 2–3b), rather than spherical.
Figure 2–3  Convolutions of the plasma membrane, or long, thin extensions of the cytoplasm, increase the surface-to-volume ratio of cells. (a) Cells of the intestinal mucosa (the inner lining of the small intestine) are covered with microvilli, increasing the area for absorption of nutrients from the intestine. (b) Neurons of the hippocampus of the rat brain are several millimeters long, but the long extensions (axons) are only about 10 nm wide.
Because all living cells have evolved from the same progenitors, they share certain fundamental similarities. Careful biochemical study of just a few cells, however different in biochemical details and varied in superficial appearance, ought to yield general principles applicable to all cells and organisms. The burgeoning knowledge in biology in the past 150 years has supported these propositions over and over again. Certain cells, tissues, and organisms have proved more amenable to experimental studies than others. Knowledge in biochemistry, and much of the information in this book, continues to be derived from a few representative tissues and organisms, such as the bacterium Escherichia coli, the yeast Saccharomyces, photosynthetic algae, spinach leaves, the rat liver, and the skeletal muscle of several different vertebrates.
A dividing Escherichia coli cell.
Saccharomyces cerevisiae (baker’s yeast).
In the isolation of enzymes and other cellular components, it is ideal if the experimenter can begin with a plentiful and homogeneous source of the material. The component of interest (such as an enzyme or nucleic acid) often represents only a minuscule fraction of the total material, and many grams of starting material are needed to obtain a few micrograms of the purified component. Certain types of physical and chemical studies of biomolecules are precluded if only microgram quantities of the pure substance are available. A homogeneous source of an enzyme or nucleic acid, in which all of the cells are genetically and biochemically identical, leaves no doubt about which cell type yielded the purified component, and makes it safer to extrapolate the results of in vitro studies to the situation in vivo. A large culture of bacterial or protistan cells (E. coli, Saccharomyces, or Chlamydomonas, for example), all derived by division from the same parent and therefore genetically identical, meets the requirement for a plentiful and homogeneous source. Individual tissues from laboratory animals (rat liver, pig brain, rabbit muscle) are plentiful sources of similar, though not identical, cells. Some animal and plant cells proliferate in cell culture, producing populations of identical (cloned) cells in quantities suitable for biochemical analysis.
Genetic mutants, in which a defect in a single gene produces a specific functional defect in the cell or organism, are extremely useful in establishing that a certain cellular component is essential to a particular cellular function. Because it is technically much simpler to produce and detect mutants in bacteria and yeast, these organisms (E. coli and Saccharomyces cerevisiae, for example) have been favorite experimental targets for biochemical geneticists.
An organism that is easy to culture in the laboratory, with a short generation time, offers significant advantages to the research biochemist. An organism that requires only a few simple precursor molecules in its growth medium can be cultured in the presence of a radioisotopically labeled precursor, and the metabolic fate of that precursor can then be conveniently traced by following the incorporation of the radioactive atoms into its metabolic products. The short generation time (minutes or hours) of microorganisms allows the investigator to follow a labeled precursor or a genetic defect through many generations in a few days. In higher organisms with generation times of months or years, this is virtually impossible.
Some highly specialized tissues of multicellular organisms are remarkably enriched in some particular component related to their specialized function. Vertebrate skeletal muscle is a rich source of actin and myosin; pancreatic secretory cells contain high concentrations of rough endoplasmic reticulum; sperm cells are rich in DNA and in flagellar proteins; liver (the major biosynthetic organ of vertebrates) contains high concentrations of many enzymes of biosynthetic pathways; spinach leaves contain large numbers of chloroplasts; and so on. For studies on such specific components or processes, biochemists commonly choose a specialized tissue for their experimental systems.
Sometimes simplicity of structure or function makes a particular cell or organism attractive as an experimental system. For studies of plasma membrane structure and function, the mature erythrocyte (red blood cell) has been a favorite; it has no internal membranes to complicate purification of the plasma membrane. Some bacterial viruses (bacteriophages) have few genes. Their DNA molecules are therefore smaller and much simpler than those of humans or corn plants, and it has proved easier to study replication in these viruses than in human or corn chromosomes.
The biochemical description of living cells in this book is a composite, based on studies of many types of cells. The biochemist must always exercise caution in generalizing from results obtained in studies of selected cells, tissues, and organisms, and in relating what is observed in vitro to what happens within the living cell.
Nostoc sp., a photosynthetic cyanobacterium.
All of the organisms alive today are believed to have evolved from ancient, unicellular progenitors. Two large groups of extant prokaryotes evolved from these early forms: archaebacteria (Greek, arché, “origin”) and eubacteria. Eubacteria inhabit the soil, surface waters, and the tissues of other living or decaying organisms. Most common and well-studied bacteria, including E. coli and the cyanobacteria (formerly called blue–green algae), are eubacteria. The archaebacteria are more recently discovered and less well studied. They inhabit more extreme environments – salt brines, hot acid springs, bogs, and the deep regions of the ocean.
Within each of these two large groups of bacteria are subgroups distinguished by the habitats to which they are best adapted. In some habitats there is a plentiful supply of oxygen, and the resident organisms live by aerobic metabolism; their catabolic processes ultimately result in the transfer of electrons from fuel molecules to oxygen. Other environments are virtually devoid of oxygen, forcing resident organisms to conduct their catabolic business without it. Many of the organisms that have evolved in these anaerobic environments are obligate anaerobes; they die when exposed to oxygen.
All organisms, including bacteria, can be classified as either chemotrophs (those obtaining their energy from a chemical fuel) or phototrophs (those using sunlight as their primary energy source). Certain organisms can synthesize some or all of their monomeric subunits, metabolic intermediates, and macromolecules from very simple starting materials such as CO2 and NH3; these are the autotrophs. Others must acquire some of their nutrients from the environment preformed (by autotrophic organisms, for example); these are heterotrophs. There are therefore four general modes of obtaining fuel and energy, and four general groups of organisms distinguished by these modes: chemoheterotrophs, chemoautotrophs, photoheterotrophs, and photoautotrophs (Fig. 2–4).
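The four groups are simply the combinations of two independent choices: energy source and carbon source. As a minimal sketch (the enum and function names are hypothetical, introduced only for illustration), the naming scheme can be written as:

```python
from enum import Enum


class EnergySource(Enum):
    CHEMICAL = "chemo"   # energy from a chemical fuel
    LIGHT = "photo"      # energy from sunlight


class CarbonSource(Enum):
    ORGANIC = "hetero"   # preformed organic compounds from the environment
    CO2 = "auto"         # simple starting materials such as CO2


def trophic_group(energy: EnergySource, carbon: CarbonSource) -> str:
    """Combine the two prefixes into the name of the trophic group."""
    return f"{energy.value}{carbon.value}troph"


# All four combinations of energy source and carbon source.
for energy in EnergySource:
    for carbon in CarbonSource:
        print(trophic_group(energy, carbon))

# Humans use organic compounds as both energy source and carbon source.
print(trophic_group(EnergySource.CHEMICAL, CarbonSource.ORGANIC))
```

Treating the two properties as separate axes makes it clear why there are exactly four groups and no more.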
Figure 2–4  Organisms can be classified according to their source of energy (shaded red) and the form in which they obtain carbon atoms (shaded blue) for the synthesis of cellular material. Organic compounds are both energy source and carbon source for chemoheterotrophs such as ourselves. Some, but not all, chemoheterotrophs consume O2 and produce CO2, and some photoautotrophs produce O2 (shaded green).
[Figure 2–4 diagram labels. Chemoheterotroph: organic compounds, O2 → CO2, cells. Chemoautotroph: CO2, H2S, Fe2+ → S, Fe3+, cells. Photoheterotroph: organic compounds, light → cells. Photoautotroph: CO2, light → O2, cells.]
As shown in Figure 2–5, the earliest cells probably arose about 3.5 billion (3.5 × 10⁹) years ago in the rich mixture of organic compounds, the “primordial soup”, of prebiotic times; they were almost certainly chemoheterotrophs. The organic compounds were originally synthesized from such components of the early earth’s atmosphere as CO, CO2, N2, and CH4 by the nonbiological actions of volcanic heat and lightning (Chapter 3). Primitive heterotrophs gradually acquired the capability to derive energy from certain compounds in their environment and to use that energy to synthesize more and more of their own precursor molecules, thereby becoming less dependent on outside sources of these molecules – less extremely heterotrophic. A very significant evolutionary event was the development of pigments capable of capturing visible light from the sun and using the energy to reduce or “fix” CO2 into more complex organic compounds. The original electron (hydrogen) donor for these photosynthetic organisms was probably H2S, yielding elemental sulfur as the byproduct, but at some point cells developed the enzymatic capacity to use H2O as the electron donor in photosynthetic reactions, producing O2. The cyanobacteria are the modern descendants of these early photosynthetic O2 producers.
The atmosphere of the earth in the earliest stages of biological evolution was nearly devoid of O2, and the earliest cells were therefore anaerobic. With the rise of O2-producing photosynthetic cells, the earth’s atmosphere became progressively richer in O2, allowing the evolution of aerobic organisms, which obtained energy by passing electrons from fuel molecules to O2 (that is, by oxidizing organic compounds). Because electron transfers involving O2 yield energy (they are very exergonic; see Chapter 1), aerobic organisms enjoyed an energetic advantage over their anaerobic counterparts when both competed in an environment containing O2. This advantage translated into the predominance of aerobic organisms in O2-rich environments.
Figure 2–5  Landmarks in the evolution of life on earth.
Modern bacteria inhabit almost every ecological niche in the biosphere, and there are bacterial species capable of using virtually every type of organic compound as a source of carbon and energy. Perhaps three-fourths of all the living matter on the earth consists of microscopic organisms, most of them bacteria.
Bacteria play an important role in the biological exchanges of matter and energy. Photosynthetic bacteria in both fresh and marine waters trap solar energy and use it to generate carbohydrates and other cell materials, which are in turn used as food by other forms of life. Some bacteria can capture molecular nitrogen (N2) from the atmosphere and use it to form biologically useful nitrogenous compounds, a process known as nitrogen fixation. Because animals and most plants cannot do this, bacteria form the starting point of many food chains in the biosphere. They also participate as ultimate consumers, degrading the organic structures of dead plants and animals and recycling the end products to the environment.
[Figure 2–5 timeline labels, in millions of years ago, listed from the present (0) back to 4,500: 0; diversification of large eukaryotes; 500; multicellular eukaryotes; 1,500; eukaryotes, aerobic bacteria, development of O2-rich atmosphere; 2,500; photosynthetic O2-producing cyanobacteria, nonphotosynthetic sulfur bacteria; 3,500; anaerobic photosynthetic sulfur bacteria, anaerobic methanogens, formation of oceans and continents; 4,500; formation of the earth.]
Bacterial cells share certain common structural features, but also show group-specific specializations (Fig. 2–6). E. coli is a usually harmless inhabitant of the intestinal tract of human beings and many other mammals. The E. coli cell is about 2 μm long and a little less than 1 μm in diameter. It has a protective outer membrane and an inner plasma membrane that encloses the cytoplasm and the nucleoid. Between the inner and outer membranes is a thin but strong layer of peptidoglycans (sugar polymers cross-linked by amino acids), which gives the cell its shape and rigidity. The plasma membrane and the layers outside it constitute the cell envelope. Differences in the cell envelope account for the different affinities for the dye Gentian violet, which is the basis for Gram’s stain; gram-positive bacteria retain the dye, and gram-negative bacteria do not. The outer membrane of E. coli, like that of other gram-negative eubacteria, is similar to the plasma membrane in structure but is different in composition. In gram-positive bacteria (Bacillus subtilis and Staphylococcus aureus, for example) there is no outer membrane, and the peptidoglycan layer surrounding the plasma membrane is much thicker than that in gram-negative bacteria. The plasma membranes of eubacteria consist of a thin bilayer of lipid molecules penetrated by proteins. Archaebacterial membranes have a similar architecture, although their lipids differ from those of the eubacteria.
[Figure 2–6 diagram labels:
Ribosomes: bacterial ribosomes are smaller than eukaryotic ribosomes, but serve the same function – protein synthesis from an RNA message.
Nucleoid: contains a single, simple, long circular DNA molecule.
Pili: provide points of adhesion to surface of animal cells.
Flagella: propel cell through its surroundings.
Cell envelope: structure varies with different types of bacteria.
Gram-negative bacteria: outer membrane and peptidoglycan layer (labeled layers: outer membrane, peptidoglycan layer, inner membrane).
Gram-positive bacteria: thicker peptidoglycan layer, outer membrane absent (labeled layers: peptidoglycan layer, inner membrane).
Cyanobacteria: type of gram-negative bacteria with tougher peptidoglycan layer and extensive internal membrane system containing photosynthetic pigments.
Archaebacteria: no peptidoglycan layer.]
Figure 2–6  Common structural features of bacterial cells. Because of differences in cell envelope structure, some eubacteria (gram-positive bacteria) retain Gram’s stain, and others (gram-negative bacteria) do not. E. coli is gram-negative. Cyanobacteria are also eubacteria, but are distinguished by their extensive internal membrane system, in which photosynthetic pigments are localized.
The plasma membrane contains proteins capable of transporting certain ions and compounds into the cell and carrying products and waste out. Also in the plasma membrane of most eubacteria are electron-carrying proteins (cytochromes) essential in the formation of ATP from ADP (Chapter 1). In the photosynthetic bacteria, internal membranes derived from the plasma membrane contain chlorophyll and other light-trapping pigments.
From the outer membrane of E. coli cells and some other eubacteria protrude short, hairlike structures called pili, by which cells adhere to the surfaces of other cells. Strains of E. coli and other motile bacteria have one or more long flagella, which can propel the bacterium through its aqueous surroundings. Bacterial flagella are thin, rigid, helical rods, 10 to 20 nm thick. They are attached to a protein structure that spins in the plane of the cell surface, rotating the flagellum.
The cytoplasm of E. coli contains about 15,000 ribosomes, thousands of copies of each of several thousand different enzymes, numerous metabolites and cofactors, and a variety of inorganic ions. Under some conditions, granules of polysaccharides or droplets of lipid accumulate. The nucleoid contains a single, circular molecule of DNA. Although the DNA molecule of an E. coli cell is almost 1,000 times longer than the cell itself, it is packaged with proteins and tightly folded into the nucleoid, which is less than 1 μm in its longest dimension. As in all bacteria, no membrane surrounds the genetic material. In addition to the DNA in the nucleoid, the cytoplasm of most bacteria contains many smaller, circular segments of DNA called plasmids. These nonessential segments of DNA are especially amenable to experimental manipulation and are extremely useful to the molecular geneticist. In nature, some plasmids confer resistance to toxins and antibiotics in the environment.
There is a primitive division of labor within the bacterial cell. The cell envelope regulates the flow of materials into and out of the cell, and protects the cell from noxious environmental agents. The plasma membrane and the cytoplasm contain a variety of enzymes essential to energy metabolism and the synthesis of precursor molecules; the ribosomes manufacture proteins; and the nucleoid stores and transmits genetic information. Most bacteria lead existences that are nearly independent of other cells, but some bacterial species tend to associate in clusters or filaments, and a few (the myxobacteria, for example) demonstrate primitive social behavior. Only eukaryotic cells, however, form true multicellular organisms with a division of labor among cell types.
Fossils older than 1.5 billion years are limited to those from small and relatively simple organisms, similar in size and shape to modern prokaryotes. Starting about 1.5 billion years ago, the fossil record begins to show evidence of larger and more complex organisms, probably the earliest eukaryotic cells (see Fig. 2–5). Details of the evolutionary path from prokaryotes to eukaryotes cannot be deduced from the fossil record alone, but morphological and biochemical comparison of modern organisms has suggested a reasonable sequence of events consistent with the fossil evidence.
[Figure 2–7 diagram labels: membrane-bounded nucleus, plasma membrane, membrane-bounded organelles, ribosomes; protists, fungi, animals, plants; primitive anaerobic eukaryote; mitochondria; chloroplasts; cyanobacteria; heterotrophic anaerobes; other eubacteria; archaebacteria; ancestral prokaryote (nucleoid, ribosomes, plasma membrane); time axis.]
Figure 2–7  One view of how modern plants, animals, fungi, protists, and bacteria share a common evolutionary precursor.
Table 2–1  DNA content and genome complexity

                 Genome size           Relative genome size   Length of DNA
                 (nucleotide pairs)    (E. coli = 1)          (mm)
Viruses
  SV40           5 × 10³               0.00125                0.0017
  T7             4 × 10⁴               0.01                   0.014
  T2             2 × 10⁵               0.05                   0.068
Prokaryotes
  Mycoplasma     3 × 10⁵               0.075                  0.10
  Bacillus       3 × 10⁶               0.75                   1.02
  E. coli        4 × 10⁶               1.00                   1.36
Fungi
  Yeast          2 × 10⁷               5                      6.8
Animals
  Fruit fly      2 × 10⁸               50                     68
  Chicken        2 × 10⁹               500                    680
  Human          5 × 10⁹               1,250                  1,700
Plants
  Peas           9 × 10⁹               2,250                  3,100
  Trillium       1 × 10¹¹              30,000                 34,000
Three major changes must have occurred as prokaryotes gave rise to eukaryotes (Fig. 2–7). First, as cells acquired more DNA (Table 2–1), mechanisms evolved to fold it compactly into discrete complexes with specific proteins and to divide it equally between daughter cells at cell division. These DNA-protein complexes, the chromosomes (Greek chroma, “color”, and soma, “body”), become especially compact at the time of cell division, when they can be visualized with the light microscope as threads of chromatin. Second, as cells became larger, a system of intracellular membranes developed, including a double membrane surrounding the DNA. This membrane segregated the nuclear process of RNA synthesis using a DNA template from the cytoplasmic process of protein synthesis on ribosomes. Finally, primitive eukaryotic cells, which were incapable of photosynthesis or of aerobic metabolism, pooled their assets with those of aerobic bacteria or photosynthetic bacteria to form symbiotic associations that became permanent. Some aerobic bacteria evolved into the mitochondria of modern eukaryotes, and some photosynthetic cyanobacteria became the chloroplasts of modern plant cells. Prokaryotic and eukaryotic cells are compared in Table 2–2.
Source: From Becker, W.M. & Deamer, D.W. (1991) The World of the Cell, 2nd edn, p. 363, The Benjamin/Cummings Publishing Company, Menlo Park, CA.
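The length column of Table 2–1 is consistent with a spacing of about 0.34 nm per nucleotide pair along the double helix, and the relative sizes follow from dividing by the E. coli genome size. The sketch below is an illustrative check under that assumed spacing, using a few of the genome sizes from the table; the function and constant names are hypothetical.

```python
NM_PER_NUCLEOTIDE_PAIR = 0.34   # assumed spacing along the DNA double helix
E_COLI_PAIRS = 4e6              # reference genome size from Table 2-1

# A few genome sizes (nucleotide pairs) taken from Table 2-1.
GENOME_SIZES = {
    "SV40": 5e3,
    "Mycoplasma": 3e5,
    "E. coli": 4e6,
    "Yeast": 2e7,
    "Human": 5e9,
}


def dna_length_mm(nucleotide_pairs: float) -> float:
    """Contour length of the DNA in mm (1 nm = 1e-6 mm)."""
    return nucleotide_pairs * NM_PER_NUCLEOTIDE_PAIR * 1e-6


def relative_genome_size(nucleotide_pairs: float) -> float:
    """Genome size relative to E. coli (E. coli = 1)."""
    return nucleotide_pairs / E_COLI_PAIRS


for name, pairs in GENOME_SIZES.items():
    print(f"{name:10s} relative size {relative_genome_size(pairs):10.5g}"
          f"   length {dna_length_mm(pairs):10.4g} mm")
```

The same two functions can be applied to any other row of the table to compare the tabulated values with this simple linear model.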
Table 2–2  Comparison of prokaryotic and eukaryotic cells

Size
  Prokaryotic cell: Generally small (1–10 μm)
  Eukaryotic cell: Generally large (10–100 μm)
Genome
  Prokaryotic cell: DNA with nonhistone protein; genome in nucleoid, not surrounded by membrane
  Eukaryotic cell: DNA complexed with histone and nonhistone proteins in chromosomes; chromosomes in nucleus with membranous envelope
Cell division
  Prokaryotic cell: Fission or budding; no mitosis
  Eukaryotic cell: Mitosis including mitotic spindle; centrioles in many
Membrane-bounded organelles
  Prokaryotic cell: Absent
  Eukaryotic cell: Mitochondria, chloroplasts (in plants), endoplasmic reticulum, Golgi complexes, lysosomes, etc.
Nutrition
  Prokaryotic cell: Absorption; some photosynthesis
  Eukaryotic cell: Absorption, ingestion; photosynthesis by some
Energy metabolism
  Prokaryotic cell: No mitochondria; oxidative enzymes bound to plasma membrane; great variation in metabolic pattern
  Eukaryotic cell: Oxidative enzymes packaged in mitochondria; more unified pattern of oxidative metabolism
Cytoskeleton
  Prokaryotic cell: None
  Eukaryotic cell: Complex, with microtubules, intermediate filaments, actin filaments
Intracellular movement
  Prokaryotic cell: None
  Eukaryotic cell: Cytoplasmic streaming, endocytosis, phagocytosis, mitosis, axonal transport
Source: Modified from Hickman, C.P., Roberts, L.S., & Hickman, F.M. (1990) Biology of Animals, 5th edn, p. 30, Mosby–Yearbook, Inc. St. Louis, MO.
With the rise of primitive eukaryotic cells, further evolution led to a tremendous diversity of unicellular eukaryotic organisms (protists). Some of these (those with chloroplasts) resembled modern photosynthetic protists such as Euglena and Chlamydomonas; other, nonphotosynthetic protists were more like Paramecium or Dictyostelium. Unicellular eukaryotes are abundant, and the cells of all multicellular animals, plants, and fungi are eukaryotic; there are only a few thousand prokaryotic species, but millions of species of eukaryotic organisms.
Typical eukaryotic cells (Fig. 2–8) are much larger than prokaryotic cells – commonly 10 to 30 μm in diameter, with cell volumes 1,000 to 10,000 times larger than those of bacteria. The distinguishing characteristic of eukaryotes is the nucleus with a complex internal structure, surrounded by a double membrane. The other striking difference between eukaryotes and prokaryotes is that eukaryotes contain a number of other membrane-bounded organelles. The following sections describe the structures and roles of the components of eukaryotic cells in more detail.
Figure 2–8  Schematic illustration of the two types of eukaryotic cell: a representative animal cell (a) and a representative plant cell (b).
[Figure 2–8 diagram labels: ribosomes, peroxisome, cytoskeleton, lysosome, transport vesicle, Golgi complex, centrioles, nuclear envelope, plasma membrane, mitochondrion, rough endoplasmic reticulum, nucleolus, nucleus, smooth endoplasmic reticulum, ribosomes, cytoskeleton, Golgi complex, chloroplast, starch granule, thylakoids, cell wall, cell wall of adjacent cell, vacuole, plasmodesma.]
[Figure 2–9 diagram labels: transporter, nutrient → nutrient; signal receptor, ligands, substrate → product; ion channel, ions → ions; (intracellular signals).]
Figure 2–9  Proteins in the plasma membrane serve as transporters, signal receptors, and ion channels. Extracellular signals are amplified by receptors, because binding of a single ligand molecule to the surface receptor causes many molecules of an intracellular signal molecule to be formed, or many ions to flow through the opened channel. Transporters carry substances into and out of the cell, but do not act as signal amplifiers.
The external surface of a cell is in contact with other cells, the extracellular fluid, and the solutes, nutrient molecules, hormones, neurotransmitters, and antigens in that fluid. The plasma membranes of all cells contain a variety of transporters, proteins that span the width of the membrane and carry nutrients into and waste products out of the cell. Cells also have surface membrane proteins (signal receptors) that present highly specific binding sites for extracellular signaling molecules (receptor ligands). When an external ligand binds to its specific receptor, the receptor protein transduces the signal carried by that ligand into an intracellular message (Fig. 2–9). For example, some surface receptors are associated with ion channels that open when the receptor is occupied; others span the membrane and activate or inhibit cellular enzymes on the inner membrane surface. Whatever the mode of signal transduction, surface receptors characteristically act as signal amplifiers – a single ligand molecule bound to a single receptor may cause the flux of thousands of ions through an opened channel, or the synthesis of thousands of molecules of an intracellular messenger molecule by an activated enzyme.
Some surface receptors recognize ligands of low molecular weight, and others recognize macromolecules. For example, binding of acetylcholine (Mr 146) to its receptor begins a cascade of cellular events that underlie the transmission of signals for muscle contraction. Blood proteins (Mr > 20,000) that carry lipids (lipoproteins) are recognized by specific cell surface receptors and then transported into the cells. Antigens (proteins, viruses, or bacteria, recognized by the immune system as foreign) bind to specific receptors and trigger the production of antibodies. During the development of multicellular organisms, neighboring cells influence each other’s developmental paths, as signal molecules from one cell type react with receptors of other cells. Thus the surface membrane of a cell is a complex mosaic of different kinds of highly specific “molecular antennae” through which cells receive, amplify, and react to external signals.
Most cells of higher plants have a cell wall outside the plasma membrane (Fig. 2–8b), which serves as a rigid, protective shell. The cell wall, composed of cellulose and other carbohydrate polymers, is thick but porous. It allows water and small molecules to pass readily, but swelling of the cell due to the accumulation of water is resisted by the rigidity of the wall.
Endocytosis is a mechanism for transporting components of the surrounding medium deep into the cytoplasm. In this process (Fig. 2–10), a region of the plasma membrane invaginates, enclosing a small volume of extracellular fluid within a bud that pinches off inside the cell by membrane fission. The resulting small vesicle (endosome) can move into the interior of the cell, delivering its contents to another organelle bounded by a single membrane (a lysosome, for example; see p. 34) by fusion of the two membranes. The endosome thus serves as an intracellular extension of the plasma membrane, effectively allowing intimate contact between components of the extracellular medium and regions deep within the cytoplasm, which could not be reached by diffusion alone. Phagocytosis is a special case of endocytosis, in which the material carried into the cell (within a phagosome) is particulate, such as a cell fragment or even another, smaller cell. The inverse of endocytosis is exocytosis (Fig. 2–10), in which a vesicle in the cytoplasm moves to the inside surface of the plasma membrane and fuses with it, releasing the vesicular contents outside the membrane. Many proteins destined for secretion into the extracellular space are released by exocytosis after being packaged into secretory vesicles.
Figure 2–10  The endomembrane system includes the nuclear envelope, endoplasmic reticulum, Golgi complex, and several types of small vesicles. This system encloses a compartment (the lumen) distinct from the cytosol. Contents of the lumen move from one region of the endomembrane system to another as small transport vesicles bud from one component and fuse with another. High-magnification electron micrographs of a sectioned cell show rough endoplasmic reticulum, studded with ribosomes, smooth endoplasmic reticulum, and the Golgi complex.
The endomembrane system is dynamic; newly synthesized proteins move into the lumen of the rough endoplasmic reticulum and thus to the smooth endoplasmic reticulum, then to the Golgi complex via transport vesicles. In the Golgi complex, molecular “addresses” are added to specific proteins to direct them to the cell surface, lysosomes, or secretory vesicles. The contents of secretory vesicles are released from the cell by exocytosis. Endocytosis and phagocytosis bring extracellular materials into the cell. Fusion of endosomes (or phagosomes) with lysosomes, which are full of digestive enzymes, results in the degradation of the extracellular materials.
The small transport vesicles moving to and from the plasma membrane in exocytosis and endocytosis are parts of a dynamic system of intracellular membranes (Fig. 2–10), which includes the endoplasmic reticulum, the Golgi complexes, the nuclear envelope, and a variety of small vesicles such as lysosomes and peroxisomes. Although generally represented as discrete and static elements, these structures are in fact in constant flux, with membrane vesicles continually budding from one of the structures and moving to and merging with another.
The endoplasmic reticulum is a highly convoluted, three-dimensional network of membrane-enclosed spaces extending throughout the cytoplasm and enclosing a subcellular compartment (the lumen of the endoplasmic reticulum) separate from the cytoplasm. The many flattened branches (cisternae) of this compartment are continuous with each other and with the nuclear envelope. In cells specialized for the secretion of proteins into the extracellular space, such as the pancreatic cells that secrete the hormone insulin, the endoplasmic reticulum is particularly prominent. The ribosomes that synthesize proteins destined for export attach to the outer (cytoplasmic) surface of the endoplasmic reticulum, and the secretory proteins are passed through the membrane into the lumen as they are synthesized. Proteins destined for sequestration within lysosomes, or for insertion into the nuclear or plasma membranes, are also synthesized on ribosomes attached to the endoplasmic reticulum. By contrast, proteins that will remain and function within the cytosol are synthesized on cytoplasmic ribosomes unassociated with the endoplasmic reticulum.
The attachment of thousands of ribosomes (usually in regions of large cisternae) gives the rough endoplasmic reticulum its granular appearance (Fig. 2–10) and thus its name. In other regions of the cell, the endoplasmic reticulum is free of ribosomes. This smooth endoplasmic reticulum, which is physically continuous with the rough endoplasmic reticulum, is the site of lipid biosynthesis and of a variety of other important processes, including the metabolism of certain drugs and toxic compounds. Smooth endoplasmic reticulum is generally tubular, in contrast to the long, flattened cisternae typical of rough endoplasmic reticulum. In some tissues (skeletal muscle, for example) the endoplasmic reticulum is specialized for the storage and rapid release of calcium ions. Ca²⁺ release is the trigger for many cellular events, including muscle contraction.
Figure 2–10 labels: nucleus; rough endoplasmic reticulum; proteins synthesized for export; transport vesicle; smooth endoplasmic reticulum; Golgi complex (cis side, trans side); secretory vesicles; exocytosis of secretory products (proteins, polysaccharides, etc.); endocytosis or phagocytosis of bacteria, debris, etc.; phagosome/endosome; lysosome
Nearly all eukaryotic cells have characteristic clusters of membrane vesicles called dictyosomes. Several connected dictyosomes constitute a Golgi complex. A Golgi complex (also called Golgi apparatus) is most commonly seen as a stack of flattened membrane vesicles (cisternae) (Fig. 2–10). Near the ends of these cisternae are numerous, much smaller, spherical vesicles (transport vesicles) that bud off the edges of the cisternae.
The Golgi complex is asymmetric, structurally and functionally. The cis side faces the rough endoplasmic reticulum, and the trans side, the plasma membrane; between these are the medial elements. Proteins, during their synthesis on ribosomes bound to the rough endoplasmic reticulum, are inserted into the interior (lumen) of the cisternae. Small membrane vesicles containing the newly synthesized proteins bud from the endoplasmic reticulum and move to the Golgi complex, fusing with the cis side. As the proteins pass through the Golgi complex to the trans side, enzymes in the complex modify the protein molecules by adding sulfate, carbohydrate, or lipid moieties to side chains of certain amino acids. One of the functions of this modification of a newly synthesized protein is to “address” it to its proper destination as it leaves the Golgi complex in a transport vesicle budding from the trans side. Certain proteins are enclosed in secretory vesicles, eventually to be released from the cell by exocytosis. Others are targeted for intracellular organelles such as lysosomes, or for incorporation into the plasma membrane during cell growth.
Lysosomes, found in the cytoplasm of animal cells, are spherical vesicles bounded by a single membrane. They are usually about 1 μm in diameter, about the size of a small bacterium (Fig. 2–10). Lysosomes contain enzymes capable of digesting proteins, polysaccharides, nucleic acids, and lipids. They function as cellular recycling centers for complex molecules brought into the cell by endocytosis, fragments of foreign cells brought in by phagocytosis, or worn-out organelles from the cell’s own cytoplasm. These materials selectively enter the lysosomes by fusion of the lysosomal membrane with endosomes, phagosomes, or defective organelles, and are then degraded to their simple components (amino acids, monosaccharides, fatty acids, etc.), which are released into the cytosol to be recycled into new cellular components or further catabolized.
The degradative enzymes within lysosomes would be harmful if not confined by the lysosomal membrane; they would be free to act on all cellular components. The lysosomal compartment is more acidic (pH ≤ 5) than the cytoplasm (pH ≈ 7); the acidity is due to the action of an ATP-fueled proton pump in the lysosomal membrane. Lysosomal enzymes are much less active at pH 7 than at pH ≤ 5, which provides a second line of defense against destruction of cytosolic macromolecules, should these enzymes escape into the cytosol.
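The size of this pH difference is easier to appreciate as a ratio of hydrogen ion concentrations. The short sketch below uses only the definition pH = −log10[H+] and the two pH values given above; the function names are illustrative.

```python
def proton_concentration(pH):
    """Hydrogen ion concentration in mol/L, from the definition pH = -log10[H+]."""
    return 10.0 ** (-pH)


def fold_difference(pH_acidic, pH_neutral):
    """How many times more concentrated H+ is in the more acidic compartment."""
    return proton_concentration(pH_acidic) / proton_concentration(pH_neutral)


# Lysosomal lumen (pH 5, the upper limit quoted in the text) versus cytosol (pH 7).
lysosome_vs_cytosol = fold_difference(5.0, 7.0)
print(f"[H+] in lysosome / [H+] in cytosol is about {lysosome_vs_cytosol:.0f}")
```

A difference of two pH units thus corresponds to a hundredfold (or greater) difference in H+ concentration, which the ATP-fueled proton pump must maintain across the lysosomal membrane.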
Figure 2–11  The vacuole of a plant cell contains high concentrations of a variety of stored compounds and waste products. Water enters the vacuole by osmosis and increases the vacuolar volume. The resulting turgor pressure forces the cytoplasm out against the cell wall. The rigidity of the cell wall prevents expansion and rupture of the plasma membrane.
Figure 2–11 labels: cell wall, mitochondrion, vacuole, H2O, turgor pressure, cytosol, tonoplast (vacuole membrane), chloroplast
Plant cells do not have organelles identical to lysosomes, but their vacuoles carry out similar degradative reactions as well as other functions not found in animal cells. Growing plant cells contain several small vacuoles, vesicles bounded by a single membrane, which fuse and become one large vacuole in the center of the mature cell (Fig. 2–11; see also Fig. 2–8b). The surrounding membrane, the tonoplast, regulates the entry into the vacuole of ions, metabolites, and cellular structures destined for degradation. In the mature cell, the vacuole may represent as much as 90% of the total cell volume, pressing the cytoplasm into a thin layer between the tonoplast and the plasma membrane. The liquid within the vacuole, the cell sap, contains digestive enzymes that degrade and recycle macromolecular components no longer useful to the cell. In some plant cells, the vacuole contains high concentrations of pigments (anthocyanins) that give the deep purple and red colors to the flowers of roses and geraniums and the fruits of grapes and plums. Like the contents of lysosomes, the cell sap is generally more acidic than the surrounding cytosol. In addition to its role in storage and degradation of cellular components, the vacuole also provides physical support to the plant cell. Water passes into the vacuole by osmosis because of the high solute concentration of the cell sap, creating outward pressure on the cytosol and the cell wall. This turgor pressure within cells stiffens the plant tissue (Fig. 2–11).
Some of the oxidative reactions in the breakdown of amino acids and fats produce free radicals and hydrogen peroxide (H2O2), very reactive chemical species that could damage cellular machinery. To protect the cell from these destructive byproducts, such reactions are segregated within small membrane-bounded vesicles called peroxisomes. The hydrogen peroxide is degraded by catalase, an enzyme present in large quantities in peroxisomes and glyoxysomes; it catalyzes the reaction 2H2O2 → 2H2O + O2.
Glyoxysomes are specialized peroxisomes found in certain plant cells. They contain high concentrations of the enzymes of the glyoxylate cycle, a metabolic pathway unique to plants that allows the conversion of stored fats into carbohydrates during seed germination. Lysosomes, peroxisomes, and glyoxysomes are sometimes referred to collectively as microbodies.
Figure 2–12 labels: nucleolus (transcription of ribosomal RNA); chromatin (tight complex of DNA and histone proteins); nuclear pores (specific transport of RNA and proteins); paired membranes of nuclear envelope; rough endoplasmic reticulum; ribosomes
The eukaryotic nucleus is very complex in both its structure and its biological activity, compared with the relatively simple nucleoid of prokaryotes. The nucleus contains nearly all of the cell’s DNA, typically 1,000 times more than is present in a bacterial cell; a small amount of DNA is also present in mitochondria and chloroplasts. The nucleus is surrounded by a nuclear envelope, composed of two membranes separated by a narrow space and continuous with the rough endoplasmic reticulum (Fig. 2–12; see also Fig. 2–10). At intervals the two nuclear membranes are pinched together around openings (nuclear pores), which have a diameter of about 90 nm. Associated with the pores are protein structures (nuclear pore complexes), specific macromolecule transporters that allow only certain molecules to pass between the cytoplasm and the aqueous phase of the nucleus (the nucleoplasm), such as enzymes synthesized in the cytoplasm and required in the nucleoplasm for DNA replication, transcription, or repair. Messenger RNA precursors and associated proteins also pass out of the nucleus through the nuclear pore complexes, to be translated on ribosomes in the cytoplasm; the nucleoplasm contains no ribosomes.
Figure 2–12  The nucleus and nuclear envelope. (a) Scanning electron micrograph of the surface of the nuclear envelope, showing numerous nuclear pores. (b) Electron micrograph of the nucleus of the alga Chlamydomonas. The dark body in the center of the nucleus is the nucleolus, and the granular material that fills the rest of the nucleus is chromatin. The nuclear envelope has paired membranes with nuclear pores; two are shown by arrows.
Inside the nucleus is the nucleolus, which appears dense in electron micrographs (Fig. 2–12b) because of its high content of RNA. The nucleolus is a specific region of the nucleus, in which the DNA contains many copies of the genes encoding ribosomal RNA. To produce the large number of ribosomes needed by the cell, these genes are continually copied into RNA (transcribed). The nucleolus is the visible evidence of the transcriptional machinery and the RNA product. Ribosomal RNA produced in the nucleolus passes into the cytoplasm through the nuclear pores. The rest of the nucleus contains chromatin, so called because early microscopists found that it stained brightly with certain dyes. Chromatin consists of DNA and proteins bound tightly to the DNA, and represents the chromosomes, which are decondensed in the interphase (nondividing) nucleus and not individually visible.
Figure 2–13 labels: mitotic chromosome; chromatid (≈600 nm in diameter); chromatin fiber (30 nm in diameter); nucleosomes (10 nm in diameter); histones; DNA
Before division of the cell (cytokinesis), nuclear division (mitosis) occurs. The chromatin condenses into discrete bodies, the chromosomes (Fig. 2–13). Cells of each species have a characteristic number of chromosomes with specific sizes and shapes. The protist Tetrahymena has 4, cabbage has 20, humans have 46, and the plant Ophioglossum has about 1,250! Usually each cell has two copies of each chromosome; such cells are called diploid. Gametes (egg and sperm, for example) produced by meiosis (Chapter 24) have only one copy of each chromosome and are called haploid. During sexual reproduction, two haploid gametes combine to regenerate a diploid cell in which each chromosome pair consists of a maternal and a paternal chromosome.
Chromosomes and chromatin are composed of DNA and a family of positively charged proteins, histones, which associate strongly with DNA by ionic interactions with its many negatively charged phosphate groups. About half of the mass of chromatin is DNA and half is histones. When DNA replicates prior to cell division, large quantities of histones are also synthesized to maintain this 1:1 ratio. The histones and DNA associate in complexes called nucleosomes, in which the DNA strand winds around a core of histone molecules (Fig. 2–13). The DNA of a single human chromosome forms about a million nucleosomes; nucleosomes associate to form very regular and compact supramolecular complexes. The resulting chromatin fibers, about 30 nm in diameter, condense further by forming a series of looped regions, which cluster with adjacent looped regions to form the chromosomes visible during cell division. This tight packing of DNA into nucleosomes achieves a remarkable condensation of the DNA molecules. The DNA in the chromosomes of a single diploid human cell would have a combined length of about 2 m if fully stretched as a DNA double helix, but the combined length of all 46 chromosomes is only about 200 μm.
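The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below converts the 2 m of DNA into base pairs and then into nucleosomes per chromosome. Two values are assumptions not stated in the text: a rise of 0.34 nm per base pair along the double helix, and roughly 200 base pairs of DNA per nucleosome.

```python
RISE_PER_BP_NM = 0.34      # assumption: helical rise per base pair, in nm
BP_PER_NUCLEOSOME = 200    # assumption: approximate DNA per nucleosome repeat
TOTAL_DNA_LENGTH_M = 2.0   # from the text: stretched DNA of one diploid human cell
CHROMOSOMES = 46           # from the text: human diploid chromosome number


def base_pairs_from_length(length_m, rise_nm=RISE_PER_BP_NM):
    """Number of base pairs in a double helix of the given stretched length."""
    return length_m / (rise_nm * 1e-9)


def nucleosomes_per_chromosome(length_m,
                               chromosomes=CHROMOSOMES,
                               bp_per_nucleosome=BP_PER_NUCLEOSOME):
    """Average nucleosome count per chromosome for the given total DNA length."""
    total_nucleosomes = base_pairs_from_length(length_m) / bp_per_nucleosome
    return total_nucleosomes / chromosomes


total_bp = base_pairs_from_length(TOTAL_DNA_LENGTH_M)
per_chromosome = nucleosomes_per_chromosome(TOTAL_DNA_LENGTH_M)
print(f"base pairs per diploid cell: {total_bp:.2e}")
print(f"nucleosomes per chromosome (average): {per_chromosome:.2e}")
```

Under these assumptions the cell contains roughly 6 × 10⁹ base pairs and several hundred thousand nucleosomes per chromosome on average, the same order of magnitude as the "about a million" quoted above.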
Figure 2–13  Chromosomes are visible in the electron microscope during mitosis. Shown here is one of the 46 human chromosomes. Every chromosome is composed of two chromatids, each consisting of tightly folded chromatin fibers. Each chromatin fiber is in turn formed by the packaging of a DNA molecule wrapped about histone proteins to form a series of nucleosomes.
(Adapted from Becker, W.M. & Deamer, D.W. (1991) The World of the Cell, 2nd edn, Fig. 13–20, The Benjamin/Cummings Publishing Company, Menlo Park, CA.)
Figure 2–14 labels: late interphase, centrioles, nuclear envelope → early prophase, plasma membrane → late prophase, mitotic spindle, sister chromatids, nuclear envelope fragmenting, centrosome (pair of centrioles) → metaphase, paired chromatids, spindle fibers attached to daughter chromosomes at centromeres → anaphase, nuclear envelope re-forming → telophase → early interphase
Before the beginning of mitosis, each chromosome is duplicated to form paired, identical chromatids, each of which is a double helix of DNA. During mitosis (Fig. 2–14), the two chromatids move to opposite ends (poles) of the cell, each becoming a new chromosome. Small cylindrical particles called centrioles, composed of the protein tubulin, provide the spatial organization for the migration of chromatids to opposite ends of the dividing cell. To allow the separation of chromatids, the nuclear envelope breaks down, dispersing into membrane vesicles. When the separation of the two sets of chromosomes is complete, a nuclear envelope derived from the endoplasmic reticulum re-forms around each set. Finally, the two halves of the cell are separated by cytokinesis, and each daughter cell has a complete diploid complement of chromosomes. After mitosis is complete the chromosomes decondense to form dispersed chromatin, and the nucleoli, which disappeared early in mitosis, reappear.
Figure 2–14  Mitosis and cell division in animal cells. In the interphase (nondividing) nucleus (a), the chromosomes are in the form of dispersed chromatin. As mitosis begins (b), chromatin condenses into chromosomes and the mitotic spindle begins to form; centrosomes, which typically contain centriole pairs, dictate the orientation of the spindle. The nuclear envelope disintegrates and the nucleolus disappears (c), and the chromosomes align at the center of the cell (d). The chromatids of each chromosome move to opposite poles of the cell, pulled by spindle fibers attached to their centromeres (e), and a nuclear envelope forms around each new set of chromosomes (f). Finally, two daughter cells form by cell division (cytokinesis) (g). Although the same basic process occurs in all eukaryotes, there are differences in details of mitosis in plants, fungi, and protists.
Figure 2–15 labels: DNA, crista, matrix, ribosomes, inner membrane, outer membrane
Mitochondria (singular, mitochondrion) are very conspicuous in the cytoplasm of most eukaryotic cells (Fig. 2–15). These membrane-bounded organelles vary in size, but typically have a diameter of about 1 μm, similar to that of bacterial cells. Mitochondria also vary widely in shape, number, and location, depending on the cell type or tissue function. Most plant and animal cells contain several hundred to a thousand mitochondria. Generally, cells in more metabolically active tissues devote a larger proportion of their volume to mitochondria.
Figure 2–15  Structure of a mitochondrion. This electron micrograph of a mitochondrion shows the smooth outer membrane and the numerous infoldings of the inner membrane, called cristae. (Note the extensive rough endoplasmic reticulum surrounding the mitochondrion.)
Each mitochondrion has two membranes. The outer membrane is unwrinkled and completely surrounds the organelle. The inner membrane has infoldings called cristae, which give it a large surface area. The inner compartment of mitochondria, the matrix, is a very concentrated aqueous solution of many enzymes and chemical intermediates involved in energy-yielding metabolism. Mitochondria contain many enzymes that together catalyze the oxidation of organic nutrients by molecular oxygen (O2); some of these enzymes are in the matrix and some are embedded in the inner membrane. The chemical energy released in mitochondrial oxidations is used to generate ATP, the major energy-carrying molecule of cells. In aerobic cells, mitochondria are the principal producers of ATP, which diffuses to all parts of the cell and provides the energy for cellular work.
Unlike other membranous structures such as lysosomes, Golgi complexes, and the nuclear envelope, mitochondria are produced only by division of previously existing mitochondria; each mitochondrion contains its own DNA, RNA, and ribosomes. Mitochondrial DNA codes for certain proteins specific to the mitochondrial inner membrane, but other mitochondrial proteins are encoded in nuclear DNA. This and other evidence supports the theory that mitochondria are the descendants of aerobic bacteria that lived symbiotically with early eukaryotic cells.
Figure 2–16  A chloroplast in a photosynthetic cell. The thylakoids are flattened membranous sacs that contain chlorophyll, the light-harvesting pigment.
Figure 2–16 labels: outer membrane, inner membrane, DNA, ribosomes, thylakoids
Plastids are specialized organelles in the cytoplasm of plants; they have two surrounding membranes. Most conspicuous of the plastids and characteristically present in all green plant cells and eukaryotic algae are the chloroplasts (Fig. 2–16). Like mitochondria, the chloroplasts may be considered power plants, with the important difference that chloroplasts use solar energy, whereas mitochondria use the chemical energy of oxidizable molecules. Pigment molecules in chloroplasts absorb the energy of light and use it to make ATP and, ultimately, to reduce carbon dioxide to form carbohydrates such as starch and sucrose. The photosynthetic process in eukaryotes and in cyanobacteria produces O2 as a byproduct of the light-capturing reactions. Photosynthetic plant cells contain both chloroplasts and mitochondria. Chloroplasts transduce energy only in the light, but mitochondria function independently of light, oxidizing carbohydrates generated by photosynthesis during daylight hours.
Chloroplasts are generally larger (diameter 5 μm) than mitochondria and occur in many different shapes. Because chloroplasts contain a high concentration of the pigment chlorophyll, photosynthetic cells are usually green, but their color depends on the relative amounts of other pigments present. These pigment molecules, which together can absorb light energy over much of the visible spectrum, are localized in the internal membranes of the chloroplast, which form stacks of closed cisternae known as thylakoids (Fig. 2–16). Like mitochondria, chloroplasts contain DNA, RNA, and ribosomes. Chloroplasts appear to have had their evolutionary origin in symbiotic ancestors of the cyanobacteria.
Figure 2–17 labels: ancestral anaerobe, anaerobic metabolism is inefficient because fuel is not completely oxidized, aerobic bacterium, aerobic metabolism is efficient because fuel is oxidized to CO2 → host cell with aerobic endosymbionts, the endosymbiont and the host cell share materials, to the advantage of both, cyanobacterium, uses the energy of light to synthesize cellular structures and fuels from CO2 and H2O → modern photosynthetic eukaryotic cell, nucleus, chloroplast, mitochondrion, symbiosis allows further specialization of photosynthetic membranes, cell oxidizes fuel efficiently and can obtain energy from sunlight
Several independent lines of evidence suggest that the mitochondria and chloroplasts of modern eukaryotes were derived during evolution from aerobic bacteria and cyanobacteria that took up endosymbiotic residence in early eukaryotic cells (Fig. 2–17; see also Fig. 2–7). Mitochondria are always derived from preexisting mitochondria, and chloroplasts from chloroplasts, by simple fission, just as bacteria multiply by fission. Mitochondria and chloroplasts are in fact semiautonomous; they contain DNA, ribosomes, and the enzymatic machinery to synthesize proteins encoded in their DNA. Sequences in mitochondrial DNA are strikingly similar to sequences in certain aerobic bacteria, and chloroplast DNA shows strong sequence homology with the DNA of certain cyanobacteria. The ribosomes found in mitochondria and chloroplasts are more similar in size, overall structure, and ribosomal RNA sequences to those of bacteria than to those in the cytoplasm of the eukaryotic cell. The enzymes that catalyze protein synthesis in these organelles also resemble those of the bacteria more closely.
Figure 2–17  A plausible theory for the evolutionary origin of mitochondria and chloroplasts. It is based on a number of striking biochemical and genetic similarities between certain aerobic bacteria and mitochondria, and between certain cyanobacteria and chloroplasts. During the evolution of eukaryotic cells, the invading bacteria became symbiotic with the host cell. Ultimately the cytoplasmic bacteria became the mitochondria and chloroplasts of modern cells.
If mitochondria and chloroplasts are the descendants of early bacterial endosymbionts, some of the genes present in the original free-living bacteria must have been transferred into the nuclear DNA of the host eukaryote over the course of evolution. Neither mitochondria nor chloroplasts contain all of the genes necessary to specify all of their proteins. Most of the proteins of both organelles are encoded in nuclear genes, translated on cytoplasmic ribosomes, and subsequently imported into the organelles.
Figure 2–18  The three types of cytoplasmic filaments. The upper panels show epithelial cells photographed after treatment with antibodies that bind to and specifically stain (a) actin filaments bundled together to form “stress fibers”, (b) microtubules radiating from the cell center, and (c) intermediate filaments, extending throughout the cytoplasm. For these experiments, antibodies that specifically recognize actin, tubulin, or intermediate filament proteins are covalently attached to a fluorescent compound. When the cell is viewed with a fluorescence microscope, only the stained structures are visible. The lower panels show each type of filament as visualized by electron microscopy.
Several types of protein filaments visible with the electron microscope crisscross the eukaryotic cell, forming an interlocking three-dimensional meshwork throughout the cytoplasm, the cytoskeleton. There are three general types of cytoplasmic filaments: actin filaments, microtubules, and intermediate filaments (Fig. 2–18). They differ in width (from about 6 to 22 nm), composition, and specific function, but all apparently provide structure and organization to the cytoplasm and shape to the cell. Actin filaments and microtubules also help to produce the motion of organelles or of the whole cell.
Each of the cytoskeletal components is composed of simple protein subunits that polymerize to form filaments of uniform thickness. These filaments are not permanent structures; they undergo constant disassembly into their monomeric subunits and reassembly into filaments. Their locations in cells are not rigidly fixed, but may change dramatically with mitosis, cytokinesis, or changes in cell shape. All types of filaments associate with other proteins that cross-link filaments to themselves or to other filaments, influence assembly or disassembly, or move cytoplasmic organelles along the filaments.
Figure 2–19  Individual subunits of actin polymerize to form actin filaments. The protein filamin holds two filaments together where they cross at right angles. The filaments are cross-linked by another protein, fodrin, to form side-by-side aggregates or bundles.
Figure 2–19 labels: actin subunits, ATP → actin (thin) filaments, 6–7 nm; + fodrin; + filamin
Actin is a protein present in virtually all eukaryotes, from the protists to the vertebrates. In the presence of ATP, the monomeric protein spontaneously associates into linear, helical polymers, 6 to 7 nm in diameter, called actin filaments or microfilaments (Fig. 2–19).
The importance of actin polymerization and depolymerization is clear from the effects of cytochalasins, compounds that bind to actin and block polymerization. Cells treated with a cytochalasin lose actin filaments and their ability to carry out cytokinesis, phagocytosis, and amoeboid movement. However, chromatid separation at mitosis is not affected, ruling out an essential role for actin in this process. Compounds such as cytochalasins, which are naturally occurring poisons or specific toxins, are often very helpful in experimental studies in pinpointing the important participants in a biological process.
Cells contain proteins that bind to actin monomers or filaments and influence the state of actin aggregation (Fig. 2–19). Filamin and fodrin cross-link actin filaments to each other, stabilizing the meshwork and greatly increasing the viscosity of the medium in which the filaments are suspended; a concentrated solution of actin in the presence of filamin is a gel too viscous to pour. Large numbers of actin filaments bound to specific plasma membrane proteins lie just beneath and more or less parallel to the plasma membrane, conferring shape and rigidity on the cell surface.
Figure 2–20  Myosin molecules move along actin filaments using energy from ATP. Cytoplasmic streaming is produced in the giant green alga Nitella as myosin pulls organelles around a track of actin filaments. The chloroplasts of Nitella are located in the layer of stationary cytoplasm that lies between the actin filaments and the cell membrane.
Figure 2–20 labels: actin filament; myosin; ATP → ADP + PO₄³⁻; cytoplasmic vesicle or organelle; actin filaments; streaming cytoplasm with organelles and vesicles; vacuole; chloroplasts in stationary cytoplasm
Actin filaments bind to a family of proteins called myosins, enzymes that use the energy of ATP breakdown to move themselves along the actin filament in one direction. The simplest members of this family, such as myosin I, have a globular head and a short tail (Fig. 2–20). The head binds to and moves along an actin filament, driven by the breakdown of ATP. The tail region binds to the membrane of a cytoplasmic organelle, dragging the organelle behind as the myosin head moves along the actin filament. It appears likely that myosins of this type bind to various organelles, providing specific transport systems to move each type of organelle through the cytoplasm. This motion is readily seen in living cells such as the giant green alga Nitella; endoplasmic reticulum, as well as mitochondria, nucleus, and other membrane-bound organelles and vesicles, move uniformly around the cell at 50 to 75 μm/s in a process called cytoplasmic streaming (Fig. 2–20). This motion has the effect of mixing the cytoplasmic contents of the enormous algal cell much more efficiently than would occur by diffusion alone.
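A rough comparison shows why streaming matters in so large a cell. The sketch below compares the time needed to carry material a given distance at the streaming rate quoted above with the time for diffusion over the same distance, estimated as t ≈ x²/2D for one dimension. The cell length (1 cm) and the diffusion coefficient (10 μm²/s, a plausible order of magnitude for a protein in cytoplasm) are illustrative assumptions, not values from the text.

```python
def streaming_time_s(distance_um, speed_um_per_s):
    """Time to cover a distance by directed transport at constant speed."""
    return distance_um / speed_um_per_s


def diffusion_time_s(distance_um, diffusion_coeff_um2_per_s):
    """Approximate one-dimensional diffusion time, t = x^2 / (2 D)."""
    return distance_um ** 2 / (2.0 * diffusion_coeff_um2_per_s)


CELL_LENGTH_UM = 10_000.0      # assumption: a 1 cm long algal cell
STREAMING_SPEED_UM_S = 50.0    # from the text: lower end of 50 to 75 um/s
DIFFUSION_COEFF_UM2_S = 10.0   # assumption: protein-sized solute in cytoplasm

t_stream = streaming_time_s(CELL_LENGTH_UM, STREAMING_SPEED_UM_S)
t_diffuse = diffusion_time_s(CELL_LENGTH_UM, DIFFUSION_COEFF_UM2_S)

print(f"streaming: {t_stream:.0f} s")
print(f"diffusion: {t_diffuse:.0f} s (about {t_diffuse / 86_400:.0f} days)")
```

Under these assumptions streaming takes minutes where diffusion would take weeks. The exact numbers change with the assumed values, but diffusion time grows with the square of the distance, whereas streaming time grows only in proportion to it.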
A larger form of myosin is found in muscle cells, and also in the cytoplasm of many nonmuscle cells. This type of myosin also has a globular head that binds to and moves along actin filaments in an ATP-driven reaction, but it has a longer tail, which permits myosin molecules to associate side by side to form thick filaments (see Fig. 7–31). Contractile systems composed of actin and myosin occur in a wide variety of organisms, from slime molds to humans. Actin–myosin complexes form the contractile ring that squeezes the cytoplasm in two during cytokinesis in all eukaryotes. In multicellular animals, muscle cells are filled with highly organized arrays of actin (thin) filaments and myosin (thick) filaments, which produce a coordinated contractile force by ATP-driven sliding of actin filaments past stationary myosin filaments.
Figure 2–21  Microtubules are formed from dimers of the proteins α- and β-tubulin. Colchicine blocks the assembly of microtubules, and can be used to arrest mitosis in cells.
Figure 2–21 labels: tubulin subunits (β subunit, α subunit) ⇌ α,β-tubulin dimers ⇌ microtubule; tubulin, α, β; 8 nm; 22 nm; colchicine blocks polymerization
Like actin filaments, microtubules form spontaneously from their monomeric subunits, but the polymeric structure of microtubules is slightly more complex. Dimers of α- and β-tubulin form linear polymers (protofilaments), 13 of which associate side by side to form the hollow microtubule, about 22 nm in diameter (Fig. 2–21). Most microtubules undergo continual polymerization and depolymerization in cells by addition of tubulin subunits primarily at one end and dissociation at the other. Microtubules are present throughout the cytoplasm, but are concentrated in specific regions at certain times. For example, when sister chromatids move to opposite poles of a dividing cell during mitosis, a highly organized array of microtubules (the mitotic spindle; Fig. 2–14) provides the framework and probably the motive force for the separation of chromatids. Colchicine, a poisonous alkaloid from meadow saffron, prevents tubulin polymerization. Colchicine treatment reversibly blocks the movement of chromatids during mitosis, demonstrating that microtubules are required for this process.
Microtubules, like actin filaments, associate with a variety of proteins that move along them, form cross-bridges, or influence their state of polymerization. Kinesin and cytoplasmic dynein, proteins found in the cytoplasm of many cells, bind to microtubules and move along them using the energy of ATP to drive their motion (Fig. 2–22). Each protein is capable of associating with specific organelles and pulling them along the microtubule over long distances at rates of about 1 μm/s. The beating motion of cilia and eukaryotic flagella also involves dynein and microtubules.
Figure 2–22  Kinesin and dynein are ATP-driven molecular engines that move along microtubular “rails”.
Figure 2–22 labels: cytoplasmic vesicle or organelle; kinesin, ATP → ADP + PO₄³⁻; dynein, ATP → ADP + PO₄³⁻; microtubule
Figure 2–23 labels: cilium, plasma membrane, microtubule doublet, radial spoke, dynein arms, 0.1 μm
Cilia and flagella, motile structures extending from the surface of many protists and certain cells of animals and plants, are all constructed on the same microtubule-based architectural plan (Fig. 2–23). (Although they bear the same name, the flagella of bacteria (p. 28) are completely different in structure and in action from the flagella of eukaryotes.) Eukaryotic cilia and flagella, which are sheathed in an extension of the plasma membrane, contain nine fused pairs of microtubules arranged around two central microtubules (the 9 + 2 arrangement; Fig. 2–23). Ciliary and flagellar motion results from the coordinated sliding of outer doublet microtubules relative to their neighbors, driven by ATP. The motions of cilia and flagella propel protists through their surrounding medium, in search of food, or light, or some condition essential to their survival. Sperm are also propelled by flagellar beating. Ciliated cells in tissues such as the trachea and oviduct move extracellular fluids past the surface of the ciliated tissue.
Figure 2–23  Cilia and eukaryotic flagella have the same architecture: nine microtubule doublets surround a central pair of microtubules. Cross section of cilia shows the 9 + 2 arrangement of microtubules.
The contraction of skeletal muscle, the propelling action of cilia and flagella, and the intracellular transport of organelles all rely on the same fundamental mechanism: the splitting of ATP by proteins such as kinesin, myosin, and dynein drives sliding motion along microfilaments or microtubules.
The third type of cytoplasmic filament is a family of structures with dimensions (diameter 8 to 10 nm) intermediate between actin filaments and microtubules. Several different types of monomeric protein subunits form intermediate filaments. Some cells contain large amounts of one type; some types of intermediate filament are absent from certain cells; and some cell types apparently lack intermediate filaments altogether. As is the case for actin filaments and microtubules, intermediate filament formation is reversible, and the cytoplasmic distribution of these structures is subject to regulated changes.
The function of intermediate filaments is probably to provide internal mechanical support for the cell and to position its organelles. Vimentin (Mr 57,000) is the monomeric subunit of the intermediate filaments found in the endothelial cells that line blood vessels, and in adipocytes (fat cells). Vimentin fibers appear to anchor the nucleus and fat droplets in specific cellular locations. Intermediate filaments composed of desmin (Mr 55,000) hold the Z disks of striated muscle tissue in place. Neurofilaments are constructed of three different protein subunits (Mr 70,000, 150,000, and 210,000), and provide rigidity to the long extensions (axons) of neurons. In the glial cells that surround neurons, intermediate filaments are constructed from glial fibrillary acidic protein (Mr 50,000).
The intermediate filaments composed of keratins, a family of structural proteins, are particularly prominent in certain epidermal cells of vertebrates, and form covalently cross-linked meshworks that persist even after the cell dies. Hair, fingernails, and feathers are among the structures composed primarily of keratins.
The picture that emerges from this brief survey is of a eukaryotic cell with a cytoplasm crisscrossed by a meshwork of structural fibers, throughout which extends a complex system of membrane-bounded compartments (see Fig. 2–8). Both the filaments and the organelles are dynamic: the filaments disassemble and reassemble elsewhere; membranous vesicles bud from one organelle, move to and join another. Transport vesicles, mitochondria, chloroplasts, and other organelles move through the cytoplasm along protein filaments, drawn by kinesin, cytoplasmic dynein, myosin, and perhaps other similar proteins. Exocytosis and endocytosis provide paths between the cell interior and the surrounding medium, allowing for the secretion of proteins and other components produced within the cell and the uptake of extracellular components. The intracellular membrane systems segregate specific metabolic processes, and provide surfaces on which certain enzyme-catalyzed reactions occur.
Although complex, this organization of the cytoplasm is far from random. The motion and positioning of organelles and cytoskeletal elements are under tight regulation, and at certain stages in a eukaryotic cell’s life, dramatic, finely orchestrated reorganizations occur, such as spindle formation, chromatid migration to the poles, and nuclear envelope disintegration and re-formation during mitosis. The interactions between the cytoskeleton and organelles are noncovalent, reversible, and subject to regulation in response to various intracellular and extracellular signals. Cytoskeletal rearrangements are modulated by Ca²⁺ and by a variety of proteins.
Labels (Fig. 2–24): (a) Differential centrifugation. Tissue homogenization yields a tissue homogenate. Low-speed centrifugation (1,000 g, 10 min): pellet contains whole cells, nuclei, cytoskeletons, plasma membranes. Supernatant subjected to medium-speed centrifugation (20,000 g, 20 min): pellet contains mitochondria, lysosomes, peroxisomes. Supernatant subjected to high-speed centrifugation (80,000 g, 1 h): pellet contains microsomes, small vesicles. Supernatant subjected to very high-speed centrifugation (150,000 g, 3 h): pellet contains ribosomes, viruses, large macromolecules; supernatant contains soluble proteins.
(b) Isopycnic (sucrose-density) centrifugation. The sample is centrifuged in a sucrose gradient, which separates a less dense component from a more dense component; the gradient is then fractionated.
A major advance in the biochemical study of cells was the development of methods for separating organelles from the cytosol and from each other. In a typical cellular fractionation, cells or tissues are disrupted by gentle homogenization in a medium containing sucrose (about 0.2 M). This treatment ruptures the plasma membrane but leaves most of the organelles intact. (The sucrose creates a medium with an osmotic pressure similar to that within organelles; this prevents diffusion of water into the organelles, which would cause them to swell, burst, and spill their contents.)
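To put a rough number on the osmotic pressure of such a medium, the sketch below applies the ideal-solution (van 't Hoff) relation, osmotic pressure = cRT, to 0.2 M sucrose. This is only an estimate: it assumes ideal behavior and a temperature of 25 °C, neither of which is stated in the text, and the function name is hypothetical.

```python
R_L_ATM_PER_MOL_K = 0.08206  # gas constant in L·atm/(mol·K)


def osmotic_pressure_atm(molarity: float, temperature_k: float = 298.15) -> float:
    """Ideal (van 't Hoff) osmotic pressure of a nondissociating solute, in atm."""
    return molarity * R_L_ATM_PER_MOL_K * temperature_k


# 0.2 M sucrose at an assumed 25 °C (298.15 K)
print(f"{osmotic_pressure_atm(0.2):.1f} atm")
```

Sucrose does not ionize, so each mole of sucrose contributes one mole of osmotically active particles; a salt at the same molarity would contribute more.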
Figure 2–24  A tissue such as liver is mechanically homogenized to break cells and disperse their contents in an aqueous buffer. The large and small particles in this suspension can be separated by centrifugation at different speeds (a), or particles of different density can be separated by isopycnic centrifugation (b). In isopycnic centrifugation, a centrifuge tube is filled with a solution, the density of which increases from top to bottom; some solute such as sucrose is dissolved at different concentrations to produce this density gradient. When a mixture of organelles is layered on top of the density gradient and the tube is centrifuged at high speed, individual organelles sediment until their buoyant density exactly matches that in the gradient. Each layer can be collected separately.
Organelles such as nuclei, mitochondria, and lysosomes differ in size and therefore sediment at different rates during centrifugation. They also differ in specific gravity, and they “float” at different levels in a density gradient (Fig. 2–24). Differential centrifugation results in a rough fractionation of the cytoplasmic contents, which may be further purified by isopycnic centrifugation. In this procedure, organelles of different buoyant densities (the result of different ratios of lipid and protein in each type of organelle) are separated on a density gradient. By carefully removing material from each region of the gradient and observing it with a microscope, the biochemist can establish the position of each organelle and obtain purified organelles for further study. In this way it was established, for example, that lysosomes contain degradative enzymes, mitochondria contain oxidative enzymes, and chloroplasts contain photosynthetic pigments. The isolation of an organelle enriched in a certain enzyme is often the first step in the purification of that enzyme.
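The differential centrifugation scheme of Fig. 2–24a can be written down as a small lookup table. The sketch below encodes the four spins and their pellet contents exactly as labeled in the figure; the class and function names are hypothetical, and real preparations are less clean than this idealized assignment suggests.

```python
from typing import NamedTuple, Optional, Tuple


class Spin(NamedTuple):
    name: str
    g_force: int          # relative centrifugal force, in multiples of g
    minutes: int          # duration of the spin
    pellet: Tuple[str, ...]


# Values taken from the labels of Fig. 2-24a.
SCHEME = [
    Spin("low-speed", 1_000, 10,
         ("whole cells", "nuclei", "cytoskeletons", "plasma membranes")),
    Spin("medium-speed", 20_000, 20,
         ("mitochondria", "lysosomes", "peroxisomes")),
    Spin("high-speed", 80_000, 60,
         ("microsomes", "small vesicles")),
    Spin("very high-speed", 150_000, 180,
         ("ribosomes", "viruses", "large macromolecules")),
]


def pelleting_step(component: str) -> Optional[Spin]:
    """Return the first spin whose pellet lists the component.

    None means the component stays in the final supernatant.
    """
    for spin in SCHEME:
        if component in spin.pellet:
            return spin
    return None


for item in ("mitochondria", "ribosomes", "soluble proteins"):
    step = pelleting_step(item)
    where = f"{step.name} pellet ({step.g_force} g)" if step else "final supernatant"
    print(f"{item}: {where}")
```

Each spin acts on the supernatant of the one before it, which is why the list order matters: larger particles are removed first, and only what remains in suspension is carried to the next, faster spin.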
One of the most effective approaches to understanding a biological process is to study purified individual molecules such as enzymes, nucleic acids, or structural proteins. The purified components are amenable to detailed characterization in vitro; their physical properties and catalytic activities can be studied without “interference” from other molecules present in the intact cell. Although this approach has been remarkably revealing, it must always be remembered that the inside of a cell is quite different from the inside of a test tube. The “interfering” components eliminated by purification may be critical to the biological function or regulation of the molecule purified. In vitro studies of pure enzymes are commonly done at very low enzyme concentrations in thoroughly stirred aqueous solutions. In the cell, an enzyme is dissolved or suspended in a gel-like cytosol with thousands of other proteins, some of which bind to that enzyme and influence its activity. Within cells, some enzymes are parts of multienzyme complexes in which reactants are channeled from one enzyme to another without ever entering the bulk solvent. Diffusion is hindered in the gel-like cytosol, and the cytosolic composition varies in different regions of the cell. In short, a given molecule may function somewhat differently within the cell than it does in vitro. One of the central challenges of biochemistry is to understand the influences of cellular organization and macromolecular associations on the function of individual enzymes – to understand function in vivo as well as in vitro.
All modern unicellular eukaryotes – the protists – contain the organelles and mechanisms that we have described, indicating that these organelles and mechanisms must have evolved relatively early. The protists are extraordinarily versatile. The ciliated protist Paramecium, for example, moves rapidly through its aqueous surroundings by beating its cilia; senses mechanical, chemical, and thermal stimuli from its environment, and responds by changing its path; finds, engulfs, and digests a variety of food organisms, and excretes the indigestible fragments; eliminates excess water that leaks through its membrane; and finds and mates with sexual partners. Nonetheless, being unicellular has its disadvantages. Paramecia probably live out their lives in a very small region of the pond in which they began life, because their motility is limited by the small thrust of their microscopic cilia, and their ability to detect a better environment at a distance is limited by the short range of their sensory apparatus.
Figure 2–25  A gallery of differentiated cells. (a) Secretory cells of the pancreas, with an extensive endoplasmic reticulum. (b) Portion of a skeletal muscle cell, with organized actin and myosin filaments. (c) Collenchyma cells of a plant stem. (d) Rabbit sperm cells, with long flagella for motility. (e) Human erythrocyte. (f) Human embryo at the two-celled stage.
At some later stage of evolution, unicellular organisms found it advantageous to cluster together, thereby acquiring greater motility, efficiency, or reproductive success than their free-living single-celled competitors. Further evolution of such clustered organisms led to permanent associations among individual cells and eventually to specialization within the colony – to cellular differentiation.
The advantages of cellular specialization led to the evolution of ever more complex and highly differentiated organisms, in which some cells carried out the sensory functions, others the digestive, photosynthetic, or reproductive functions. Many modern multicellular organisms contain hundreds of different cell types, each specialized for some function that supports the entire organism. Fundamental mechanisms that evolved early have been further refined and embellished through evolution. The simple mechanism responsible for the motion of myosin along actin filaments in slime molds has been conserved and elaborated in vertebrate muscle cells, which are literally filled with actin, myosin, and associated proteins that regulate muscle contraction. The same basic structure and mechanism that underlie the beating motion of cilia in Paramecium and flagella in Chlamydomonas are employed by the highly differentiated vertebrate sperm cell. Figure 2–25 illustrates the range of cellular specializations encountered in multicellular organisms.
The individual cells of a multicellular organism remain delimited by their plasma membranes, but they have developed specialized surface structures for attachment to and communication with each other (Fig. 2–26). At tight junctions, the plasma membranes of adjacent cells are closely apposed, with no extracellular fluid separating them. Desmosomes (occurring only in animal cells) hold two cells together; the small extracellular space between them is filled with fibrous, presumably adhesive, material. Gap junctions provide small, reinforced openings between adjacent cells, through which electric currents, ions, and small molecules can pass. In higher plants, plasmodesmata form channels resembling gap junctions; they provide a path through the cell wall for the movement of small molecules between adjacent cells. Each of these junctions is reinforced by membrane proteins or cytoskeletal filaments. The type of junction(s) between neighboring cells varies from tissue to tissue.
Figure 2–26  Three types of junctions between cells. (a) Tight junctions produce a seal between adjacent cells. (b) Desmosomes, typical of animal cells, weld adjacent cells together and are reinforced by various cytoskeletal elements. (c) Gap junctions allow ions and electric currents to flow between adjacent cells.
Labels (Fig. 2–26): cell 1; cell 2; plasma membrane; cytoplasm; tight junction; cytoskeletal filaments; glycoprotein filaments; desmosome; extracellular space; plasma membranes of two adjacent cells; gap junction.
Viruses are supramolecular complexes that can replicate themselves in appropriate host cells. They consist of a nucleic acid (DNA or RNA) molecule surrounded by a protective shell, or capsid, made up of protein molecules and, in some cases, a membranous envelope. Viruses exist in two states. Outside the host cells that formed them, viruses are simply nonliving particles called virions, which are regular in size, shape, and composition and can be crystallized. Once a virus or its nucleic acid component gains entry into a specific host cell, it becomes an intracellular parasite. The viral nucleic acid carries the genetic message specifying the structure of the intact virion. It diverts the host cell’s enzymes and ribosomes from their normal cellular roles to the manufacture of many new daughter viral particles. As a result, hundreds of progeny viruses may arise from the single virion that infected the host cell (Fig. 2–27). In some host–virus systems, the progeny virions escape through the host cell’s plasma membrane. Other viruses cause cell lysis (membrane breakdown and host cell death) as they are released.
Labels (Fig. 2–27): A bacterial virus (bacteriophage) injects its DNA through the cell envelope; an animal virus enters the host cell by endocytosis. In both cases: viral genome → replication → transcription → translation → assembly and packaging. Exit is by breakdown of the cell envelope or by outward budding.
Figure 2–27  Infection of a bacterial cell by a bacteriophage (left), and of an animal cell by a virus (right) results in the formation of many copies of the infecting virus.
A different type of response results from some viral infections, in which viral DNA becomes integrated into the host’s chromosome and is replicated with the host’s own genes. Integrated viral genes may have little or no effect on the host’s survival, but they often cause profound changes in the host cell’s appearance and activity.
Hundreds of different viruses are known, each more or less specific for a host cell (Table 2–3), which may be an animal, plant, or bacterial cell. Viruses specific for bacteria are known as bacteriophages, or simply phages (Greek phagein, “to eat”). Some viruses contain only one kind of protein in their capsid – the tobacco mosaic virus, for example, a simple plant virus and the first to be crystallized. Other viruses contain dozens or hundreds of different kinds of proteins. Even some of these large and complex viruses have been crystallized, and their detailed molecular structures are known (Fig. 2–28). Viruses differ greatly in size. Bacteriophage ΦX174, one of the smallest, has a diameter of 18 nm. Vaccinia virus is one of the largest; its virions are almost as large as the smallest bacteria. Viruses also differ in shape and complexity of structure. The human immunodeficiency virus (HIV) (Fig. 2–29) is relatively simple in structure, but devastating in action; it causes AIDS.
Table 2–3 summarizes the type and size of the nucleic acid components of a number of viruses. Some viruses are highly pathogenic in humans; for example, those causing poliomyelitis, influenza, herpes, hepatitis, AIDS, the common cold, infectious mononucleosis, shingles, and certain types of cancer.
Biochemistry has profited enormously from the study of viruses, which has provided new information about the structure of the genome, the enzymatic mechanisms of nucleic acid synthesis, and the regulation of the flow of genetic information.
Figure 2–28  The structures of several viruses, viewed with the electron microscope. Turnip yellow mosaic virus (small, spherical particles), tobacco mosaic virus (long cylinders), and bacteriophage T4 (shaped like a hand mirror).
Table 2–3  Some well-studied animal viruses

Virus                          Known hosts              Genomic material   Genome size (kilobases)*
Adenoviruses                   Vertebrates              DNA                36
SV40                           Primates                 DNA                5
Herpes                         Vertebrates              DNA                150
Vaccinia                       Vertebrates              DNA                200
Parvoviruses                   Vertebrates              DNA                1–2
Retroviruses                   Vertebrates and (?)      RNA/DNA            5–8
Reoviruses                     Vertebrates              RNA                1.2–4.0†
Influenza                      Mammals                  RNA                1.0–3.3†
Vesicular stomatitis           Vertebrates              RNA                12
Sindbis                        Insects and vertebrates  RNA                10
Poliomyelitis                  Primates                 RNA                7
Human immunodeficiency (HIV)   Primates                 RNA                9.7

* Size is given in kilobases (1 kilobase = 1,000 nucleotides) for single-stranded nucleic acids, or kilobase pairs for double-stranded molecules.
† Reoviruses have ten double-stranded RNA segments, and influenza has eight single-stranded RNA segments; the length of each segment is in the range indicated.
Figure 2–29  Human immunodeficiency viruses (HIV), the causative agent of AIDS, leaving an infected T lymphocyte of the immune system.
Source: From Darnell, J., Lodish, H., & Baltimore, D. (1990) Molecular Cell Biology, 2nd edn, p. 183, Scientific American Books, Inc., New York.
Cells, the structural and functional units of living organisms, are of microscopic dimensions. Their small size, combined with convolutions of their surfaces, results in high surface-to-volume ratios, facilitating the diffusion of fuels, nutrients, and waste products between the cell and its surroundings. All cells share certain features: DNA containing the genetic information, ribosomes, and a plasma membrane that surrounds the cytoplasm. In eukaryotes the genetic material is surrounded by a nuclear envelope; prokaryotes have no such membrane.
The plasma membrane is a tough, flexible permeability barrier, which contains numerous transporters as well as receptors for a variety of extracellular signals. The cytoplasm consists of the cytosol and organelles. The cytosol is a concentrated solution of proteins, RNA, metabolic intermediates and cofactors, and inorganic ions, in which are suspended various particles. Ribosomes are supramolecular complexes on which protein synthesis occurs; bacterial ribosomes are slightly smaller than those of eukaryotic cells, but are similar in structure and function.
Certain organisms, tissues, and cells offer advantages for biochemical studies. E. coli and yeast can be cultured in large quantities, have short generation times, and are especially amenable to genetic manipulation. The specialized functions of liver, muscle, and fat tissue, and of erythrocytes, make them attractive for the study of specific processes.
The first living cells were prokaryotic and anaerobic; they probably arose about 3.5 billion years ago, when the atmosphere was devoid of oxygen. With the passage of time, biological evolution led to cells capable of photosynthesis, with O₂ as a byproduct. As O₂ accumulated, prokaryotic cells capable of the aerobic oxidation of fuels evolved. The two major groups of bacteria, eubacteria and archaebacteria, diverged early in evolution. The cell envelope of some types of bacteria includes layers outside the plasma membrane that provide rigidity or protection. Some bacteria have flagella for propulsion. The cytoplasm of bacteria contains no membrane-bounded organelles but does contain ribosomes and granules of nutrients, as well as a nucleoid which contains the cell’s DNA. Some photosynthetic bacteria have extensive intracellular membranes that contain light-capturing pigments.
About 1.5 billion years ago, eukaryotic cells emerged. They were larger than bacteria, and their genetic material was more complex. These early cells established symbiotic relationships with prokaryotes that lived in their cytoplasm; modern mitochondria and chloroplasts are derived from these early endosymbionts. Mitochondria and chloroplasts are intracellular organelles surrounded by a double membrane. They are the principal sites of ATP synthesis in eukaryotic, aerobic cells. Chloroplasts are found only in photosynthetic organisms, but mitochondria are ubiquitous among eukaryotes.
Modern eukaryotic cells have a complex system of intracellular membranes. This endomembrane system consists of the nuclear envelope, rough and smooth endoplasmic reticulum, the Golgi complex, transport vesicles, lysosomes, and endosomes. Proteins synthesized on ribosomes bound to the rough endoplasmic reticulum pass into the endomembrane system, traveling through the Golgi complex on their way to organelles or to the cell surface, where they are secreted by exocytosis. Endocytosis brings extracellular materials into the cell, where they can be digested by degradative enzymes in the lysosomes. In plants, the central vacuole is the site of degradative processes; it also serves as a storage depot for a variety of side products of metabolism and maintains cell turgor.
The genetic material in eukaryotic cells is organized into chromosomes, highly ordered complexes of DNA and histone proteins. Before cell division (cytokinesis), each chromosome is replicated, and the duplicate chromosomes are separated by the process of mitosis.
The cytoskeleton is an intracellular meshwork of actin filaments, microtubules, and intermediate filaments of several types. The cytoskeleton confers shape on the cell, and reorganization of cytoskeletal filaments results in the shape changes accompanying amoeboid movement and cell division. Intracellular organelles move along filaments of the cytoskeleton, propelled by proteins such as dynein, kinesin, and myosin, using the energy of ATP. Dynein and tubulin are central to the motion and structure of cilia and flagella, and myosin and actin are responsible for the contractile motion of skeletal muscle. The organelles can be separated by differential centrifugation and by isopycnic centrifugation.
In multicellular organisms, there is a division of labor among several types of cells. Individual cells are joined to each other by tight junctions, gap junctions, or desmosomes (in animals), or by plasmodesmata (in plants). Viruses are parasites of living cells, capable of subverting the cellular machinery for their own replication.
Further Reading
Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., & Watson, J.D. (1989) Molecular Biology of the Cell, 2nd edn, Garland Publishing, Inc., New York. 
A superb textbook on cell structure and function, covering the topics considered in this chapter, and a useful reference for many of the following chapters.
Becker, W.M. & Deamer, D.W. (1991) The World of the Cell, 2nd edn, The Benjamin/Cummings Publishing Company, Redwood City, CA. 
An excellent introductory textbook of cell biology.
Curtis, H. & Barnes, N.S. (1989) Biology, 5th edn, Worth Publishers, Inc., New York. 
A beautifully written and illustrated general biology textbook.
Darnell, J., Lodish, H., & Baltimore, D. (1990) Molecular Cell Biology, 2nd edn, Scientific American Books, Inc., New York. 
Like the book by Alberts and coauthors, a superb text useful for this and later chapters.
Prescott, D.M. (1988) Cells, Jones and Bartlett Publishers, Boston, MA. 
A short, well-illustrated introductory textbook on cell structure and function, with emphasis on structure.
Evolution of Cells
Evolution of Catalytic Function. (1987) Cold Spring Harb. Symp. Quant. Biol. 52. 
A collection of excellent papers on many aspects of molecular and cellular evolution.
Knoll, A.H. (1991) End of the Proterozoic eon. Sci. Am. 265 (October), 64–73.
Discussion of the evidence that an increase in atmospheric oxygen led to the development of multi-cellular organisms, including large animals.
Margulis, L. (1992) Symbiosis in Cell Evolution. Microbial Evolution in the Archean and Proterozoic Eons, 2nd edn, W.H. Freeman and Company, New York.
Clear discussion of the hypothesis that mitochondria and chloroplasts are descendants of bacteria that became symbiotic with primitive eukaryotic cells.
Schopf, J.W. (1978) The evolution of the earliest cells. Sci. Am. 239 (September), 110–139. 
Vidal, G. (1984) The oldest eukaryotic cell. Sci. Am. 250 (February), 48–57. 
Structure of Cells and Organelles
Bloom, W. & Fawcett, D.W. (1986) A Textbook of Histology, 11th edn, W.B. Saunders Company, Philadelphia, PA. 
A standard textbook, containing detailed descriptions of the structures of animal cells, tissues, and organs.
de Duve, C. (1984) A Guided Tour of the Living Cell, Scientific American Books, Inc., New York. 
An easy-to-read, well-illustrated description of the structure and functions of the organelles of the eukaryotic cell.
Margulis, L. & Schwartz, K.V. (1987) Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth, 2nd edn, W.H. Freeman and Company, New York. 
Description of unicellular and multicellular organisms, beautifully illustrated with electron micrographs and drawings showing the diversity of structure and function.
Rothman, J.E. (1985) The compartmental organization of the Golgi apparatus. Sci. Am. 253 (September), 74–89. 
Gelfand, V. & Bershadsky, A.D. (1991) Microtubule dynamics: mechanism, regulation, and function. Annu. Rev. Cell Biol. 7, 93–116.
Organization of the Cytoplasm. (1981) Cold Spring Harb. Symp. Quant. Biol. 46. 
More than 90 excellent papers on microtubules, microfilaments, and intermediate filaments and their biological roles.
Schroer, T.A. & Sheetz, M.P. (1991) Functions of microtubule-based motors. Annu. Rev. Physiol. 53, 629–652. 
Steinert, P.M. & Parry, D.A.D. (1985) Intermediate filaments: conformity and diversity of expression and structure. Annu. Rev. Cell Biol. 1, 41–65. 
Stossel, T.P. (1989) From signal to pseudopod: how cells control cytoplasmic actin assembly. J. Biol. Chem. 264, 18261–18264. 
Vale, R.D. (1990) Microtubule-based motor proteins. Curr. Opinion Cell Biol. 2, 15–22. 
Vallee, R.B. & Shpetner, H.S. (1990) Motor proteins of cytoplasmic microtubules. Annu. Rev. Biochem. 59, 909–932. 
Some problems on the contents of Chapter 2 follow. They involve simple geometrical and numerical relationships concerning cell structure and activities. (For your reference in solving these problems, please see the tables printed on the inside of the back cover.) Each problem has a title for easy reference and discussion.
1. The Size of Cells and Their Components  Given their approximate diameters, calculate the approximate number of (a) hepatocytes (diameter 20 μm), (b) mitochondria (1.5 μm), and (c) actin molecules (3.6 nm) that can be placed in a single layer on the head of a pin (diameter 0.5 mm). Assume each structure is spherical. The area of a circle is πr², where π = 3.14.
2. Number of Solute Molecules in the Smallest Known Cells  Mycoplasmas are the smallest known cells. They are spherical and have a diameter of about 0.33 μm. Because of their small size they readily pass through filters designed to trap larger bacteria. One species, Mycoplasma pneumoniae, is the causative organism of the disease primary atypical pneumonia.
       (a) D-Glucose is the major energy-yielding nutrient of mycoplasma cells. Its concentration within such cells is about 1.0 mM. Calculate the number of glucose molecules in a single mycoplasma cell. Avogadro’s number, the number of molecules in 1 mol of a nonionized substance, is 6.02 × 10²³. The volume of a sphere is 4πr³/3.
       (b) The first enzyme required for the energy-yielding metabolism of glucose is hexokinase (Mr 100,000). Given that the intracellular fluid of mycoplasma cells contains 10 g of hexokinase per liter, calculate the molar concentration of hexokinase.
3. Components of E. coli  E. coli cells are rod-shaped, about 2 μm long and 0.8 μm in diameter. The volume of a cylinder is πr²h, where h is the height of the cylinder.
       (a) If the average density of E. coli (mostly water) is 1.1 × 10³ g/L, what is the weight of a single cell?
       (b) The protective cell wall of E. coli is 10 nm thick. What percentage of the total volume of the bacterium does the wall occupy?
       (c) E. coli is capable of growing and multiplying rapidly because of the inclusion of some 15,000 spherical ribosomes (diameter 18 nm) in each cell, which carry out protein synthesis. What percentage of the total cell volume do the ribosomes occupy?
4. Genetic Information in E. coli DNA  The genetic information contained in DNA consists of a linear sequence of successive code words, known as codons. Each codon is a specific sequence of three nucleotides (three nucleotide pairs in double-stranded DNA), and each codon codes for a single amino acid unit in a protein. The molecular weight of an E. coli DNA molecule is about 2.5 × 10⁹. The average molecular weight of a nucleotide pair is 660, and each nucleotide pair contributes 0.34 nm to the length of DNA.
       (a) Calculate the length of an E. coli DNA molecule. Compare the length of the DNA molecule with the actual cell dimensions. How does the DNA molecule fit into the cell?
       (b) Assume that the average protein in E. coli consists of a chain of 400 amino acids. What is the maximum number of proteins that can be coded by an E. coli DNA molecule?
5. The High Rate of Bacterial Metabolism  Bacterial cells have a much higher rate of metabolism than animal cells. Under ideal conditions some bacteria will double in size and divide in 20 min, whereas most animal cells require 24 h. The high rate of bacterial metabolism requires a high ratio of surface area to cell volume.
       (a) Why would the surface-to-volume ratio have an effect on the maximum rate of metabolism?
       (b) Calculate the surface-to-volume ratio for the spherical bacterium Neisseria gonorrhoeae (diameter 0.5 μm), responsible for the disease gonorrhea. Compare it with the surface-to-volume ratio for a globular amoeba, a large eukaryotic cell of diameter 150 μm. The surface area of a sphere is 4πr².
6. A Strategy to Increase the Surface Area of Cells  Certain cells whose function is to absorb nutrients, e.g., the cells lining the small intestine or the root hair cells of a plant, are optimally adapted to their role because their exposed surface area is increased by microvilli. Consider a spherical epithelial cell (diameter 20 μm) lining the small intestine. Since only a part of the cell surface faces the interior of the intestine, assume that a “patch” corresponding to 25% of the cell area is covered with microvilli. Furthermore, assume that the microvilli are cylinders 0.1 μm in diameter, 1.0 μm long, and spaced in a regular grid 0.2 μm on center.
       (a) Calculate the number of microvilli on the patch.
       (b) Calculate the surface area of the patch, assuming it has no microvilli.
       (c) Calculate the surface area of the patch, assuming it does have microvilli.
       (d) What percentage improvement of the absorptive capacity (reflected by the surface-to-volume ratio) does the presence of microvilli provide?
7. Fast Axonal Transport  Some neurons have long, thin extensions (axons) as long as 2 m. Small membrane vesicles carrying materials essential to axonal function move along microtubules from the cell body to the tip of the axon by kinesin-dependent “fast axonal transport”. If the average velocity of a vesicle is 1 μm/s, how long does it take a vesicle to move the 2 m from cell body to axonal tip? What are the possible advantages of this ATP-dependent process over simple diffusion to move materials to the axonal tip?
8. Toxic Effects of Phalloidin  Phalloidin is a toxin produced by the mushroom Amanita phalloides. It binds specifically to actin microfilaments and blocks their disassembly. Cytochalasin B is another toxin, which blocks microfilament assembly from actin monomers (see p. 42).
       (a) Predict the effect of phalloidin on cytokinesis, phagocytosis, and amoeboid movement, given the effects of cytochalasins on these processes.
       (b) A specific antibody (a protein of Mr ≈ 150,000) binds actin tightly and is found to block microfilament assembly in vitro (in the test tube). Would you expect this antibody to mimic the effects of cytochalasin in vivo (in living cells)?
9. Osmotic Breakage of Organelles  In the isolation of cytosolic enzymes, cells are often broken in the presence of 0.2 M sucrose to prevent osmotic swelling and bursting of the intracellular organelles. If the desired enzymes are in the cytosol, why is it necessary to be concerned about possible damage to particulate organelles?
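The arithmetic in Problems 5 through 7 can be checked with a short script. The following Python sketch is one possible setup, not a model answer. The function names are hypothetical, the cells are treated as perfect spheres, and each microvillus is assumed to add only its cylindrical side wall to the membrane area (its tip is taken to replace the patch of membrane on which it stands).

```python
import math

def sphere_surface_to_volume(diameter_um):
    """Surface-to-volume ratio (per micrometer) of a sphere; equals 3/r."""
    r = diameter_um / 2.0
    surface = 4.0 * math.pi * r ** 2
    volume = (4.0 / 3.0) * math.pi * r ** 3
    return surface / volume

def microvilli_patch(cell_diameter_um=20.0, patch_fraction=0.25,
                     villus_diameter_um=0.1, villus_length_um=1.0,
                     spacing_um=0.2):
    """Return (number of microvilli, bare patch area, patch area with microvilli)."""
    r = cell_diameter_um / 2.0
    bare_patch = patch_fraction * 4.0 * math.pi * r ** 2
    # On a square grid each microvillus occupies spacing x spacing of membrane.
    count = bare_patch / spacing_um ** 2
    # Assumption: only the side wall of each cylinder counts as new surface.
    side_wall = math.pi * villus_diameter_um * villus_length_um
    return count, bare_patch, bare_patch + count * side_wall

def transport_time_days(distance_m, velocity_um_per_s):
    """Time in days needed to cover distance_m at a constant velocity."""
    seconds = distance_m / (velocity_um_per_s * 1e-6)
    return seconds / 86400.0

if __name__ == "__main__":
    print(sphere_surface_to_volume(0.5))    # bacterium of diameter 0.5 um
    print(sphere_surface_to_volume(150.0))  # amoeba of diameter 150 um
    print(microvilli_patch())
    print(transport_time_days(2.0, 1.0))
```

Because the ratio for a sphere reduces to 3/r, the bacterium comes out at 12 μm⁻¹ and the amoeba at 0.04 μm⁻¹, a 300-fold difference. At 1 μm/s a vesicle needs about 23 days to travel 2 m.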
Chapter 3
The chemical composition of living material, such as this jellyfish, differs from that of its physical environment, which for this organism is salt water.
Biochemistry aims to explain biological form and function in chemical terms. One of the most fruitful approaches to understanding biological phenomena has been to purify an individual chemical component, such as a protein, from a living organism and to characterize its chemical structure or catalytic activity. As we begin the study of biomolecules and their interactions, some basic questions deserve attention. What chemical elements are found in cells? What kinds of molecules are present in living matter? In what proportions do they occur? How did they come to be there? In what ways are the kinds of molecules found in living cells especially suited to their roles?
We review here some of the chemical principles that govern the properties of biological molecules: the covalent bonding of carbon with itself and with other elements, the functional groups that occur in common biological molecules, the three-dimensional structure and stereochemistry of carbon compounds, and the common classes of chemical reactions that occur in living organisms. Next, we discuss the monomeric units and the contribution of entropy to the free-energy changes of reactions in which these units are polymerized to form macromolecules. Finally, we consider the origin of the monomeric units from simple compounds in the earth’s atmosphere during prebiological times – that is, chemical evolution.
By the beginning of the nineteenth century, it had become clear to chemists that the composition of living matter is strikingly different from that of the inanimate world. Antoine Lavoisier (1743–1794) noted the relative chemical simplicity of the “mineral world”, and contrasted it with the complexity of the “plant and animal worlds”; the latter, he knew, were composed of compounds rich in the elements carbon, oxygen, nitrogen, and phosphorus. The development of organic chemistry preceded, and provided invaluable insights for, the development of biochemistry.
We will briefly review some fundamental concepts of organic chemistry: the nature of bonding between atoms of carbon and of hydrogen, oxygen, and nitrogen; the functional groups that result from these combinations; and the diversity of organic compounds that are derived from these elements.
Figure 3–1  Elements essential to animal life and health. Bulk elements (shaded orange) are structural components of cells and tissues and are required in the diet in gram quantities daily. For trace elements (shaded yellow), the requirements are much smaller: for humans, a few milligrams per day of Fe, Cu, and Zn, even less of the others. The elemental requirements for plants and microorganisms are very similar to those shown here.
H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, lanthanides, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi, Po, At, Rn, Fr, Ra, actinides
Only about 30 of the more than 90 naturally occurring chemical elements are essential to living organisms. Most of the elements in living matter have relatively low atomic numbers; only five have atomic numbers above that of selenium, 34 (Fig. 3–1). The four most abundant elements in living organisms, in terms of the percentage of the total number of atoms, are hydrogen, oxygen, nitrogen, and carbon, which together make up over 99% of the mass of most cells. They are the lightest elements capable of forming one, two, three, and four bonds, respectively (Fig. 3–2). In general, the lightest elements form the strongest bonds. Six of the eight most abundant elements in the human body are also among the nine most abundant elements in seawater (Table 3–1), and several of the elements abundant in humans are components of the atmosphere and were probably present in the atmosphere before the appearance of life on earth. Primitive seawater was most likely the liquid medium in which living organisms first arose, and the primitive atmosphere was probably a source of methane, ammonia, water, and hydrogen, the starting materials for the evolution of life. The trace elements (Fig. 3–1) represent a minuscule fraction of the weight of the human body, but all are absolutely essential to life, usually because they are essential to the function of specific enzymes (Table 3–2).
Figure 3–2  Covalent bonding. Two atoms with unpaired electrons in their outer shells can form covalent bonds with each other by sharing electron pairs. Atoms participating in covalent bonding tend to fill their outer electron shells.
atom, number of unpaired electrons (in red), number of electrons in complete outer shell, dihydrogen, water, ammonia, methane, sulfur dioxide, phosphoric acid
Table 3–1  Elemental abundance in seawater, the human body, and the earth’s crust*

Seawater (%)      Human body (%)    Earth’s crust (%)
H    66           H    63           O    47
O    33           O    25.5         Si   28
Cl   0.33         C    9.5          Al   7.9
Na   0.28         N    1.4          Fe   4.5
Mg   0.033        Ca   0.31         Ca   3.5
S    0.017        P    0.22         Na   2.5
Ca   0.0062       Cl   0.08         K    2.5
K    0.0060       K    0.06         Mg   2.2
C    0.0014

* Values are given as percentage of total number of atoms.
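The overlap between the seawater and human-body columns of Table 3–1 can be verified directly. The Python sketch below transcribes the two columns and lists the elements they share; the dictionary and function names are hypothetical and serve only as an illustration.

```python
# Atom-percent values transcribed from Table 3-1.
SEAWATER = {"H": 66, "O": 33, "Cl": 0.33, "Na": 0.28, "Mg": 0.033,
            "S": 0.017, "Ca": 0.0062, "K": 0.0060, "C": 0.0014}
HUMAN_BODY = {"H": 63, "O": 25.5, "C": 9.5, "N": 1.4,
              "Ca": 0.31, "P": 0.22, "Cl": 0.08, "K": 0.06}

def shared_elements(a, b):
    """Elements listed in both tables, ordered by decreasing abundance in `a`."""
    return sorted(set(a) & set(b), key=lambda element: -a[element])

shared = shared_elements(HUMAN_BODY, SEAWATER)
only_human = sorted(set(HUMAN_BODY) - set(SEAWATER))

print(shared)      # elements abundant in both the human body and seawater
print(only_human)  # elements in the human-body column only
```

Six elements (H, O, C, Ca, Cl, K) appear in both columns; nitrogen and phosphorus are the two abundant elements of the human body that are missing from the seawater list.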
Table 3–2  The biological functions of some trace elements

Element   Example of biological function
Fe        Electron carrier in oxidation–reduction reactions
Cu        Component of mitochondrial oxidase
Mn        Cofactor of the enzyme arginase and other enzymes
Zn        Cofactor of dehydrogenases
Co        Component of vitamin B12
Mo        Component of N2-fixing enzyme
Se        Component of the enzyme glutathione peroxidase
V         Cofactor of the enzyme nitrate reductase
Ni        Cofactor of the enzyme urease
I         Component of thyroid hormone
Mg        Cofactor in photosynthesis
Figure 3–3  Versatility of carbon in forming covalent single, double, and triple bonds (in red), particularly between carbon atoms. Triple bonds occur only rarely in biomolecules.
The chemistry of living organisms is organized around the element carbon, which accounts for more than one-half the dry weight of cells. In methane (CH4), a carbon atom shares four electron pairs with four hydrogen atoms; each of the shared electron pairs forms a single bond. Carbon can also form single and double bonds to oxygen and nitrogen atoms (Fig. 3–3). Of greatest significance in biology is the ability of carbon atoms to share electron pairs with each other to form very stable carbon–carbon single bonds. Each carbon atom can form single bonds with one, two, three, or four other carbon atoms. Two carbon atoms also can share two (or three) electron pairs, thus forming carbon–carbon double (or triple) bonds (Fig. 3–3). Covalently linked carbon atoms can form linear chains, branched chains, and cyclic and cagelike structures. To these carbon skeletons are added groups of other atoms, called functional groups, which confer specific chemical properties on the molecule. Molecules containing covalently bonded carbon backbones are called organic compounds; they occur in an almost limitless variety. Most biomolecules are organic compounds; we can therefore infer that the bonding versatility of carbon was a major factor in the selection of carbon compounds for the molecular machinery of cells during the origin and evolution of living organisms.
The four covalent single bonds that can be formed by a carbon atom are arranged tetrahedrally, with an angle of about 109.5° between any two bonds (Fig. 3–4) and an average length of 0.154 nm. There is free rotation around each carbon–carbon single bond unless very large or highly charged groups are attached to both carbon atoms, in which case rotation may be restricted. A carbon–carbon double bond is shorter (about 0.134 nm long) and rigid and allows little rotation about its axis (Fig. 3–4). No other chemical element can form molecules of such widely different sizes and shapes or with such a variety of functional groups.
Figure 3–4  (a) Carbon atoms have a characteristic tetrahedral arrangement of their four single bonds, which are about 0.154 nm long and at an angle of 109.5° to each other. (b) Carbon–carbon single bonds have freedom of rotation, shown for the compound ethane (CH3–CH3). (c) Carbon–carbon double bonds are shorter and do not allow free rotation. The single bonds on each doubly bonded carbon make an angle of 120° with each other. The two doubly bonded carbons and the atoms designated A, B, X, and Y all lie in the same rigid plane.
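The tetrahedral bond angle of about 109.5° follows from geometry alone. As a minimal Python sketch, assume the four bonds point from the center of a cube toward four alternating corners (a standard construction for a regular tetrahedron, not something taken from the figure); the angle between any two of them can then be computed from their dot product.

```python
import math
from itertools import combinations

# Four alternating corners of a cube centered on the carbon atom point
# toward the vertices of a regular tetrahedron.
BOND_DIRECTIONS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# All six pairwise angles are identical and equal arccos(-1/3).
angles = [angle_degrees(u, v) for u, v in combinations(BOND_DIRECTIONS, 2)]
print(angles)
```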
Most biomolecules can be regarded as derivatives of hydrocarbons, compounds with a covalently linked carbon backbone to which only hydrogen atoms are bonded. The backbones of hydrocarbons are very stable. The hydrogen atoms may be replaced by a variety of functional groups to yield different families of organic compounds. Typical families of organic compounds are the alcohols, which have one or more hydroxyl groups; amines, which have amino groups; aldehydes and ketones, which have carbonyl groups; and carboxylic acids, which have carboxyl groups (Fig. 3–5).
Figure 3–5  Some functional groups frequently encountered in biomolecules. All groups are shown in their uncharged (un-ionized) form.
hydroxyl, carbonyl (aldehyde), carbonyl (ketone), carboxyl, methyl, ethyl, phenyl, ester, ether, amino, amido, guanidino, imidazole, sulfhydryl, disulfide, phosphoryl
Many biomolecules are polyfunctional, containing two or more different kinds of functional groups (Fig. 3–6), each with its own chemical characteristics and reactions. Amino acids, an important family of molecules that serve primarily as monomeric subunits of proteins, contain at least two different kinds of functional groups: an amino group and a carboxyl group, as shown for histidine in Figure 3–6. The ability of an amino acid to condense (see Fig. 3–14e) with other amino acids to form proteins is dependent on the chemical properties of these two functional groups.
Figure 3–6  Representative biomolecules with multiple functional groups. Note that secondary (s) and tertiary (t) amino groups have, respectively, one and two of their amino hydrogens replaced by other groups.
histidine, amino, carboxyl, imidazole; epinephrine, methyl, s-amino, alcohol, phenyl, hydroxyls; cocaine, methyl, t-amino, methyl ester, ester, phenyl; coenzyme A, sulfhydryl, amido, amido, hydroxyl, methyl, methyl, diphosphoryl, phosphoryl, imidazole, amino
Although the covalent bonds and functional groups of biomolecules are central to their function, they do not tell the whole story. The arrangement in three-dimensional space of the atoms of a biomolecule is also crucially important. Compounds of carbon can often exist in two or more chemically indistinguishable three-dimensional forms, only one of which is biologically active. This specificity for one particular molecular configuration is a universal feature of biological interactions. All biochemistry is three-dimensional.
Figure 3–7  Models of the structure of the amino acid alanine. (a) Structural formula in perspective form. The symbol ◅ represents a bond in which the atom at the wide end projects out of the plane of the paper, toward the reader; dashes represent a bond extending behind the plane of the paper. (b) Ball-and-stick model, showing relative bond lengths and the bond angles. The balls indicate the approximate size of the atomic nuclei. (c) Space-filling model, in which each atom is shown having its correct van der Waals radius (see Table 3–3).
Table 3–3  van der Waals radii and covalent (single-bond) radii of some elements*

Element   van der Waals radius (nm)   Covalent radius for single bond (nm)
H         0.1                         0.030
O         0.14                        0.074
F         0.14                        0.071
N         0.15                        0.073
C         0.17                        0.077
S         0.18                        0.103
Cl        0.18                        0.099
P         0.19                        0.110
Br        0.20                        0.114
I         0.22                        0.133

* The van der Waals radius is about twice the covalent radius for each element. The distance between nuclei in a van der Waals interaction or a covalent bond is about equal to the sum of the values for the two atoms. Thus the length of a carbon–carbon single bond is about 0.077 + 0.077 = 0.154 nm.
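The additivity rule in the footnote to Table 3–3 lends itself to a small lookup. The following Python sketch, with hypothetical function names, sums the tabulated radii to estimate single-bond lengths and van der Waals contact distances. The results are rough estimates only, since the footnote says the distance is "about equal" to the sum.

```python
# Radii in nanometers, transcribed from Table 3-3.
COVALENT_RADIUS_NM = {"H": 0.030, "O": 0.074, "F": 0.071, "N": 0.073,
                      "C": 0.077, "S": 0.103, "Cl": 0.099, "P": 0.110,
                      "Br": 0.114, "I": 0.133}
VDW_RADIUS_NM = {"H": 0.1, "O": 0.14, "F": 0.14, "N": 0.15, "C": 0.17,
                 "S": 0.18, "Cl": 0.18, "P": 0.19, "Br": 0.20, "I": 0.22}

def single_bond_length_nm(a, b):
    """Estimated length of a single covalent bond between elements a and b."""
    return round(COVALENT_RADIUS_NM[a] + COVALENT_RADIUS_NM[b], 3)

def vdw_contact_nm(a, b):
    """Estimated closest approach of two nonbonded atoms a and b."""
    return round(VDW_RADIUS_NM[a] + VDW_RADIUS_NM[b], 3)

print(single_bond_length_nm("C", "C"))  # the carbon-carbon example from the footnote
print(single_bond_length_nm("C", "H"))
print(vdw_contact_nm("C", "C"))
```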
Figure 3–8  Complementary fit of a substrate molecule to the active or catalytic site on an enzyme molecule. The enzyme shown here is chymotrypsin, an enzyme that acts in the intestine to degrade dietary protein. Its substrate (shown in red) fits into a groove at the active site of the enzyme.
Biomolecules have characteristic sizes and three-dimensional structures, which derive from their backbone structures and their substituent functional groups. Figure 3–7 shows three ways to illustrate the three-dimensional structures of molecules. The perspective diagram specifies unambiguously the three-dimensional structure (stereochemistry) of a compound. Bond angles and center-to-center bond lengths are best represented with ball-and-stick models, whereas the outer contours of molecules are better represented by space-filling models. In space-filling models, the radius of each atom is proportional to its van der Waals radius (Table 3–3), and the contours of the molecule represent the outer limits of the region from which atoms of other molecules are excluded.
The three-dimensional conformation of biomolecules is of the utmost importance in their interactions; for example, in the binding of a substrate (reactant) to the catalytic site of an enzyme (Fig. 3–8), the two molecules must fit each other closely, in a complementary fashion, for biological function. Such complementarity also is required in the binding of a hormone molecule to its receptor on a cell surface, or in the recognition of an antigen by a specific antibody.
The study of the three-dimensional structure of biomolecules with precise physical methods is an important part of modern research on cell structure and biochemical function. The most informative method is x-ray crystallography. If a compound can be crystallized, the diffraction of x rays by the crystals can be used to determine with great precision the position of every atom in the molecule relative to every other atom. The structures of most small biomolecules (those with less than about 50 atoms), and of many larger molecules such as proteins, have been deduced by this means. X-ray crystallography yields a static picture of the molecule within the confines of the crystal. However, biomolecules almost never exist within cells as crystals; rather, they are dissolved in the cytosol or associated with some other component(s) of the cell. Molecules have more freedom of intramolecular motion in solution than in a crystal. In large molecules such as proteins, the small variations allowed in the three-dimensional structures of their monomeric subunits add up to extensive flexibility. Techniques such as nuclear magnetic resonance (NMR) spectroscopy complement x-ray crystallography by providing information about the three-dimensional structure of biomolecules in solution. Knowledge of the detailed three-dimensional structure of a molecule often sheds light on the mechanisms of the reactions in which the molecule participates.
Figure 3–9  Molecular asymmetry: chiral and achiral molecules. (a) When a carbon atom has four different substituent groups (A, B, X, Y), they can be arranged in two ways that represent nonsuperimposable mirror images of each other (enantiomers). Such a carbon atom is asymmetric and is called a chiral atom or chiral center. (b) When there are only three dissimilar groups around the carbon atom (i.e., the same group occurs twice), only one configuration in space is possible and the molecule is symmetric, or achiral. In this case the molecule is superimposable on its mirror image: the molecule on the left can be rotated counterclockwise (when looking down its vertical bond from A to C) to create the molecule on the right.
The tetrahedral arrangement of single bonds around a carbon atom confers on some organic compounds another property of great importance in biology. When four different atoms or functional groups are bonded to a carbon atom in an organic molecule, the carbon atom is said to be asymmetric; it can exist in two different isomeric forms (stereoisomers) that have different configurations in space. A special class of stereoisomers, called enantiomers, are nonsuperimposable mirror images of each other (Fig. 3–9). The two enantiomers of a compound have identical chemical properties, but differ in a characteristic physical property, the ability to rotate the plane of plane-polarized light. A solution of one enantiomer rotates the plane of such light to the right, and a solution of the other, to the left. Compounds without an asymmetric carbon atom do not rotate the plane of plane-polarized light.
original molecule, mirror image of original molecule, chiral molecule: rotated molecule cannot be superimposed on mirror image of original; original molecule, mirror image of original molecule, achiral molecule: rotated molecule can be superimposed on mirror image of original
Louis Pasteur, in 1848, was the first to arrive at the correct explanation for this phenomenon of optical activity. Investigating the crystalline material that accumulated in wine casks (“paratartaric acid”, also called racemic acid, from Latin racemus, “grape”), he used a fine forceps to separate two types of crystals that were identical in shape but mirror images of each other (Fig. 3–10). Both proved to have all of the chemical properties of tartaric acid, but in solution one type rotated polarized light to the left and the other to the right, each to the same extent. He later described the experiment and its interpretation:

In isomeric bodies, the elements and the proportions in which they are combined are the same, only the arrangement of the atoms is different. . . . We know, on the one hand, that the molecular arrangements of the two tartaric acids are asymmetric, and, on the other hand, that these arrangements are absolutely identical, excepting that they exhibit asymmetry in opposite directions. Are the atoms of the dextro acid grouped in the form of a right-handed spiral, or are they placed at the apex of an irregular tetrahedron, or are they disposed according to this or that asymmetric arrangement? We do not know.*

* From Pasteur’s lecture to the Société Chimique de Paris in 1883, quoted in DuBos, R. (1976) Louis Pasteur: Free Lance of Science, p. 95, Charles Scribner’s Sons, New York.
2R,3R-tartaric acid (dextrorotatory), 2S,3S-tartaric acid (levorotatory)
Figure 3–10  Pasteur separated crystals of two stereoisomers of tartaric acid and showed that solutions of the separated forms each rotated polarized light to the same extent but in opposite directions. Pasteur’s dextrorotatory and levorotatory forms were later shown to be the R,R and S,S isomers shown here. For compounds with more than one chiral center, the RS system of nomenclature is often more useful than the D and L system described in Chapter 5. In the RS system, each group attached to a chiral carbon is assigned a priority. The priorities of some common substituents are: –OCH2 > –OH > –NH2 > –COOH > –CHO > –CH2OH > –CH3 > –H. The chiral carbon atom is viewed with the group of lowest priority pointing away from the viewer. If the priority of the other three groups decreases in counterclockwise order, the configuration is S; if in clockwise order, R. In this way each chiral carbon is designated as either R or S, and the inclusion of these designations in the name of the compound provides an unambiguous description of the stereochemistry at each chiral center.
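The viewing rule in the legend to Figure 3–10 can be expressed as a sign test on a scalar triple product. The Python sketch below is an illustration under stated assumptions, not a general nomenclature tool: the function name is hypothetical, the priority list contains only the substituents quoted in the legend, the coordinates are bond vectors drawn from the chiral carbon, and the geometry is assumed to be roughly tetrahedral, so that the three highest-priority groups all lie on the side opposite the lowest-priority group.

```python
# Priority order quoted in the legend to Figure 3-10 (highest first).
PRIORITY = ["OCH2", "OH", "NH2", "COOH", "CHO", "CH2OH", "CH3", "H"]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rs_configuration(substituents):
    """Return 'R' or 'S' for a dict mapping four group names to bond vectors."""
    if len(substituents) != 4:
        raise ValueError("a chiral carbon needs four different substituents")
    ranked = sorted(substituents, key=PRIORITY.index)
    a, b, c = (substituents[group] for group in ranked[:3])
    # Seen from the side opposite the lowest-priority group, a -> b -> c runs
    # counterclockwise when the triple product is positive (S) and
    # clockwise when it is negative (R).
    triple = dot(a, cross(b, c))
    if abs(triple) < 1e-12:
        raise ValueError("substituents are coplanar; configuration undefined")
    return "S" if triple > 0 else "R"

# Illustrative coordinates only: H points away from a viewer on the +z axis.
example = {
    "H": (0.0, 0.0, -1.0),
    "NH2": (1.0, 0.0, 0.33),
    "COOH": (-0.5, 0.866, 0.33),
    "CH3": (-0.5, -0.866, 0.33),
}
mirror = {group: (-x, y, z) for group, (x, y, z) in example.items()}
print(rs_configuration(example), rs_configuration(mirror))
```

In the example arrangement the amino, carboxyl, and methyl groups run counterclockwise as seen from the +z axis, so the function reports S; reflecting every x coordinate produces the mirror image, which comes out R.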
Now we do know. X-ray crystallographic studies in 1951 confirmed that the levorotatory and dextrorotatory forms of tartaric acid are mirror images of each other, and established the absolute configuration of each (Fig. 3–10). The same approach has been used to demonstrate that the amino acid alanine exists in two enantiomeric forms (Chapter 5). The central carbon atom of the alanine molecule is bonded to four different substituent groups: a methyl group, an amino group, a carboxyl group, and a hydrogen atom. The two stereoisomers of alanine are nonsuperimposable mirror images of each other, and thus are enantiomers.
Compounds with asymmetric carbon atoms can be regarded as occurring in left- and right-handed forms, and are therefore called chiral compounds (Greek chiros, “hand”). Correspondingly, the asymmetric atom or center of chiral compounds is called the chiral atom or chiral center (Fig. 3–9). All but one of the 20 amino acids have chiral centers; glycine is the exception.
More generally, variations in the three-dimensional structure of biomolecules are described in terms of configuration and conformation. These terms are not synonyms. Configuration denotes the spatial arrangement of an organic molecule that is conferred by the presence of either (1) double bonds, around which there is no freedom of rotation, or (2) chiral centers, around which substituent groups are arranged in a specific sequence. The identifying characteristic of configurational isomers is that they cannot be interconverted without breaking one or more covalent bonds.
maleic acid (cis), fumaric acid (trans); 11-cis-retinal, light → all-trans-retinal
Figure 3–11a shows the configurations of maleic acid, which occurs in some plants, and its isomer fumaric acid, an intermediate in sugar metabolism. These compounds are geometric or cis–trans isomers; they differ in the arrangement of their substituent groups with respect to the nonrotating double bond. Maleic acid is the cis isomer and fumaric acid the trans isomer; each is a well-defined compound that can be isolated and purified. These two compounds are stereoisomers but not enantiomers; they are not mirror images of each other.
Figure 3–11  Configurations of stereoisomers. (a) Isomers such as maleic acid and fumaric acid cannot be interconverted without breaking covalent bonds, which requires the input of much energy. (b) In the vertebrate retina, the initial event in light detection is the absorption of visible light by 11-cis-retinal. The energy of the absorbed light (about 250 kJ/mol) converts 11-cis-retinal to all-trans-retinal, triggering electrical changes in the retinal cell that lead to a nerve impulse.
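The energy quoted in the legend for the absorbed light can be related to a wavelength. The Python sketch below assumes the standard relation E = N·h·c/λ for one mole of photons, which is not stated in the text, and uses the defined SI values of the constants; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23    # photons per mole
PLANCK = 6.62607015e-34     # joule seconds
LIGHT_SPEED = 2.99792458e8  # meters per second

def wavelength_nm(energy_kj_per_mol):
    """Wavelength (nm) of light whose photons carry the given molar energy."""
    energy_per_photon = energy_kj_per_mol * 1e3 / AVOGADRO  # joules
    return PLANCK * LIGHT_SPEED / energy_per_photon * 1e9

print(wavelength_nm(250.0))
```

For 250 kJ/mol the result is close to 480 nm, which lies within the visible range and is consistent with the statement that 11-cis-retinal absorbs visible light.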
Molecular conformation refers to the spatial arrangement of substituent groups that are free to assume different positions in space, without breaking any bonds, because of the freedom of bond rotation. In the simple hydrocarbon ethane, for example, there is nearly complete freedom of rotation around the carbon–carbon single bond. Many different, interconvertible conformations of the ethane molecule are therefore possible, depending upon the degree of rotation (Fig. 3–12). Two conformations are of special interest: the staggered conformation, which is more stable than all others and thus predominates, and the eclipsed form, which is least stable. It is not possible to isolate either of these conformational forms, because they are freely interconvertible and in equilibrium with each other. However, when one or more of the hydrogen atoms on each carbon is replaced by a functional group that is either very large or electrically charged, freedom of rotation around the carbon–carbon single bond is hindered. This limits the number of stable conformations of the ethane derivative.
potential energy (kJ/mol), 12.1 kJ/mol, torsion angle (degrees), eclipsed, staggered
Figure 3–12  Many conformations of ethane are possible because of freedom of rotation around the carbon–carbon single bond. When the front carbon atom (as viewed by the reader) and its three attached hydrogens are rotated relative to the rear carbon atom, the potential energy of the molecule rises to a maximum in the fully eclipsed conformation (torsion angle 0°, 120°, etc.), then falls to a minimum in the fully staggered conformation (torsion angle 60°, 180°, etc.). The energy differences are small enough to allow rapid interconversion of the two forms (millions of times per second); thus the eclipsed and staggered forms cannot be isolated separately.
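The shape of the energy curve in Figure 3–12 can be approximated with a simple periodic function. The Python sketch below assumes a threefold cosine form, a common textbook model for rotation about a bond with threefold symmetry that is not given in the legend, and takes the 12.1 kJ/mol barrier height from the figure. The function name is hypothetical.

```python
import math

BARRIER_KJ_PER_MOL = 12.1  # eclipsed-minus-staggered energy from Figure 3-12

def torsional_energy(angle_deg, barrier=BARRIER_KJ_PER_MOL):
    """Potential energy (kJ/mol) relative to the staggered conformation."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * angle_deg)))

for angle in (0, 30, 60, 90, 120, 180):
    print(angle, round(torsional_energy(angle), 2))
```

With this model the energy peaks at 0° and 120° (eclipsed) and drops to zero at 60° and 180° (staggered), matching the positions of the maxima and minima described in the legend.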
Many biomolecules besides amino acids are chiral, containing one or more asymmetric carbon atoms. The chiral molecules in living organisms are usually present in only one of their chiral forms. For example, the amino acids occur in proteins only as the L isomers. Glucose, the monomeric subunit of starch, has five asymmetric carbons, but occurs biologically in only one of its chiral forms, the D isomer. (The conventions for naming stereoisomers of the amino acids are described in Chapter 5; those for sugars, in Chapter 11). In contrast, when a compound having an asymmetric carbon atom is chemically synthesized in the laboratory, the nonbiological reactions usually produce all possible chiral forms in an equimolar mixture that does not rotate polarized light (a racemic mixture). The chiral forms in such a mixture can be separated only by painstaking physical methods. Chiral compounds in living cells are produced in only one chiral form because the enzymes that synthesize them are also chiral molecules.
Stereospecificity, the ability to distinguish between stereoisomers, is a common property of enzymes and other proteins and a characteristic feature of the molecular logic of living cells. If the binding site on a protein is complementary to one isomer of a chiral compound, it will not be complementary to the other isomer, for the same reason that a left glove does not fit a right hand. Two striking examples of the ability of biological systems to distinguish stereoisomers are shown in Figure 3–13.
R-carvone (spearmint), S-carvone (caraway), L-aspartyl-L-phenylalanyl methyl ester (aspartame) (sweet), L-aspartyl-D-phenylalanyl methyl ester (bitter)
Figure 3–13  Stereoisomers that are distinguished by sensory receptors for smell and taste in humans. (a) Two stereoisomers of carvone, designated R and S (see Fig. 3–10, legend). R-carvone (from spearmint oil) has the characteristic fragrance of spearmint; S-carvone (from caraway seed oil) smells like caraway. (b) Aspartame, the artificial sweetener sold under the trade name NutraSweet, is easily distinguishable by taste from its bitter-tasting stereoisomer, although the two differ only in the configuration about one of the two chiral carbon atoms (in red).
Saturated hydrocarbons – molecules with carbon–carbon single bonds and without double bonds or substituent groups – are not easily attacked by most chemical reagents; biomolecules, with their various functional groups, are much more chemically reactive. Functional groups alter the electron distribution and the geometry of neighboring atoms and thus affect the chemical reactivity of the entire molecule. The breakage and formation of chemical bonds during cellular metabolism release energy, some in the form of heat.
It is possible to analyze and predict the chemical behavior and reactions of biomolecules from the functional groups they bear. Enzymes recognize a specific pattern of functional groups in a biomolecule and catalyze characteristic chemical changes in the compound that contains these groups. Although a large number of different chemical reactions occur in a typical cell, these reactions are of only a few types, readily understandable in terms that apply to all reactions of organic compounds.
When the two atoms sharing electrons in a covalent bond have equal affinities for the electrons, as in the case of two carbon atoms, the resulting bond is nonpolar. When two elements that differ in electron affinity, or electronegativity (Table 3–4), form a covalent bond (e.g., C and O), that bond is polarized; the shared electrons are more likely to be in the region of the more electronegative atom (O) than of the less electronegative (C). In the extreme case of two atoms of very different electronegativity (Na and Cl, for example), one of the atoms actually gives up the electron(s) to the other atom, resulting in the formation of ions and ionic interactions such as those in solid NaCl.
Table 3–4  The electronegativities of some elements

Element   Electronegativity*
F         4.0
O         3.5
Cl        3.0
N         3.0
Br        2.8
S         2.5
C         2.5
I         2.5
Se        2.4
P         2.1
H         2.1
Cu        1.9
Fe        1.8
Co        1.8
Ni        1.8
Mo        1.8
Zn        1.6
Mn        1.5
Mg        1.2
Ca        1.0
Li        1.0
Na        0.9
K         0.8

* The higher the number, the more electronegative is the element.
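The comparison of C–C, C–O, and Na–Cl bonds can be made quantitative with the values in Table 3–4. The Python sketch below, with a hypothetical function name, reports the electronegativity difference for a pair of elements and the atom toward which the shared electrons are drawn. It deliberately sets no cutoff between "polar" and "ionic", because the text gives none.

```python
# Electronegativities transcribed from Table 3-4.
ELECTRONEGATIVITY = {
    "F": 4.0, "O": 3.5, "Cl": 3.0, "N": 3.0, "Br": 2.8, "S": 2.5, "C": 2.5,
    "I": 2.5, "Se": 2.4, "P": 2.1, "H": 2.1, "Cu": 1.9, "Fe": 1.8, "Co": 1.8,
    "Ni": 1.8, "Mo": 1.8, "Zn": 1.6, "Mn": 1.5, "Mg": 1.2, "Ca": 1.0,
    "Li": 1.0, "Na": 0.9, "K": 0.8,
}

def bond_polarity(a, b):
    """Return (electronegativity difference, more electronegative atom or None)."""
    ea, eb = ELECTRONEGATIVITY[a], ELECTRONEGATIVITY[b]
    difference = round(abs(ea - eb), 2)
    if difference == 0:
        return difference, None  # equal affinities: nonpolar bond
    return difference, (a if ea > eb else b)

for pair in (("C", "C"), ("C", "O"), ("Na", "Cl")):
    print(pair, bond_polarity(*pair))
```

The three pairs discussed in the text give differences of 0, 1.0, and 2.1, in line with the progression from a nonpolar bond to a polarized bond to an ionic interaction.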
The strength of chemical bonds (Table 3–5) depends upon the relative electronegativities of the elements involved, the distance of the bonding electrons from each nucleus, and the nuclear charge. The number of electrons shared also influences bond strength; double bonds are stronger than single bonds, and triple bonds are stronger yet. The strength of a bond is expressed as bond energy, in joules. (In biochemistry, calories have often been used as units of energy – bond energy and free energy, for example. The joule is the unit of energy in the International System of Units, and is used throughout this book. For conversions, 1 cal is equal to 4.18 J.) Bond energy can be thought of as either the amount of energy required to break a bond or the amount of energy gained by the surroundings when two atoms form the bond. One way to put energy into a system is to heat it, which gives the molecules more kinetic energy; temperature is a measurement of the average kinetic energy of a population of molecules. When molecular motion is sufficiently violent, intramolecular vibrations and intermolecular collisions sometimes break chemical bonds. Heating raises the fraction of molecules with energies high enough to react.
Table 3–5  Strengths of bonds common in biomolecules

Type of bond      Bond dissociation energy (kJ/mol)

Single bonds
O–H               461
H–H               435
P–O               419
C–H               414
N–H               389
C–O               352
C–C               348
S–H               339
C–N               293
C–S               260
N–O               222
S–S               214

Double bonds
C=O               712
C=N               615
C=C               611
P=O               502

Triple bonds
C≡C               816
N≡N               930

Noncovalent bonds or interactions
Hydrogen bonds, van der Waals interactions, hydrophobic interactions, ionic interactions    4–20
In chemical reactions, bonds are broken and new ones are formed. The difference between the energy from the surroundings used to break bonds and the energy gained by the surroundings in the formation of new ones is virtually identical to the enthalpy change for the reaction, ΔH. (The energy difference becomes exactly equal to the enthalpy change after a slight correction for any volume change in the system at constant pressure.) If heat energy is absorbed by the system as the change occurs (that is, if the reaction is endothermic), then ΔH has, by definition, a positive value; when heat is produced, as in exothermic reactions, ΔH is negative. In short, the change in enthalpy for a covalent reaction reflects the kinds and numbers of bonds that are made and broken. As we shall see later in this chapter, the enthalpy change is one of three factors that determine the free-energy change for a reaction; the other two are the temperature and the change in entropy.
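The bookkeeping described here, energy absorbed to break bonds minus energy released in forming new ones, can be sketched with the values in Table 3–5. The Python example below is a rough estimate under stated assumptions: the function names are hypothetical, the reaction chosen (addition of H2 across a carbon–carbon double bond) is an illustration not taken from the text, and tabulated bond energies are averages, so the result only approximates a measured ΔH.

```python
# Bond energies (kJ/mol) transcribed from Table 3-5.
BOND_ENERGY_KJ_PER_MOL = {
    "O-H": 461, "H-H": 435, "P-O": 419, "C-H": 414, "N-H": 389, "C-O": 352,
    "C-C": 348, "S-H": 339, "C-N": 293, "C-S": 260, "N-O": 222, "S-S": 214,
    "C=O": 712, "C=N": 615, "C=C": 611, "P=O": 502,
}

def estimate_delta_h(broken, formed):
    """Estimate the enthalpy change (kJ/mol) from counts of bonds broken and formed."""
    absorbed = sum(BOND_ENERGY_KJ_PER_MOL[b] * n for b, n in broken.items())
    released = sum(BOND_ENERGY_KJ_PER_MOL[b] * n for b, n in formed.items())
    return absorbed - released

def kj_to_kcal(kilojoules):
    """Convert using the factor given in the text, 1 cal = 4.18 J."""
    return kilojoules / 4.18

# Illustrative reaction: one C=C and one H-H bond are broken;
# one C-C and two C-H bonds are formed.
delta_h = estimate_delta_h(broken={"C=C": 1, "H-H": 1},
                           formed={"C-C": 1, "C-H": 2})
print(delta_h, kj_to_kcal(delta_h))
```

The estimate is negative (−130 kJ/mol), so by the sign convention in the text this reaction would be exothermic.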
Most cells have the capacity to carry out thousands of specific, enzyme-catalyzed reactions: transformation of simple nutrients such as glucose into amino acids, nucleotides, or lipids; extraction of energy from fuels by oxidation; or polymerization of subunits into macromolecules, for example. Fortunately for the student of biochemistry, there is a pattern in this multitude of reactions; we do not need to learn all of these reactions to comprehend the molecular logic of life.
(a) Group transfer: D-glucose + ATP (adenosine triphosphate) → D-glucose-6-phosphate + ADP (adenosine diphosphate)
(b) Oxidation–reduction: D-glucose-6-phosphate + H2O + NADP+ → 6-phosphogluconic acid + NADPH + H+
(c) Rearrangement: D-glucose-6-phosphate → fructose-6-phosphate
(d) Cleavage: fructose-1,6-bisphosphate → dihydroxyacetone phosphate + glyceraldehyde-3-phosphate
(e) Condensation: two amino acids → dipeptide + H2O
Most of the reactions in living cells fall into one of five general categories (Fig. 3–14): functional-group transfers (a), oxidations and reductions (b), reactions that rearrange the bond structure around one or more carbons (c), reactions that form or break carbon–carbon bonds (d), and reactions in which two molecules condense, with the elimination of a molecule of water (e). Reactions within one category generally occur by similar mechanisms.
Figure 3–14  Examples of five general types of chemical transformations that occur in cells. The reactions (a) through (d) are enzyme-catalyzed reactions that take place in your tissues as you use glucose as a source of energy (Chapter 14). In (a) a phosphoryl group is transferred from ATP to glucose; (b) an aldehyde is oxidized to a carboxylic acid and an oxidized electron carrier (NADP+) is reduced; (c) a rearrangement converts an aldehyde to a ketone; (d) a molecule is cleaved to form two smaller molecules. Reaction (e) represents the condensation of two amino acids with the elimination of H2O to form a peptide bond; condensation reactions occur in many cellular processes in which larger molecules are assembled from small precursors.
The mechanisms of biochemical reactions are not fundamentally different from other chemical reactions. Many biochemical reactions involve interactions between nucleophiles, functional groups rich in electrons and capable of donating them, and electrophiles, electron-deficient functional groups that seek electrons. Nucleophiles combine with, and give up electrons to, electrophiles. Functional groups containing oxygen, nitrogen, and sulfur are important biological nucleophiles (Table 3–6). Positively charged hydrogen atoms (protons) and positively charged metals (cations) frequently act as electrophiles in cells. A carbon atom can act as either a nucleophilic or an electrophilic center, depending upon which bonds and functional groups surround it.
Table 3–6, Some functional groups that act as nucleophiles within cells / Water / Hydroxyl (alcohol) / Alkoxyl / Sulfhydryl, RSH / Sulfide / Amino / Carboxylate / Imidazole
Many of the molecules found within cells are macromolecules, polymers of high molecular weight assembled from relatively simple precursors. Polysaccharides, proteins, and nucleic acids, which may have molecular weights ranging from tens of thousands to (in the case of DNA) billions, are produced by the polymerization of relatively small subunits with molecular weights of 500 or less. The synthesis of macromolecules is a major energy-consuming activity of cells. Macromolecules themselves may be further assembled into supramolecular complexes, forming functional units such as ribosomes, membranes, and organelles.
Table 3–7 shows the major classes of biomolecules in a representative single-celled organism, Escherichia coli. Water is the most abundant single compound in E. coli and in all other cells and organisms. Inorganic salts and mineral elements, on the other hand, constitute only a very small fraction of the total dry weight, but many of them are in approximate proportion to their distribution in seawater (see Table 3–1). Nearly all of the solid matter in all kinds of cells is organic and is present in four forms: proteins, nucleic acids, polysaccharides, and lipids.
Table 3–7  Molecular components of an E. coli cell
Component    Percentage of total weight of cell    Approximate number of different molecular species
Water    70    1
Proteins    15    3,000
Nucleic acids
  DNA    1    1
  RNA    6    >3,000
Polysaccharides    3    5
Lipids    2    20
Monomeric subunits and intermediates    2    500
Inorganic ions    1    20
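The statement that nearly all of the solid matter is organic can be checked against Table 3–7 with a little arithmetic. The sketch below recomputes each component as a percentage of dry weight (everything except water). The variable names are illustrative only.

```python
# Percentage of total cell weight, from Table 3-7.
composition = {
    "Water": 70,
    "Proteins": 15,
    "DNA": 1,
    "RNA": 6,
    "Polysaccharides": 3,
    "Lipids": 2,
    "Monomeric subunits and intermediates": 2,
    "Inorganic ions": 1,
}

total = sum(composition.values())
dry_weight = total - composition["Water"]

# Re-express every non-water component as a percentage of dry weight.
dry_fraction = {
    name: 100 * pct / dry_weight
    for name, pct in composition.items()
    if name != "Water"
}

# The four classes of large biomolecules named in the text.
four_classes = ["Proteins", "DNA", "RNA", "Polysaccharides", "Lipids"]
large_biomolecule_share = sum(dry_fraction[name] for name in four_classes)

for name, pct in dry_fraction.items():
    print(f"{name}: {pct:.1f}% of dry weight")
print(f"Proteins, nucleic acids, polysaccharides, lipids: {large_biomolecule_share:.0f}%")
```

With these figures, proteins alone account for half of the dry weight, and inorganic ions for only about 3%.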
Proteins, long polymers of amino acids, constitute the largest fraction (besides water) of cells. Some proteins have catalytic activity and function as enzymes, others serve as structural elements, and still others carry specific signals (in the case of receptors) or specific substances (in the case of transport proteins) into or out of cells. Proteins are perhaps the most versatile of all biomolecules. The nucleic acids, DNA and RNA, are polymers of nucleotides. They store, transmit, and translate genetic information. The polysaccharides, polymers of simple sugars such as glucose, have two major functions: they serve as energy-yielding fuel stores and as extracellular structural elements. Shorter polymers of sugars (oligosaccharides) attached to proteins or lipids at the cell surface serve as specific cellular signals. The lipids, greasy or oily hydrocarbon derivatives, serve as structural components of membranes, as a storage form of energy-rich fuel, and in other roles. These four classes of large biomolecules are all synthesized in condensation reactions (Fig. 3–14e). In macromolecules – proteins, nucleic acids, and polysaccharides – the number of monomeric subunits is very large. Proteins have molecular weights in the range of 5,000 to over 1 million; the nucleic acids have molecular weights ranging up to several billion; and polysaccharides, such as starch, also have molecular weights into the millions. Individual lipid molecules are much smaller (Mr 750 to 1,500), and are not classed as macromolecules. However, when large numbers of lipid molecules associate noncovalently, very large structures result. Cellular membranes are built of enormous aggregates containing millions of lipid molecules.
Although living organisms contain a very large number of different proteins and different nucleic acids, a fundamental simplicity underlies their structure (Chapter 1). The simple monomeric subunits from which all proteins and all nucleic acids are constructed are few in number and identical in all living species. Proteins and nucleic acids are informational macromolecules: each protein and each nucleic acid has a characteristic information-rich subunit sequence (Fig. 3–15).
Figure 3–15  Informational and structural macromolecules. A, T, C, and G represent the four deoxynucleotides of DNA, and glucose (Glc) is the repeating monomeric subunit of starch and cellulose. The number of possible permutations and combinations of four deoxynucleotides is virtually limitless, as is the number of melodies possible with a few musical notes. A polymer of one subunit type is information-poor and monotonous.
–A–C–T–C–G–A–C–G–A– (DNA), Glc–Glc–Glc–Glc–Glc– (cellulose)
Polysaccharides built from only a single kind of unit, or from two different alternating units, are not informational molecules in the same sense as are proteins and nucleic acids (Fig. 3–15). However, complex polysaccharides made up of six or more different kinds of sugars connected in branched chains do have the structural and stereochemical variety that enables them to carry information recognizable by other macromolecules.
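The contrast between informational and monotonous polymers can be made concrete by counting the sequences that are possible for a chain of a given length. The sketch below is a simple illustration. The function names are invented here, and expressing the result in bits per position is an added convenience rather than something the text does.

```python
import math


def possible_sequences(n_subunit_types: int, length: int) -> int:
    """Number of distinct linear sequences of a given length."""
    return n_subunit_types ** length


def bits_per_position(n_subunit_types: int) -> float:
    """Maximum information per position, in bits (log2 of the choices)."""
    return math.log2(n_subunit_types)


# The nine-nucleotide DNA fragment shown in Fig. 3-15: four choices per position.
print(possible_sequences(4, 9))    # 262144

# A polymer of one subunit type (such as the Glc chain in Fig. 3-15): one possible sequence.
print(possible_sequences(1, 5))    # 1

# A protein of 100 residues built from 20 amino acids: an astronomically large number.
print(len(str(possible_sequences(20, 100))), "digits")

print(bits_per_position(4), bits_per_position(1))  # 2.0 0.0
```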
Figure 3–16 shows the structures of some monomeric units, arranged in families. We have already seen that the most abundant polysaccharides in nature, starch and cellulose, are constructed of repeating units of D-glucose. The monomeric subunits of proteins are 20 different amino acids; all have an amino group (an imino group in the case of proline) and a carboxyl group attached to the same carbon atom, called, by convention, the α carbon. These α-amino acids differ from each other only in their side chains (Fig. 3–16).
The recurring structural units of all nucleic acids are eight different nucleotides; four kinds of nucleotides are the structural units of DNA, and four others are the units of RNA. Each nucleotide is made up of three components: (1) a nitrogenous organic base, (2) a five-carbon sugar, and (3) phosphate (Fig. 3–16). The eight different nucleotides of DNA and RNA are built from five different organic bases combined with two different sugars.
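The count of eight nucleotides from five bases and two sugars can be reproduced by enumeration. The assignment of bases to DNA and RNA used below (thymine in DNA, uracil in RNA, the other three shared) is standard knowledge rather than something stated in this passage, and the data layout is purely illustrative.

```python
# Standard base assignments (an assumption added here; the passage itself
# states only that there are five bases and two sugars).
DNA_BASES = ["adenine", "guanine", "cytosine", "thymine"]
RNA_BASES = ["adenine", "guanine", "cytosine", "uracil"]

# Each nucleotide = (nitrogenous base, five-carbon sugar, phosphate).
nucleotides = (
    [(base, "2-deoxy-D-ribose", "phosphate") for base in DNA_BASES]
    + [(base, "D-ribose", "phosphate") for base in RNA_BASES]
)

distinct_bases = {base for base, _sugar, _phosphate in nucleotides}
distinct_sugars = {sugar for _base, sugar, _phosphate in nucleotides}

print(len(nucleotides))      # 8
print(len(distinct_bases))   # 5
print(len(distinct_sugars))  # 2
```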
Figure 3–16  The organic compounds from which most larger structures in cells are constructed: the ABCs of biochemistry. Shown on these two pages are (a) the 20 amino acids from which the proteins of all organisms are built (the side chains are shaded red), (b) the five nitrogenous bases, two five-carbon sugars, and phosphoric acid from which all nucleic acids are built, (c) five components found in many membrane lipids, and (d) α-D-glucose, the parent sugar from which most carbohydrates are derived. Note that phosphoric acid is a subunit of both nucleic acids and membrane lipids. The five-carbon and six-carbon sugars are shown here in their ring forms rather than their straight-chain forms (Chapter 11). All components are shown in their un-ionized form.
the 20 amino acids of proteins, alanine, valine, leucine, isoleucine, proline, glycine, serine, threonine, cysteine, tyrosine, tryptophan, aspartic acid, glutamic acid, histidine, asparagine, phenylalanine, arginine, lysine, methionine, glutamine, the components of nucleic acids, uracil, thymine, cytosine, adenine, guanine, α-D-ribose, 2-deoxy-α-D-ribose; phosphoric acid; some components of lipids, choline, glycerol, oleic acid, palmitic acid; the parent sugar, α-D-glucose
Lipids also are constructed from relatively few kinds of subunits. Most lipid molecules contain one or more long-chain fatty acids, of which palmitic acid and oleic acid are parent compounds. Many lipids also contain an alcohol, e.g., glycerol, and some contain phosphate (Fig. 3–16). Thus, only three dozen different organic compounds are the parents of most biomolecules.
Each of the compounds in Figure 3–16 has multiple functions in living organisms (Fig. 3–17). Amino acids are not only the monomeric subunits of proteins; some also act as neurotransmitters and as precursors of hormones and toxins. Adenine serves both as a subunit in the structure of nucleic acids and of ATP, and as a neurotransmitter. Fatty acids serve as components of complex membrane lipids, energy-rich fuel-storage fats, and the protective waxy coats on leaves and fruits. D-Glucose is the monomeric subunit of starch and cellulose, and also is the precursor of other sugars such as D-mannose and sucrose.
Figure 3–17  Each simple component in Fig. 3–16 is a precursor of many other kinds of biomolecules.
amino acids → proteins, peptide hormones, neurotransmitters, toxic alkaloids; adenine → nucleic acids, ATP, coenzymes, neurotransmitters; palmitic acid → membrane lipids, fats, waxes; glucose → cellulose, starch, fructose, mannose, sucrose, lactose
It is extremely improbable that amino acids in a mixture would spontaneously condense into a protein with a unique sequence. This would represent increased order in a population of molecules; but according to the second law of thermodynamics (Chapter 13) the tendency is toward ever-greater disorder in the universe. To bring about the synthesis of macromolecules from their monomeric subunits, free energy must be supplied to the system (the cell).
The randomness of the components of a chemical system is expressed as entropy, symbolized S. Any change in randomness of the system is the entropy change, ΔS, which has a positive value when randomness increases. J. Willard Gibbs, who developed the theory of energy changes during chemical reactions, showed that the free-energy content (G; recall Chapter 1) of any isolated system can be defined in terms of three quantities: enthalpy (H), reflecting the number and kinds of bonds; entropy (S); and the absolute temperature (T, in kelvins). The definition of free energy is: G = H − TS. When a chemical reaction occurs at constant temperature, the free-energy change (ΔG) is determined by ΔH, reflecting the kinds and numbers of chemical bonds and noncovalent interactions broken and formed, and ΔS, the change in the system’s randomness:
ΔG = ΔH − TΔS
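A short numerical sketch may help show how the three quantities combine in the relation ΔG = ΔH − TΔS. The values of ΔH and ΔS below are hypothetical, chosen only so that the sign of ΔG changes within a biologically plausible temperature range. They do not describe any particular reaction.

```python
def delta_g(delta_h: float, temperature: float, delta_s: float) -> float:
    """Free-energy change at constant temperature: dG = dH - T * dS.

    delta_h in kJ/mol, temperature in kelvins, delta_s in kJ/(mol*K).
    """
    return delta_h - temperature * delta_s


# Hypothetical values (assumptions for illustration only):
# an endothermic process that increases randomness.
DELTA_H = 15.0   # kJ/mol, positive: heat is absorbed
DELTA_S = 0.05   # kJ/(mol*K), positive: randomness increases

for temperature in (283.0, 310.0):
    dg = delta_g(DELTA_H, temperature, DELTA_S)
    verdict = "negative" if dg < 0 else "positive"
    print(f"T = {temperature:.0f} K: delta-G = {dg:+.2f} kJ/mol ({verdict})")
```

With these numbers the entropy term outweighs the unfavorable enthalpy term only at the higher temperature, which illustrates why all three factors must be considered together.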
Recall from Chapter 1 that a process tends to occur spontaneously only if ΔG is negative. How, then, can cells synthesize polymers such as proteins and nucleic acids, if the free-energy change for polymerizing subunits is positive? They couple these thermodynamically unfavorable (endergonic) reactions to other cellular reactions that liberate free energy (exergonic reactions), so that the sum of the free-energy changes is negative:
Amino acids → proteins    ΔG1 is positive (endergonic)
ATP → AMP + 2 PO₄³⁻    ΔG2 is negative (exergonic)
Sum: Amino acids + ATP → proteins + AMP + 2 PO₄³⁻
The sum of ΔG1 and ΔG2 is negative (the overall process is exergonic).
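The bookkeeping behind energy coupling is simple addition, as the following sketch shows. The two free-energy values are placeholders invented for illustration, since the text specifies only their signs. The function names are likewise hypothetical.

```python
def coupled_delta_g(*delta_gs: float) -> float:
    """Overall free-energy change of coupled reactions (sum of the parts)."""
    return sum(delta_gs)


def is_exergonic(delta_g: float) -> bool:
    """A process tends to occur spontaneously only if delta-G is negative."""
    return delta_g < 0


# Placeholder values in kJ/mol (assumptions; only the signs come from the text).
DELTA_G1 = +20.0   # polymerization of amino acids: endergonic
DELTA_G2 = -45.0   # breakdown of ATP: exergonic

overall = coupled_delta_g(DELTA_G1, DELTA_G2)
print(is_exergonic(DELTA_G1))  # False: polymerization alone is unfavorable
print(overall)                 # -25.0
print(is_exergonic(overall))   # True: the coupled process is favorable
```

Coupling works only when the exergonic reaction releases more free energy than the endergonic reaction requires. If the magnitudes were reversed, the sum would remain positive.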
The monomeric subunits in Figure 3–16 are very small compared with biological macromolecules. An amino acid molecule such as alanine is less than 0.5 nm long. Hemoglobin, the oxygen-carrying protein of erythrocytes, consists of nearly 600 amino acid units covalently linked into four long chains, which are folded into globular shapes and associated in a tetrameric structure with a diameter of 5.5 nm. Protein molecules in turn are small compared with ribosomes (about 20 nm in diameter), which contain about 70 different proteins and several different RNA molecules. Ribosomes, in their turn, are much smaller than organelles such as mitochondria, typically 1,000 nm in diameter. It is a long jump from the simple biomolecules to the larger cellular structures that can be seen with the light microscope. Figure 3–18 illustrates the structural hierarchy in cellular organization.
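The scale of this hierarchy is easier to appreciate as ratios. The sketch below uses the approximate sizes quoted in the paragraph above. The alanine figure is an upper bound ("less than 0.5 nm"), so the first and overall ratios are lower bounds.

```python
# Approximate sizes quoted in the text, in nanometers.
sizes_nm = [
    ("alanine", 0.5),          # upper bound given in the text
    ("hemoglobin", 5.5),
    ("ribosome", 20.0),
    ("mitochondrion", 1000.0),
]


def fold_increases(levels):
    """Return (smaller, larger, size ratio) for each adjacent pair of levels."""
    return [
        (small, large, large_size / small_size)
        for (small, small_size), (large, large_size) in zip(levels, levels[1:])
    ]


for small, large, ratio in fold_increases(sizes_nm):
    print(f"{large} is about {ratio:.1f} times the size of {small}")

overall_ratio = sizes_nm[-1][1] / sizes_nm[0][1]
print(f"mitochondrion vs. alanine: about {overall_ratio:.0f}-fold")
```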
Figure 3–18  The structural hierarchy in the molecular organization of cells. The nucleus of this plant cell, for example, contains several types of supramolecular complexes, including chromosomes. Chromosomes consist of macromolecules – DNA and many different proteins. Each type of macromolecule is constructed from simple subunits – DNA from the deoxyribonucleotides, for example.
level 4: the cell and its organelles; level 3: supramolecular complexes, chromosome, plasma membrane, cell wall; level 2: macromolecules, DNA, protein, cellulose; level 1: biomolecules, nucleotides, amino acids, sugars
(Adapted from Becker, W.M. and Deamer, D.W. (1991) The World of the Cell, 2nd edn, Fig. 2–15, The Benjamin/Cummings Publishing Company, Menlo Park, CA)
In proteins, nucleic acids, and polysaccharides, the individual subunits are joined by covalent bonds. By contrast, in supramolecular complexes, the different macromolecules are held together by noncovalent interactions – much weaker, individually, than covalent bonds. Among these are hydrogen bonds (between polar groups), ionic interactions (between charged groups), hydrophobic interactions (between nonpolar groups), and van der Waals interactions, all of which have energies of only a few kilojoules per mole (4 to 20 kJ/mol), compared with covalent bonds, which have bond energies of 200 to 900 kJ/mol (see Table 3–5). The nature of these noncovalent interactions will be described in the next chapter.
The large numbers of weak interactions between macromolecules in supramolecular complexes stabilize the resulting noncovalent structures.
Although the monomeric subunits of macromolecules are so much smaller than cells and organelles, they influence the shape and function of these much larger structures. In sickle-cell anemia, a hereditary human disorder, the hemoglobin molecule is defective. In the two β chains of hemoglobin from healthy individuals, a glutamic acid residue occurs at position 6. In people with sickle-cell anemia, a valine residue occurs at position 6. This single difference in the sequence of the 146 amino acids of the β chain affects only a tiny portion of the molecule, yet it causes the hemoglobin to form large aggregates within the erythrocytes, which become deformed (sickled) and function abnormally.
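The single-residue difference can be located by comparing the two sequences position by position. The text specifies only the change at position 6 (glutamic acid to valine). The other seven residues used below are the commonly cited start of the β chain and are included here as an assumption for illustration.

```python
# First eight residues of the hemoglobin beta chain (three-letter codes).
# Only position 6 is given in the text; the rest is an assumption drawn
# from standard references.
NORMAL_BETA_START = ["Val", "His", "Leu", "Thr", "Pro", "Glu", "Glu", "Lys"]
SICKLE_BETA_START = ["Val", "His", "Leu", "Thr", "Pro", "Val", "Glu", "Lys"]


def substitutions(reference, variant):
    """List (position, reference residue, variant residue), counting from 1."""
    return [
        (position, ref, var)
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


print(substitutions(NORMAL_BETA_START, SICKLE_BETA_START))
# [(6, 'Glu', 'Val')]
```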
Because all biological macromolecules are made from the same three dozen subunits, it seems likely that all living organisms descended from a single primordial cell line. These subunits are proposed to have had, singly and collectively, the most successful combination of chemical and physical properties for their function as the raw materials of biological macromolecules and for carrying out the basic energy-transforming and self-replicating features of a living cell. These primordial organic compounds may have been retained during biological evolution over billions of years because of their unique fitness.
Figure 3–19  Lightning evoked by a volcanic eruption that resulted in the formation of the island of Surtsey off the coast of Iceland in 1963. The intense fields of electrical, thermal, and shock-wave energy generated by such cataclysms, which were frequent on the primitive earth, could have been a major factor in the origin of organic compounds.
We come now to a puzzle. Apart from their occurrence in living organisms, organic compounds, including the basic biomolecules, occur only in trace amounts in the earth’s crust, the sea, and the atmosphere. How did the first living organisms acquire their characteristic organic building blocks? In 1922, the biochemist Aleksandr I. Oparin proposed a theory for the origin of life early in the history of the earth, postulating that the atmosphere was once very different from that of today. Rich in methane, ammonia, and water, and essentially devoid of oxygen, it was a reducing atmosphere, in contrast to the oxidizing environment of our era. In Oparin’s theory, electrical energy of lightning discharges or heat energy from volcanoes (Fig. 3–19) caused ammonia, methane, water vapor, and other components of the primitive atmosphere to react, forming simple organic compounds. These compounds then dissolved in the ancient seas, which over many millennia became enriched with a large variety of simple organic compounds. In this warm solution (the “primordial soup”) some organic molecules had a greater tendency than others to associate into larger complexes. Over millions of years, these in turn assembled spontaneously to form membranes and catalysts (enzymes), which came together to become precursors of the first primitive cells. For many years, Oparin’s views remained speculative and appeared untestable.
A classic experiment on the abiotic (nonbiological) origin of organic biomolecules was carried out in 1953 by Stanley Miller in the laboratory of Harold Urey. Miller subjected gaseous mixtures of NH3, CH4, water vapor, and H2 to electrical sparks produced across a pair of electrodes (to simulate lightning) for periods of a week or more (Fig. 3–20), then analyzed the contents of the closed reaction vessel. The gas phase of the resulting mixture contained CO and CO2, as well as the starting materials. The water phase contained a variety of organic compounds, including some amino acids, hydroxy acids, aldehydes, and hydrogen cyanide (HCN). This experiment established the possibility of abiotic production of biomolecules in relatively short times under relatively mild conditions.
electrodes, spark gap, condenser, mixture of NH3, CH4, H2, and H2O at 80 °C
Figure 3–20  Spark-discharge apparatus of the type used by Miller and Urey in experiments demonstrating abiotic formation of organic compounds under primitive atmospheric conditions. After subjecting the gaseous contents of the system to electrical sparks, products were collected by condensation. Biomolecules such as amino acids were among the products (see Table 3–8).
Several developments have allowed more refined studies of the type pioneered by Miller and Urey, and have yielded strong evidence that a wide variety of biomolecules, including proteins and nucleic acids, could have been produced spontaneously from simple starting materials probably present on the earth at the time life arose.
Modern extensions of the Miller experiments have employed “atmospheres” that include CO2 and HCN, and much improved technology for identifying small quantities of products. The formation of hundreds of organic compounds has been demonstrated (Table 3–8). These compounds include more than ten of the common amino acids, a variety of mono-, di-, and tricarboxylic acids, fatty acids, adenine, and formaldehyde. Under certain conditions, formaldehyde polymerizes to form sugars containing three, four, five, and six carbons. The sources of energy that are effective in bringing about the formation of these compounds include heat, visible and ultraviolet (UV) light, x rays, gamma radiation, ultrasound and shock waves, and alpha and beta particles.
Table 3–8  Some of the products shown to form under prebiotic conditions
Amino acids: glycine, alanine, α-aminobutyric acid, valine, leucine, isoleucine, proline, aspartic acid, glutamic acid, serine, threonine
Sugars: straight and branched pentoses and hexoses
Carboxylic acids: formic acid, acetic acid, propionic acid, straight and branched fatty acids (C4–C10), glycolic acid, lactic acid, succinic acid
Nucleic acid bases: adenine, guanine, xanthine, hypoxanthine, cytosine, uracil
Source: From Miller, S.L. (1987) Which organic compounds could have occurred on the prebiotic earth? Cold Spring Harb. Symp. Quant. Biol. 52, 17–27.
HCN, light → diaminomaleonitrile → diiminosuccinonitrile; two glycine molecules, condensation → H2O, glycylglycine
In addition to the many monomers that form in these experiments, polymers of nucleotides (nucleic acids) and of amino acids (proteins) also form. Some of the products of the self-condensation of HCN are effective promoters of such polymerization reactions (Fig. 3–21), and inorganic ions present in the earth’s crust (Cu2+, Ni2+, and Zn2+) also enhance the rate of polymerization.
Figure 3–21  Among the products of electrical discharge through an atmosphere containing HCN are compounds such as those in (a). These compounds promote the polymerization of monomers such as amino acids into polymers (b).
In short, laboratory experiments on the spontaneous formation of biomolecules under prebiotic conditions have provided good evidence that many of the chemical components of living cells, including proteins and RNA, can form under these conditions. Short polymers of RNA can act as catalysts in biologically significant reactions (Chapter 25), and it seems likely that RNA played a crucial role in prebiotic evolution, both as catalyst and as information repository.
In modern organisms, nucleic acids encode the genetic information that specifies the structure of enzymes, and enzymes have the ability to catalyze the replication and repair of nucleic acids. The mutual dependence of these two classes of biomolecules poses the perplexing question: which came first, DNA or protein?
creation of prebiotic soup, including nucleotides, from earth’s primitive atmosphere → production of short RNA molecules with random sequences → selective replication of self-duplicating catalytic RNA segments → synthesis of specific peptides, catalyzed by RNA → increasing role of peptides in RNA replication; coevolution of RNA and protein → primitive translation system develops, with RNA genome and RNA–protein catalysts → genomic RNA begins to be copied into DNA → DNA genome, translated on RNA–protein complex (ribosome) with protein catalysts
Figure 3–22  One possible “RNA world” scenario, showing the transition from the prebiotic RNA world (shades of yellow) to the biotic DNA world (orange).
The answer may be: neither. The discovery that RNA molecules can act as catalysts in their own formation suggests that RNA may have been the first gene and the first catalyst. According to this scenario (Fig. 3–22), one of the earliest stages of biological evolution was the chance formation, in the primordial soup, of an RNA molecule that had the ability to catalyze the formation of other RNA molecules of the same sequence – a self-replicating, self-perpetuating RNA. The concentration of a self-replicating RNA molecule would increase exponentially, as one molecule formed two, two formed four, and so on. The fidelity of self-replication was presumably less than perfect, so the process would generate variants of the RNA, some of which might be even better able to self-replicate. In the competition for nucleotides, the most efficient of the self-replicating sequences would win, and less efficient replicators would fade from the population.
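The competition described above can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: two replicator types, fixed growth factors per generation, and a cap on the total population standing in for the limited supply of nucleotides. It is not a model taken from the text or from the RNA-world literature.

```python
def compete(counts, growth_factors, capacity, generations):
    """Grow each replicator type, then rescale if the shared pool is exceeded.

    counts: initial number of molecules of each type
    growth_factors: multiplication per generation (2.0 means one forms two)
    capacity: maximum total population the nucleotide supply can support
    Returns a list with the population at every generation.
    """
    history = [dict(counts)]
    for _ in range(generations):
        grown = {name: counts[name] * growth_factors[name] for name in counts}
        total = sum(grown.values())
        if total > capacity:
            # Limited nucleotides: every type is cut back proportionally.
            scale = capacity / total
            grown = {name: n * scale for name, n in grown.items()}
        counts = grown
        history.append(dict(counts))
    return history


# Hypothetical parameters (assumptions, not values from the text).
history = compete(
    counts={"efficient": 1.0, "less_efficient": 1.0},
    growth_factors={"efficient": 2.0, "less_efficient": 1.5},
    capacity=1e6,
    generations=60,
)

final = history[-1]
efficient_share = final["efficient"] / sum(final.values())
print(history[1]["efficient"], history[2]["efficient"])  # 2.0 4.0
print(f"efficient replicator share after 60 generations: {efficient_share:.6f}")
```

Early on, both types grow exponentially. Once the pool is exhausted, the faster replicator takes over almost the entire population and the slower one dwindles below its starting number.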
The division of function between DNA (genetic information storage) and protein (catalysis) was, according to the “RNA world” hypothesis, a later development (Fig. 3–22). New variants of self-replicating RNA molecules developed, with the additional ability to catalyze the condensation of amino acids into peptides. Occasionally, the peptide(s) thus formed would reinforce the self-replicating ability of the RNA, and the pair – RNA molecule and helping peptide – could undergo further modifications in sequence, generating even more efficient self-replicating systems. Sometime after the evolution of this primitive protein-synthesizing system, there was a further development: DNA molecules with sequences complementary to the self-replicating RNA molecules took over the function of conserving the “genetic” information, and RNA molecules evolved to play roles in protein synthesis. Proteins proved to be versatile catalysts, and over time, assumed that function. Lipidlike compounds in the primordial soup formed relatively impermeable layers surrounding self-replicating collections of molecules. The concentration of proteins and nucleic acids within these lipid enclosures favored the molecular interactions required in self-replication.
This “RNA world” hypothesis is plausible but by no means universally accepted. The hypothesis does make testable predictions, and to the extent that experimental tests are possible within finite times (less than or equal to the life span of a scientist!), the hypothesis will be tested and refined.
The earth was formed about 4.5 billion years ago, and the first definitive evidence of life dates to about 3.5 billion years ago. An international group of scientists showed in 1980 that certain ancient rock formations (stromatolites; Fig. 3–23) in western Australia contained fossils of primitive microorganisms. Somewhere on earth during that first billion-year period, there arose the first simple organism, capable of replicating its own structure from a template (RNA?) that was the first genetic material. Because the terrestrial atmosphere at the dawn of life was nearly devoid of oxygen, and because there were few microorganisms to scavenge organic compounds formed by natural processes, these compounds were relatively stable. Given this stability and eons of time, the improbable became inevitable: the organic compounds were incorporated into evolving cells to produce more and more effective self-reproducing catalysts. The process of biological evolution had begun. Organisms developed mechanisms for harnessing the energy of sunlight through photosynthesis, to make sugars and other organic molecules from carbon dioxide, and to convert molecular nitrogen from the atmosphere into nitrogenous biomolecules such as amino acids. By developing their own capacities to synthesize biomolecules, cells became independent of the random processes by which such compounds had first appeared on earth. As evolution proceeded, organisms began to interact and to derive mutual benefits from each other’s products, forming increasingly complex ecological systems.
Figure 3–23  Ancient reefs in Australia contain fossil evidence of microbial life in the sea of 3.5 billion years ago. Bits of sand and limestone became trapped in the sticky extracellular coats of cyanobacteria, gradually building up these stromatolites found in Hamelin Pool, Western Australia (a). Microscopic examination of thin sections of stromatolite reveals microfossils of filamentous bacteria (b).
Most of the dry weight of living organisms consists of organic compounds, molecules containing covalently bonded carbon backbones to which other carbon, hydrogen, oxygen, or nitrogen atoms may be attached. Carbon appears to have been selected in the course of biological evolution because of the ability of carbon atoms to form single and double bonds with each other, making possible formation of linear, cyclic, and branched backbone structures in great variety. To these backbones are attached different kinds of functional groups, which determine the chemical properties of the molecules. Organic biomolecules also have characteristic shapes (configurations and conformations) in three dimensions. Many biomolecules occur in asymmetric or chiral forms called enantiomers, stereoisomers that are nonsuperimposable mirror images of each other. Usually, only one of a pair of enantiomers has biological activity.
The strength of covalent chemical bonds, measured in joules, depends on the electronegativities and sizes of the atoms that share electrons. The enthalpy change (ΔH) for a chemical reaction reflects the number and kind of bonds made and broken. For endothermic reactions, ΔH is positive; for exothermic reactions, negative. The many different chemical reactions that occur within a cell fall into five general categories: group transfers, oxidation–reduction reactions, rearrangements of the bonds around carbon atoms, breakage or formation of carbon–carbon bonds, and condensations.
Most of the organic matter in living cells consists of macromolecules: nucleic acids, proteins, and polysaccharides. Each type of macromolecule is composed of small, covalently linked monomeric subunits of relatively few kinds. Proteins are polymers of 20 different kinds of amino acids, nucleic acids are polymers of different nucleotide units (four in DNA, four in RNA), and polysaccharides are polymers of recurring sugar units. Nucleic acids and proteins are informational macromolecules; the characteristic sequences of their subunits constitute the genetic individuality of a species. Simple polysaccharides act as structural components, but some complex polysaccharides also are informational macromolecules.
There is a structural hierarchy in the molecular organization of cells. Cells contain organelles, such as nuclei, mitochondria, and chloroplasts, which in turn contain supramolecular complexes, such as membranes and ribosomes, and these consist in turn of clusters of macromolecules that are bound together by many relatively weak, noncovalent forces. The macromolecules consist of covalently linked subunits. The formation of macromolecules from simple subunits creates order (decreases entropy); this synthesis requires energy and therefore must be coupled to exergonic reactions.
The small biomolecules such as amino acids and sugars probably first arose spontaneously from atmospheric gases and water under the influence of electrical energy (lightning) during the early history of the earth. Such processes, called chemical evolution, can be simulated in the laboratory. The monomeric subunits of cellular macromolecules appear to have been selected during early biological evolution as being the most fit for their biological functions. These subunit molecules are relatively few in number, but are very versatile; evolution has combined small biomolecules to yield macromolecules capable of diverse biological functions. The first macromolecules may have been RNA molecules that were capable of catalyzing their own replication. Later in evolution, DNA took over the function of storing genetic information, proteins became the cellular catalysts, and RNA mediated between these, allowing the expression of genetic information as proteins.
1. Vitamin C: Is the Synthetic Vitamin as Good as the Natural One?  One claim put forth by purveyors of health foods is that vitamins obtained from natural sources are more healthful than those obtained by chemical synthesis. For example, it is claimed that pure L-ascorbic acid (vitamin C) obtained from rose hips is better for you than pure L-ascorbic acid manufactured in a chemical plant. Are the vitamins from the two sources different? Can the body distinguish a vitamin’s source?
2. Identification of Functional Groups  Figure 3–5 shows the common functional groups of biomolecules. Since the properties and biological activities of biomolecules are largely determined by their functional groups, it is important to be able to identify them. In each of the molecules at right, identify the constituent functional groups.
3. Drug Activity and Stereochemistry  The quantitative differences in biological activity between the two enantiomers of a compound are sometimes quite large. For example, the D-isomer of the drug isoproterenol, used to treat mild asthma, is 50 to 80 times more effective as a bronchodilator than the L-isomer. Identify the chiral center in isoproterenol. Why would the two enantiomers have such radically different bioactivity?

4. Drug Action and Shape of Molecules  Some years ago two drug companies marketed a drug under the trade names Dexedrine and Benzedrine. The structure of the drug is shown below.

The physical properties (C, H, and N analysis, melting point, solubility, etc.) of Dexedrine and Benzedrine were identical. The recommended oral dosage of Dexedrine (which is still available) was 5 mg/d, but the recommended dosage of Benzedrine was significantly higher. Apparently it required considerably more Benzedrine than Dexedrine to yield the same physiological response. Explain this apparent contradiction.
5. Components of Complex Biomolecules  Figure 3–16 shows the structures of the major components of complex biomolecules. For each of the three important biomolecules below (shown in their ionized forms at physiological pH), identify the constituents.
       (a) Guanosine triphosphate (GTP), an energy-rich nucleotide that serves as precursor to RNA:

       (b) Phosphatidylcholine, a component of many membranes:

       (c) Methionine enkephalin, the brain’s own opiate:

6. Determination of the Structure of a Biomolecule  An unknown substance, X, was isolated from rabbit muscle. The structure of X was determined from the following observations and experiments. Qualitative analysis showed that X was composed entirely of C, H, and O. A weighed sample of X was completely oxidized, and the amount of H2O and CO2 produced was measured. From this quantitative analysis, it was concluded that X contains 40.00% C, 6.71% H, and 53.29% O by weight. The molecular mass of X was determined by a mass spectrometer and found to be 90.00. An infrared spectrum of X showed that it contained one double bond. X dissolved readily in water to give an acidic solution. A solution of X was tested in a polarimeter and demonstrated optical activity.
       (a) Determine the empirical and molecular formula of X.
       (b) Draw the possible structures of X that fit the molecular formula and contain one double bond. Consider only linear or branched structures and disregard cyclic structures. Note that oxygen makes very poor bonds to itself.
       (c) What is the structural significance of the observed optical activity? Which structures in (b) does this observation eliminate? Which structures are consistent with the observation?
       (d) What is the structural significance of the observation that a solution of X was acidic? Which structures in (b) are now eliminated? Which structures are consistent with the observation?
       (e) What is the structure of X? Is more than one structure consistent with all the data?
This view of the earth from space shows that most of the planet’s surface is covered with water. The seas, where living organisms probably first arose, are today the habitat of countless modern organisms.
Chapter 4
Water: Its Effect on Dissolved Biomolecules
Water is the most abundant substance in living systems, making up 70% or more of the weight of most organisms. Water pervades all portions of every cell and is the medium in which the transport of nutrients, the enzyme-catalyzed reactions of metabolism, and the transfer of chemical energy occur. The first living organisms probably arose in the primeval oceans; evolution was shaped by the properties of the medium in which it occurred. All aspects of cell structure and function are adapted to the physical and chemical properties of water. This chapter begins with descriptions of these physical and chemical properties. The strong attractive forces between water molecules result in water’s solvent properties. The slight tendency of water to ionize is also of crucial importance to the structure and function of biomolecules, and we will review the topic of ionization in terms of equilibrium constants, pH, and titration curves. Finally, we will consider the way in which aqueous solutions of weak acids or bases and their salts act as buffers against pH changes in biological systems. The water molecule and its ionization products, H+ and OH−, profoundly influence the structure, self-assembly, and properties of all cellular components, including enzymes and other proteins, nucleic acids, and lipids. The noncovalent interactions responsible for the specificity of “recognition” among biomolecules are decisively influenced by the solvent properties of water.
Hydrogen bonds between water molecules provide the cohesive forces that make water a liquid at room temperature and that favor the extreme ordering of molecules typical of crystalline water (ice). Polar biomolecules dissolve readily in water because they can replace energetically favorable water–water interactions with even more favorable water–solute interactions (hydrogen bonds and electrostatic interactions). In contrast, nonpolar biomolecules interfere with favorable water–water interactions and are poorly soluble in water. In aqueous solutions, these molecules tend to cluster together to minimize the energetically unfavorable effects of their presence.
Hydrogen bonds and ionic, hydrophobic (Greek, “water-fearing”), and van der Waals interactions, although individually weak, are numerous in biological macromolecules and collectively have a very significant influence on the three-dimensional structures of proteins, nucleic acids, polysaccharides, and membrane lipids. Before we begin a detailed discussion of these biomolecules in the following chapters, it is useful to review the properties of the solvent, water, in which they are assembled and carry out their functions.
Figure labels (bond lengths): hydrogen bond, 0.177 nm; covalent bond, 0.0965 nm.
Water has a higher melting point, boiling point, and heat of vaporization than most other common liquids (Table 4–1). These unusual properties are a consequence of strong attractions between adjacent water molecules, which give liquid water great internal cohesion.
Table 4–1  Melting point, boiling point, and heat of vaporization of some common liquids

Liquid                       Melting point (°C)   Boiling point (°C)   Heat of vaporization (J/g)*
Water                        0                    100                  2,260
Methanol (CH3OH)             –98                  65                   1,100
Ethanol (CH3CH2OH)           –117                 78                   854
Propanol (CH3CH2CH2OH)       –127                 97                   687
Butanol (CH3(CH2)2CH2OH)     –90                  117                  590
Acetone (CH3COCH3)           –95                  56                   523
Hexane (CH3(CH2)4CH3)        –98                  69                   423
Benzene (C6H6)               6                    80                   394
Butane (CH3(CH2)2CH3)        –135                 –0.5                 381
Chloroform (CHCl3)           –63                  61                   247

* The heat energy required to convert 1.0 g of a liquid at its boiling point, at atmospheric pressure, into its gaseous state at the same temperature. It is a direct measure of the energy required to overcome attractive forces between molecules in the liquid phase.
What is the cause of these strong intermolecular attractions in liquid water? Each hydrogen atom of a water molecule shares an electron pair with the oxygen atom. The geometry of the water molecule is dictated by the shapes of the outer electron orbitals of the oxygen atom, which are similar to the bonding orbitals of carbon (see Fig. 3–4a). These orbitals describe a rough tetrahedron, with a hydrogen atom at each of two corners and unshared electrons at the other two (Fig. 4–1). The H–O–H bond angle is 104.5°, slightly less than the 109.5° of a perfect tetrahedron; the nonbonding orbitals of the oxygen atom slightly compress the orbitals shared by hydrogen.
The oxygen nucleus attracts electrons more strongly than does the hydrogen nucleus (i.e., the proton); oxygen is more electronegative (see Table 3–4). The sharing of electrons between H and O is therefore unequal; the electrons are more often in the vicinity of the oxygen atom than of the hydrogen. The result of this unequal electron sharing is two electric dipoles in the water molecule, one along each of the H–O bonds; the oxygen atom bears a partial negative charge (δ−), and each hydrogen a partial positive charge (δ+). The resulting electrostatic attraction between the oxygen atom of one water molecule and the hydrogen of another (Fig. 4–1c) constitutes a hydrogen bond.
Figure 4–1  The dipolar nature of the H2O molecule, shown (a) by ball-and-stick and (b) by space-filling models. The dashed lines in (a) represent the nonbonding orbitals. There is a nearly tetrahedral arrangement of the outer shell electron pairs around the oxygen atom; the two hydrogen atoms have localized partial positive charges and the oxygen atom has two localized partial negative charges. (c) Two H2O molecules joined by a hydrogen bond (designated by three blue lines) between the oxygen atom of the upper molecule and a hydrogen atom of the lower one. Hydrogen bonds are longer and weaker than covalent O–H bonds.
Hydrogen bonds are weaker than covalent bonds. The hydrogen bonds in liquid water have a bond energy (the energy required to break a bond) of only about 20 kJ/mol, compared with 460 kJ/mol for the covalent O–H bond. At room temperature, the thermal energy of an aqueous solution (the kinetic energy resulting from the motion of individual atoms and molecules) is of the same order as that required to break hydrogen bonds. When water is heated, its temperature increase reflects the faster motion of individual water molecules. Although at any given time most of the molecules in liquid water are hydrogen-bonded, the lifetime of each hydrogen bond is less than 1 × 10⁻⁹ s. The apt phrase “flickering clusters” has been applied to the short-lived groups of hydrogen-bonded molecules in liquid water. The very large number of hydrogen bonds between molecules nevertheless confers great internal cohesion on liquid water.
Figure 4–2  In ice, each water molecule forms the maximum of four hydrogen bonds, creating a regular crystal lattice. In liquid water at room temperature, by contrast, each water molecule forms an average of 3.4 hydrogen bonds with other water molecules. The crystal lattice of ice occupies more space than the same number of H2O molecules occupy in liquid water; ice is less dense than liquid water, and thus floats.
The nearly tetrahedral arrangement of the orbitals about the oxygen atom (Fig. 4–1a) allows each water molecule to form hydrogen bonds with as many as four neighboring water molecules. At any given instant in liquid water at room temperature, each water molecule forms hydrogen bonds with an average of 3.4 other water molecules. The water molecules are in continuous motion in the liquid state, so hydrogen bonds are constantly and rapidly being broken and formed. In ice, however, each water molecule is fixed in space and forms hydrogen bonds with four other water molecules, to yield a regular lattice structure (Fig. 4–2). To break the large numbers of hydrogen bonds in such a lattice requires much thermal energy, which accounts for the relatively high melting point of water (Table 4–1). When ice melts or water evaporates, heat is taken up by the system:
H2O(s)  →  H2O(l)   ΔH = +5.9 kJ/mol
H2O(l)  →  H2O(g)   ΔH = +44.0 kJ/mol
During melting or evaporation, the entropy of the aqueous system increases as more highly ordered arrays of water molecules relax into the less orderly hydrogen-bonded arrays in liquid water, or the wholly disordered water molecules in the gaseous state. At room temperature, both the melting of ice and the evaporation of water occur spontaneously; the tendency of the water molecules to associate through hydrogen bonds is outweighed by the energetic push toward randomness. Recall that the free-energy change (ΔG) must have a negative value for a process to occur spontaneously: ΔG = ΔH − TΔS, where ΔG represents the driving force, ΔH the energy from making and breaking bonds, and ΔS the increase in randomness. Since ΔH is positive for melting and evaporation, it is clearly the increase in entropy (ΔS) that makes ΔG negative and drives these transformations.
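The sign argument can be made concrete with a short calculation. The sketch below takes the two ΔH values quoted above and asks how large ΔS would have to be for ΔG = ΔH − TΔS to become negative. The function names and the use of 298.15 K for "room temperature" are illustrative assumptions, not values from the text.

```python
# Minimal sketch: how large must the entropy increase be for a process
# with a positive enthalpy change to have a negative free-energy change?

T_ROOM_K = 298.15  # assumed value for room temperature (25 °C)


def delta_g(delta_h_kj, delta_s_j_per_k, temp_k=T_ROOM_K):
    """Return delta_G (kJ/mol) from delta_H (kJ/mol) and delta_S (J/(mol K))."""
    return delta_h_kj - temp_k * delta_s_j_per_k / 1000.0


def min_delta_s(delta_h_kj, temp_k=T_ROOM_K):
    """Return the delta_S (J/(mol K)) at which delta_G is exactly zero."""
    return 1000.0 * delta_h_kj / temp_k


# Enthalpy changes quoted in the text for melting and evaporation (kJ/mol).
for name, dh in [("melting", 5.9), ("evaporation", 44.0)]:
    threshold = min_delta_s(dh)
    print(f"{name}: delta_S must exceed {threshold:.1f} J/(mol K) for delta_G < 0")
```

Any ΔS above the printed threshold gives a negative ΔG at the assumed temperature, which is the sense in which the entropy term "drives" a transformation whose ΔH is positive.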
Hydrogen bonds are not unique to water. They readily form between an electronegative atom (usually oxygen or nitrogen) and a hydrogen atom covalently bonded to another electronegative atom in the same or another molecule (Fig. 4–3). However, hydrogen atoms covalently bonded to carbon atoms, which are not electronegative, do not participate in hydrogen bonding. The distinction explains why butanol (CH3CH2CH2CH2OH) has a relatively high boiling point of 117 °C, whereas butane (CH3CH2CH2CH3) has a boiling point of only –0.5 °C. Butanol has a polar hydroxyl group and thus can form hydrogen bonds with other butanol molecules.
Figure 4–3  Common types of hydrogen bonds. In biological systems, the electronegative atom (the hydrogen acceptor) is usually oxygen or nitrogen. The distance between two hydrogen-bonded atoms varies from 0.26 to 0.31 nm.
Figure 4–3 labels: hydrogen donor, hydrogen acceptor.
Figure 4–4  Some hydrogen bonds of biological importance: between the hydroxyl group of an alcohol and water; between the carbonyl group of a ketone and water; between two polypeptide chains; and between two complementary bases (thymine and adenine) of two strands of DNA.
Uncharged but polar biomolecules such as sugars dissolve readily in water because of the stabilizing effect of the many hydrogen bonds that form between the hydroxyl groups or the carbonyl oxygen of the sugar and the polar water molecules. Alcohols, aldehydes, and ketones all form hydrogen bonds with water, as do compounds containing N–H bonds (Fig. 4–4), and molecules containing such groups tend to be soluble in water.
Hydrogen bonds are strongest when the bonded molecules are oriented to maximize electrostatic interaction, which occurs when the hydrogen atom and the two atoms that share it are in a straight line (Fig. 4–5). Hydrogen bonds are thus highly directional and capable of holding two hydrogen-bonded molecules or groups in a specific geometric arrangement. We shall see later that this property of hydrogen bonds confers very precise three-dimensional structures upon protein and nucleic acid molecules, in which there are many intramolecular hydrogen bonds.
Figure 4–5  Directionality of the hydrogen bond. The attraction between the partial electric charges (see Fig. 4–1) is greatest when the three atoms involved (in this case O, H, and O) lie in a straight line. (Panel labels: strong hydrogen bond; weaker hydrogen bond.)
Water is a polar solvent. It readily dissolves most biomolecules, which are generally charged or polar compounds (Table 4–2); compounds that dissolve easily in water are hydrophilic (Greek, “water-loving”). In contrast, nonpolar solvents such as chloroform and benzene are poor solvents for polar biomolecules, but easily dissolve nonpolar biomolecules such as lipids and waxes.
Water dissolves salts such as NaCl by hydrating and stabilizing the Na+ and Cl− ions, weakening their electrostatic interactions and thus counteracting their tendency to associate in a crystalline lattice (Fig. 4–6). The solubility of charged biomolecules in water is also a result of hydration and charge screening. Compounds with functional groups such as ionized carboxylic acids (–COO−), protonated amines (–NH3+), and phosphate esters or anhydrides are generally soluble in water for the same reason.
Water is especially effective in screening the electrostatic interactions between dissolved ions. The strength, or force (F), of these ionic interactions depends upon the magnitude of the charges (Q), the distance between the charged groups (r), and the dielectric constant (ϵ) of the solvent through which the interactions occur:
$$F = \frac{Q_1 Q_2}{\epsilon r^2}$$
The dielectric constant is a physical property reflecting the number of dipoles in a solvent. For water at 25 °C, ϵ (which is dimensionless) is 78.5, and for the very nonpolar solvent benzene, ϵ is 4.6. Thus ionic interactions are much stronger in less polar environments. The dependence on r² is such that ionic attractions or repulsions operate over limited distances, in the range of 10 to 40 nm (depending on the electrolyte concentration) when the solvent is water.
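To see how strongly the solvent screens charges, the force between the same pair of ions can be compared in water and in benzene. The sketch below uses the SI form of the relation above, F = Q1Q2/(4πϵ0ϵr²). The 4πϵ0 prefactor, the 0.5 nm separation, and the function name are assumptions added for illustration. Only the two dielectric constants come from the text.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m (SI prefactor, an addition here)


def coulomb_force(q1, q2, r_m, dielectric):
    """Force in newtons between charges q1 and q2 (C) a distance r_m (m) apart.

    A negative result means the charges attract each other.
    """
    return q1 * q2 / (4 * math.pi * EPSILON_0 * dielectric * r_m ** 2)


r = 0.5e-9  # hypothetical separation of 0.5 nm
f_water = coulomb_force(E_CHARGE, -E_CHARGE, r, 78.5)    # dielectric constant of water at 25 °C
f_benzene = coulomb_force(E_CHARGE, -E_CHARGE, r, 4.6)   # dielectric constant of benzene

print(f"force in water:   {f_water:.3e} N")
print(f"force in benzene: {f_benzene:.3e} N")
print(f"benzene/water ratio: {f_benzene / f_water:.1f}")
```

The ratio depends only on the two dielectric constants (78.5/4.6), so the choice of charges and distance does not affect it.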
Table 4–2  Some examples of polar, nonpolar, and amphipathic biomolecules (shown in the original as ionic forms at pH 7)

Polar: glucose; glycine (+NH3–CH2–COO−); aspartic acid; lactic acid; glycerol
Nonpolar: typical wax
Amphipathic: phenylalanine; phosphatidylcholine
Figure 4–6 labels: Cl−, Na+, H2O, hydrated Cl− ion, hydrated Na+ ion.
As a salt such as NaCl dissolves, the Na+ and Cl− ions leaving the crystal lattice acquire far greater freedom of motion (Fig. 4–6). The resulting increase in the entropy (randomness) of the system is largely responsible for the ease of dissolving salts such as NaCl in water. In thermodynamic terms, formation of the solution occurs with a favorable change in free energy: ΔG = ΔH − TΔS, where ΔH has a small positive value and TΔS a large positive value; thus ΔG is negative.
Figure 4–6  Water dissolves many crystalline salts by hydrating their component ions. The NaCl crystal lattice is disrupted as water molecules cluster about the Cl− and Na+ ions. The ionic charges are thus partially neutralized, and the electrostatic attractions necessary for lattice formation are weakened.
The biologically important gases CO2, O2, and N2 are nonpolar. In the diatomic molecules O2 and N2, electrons are shared equally by both atoms. In CO2, each C=O bond is polar, but the two dipoles are oppositely directed and cancel each other (Table 4–3). The movement of these molecules from the disordered gas phase into aqueous solution constrains their motion and therefore represents a decrease in entropy. These gases are consequently very poorly soluble in water (Table 4–3). Some organisms have water-soluble carrier proteins (hemoglobin and myoglobin, for example) that facilitate the transport of O2. Carbon dioxide forms carbonic acid (H2CO3) in aqueous solution, and is transported in that form.
Two other gases, NH3 and H2S, also have biological roles in some organisms; these are polar and dissolve readily in water (Table 4–3).
Table 4–3  Solubilities of some gases in water

Gas                Structure*     Polarity    Solubility in water (g/L)   Temperature (°C)
Nitrogen           N≡N            Nonpolar    0.018                       40
Oxygen             O=O            Nonpolar    0.035                       50
Carbon dioxide     (not shown)    Nonpolar    0.97                        45
Ammonia            (not shown)    Polar       900                         10
Hydrogen sulfide   (not shown)    Polar       1,860                       40

* The arrows represent electric dipoles; there is a partial negative charge (δ−) at the head of the arrow, a partial positive charge (δ+; not shown here) at the tail.
When water is mixed with a hydrocarbon such as benzene or hexane, two phases form; neither liquid is soluble in the other. Shorter hydrocarbons such as ethane have small but measurable solubility in water. Nonpolar compounds such as benzene, hexane, and ethane are hydrophobic; they are unable to undergo energetically favorable interactions with water molecules, and they actually interfere with the hydrogen bonding among water molecules. All solute molecules or ions dissolved in water interfere with the hydrogen bonding of some water molecules in their immediate vicinity, but polar or charged solutes (such as NaCl) partially compensate for lost hydrogen bonds by forming new solute–water interactions. The net change in enthalpy (ΔH) for dissolving these solutes is generally small. Hydrophobic solutes offer no such compensation, and their addition to water may therefore result in a small gain of enthalpy; the breaking of hydrogen bonds requires the addition of energy to the system. Furthermore, dissolving hydrophobic solutes in water results in a measurable decrease in entropy. Water molecules in the immediate vicinity of a nonpolar solute are constrained in their possible orientations, resulting in a shell of highly ordered water molecules around each solute molecule. The number of water molecules in the highly ordered shell is proportional to the surface area of the hydrophobic solute. The free-energy change for dissolving a nonpolar solute in water is thus unfavorable: ΔG = ΔH − TΔS, where ΔH has a positive value, ΔS a negative value, and thus ΔG is positive.
Amphipathic compounds contain regions that are polar (or charged) and regions that are nonpolar (Table 4–2). When amphipathic compounds are mixed with water, the two regions of the solute molecule experience conflicting tendencies; the polar or charged, hydrophilic region interacts favorably with the solvent and tends to dissolve, but the nonpolar, hydrophobic region has the opposite tendency, to avoid contact with the water (Fig. 4–7a). The nonpolar regions of the molecules cluster together to present the smallest hydrophobic area to the solvent, and the polar regions are arranged to maximize their interaction with the aqueous solvent (Fig. 4–7b). These stable structures of amphipathic compounds in water, called micelles, may contain hundreds or thousands of molecules. The forces that hold the nonpolar regions of the molecules together are called hydrophobic interactions. The strength of these interactions is not due to any intrinsic attraction between nonpolar molecules. Rather, it results from the system’s achieving greatest thermodynamic stability by minimizing the entropy decrease that results from the ordering of water molecules around hydrophobic portions of the solute molecule.
Figure 4–7 labels: “flickering clusters” of H2O molecules in bulk phase; hydrophilic “head group”; highly ordered H2O molecules form “cages” around the hydrophobic alkyl chains. Dispersion of lipids in H2O: each lipid molecule forces surrounding H2O molecules to become highly ordered. Clusters of lipid molecules: only lipid portions at the edge of the cluster force the ordering of water; fewer H2O molecules are ordered, and entropy is increased. Micelles: all hydrophobic groups are sequestered from water; no highly ordered shell of H2O molecules is present, and entropy is increased.
Figure 4–7  (a) The long-chain fatty acids have very hydrophobic alkyl chains, each of which is surrounded by a layer of highly ordered water molecules. (b) By clustering together in micelles, the fatty acid molecules expose a smaller hydrophobic surface area to the water, and fewer water molecules are found in the shell of ordered water. The energy gained by freeing immobilized water molecules stabilizes the micelle.
Many biomolecules are amphipathic (Table 4–2); proteins, pigments, certain vitamins, and the sterols and phospholipids of membranes all have polar and nonpolar surface regions. Structures composed of these molecules are stabilized by hydrophobic interactions among the nonpolar regions. Hydrophobic interactions among lipids, and between lipids and proteins, are the most important determinants of structure in biological membranes; and hydrophobic interactions between nonpolar amino acids stabilize the three-dimensional folding patterns of proteins.
Hydrogen bonding between water and polar solutes also causes some ordering of water molecules, but the effect is less significant than with nonpolar solutes. Part of the driving force for the binding of a polar substrate to the complementary polar surface of an enzyme is the entropy increase resulting from the disordering of ordered water molecules around the substrate (reactant), as the enzyme displaces hydrogen-bonded water from the substrate.
Figure 4–8  The changes in energy as two atoms approach. Two opposite forces operate on the atoms, plotted here as a function of the distance between the atoms: an attraction that increases as the two approach (blue), and a repulsion that increases very sharply as the atoms come so close that their outer electron orbitals overlap (black). The net energy of the interaction is the sum of these two (red); an energy minimum occurs just before the repulsive effect dominates (at rme). The closest approach that is energetically feasible, rv, is the sum of the van der Waals radii of the two atoms.
When two uncharged atoms are brought very close together, their surrounding electron clouds influence each other. Random variations in the positions of the electrons around one nucleus may create a transient electric dipole, which induces a transient, opposite electric dipole in the nearby atom. The two dipoles are weakly attracted to each other, bringing the two nuclei closer. The force of this weak attraction is the van der Waals interaction. As the two nuclei draw closer together, their electron clouds begin to repel each other, and at some point the van der Waals attraction exactly balances this repulsive force (Fig. 4–8); the nuclei cannot be brought closer, and are said to be in van der Waals contact. For each atom, there is a characteristic van der Waals radius, a measure of how close that atom will allow another to approach (see Table 3–3).
Figure 4–8 labels: van der Waals radii; energy of interaction; rv; rme; distance between centers of atoms.
The noncovalent interactions we have described (hydrogen bonds and ionic, hydrophobic, and van der Waals interactions) are much weaker than covalent bonds (see Table 3–5). The input of about 350 kJ of energy is required to break a mole (6 × 10²³) of C–C single bonds, and of about 410 kJ to break a mole of C–H bonds, but only 4 to 8 kJ is sufficient to disrupt a mole of typical van der Waals interactions (Table 4–4). Hydrophobic interactions are similarly weak, and ionic interactions and hydrogen bonds are only a little stronger; a typical hydrogen bond in aqueous solvent can be broken by the input of about 20 kJ/mol.
Table 4–4  Four weak interactions among biomolecules in aqueous solvent

Weak interaction                          Stabilization energy (kJ/mol)
Hydrogen bonds
  Between neutral groups                  8–21
  Between peptide bonds                   8–21
Ionic interactions
  Attraction                              42
  Repulsion                               ≈–21
Hydrophobic interactions                  4–8
van der Waals interactions
  Any two atoms in close proximity        4
In aqueous solvent at 25 °C, the available thermal energy is of the same order as the strength of these weak interactions. Furthermore, the interaction between solute and solvent (water) molecules is nearly as favorable as solute–solute interactions. Consequently, hydrogen bonds and ionic, hydrophobic, and van der Waals interactions are continually formed and broken.
Although these four types of interactions are individually weak relative to covalent bonds, the cumulative effect of many such interactions in a protein or nucleic acid can be very significant. For example, the noncovalent binding of an enzyme to its substrate may involve several hydrogen bonds and one or more ionic interactions, as well as hydrophobic and van der Waals interactions. The formation of each of these weak bonds contributes to a net decrease in free energy; this binding free energy is released as bond formation stabilizes the system. The stability of a noncovalent interaction such as that of a small molecule hydrogen-bonded to its macromolecular partner is calculable from the binding energy. Stability, as measured by the equilibrium constant (see below) of the binding reaction, varies exponentially with binding energy. The unfolding of a molecule stabilized by numerous weak interactions requires many of these interactions to be disrupted at the same time; because the interactions fluctuate randomly, such simultaneous disruptions are very unlikely. The molecular stability bestowed by two or five or 20 weak interactions is therefore much greater than would be expected from a simple addition of binding energies.
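The exponential dependence of stability on binding energy can be illustrated numerically. The sketch below assumes the standard thermodynamic relation ΔG° = −RT ln Keq, which this section does not state, and a hypothetical contribution of 5 kJ/mol per weak interaction, chosen only because it falls within the ranges of Table 4–4. It shows how the equilibrium constant grows as binding energies add. It does not model the further stabilization that comes from the need to disrupt many interactions at the same time.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ/(mol K)


def binding_constant(binding_energy_kj, temp_k=298.15):
    """Equilibrium constant for a binding reaction that releases
    binding_energy_kj of free energy (delta_G° = -binding_energy_kj)."""
    return math.exp(binding_energy_kj / (R_KJ * temp_k))


PER_INTERACTION_KJ = 5.0  # hypothetical value for one weak interaction

for n in (1, 2, 5, 20):
    k = binding_constant(n * PER_INTERACTION_KJ)
    print(f"{n:2d} interactions ({n * PER_INTERACTION_KJ:5.1f} kJ/mol): K = {k:.3g}")
```

Because the energies add in the exponent, doubling the number of interactions squares the equilibrium constant rather than doubling it.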
Macromolecules such as proteins, DNA, and RNA contain so many sites of potential hydrogen bonding or ionic, van der Waals, or hydrophobic interactions that the cumulative effect of the many small binding forces is enormous. The most stable (native) structure of most macromolecules is that in which weak-bonding possibilities are maximized. The folding of a single polypeptide or polynucleotide chain into its three-dimensional shape is determined by this principle. The binding of an antigen to a specific antibody depends on the cumulative effects of many weak interactions. The energy released when an enzyme binds noncovalently to its substrate is the main source of catalytic power for the enzyme. The binding of a hormone or a neurotransmitter to its cellular receptor protein is the result of weak interactions. One consequence of the size of enzymes and receptors is that their large surfaces provide many opportunities for weak interactions. At the molecular level, the complementarity between interacting biomolecules reflects the complementarity and weak interactions between polar, charged, and hydrophobic groups on the surfaces of the molecules.
Although many of the solvent properties of water can be explained in terms of the uncharged H2O molecule, the small degree of ionization of water to hydrogen ions (H+) and hydroxide ions (OH−) must also be taken into account. Like all reversible reactions, the ionization of water can be described by an equilibrium constant. When weak acids or weak bases are dissolved in water, they can contribute H+ by ionizing (if acids) or consume H+ by being protonated (if bases); these processes are also governed by equilibrium constants. The total hydrogen ion concentration from all sources is experimentally measurable; it is expressed as the pH of the solution. To predict the state of ionization of solutes in water, we must take into account the relevant equilibrium constants for each ionization reaction. We therefore turn now to a brief discussion of the ionization of water and of weak acids and bases dissolved in water.
Water molecules have a slight tendency to undergo reversible ionization to yield a hydrogen ion and a hydroxide ion, giving the equilibrium
$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-} \tag{4–1}$$
This reversible ionization is crucial to the role of water in cellular function, so we must have a means of expressing the extent of ionization of water in quantitative terms. A brief review of some properties of reversible chemical reactions will show how this can be done.
The position of equilibrium of any chemical reaction is given by its equilibrium constant. For the generalized reaction
A + B ⇌ C + D
an equilibrium constant can be defined in terms of the concentrations of reactants (A and B) and products (C and D) present at equilibrium:
$$K_{\mathrm{eq}} = \frac{[\mathrm{C}][\mathrm{D}]}{[\mathrm{A}][\mathrm{B}]} \tag{4–2}$$
(Strictly speaking, the concentration terms should be the activities, or effective concentrations in nonideal solutions, of each species. Except in very accurate work, the equilibrium constant may be approximated by measuring the concentrations at equilibrium.)
The equilibrium constant is fixed and characteristic for any given chemical reaction at a specified temperature. It defines the composition of the final equilibrium mixture of that reaction, regardless of the starting amounts of reactants and products. Conversely, one can calculate the equilibrium constant for a given reaction at a given temperature if the equilibrium concentrations of all its reactants and products are known. We will show in a later chapter that the standard free-energy change (ΔG°) is directly related to Keq.
The degree of ionization of water at equilibrium (Eqn 4–1) is small; at 25 °C only about one of every 10^7 molecules in pure water is ionized at any instant. The equilibrium constant for the reversible ionization of water (Eqn 4–1) is
$$K_{\mathrm{eq}} = \frac{[\mathrm{H^+}][\mathrm{OH^-}]}{[\mathrm{H_2O}]} \tag{4–3}$$
In pure water at 25 °C, the concentration of water is 55.5 M (grams of H2O in 1 L divided by the gram molecular weight, or 1000/18 = 55.5 M), and is essentially constant in relation to the very low concentrations of H+ and OH−, namely, 1 × 10−7 M. Accordingly, we can substitute 55.5 M in the equilibrium constant expression (Eqn 4–3) to yield
$$K_{\mathrm{eq}} = \frac{[\mathrm{H^+}][\mathrm{OH^-}]}{55.5\ \mathrm{M}}$$
which, on rearranging, becomes
$$(55.5\ \mathrm{M})(K_{\mathrm{eq}}) = [\mathrm{H^+}][\mathrm{OH^-}] = K_w \tag{4–4}$$
where Kw designates the product (55.5 M)(Keq), the ion product of water at 25 °C.
The value for Keq has been determined by electrical-conductivity measurements of pure water (in which only the ions arising from the dissociation of H2O can carry current) and found to be 1.8 × 10−16 M at 25 °C. Substituting this value for Keq in Equation 4–4 gives
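$$(55.5\ \mathrm{M})(1.8 \times 10^{-16}\ \mathrm{M}) = [\mathrm{H^+}][\mathrm{OH^-}]$$
$$99.9 \times 10^{-16}\ \mathrm{M^2} = [\mathrm{H^+}][\mathrm{OH^-}]$$
$$1.0 \times 10^{-14}\ \mathrm{M^2} = [\mathrm{H^+}][\mathrm{OH^-}] = K_w$$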
Thus the product [H+][OH−] in aqueous solutions at 25 °C always equals 1 × 10−14 M2. When there are exactly equal concentrations of both H+ and OH−, as in pure water, the solution is said to be at neutral pH. At this pH, the concentration of H+ and OH− can be calculated from the ion product of water as follows:
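$$K_w = [\mathrm{H^+}][\mathrm{OH^-}] = [\mathrm{H^+}]^2$$
Solving for [H+] gives
$$[\mathrm{H^+}] = \sqrt{K_w} = \sqrt{1 \times 10^{-14}\ \mathrm{M^2}}$$
$$[\mathrm{H^+}] = [\mathrm{OH^-}] = 10^{-7}\ \mathrm{M}$$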
As the ion product of water is constant, whenever the concentration of H+ ions is greater than 1 × 10−7 M, the concentration of OH− must become less than 1 × 10−7 M, and vice versa. When the concentration of H+ is very high, as in a solution of hydrochloric acid, the OH− concentration must be very low. From the ion product of water we can calculate the H+ concentration if we know the OH− concentration, and vice versa (Box 4–1).

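The ion product of water makes it possible to calculate the concentration of H+, given the concentration of OH−, and vice versa; the following problems demonstrate this.
  1.  What is the concentration of H+ in a solution of 0.1 M NaOH?
$$K_w = [\mathrm{H^+}][\mathrm{OH^-}]$$
       Solving for [H+] gives
$$[\mathrm{H^+}] = \frac{K_w}{[\mathrm{OH^-}]} = \frac{1 \times 10^{-14}\ \mathrm{M^2}}{0.1\ \mathrm{M}} = \frac{10^{-14}\ \mathrm{M^2}}{10^{-1}\ \mathrm{M}} = 10^{-13}\ \mathrm{M} \quad \text{(answer)}$$
  2.  What is the concentration of OH− in a solution in which the H+ concentration is 0.00013 M?
$$K_w = [\mathrm{H^+}][\mathrm{OH^-}]$$
       Solving for [OH−] gives
$$[\mathrm{OH^-}] = \frac{K_w}{[\mathrm{H^+}]} = \frac{1 \times 10^{-14}\ \mathrm{M^2}}{0.00013\ \mathrm{M}} = \frac{1 \times 10^{-14}\ \mathrm{M^2}}{1.3 \times 10^{-4}\ \mathrm{M}} = 7.7 \times 10^{-11}\ \mathrm{M} \quad \text{(answer)}$$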
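The two problems above can also be checked numerically. The following is a minimal sketch, assuming only the value of Kw at 25 °C given in the text; the function names `h_from_oh` and `oh_from_h` are illustrative and do not come from the source.

```python
KW_25C = 1.0e-14  # ion product of water at 25 °C, in M^2 (value from the text)


def h_from_oh(oh_molar: float, kw: float = KW_25C) -> float:
    """Return [H+] in M for a given [OH-] in M, using Kw = [H+][OH-]."""
    if oh_molar <= 0:
        raise ValueError("concentration must be positive")
    return kw / oh_molar


def oh_from_h(h_molar: float, kw: float = KW_25C) -> float:
    """Return [OH-] in M for a given [H+] in M, using Kw = [H+][OH-]."""
    if h_molar <= 0:
        raise ValueError("concentration must be positive")
    return kw / h_molar


# Problem 1: 0.1 M NaOH, taken as fully ionized so that [OH-] = 0.1 M
problem_1 = h_from_oh(0.1)
# Problem 2: [H+] = 0.00013 M
problem_2 = oh_from_h(0.00013)
print(f"{problem_1:.1e} M, {problem_2:.1e} M")
```

Because Kw depends on temperature, a different value would have to be passed as `kw` for solutions not at 25 °C.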
The ion product of water, Kw, is the basis for the pH scale (Table 4–5). It is a convenient means of designating the actual concentration of H+ (and thus of OH−) in any aqueous solution in the range between 1.0 M H+ and 1.0 M OH−. The term pH is defined by the expression
$$\mathrm{pH} = \log\frac{1}{[\mathrm{H^+}]} = -\log[\mathrm{H^+}]$$
The symbol p denotes “negative logarithm of”. For a precisely neutral solution at 25 °C, in which the concentration of hydrogen ions is 1.0 × 10−7 M, the pH can be calculated as follows:
$$\mathrm{pH} = \log\frac{1}{1 \times 10^{-7}} = \log(1 \times 10^{7}) = \log 1.0 + \log 10^{7} = 0 + 7.0 = 7.0$$
The value of 7.0 for the pH of a precisely neutral solution is not an arbitrarily chosen figure; it is derived from the absolute value of the ion product of water at 25 °C, which by convenient coincidence is a round number. Solutions having a pH greater than 7 are alkaline or basic; the concentration of OH− is greater than that of H+. Conversely, solutions having a pH less than 7 are acidic (Table 4–5).
Table 4–5  The pH scale

| [H+] (M) | pH | [OH−] (M) | pOH* |
| --- | --- | --- | --- |
| 10^0 (1) | 0 | 10^−14 | 14 |
| 10^−1 | 1 | 10^−13 | 13 |
| 10^−2 | 2 | 10^−12 | 12 |
| 10^−3 | 3 | 10^−11 | 11 |
| 10^−4 | 4 | 10^−10 | 10 |
| 10^−5 | 5 | 10^−9 | 9 |
| 10^−6 | 6 | 10^−8 | 8 |
| 10^−7 | 7 | 10^−7 | 7 |
| 10^−8 | 8 | 10^−6 | 6 |
| 10^−9 | 9 | 10^−5 | 5 |
| 10^−10 | 10 | 10^−4 | 4 |
| 10^−11 | 11 | 10^−3 | 3 |
| 10^−12 | 12 | 10^−2 | 2 |
| 10^−13 | 13 | 10^−1 | 1 |
| 10^−14 | 14 | 10^0 (1) | 0 |

* The expression pOH is sometimes used to describe the basicity, or OH− concentration, of a solution; pOH is defined by the expression pOH = –log[OH−], which is analogous to the expression for pH. Note that for all cases, pH + pOH = 14.
Note that the pH scale is logarithmic, not arithmetic. To say that two solutions differ in pH by 1 pH unit means that one solution has ten times the H+ concentration of the other, but it does not tell us the absolute magnitude of the difference. Figure 4–9 gives the pH of some common aqueous fluids. A cola drink (pH 3.0) or red wine (pH 3.7) has an H+ concentration approximately 10,000 times greater than that of blood (pH 7.4).
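The logarithmic character of the scale is easy to explore numerically. The sketch below is illustrative only; the function names are not from the source, and concentrations are used in place of activities, as the text does.

```python
import math


def ph_from_h(h_molar: float) -> float:
    """pH = -log10[H+], with [H+] in M."""
    return -math.log10(h_molar)


def h_from_ph(ph: float) -> float:
    """[H+] in M for a given pH."""
    return 10.0 ** (-ph)


def fold_difference(ph_low: float, ph_high: float) -> float:
    """Ratio of [H+] in the more acidic solution to [H+] in the less acidic one."""
    return 10.0 ** (ph_high - ph_low)


print(ph_from_h(1.0e-7))            # neutral solution at 25 °C
print(fold_difference(3.0, 7.4))    # cola relative to blood
print(fold_difference(3.7, 7.4))    # red wine relative to blood
```

With the pH values quoted above, the ratios come out at roughly 25,000 for the cola and roughly 5,000 for the wine, so the figure of 10,000 is best read as an order-of-magnitude statement.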
The pH of an aqueous solution can be approximately measured using various indicator dyes, including litmus, phenolphthalein, and phenol red, which undergo color changes as a proton dissociates from the dye molecule. Accurate determinations of pH in the chemical or clinical laboratory are made with a glass electrode that is selectively sensitive to H+ concentration but insensitive to Na+, K+, and other cations. In a pH meter the signal from such an electrode is amplified and compared with the signal generated by a solution of accurately known pH.
Measurement of pH is one of the most important and frequently used procedures in biochemistry. The pH affects the structure and activity of biological macromolecules; for example, the catalytic activity of enzymes. Measurements of the pH of the blood and urine are commonly used in diagnosing disease. The pH of the blood plasma of severely diabetic people, for example, is often lower than the normal value of 7.4; this condition is called acidosis. In certain other disease states the pH of the blood is higher than normal, the condition of alkalosis.
Figure 4–9  The pH of some aqueous fluids. [Scale running from increasingly basic, through neutral, to increasingly acidic. Fluids shown, from most basic to most acidic: 1 M NaOH, household bleach, household ammonia, solution of baking soda (NaHCO3), seawater, egg white, human blood, tears, milk, saliva, black coffee, beer, tomato juice, red wine, cola, vinegar, lemon juice, gastric juice, 1 M HCl.]
Hydrochloric, sulfuric, and nitric acids, commonly called strong acids, are completely ionized in dilute aqueous solutions; the strong bases NaOH and KOH are also completely ionized.
Biochemists are often more concerned with the behavior of weak acids and bases – those not completely ionized when dissolved in water. These are common in biological systems and play important roles in metabolism and its regulation. The behavior of aqueous solutions of weak acids and bases is best understood if we first define some terms.
Table 4–6  Some conjugate acid–base pairs*

| Proton donor | Proton acceptor |
| --- | --- |
| CH3COOH (acetic acid) | CH3COO− |
| H3PO4 (phosphoric acid) | H2PO4− |
| H2PO4− (dihydrogen phosphate) | HPO42− |
| HPO42− (hydrogen phosphate) | PO43− |
| NH4+ (ammonium) | NH3 |
| H2CO3 (carbonic acid) | HCO3− |
| HCO3− (bicarbonate) | CO32− |
| +H3NCH2COOH (glycine) | +H3NCH2COO− |
| +H3NCH2COO− (glycine) | H2NCH2COO− |

* Each pair consists of a proton donor and a proton acceptor. Some compounds, such as acetic acid, are monoprotic; they can give up only one proton. Others are diprotic (H2CO3 and glycine) or triprotic (H3PO4).
Acids may be defined as proton donors and bases as proton acceptors. A proton donor and its corresponding proton acceptor make up a conjugate acid–base pair (Table 4–6). Acetic acid (CH3COOH), a proton donor, and the acetate anion (CH3COO−), the corresponding proton acceptor, constitute a conjugate acid–base pair, related by the reversible reaction
$$\mathrm{CH_3COOH} \rightleftharpoons \mathrm{H^+} + \mathrm{CH_3COO^-}$$
Each acid has a characteristic tendency to lose its proton in an aqueous solution. The stronger the acid, the greater its tendency to lose its proton. The tendency of any acid (HA) to lose a proton and form its conjugate base (A−) is defined by the equilibrium constant (K) for the reversible reaction
$$\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}$$
which is
$$K = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}$$
Equilibrium constants for ionization reactions are more usually called ionization or dissociation constants. The dissociation constants of some acids, often designated Ka, are given in Table 4–7. Stronger acids, such as formic and lactic acids, have higher dissociation constants; weaker acids, such as dihydrogen phosphate (H2PO4−), have lower dissociation constants.
Table 4–7  Dissociation constant and pKa of some common weak acids (proton donors) at 25 °C

| Acid | Ka (M) | pKa |
| --- | --- | --- |
| HCOOH (formic acid) | 1.78 × 10^−4 | 3.75 |
| CH3COOH (acetic acid) | 1.74 × 10^−5 | 4.76 |
| CH3CH2COOH (propionic acid) | 1.35 × 10^−5 | 4.87 |
| CH3CH(OH)COOH (lactic acid) | 1.38 × 10^−4 | 3.86 |
| H3PO4 (phosphoric acid) | 7.25 × 10^−3 | 2.14 |
| H2PO4− (dihydrogen phosphate) | 1.38 × 10^−7 | 6.86 |
| HPO42− (monohydrogen phosphate) | 3.98 × 10^−13 | 12.4 |
| H2CO3 (carbonic acid) | 1.70 × 10^−4 | 3.77 |
| HCO3− (bicarbonate) | 6.31 × 10^−11 | 10.2 |
| NH4+ (ammonium) | 5.62 × 10^−10 | 9.25 |
Also included in Table 4–7 are values of pKa, which is analogous to pH and is defined by the equation
$$\mathrm{p}K_a = \log\frac{1}{K_a} = -\log K_a$$
The more strongly dissociated the acid, the lower its pKa. As we shall now see, the pKa of any weak acid can be determined quite easily.
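The conversion between Ka and pKa can be sketched in a few lines. This is an illustrative helper, not part of the source; the Ka values are those listed in Table 4–7, and the dictionary name `KA_25C` is a hypothetical label.

```python
import math

# Dissociation constants at 25 °C, in M, copied from Table 4–7
KA_25C = {
    "formic acid": 1.78e-4,
    "acetic acid": 1.74e-5,
    "dihydrogen phosphate": 1.38e-7,
    "ammonium": 5.62e-10,
}


def pka_from_ka(ka: float) -> float:
    """pKa = -log10(Ka)."""
    return -math.log10(ka)


def ka_from_pka(pka: float) -> float:
    """Ka = 10**(-pKa)."""
    return 10.0 ** (-pka)


# List the acids from strongest (lowest pKa) to weakest (highest pKa)
for name, ka in sorted(KA_25C.items(), key=lambda item: pka_from_ka(item[1])):
    print(f"{name}: pKa = {pka_from_ka(ka):.2f}")
```

Sorting by pKa makes the inverse relationship explicit: the acid with the largest Ka appears first.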
Figure 4–10  The titration curve of acetic acid. After the addition of each increment of NaOH to the acetic acid solution, the pH of the mixture is measured. This value is plotted against the fraction of the total amount of NaOH required to neutralize the acetic acid (i.e., to bring it to pH ≈ 7). The points so obtained yield the titration curve. Shown in the boxes are the predominant ionic forms at the points designated. At the midpoint of the titration, the concentrations of the proton donor and proton acceptor are equal. The pH at this point is numerically equal to the pKa of acetic acid. The shaded zone is the useful region of buffering power.
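[Figure 4–10 labels: vertical axis, pH; horizontal axis, OH− added (equivalents). CH3COOH predominates at the start of the titration and CH3COO− at the end; at the midpoint, [CH3COOH] = [CH3COO−] and pH = pKa = 4.76. Buffering region: pH 4.26 to pH 5.26.]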
Titration is used to determine the amount of an acid in a given solution. In this procedure, a measured volume of the acid is titrated with a solution of a strong base, usually sodium hydroxide (NaOH), of known concentration. The NaOH is added in small increments until the acid is consumed (neutralized), as determined with an indicator dye or with a pH meter. The concentration of the acid in the original solution can be calculated from the volume and concentration of NaOH added.
A plot of the pH against the amount of NaOH added (a titration curve) reveals the pKa of the weak acid. Consider the titration of a 0.1 M solution of acetic acid (HAc) with 0.1 M NaOH at 25 °C (Fig. 4–10). Two reversible equilibria are involved in the process:
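$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-} \tag{4–5}$$
$$\mathrm{HAc} \rightleftharpoons \mathrm{H^+} + \mathrm{Ac^-} \tag{4–6}$$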
The equilibria must simultaneously conform to their characteristic equilibrium constants, which are, respectively,
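$$K_w = [\mathrm{H^+}][\mathrm{OH^-}] = 1 \times 10^{-14}\ \mathrm{M^2} \tag{4–7}$$
$$K_a = \frac{[\mathrm{H^+}][\mathrm{Ac^-}]}{[\mathrm{HAc}]} = 1.74 \times 10^{-5}\ \mathrm{M} \tag{4–8}$$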
At the beginning of the titration, before any NaOH is added, the acetic acid is already slightly ionized, to an extent that can be calculated from its dissociation constant (Eqn 4–8).
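The extent of this initial ionization can be estimated directly from the dissociation constant. The sketch below makes two simplifying assumptions that are not stated in the text: the H+ contributed by the ionization of water itself is neglected, and concentrations are used in place of activities. The function name `initial_h` is illustrative.

```python
import math

KA_ACETIC = 1.74e-5   # dissociation constant of acetic acid at 25 °C, in M (Table 4–7)
TOTAL_ACID = 0.1      # starting concentration of acetic acid, in M


def initial_h(total_acid_molar: float, ka: float) -> float:
    """[H+] in M for a weak acid HA alone in water.

    With x = [H+] = [A-] and [HA] = C - x, the expression Ka = x*x / (C - x)
    rearranges to x*x + Ka*x - Ka*C = 0; the positive root is returned.
    """
    c = total_acid_molar
    return (-ka + math.sqrt(ka * ka + 4.0 * ka * c)) / 2.0


h = initial_h(TOTAL_ACID, KA_ACETIC)
initial_ph = -math.log10(h)
fraction_ionized = h / TOTAL_ACID
print(f"pH = {initial_ph:.2f}, fraction ionized = {fraction_ionized:.3f}")
```

Under these assumptions the 0.1 M solution starts at a pH of about 2.9, with a little over 1% of the acetic acid ionized.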
As NaOH is gradually introduced, the added OH− combines with the free H+ in the solution to form H2O, to an extent that satisfies the equilibrium relationship in Equation 4–7. As free H+ is removed, HAc dissociates further to satisfy its own equilibrium constant (Eqn 4–8). The net result as the titration proceeds is that more and more HAc ionizes, forming Ac−, as the NaOH is added. At the midpoint of the titration (Fig. 4–10), at which exactly 0.5 equivalent of NaOH has been added, one-half of the original acetic acid has undergone dissociation, so that the concentration of the proton donor, [HAc], now equals that of the proton acceptor, [Ac−]. At this midpoint a very important relationship holds: the pH of the equimolar solution of acetic acid and acetate is exactly equal to the pKa of acetic acid (pKa = 4.76; see Table 4–7 and Fig. 4–10). The basis for this relationship, which holds for all weak acids, will soon become clear.
Figure 4–11  Comparison of the titration curves of three weak acids, CH3COOH, H2PO4−, and NH4+. The predominant ionic forms at designated points in the titration are given in boxes. The regions of buffering capacity are indicated at the right. Conjugate acid–base pairs are effective buffers between approximately 25 and 75% neutralization of the proton-donor species.
As the titration is continued by adding further increments of NaOH, the remaining undissociated acetic acid is gradually converted into acetate. The end point of the titration occurs at about pH 7.0: all the acetic acid has lost its protons to OH−, to form H2O and acetate. Throughout the titration the two equilibria (Eqns 4–5 and 4–6) coexist, each always conforming to its equilibrium constant.
Figure 4–11 compares the titration curves of three weak acids with very different dissociation constants: acetic acid (pKa = 4.76); dihydrogen phosphate (pKa = 6.86); and ammonium ion, or NH4+ (pKa = 9.25). Although the titration curves of these acids have the same shape, they are displaced along the pH axis because these acids have different strengths. Acetic acid is the strongest and loses its proton most readily, since its Ka is highest (pKa lowest) of the three. Acetic acid is already half dissociated at pH 4.76. H2PO4− loses a proton less readily, being half dissociated at pH 6.86. NH4+ is the weakest acid of the three and does not become half dissociated until pH 9.25.
The most important point about the titration curve of a weak acid is that it shows graphically that a weak acid and its anion – a conjugate acid–base pair – can act as a buffer.
[Figure 4–11 labels: vertical axis, pH; horizontal axis, OH− added (equivalents). Midpoints of titration: NH4+ to NH3, [NH4+] = [NH3], pKa = 9.25; H2PO4− to HPO42−, [H2PO4−] = [HPO42−], pKa = 6.86; CH3COOH to CH3COO−, [CH3COOH] = [CH3COO−], pKa = 4.76. Buffering regions: NH3, pH 8.75 to 9.75; phosphate, pH 6.36 to 7.36; acetate, pH 4.26 to 5.26.]
Almost every biological process is pH dependent; a small change in pH produces a large change in the rate of the process. This is true not only for the many reactions in which the H+ ion is a direct participant, but also for those in which there is no apparent role for H+ ions. The enzymes that catalyze cellular reactions, and many of the molecules on which they act, contain ionizable groups with characteristic pKa values. The protonated amino (–NH3+) and carboxyl groups of amino acids and the phosphate groups of nucleotides, for example, function as weak acids; their ionic state depends upon the pH of the solution in which they are dissolved. As we noted above, ionic interactions are among the forces that stabilize a protein molecule and allow an enzyme to recognize and bind its substrate.
Cells and organisms maintain a specific and constant cytosolic pH, keeping biomolecules in their optimal ionic state, usually near pH 7. In multicellular organisms, the pH of the extracellular fluids (blood, for example) is also tightly regulated. Constancy of pH is achieved primarily by biological buffers: mixtures of weak acids and their conjugate bases.
We describe here the ionization equilibria that account for buffering, and show the quantitative relationship between the pH of a buffered solution and the pKa of the buffer. Biological buffering is illustrated by the phosphate and carbonate buffering systems of humans.
Figure 4–12  Capacity of the acetic acid–acetate couple to act as a buffer system, capable of absorbing either H+ or OH− through the reversibility of the dissociation of acetic acid. The proton donor, in this case acetic acid (HAc), contains a reserve of bound H+, which can be released to neutralize an addition of OH− to the system, forming H2O. This happens because the product [H+][OH−] transiently exceeds Kw (1 × 10−14 M2). The equilibrium quickly adjusts so that this product equals 1 × 10−14 M2 (at 25 °C), thus transiently reducing the concentration of H+. But now the quotient [H+][Ac−]/[HAc] is less than Ka, so HAc dissociates further to restore equilibrium. Similarly, the conjugate base, Ac−, can react with H+ ions added to the system; again, the two ionization reactions simultaneously come to equilibrium. Thus a conjugate acid–base pair, such as acetic acid and acetate ion, tends to resist a change in pH when small amounts of acid or base are added. Buffering action is simply the consequence of two reversible reactions taking place simultaneously and reaching their points of equilibrium as governed by their equilibrium constants, Kw and Ka.
[Figure 4–12 labels: acetic acid (CH3COOH), HAc; acetate (CH3COO−), Ac−; added OH− combines with H+ to form H2O, governed by Kw = [H+][OH−]; added H+ combines with Ac− to form HAc, governed by Ka = [H+][Ac−]/[HAc].]
Buffers are aqueous systems that tend to resist changes in their pH when small amounts of acid (H+) or base (OH−) are added. A buffer system consists of a weak acid (the proton donor) and its conjugate base (the proton acceptor). As an example, a mixture of equal concentrations of acetic acid and acetate ion, found at the midpoint of the titration curve in Figure 4–10, is a buffer system. The titration curve of acetic acid has a relatively flat zone extending about 0.5 pH units on either side of its midpoint pH of 4.76. In this zone there is only a small change in pH when increments of either H+ or OH− are added to the system. This relatively flat zone is the buffering region of the acetic acid–acetate buffer pair. At the midpoint of the buffering region, where the concentration of the proton donor (acetic acid) exactly equals that of the proton acceptor (acetate), the buffering power of the system is maximal; that is, its pH changes least on addition of an increment of H+ or OH−. The pH at this point in the titration curve of acetic acid is equal to its pKa. The pH of the acetate buffer system does change slightly when a small amount of H+ or OH− is added, but this change is very small compared with the pH change that would result if the same amount of H+ (or OH−) were added to pure water or to a solution of the salt of a strong acid and strong base, such as NaCl, which have no buffering power.
Buffering results from two reversible reaction equilibria occurring in a solution of nearly equal concentrations of a proton donor and its conjugate proton acceptor. Figure 4–12 helps to explain how a buffer system works. Whenever H+ or OH− is added to a buffer, the result is a small change in the ratio of the relative concentrations of the weak acid and its anion and thus a small change in pH. The decrease in concentration of one component of the system is balanced exactly by an increase in the other. The sum of the buffer components does not change, only their ratio.
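A short numerical sketch can make the contrast between a buffer and pure water concrete. It rearranges the dissociation-constant expression to [H+] = Ka[HA]/[A−] and assumes, for illustration only, 1 L of solution containing 0.1 mol each of acetic acid and acetate, a volume that does not change on addition, and complete reaction of the added strong acid with acetate. None of these quantities come from the source, and the function names are hypothetical.

```python
import math

KA_ACETIC = 1.74e-5  # M, from Table 4–7


def buffer_ph(acid_mol: float, base_mol: float, ka: float) -> float:
    """pH from [H+] = Ka * [HA] / [A-].

    Both species share one volume, so the mole ratio equals the concentration ratio.
    """
    return -math.log10(ka * acid_mol / base_mol)


def buffer_ph_after_strong_acid(acid_mol, base_mol, added_h_mol, ka):
    """pH after added H+ converts an equal amount of A- into HA (assumed complete)."""
    if added_h_mol >= base_mol:
        raise ValueError("added acid exceeds the capacity of the conjugate base")
    return buffer_ph(acid_mol + added_h_mol, base_mol - added_h_mol, ka)


ph_before = buffer_ph(0.1, 0.1, KA_ACETIC)
ph_after = buffer_ph_after_strong_acid(0.1, 0.1, 0.01, KA_ACETIC)
water_after = -math.log10(0.01)  # same 0.01 mol of strong acid in 1 L of pure water

print(f"buffer: {ph_before:.2f} -> {ph_after:.2f}")
print(f"water:  7.00 -> {water_after:.2f}")
```

In this illustration the buffer shifts by less than 0.1 pH unit, whereas the same addition to pure water lowers the pH by 5 units. The total of acid plus conjugate base stays at 0.2 mol throughout; only the ratio changes.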
Each conjugate acid–base pair has a characteristic pH zone in which it is an effective buffer (Fig. 4–11). The H2PO4−/HPO42− pair has a pKa of 6.86 and thus can serve as a buffer system near pH 6.86; the NH4+/NH3 pair, with a pKa of 9.25, can act as a buffer near pH 9.25.
The quantitative relationship among pH, the buffering action of a mixture of weak acid with its conjugate base, and the pKa of the weak acid is given by the Henderson–Hasselbalch equation. The titration curves of acetic acid, H2PO4−, and NH4+ (Fig. 4–11) have nearly identical shapes, suggesting that they all reflect a fundamental law or relationship. This is indeed the case. The shape of the titration curve of any weak acid is expressed by the Henderson–Hasselbalch equation, which is important for understanding buffer action and acid–base balance in the blood and tissues of the vertebrate organism. This equation is simply a useful way of restating the expression for the dissociation constant of an acid. For the dissociation of a weak acid HA into H+ and A−, the Henderson–Hasselbalch equation can be derived as follows:
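$$K_a = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}$$
First solve for [H+]:
$$[\mathrm{H^+}] = K_a\,\frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$
Then take the negative logarithm of both sides:
$$-\log[\mathrm{H^+}] = -\log K_a - \log\frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$
Substitute pH for –log [H+] and pKa for –log Ka:
$$\mathrm{pH} = \mathrm{p}K_a - \log\frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$
Now invert –log [HA]/[A−], which involves changing its sign, to obtain the Henderson–Hasselbalch equation:
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}$$
which is stated more generally as
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\text{proton acceptor}]}{[\text{proton donor}]}$$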
This equation fits the titration curve of all weak acids and enables us to deduce a number of important quantitative relationships. For example, it shows why the pKa of a weak acid is equal to the pH of the solution at the midpoint of its titration. At this point [HA] = [A−], and
pH = pKa + log 1.0 = pKa + 0 = pKa
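The way the equation traces out a titration curve can be sketched by treating the fraction f of acid converted to its conjugate base as the independent variable, so that [A−]/[HA] = f/(1 − f). This is an idealization, not the source's own treatment: it ignores the ionization of water and the acid's own initial dissociation, so it is unreliable very close to f = 0 or f = 1. The function names are illustrative.

```python
import math

# pKa values quoted in the text for the three acids of Figure 4–11
PKA = {"acetic acid": 4.76, "dihydrogen phosphate": 6.86, "ammonium": 9.25}


def ph_at_fraction(pka: float, fraction: float) -> float:
    """pH when `fraction` (0 < f < 1) of the weak acid is present as conjugate base."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return pka + math.log10(fraction / (1.0 - fraction))


def titration_curve(pka: float, steps: int = 9):
    """Return (fraction, pH) pairs at evenly spaced fractions between 0 and 1."""
    return [(i / (steps + 1), ph_at_fraction(pka, i / (steps + 1)))
            for i in range(1, steps + 1)]


for name, pka in PKA.items():
    low, high = ph_at_fraction(pka, 0.25), ph_at_fraction(pka, 0.75)
    print(f"{name}: 25% to 75% neutralized spans pH {low:.2f} to {high:.2f}")
```

Each curve has the same shape and is simply shifted by its pKa, and the span from 25% to 75% neutralization covers a little under 0.5 pH unit on either side of the midpoint.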
The Henderson–Hasselbalch equation also makes it possible to calculate the pKa of any acid from the molar ratio of proton-donor and proton-acceptor species at any given pH; to calculate the pH of a conjugate acid–base pair of a given pKa and a given molar ratio; and to calculate the molar ratio of proton donor and proton acceptor at any pH given the pKa of the weak acid (Box 4–2).

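1. Calculate the pKa of lactic acid, given that when the concentration of free lactic acid is 0.010 M and the concentration of lactate is 0.087 M, the pH is 4.80.
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\text{lactate}]}{[\text{lactic acid}]}$$
$$\mathrm{p}K_a = \mathrm{pH} - \log\frac{[\text{lactate}]}{[\text{lactic acid}]} = 4.80 - \log\frac{0.087}{0.010} = 4.80 - \log 8.7 = 4.80 - 0.94 = 3.86 \quad \text{(answer)}$$
2. Calculate the pH of a mixture of 0.1 M acetic acid and 0.2 M sodium acetate. The pKa of acetic acid is 4.76.
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\text{acetate}]}{[\text{acetic acid}]} = 4.76 + \log\frac{0.2}{0.1} = 4.76 + 0.301 = 5.06 \quad \text{(answer)}$$
3. Calculate the ratio of the concentrations of acetate and acetic acid required in a buffer system of pH 5.30.
$$\mathrm{pH} = \mathrm{p}K_a + \log\frac{[\text{acetate}]}{[\text{acetic acid}]}$$
$$\log\frac{[\text{acetate}]}{[\text{acetic acid}]} = \mathrm{pH} - \mathrm{p}K_a = 5.30 - 4.76 = 0.54$$
$$\frac{[\text{acetate}]}{[\text{acetic acid}]} = \operatorname{antilog} 0.54 = 3.47 \quad \text{(answer)}$$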
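The three kinds of calculation worked above map onto three rearrangements of the Henderson–Hasselbalch equation. The following is a minimal sketch using the numbers from the problems; the function names are illustrative and not from the source.

```python
import math


def pka_from_ratio(ph: float, acceptor: float, donor: float) -> float:
    """pKa = pH - log10([proton acceptor] / [proton donor])."""
    return ph - math.log10(acceptor / donor)


def ph_from_ratio(pka: float, acceptor: float, donor: float) -> float:
    """pH = pKa + log10([proton acceptor] / [proton donor])."""
    return pka + math.log10(acceptor / donor)


def ratio_from_ph(ph: float, pka: float) -> float:
    """[proton acceptor] / [proton donor] = 10**(pH - pKa)."""
    return 10.0 ** (ph - pka)


lactic_pka = pka_from_ratio(4.80, acceptor=0.087, donor=0.010)   # problem 1
acetate_ph = ph_from_ratio(4.76, acceptor=0.2, donor=0.1)        # problem 2
acetate_ratio = ratio_from_ph(5.30, 4.76)                        # problem 3
print(f"{lactic_pka:.2f}, {acetate_ph:.2f}, {acetate_ratio:.2f}")
```

Only the ratio of acceptor to donor enters each expression, so the same functions work whether the inputs are given as concentrations or as amounts in a shared volume.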
Figure 4–13  The amino acid histidine, a component of proteins, is a weak acid. The pKa of the protonated nitrogen of the side chain is 6.0.
The cytoplasm of most cells contains high concentrations of proteins, which contain many amino acids with functional groups that are weak acids or weak bases. The side chain of the amino acid histidine (Fig. 4–13) has a pKa of 6.0, and proteins containing histidine residues can therefore buffer effectively near neutral pH. Nucleotides such as ATP, as well as many low molecular weight metabolites, contain ionizable groups that can contribute buffering power to the cytoplasm. Some highly specialized organelles and extracellular compartments have high concentrations of compounds that contribute buffering capacity: organic acids buffer the vacuoles of plant cells; ammonia buffers urine.
The intracellular and extracellular fluids of all multicellular organisms have a characteristic and nearly constant pH, which is regulated by various biological activities. The organism’s first line of defense against changes in internal pH is provided by buffer systems. Two important biological buffers are the phosphate and bicarbonate systems. The phosphate buffer system, which acts in the cytoplasm of all cells, consists of H2PO4− as proton donor and HPO42− as proton acceptor:
$$\mathrm{H_2PO_4^-} \rightleftharpoons \mathrm{H^+} + \mathrm{HPO_4^{2-}}$$
The phosphate buffer system works exactly like the acetate buffer system, except for the pH range in which it functions. The phosphate buffer system is maximally effective at a pH close to its pKa of 6.86 (see Table 4–7 and Fig. 4–11), and thus tends to resist pH changes in the range between about 6.4 and 7.4. It is therefore effective in providing buffering power in intracellular fluids; in mammals, for example, extracellular fluids and most cytoplasmic compartments have a pH in the range of 6.9 to 7.4.
Blood plasma is buffered in part by the bicarbonate system, consisting of carbonic acid (H2CO3) as proton donor and bicarbonate (HCO3−) as proton acceptor:
$$\mathrm{H_2CO_3} \rightleftharpoons \mathrm{H^+} + \mathrm{HCO_3^-}$$
This system has an equilibrium constant
$$K_1 = \frac{[\mathrm{H^+}][\mathrm{HCO_3^-}]}{[\mathrm{H_2CO_3}]}$$
and functions as a buffer in the same way as other conjugate acid–base pairs. It is unique, however, in that one of its components, carbonic acid (H2CO3), is formed from dissolved (d) carbon dioxide and water, according to the reversible reaction
CO2(d) + H2O   ⇌   H2CO3
which has an equilibrium constant given by the expression
$$K_2 = \frac{[\mathrm{H_2CO_3}]}{[\mathrm{CO_2(d)}][\mathrm{H_2O}]}$$
Carbon dioxide is a gas under normal conditions, and the concentration of dissolved CO2 is the result of equilibration with CO2 of the gas phase:
CO2(g)   ⇌   CO2(d)
This process has an equilibrium constant given by
$$K_3 = \frac{[\mathrm{CO_2(d)}]}{[\mathrm{CO_2(g)}]}$$
The pH of a bicarbonate buffer system depends on the concentration of H2CO3 and HCO3−, the proton donor and acceptor components. The concentration of H2CO3 in turn depends on the concentration of dissolved CO2, which in turn depends on the concentration or partial pressure of CO2 in the gas phase; thus the pH of a bicarbonate buffer exposed to a gas phase is ultimately determined by the concentration of HCO3− in the aqueous phase and the partial pressure of CO2 in the gas phase (Box 4–3).
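The chain of dependence described here can be written out by combining the three equilibrium expressions. The derivation below is a sketch that assumes each constant is defined as products over reactants for its reaction as written above, that is, K1 = [H+][HCO3−]/[H2CO3], K2 = [H2CO3]/([CO2(d)][H2O]), and K3 = [CO2(d)]/[CO2(g)]; no numerical values are assumed.

```latex
% Express each intermediate in terms of the next species in the chain
[\mathrm{CO_2(d)}] = K_3\,[\mathrm{CO_2(g)}]
\qquad
[\mathrm{H_2CO_3}] = K_2\,[\mathrm{CO_2(d)}]\,[\mathrm{H_2O}]
                   = K_2 K_3\,[\mathrm{H_2O}]\,[\mathrm{CO_2(g)}]

% Substitute into the expression for K_1, solved for [H+]
[\mathrm{H^+}] = K_1\,\frac{[\mathrm{H_2CO_3}]}{[\mathrm{HCO_3^-}]}
               = K_1 K_2 K_3\,[\mathrm{H_2O}]\,
                 \frac{[\mathrm{CO_2(g)}]}{[\mathrm{HCO_3^-}]}

% Take the negative logarithm of both sides
\mathrm{pH} = -\log\bigl(K_1 K_2 K_3\,[\mathrm{H_2O}]\bigr)
              + \log\frac{[\mathrm{HCO_3^-}]}{[\mathrm{CO_2(g)}]}
```

Because [H2O] is essentially constant, the first term behaves as a single combined constant, and the pH is set by the ratio of bicarbonate in solution to CO2 in the gas phase, as the paragraph states.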
Figure 4–14  The pH optima of some enzymes: pepsin, a digestive enzyme secreted into gastric juice (black); trypsin, a digestive enzyme that acts in the small intestine (red); alkaline phosphatase of bone tissue (blue). [Axes: percent maximum activity versus pH.]
Human blood plasma normally has a pH close to 7.40. Should the pH-regulating mechanisms fail or be overwhelmed, as may happen in severe uncontrolled diabetes when an overproduction of metabolic acids causes acidosis, the pH of the blood can fall to 6.8 or below, leading to irreparable cell damage and death. In other diseases the pH may rise to lethal levels. Although many aspects of cell structure and function are influenced by pH, it is the catalytic activity of enzymes that is especially sensitive. Enzymes typically show maximal catalytic activity at a characteristic pH, called the optimum pH (Fig. 4–14). On either side of the optimum pH their catalytic activity often declines sharply. Thus a small change in pH can make a large difference in the rate of some crucial enzyme-catalyzed reaction. Biological control of the pH of cells and body fluids is therefore of central importance in all aspects of metabolism and cellular activities.

In animals with lungs, the bicarbonate buffer system is an effective physiological buffer near pH 7.4 because the H2CO3 of the blood plasma is in equilibrium with a large reserve capacity of CO2(g) in the air space of the lungs. This buffer system involves three reversible equilibria between gaseous CO2 in the lungs and bicarbonate (HCO3−) in the blood plasma (Fig. 1).
Figure 1  The CO2 in the air space of the lungs is in equilibrium with the bicarbonate buffer in the blood plasma passing through the lung capillaries. Because the concentration of dissolved CO2 can be adjusted rapidly through changes in the rate of breathing, the bicarbonate buffer system of the blood is in near-equilibrium with a large potential reservoir of CO2.
When H+ is added to blood as it passes through the tissues, reaction 1 proceeds toward a new equilibrium, in which the concentration of H2CO3 is increased. This increases the concentration of CO2(d) in the blood (reaction 2), and thus increases the pressure of CO2(g) in the air space of the lungs (reaction 3); the extra CO2 is exhaled.
Conversely, when OH− is added to the blood plasma, the opposite events occur: the H+ concentration is lowered, causing more H2CO3 to dissociate into H+ and HCO3−. This in turn causes more CO2(g) from the lungs to dissolve in the blood plasma. The rate of breathing, that is, the rate of inhaling and exhaling CO2, can quickly adjust these equilibria to keep the blood pH nearly constant.
[Figure 1 reaction scheme. Aqueous phase (blood): H+ + HCO3− ⇌ H2CO3 (reaction 1); H2CO3 ⇌ H2O + CO2(d) (reaction 2). Between the aqueous phase and the gas phase (lung air space): CO2(d) ⇌ CO2(g) (reaction 3).]
Figure 4–15  Water participates directly in a variety of reactions. (a) ATP is a phosphate anhydride formed by a condensation reaction (loss of the elements of water) between ADP and phosphate. R represents adenosine monophosphate (AMP). This condensation reaction requires energy. The hydrolysis (addition of the elements of water) of ATP releases an equivalent amount of energy. (b), (c), and (d) represent similar condensation and hydrolysis reactions common in biological systems. [Panel labels: (a) phosphate anhydride (ATP, ADP); (b) phosphate ester; (c) carboxylate ester; (d) acylphosphate anhydride.]
Water is not just the solvent in which the chemical reactions of living cells occur; it is very often a direct participant in those reactions. The formation of ATP from ADP and inorganic phosphate is a condensation reaction (see Fig. 3–14) in which the elements of water are eliminated (Fig. 4–15a). The compound formed by this condensation is called a phosphate anhydride. Hydrolysis reactions are responsible for the enzymatic depolymerization of proteins, carbohydrates, and nucleic acids ingested in the diet. Hydrolytic enzymes (hydrolases) catalyze the addition of the elements of water to the bonds that connect monomeric subunits in these macromolecules (Fig. 4–15). Hydrolysis reactions are almost invariably exergonic, and the formation of cellular polymers from their subunits by simple reversal of hydrolysis would be endergonic and as such does not occur. We shall see that cells circumvent this thermodynamic obstacle by coupling the endergonic condensation reactions to exergonic processes, such as breakage of the anhydride bond in ATP.
You are (we hope!) consuming oxygen as you read. Water and carbon dioxide are the end products of the oxidation of fuels such as glucose. The overall reaction of this process can be summarized by the equation:
C6H12O6   +  6O2   →   6CO2  +  6H2O
The “metabolic water” thus formed from stored fuels is actually enough to allow some animals in very dry habitats (gerbils, kangaroo rats, camels) to survive without drinking water for extended periods.
Green plants and algae use the energy of sunlight (represented by hν, the energy of light of frequency ν; h is Planck’s constant) to split water in the process of photosynthesis:
2H2O  +  2A   --(hν)-->   O2  +  2AH2
In this reaction, A is an electron-accepting species, which varies with the type of photosynthetic organism.
Organisms have effectively adapted to their aqueous environment and have even evolved means of exploiting the unusual properties of water. The high specific heat of water (the heat energy required to raise the temperature of 1 g of water by 1 °C) is useful to cells and organisms because it allows water to act as a “heat buffer”, permitting the temperature of an organism to remain relatively constant as the temperature of the air fluctuates and as heat is generated as a byproduct of metabolism. Furthermore, some vertebrates exploit the high heat of vaporization of water (see Table 4–1) by using (thus losing) excess body heat to evaporate sweat. The high degree of internal cohesion of liquid water, due to hydrogen bonding, is exploited by plants as a means of transporting dissolved nutrients from the roots to the leaves during the process of transpiration. Even the lower density of ice than of liquid water has important biological consequences in the life cycles of aquatic organisms. Ponds freeze from the top down, and the layer of ice at the top insulates the water below from frigid air, preventing the pond (and the organisms in it) from freezing solid. Most fundamental to all living organisms is the fact that many physical and biological properties of cell macromolecules, particularly the proteins and nucleic acids, derive from their interactions with water molecules of the surrounding medium. The influence of water on the course of biological evolution has been profound and determinative. If life forms have evolved elsewhere in the universe, it is unlikely that they resemble those of earth, unless their extraterrestrial origin is also a place in which plentiful liquid water is available as solvent.
Aqueous environments support a myriad of species. Soft corals, sponges, bryozoans, and algae compete for space on this reef substrate off the Philippine Islands.
Water is the most abundant compound in living organisms. Its relatively high freezing point, boiling point, and heat of vaporization are the result of strong intermolecular attractions in the form of hydrogen bonding between adjacent water molecules. Liquid water has considerable short-range order and consists of short-lived hydrogen-bonded clusters. The polarity and hydrogen-bonding properties of water make it a potent solvent for many ionic compounds and other polar molecules. Nonpolar compounds, including the gases CO2, O2, and N2, are poorly soluble in water. Water disperses amphipathic molecules to form micelles, clusters of molecules in which the hydrophobic groups are hidden from water and the polar groups are exposed on the external surface.
Four types of weak interactions occur within and between biomolecules in an aqueous solvent: hydrogen bonds and ionic, hydrophobic, and van der Waals interactions. Although weak individually, these interactions collectively create a very strong stabilizing force for proteins, nucleic acids, and membranes. Weak (noncovalent) interactions are also at the heart of enzyme catalysis, antibody function, and receptor–ligand interactions.
Water ionizes very slightly to form H+ and OH− ions. In dilute aqueous solutions, the concentrations of H+ and OH− ions are inversely related by the expression Kw = [H+][OH−] = 1 × 10^−14 M^2 (at 25 °C). The hydrogen-ion concentration of biological systems is usually expressed in terms of pH, defined as pH = –log [H+]. The pH of aqueous solutions is measured by means of glass electrodes sensitive to H+ concentration.
Acids are defined as proton donors and bases as proton acceptors. A conjugate acid–base pair consists of a proton donor (HA) and its corresponding proton acceptor (A−). The tendency of an acid HA to donate protons is expressed by its dissociation constant (Ka = [H+][A−]/[HA]) or by the function pKa, defined as –log Ka, which can be determined from an experimental titration curve. The pH of a solution of a weak acid is quantitatively related to its pKa and to the ratio of the concentrations of its proton-donor and proton-acceptor species by the Henderson–Hasselbalch equation.
A conjugate acid–base pair can act as a buffer and resist changes in pH; its capacity to do so is greatest at a pH equal to its pKa. Many types of biomolecules have functional groups that contribute buffering capacity. H2CO3/HCO3− and H2PO4−/HPO42− are important biological buffer systems. The catalytic activity of enzymes is strongly influenced by pH, and it is essential that the environments in which they function be buffered against large pH changes.
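The relationships in this paragraph can be sketched numerically. The following minimal example, with function names chosen here for illustration, converts a hydrogen-ion concentration to pH and uses the ion product of water at 25 °C to obtain the corresponding hydroxide-ion concentration.

```python
import math

KW = 1.0e-14  # ion product of water at 25 °C, in M^2 (value from the text)


def ph_from_h(h_conc: float) -> float:
    """pH = -log10 [H+], with [H+] in mol/L."""
    return -math.log10(h_conc)


def oh_from_h(h_conc: float) -> float:
    """[OH-] from [H+], using Kw = [H+][OH-]."""
    return KW / h_conc


# Example: a solution with [H+] = 1e-3 M
print(ph_from_h(1.0e-3))
print(oh_from_h(1.0e-3))
```

Because the product of the two concentrations is fixed, a tenfold rise in the hydrogen-ion concentration implies a tenfold fall in the hydroxide-ion concentration.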
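The Henderson–Hasselbalch equation, pH = pKa + log([proton acceptor]/[proton donor]), is named above but not written out. The following sketch, with illustrative function names that are not part of the text, shows how it relates pH, pKa, and the fraction of a weak acid that remains protonated. The pKa of 4.76 used in the example is the commonly quoted value for acetic acid and is an assumption here.

```python
import math


def henderson_hasselbalch(pka: float, acceptor: float, donor: float) -> float:
    """pH of a weak acid solution from pKa and the acceptor/donor ratio."""
    return pka + math.log10(acceptor / donor)


def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of the acid present as the proton donor (HA) at a given pH.

    Follows from [A-]/[HA] = 10**(pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# At pH equal to pKa, donor and acceptor are present in equal amounts.
print(fraction_protonated(4.76, 4.76))  # 0.5
```

The same function shows why buffering is most effective near the pKa: one pH unit away in either direction, the mixture is already about 10:1 in favor of one species.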
Water is not only the solvent in which metabolic reactions occur; it participates directly in many of the reactions, including hydrolysis and condensation reactions.
The physical and chemical properties of water are central to biological structure and function. The evolution of life on earth was doubtless influenced greatly by both the solvent and reactant properties of water.
Further Reading
Dick, D.A.T. (1966) Cell Water, Butterworth Publishers, Inc., Stoneham, MA. 
A classic description of the properties and functions of water in living organisms.
Edsall, J.T. & Wyman, J. (1958) Biophysical Chemistry, Vol. 1, Academic Press, Inc., New York. 
An excellent discussion of water and its fitness as a biological solvent.
Eisenberg, D. & Kauzmann, W. (1969) The Structure and Properties of Water, Oxford University Press, New York. 
An advanced treatment of the physical chemistry of water.
Franks, F. (ed) (1975) Water – A Comprehensive Treatise, Vol. 4, Plenum Press, New York. 
Franks, F. & Mathias, S.F. (eds) (1982) Biophysics of Water, John Wiley & Sons, Inc., New York. 
A large collection of papers on the structure of pure water and of the cytoplasm.
Henderson, L.J. (1927) The Fitness of the Environment, Beacon Press, Boston, MA. [Reprinted (1958).] 
This book is a classic; it includes a discussion of the suitability of water as the solvent for life on earth.
Kuntz, I.D. & Zipp, A. (1977) Water in biological systems. New Engl. J. Med. 297, 262–266. 
A brief review of the physical state of cytosolic water and its interactions with dissolved biomolecules.
Solomon, A.K. (1971) The state of water in red cells. Sci. Am. 224 (February), 88–96. 
A description of research on the structure of water within cells.
Stillinger, F.H. (1980) Water revisited. Science 209, 451–457. 
A short review of the physical structure of water, including the importance of hydrogen bonding and the nature of hydrophobic interactions.
Symons, M.C.R. (1981) Water structure and reactivity. Acc. Chem. Res. 14, 179–187. 
Wiggins, P.M. (1990) Role of water in some biological processes. Microbiol. Rev. 54, 432–449. 
A recent and excellent review of water in biology, including discussion of the physical structure of liquid water, its interaction with biomolecules, and the state of water in living cells.
Weak Interactions in Aqueous Systems
Fersht, A.R. (1987) The hydrogen bond in molecular recognition. Trends Biochem. Sci. 12, 301–304. 
A clear, brief, quantitative discussion of the contribution of hydrogen bonding to molecular recognition and enzyme catalysis.
Frieden, E. (1975) Non-covalent interactions: key to biological flexibility and specificity. J. Chem. Educ. 52, 754–761. 
Review of the four kinds of weak interactions that stabilize macromolecules and confer biological specificity, with clear examples.
Tanford, C. (1978) The hydrophobic effect and the organization of living matter. Science 200, 1012–1018. 
An excellent review of the chemical and energetic basis for hydrophobic interactions between biomolecules in aqueous solutions.
Weak Acids, Weak Bases, and Buffers
Montgomery, R. & Swenson, C.A. (1976) Quantitative Problems in the Biochemical Sciences, 2nd edn, W.H. Freeman and Company, New York. 
This and the following book are excellent compilations of solved problems, many of which concern pH, the ionization of weak acids and bases, and buffers.
Segel, I.H. (1976) Biochemical Calculations, 2nd edn, John Wiley & Sons, Inc., New York. 
1. Artificial Vinegar  One way to make vinegar (not the preferred way) is to prepare a solution of acetic acid, the sole acid component of vinegar, at the proper pH (see Fig. 4–9) and add appropriate flavoring agents. Acetic acid (Mr 60) is a liquid at 25 °C with a density of 1.049 g/mL. Calculate the amount (volume) that must be added to distilled water to make 1 L of simulated vinegar (see Table 4–7).
2. Acidity of Gastric HCl  In a hospital laboratory, a 10.0 mL sample of gastric juice, obtained several hours after a meal, was titrated with 0.1 M NaOH to neutrality; 7.2 mL of NaOH was required. The stomach contained no ingested food or drink, so assume that no buffers were present. What was the pH of the gastric juice?
3. Measurement of Acetylcholine Levels by pH Changes  The concentration of acetylcholine, a neurotransmitter, can be determined from the pH changes that accompany its hydrolysis. When incubated with a catalytic amount of the enzyme acetylcholinesterase, acetylcholine is quantitatively converted into choline and acetic acid, which dissociates to yield acetate and a hydrogen ion:
Acetylcholine  +  H2O   →   choline  +  acetate  +  H+
In a typical analysis, 15 mL of an aqueous solution containing an unknown amount of acetylcholine had a pH of 7.65. When incubated with acetylcholinesterase, the pH of the solution decreased to a final value of 6.87. Assuming that there was no buffer in the assay mixture, determine the number of moles of acetylcholine in the 15 mL of unknown.
4. Significance of the pKa of an Acid  One common description of the pKa of an acid is that it represents the pH at which the acid is half ionized, that is, the pH at which it exists as a 50:50 mixture of the acid and the conjugate base. Demonstrate this relationship for an acid HA, starting from the equilibrium-constant expression.
5. Properties of a Buffer  The amino acid glycine is often used as the main ingredient of a buffer in biochemical experiments. The amino group of glycine, which has a pKa of 9.6, can exist either in the protonated form (–NH3+) or as the free base (–NH2) because of the reversible equilibrium
R–NH3+   ⇌   R–NH2  +  H+
       (a) In what pH range can glycine be used as an effective buffer due to its amino group?
       (b) In a 0.1 M solution of glycine at pH 9.0, what fraction of glycine has its amino group in the –NH3+ form?
       (c) How much 5 M KOH must be added to 1.0 L of 0.1 M glycine at pH 9.0 to bring its pH to exactly 10.0?
       (d) In order to have 99% of the glycine in its –NH3+ form, what must the numerical relation be between the pH of the solution and the pKa of the amino group of glycine?
6. The Effect of pH on Solubility  The strongly polar hydrogen-bonding nature of water makes it an excellent solvent for ionic (charged) species. By contrast, un-ionized, nonpolar organic molecules, such as benzene, are relatively insoluble in water. In principle, the aqueous solubility of all organic acids or bases can be increased by deprotonation or protonation of the molecules, respectively, to form charged species. For example, the solubility of benzoic acid in water is low. The addition of sodium bicarbonate raises the pH of the solution and deprotonates the benzoic acid to form benzoate ion, which is quite soluble in water.

Are the molecules in (a) to (c) (below) more soluble in an aqueous solution of 0.1 M NaOH or 0.1 M HCl? (The dissociable protons are shown in red.)

7. Treatment of Poison Ivy Rash  Catechols substituted with long-chain alkyl groups are the components of poison ivy and poison oak that produce the characteristic itchy rash.

If you were exposed to poison ivy, which of the treatments below would you apply to the affected area? Justify your choice.
       (a) Wash the area with cold water.
       (b) Wash the area with dilute vinegar or lemon juice.
       (c) Wash the area with soap and water.
       (d) Wash the area with soap, water, and baking soda (sodium bicarbonate).
8. pH and Drug Absorption  Aspirin is a weak acid with a pKa of 3.5.

It is absorbed into the blood through the cells lining the stomach and the small intestine. Absorption requires passage through the cell membrane, which is determined by the polarity of the molecule: charged and highly polar molecules pass slowly, whereas neutral hydrophobic ones pass rapidly. The pH of the gastric juice in the stomach is about 1.5 and the pH of the contents of the small intestine is about 6. Is more aspirin absorbed into the bloodstream from the stomach or from the small intestine? Clearly justify your choice.
9. Preparation of Standard Buffer for Calibration of a pH Meter  The glass electrode used in commercial pH meters gives an electrical response proportional to the hydrogen-ion concentration. To convert these responses into pH, glass electrodes must be calibrated against standard solutions of known hydrogen-ion concentration. Determine the weight in grams of sodium dihydrogen phosphate (NaH2PO4 · H2O; formula weight (FW) 138.01) and disodium hydrogen phosphate (Na2HPO4; FW 141.98) needed to prepare 1 L of a standard buffer at pH 7.00 with a total phosphate concentration of 0.100 M (see Table 4–7).
10. Control of Blood pH by the Rate of Respiration 
       (a) The partial pressure of CO2 in the lungs can be varied rapidly by the rate and depth of breathing. For example, a common remedy to alleviate hiccups is to increase the concentration of CO2 in the lungs. This can be achieved by holding one’s breath, by very slow and shallow breathing (hypoventilation), or by breathing in and out of a paper bag. Under such conditions, the partial pressure of CO2 in the air space of the lungs rises above normal. Qualitatively explain the effect of these procedures on the blood pH.
       (b) A common practice of competitive short-distance runners is to breathe rapidly and deeply (hyperventilation) for about half a minute to remove CO2 from their lungs just before running in, say, a 100 m dash. Their blood pH may rise to 7.60. Explain why the blood pH goes up.
       (c) During a short-distance run the muscles produce a large amount of lactic acid from their glucose stores. In view of this fact, why might hyperventilation before a dash be useful?
Part II
Structure and Catalysis
Facing page:  End-on view of the triple-stranded collagen superhelix. Collagen, a component of connective tissue, provides tensile strength and resiliency. Its strength is derived in part from the three tightly wrapped identical helical strands (shown in gray, purple, and blue), much the way a length of rope is stronger than its constituent fibers. The tight wrapping is made possible by the presence of glycine, shown in red, at every third position along each strand, where the strands are in contact. Glycine’s small size allows for very close contact.
In Part I we contrasted the complex structure and function of living cells with the relative simplicity of the monomeric units from which the enzymes, supramolecular complexes, and organelles of the cells are constructed. Part II is devoted to the structure and function of the major classes of cellular constituents: amino acids and proteins (Chapters 5 through 8), fatty acids, lipids, and membranes (Chapters 9 and 10), sugars and polysaccharides (Chapter 11), and nucleotides and nucleic acids (Chapter 12). We begin in each case by considering the covalent structure of the simple subunits (amino acids, fatty acids, monosaccharides, and nucleotides). These subunits are a major part of the language of biochemistry; familiarity with them is a prerequisite for understanding more advanced topics covered in this book, as well as the rapidly growing and exciting literature of biochemistry.
After describing the covalent chemistry of the monomeric units, we consider the structure of the macromolecules and supramolecular complexes derived from them. An overriding theme is that the polymeric macromolecules in living systems, though large, are highly ordered chemical entities, with specific sequences of monomeric subunits giving rise to discrete structures and functions. This fundamental theme can be broken down into three interrelated principles: (1) the unique structure of each macromolecule determines its function; (2) noncovalent interactions play a critical role in the structure and function of macromolecules; and (3) the specific sequences of monomeric subunits in polymeric macromolecules contain the information upon which the ordered living state depends. Each of these principles deserves further comment.
The relationship between structure and function is especially evident in proteins, which exhibit an extraordinary diversity of functions. One particular polymeric sequence of amino acids produces a strong, fibrous structure found in hair and wool; another produces a protein that transports oxygen in the blood. Similarly, the special functions of lipids, polysaccharides, and nucleic acids can be understood as a direct manifestation of their chemical structure, with their characteristic monomeric subunits linked in precise functional groups or polymers. Lipids aggregate to form membranes; sugars linked together become energy stores and structural fibers; nucleotides in a polymer become the blueprint for an entire organism.
As we move from monomeric units to larger and larger polymers, the chemical focus shifts from covalent bonds to noncovalent interactions. The covalent nature of monomeric units, and of the bonds that connect them in polymers, places strong constraints upon the shapes assumed by large molecules. It is the numerous noncovalent interactions, however, that dictate the stable native conformation and provide the flexibility necessary for the biological function of these large molecules. We will see that noncovalent interactions are essential to the catalytic power of enzymes, the arrangement and properties of lipids in a membrane, and the critical interaction of complementary base pairs in nucleic acids.
The principle that sequences of monomeric subunits are information-rich emerges fully in the discussion of nucleic acids in Chapter 12. However, proteins and some polysaccharides are also information-rich molecules. The amino acid sequence is a form of information that directs the folding of the protein into its unique three-dimensional structure, and ultimately determines the function of the protein. Some polysaccharides also have unique sequences and three-dimensional structures that can be recognized by other macromolecules.
For each class of molecules we find a similar structural hierarchy, in which subunits of fixed structure are connected by bonds of limited flexibility, to form macromolecules with three-dimensional structures determined by noncovalent interactions. Together, the molecules described in Part II are the “stuff” of life. We begin with the amino acids.
Chapter 5
Amino Acids and Peptides
Proteins are the most abundant macromolecules in living cells, occurring in all cells and all parts of cells. Proteins also occur in great variety; thousands of different kinds may be found in a single cell. Moreover, proteins exhibit great diversity in their biological function. Their central role is made evident by the fact that proteins are the most important final products of the information pathways discussed in Part IV of this book. In a sense, they are the molecular instruments through which genetic information is expressed. It is appropriate to begin the study of biological macromolecules with the proteins, whose name derives from the Greek prōtos, meaning “first” or “foremost”.
Relatively simple monomeric subunits provide the key to the structure of the thousands of different proteins. All proteins, whether from the most ancient lines of bacteria or from the most complex forms of life, are constructed from the same ubiquitous set of 20 amino acids, covalently linked in characteristic linear sequences. Because each of these amino acids has a distinctive side chain that determines its chemical properties, this group of 20 precursor molecules may be regarded as the alphabet in which the language of protein structure is written.
Figure 5–1  The protein keratin is formed by all vertebrates. It is the chief structural component of hair, scales, horn, wool, nails, and feathers. The black rhinoceros is nearing extinction in the wild because of the myths prevalent in some parts of the world that a powder derived from its horn has aphrodisiac properties. In reality, the chemical properties are no different from those of powdered bovine hooves or human fingernails.
Proteins are chains of amino acids, each joined to its neighbor by a specific type of covalent bond. What is most remarkable is that cells can produce proteins that have strikingly different properties and activities by joining the same 20 amino acids in many different combinations and sequences. From these building blocks different organisms can make such widely diverse products as enzymes, hormones, antibodies, the lens protein of the eye, feathers, spider webs, rhinoceros horns (Fig. 5–1), milk proteins, antibiotics, mushroom poisons, and a myriad of other substances having distinct biological activities.
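To give a sense of how many distinct chains the same 20 building blocks allow, the following sketch counts the possible linear sequences of a given length. It assumes only that each position can hold any of the 20 standard amino acids independently; the function name is illustrative.

```python
def possible_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear sequences of the given length.

    Each position can be filled independently by any one of the
    `alphabet_size` amino acids, giving alphabet_size ** length.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


print(possible_sequences(2))    # 400 possible dipeptides
print(possible_sequences(100))  # a very large integer for a 100-residue chain
```

Even a modest chain of 100 residues has 20^100 possible sequences, far more than could ever be sampled by living organisms.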
Protein structure and function is the topic for the next four chapters. In this chapter we begin with a description of amino acids and the covalent bonds that link them together in peptides and proteins.
Proteins can be reduced to their constituent amino acids by a variety of methods, and the earliest studies of proteins naturally focused on the free amino acids derived from them. The first amino acid to be discovered in proteins was asparagine, in 1806. The last of the 20 to be found, threonine, was not identified until 1938. All the amino acids have trivial or common names, in some cases derived from the source from which they were first isolated. Asparagine was first found in asparagus, as one might guess; glutamate was found in wheat gluten; tyrosine was first isolated from cheese (thus its name is derived from the Greek tyros, “cheese”); and glycine (Greek glykos, “sweet”) was so named because of its sweet taste.
amino acid, glycine
Figure 5–2  General structure of the amino acids found in proteins. With the exception of the nature of the R group, this structure is common to all the α-amino acids. (Proline, because it is an imino acid, is an exceptional component of proteins.) The α carbon is shown in blue. R (in red) represents the R group or side chain, which is different in each amino acid. In all amino acids except glycine (shown for comparison) the α-carbon atom has four different substituent groups.
All of the 20 amino acids found in proteins have a carboxyl group and an amino group bonded to the same carbon atom (the α carbon) (Fig. 5–2). They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and influence the solubility of amino acids in water. When the R group contains additional carbons in a chain, they are designated β, γ, δ, ε, etc., proceeding out from the α carbon. The 20 amino acids of proteins are often referred to as the standard, primary, or normal amino acids, to distinguish them from amino acids within proteins that are modified after the proteins are synthesized, and from many other kinds of amino acids present in living organisms but not in proteins. The standard amino acids have been assigned three-letter abbreviations and one-letter symbols (Table 5–1), which are used as shorthand to indicate the composition and sequence of amino acids in proteins.
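The shorthand described above lends itself to a simple lookup. The sketch below maps the three-letter abbreviations to the one-letter symbols of the 20 standard amino acids and converts a short peptide written in three-letter form. The hyphen-separated input format and the function name are assumptions made for illustration.

```python
# One-letter symbols keyed by three-letter abbreviation
# for the 20 standard amino acids.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I", "Pro": "P",
    "Phe": "F", "Tyr": "Y", "Trp": "W",
    "Ser": "S", "Thr": "T", "Cys": "C", "Met": "M", "Asn": "N", "Gln": "Q",
    "Asp": "D", "Glu": "E",
    "Lys": "K", "Arg": "R", "His": "H",
}


def to_one_letter(peptide: str, sep: str = "-") -> str:
    """Convert e.g. 'Gly-Ala-Ser' to 'GAS'.

    Raises KeyError if a residue is not one of the 20 standard amino acids.
    """
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split(sep))


print(to_one_letter("Gly-Ala-Ser"))  # GAS
```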
We note in Figure 5–2 that for all the standard amino acids except one (glycine) the α carbon is asymmetric, bonded to four different substituent groups: a carboxyl group, an amino group, an R group, and a hydrogen atom. The α-carbon atom is thus a chiral center (see Fig. 3–9). Because of the tetrahedral arrangement of the bonding orbitals around the α-carbon atom of amino acids, the four different substituent groups can occupy two different arrangements in space, which are nonsuperimposable mirror images of each other (Fig. 5–3). These two forms are called enantiomers or stereoisomers (see Fig. 3–9). All molecules with a chiral center are also optically active – i.e., they can rotate plane-polarized light, with the direction of the rotation differing for different stereoisomers.
L-alanine, D-alanine
Figure 5–3  (a) The two stereoisomers of alanine. L- and D-alanine are nonsuperimposable mirror images of each other. (b, c) Two different conventions for showing the configurations in space of stereoisomers. In perspective formulas (b) the wedge-shaped bonds project out of the plane of the paper, the dashed bonds behind it. In projection formulas (c) the horizontal bonds are assumed to project out of the plane of the paper, the vertical bonds behind. However, projection formulas are often used casually without reference to stereochemical configuration.
Table 5–1  Properties and conventions associated with the standard amino acids
Amino acid      Abbreviated names   Mr    pK1 (–COOH)   pK2 (–NH3+)   pKR (R group)   pI      Hydropathy index*   Occurrence in proteins (%)†
Nonpolar, aliphatic R groups
Glycine         Gly  G              75    2.34          9.60                          5.97    –0.4                7.5
Alanine         Ala  A              89    2.34          9.69                          6.01    1.8                 9.0
Valine          Val  V              117   2.32          9.62                          5.97    4.2                 6.9
Leucine         Leu  L              131   2.36          9.60                          5.98    3.8                 7.5
Isoleucine      Ile  I              131   2.36          9.68                          6.02    4.5                 4.6
Proline         Pro  P              115   1.99          10.96                         6.48    –1.6                4.6
Aromatic R groups
Phenylalanine   Phe  F              165   1.83          9.13                          5.48    2.8                 3.5
Tyrosine        Tyr  Y              181   2.20          9.11          10.07           5.66    –1.3                3.5
Tryptophan      Trp  W              204   2.38          9.39                          5.89    –0.9                1.1
Polar, uncharged R groups
Serine          Ser  S              105   2.21          9.15          13.60           5.68    –0.8                7.1
Threonine       Thr  T              119   2.11          9.62          13.60           5.87    –0.7                6.0
Cysteine        Cys  C              121   1.96          8.18          10.28           5.07    2.5                 2.8
Methionine      Met  M              149   2.28          9.21                          5.74    1.9                 1.7
Asparagine      Asn  N              132   2.02          8.80                          5.41    –3.5                4.4
Glutamine       Gln  Q              146   2.17          9.13                          5.65    –3.5                3.9
Negatively charged R groups
Aspartate       Asp  D              133   1.88          9.60          3.65            2.77    –3.5                5.5
Glutamate       Glu  E              147   2.19          9.67          4.25            3.22    –3.5                6.2
Positively charged R groups
Lysine          Lys  K              146   2.18          8.95          10.53           9.74    –3.9                7.0
Arginine        Arg  R              174   2.17          9.04          12.48           10.76   –4.5                4.7
Histidine       His  H              155   1.82          9.17          6.00            7.59    –3.2                2.1

* A scale combining hydrophobicity and hydrophilicity; can be used to predict which amino acids will be found in an aqueous environment (– values) and which will be found in a hydrophobic environment (+ values). See Box 10–2. From Kyte, J. & Doolittle, R.F. (1982) J. Mol. Biol. 157, 105–132.

† Average occurrence in over 200 proteins. From Klapper, M.H. (1977) Biochem. Biophys. Res. Commun. 78, 1018–1024.
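The pI column of Table 5–1 appears consistent with averaging the two pKa values that bracket the electrically neutral form of each amino acid. That rule is not stated in this section, so the following sketch should be read as an assumption checked against a few table entries, not as the method used to compile the table. The function name and its `n_positive_groups` parameter are illustrative.

```python
def isoelectric_point(pkas, n_positive_groups):
    """Estimate pI as the mean of the two pKa values around the neutral form.

    pkas: all pKa values of the amino acid (pK1, pK2, and pKR if present).
    n_positive_groups: number of groups that carry a positive charge when
        protonated (the alpha-amino group, plus a basic R group if any).
        The fully protonated molecule has this net charge, so the neutral
        form is reached after that many deprotonations.
    """
    ordered = sorted(pkas)
    if not 1 <= n_positive_groups < len(ordered):
        raise ValueError("neutral form is not bracketed by two pKa values")
    return (ordered[n_positive_groups - 1] + ordered[n_positive_groups]) / 2


# Values taken from Table 5-1.
print(round(isoelectric_point([2.34, 9.60], 1), 2))         # glycine
print(round(isoelectric_point([1.88, 9.60, 3.65], 1), 2))   # aspartate
print(round(isoelectric_point([2.18, 8.95, 10.53], 2), 2))  # lysine
```

For glycine, aspartate, and lysine the estimates agree with the tabulated pI values of 5.97, 2.77, and 9.74 to within rounding.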
L-glyceraldehyde, D-glyceraldehyde, L-alanine, D-alanine
Figure 5–4  Steric relationship of the stereoisomers of alanine to the absolute configuration of L- and D-glyceraldehyde. In these perspective formulas, the carbons are lined up vertically, with the chiral atom in the center. The carbons in these molecules are numbered beginning with the aldehyde or carboxyl carbons on the end, or 1 to 3 from top to bottom as shown. When presented in this way, the R group of the amino acid (in this case the methyl group of alanine) is always below the α carbon. L-Amino acids are those with the α-amino group on the left, and D-amino acids have the α-amino group on the right.
The classification and naming of stereoisomers are based on the absolute configuration of the four substituents of the asymmetric carbon atom. For this purpose a reference compound has been chosen, to which all other optically active compounds are compared. This reference compound is the 3-carbon sugar glyceraldehyde (Fig. 5–4), the smallest sugar to have an asymmetric carbon atom. The naming of configurations of both simple sugars and amino acids is based on the absolute configuration of glyceraldehyde, as established by x-ray diffraction analysis. The stereoisomers of all chiral compounds having a configuration related to that of L-glyceraldehyde are designated L (from levo, meaning “left”), and the stereoisomers related to D-glyceraldehyde are designated D (from dextro, meaning “right”). Although these letters derive historically from the terms levorotatory and dextrorotatory, the symbols L and D refer only to the absolute configuration of the four substituents around the chiral carbon, not to the direction in which a compound rotates plane-polarized light; not all L-amino acids are levorotatory.
Nearly all biological compounds with a chiral center occur naturally in only one stereoisomeric form, either D or L. The amino acids in protein molecules are the L stereoisomers. D-Amino acids have been found only in small peptides of bacterial cell walls and in some peptide antibiotics (see Fig. 5–19).
It is remarkable that the amino acids of proteins are all L stereoisomers. As we noted in Chapter 3, when chiral compounds are formed by ordinary chemical reactions, a racemic mixture of D and L isomers results. Whereas the L and D forms of chiral molecules are difficult for a chemist to distinguish and isolate, they are as different as night and day to a living system. The ability of cells to specifically synthesize the L isomer of amino acids reflects one of many extraordinary properties of enzymes (Chapter 8). The stereospecificity of the reactions catalyzed by some enzymes is made possible by the asymmetry of their active sites. The characteristic three-dimensional structures of proteins (Chapter 7), which dictate their diverse biological activities, require that all their constituent amino acids be of one stereochemical series.
Amino acids in aqueous solution are ionized and can act as acids or bases. Knowledge of the acid–base properties of amino acids is extremely important in understanding the physical and biological properties of proteins. Moreover, the technology of separating, identifying, and quantifying the different amino acids, which are necessary steps in determining the amino acid composition and sequence of protein molecules, is based largely on their characteristic acid–base behavior.
nonionic form, zwitterionic form
Figure 5–5  Nonionic and zwitterionic forms of amino acids. Note the separation of the + and – charges in the zwitterion, which makes it an electric dipole. The nonionic form does not occur in significant amounts in aqueous solutions. The zwitterion predominates at neutral pH.
Those α-amino acids having a single amino group and a single carboxyl group crystallize from neutral aqueous solutions as fully ionized species known as zwitterions (German for “hybrid ions”), each having both a positive and a negative charge (Fig. 5–5). These ions are electrically neutral and remain stationary in an electric field. The dipolar nature of amino acids was first suggested by the observation that crystalline amino acids have melting points much higher than those of other organic molecules of similar size. The crystal lattice of amino acids is held together by strong electrostatic forces between positively and negatively charged functional groups of neighboring molecules, resembling the stable ionic crystal lattice of NaCl (see Fig. 4–6).
An understanding of the chemical properties of the standard amino acids is central to an understanding of much of biochemistry. The topic can be simplified by grouping the amino acids into classes based on the properties of their R groups (Table 5–1), in particular, their polarity or tendency to interact with water at biological pH (near pH 7.0). The polarity of the R groups varies widely, from totally nonpolar or hydrophobic (water-insoluble) to highly polar or hydrophilic (water-soluble).
The structures of the 20 standard amino acids are shown in Figure 5–6, and many of their properties are listed in Table 5–1. There are five main classes of amino acids, those whose R groups are: nonpolar and aliphatic; aromatic (generally nonpolar); polar but uncharged; negatively charged; and positively charged. Within each class there are gradations of polarity, size, and shape of the R groups.
Nonpolar, Aliphatic R Groups  The hydrocarbon R groups in this class of amino acids are nonpolar and hydrophobic (Fig. 5–6). The bulky side chains of alanine, valine, leucine, and isoleucine, with their distinctive shapes, are important in promoting hydrophobic interactions within protein structures. Glycine has the simplest amino acid structure. Where it is present in a protein, the minimal steric hindrance of the glycine side chain allows much more structural flexibility than the other amino acids. Proline represents the opposite structural extreme. The secondary amino (imino) group is held in a rigid conformation that reduces the structural flexibility of the protein at that point.
nonpolar, aliphatic R groups, glycine, alanine, valine, leucine, isoleucine, proline;
aromatic R groups, phenylalanine, tyrosine, tryptophan;
polar, uncharged R groups, serine, threonine, cysteine, methionine, asparagine, glutamine; positively charged R groups, lysine, arginine, histidine; negatively charged R groups, aspartate, glutamate
Figure 5–6  The 20 standard amino acids of proteins. They are shown with their amino and carboxyl groups ionized, as they would occur at pH 7.0. The portions in black are those common to all the amino acids; the portions shaded in red are the R groups.
Aromatic R Groups  Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains (Fig. 5–6), are relatively nonpolar (hydrophobic). All can participate in hydrophobic interactions, which are particularly strong when the aromatic groups are stacked on one another. The hydroxyl group of tyrosine can form hydrogen bonds, and it acts as an important functional group in the activity of some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring.
Tryptophan and tyrosine, and to a lesser extent phenylalanine, absorb ultraviolet light (Fig. 5–7 and Box 5–1). This accounts for the characteristic strong absorbance of light by proteins at a wavelength of 280 nm, and is a property exploited by researchers in the characterization of proteins.
Polar, Uncharged R Groups  The R groups of these amino acids (Fig. 5–6) are more soluble in water, or hydrophilic, than those of the nonpolar amino acids, because they contain functional groups that form hydrogen bonds with water. This class of amino acids includes serine, threonine, cysteine, methionine, asparagine, and glutamine. The polarity of serine and threonine is contributed by their hydroxyl groups; that of cysteine and methionine by their sulfur atom; and that of asparagine and glutamine by their amide groups.
absorbance, wavelength (nm), phenylalanine, tyrosine, tryptophan
Figure 5–7  Comparison of the light absorbance spectra of the aromatic amino acids at pH 6.0. The amino acids are present in equimolar amounts (10⁻³ M) under identical conditions. The light absorbance of tryptophan is as much as fourfold higher than that of tyrosine. Phenylalanine absorbs less light than either tryptophan or tyrosine. Note that the absorbance maximum for tryptophan and tyrosine occurs near a wavelength of 280 nm.
Asparagine and glutamine are the amides of two other amino acids also found in proteins, aspartate and glutamate, respectively, to which asparagine and glutamine are easily hydrolyzed by acid or base. Cysteine has an R group (a thiol group) that is approximately as acidic as the hydroxyl group of tyrosine. Cysteine requires special mention for another reason. It is readily oxidized to form a covalently linked dimeric amino acid called cystine, in which two cysteine molecules are joined by a disulfide bridge. Disulfide bridges of this kind occur in many proteins, stabilizing their structures.
cysteine + cysteine ⇌ cystine + 2H+ + 2e−
Negatively Charged (Acidic) R Groups  The two amino acids having R groups with a net negative charge at pH 7.0 are aspartate and glutamate, each with a second carboxyl group (Fig. 5–6). These amino acids are the parent compounds of asparagine and glutamine, respectively.
Positively Charged (Basic) R Groups  The amino acids in which the R groups have a net positive charge at pH 7.0 are lysine, which has a second amino group at the ϵ position on its aliphatic chain; arginine, which has a positively charged guanidino group; and histidine, containing an imidazole group (Fig. 5–6). Histidine is the only standard amino acid having a side chain with a pKa near neutrality.

Measurement of light absorption is an important tool for analysis of many biological molecules. The fraction of the incident light absorbed by a solution at a given wavelength is related to the thickness of the absorbing layer (path length) and the concentration of the absorbing species. These two relationships are combined into the Lambert–Beer law, given in integrated form as
log (I0/I) = ϵcl
where I0 is the intensity of the incident light, I is the intensity of the transmitted light, ϵ is the molar absorption coefficient (in units of liters per mole-centimeter), c the concentration of the absorbing species (in moles per liter), and l the path length of the light-absorbing sample (in centimeters). The Lambert–Beer law assumes that the incident light is parallel and monochromatic and that the solvent and solute molecules are randomly oriented. The expression log (I0/I) is called the absorbance, designated A.
It is important to note that each millimeter path length of absorbing solution in a 1.0 cm cell absorbs not a constant amount but a constant fraction of the incident light. However, with an absorbing layer of fixed path length, the absorbance A is directly proportional to the concentration of the absorbing solute.
The molar absorption coefficient varies with the nature of the absorbing compound, the solvent, the wavelength, and also with pH if the light-absorbing species is in equilibrium with another species having a different spectrum through gain or loss of protons.
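The relationships above can be sketched numerically. The following minimal Python example assumes a made-up molar absorption coefficient and concentration (neither comes from the text) and uses hypothetical function names; it illustrates that each millimeter of path transmits the same fraction of the light entering it, and that concentration follows from absorbance once ϵ and l are fixed.

```python
import math


def absorbance(incident: float, transmitted: float) -> float:
    """Absorbance A = log10(I0 / I) from incident and transmitted intensities."""
    if incident <= 0 or transmitted <= 0:
        raise ValueError("light intensities must be positive")
    return math.log10(incident / transmitted)


def concentration(a: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Solve A = epsilon * c * l for c, in mol/L."""
    return a / (epsilon * path_cm)


def transmitted_fraction(epsilon: float, conc: float, path_cm: float) -> float:
    """Fraction I / I0 of the incident light that passes through the sample."""
    return 10 ** (-epsilon * conc * path_cm)


# Hypothetical sample: both values below are illustrative assumptions.
EPSILON = 5000.0  # L/(mol*cm)
CONC = 1.0e-4     # mol/L

per_mm = transmitted_fraction(EPSILON, CONC, 0.1)
whole_cell = transmitted_fraction(EPSILON, CONC, 1.0)
print(f"each millimeter transmits {per_mm:.3f} of the light entering it")
print(f"ten successive millimeters: {per_mm ** 10:.3f}; 1.0 cm directly: {whole_cell:.3f}")
```

Compounding the per-millimeter fraction ten times reproduces the result for the full 1.0 cm cell, which is the sense in which a constant fraction, not a constant amount, is absorbed.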
In practice, absorbance measurements are usually made on a set of standard solutions of known concentration at a fixed wavelength. A sample of unknown concentration can then be compared with the resulting standard curve, as shown in Figure 1.
A595, protein concentration (μg/mL)
Figure 1  Eight standard solutions containing known amounts of protein and one sample containing an unknown amount of protein were reacted with the Bradford reagent. This reagent contains a dye that shifts its absorption maximum to 595 nm when it binds amino acid residues. The A595 (absorbance at 595 nm) of the standard samples was plotted against the protein concentration to create the standard curve, shown here. The A595 of the unknown sample, 0.58, corresponds to a protein concentration of 122 μg/mL.
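Reading an unknown from a standard curve can be sketched as a straight-line fit followed by interpolation. In the example below the eight standard readings are invented for illustration (they are not the data plotted in Figure 1), the function names are hypothetical, and the curve is assumed to be linear over the range used, which real assays satisfy only approximately.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a, slope, intercept):
    """Invert the fitted line to read a concentration off the standard curve."""
    return (a - intercept) / slope


# Invented standards (ug/mL and A595); illustrative assumptions only.
STANDARD_CONC = [0, 25, 50, 75, 100, 125, 150, 175]
STANDARD_A595 = [0.02, 0.14, 0.25, 0.37, 0.48, 0.61, 0.72, 0.83]

slope, intercept = fit_line(STANDARD_CONC, STANDARD_A595)
unknown = concentration_from_absorbance(0.58, slope, intercept)
print(f"estimated concentration of the unknown: {unknown:.0f} ug/mL")
```

The estimate is only as good as the standards: an unknown whose absorbance falls outside the range of the standards should be diluted and measured again, not extrapolated.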
4-hydroxyproline, 5-hydroxylysine, 6-N-methyllysine, γ-carboxyglutamate, desmosine, selenocysteine; ornithine, citrulline
In addition to the 20 standard amino acids that are common in all proteins, other amino acids have been found as components of only certain types of proteins (Fig. 5–8a). Each of these is derived from one of the 20 standard amino acids, in a modification reaction that occurs after the standard amino acid has been inserted into a protein. Among the nonstandard amino acids are 4-hydroxyproline, a derivative of proline, and 5-hydroxylysine; the former is found in plant cell-wall proteins, and both are found in the fibrous protein collagen of connective tissues. N-Methyllysine is found in myosin, a contractile protein of muscle. Another important nonstandard amino acid is γ-carboxyglutamate, found in the blood-clotting protein prothrombin as well as in certain other proteins that bind Ca2+ in their biological function. More complicated is the nonstandard amino acid desmosine, a derivative of four separate lysine residues, found in the fibrous protein elastin. Selenocysteine contains selenium rather than the oxygen of serine, and is found in glutathione peroxidase and a few other proteins.
Some 300 additional amino acids have been found in cells and have a variety of functions but are not substituents of proteins. Ornithine and citrulline (Fig. 5–8b) deserve special note because they are key intermediates in the biosynthesis of arginine and in the urea cycle. These pathways are described in Chapters 21 and 17, respectively.
Figure 5–8  (a) Some nonstandard amino acids found in proteins; all are derived from standard amino acids. The extra functional groups are shown in red. Desmosine is formed from four residues of lysine, whose carbon backbones are shaded in gray. Selenocysteine is derived from serine. (b) Ornithine and citrulline are intermediates in the biosynthesis of arginine and in the urea cycle. Note that two systems are used to number carbons in the naming of these amino acids. The α, β, γ system used for γ-carboxyglutamate begins at the α carbon (see Fig. 5–2) and extends into the R group. The α-carboxyl group is not included. In contrast, the numbering system used to identify the modified carbon in 4-hydroxyproline, 5-hydroxylysine, and 6-N-methyllysine includes the α-carboxyl carbon, which is designated carbon 1 (or C-1).
When a crystalline amino acid, such as alanine, is dissolved in water, it exists in solution as the dipolar ion, or zwitterion, which can act either as an acid (proton donor):
+H3N–CH(CH3)–COO− ⇌ H2N–CH(CH3)–COO− + H+
or as a base (proton acceptor):
+H3N–CH(CH3)–COO− + H+ ⇌ +H3N–CH(CH3)–COOH
Substances having this dual nature are amphoteric and are often called ampholytes, from “amphoteric electrolytes”. A simple monoamino monocarboxylic α-amino acid, such as alanine, is actually a diprotic acid when it is fully protonated, that is, when both its carboxyl group and amino group have accepted protons. In this form it has two groups that can ionize to yield protons, as indicated in the following equation:
+H3N–CH(CH3)–COOH ⇌ +H3N–CH(CH3)–COO− + H+
+H3N–CH(CH3)–COO− ⇌ H2N–CH(CH3)–COO− + H+
pH, pK1 = 2.34, pI = 5.97, pK2 = 9.60, OH− (equivalents)
Figure 5–9  The titration curve of 0.1 M glycine at 25 °C. The ionic species predominating at key points in the titration are shown above the graph. The shaded boxes, centered about pK1 = 2.34 and pK2 = 9.60, indicate the regions of greatest buffering power.
Titration involves the gradual addition or removal of protons. Figure 5–9 shows the titration curve of the diprotic form of glycine. Each molecule of added base results in the net removal of one proton from one molecule of amino acid. The plot has two distinct stages, each corresponding to the removal of one proton from glycine. Each of the two stages resembles in shape the titration curve of a monoprotic acid, such as acetic acid (see Fig. 4–10), and can be analyzed in the same way. At very low pH, the predominant ionic species of glycine is +H3N–CH2–COOH, the fully protonated form. At the midpoint in the first stage of the titration, in which the –COOH group of glycine loses its proton, equimolar concentrations of proton-donor (+H3N–CH2–COOH) and proton-acceptor (+H3N–CH2–COO−) species are present. At the midpoint of a titration (see Fig. 4–11), the pH is equal to the pKa of the protonated group being titrated. For glycine, the pH at the midpoint is 2.34, thus its –COOH group has a pKa of 2.34. [Recall that pH and pKa are simply convenient notations for proton concentration and the equilibrium constant for ionization, respectively (Chapter 4). The pKa is a measure of the tendency of a group to give up a proton, with that tendency decreasing tenfold as the pKa increases by one unit.] As the titration proceeds, another important point is reached at pH 5.97. Here there is a point of inflection, at which removal of the first proton is essentially complete, and removal of the second has just begun. At this pH the glycine is present largely as the dipolar ion +H3N–CH2–COO−. We shall return to the significance of this inflection point in the titration curve shortly.
The second stage of the titration corresponds to the removal of a proton from the –NH3+ group of glycine. The pH at the midpoint of this stage is 9.60, equal to the pKa for the –NH3+ group. The titration is complete at a pH of about 12, at which point the predominant form of glycine is H2N–CH2–COO−.
From the titration curve of glycine we can derive several important pieces of information. First, it gives a quantitative measure of the pKa of each of the two ionizing groups, 2.34 for the –COOH group and 9.60 for the –NH3+ group. Note that the carboxyl group of glycine is over 100 times more acidic (more easily ionized) than the carboxyl group of acetic acid, which has a pKa of 4.76. This effect is caused by the nearby positively charged amino group on the α-carbon atom, as described in Figure 5–10.
The second piece of information given by the titration curve of glycine (Fig. 5–9) is that this amino acid has two regions of buffering power (see Fig. 4–12). One of these is the relatively flat portion of the curve centered about the first pKa of 2.34, indicating that glycine is a good buffer near this pH. The other buffering zone extends for ~1.2 pH units centered around pH 9.60. Note also that glycine is not a good buffer at the pH of intracellular fluid or blood, about 7.4.
α-amino acid (glycine), acetic acid
Figure 5–10  (a) Interactions between the α-amino and α-carboxyl groups in an α-amino acid. The nearby positive charge of the –NH3+ group makes ionization of the carboxyl group more likely (i.e., lowers the pKa for –COOH). This is due to a stabilizing interaction between opposite charges on the zwitterion and a repulsive interaction between the positive charges of the amino group and the departing proton. (b) The normal pKa for a carboxyl group is approximately 4.76, as for acetic acid.
The Henderson–Hasselbalch equation (Chapter 4) can be used to calculate the proportions of proton-donor and proton-acceptor species of glycine required to make a buffer at a given pH within the buffering ranges of glycine; it also makes it possible to solve other kinds of buffer problems involving amino acids (see Box 4–2).
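As a minimal sketch of such a calculation, the Henderson–Hasselbalch relationship, pH = pKa + log ([proton acceptor]/[proton donor]), can be rearranged to give the acceptor-to-donor ratio at a chosen pH. The function names below are hypothetical, and the target pH of 9.0 is an arbitrary example within the second buffering range of glycine.

```python
def acceptor_to_donor_ratio(ph: float, pka: float) -> float:
    """[proton acceptor] / [proton donor] from pH = pKa + log10(ratio)."""
    return 10 ** (ph - pka)


def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of the group present as the proton acceptor at this pH."""
    ratio = acceptor_to_donor_ratio(ph, pka)
    return ratio / (1.0 + ratio)


GLYCINE_PK2 = 9.60  # pKa of the -NH3+ group of glycine (Fig. 5-9)

ratio = acceptor_to_donor_ratio(9.0, GLYCINE_PK2)
print(f"glycine buffer at pH 9.0: acceptor/donor = {ratio:.2f}")
print(f"fraction deprotonated at the midpoint: {fraction_deprotonated(9.60, GLYCINE_PK2):.2f}")
```

At pH equal to the pKa the ratio is 1 and half of the group is deprotonated, which is the midpoint condition used above to read pKa values from the titration curve.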
Another important piece of information derived from the titration curve of an amino acid is the relationship between its net electric charge and the pH of the solution. At pH 5.97, the point of inflection between the two stages in its titration curve, glycine is present as its dipolar form, fully ionized but with no net electric charge (Fig. 5–9). This characteristic pH is called the isoelectric point or isoelectric pH, designated pI or pHI. For an amino acid such as glycine, which has no ionizable group in the side chain, the isoelectric point is the arithmetic mean of the two pKa values:
pI = (pK1 + pK2) / 2
which in the case of glycine is
pI = (2.34 + 9.60) / 2 = 5.97
As is evident in Figure 5–9, glycine has a net negative charge at any pH above its pI and will thus move toward the positive electrode (the anode) when placed in an electric field. At any pH below its pI, glycine has a net positive charge and will move toward the negative electrode, the cathode. The farther the pH of a glycine solution is from its isoelectric point, the greater the net electric charge of the population of glycine molecules. At pH 1.0, for example, glycine exists almost entirely as the form +H3N–CH2–COOH, with a net positive charge close to 1.0. At pH 2.34, where there is an equal mixture of +H3N–CH2–COOH and +H3N–CH2–COO−, the average or net positive charge is 0.5. The sign and the magnitude of the net charge of any amino acid at any pH can be predicted in the same way.
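The prediction can be sketched for glycine by treating its two groups as independent and summing their average charges. This is a simplified model that uses only the two pKa values given for glycine; the function name is hypothetical.

```python
PK1 = 2.34  # -COOH group of glycine
PK2 = 9.60  # -NH3+ group of glycine


def glycine_net_charge(ph: float) -> float:
    """Average net charge per glycine molecule at the given pH."""
    carboxylate = 1.0 / (1.0 + 10 ** (PK1 - ph))  # fraction present as -COO(-)
    ammonium = 1.0 / (1.0 + 10 ** (ph - PK2))     # fraction present as -NH3(+)
    return ammonium - carboxylate


for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
    print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.2f}")
```

The function returns about +0.96 at pH 1.0, +0.5 at pH 2.34, essentially zero at the isoelectric point of 5.97, and negative values above it, in line with the behavior described for Figure 5–9.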
This information has practical importance. For a solution containing a mixture of amino acids, the different amino acids can be separated on the basis of the direction and relative rate of their migration when placed in an electric field at a known pH.
The shared properties of many amino acids permit some simplifying generalizations about the acid–base behavior of different classes of amino acids.
All amino acids with a single α-amino group, a single α-carboxyl group, and an R group that does not ionize have titration curves resembling that of glycine (Fig. 5–9). This group of amino acids is characterized by having very similar, although not identical, values for pK1 (the pK of the –COOH group) in the range of 1.8 to 2.4 and for pK2 (of the –NH3+ group) in the range of 8.8 to 11.0 (Table 5–1).
Amino acids with an ionizable R group (Table 5–1) have more complex titration curves with three stages corresponding to the three possible ionization steps; thus they have three pKa values. The third stage for the titration of the ionizable R group merges to some extent with the others. The titration curves of two representatives of this group, glutamate and histidine, are shown in Figure 5–11. The isoelectric points of amino acids in this class reflect the type of ionizing R groups present. For example, glutamate has a pI of 3.22, considerably lower than that of glycine. This is a result of the presence of two carboxyl groups which, at the average of their pKa values (3.22), contribute a net negative charge of –1 that balances the +1 contributed by the amino group. Similarly, the pI of histidine, with two groups that are positively charged when protonated, is 7.59 (the average of the pKa values of the amino and imidazole groups), much higher than that of glycine.
pH, pK1 = 2.19, pKR = 4.25, pK2 = 9.67, OH− (equivalents); pH, pK1 = 1.82, pKR = 6.0, pK2 = 9.17, OH− (equivalents)
Figure 5–11  The titration curves of (a) glutamate and (b) histidine. The pKa of the R group is designated pKR.
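The averaging rule used for glutamate and histidine can be written as a short procedure: sort the pKa values, count the groups that carry a positive charge when protonated, and average the two pKa values on either side of the species with zero net charge. The sketch below is a simplification that treats the ionizations as well separated; the function name is hypothetical, and the pKa values are those shown in Figures 5–9 and 5–11.

```python
def isoelectric_point(pkas, n_positive_groups):
    """pI as the mean of the two pKa values that flank the zero-charge species.

    The fully protonated form carries a charge of +n_positive_groups, and each
    deprotonation lowers the charge by one, so the neutral species is reached
    after n_positive_groups deprotonations (taken in order of increasing pKa).
    """
    ordered = sorted(pkas)
    k = n_positive_groups
    if not 1 <= k < len(ordered):
        raise ValueError("no zero-charge species flanked by two pKa values")
    return (ordered[k - 1] + ordered[k]) / 2


print(f"glycine:   {isoelectric_point([2.34, 9.60], 1):.2f}")
print(f"glutamate: {isoelectric_point([2.19, 4.25, 9.67], 1):.2f}")
print(f"histidine: {isoelectric_point([1.82, 6.0, 9.17], 2):.3f}")
```

Glutamate has one positively charged group when protonated (the amino group), so its two lowest pKa values are averaged; histidine has two (amino and imidazole), so its two highest are averaged.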
Another important generalization can be made about the acid–base behavior of the 20 standard amino acids. Under the general condition of free and open exposure to the aqueous environment, only histidine has an R group (pKa = 6.0) providing significant buffering power near the neutral pH usually found in the intracellular and intercellular fluids of most animals and bacteria. All the other amino acids have pKa values too far away from pH 7 to be effective physiological buffers (Table 5–1), although in the interior of proteins the pKa values of amino acid side chains are often altered.
Ion-exchange chromatography is the most widely used method for separating, identifying, and quantifying the amounts of each amino acid in a mixture. This technique primarily exploits differences in the sign and magnitude of the net electric charges of amino acids at a given pH, which are predictable from their pKa values or their titration curves.
The chromatographic column consists of a long tube filled with particles of a synthetic resin containing fixed charged groups; those with fixed anionic groups are called cation-exchange resins and those with fixed cationic groups, anion-exchange resins. A simple form of ion-exchange chromatography on a cation-exchange resin is described in Figure 5–12. The affinity of each amino acid for the resin is affected by pH (which determines the ionization state of the molecule) and the concentration of other salt ions that may compete with the resin by associating with the amino acid. Separation of amino acids can therefore be optimized by gradually changing the pH and/or salt concentration of the solution being passed through the column so as to create a pH or salt gradient. A modern enhancement of this and other chromatographic techniques is called high-performance liquid chromatography (HPLC). This takes advantage of stronger resin material and improved apparatus designed to permit chromatography at high pressures, allowing better separations in a much shorter time. For amino acids, the entire procedure has been automated, so that elution, collection of fractions, analysis of each fraction, and recording of data are performed automatically in an amino-acid analyzer. Figure 5–13 shows a chromatogram of an amino acid mixture analyzed in this way.
reservoir of buffer allows sample to percolate slowly through column;
solution of amino acids at pH 3.0 is poured onto a cation-exchange column;
amino acids with greatest positive charge (red) bind the column most tightly and therefore move most slowly, those with the least amount of positive charge (blue) move fastest and elute first; fractions are collected from the bottom of the column and analyzed quantitatively
Figure 5–12  Ion-exchange chromatography. An example of a cation-exchange resin is presented. (a) Negatively charged sulfonate groups (–SO3−) on the resin surface attract and bind cations, such as H+, Na+, or cationic forms of amino acids. (b) An acidic solution (pH 3.0) of the amino acid mixture is poured on a column packed with resin and allowed to percolate through slowly. At pH 3.0 the amino acids are largely cations with net positive charges, but they differ in the pKa values of their R groups, and hence in the extent to which they are ionized and in their tendency to bind to the anionic resin. As a result, they move through the column at different rates.
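As a rough illustration of why amino acids separate on a cation-exchange column, the net charge of each amino acid at pH 3.0 can be estimated from its pKa values and used to rank the expected order of elution. This is a sketch under a strong assumption: it considers charge alone and ignores other interactions with the resin, so it is a first approximation only. The pKa values are those given in Figures 5–9 and 5–11; the names in the code are hypothetical.

```python
def net_charge(ph, groups):
    """Average net charge; groups is a list of (pKa, charge_when_protonated)."""
    total = 0.0
    for pka, charge_protonated in groups:
        protonated = 1.0 / (1.0 + 10 ** (ph - pka))
        # A deprotonated group carries one unit less charge than its protonated form.
        total += charge_protonated * protonated
        total += (charge_protonated - 1) * (1.0 - protonated)
    return total


# Carboxyl groups are neutral when protonated (0); amino and imidazole are +1.
AMINO_ACIDS = {
    "Gly": [(2.34, 0), (9.60, +1)],
    "Glu": [(2.19, 0), (4.25, 0), (9.67, +1)],
    "His": [(1.82, 0), (6.0, +1), (9.17, +1)],
}

COLUMN_PH = 3.0
elution_order = sorted(AMINO_ACIDS, key=lambda aa: net_charge(COLUMN_PH, AMINO_ACIDS[aa]))
for aa in elution_order:
    print(f"{aa}: net charge {net_charge(COLUMN_PH, AMINO_ACIDS[aa]):+.2f}")
```

The amino acid with the least positive charge is listed first, since it binds the anionic resin least tightly; histidine, with two positively charged groups at this pH, is retained longest of the three.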
time (min), Asp, Glu, Ser, Gly, His, Thr, Ala, Arg, Pro, Tyr, Val, Met, Ile, Leu, Phe, Lys
Figure 5–13  Automatically recorded high-performance liquid chromatographic analysis of amino acids on a cation-exchange resin. The area under each peak on the chromatogram is proportional to the amount of each amino acid in the mixture.
ninhydrin, amino acid, ninhydrin → purple pigment
As for all organic compounds, the chemical reactions of amino acids are those characteristic of their functional groups. Because all amino acids contain amino and carboxyl groups, all will undergo chemical reactions characteristic for these groups. For example, their amino groups can be acetylated or formylated, and their carboxyl groups can be esterified. We will not examine all such organic reactions of amino acids, but several widely used reactions are noteworthy because they greatly simplify the detection, measurement, and identification of amino acids.
One of the most important, technically and historically, is the ninhydrin reaction, which has been used for many years to detect and quantify microgram amounts of amino acids. When amino acids are heated with excess ninhydrin, all those having a free α-amino group yield a purple product. Proline, in which the α-amino group is substituted (forming an imino group), yields a yellow product. Under appropriate conditions the intensity of color produced (optical absorbance of the solution; see Box 5–1) is proportional to the amino acid concentration. Comparing the absorbance to that of appropriate standard solutions is an accurate and technically simple method for measuring amino acid concentration.
Several other convenient reagents are available that react with the α-amino group to form colored or fluorescent derivatives. Unlike ninhydrin, these have the advantage that the intact R group of the amino acid remains part of the product, so that derivatives of different amino acids can be distinguished. Fluorescamine reacts rapidly with amino acids and provides great sensitivity, yielding a highly fluorescent derivative that permits the detection of nanogram quantities of amino acids (Fig. 5–14). Dabsyl chloride, dansyl chloride, and 1-fluoro-2,4-dinitrobenzene (Fig. 5–14) yield derivatives that are stable under harsh conditions such as those used in the hydrolysis of proteins.
Figure 5–14  Reagents that react with the α-amino group of amino acids. The reactions producing 2,4-dinitrophenyl and fluorescamine derivatives are illustrated. The reactions of dansyl chloride and dabsyl chloride are similar to that of 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent). Because the derivatives of these reagents absorb light, they greatly facilitate the detection and quantification of the amino acids.
1-fluoro-2,4-dinitrobenzene, α-amino acid → HF, 2,4-dinitrophenylamino acid; fluorescamine, α-amino acid → fluorescent amine derivative; dansyl chloride, dabsyl chloride
We now turn to polymers of amino acids, the peptides. Biologically occurring peptides range in size from small molecules containing only two or three amino acids to macromolecules containing thousands of amino acids. The focus here is on the structure and chemical properties of the smaller peptides, providing a prelude to the discussion of the large peptides called proteins in the next two chapters.
Figure 5–15  Formation of a peptide bond (shaded in gray) in a dipeptide. This is a condensation reaction. The α-amino group of amino acid 2 acts as a nucleophile (see Table 3–6) to displace the hydroxyl group of amino acid 1 (red). Amino groups are good nucleophiles, but the hydroxyl group is a poor leaving group and is not readily displaced. At physiological pH the reaction as shown does not occur to any appreciable extent. Peptide bond formation is endergonic, with a free-energy change of about +21 kJ/mol.
Two amino acid molecules can be covalently joined through a substituted amide linkage, termed a peptide bond, to yield a dipeptide. Such a linkage is formed by removal of the elements of water from the α-carboxyl group of one amino acid and the α-amino group of another (Fig. 5–15). Peptide-bond formation is an example of a condensation reaction, a common class of reaction in living cells. Note that as shown in Figure 5–15, this reaction has an equilibrium that favors reactants rather than products. To make the reaction thermodynamically more favorable, the carboxyl group must be chemically modified or activated so that the hydroxyl group can be more readily eliminated. A chemical approach to this problem is outlined at the end of this chapter (see Box 5–2). The biological approach to peptide bond formation is a major topic of Chapter 26.
Three amino acids can be joined by two peptide bonds to form a tripeptide; similarly, amino acids can be linked to form tetrapeptides and pentapeptides. When a few amino acids are joined in this fashion, the structure is called an oligopeptide. When many amino acids are joined, the product is called a polypeptide. Proteins may have thousands of amino acid units. Although the terms “protein” and “polypeptide” are sometimes used interchangeably, molecules referred to as polypeptides generally have molecular weights below 10,000.
Figure 5–16  Structure of the pentapeptide serylglycyltyrosylalanylleucine, or Ser-Gly-Tyr-Ala-Leu. Peptides are named beginning with the amino-terminal residue, which by convention is placed at the left. The peptide bonds are shown shaded in gray, the R groups in red.
amino-terminal end, carboxyl-terminal end
Figure 5–16 shows the structure of a pentapeptide. The amino acid units in a peptide are often called residues (each has lost a hydrogen atom from its amino group and a hydroxyl moiety from its carboxyl group). The amino acid residue at that end of a peptide having a free α-amino group is the amino-terminal (or N-terminal) residue; the residue at the other end, which has a free carboxyl group, is the carboxyl-terminal (C-terminal) residue. By convention, short peptides are named from the sequence of their constituent amino acids, beginning at the left with the amino-terminal residue and proceeding toward the carboxyl terminus at the right (Fig. 5–16).
Although hydrolysis of peptide bonds is an exergonic reaction, it occurs slowly because of its high activation energy. As a result, the peptide bonds in proteins are quite stable under most intracellular conditions.
The peptide bond is the single most important covalent bond linking amino acids in peptides and proteins. The only other type of covalent bond that occurs frequently enough to deserve special mention here is the disulfide bond sometimes formed between two cysteine residues. Disulfide bonds play a special role in the structure of many proteins, particularly those that function extracellularly, such as the hormone insulin and the immunoglobulins or antibodies.
Peptides contain only one free α-amino group and one free α-carboxyl group (Fig. 5–17). These groups ionize as they do in simple amino acids, although the ionization constants are different because the oppositely charged group is absent from the α carbon. The α-amino and α-carboxyl groups of all other constituent amino acids are covalently joined in the form of peptide bonds, which do not ionize and thus do not contribute to the total acid–base behavior of peptides. However, the R groups of some amino acids can ionize (Table 5–1), and in a peptide these contribute to the overall acid–base properties (Fig. 5–17). Thus the acid–base behavior of a peptide can be predicted from its single free α-amino and α-carboxyl groups and the nature and number of its ionizable R groups. Like free amino acids, peptides have characteristic titration curves and a characteristic isoelectric pH at which they do not move in an electric field. These properties are exploited in some of the techniques used to separate peptides and proteins (Chapter 6).
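The prediction can be sketched for the tetrapeptide Ala-Glu-Gly-Lys by summing the charges of its four ionizable groups and locating the pH of zero net charge by bisection. The pKa values below are assumptions: free amino acid values are used as stand-ins for the terminal groups even though, as noted above, the true constants in a peptide differ, and the terminal and lysine side-chain values are taken from standard tables, not from this section. The function names are hypothetical.

```python
# (pKa, charge when protonated); all values are stand-ins, see the note above.
PEPTIDE_GROUPS = [
    (9.69, +1),   # amino-terminal group (free alanine value, assumed)
    (2.18, 0),    # carboxyl-terminal group (free lysine value, assumed)
    (4.25, 0),    # glutamate side chain (pKR from Fig. 5-11)
    (10.53, +1),  # lysine side chain (assumed, from standard tables)
]


def net_charge(ph, groups):
    """Average net charge of a molecule with independent ionizable groups."""
    total = 0.0
    for pka, charge_protonated in groups:
        protonated = 1.0 / (1.0 + 10 ** (ph - pka))
        total += charge_protonated * protonated
        total += (charge_protonated - 1) * (1.0 - protonated)
    return total


def isoelectric_ph(groups, low=0.0, high=14.0, tolerance=1e-6):
    """Bisection search; net charge falls steadily as the pH rises."""
    while high - low > tolerance:
        middle = (low + high) / 2
        if net_charge(middle, groups) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2


print(f"net charge at pH 7.0: {net_charge(7.0, PEPTIDE_GROUPS):+.3f}")
print(f"estimated isoelectric pH: {isoelectric_ph(PEPTIDE_GROUPS):.2f}")
```

With two positively and two negatively charged groups at pH 7.0, the charges nearly cancel. A measured titration curve would be needed to obtain the true isoelectric pH of the peptide.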
Figure 5–17  Ionization and electric charge of peptides. The groups ionized at pH 7.0 are in red. (a) A tetrapeptide with two ionizable R groups. (b) The cationic, isoelectric, and anionic forms of a dipeptide lacking ionizable R groups.
alanylglutamylglycyllysine, Ala, Glu, Gly, Lys; alanylalanine, cationic form (below pH 3), isoelectric form, anionic form (above pH 10)
Like other organic molecules, peptides undergo chemical reactions that are characteristic of their functional groups: the free amino and carboxyl groups and the R groups.
Peptide bonds can be hydrolyzed by boiling with either strong acid (typically 6 M HCl) or base to yield the constituent amino acids.
Hydrolysis of peptide bonds in this manner is a necessary step in determining the amino acid composition of proteins. The reagents shown in Figure 5–14 label only free amino groups: those of the amino-terminal residue and the R groups of any lysines present. If dabsyl chloride, dansyl chloride, or 1-fluoro-2,4-dinitrobenzene is used before acid hydrolysis of the peptide, the amino-terminal residue can be separated and identified (Fig. 5–18).
dabsyl chloride, tetrapeptide → dabsyl peptide, 6 M HCl, 110 °C, 24h → dabsyl amino acid, α-amino acids
Peptide bonds can also be hydrolyzed by certain enzymes called proteases. Proteolytic (protein-cleaving) enzymes are found in all cells and tissues, where they degrade unneeded or damaged proteins or aid in the digestion of food.
Figure 5–18  The amino-terminal residue of a tetrapeptide can be identified by labeling it with dabsyl chloride, then hydrolyzing the peptide bonds in strong acid. The result is a mixture of amino acids of which only the amino-terminal amino acid (and lysine) is labeled.
Much of the material in the chapters to follow will revolve around the activities of proteins with molecular weights measured in the tens and even hundreds of thousands. Not all polypeptides are so large, however. There are many naturally occurring small polypeptides and oligopeptides, some of which have important biological activities and exert their effects at very low concentrations. For example, a number of vertebrate hormones (intercellular chemical messengers) (Chapter 22) are small polypeptides. The hormone insulin contains two polypeptide chains, one having 30 amino acid residues and the other 21. Other polypeptide hormones include glucagon, a pancreatic hormone of 29 residues that opposes the action of insulin, and corticotropin, a 39-residue hormone of the anterior pituitary gland that stimulates the adrenal cortex.
Some biologically important peptides have only a few amino acid residues. That small peptides can have large biological effects is readily illustrated by the activity of the commercially synthesized dipeptide, L-aspartylphenylalanyl methyl ester. This compound is an artificial sweetener better known as aspartame or NutraSweet®:
L-aspartyl-L-phenylalanyl methyl ester (aspartame)
Among naturally occurring small peptides are hormones such as oxytocin (nine amino acid residues), which is secreted by the posterior pituitary and stimulates uterine contractions; bradykinin (nine residues), which inhibits inflammation of tissues; and thyrotropin-releasing factor (three residues), which is formed in the hypothalamus and stimulates the release of another hormone, thyrotropin, from the anterior pituitary gland (Fig. 5–19). Also noteworthy among short peptides are the enkephalins, compounds formed in the central nervous system
Structures shown in Figure 5–19: (c) pyroglutamate–His–prolinamide; (d) Tyr–Gly–Gly–Phe–Met and Tyr–Gly–Gly–Phe–Leu; (e) cyclic →D-Phe→L-Leu→L-Orn→L-Val→L-Pro→D-Phe→L-Leu→L-Orn→L-Val→L-Pro→
Figure 5–19  Some naturally occurring peptides with intense biological activity. The amino-terminal residues are at the left end. (a) Bradykinin, a hormonelike peptide that inhibits inflammatory reactions. (b) Oxytocin, formed by the posterior pituitary gland. The shaded portion is a residue of glycinamide (H2N–CH2–CONH2). (c) Thyrotropin-releasing factor, formed by the hypothalamus. (d) Two enkephalins, brain peptides that affect the perception of pain. (e) Gramicidin S, an antibiotic produced by the bacterium Bacillus brevis. The arrows indicate the direction from the amino toward the carboxyl end of each residue. The peptide has no termini because it is circular. Orn is the symbol for ornithine, an amino acid that generally does not occur in proteins. Note that gramicidin S contains two residues of a D-amino acid (D-phenylalanine).
that bind to receptors in certain cells of the brain and induce analgesia (deadening of pain sensations). Enkephalins represent one of the body’s own mechanisms for control of pain. The enkephalin receptors also bind morphine, heroin, and other addicting opiate drugs (although these are not peptides). Some extremely toxic mushroom poisons, such as amanitin, are also peptides, as are many antibiotics.
A growing number of small peptides are proving to be important commercially as pharmaceutical reagents. Unfortunately, they are often present in exceedingly small amounts and hence are hard to purify. For these and other reasons, the chemical synthesis of peptides has become one of the major technologies associated with biochemistry (Box 5–2).

Many peptides are potentially useful as pharmacological reagents, and their synthesis is of considerable commercial importance. There are three ways to obtain a peptide: (1) purification from tissue, a task often made difficult by the vanishingly low concentrations of some peptides; (2) genetic engineering; or (3) direct chemical synthesis. Powerful techniques now make direct chemical synthesis an attractive option in many cases. In addition to commercial applications, the synthesis of specific peptide portions of larger proteins is an increasingly important tool for the study of protein structure and function.
The complexity of proteins makes the traditional synthetic approaches of organic chemistry impractical for peptides with more than four or five amino acids. One problem is the difficulty of purifying the product after each step, because the chemical properties of the peptide change each time a new amino acid is added.
The major breakthrough in this technology was provided by R. Bruce Merrifield. His innovation involved synthesizing a peptide while keeping it attached at one end to a solid support. The support is an insoluble polymer (resin) contained within a column, similar to that used for chromatographic procedures. The peptide is built up on this support one amino acid at a time using a standard set of reactions in a repeating cycle (Fig. 1).
Figure 1  Chemical synthesis of a peptide on a solid support (an insoluble polystyrene bead). The steps shown are: (1) attachment of the carboxyl-terminal amino acid, its α-amino group protected by a t-butyloxycarbonyl group, to a reactive group on the resin; (2) removal of the protecting group by flushing with CF3COOH; (3) activation of the next amino acid, which also has a protected α-amino group, at its carboxyl group by dicyclohexylcarbodiimide (DCC); (4) attack of the α-amino group of amino acid 1 on the activated carboxyl group of amino acid 2 to form a peptide bond, releasing dicyclohexylurea; and (5) deprotection of the completed peptide as in step 2, after which HF hydrolyzes the ester linkage between peptide and resin. Steps 2 through 4 are repeated as necessary; they are required for the formation of each peptide bond.
The technology for chemical peptide synthesis has been automated, and several commercial instruments are now available. The most important limitation of the process involves the efficiency of each amino acid addition, as can be seen by calculating the overall yields of peptides of various lengths when the yield for addition of each new amino acid is 96.0 versus 99.8% (Table 1). The chemistry has been optimized to permit the synthesis of proteins 100 amino acids long in about 4 days in reasonable yield. A very similar approach is used to synthesize nucleic acids (Fig. 12–38). It is worth noting that this technology, impressive as it is, still pales when compared with biological processes. The same 100-amino-acid protein would be synthesized with exquisite fidelity in about 5 seconds in a bacterial cell.
Table 1  Effect of stepwise yield on overall yields in peptide synthesis
Number of residues        Overall yield of final peptide (%)
in the final              when the yield of each step is:
polypeptide               96.0%          99.8%
 11                       66             98
 21                       44             96
 31                       29             94
 51                       13             90
100                        1.7           82
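The arithmetic behind Table 1 can be sketched in a few lines. The sketch assumes that a peptide of n residues requires n – 1 coupling steps, each with the same fractional yield; this assumption is not stated in the table, but it reproduces the tabulated values closely. The function name `overall_yield` is illustrative only.

```python
def overall_yield(n_residues: int, step_yield: float) -> float:
    """Overall fractional yield for a peptide of n_residues.

    Assumption (not stated in the text): one coupling step per peptide
    bond, i.e. n_residues - 1 steps, all with the same fractional yield.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return step_yield ** (n_residues - 1)


if __name__ == "__main__":
    # Peptide lengths and stepwise yields taken from Table 1.
    for n in (11, 21, 31, 51, 100):
        low = 100 * overall_yield(n, 0.960)
        high = 100 * overall_yield(n, 0.998)
        print(f"{n:>4} residues: {low:5.1f}% at 96.0%/step, {high:5.1f}% at 99.8%/step")
```

Because the overall yield falls exponentially with chain length, even a small gain in stepwise efficiency makes a large difference for long peptides.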
The 20 amino acids commonly found as hydrolysis products of proteins contain an α-carboxyl group, an α-amino group, and a distinctive R group substituted on the α-carbon atom. The α-carbon atom of the amino acids (except glycine) is asymmetric, and thus amino acids can exist in at least two stereoisomeric forms. Only the L stereoisomers, which are related to the absolute configuration of L-glyceraldehyde, are found in proteins. The amino acids are classified on the basis of the polarity of their R groups. The nonpolar, aliphatic class includes alanine, glycine, isoleucine, leucine, proline, and valine. Phenylalanine, tryptophan, and tyrosine have aromatic side chains and are also relatively hydrophobic. The polar, uncharged class includes asparagine, cysteine, glutamine, methionine, serine, and threonine. The negatively charged (acidic) amino acids are aspartate and glutamate; the positively charged (basic) ones are arginine, histidine, and lysine. There are also a large number of nonstandard amino acids that occur in some proteins (as a result of the modification of standard amino acids) or as free metabolites in cells.
Monoamino monocarboxylic amino acids are diprotic acids (+H3NCH(R)COOH) at low pH. As the pH is raised to about 6, near the isoelectric point, the proton is lost from the carboxyl group to form the dipolar or zwitterionic species +H3NCH(R)COO⁻, which is electrically neutral. Further increase in pH causes loss of the second proton, to yield the ionic species H2NCH(R)COO⁻. Amino acids with ionizable R groups may exist in additional ionic species, depending on the pH and the pKa of the R group. Thus amino acids vary in their acid–base properties. Amino acids form colored derivatives with ninhydrin. Other colored or fluorescent derivatives are formed in reactions of the α-amino group of amino acids with fluorescamine, dansyl chloride, dabsyl chloride, and 1-fluoro-2,4-dinitrobenzene. Complex mixtures of amino acids can be separated and identified by ion-exchange chromatography or HPLC.
Amino acids can be joined covalently through peptide bonds to form peptides, which can also be formed by incomplete hydrolysis of polypeptides. The acid–base behavior and chemical reactions of a peptide are functions of its amino-terminal amino group, its carboxyl-terminal carboxyl group, and its R groups. Peptides can be hydrolyzed to yield free amino acids. Some peptides occur free in cells and tissues and have specific biological functions. These include some hormones and antibiotics, as well as other peptides with powerful biological activity.
Further Reading
Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, W.H. Freeman and Company, San Francisco. 
Excellent textbook outlining the properties of biological macromolecules and their monomeric subunits.
Creighton, T.E. (1984) Proteins: Structures and Molecular Properties, W.H. Freeman and Company, New York. 
Very useful general source.
Dickerson, R.E. & Geis, I. (1983) Proteins: Structure, Function, and Evolution, 2nd edn, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 
Beautifully illustrated and interesting account.
Amino Acids
Corrigan, J.J. (1969) D-Amino acids in animals. Science 169, 142–148. 
Meister, A. (1965) Biochemistry of the Amino Acids, 2nd edn, Vols. 1 and 2, Academic Press, Inc., New York. 
Encyclopedic treatment of the properties, occurrence, and metabolism of amino acids.
Montgomery, R. & Swenson, C.A. (1976) Quantitative Problems in the Biochemical Sciences, 2nd edn, W.H. Freeman and Company, New York. 
Segel, I.H. (1976) Biochemical Calculations, 2nd edn, John Wiley & Sons, New York. 
Haschemeyer, R.H. & Haschemeyer, A.E.V. (1973) Proteins: A Guide to Study by Physical and Chemical Methods, John Wiley & Sons, New York. 
Merrifield, B. (1986) Solid phase synthesis. Science 232, 341–347. 
Smith, L.M. (1988) Automated synthesis and sequence analysis of biological macromolecules. Analyt. Chem. 60, 381A–390A. 
1. Absolute Configuration of Citrulline  Is citrulline isolated from watermelons (shown below) a D- or L-amino acid? Explain.
2. Relation between the Structures and Chemical Properties of the Amino Acids  The structures and chemical properties of the amino acids are crucial to understanding how proteins carry out their biological functions. The structures of the side chains of 16 amino acids are given below. Name the amino acid that contains each structure and match the R group with the most appropriate description of its properties, (a) to (m). Some of the descriptions may be used more than once.

(a) Small polar R group containing a hydroxyl group; this amino acid is important in the active site of some enzymes.

(b) Provides the least amount of steric hindrance.

(c) R group has pKa ≈ 10.5, making it positively charged at physiological pH.

(d) Sulfur-containing R group; neutral at any pH.

(e) Aromatic R group, hydrophobic in nature and neutral at any pH.

(f) Saturated hydrocarbon, important in hydrophobic interactions.

(g) The only amino acid having an ionizing R group with a pKa near 7; it is an important group in the active site of some enzymes.

(h) The only amino acid having a substituted α-amino group; it influences protein folding by forcing a bend in the chain.

(i) R group has a pKa near 4 and thus is negatively charged at pH 7.

(j) An aromatic R group capable of forming hydrogen bonds; it has a pKa near 10.

(k) Forms disulfide cross-links between polypeptide chains; the pKa of its functional group is about 10.

(l) R group with pKa ≈ 12, making it positively charged at physiological pH.

(m) When this polar but uncharged R group is hydrolyzed, the amino acid is converted into another amino acid having a negatively charged R group at pH near 7.

3. Relationship between the Titration Curve and the Acid–Base Properties of Glycine  A 100 mL solution of 0.1 M glycine at pH 1.72 was titrated with 2 M NaOH solution. During the titration, the pH was monitored and the results were plotted in the graph shown. The key points in the titration are designated I to V on the graph. For each of the statements below, identify the appropriate key point in the titration and justify your choice.

       (a) At what point will glycine be present predominantly as the species +H3N–CH2–COOH?
       (b) At what point is the average net charge of glycine +½?
       (c) At what point is the amino group of half of the molecules ionized?
       (d) At what point is the pH equal to the pKa of the carboxyl group?
       (e) At what point is the pH equal to the pKa of the protonated amino group?
       (f) At what points does glycine have its maximum buffering capacity?
       (g) At what point is the average net charge zero?
       (h) At what point has the carboxyl group been completely titrated (first equivalence point)?
       (i) At what point are half of the carboxyl groups ionized?
       (j) At what point is glycine completely titrated (second equivalence point)?
       (k) At what point is the structure of the predominant species +H3N–CH2–COO⁻?
       (l) At what point do the structures of the predominant species correspond to a 50:50 mixture of +H3N–CH2–COO⁻ and H2N–CH2–COO⁻?
       (m) At what point is the average net charge of glycine –1?
       (n) At what point do the structures of the predominant species consist of a 50:50 mixture of +H3N–CH2–COOH and +H3N–CH2–COO⁻?
       (o) What point corresponds to the isoelectric point?
       (p) At what point is the average net charge on glycine –½?
       (q) What point represents the end of the titration?
       (r) If one wanted to use glycine as an efficient buffer, which points would represent the worst pH regions for buffering power?
       (s) At what point in the titration is the predominant species H2N–CH2–COO⁻?
4. How Much Alanine Is Present as the Completely Uncharged Species?  At a pH equal to the isoelectric point, the net charge on alanine is zero. Two structures can be drawn that have a net charge of zero (zwitterionic and uncharged forms), but the predominant form of alanine at its pI is zwitterionic.

       (a) Explain why the form of alanine at its pI is zwitterionic rather than completely uncharged.
       (b) Estimate the fraction of alanine present at its pI as the completely uncharged form. Justify your assumptions.
5. Ionization State of Amino Acids  Each ionizable group of an amino acid can exist in one of two states, charged or neutral. The electric charge on the functional group is determined by the relationship between its pKa and the pH of the solution. This relationship is described by the Henderson–Hasselbalch equation.
       (a) Histidine has three ionizable functional groups. Write the relevant equilibrium equations for its three ionizations and assign the proper pKa for each ionization. Draw the structure of histidine in each ionization state. What is the net charge on the histidine molecule in each ionization state?
       (b) Draw the structures of the predominant ionization state of histidine at pH 1, 4, 8, and 12. Note that the ionization state can be approximated by treating each ionizable group independently.
       (c) What is the net charge of histidine at pH 1, 4, 8, and 12? For each pH, will histidine migrate toward the anode (+) or cathode (–) when placed in an electric field?
6. Preparation of a Glycine Buffer  Glycine is commonly used as a buffer. Preparation of a 0.1 M glycine buffer starts with 0.1 M solutions of glycine hydrochloride (HOOC–CH2–NH3⁺Cl⁻) and glycine (⁻OOC–CH2–NH3⁺), two commercially available forms of glycine. What volumes of these two solutions must be mixed to prepare 1 L of 0.1 M glycine buffer having a pH of 3.2? (Hint: See Box 4–2.)
7. Separation of Amino Acids by Ion-Exchange Chromatography  Mixtures of amino acids are analyzed by first separating the mixture into its components through ion-exchange chromatography. On a cation-exchange resin containing sulfonate groups (see Fig. 5–12), the amino acids flow down the column at different rates because of two factors that retard their movement: (1) ionic attraction between the –SO3⁻ residues on the column and positively charged functional groups on the amino acids and (2) hydrophobic interaction between amino acid side chains and the strongly hydrophobic backbone of the polystyrene resin. For each pair of amino acids listed, determine which member will be eluted first from an ion-exchange column by a pH 7.0 buffer.
       (a) Asp and Lys
       (b) Arg and Met
       (c) Glu and Val
       (d) Gly and Leu
       (e) Ser and Ala
8. Naming the Stereoisomers of Isoleucine  The structure of the amino acid isoleucine is:

       (a) How many chiral centers does it have?
       (b) How many optical isomers?
       (c) Draw perspective formulas for all the optical isomers of isoleucine.
9. Comparison of the pKa Values of an Amino Acid and Its Peptides  The titration curve of the amino acid alanine shows the ionization of two functional groups with pKa values of 2.34 and 9.69, corresponding to the ionization of the carboxyl and the protonated amino groups, respectively. The titration of di-, tri-, and larger oligopeptides of alanine also shows the ionization of only two functional groups, although the experimental pKa values are different. The trend in pKa values is summarized in the table.

       (a) Draw the structure of Ala–Ala–Ala. Identify the functional groups associated with pK1 and pK2.
       (b) The value of pK1 increases in going from Ala to an Ala oligopeptide. Provide an explanation for this trend.
       (c) The value of pK2 decreases in going from Ala to an Ala oligopeptide. Provide an explanation for this trend.
10. Peptide Synthesis  In the synthesis of polypeptides on solid supports, the α-amino group of each new amino acid is “protected” by a t-butyloxycarbonyl group (see Box 5–2). What would happen if this protecting group were not present?
Chapter 6
An Introduction to Proteins
Almost everything that occurs in the cell involves one or more proteins. Proteins provide structure, catalyze cellular reactions, and carry out a myriad of other tasks. Their central place in the cell is reflected in the fact that genetic information is ultimately expressed as protein. For each protein there is a segment of DNA (a gene; see Chapters 12 and 23) that encodes information specifying its sequence of amino acids. There are thousands of different kinds of proteins in a typical cell, each encoded by a gene and each performing a specific function. Proteins are among the most abundant biological macromolecules and are also extremely versatile in their functions.
The chapter begins with a discussion of some of the general properties of proteins. This is followed by a short summary of some common techniques used to purify and study proteins. Finally, we will examine the primary structure of protein molecules: the covalent backbone structure and the sequence of amino acid residues. One goal is to discover the relationships between amino acid sequence and biological function.
An understanding of these important macromolecules must begin with the fundamentals. What do proteins do? How big are they? What forms or shapes do they take? What are their chemical properties? The answers serve as an orientation to much that follows.
We can classify proteins according to their biological roles.
Enzymes  The most varied and most highly specialized proteins are those with catalytic activity – the enzymes. Virtually all the chemical reactions of organic biomolecules in cells are catalyzed by enzymes. Many thousands of different enzymes, each capable of catalyzing a different kind of chemical reaction, have been discovered in different organisms (Fig. 6–1a).
Transport Proteins  Transport proteins in blood plasma bind and carry specific molecules or ions from one organ to another. Hemoglobin of erythrocytes (Fig. 6–1b) binds oxygen as the blood passes through the lungs, carries it to the peripheral tissues, and there releases it to participate in the energy-yielding oxidation of nutrients. The blood plasma
contains lipoproteins, which carry lipids from the liver to other organs. Other kinds of transport proteins are present in the plasma membranes and intracellular membranes of all organisms; these are adapted to bind glucose, amino acids, or other substances and transport them across the membrane.
Nutrient and Storage Proteins  The seeds of many plants store nutrient proteins required for the growth of the germinating seedling. Particularly well-studied examples are the seed proteins of wheat, corn, and rice. Ovalbumin, the major protein of egg white, and casein, the major protein of milk, are other examples of nutrient proteins (Fig. 6–1c). The ferritin found in some bacteria and in plant and animal tissues stores iron.
Contractile or Motile Proteins  Some proteins endow cells and organisms with the ability to contract, to change shape, or to move about. Actin and myosin function in the contractile system of skeletal muscle and also in many nonmuscle cells. Tubulin is the protein from which microtubules are built. Microtubules act in concert with the protein dynein in flagella and cilia (Fig. 6–1d) to propel cells.
Structural Proteins  Many proteins serve as supporting filaments, cables, or sheets, to give biological structures strength or protection. The major component of tendons and cartilage is the fibrous protein collagen, which has very high tensile strength. Leather is almost pure collagen. Ligaments contain elastin, a structural protein capable of stretching in two dimensions. Hair, fingernails, and feathers consist largely of the tough, insoluble protein keratin. The major component of silk fibers and spider webs is fibroin (Fig. 6–1e). The wing hinges of some insects are made of resilin, which has nearly perfect elastic properties.
Defense Proteins  Many proteins defend organisms against invasion by other species or protect them from injury. The immunoglobulins or antibodies, specialized proteins made by the lymphocytes of vertebrates, can recognize and precipitate or neutralize invading bacteria, viruses, or foreign proteins from another species. Fibrinogen and thrombin are blood-clotting proteins that prevent loss of blood when the vascular system is injured. Snake venoms, bacterial toxins, and toxic plant proteins, such as ricin, also appear to have defensive functions (Fig. 6–1f). Some of these, including fibrinogen, thrombin, and some venoms, are also enzymes.
Regulatory Proteins  Some proteins help regulate cellular or physiological activity. Among them are many hormones. Examples include insulin, which regulates sugar metabolism, and the growth hormone of the pituitary. The cellular response to many hormonal signals is mediated by a class of GTP-binding proteins called G proteins (GTP is closely related to ATP, with guanine replacing the adenine portion of the molecule; see Figs. 1–12 and 3–16b). Other regulatory proteins bind to DNA and regulate the biosynthesis of enzymes and RNA molecules involved in cell division in both prokaryotes and eukaryotes (Fig. 6–1g).
Figure 6–1  Functions of proteins. (a) The light produced by fireflies is the result of a light-producing reaction involving luciferin and ATP that is catalyzed by the enzyme luciferase (see Box 13–3). (b) Erythrocytes contain large amounts of the oxygen-transporting protein hemoglobin. (c) The white color of milk is derived primarily from the protein casein. (d) The movement of cilia in protozoans depends on the action of the protein dynein. (e) The protein fibroin is the major structural component of spider webs. (f) Castor beans contain a highly toxic protein called ricin. (g) Cancerous tumors are often made up of cells that have defects involving one or more of the proteins that regulate cell division.
Other Proteins  There are numerous other proteins whose functions are rather exotic and not easily classified. Monellin, a protein of an African plant, has an intensely sweet taste. It is being studied as a
nonfattening, nontoxic food sweetener for human use. The blood plasma of some Antarctic fish contains antifreeze proteins, which protect their blood from freezing.
It is extraordinary that all these proteins, with their very different properties and functions, are made from the same group of 20 amino acids.
How long are the polypeptide chains in proteins? Table 6–1 shows that human cytochrome c has 104 amino acid residues linked in a single chain; bovine chymotrypsinogen has 245 amino acid residues. Probably near the upper limit of size is the protein apolipoprotein B, a cholesterol-transport protein with 4,536 amino acid residues in a single polypeptide chain of molecular weight 513,000. Most naturally occurring polypeptides contain fewer than 2,000 amino acid residues.
Table 6–1  Molecular data on some proteins
Protein                                   Molecular   Number of   Number of
                                          weight      residues    polypeptide chains
Insulin (bovine)                          5,733       51          2
Cytochrome c (human)                      13,000      104         1
Ribonuclease A (bovine pancreas)          13,700      124         1
Lysozyme (egg white)                      13,930      129         1
Myoglobin (equine heart)                  16,890      153         1
Chymotrypsin (bovine pancreas)            21,600      241         3
Chymotrypsinogen (bovine)                 22,000      245         1
Hemoglobin (human)                        64,500      574         4
Serum albumin (human)                     68,500      ~550        1
Hexokinase (yeast)                        102,000     ~800        2
Immunoglobulin G (human)                  145,000     ~1,320      4
RNA polymerase (E. coli)                  450,000     ~4,100      5
Apolipoprotein B (human)                  513,000     4,536       1
Glutamate dehydrogenase (bovine liver)    1,000,000   ~8,300      ~40
Some proteins consist of a single polypeptide chain, but others, called multisubunit proteins, have two or more (Table 6–1). The individual polypeptide chains in a multisubunit protein may be identical or different. If at least some are identical, the protein is sometimes called an oligomeric protein and the subunits themselves are referred to as protomers. The enzyme ribonuclease has one polypeptide chain. Hemoglobin has four: two identical α chains and two identical β chains, all four held together by noncovalent interactions.
The molecular weights of proteins, which can be determined by various physicochemical methods, may range from little more than 10,000 for small proteins such as cytochrome c (104 residues), to more than 10⁶ for proteins with very long polypeptide chains or those with several subunits. The molecular weights of some typical proteins are given in Table 6–1. No simple generalizations can be made about the molecular weights of proteins in relation to their function.
One can calculate the approximate number of amino acid residues in a simple protein containing no other chemical group by dividing its molecular weight by 110. Although the average molecular weight of the
20 standard amino acids is about 138, the smaller amino acids predominate in most proteins; when weighted for the proportions in which the various amino acids occur in proteins (see Table 5–1), the average molecular weight is nearer to 128. Because a molecule of water (Mr 18) is removed to create each peptide bond, the average molecular weight of an amino acid residue in a protein is about 128 – 18 = 110. Table 6–1 shows the number of amino acid residues in several proteins.
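The rule of thumb just described can be written as a short calculation. This is a minimal sketch: the function name `estimate_residue_count` and the constant name are illustrative, and the sample molecular weights are those listed in Table 6–1.

```python
# Average Mr of a residue in a protein: 128 (weighted average amino acid)
# minus 18 (one water removed per peptide bond), as derived in the text.
AVERAGE_RESIDUE_MR = 128 - 18


def estimate_residue_count(molecular_weight: float) -> int:
    """Approximate number of residues in a simple (unconjugated) protein."""
    return round(molecular_weight / AVERAGE_RESIDUE_MR)


if __name__ == "__main__":
    # (protein, molecular weight, residues listed in Table 6-1)
    examples = [
        ("Ribonuclease A", 13_700, 124),
        ("Myoglobin", 16_890, 153),
        ("Apolipoprotein B", 513_000, 4_536),
    ]
    for name, mr, listed in examples:
        print(f"{name}: estimated {estimate_residue_count(mr)}, listed {listed}")
```

The estimate is only approximate, because the amino acid composition of an individual protein can differ considerably from the average.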
As is true for simple peptides, hydrolysis of proteins with acid or base yields a mixture of free α-amino acids. When completely hydrolyzed, each type of protein yields a characteristic proportion or mixture of the different amino acids. Table 6–2 shows the composition of the amino acid mixtures obtained on complete hydrolysis of human cytochrome c and of bovine chymotrypsinogen, the inactive precursor of the digestive enzyme chymotrypsin. These two proteins, with very different functions, also differ significantly in the relative numbers of each kind of amino acid they contain. The 20 amino acids almost never occur in equal amounts in proteins. Some amino acids may occur only once per molecule or not at all in a given type of protein; others may occur in large numbers.
Table 6–2  Amino acid composition of two proteins
              Number of residues per molecule of protein
Amino acid    Human cytochrome c    Bovine chymotrypsinogen
Ala           6                     22
Arg           2                     4
Asn           5                     15
Asp           3                     8
Cys           2                     10
Gln           2                     10
Glu           8                     5
Gly           13                    23
His           3                     2
Ile           8                     10
Leu           6                     19
Lys           18                    14
Met           3                     2
Phe           3                     6
Pro           4                     9
Ser           2                     28
Thr           7                     23
Trp           1                     8
Tyr           5                     4
Val           3                     23
Total         104                   245
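To see how unevenly the amino acids are distributed, the counts in Table 6–2 can be converted to mole percentages. The following is a sketch only; the dictionary and function names are illustrative, and the data are copied from the table. If all 20 amino acids occurred equally, each would account for 5% of the residues.

```python
# Residues per molecule, copied from Table 6-2.
CYTOCHROME_C = {
    "Ala": 6, "Arg": 2, "Asn": 5, "Asp": 3, "Cys": 2, "Gln": 2, "Glu": 8,
    "Gly": 13, "His": 3, "Ile": 8, "Leu": 6, "Lys": 18, "Met": 3, "Phe": 3,
    "Pro": 4, "Ser": 2, "Thr": 7, "Trp": 1, "Tyr": 5, "Val": 3,
}
CHYMOTRYPSINOGEN = {
    "Ala": 22, "Arg": 4, "Asn": 15, "Asp": 8, "Cys": 10, "Gln": 10, "Glu": 5,
    "Gly": 23, "His": 2, "Ile": 10, "Leu": 19, "Lys": 14, "Met": 2, "Phe": 6,
    "Pro": 9, "Ser": 28, "Thr": 23, "Trp": 8, "Tyr": 4, "Val": 23,
}


def mole_percent(composition: dict[str, int]) -> dict[str, float]:
    """Convert residue counts to percentages of the total residue count."""
    total = sum(composition.values())
    return {aa: 100 * n / total for aa, n in composition.items()}


def most_abundant(composition: dict[str, int]) -> str:
    """Return the three-letter code of the most frequent amino acid."""
    return max(composition, key=composition.get)


if __name__ == "__main__":
    for name, comp in (("Cytochrome c", CYTOCHROME_C),
                       ("Chymotrypsinogen", CHYMOTRYPSINOGEN)):
        top = most_abundant(comp)
        print(f"{name}: {sum(comp.values())} residues, "
              f"most abundant {top} ({mole_percent(comp)[top]:.1f}%)")
```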
Many proteins, such as the enzymes ribonuclease and chymotrypsinogen, contain only amino acids and no other chemical groups; these are considered simple proteins. However, some proteins contain chemical components in addition to amino acids; these are called conjugated proteins. The non-amino acid part of a conjugated protein is usually called its prosthetic group. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups (Table 6–3); for example, lipoproteins contain lipids, glycoproteins contain sugar groups, and metalloproteins contain a specific metal. A number of proteins contain more than one prosthetic group. Usually the prosthetic group plays an important role in the protein’s biological function.
Table 6–3  Conjugated proteins
Class              Prosthetic group         Example
Lipoproteins       Lipids                   β1-Lipoprotein of blood
Glycoproteins      Carbohydrates            Immunoglobulin G
Phosphoproteins    Phosphate groups         Casein of milk
Hemoproteins       Heme (iron porphyrin)    Hemoglobin
Flavoproteins      Flavin nucleotides       Succinate dehydrogenase
Metalloproteins    Iron                     Ferritin
                   Zinc                     Alcohol dehydrogenase
                   Calcium                  Calmodulin
                   Molybdenum               Dinitrogenase
                   Copper                   Plastocyanin
The aggregate biochemical picture of protein structure and function is derived from the study of many individual proteins. To study a protein in any detail it must be separated from all other proteins in a cell, and techniques must be available to determine its properties. The necessary methods come from protein chemistry, a discipline as old as biochemistry itself and one that retains a central position in biochemical research. Modern techniques are providing ever newer experimental insights into the critical relationship between the structure of a protein and its function.
Cells contain thousands of different kinds of proteins. A pure preparation of a given protein is essential before its properties, amino acid composition, and sequence can be determined. How, then, can one protein be purified?
Methods for separating proteins take advantage of properties such as charge, size, and solubility, which vary from one protein to the next. Because many proteins bind to other biomolecules, proteins can also be separated on the basis of their binding properties. The source of a protein is generally tissue or microbial cells. The cells must be broken open and the protein released into a solution called a crude extract. If necessary, differential centrifugation can be used to prepare subcellular fractions or to isolate organelles (see Fig. 2–24). Once the extract or organelle preparation is ready, a variety of methods are available for separation of proteins. Ion-exchange chromatography (see Fig. 5–12) can be used to separate proteins with different charges in much the same way that it separates amino acids. Other chromatographic methods take advantage of differences in size, binding affinity, and solubility (Fig. 6–2). Nonchromatographic methods include the selective precipitation of proteins with salt, acid, or high temperatures.
The approach to the purification of a “new” protein, one not previously isolated, is guided both by established precedents and common sense. In most cases, several different methods must be used sequentially to completely purify a protein. The choice of method is somewhat empirical, and many protocols may be tried before the most effective is determined. Trial and error can often be minimized by using purification procedures developed for similar proteins as a guide. Published purification protocols are available for many thousands of proteins. Common sense dictates that inexpensive procedures be used first, when the total volume and number of contaminants are greatest. Chromatographic methods are often impractical at early stages because the amount of chromatographic medium needed increases with sample size. As each purification step is completed, the sample size generally becomes smaller (Table 6–4) and more sophisticated (and expensive) chromatographic procedures can be applied.
Labels in Figure 6–2: (a) A protein mixture is added to a column containing porous beads of a cross-linked polymer; the protein molecules separate by size, with larger molecules passing more freely and appearing in the earlier fractions. (b) A protein mixture is added to a column containing a polymer-bound ligand specific for the protein of interest; unwanted proteins are washed through the column, and the protein of interest is then eluted by a solution of ligand.
Figure 6–2  Two types of chromatographic methods used in protein purification. (a) Size-exclusion chromatography; also called gel filtration. This method separates proteins according to size. The column contains a cross-linked polymer with pores of selected size. Larger proteins migrate faster than smaller ones, because they are too large to enter the pores in the beads and hence take a more direct route through the column. The smaller proteins enter the pores and are slowed by the more labyrinthian path they take through the column. (b) Affinity chromatography separates proteins by their binding specificities. The proteins retained on the column are those that bind specifically to a ligand cross-linked to the beads. (In biochemistry, the term “ligand” is used to refer to a group or molecule that is bound.) After nonspecific proteins are washed through the column, the bound protein of particular interest is eluted by a solution containing free ligand.
Table 6–4  A purification table for a hypothetical enzyme*
Procedure or step                   Fraction volume   Total protein   Activity   Specific activity
                                    (ml)              (mg)            (units)    (units/mg)
1. Crude cellular extract           1,400             10,000          100,000    10
2. Precipitation                    280               3,000           96,000     32
3. Ion-exchange chromatography      90                400             80,000     200
4. Size-exclusion chromatography    80                100             60,000     600
5. Affinity chromatography          6                 3               45,000     15,000

* All data represent the status of the sample after the procedure indicated in the first column has been carried out.
In order to purify a protein, it is essential to have an assay to detect and quantify that protein in the presence of many other proteins. Often, purification must proceed in the absence of any information about the size and physical properties of the protein, or the fraction of the total protein mass it represents in the extract.
The amount of an enzyme in a given solution or tissue extract can be assayed in terms of the catalytic effect it produces, that is, the increase in the rate at which its substrate is converted to reaction products when the enzyme is present. For this purpose one must know (1) the overall equation of the reaction catalyzed, (2) an analytical procedure for determining the disappearance of the substrate or the appearance of the reaction products, (3) whether the enzyme requires cofactors such as metal ions or coenzymes, (4) the dependence of the enzyme activity on substrate concentration, (5) the optimum pH, and (6) a temperature zone in which the enzyme is stable and has high activity. Enzymes are usually assayed at their optimum pH and at some convenient temperature within the range 25 to 38 °C. Also, very high substrate concentrations are generally required so that the initial reaction rate, which is measured experimentally, is proportional to enzyme concentration (Chapter 8).
By international agreement, 1.0 unit of enzyme activity is defined as the amount of enzyme causing transformation of 1.0 μmol of substrate per minute at 25 °C under optimal conditions of measurement. The term activity refers to the total units of enzyme in the solution. The specific activity is the number of enzyme units per milligram of protein (Fig. 6–3). The specific activity is a measure of enzyme purity: it increases during purification of an enzyme and becomes maximal and constant when the enzyme is pure (Table 6–4).
Figure 6–3  Activity versus specific activity. The difference between these two terms can be illustrated by considering two jars of marbles. The jars contain the same number of red marbles (representing an unknown protein), but different amounts of marbles of other colors. If the marbles are taken to represent proteins, both jars contain the same activity of the protein represented by the red marbles. The second jar, however, has the higher specific activity because here the red marbles represent a much higher fraction of the total.
After each purification step, the activity of the preparation (in units) is assayed, the total amount of protein is determined independently, and their ratio gives the specific activity. Activity and total protein generally decrease with each step. Activity decreases because some loss always occurs due to inactivation or nonideal interactions with chromatographic materials or other molecules in the solution. Total protein decreases because the objective is to remove as much nonspecific protein as possible. In a successful step, the loss of nonspecific protein is much greater than the loss of activity; therefore, specific activity increases even as total activity falls. The data are then assembled in a purification table (Table 6–4). A protein is generally considered pure when further purification steps fail to increase specific activity, and when only a single protein species can be detected (by methods to be described later).
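The bookkeeping behind a purification table can be sketched in a few lines of Python. The data are those of Table 6–4; the function name and the two extra columns (percent yield of activity and fold purification relative to the crude extract) are illustrative additions that the text does not define, included only because they follow directly from the tabulated numbers.

```python
# Data from Table 6-4: (step, fraction volume in ml, total protein in mg, activity in units)
steps = [
    ("Crude cellular extract", 1400, 10000, 100000),
    ("Precipitation", 280, 3000, 96000),
    ("Ion-exchange chromatography", 90, 400, 80000),
    ("Size-exclusion chromatography", 80, 100, 60000),
    ("Affinity chromatography", 6, 3, 45000),
]

def purification_summary(rows):
    """Return one dict per step with specific activity, yield, and fold purification.

    Yield and fold purification are illustrative derived quantities,
    both expressed relative to the first (crude) row.
    """
    crude_activity = rows[0][3]
    crude_specific = rows[0][3] / rows[0][2]
    summary = []
    for name, _volume_ml, protein_mg, activity_units in rows:
        specific = activity_units / protein_mg  # units per mg of protein
        summary.append({
            "step": name,
            "specific_activity": specific,
            "yield_percent": 100 * activity_units / crude_activity,
            "fold_purification": specific / crude_specific,
        })
    return summary

summary = purification_summary(steps)
for row in summary:
    print(f"{row['step']:<32}{row['specific_activity']:>10.0f}"
          f"{row['yield_percent']:>8.0f}%{row['fold_purification']:>10.0f}x")
```

In this hypothetical purification, total activity falls at every step while specific activity rises, which is the pattern expected of a successful procedure.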
For proteins that are not enzymes, other quantification methods are required. Transport proteins can be assayed by their binding to the molecule they transport, and hormones and toxins by the biological effect they produce; for example, growth hormones will stimulate the growth of certain cultured cells. Some structural proteins represent such a large fraction of a tissue mass that they can be readily extracted and purified without an assay. The approaches are as varied as the proteins themselves.
[Figure 6–4 diagram labels: sample; direction of migration.]
In addition to chromatography, another important set of methods is available for the separation of proteins, based on the migration of charged proteins in an electric field, a process called electrophoresis. These procedures are not often used to purify proteins in large amounts because simpler alternative methods are usually available and electrophoretic methods often inactivate proteins. Electrophoresis is, however, especially useful as an analytical method. Its advantage is that proteins can be visualized as well as separated, permitting a researcher to estimate quickly the number of proteins in a mixture or the degree of purity of a particular protein preparation. Also, electrophoresis allows determination of crucial properties of a protein such as its isoelectric point and approximate molecular weight.
In electrophoresis, the force moving the macromolecule (nucleic acids as well as proteins are separated this way) is the electrical potential, E. The electrophoretic mobility of the molecule, μ, is the ratio of the velocity of the particle, V, to the electrical potential. Electrophoretic mobility is also equal to the net charge of the molecule, Z, divided by the frictional coefficient, ƒ. Thus:
μ  =  V / E  =  Z / ƒ
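The relationship can be tried out numerically. The short Python sketch below simply evaluates μ = Z/ƒ and then rearranges μ = V/E to give V = μE. All numbers are hypothetical and in arbitrary units, and the function names are chosen here for illustration only.

```python
def electrophoretic_mobility(net_charge, frictional_coefficient):
    """mu = Z / f, as in the equation above."""
    return net_charge / frictional_coefficient

def migration_velocity(mobility, potential):
    """Rearranged from mu = V / E, so that V = mu * E."""
    return mobility * potential

# Two hypothetical molecules with the same net charge but different
# frictional coefficients (arbitrary units throughout).
compact = electrophoretic_mobility(net_charge=10.0, frictional_coefficient=2.0)
extended = electrophoretic_mobility(net_charge=10.0, frictional_coefficient=4.0)

potential = 3.0  # hypothetical electrical potential, arbitrary units
v_compact = migration_velocity(compact, potential)
v_extended = migration_velocity(extended, potential)
print(v_compact, v_extended)
```

With equal charge, doubling the frictional coefficient halves the mobility and therefore the velocity at a given potential.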
Electrophoresis of proteins is generally carried out in gels made up of the cross-linked polymer polyacrylamide (Fig. 6–4). The polyacrylamide gel acts as a molecular sieve, slowing the migration of proteins approximately in proportion to their mass, or molecular weight.
Figure 6–4  Electrophoresis. (a) Different samples are loaded in wells or depressions at the top of the polyacrylamide gel. The proteins move into the gel when an electric field is applied. The gel minimizes convection currents caused by small temperature gradients, and it minimizes protein movements other than those induced by the electric field. (b) Proteins can be visualized after electrophoresis by treating the gel with a stain such as Coomassie blue, which binds to the proteins but not to the gel itself. Each band on the gel represents a different protein (or protein subunit); smaller proteins are found nearer the bottom of the gel. This gel illustrates the purification of the enzyme RNA polymerase from the bacterium E. coli. The first lane shows the proteins present in the crude cellular extract. Successive lanes show the proteins present after each purification step. The purified protein contains four subunits, as seen in the last lane on the right.
Figure 6–5  Estimating the molecular weight of a protein. The electrophoretic mobility of a protein on an SDS polyacrylamide gel is related to its molecular weight, Mr. (a) Standard proteins of known molecular weight are subjected to electrophoresis (lane 1). These marker proteins can be used to estimate the Mr of an unknown protein (lane 2). (b) A plot of log Mr of the marker proteins versus relative migration during electrophoresis allows the Mr of the unknown protein to be read from the graph.
An electrophoretic method commonly used for estimation of purity and molecular weight makes use of the detergent sodium dodecyl sulfate (SDS). SDS binds to most proteins (probably by hydrophobic interactions; see Chapter 4) in amounts roughly proportional to the molecular weight of the protein, about one molecule of SDS for every two amino acid residues. The bound SDS contributes a large net negative charge, rendering the intrinsic charge of the protein insignificant.
[Figure 6–5 diagram labels. (a) Mr standards: myosin, 200,000; β-galactosidase, 116,250; phosphorylase b, 97,400; bovine serum albumin, 66,200; ovalbumin, 45,000; carbonic anhydrase, 31,000; soybean trypsin inhibitor, 21,500; lysozyme, 14,400. Second lane: unknown protein. (b) Plot of log Mr versus relative migration, with the unknown protein marked.]
In addition, the native conformation of a protein is altered when SDS is bound, and most proteins assume a similar shape, and thus a similar ratio of charge to mass. Electrophoresis in the presence of SDS therefore separates proteins almost exclusively on the basis of mass (molecular weight), with smaller polypeptides migrating more rapidly. After electrophoresis, the proteins are visualized by adding a dye such as Coomassie blue (Fig. 6–4b), which binds to proteins but not to the gel itself. This type of gel provides one method to monitor progress in isolating a protein, because the number of protein bands should decrease as the purification proceeds. When compared with the positions to which proteins of known molecular weight migrate in the gel, the position of an unknown protein can provide an excellent measure of its molecular weight (Fig. 6–5). If the protein has two or more different subunits, each subunit will generally be separated by the SDS treatment, and a separate band will appear for each.
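The graphical estimate described for Figure 6–5 can also be made by fitting a straight line to log Mr versus relative migration. The Python sketch below uses the marker proteins and Mr values shown in that figure. The relative migration values, and the value for the unknown protein, are hypothetical numbers invented for illustration, and a simple least-squares line is assumed here in place of reading the value off a hand-drawn graph.

```python
import math

# Marker names and Mr values are those of Fig. 6-5. The third column
# (relative migration) is HYPOTHETICAL and serves only to illustrate the method.
markers = [
    ("myosin", 200000, 0.10),
    ("beta-galactosidase", 116250, 0.22),
    ("phosphorylase b", 97400, 0.27),
    ("bovine serum albumin", 66200, 0.37),
    ("ovalbumin", 45000, 0.48),
    ("carbonic anhydrase", 31000, 0.60),
    ("soybean trypsin inhibitor", 21500, 0.72),
    ("lysozyme", 14400, 0.85),
]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mr(relative_migration, marker_rows):
    """Estimate Mr from the fitted line of log10(Mr) versus relative migration."""
    xs = [row[2] for row in marker_rows]
    ys = [math.log10(row[1]) for row in marker_rows]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * relative_migration + intercept)

slope, intercept = fit_line([m[2] for m in markers],
                            [math.log10(m[1]) for m in markers])
unknown_mr = estimate_mr(0.42, markers)  # 0.42 is a hypothetical measurement
print(f"Estimated Mr of unknown protein: {unknown_mr:,.0f}")
```

The slope of the fitted line is negative, reflecting the faster migration of smaller polypeptides, and an unknown that migrates between two markers receives an Mr estimate between theirs.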
[Figure 6–6 diagram labels: an ampholyte solution is incorporated into a gel; a stable pH gradient (pH 9 to 3) is established in the gel after application of an electric field; the protein solution is added and the electric field is reapplied; after staining, proteins are shown to be distributed along the pH gradient.]
Isoelectric focusing is a procedure used to determine the isoelectric point (pI) of a protein (Fig. 6–6). A pH gradient is established by allowing a mixture of low molecular weight organic acids and bases (ampholytes; see p. 118) to distribute themselves in an electric field generated across the gel. When a protein mixture is applied, each protein migrates until it reaches the pH that matches its pI. Proteins with different isoelectric points are thus distributed differently throughout the gel (Table 6–5).
Figure 6–6  Isoelectric focusing. This technique separates proteins according to their isoelectric points. A stable pH gradient is established in the gel by the addition of appropriate ampholytes. A protein mixture is placed in a well on the gel. With an applied electric field, proteins enter the gel and migrate until each reaches a pH equivalent to its pI. Remember that the net charge of a protein is zero when pH = pI.
Table 6–5  The isoelectric points of some proteins
Protein             pI
Pepsin              ~1.0
Egg albumin         4.6
Serum albumin       4.9
Urease              5.0
β-Lactoglobulin     5.2
Hemoglobin          6.8
Myoglobin           7.0
Chymotrypsinogen    9.5
Cytochrome c        10.7
Lysozyme            11.0
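The behavior of these proteins in isoelectric focusing can be sketched in Python. The pI values are those of Table 6–5, with pepsin's approximate value entered as 1.0. The gradient limits (pH 9 to 3) are taken from the labels of Figure 6–6, and the function names are illustrative. The sketch assumes only that a protein carries a net positive charge below its pI, a net negative charge above it, and no net charge at pH = pI.

```python
# pI values from Table 6-5 (pepsin is listed as approximately 1.0).
ISOELECTRIC_POINTS = {
    "Pepsin": 1.0,
    "Egg albumin": 4.6,
    "Serum albumin": 4.9,
    "Urease": 5.0,
    "β-Lactoglobulin": 5.2,
    "Hemoglobin": 6.8,
    "Myoglobin": 7.0,
    "Chymotrypsinogen": 9.5,
    "Cytochrome c": 10.7,
    "Lysozyme": 11.0,
}

def net_charge_sign(pI, pH):
    """Sign of the net charge: '+' below the pI, '-' above it, '0' at pH = pI."""
    if pH < pI:
        return "+"
    if pH > pI:
        return "-"
    return "0"

def focusing_order(points, high_ph=9.0, low_ph=3.0):
    """Proteins whose pI lies inside the gradient, from the high-pH end to the low-pH end."""
    inside = [(name, pI) for name, pI in points.items() if low_ph <= pI <= high_ph]
    return [name for name, pI in sorted(inside, key=lambda item: -item[1])]

order = focusing_order(ISOELECTRIC_POINTS)
print(order)
print(net_charge_sign(ISOELECTRIC_POINTS["Lysozyme"], 7.0))
```

Proteins whose pI lies outside the chosen gradient (here pepsin, chymotrypsinogen, cytochrome c, and lysozyme) never reach a pH equal to their pI within the gel, so the sketch leaves them out of the focused order.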
Combining these two electrophoretic methods in two-dimensional gels permits the resolution of complex mixtures of proteins (Fig. 6–7). This is a more sensitive analytical method than either isoelectric focusing or SDS electrophoresis alone. Two-dimensional electrophoresis separates proteins of identical molecular weight that differ in pI, or proteins with similar pI values but different molecular weights.
Figure 6–7  Two-dimensional electrophoresis. (a) Proteins are first separated by isoelectric focusing. The gel is then laid horizontally on a second gel, and the proteins are separated by SDS polyacrylamide gel electrophoresis. In this two-dimensional gel, horizontal separation reflects differences in pI; vertical separation reflects differences in molecular weight. (b) More than 1,000 different proteins from E. coli can be resolved using this technique.
[Figure 6–7 diagram labels. First dimension: isoelectric focusing (decreasing pI). Second dimension: SDS polyacrylamide gel electrophoresis. Axes of the two-dimensional gel: decreasing pI; decreasing Mr.]
[Figure 6–8 diagram labels: antibody; binding sites; antigen.]
Several sensitive analytical procedures have been developed from the study of a class of proteins called antibodies or immunoglobulins. Antibody molecules appear in the blood serum and certain tissues of a vertebrate animal in response to injection of an antigen, a protein or other macromolecule foreign to that individual. Each foreign protein elicits the formation of a set of different antibodies, which can combine with the antigen to form an antigen–antibody complex. The production of antibodies is part of a general defense mechanism in vertebrates called the immune response.
Antibodies are Y-shaped proteins consisting of four polypeptide chains. They have two binding sites that are complementary to specific structural features of the antigen molecule, making possible the formation of a three-dimensional lattice of alternating antigen and antibody molecules (Fig. 6–8). If sufficient antigen is present in a sample, the addition of antibodies or blood serum from an immunized animal will result in the formation of a quantifiable precipitate. No such precipitate is formed when serum of an unimmunized animal is mixed with the antigen.
Figure 6–8  The immune response and the action of antibodies. (a) A molecule of immunoglobulin G (IgG) consists of two polypeptides known as heavy chains (white and light blue) and two known as light chains (purple and dark blue). Immunoglobulins are glycoproteins and contain bound carbohydrate (yellow). (b) Each antigen evokes a specific set of antibodies, which will recognize and combine only with that antigen or closely related molecules. (Antibody-binding sites are shown as red areas on the antigen.) The Y-shaped antibodies each have two binding sites for the antigen, and can precipitate the antigen by forming an insoluble, latticelike aggregate.
Antibodies are highly specific for the foreign proteins or other macromolecules that evoke their formation. It is this specificity that makes them valuable analytical reagents. A rabbit antibody formed to horse serum albumin, for example, will combine with the latter but will not usually combine with other horse proteins, such as horse hemoglobin.
Cesar Milstein, Georges Köhler
Two types of antibody preparations are in use: polyclonal and monoclonal. Polyclonal antibodies are those produced by many different types (or populations) of antibody-producing cells in an animal immunized with an antigen (in this case a protein). Each type of cell produces an antibody that binds only to a specific, small part of the antigen protein. Consequently, polyclonal preparations contain a mixture of antibodies that recognize different parts of the protein. Monoclonal antibodies, in contrast, are synthesized by a population of identical cells (a clone) grown in cell culture. These antibodies are homogeneous, all recognizing the same specific part of the protein. The techniques for producing monoclonal antibodies were worked out by Georges Köhler and Cesar Milstein.
Antibodies are so exquisitely specific that they can in some cases distinguish between two proteins differing by only a single amino acid.
When a mixture of proteins is added to a chromatography column in which the antibody is covalently attached to a resin, the antibody will specifically bind its target protein and retain it on the column while other proteins are washed through. The target protein can then be eluted from the resin by a salt solution or some other agent. This can be a powerful tool for protein purification.
A variety of other analytical techniques rely on antibodies. In each case the antibody is attached to a radioactive label or some other reagent to make it easy to detect. The antibody binds the target protein, and the label reveals its presence in a solution or its location in a gel or even a living cell. Several variations of this procedure are illustrated in Figure 6–9. We shall examine some other aspects of antibodies in chapters to follow; they are of extreme importance in medicine and also tell much about the structure of proteins and the action of genes.
Figure 6–9  Analytical methods based on the interaction of antibodies with antigen. (a) An enzyme-linked immunosorbent assay (ELISA) used in testing for human pregnancy. Human chorionic gonadotropin (hCG), a hormone produced by the placenta, is detectable in maternal urine a few days after conception. In the ELISA, an antibody specific for hCG is attached to the bottom of a well in a plastic tray, to which a few drops of urine are added. If any hCG is present, it will bind to the antibodies. The well is then washed, and a second antibody (also specific for hCG) is added. This second antibody is linked to an enzyme that catalyzes the conversion of a colorless compound to a colored one; the amount of colored compound produced provides a sensitive measure of the amount of hormone present. The ELISA has been adapted for use in determining the amount of specific proteins in tissue samples, in blood, or in urine.
(b) Immunoblot (or Western blot) technique. Proteins are separated by electrophoresis, then antibodies are used to determine the presence and size of the proteins. After separation, the proteins are transferred electrophoretically from an SDS polyacrylamide gel to a special paper (which makes them more accessible). Specific, labeled antibody is added, then the paper is washed to remove unbound antibody. The label can be a radioactive element, a fluorescent compound, or an enzyme as in the ELISA. The position of the labeled antibody defines the Mr of the detected protein. All of the proteins are seen in the stained gel; only the protein bound to the antibody is seen in the immunoblot.
(c) In immunocytochemistry, labeled antibodies are introduced into cells to reveal the subcellular location of a specific protein. Here, fluorescently labeled antibodies and a fluorescence microscope have been used to locate tubulin filaments in a human fibroblast.
[Figure 6–9 diagram labels. (a) hCG-specific antibody; urine samples added; hCG; second antibody with linked enzyme added; color change proportional to amount of hormone present. (b) Stained gel; immunoblot.]
All proteins in all species, regardless of their function or biological activity, are built from the same set of 20 amino acids (Chapter 5). What is it, then, that makes one protein an enzyme, another a hormone, another a structural protein, and still another an antibody? How do they differ chemically? Quite simply, proteins differ from each other because each has a distinctive number and sequence of amino acid residues. The amino acids are the alphabet of protein structure; they can be arranged in an almost infinite number of sequences to make an almost infinite number of different proteins. A specific sequence of amino acids folds up into a unique three-dimensional structure, and this structure in turn determines the function of the protein.
The amino acid sequence of a protein, or its primary structure, can be very informative to a biochemist. No other property so clearly distinguishes one protein from another. This now becomes the focus of the remainder of the chapter. We first consider empirical clues that amino acid sequence and protein function are closely linked, then describe how amino acid sequence is determined, and finally outline the many uses to which this information can be put.
The bacterium E. coli produces about 3,000 different proteins. A human being produces 50,000 to 100,000 different proteins. In both cases, each separate type of protein has a unique structure and this structure confers a unique function. Each separate type of protein also has a unique amino acid sequence. Intuition suggests that the amino acid sequence must play a fundamental role in determining the three-dimensional structure of the protein, and ultimately its function, but is this expectation correct? A quick survey of proteins and how they vary in amino acid sequence provides a number of empirical clues that help substantiate the important relationship between amino acid sequence and biological function. First, as we have already noted, proteins with different functions always have different amino acid sequences. Second, more than 1,400 human genetic diseases have been traced to the production of defective proteins (Table 6–6). Perhaps a third of these proteins are defective because of a single change in the amino acid sequence; hence, if the primary structure is altered, the function of the protein may also be changed. Finally, on comparing proteins with similar functions from different species, we find that these proteins often have similar amino acid sequences. An extreme case is ubiquitin, a 76 amino acid protein involved in regulating the degradation of other proteins. The amino acid sequence of ubiquitin is identical in species as disparate as fruit flies and humans.
Table 6–6  A sampling of genetic diseases linked to loss or defect of a single enzyme or protein
Disease                        Physiological effects                             Affected enzyme or protein
Cystic fibrosis                Abnormal secretion in lungs, pancreas, sweat      Chloride channel
                               glands; chronic pulmonary disease generally
                               leading to death in children or young adults
Lesch–Nyhan syndrome           Neurological defects, self-mutilation, mental     Hypoxanthine-guanine phosphoribosyl transferase
                               retardation
Immunodeficiency disease       Severe loss of immune response                    Purine nucleoside phosphorylase
Immunodeficiency disease       Severe loss of immune response (children          Adenosine deaminase
                               must live in a sterile bubble)
Gaucher’s disease              Erosion of bones, hip joints; sometimes brain     Glucocerebrosidase
Gout, primary                  Overproduction of uric acid resulting in          Phosphoribosyl pyrophosphate synthetase
                               recurring attacks of acute arthritis
Rickets, vitamin D-dependent   Short stature, convulsions                        25-Hydroxycholecalciferol-1-hydroxylase
Familial hypercholesterolemia  Atherosclerosis resulting from elevated           Low-density lipoprotein receptor
                               cholesterol levels in blood; sometimes early
                               death from heart failure
Tay-Sachs disease              Motor weakness, mental deterioration, death       Hexosaminidase-A
                               by age 3 yr
Sickle-cell anemia             Pain, swelling in hands and feet; can lead to     Hemoglobin
                               sudden severe pain in bones or joints and death
Is the amino acid sequence absolutely fixed, or invariant, for a particular protein? No; some flexibility is possible. An estimated 20 to 30% of the proteins in humans are polymorphic, having amino acid sequence variants in the human population. Many of these variations in sequence have little or no effect on the function of the protein. Furthermore, proteins that carry out a broadly similar function in distantly related species often differ greatly in overall size and amino acid sequence. An example is DNA polymerase, the primary enzyme involved in DNA synthesis. The DNA polymerase of a bacterium is very different in much of its sequence from that of a mouse cell.
The amino acid sequence of a protein is inextricably linked to its function. Proteins often contain crucial substructures within their amino acid sequence that are essential to their biological functions. The amino acid sequence in other regions might vary considerably without affecting these functions. The fraction of the sequence that is critical varies from protein to protein, complicating the task of relating sequence to structure, and structure to function. Before we can consider this problem further, however, we must examine how sequence information is obtained.
Frederick Sanger
Two major discoveries in 1953 ushered in the modern era of biochemistry. In that year James D. Watson and Francis Crick deduced the double-helical structure of DNA and proposed a structural basis for the precise replication of DNA (Chapter 12). Implicit in their proposal was the idea that the sequence of nucleotide units in DNA bears encoded genetic information. In that same year, Frederick Sanger worked out the sequence of amino acids in the polypeptide chains of the hormone insulin (Fig. 6–10), surprising many researchers who had long thought that elucidation of the amino acid sequence of a polypeptide would be a hopelessly difficult task. These achievements together suggested that the nucleotide sequence of DNA and the amino acid sequence of proteins were somehow related. Within just over a decade, the nucleotide code that determines the amino acid sequence of protein molecules had been revealed (Chapter 26).
Today the amino acid sequences of thousands of different proteins from many species are known, determined using principles first developed by Sanger. These methods are still in use, although with many variations and improvements in detail.
Figure 6–10  The amino acid sequence of the two chains of bovine insulin, which are joined by disulfide cross-linkages. The A chain is identical in human, pig, dog, rabbit, and sperm whale insulins. The B chains of the cow, pig, dog, goat, and horse are identical. Such identities between similar proteins of different species are discussed later in this chapter.
[Figure 6–10 diagram labels: A chain; B chain.]
Figure 6–11  Steps in sequencing a polypeptide. (a) Determination of amino acid composition and (b) identification of the amino-terminal residue are the first steps for many polypeptides. Sanger’s method for identifying the amino-terminal residue is shown here. The Edman degradation procedure (c) reveals the entire sequence of a peptide. For shorter peptides, this method alone readily yields the entire sequence, and steps (a) and (b) are often omitted. The latter procedures are useful in the case of larger polypeptides, which are often fragmented into smaller peptides for sequencing (see Fig. 6–13).
Three procedures are used in the determination of the sequence of a polypeptide chain (Fig. 6–11). The first is to hydrolyze it and determine its amino acid composition (Fig. 6–11a). This information is often valuable in later steps, and can also be useful in itself. Because amino acid composition differs from one protein to the next, it can serve as a kind of fingerprint. It can be used, for example, to help determine whether proteins isolated by different laboratories are the same or different.
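The idea of composition as a fingerprint can be illustrated with a short Python sketch. The peptide sequences below are hypothetical, written in the one-letter abbreviations for amino acids, and the function name is illustrative. The sketch counts residues directly from a known sequence. It does not model the hydrolysis or chromatography steps themselves.

```python
from collections import Counter

def composition(sequence):
    """Count how many times each residue (one-letter code) occurs in a sequence."""
    return Counter(sequence)

# Hypothetical peptides used only for illustration.
peptide_a = "GASKLMFR"
peptide_b = "RFMLKSAG"   # same residues as peptide_a, in reverse order
peptide_c = "GASKLMFRW"  # one additional residue

same_ab = composition(peptide_a) == composition(peptide_b)
same_ac = composition(peptide_a) == composition(peptide_c)
print(same_ab, same_ac)
print(sorted(composition(peptide_a).items()))
```

A difference in composition shows that two preparations are different polypeptides. Identical compositions are consistent with identity but do not prove it, because composition carries no information about the order of the residues.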
Often, the next step is to identify the amino-terminal amino acid residue (Fig. 6–11b). For this purpose Sanger developed the reagent 1-fluoro-2,4-dinitrobenzene (FDNB; see Fig. 5–14). Other reagents used to label the amino-terminal residue are dansyl chloride and dabsyl chloride (see Figs. 5–14 and 5–18). The dansyl derivative is highly fluorescent and can be detected and measured in much lower concentrations than dinitrophenyl derivatives. The dabsyl derivative is intensely colored and also provides greater sensitivity than the dinitrophenyl compounds. These methods destroy the polypeptide and their utility is therefore limited to identification of the amino-terminal residue.
To sequence the entire polypeptide, a chemical method devised by Pehr Edman is usually employed. The Edman degradation procedure labels and removes only the amino-terminal residue from a peptide, leaving all other peptide bonds intact (Fig. 6–11c). The peptide is reacted with phenylisothiocyanate, and the amino-terminal residue is ultimately removed as a phenylthiohydantoin derivative. After removal and identification of the amino-terminal residue, the new amino-terminal residue so exposed can be labeled, removed, and identified by repeating the same series of reactions. This procedure is repeated until the entire sequence is determined. Refinements of each step permit the sequencing of up to 50 amino acid residues in a large peptide.
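The cyclic logic of the Edman procedure, in which one amino-terminal residue is removed and identified per cycle, can be mimicked in Python. This is a bookkeeping sketch only. It represents none of the chemistry, the peptide is hypothetical, and the default cycle limit of 50 simply echoes the practical limit mentioned above.

```python
def edman_degradation(peptide, max_cycles=50):
    """Remove and record the amino-terminal residue once per cycle.

    Returns the residues identified so far (in order) and the peptide
    that remains after the last completed cycle.
    """
    identified = []
    remaining = peptide
    while remaining and len(identified) < max_cycles:
        identified.append(remaining[0])  # label and identify the amino-terminal residue
        remaining = remaining[1:]        # the next residue becomes the new amino terminus
    return "".join(identified), remaining

# Hypothetical hexapeptide in one-letter abbreviations.
full_read = edman_degradation("AGKTWR")
partial_read = edman_degradation("AGKTWR", max_cycles=3)
print(full_read, partial_read)
```

Stopping after three cycles leaves a shortened peptide whose own amino terminus is the fourth residue of the original, which is why the cycle can simply be repeated.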
The many individual steps and the careful bookkeeping required in the determination of the amino acid sequence of long polypeptide chains are usually carried out by programmed and automated analyzers. The Edman degradation is carried out on a programmed machine, called a sequenator, which mixes reagents in the proper proportions, separates the products, identifies them, and records the results. Such instruments have greatly reduced the time and labor required to determine the amino acid sequence of polypeptides. These methods are extremely sensitive. Often, less than a microgram of protein is sufficient to determine its complete amino acid sequence.
[Figure 6–11 diagram labels. (a) Polypeptide → (6 M HCl) free amino acids → (HPLC or ion-exchange chromatography) amino acid composition: determine types and amounts of amino acids in polypeptide. (b) Polypeptide → (FDNB) 2,4-dinitrophenyl derivative of polypeptide → (6 M HCl) 2,4-dinitrophenyl derivative of amino-terminal residue plus free amino acids: identify amino-terminal residue of polypeptide. (c) Polypeptide → (phenylisothiocyanate; 6 M HCl) phenylthiohydantoin amino acid: identify amino-terminal residue, then purify and recycle remaining peptide fragment through Edman process.]
Figure 6–12  Breaking disulfide bonds in proteins. The two common methods are illustrated. Oxidation of cystine with performic acid produces two cysteic acid residues. Reduction by dithiothreitol to form cysteine residues must be followed by further modification of the reactive –SH groups to prevent re-formation of the disulfide bond. Carboxymethylation by iodoacetate serves this purpose.
[Figure 6–12 diagram labels: disulfide bond (cystine); oxidation by performic acid → cysteic acid residues; reduction by dithiothreitol, then carboxymethylation by iodoacetate → carboxymethylated cysteine residues.]
The overall accuracy for determination of an amino acid sequence generally declines as the length of the polypeptide increases, especially for polypeptides longer than 50 amino acids. The very large polypeptides found in proteins must usually be broken down into pieces small enough to be sequenced efficiently. There are several steps in this process. First, any disulfide bonds are broken, and the protein is cleaved into a set of specific fragments by chemical or enzymatic methods. Each fragment is then purified, and sequenced by the Edman procedure. Finally, the order in which the fragments appear in the original protein is determined and disulfide bonds (if any) are located.
Breaking Disulfide Bonds  Disulfide bonds interfere with the sequencing procedure. A cystine residue (p. 116) that has one of its peptide bonds cleaved by the Edman procedure will remain attached to the polypeptide. Disulfide bonds also interfere with the enzymatic or chemical cleavage of the polypeptide (described below). Two approaches to irreversible breakage of disulfide bonds are outlined in Figure 6–12.
Cleaving the Polypeptide Chain  Several methods can be used for fragmenting the polypeptide chain. These involve a set of enzymes (proteases) and chemical reagents that cleave peptide chains adjacent to specific amino acid residues (Table 6–7). The digestive enzyme trypsin, for example, catalyzes the hydrolysis of only those peptide bonds in which the carbonyl group is contributed by either a Lys or an Arg residue, regardless of the length or amino acid sequence of the chain. The number of smaller peptides produced by trypsin cleavage can thus be predicted from the total number of Lys or Arg residues in the original polypeptide (Fig. 6–13). A polypeptide with five Lys and/or Arg residues will usually yield six smaller peptides on cleavage with trypsin. Moreover, all except one of these will have a carboxyl-terminal Lys or Arg. The fragments produced by trypsin action are separated by chromatographic or electrophoretic methods.
Table 6–7  The specificity of some important methods for fragmenting polypeptide chains
Treatment*                            Cleavage points†
Trypsin                               Lys, Arg (C)
Submaxillarus protease                Arg (C)
Chymotrypsin                          Phe, Trp, Tyr (C)
Staphylococcus aureus V8 protease     Asp, Glu (C)
Asp-N-protease                        Asp, Glu (N)
Pepsin                                Phe, Trp, Tyr (N)
Cyanogen bromide                      Met (C)

* All of the enzymes or reagents listed are available from commercial sources.

† Residues furnishing the primary recognition point for the protease; peptide bond cleavage occurs either on the carbonyl (C) or amino (N) side of the indicated group of amino acids.
Figure 6–13  Fragmenting proteins prior to sequencing, and placing peptide fragments in their proper order with overlaps. The one-letter abbreviations for amino acids are given in Table 5–1. In this example, there are only two Cys residues, thus one possibility for location of the disulfide bridge (black bracket). In polypeptides with three or more Cys residues, disulfide bridges can be located as described in the text.
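The trypsin cleavage rule described in the text can be illustrated with a short sketch. The peptide below is hypothetical (it is not taken from Figure 6–13), and the function name `cleave_after` is an assumption made for illustration. The sketch applies only the simple rule of cleavage on the carbonyl side of Lys (K) or Arg (R) and ignores any exceptions that make the text say "usually."

```python
def cleave_after(sequence: str, residues: str) -> list[str]:
    """Split a one-letter sequence on the carbonyl (C) side of the given residues."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        # Cleave after a recognized residue, unless it is already the last residue.
        if aa in residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


# Hypothetical 20-residue peptide containing five Lys/Arg residues.
peptide = "GAKLMRSTKVVFRAEKWQLD"
tryptic = cleave_after(peptide, "KR")
print(tryptic)       # six fragments; all but the last end in K or R
print(len(tryptic))  # number of Lys/Arg residues plus one
```

The same function models cyanogen bromide if called with `"M"`, because that reagent also cleaves on the carbonyl side of its target residue (Table 6–7).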
Sequencing of Peptides  All the peptide fragments resulting from the action of trypsin are sequenced separately by the Edman procedure.
Ordering Peptide Fragments  The order of these trypsin fragments in the original polypeptide chain must now be determined. Another sample of the intact polypeptide is cleaved into small fragments using a different enzyme or reagent, one that cleaves peptide bonds at points other than those cleaved by trypsin. For example, the reagent cyanogen bromide cleaves only those peptide bonds in which the carbonyl group is contributed by Met (Table 6–7). The fragments resulting from this new procedure are then separated and sequenced as before.
The amino acid sequences of each fragment obtained by the two cleavage procedures are examined, with the objective of finding peptides from the second procedure whose sequences establish continuity, because of overlaps, between the fragments obtained by the first cleavage procedure (Fig. 6–13). Overlapping peptides obtained from the second fragmentation yield the correct order of the peptide fragments produced in the first. Moreover, the two sets of fragments can be compared for possible errors in determining the amino acid sequence of each fragment. If the amino-terminal amino acid has been identified before the original cleavage of the protein, this information can be used to establish which fragment is derived from the amino terminus.
If the second cleavage procedure fails to establish continuity between all peptides from the first cleavage, a third or even a fourth cleavage method must be used to obtain a set of peptides that can provide the necessary overlap(s). A variety of proteolytic enzymes with different specificities are available (Table 6–7).
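The overlap logic can be sketched as a small search. This is a brute-force illustration under stated assumptions, not a description of how sequencing laboratories actually assemble fragments: the fragment sequences are hypothetical, the function name `order_fragments` is invented here, and every fragment is assumed to have been sequenced without error.

```python
from itertools import permutations


def order_fragments(first_set, second_set):
    """Return each ordering of first_set whose concatenation contains
    every peptide of second_set as a contiguous stretch."""
    solutions = []
    for order in permutations(first_set):
        candidate = "".join(order)
        if all(fragment in candidate for fragment in second_set):
            solutions.append(candidate)
    return solutions


# Hypothetical fragments from a first cleavage (trypsin), in arbitrary order.
tryptic_fragments = ["VVFR", "GAK", "WQLD", "STK", "LMR", "AEK"]
# Hypothetical fragments from a second cleavage (cyanogen bromide, after Met).
cnbr_fragments = ["GAKLM", "RSTKVVFRAEKWQLD"]

print(order_fragments(tryptic_fragments, cnbr_fragments))
```

If the search returns more than one ordering, the second fragment set does not provide enough overlap, which corresponds to the case in which a third or fourth cleavage method is needed.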
Locating Disulfide Bonds  After sequencing is completed, locating the disulfide bonds requires an additional step. A sample of the protein is again cleaved with a reagent such as trypsin, this time without first breaking the disulfide bonds. When the resulting peptides are separated by electrophoresis and compared with the original set of peptides generated by trypsin, two of the original peptides will be missing and a new, larger peptide will appear. The two missing peptides represent the regions of the intact polypeptide that are linked by a disulfide bond.
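The comparison of the two peptide sets amounts to a set difference. In this minimal sketch the peptide sequences are hypothetical, and writing the disulfide-linked product as two sequences joined by "+" is only a labeling convention assumed here.

```python
def compare_digests(reduced_digest, intact_digest):
    """Compare peptides from a digest made after breaking disulfide bonds
    with peptides from a digest of the protein with disulfides intact."""
    missing = sorted(set(reduced_digest) - set(intact_digest))
    new = sorted(set(intact_digest) - set(reduced_digest))
    return missing, new


# Hypothetical tryptic peptides; two of them contain Cys (C).
reduced = ["GACK", "LMR", "STCVVFR", "AEK"]
# Same digest without prior reduction: the two Cys peptides stay linked.
intact = ["LMR", "AEK", "GACK+STCVVFR"]

missing, new = compare_digests(reduced, intact)
print(missing)  # peptides joined by the disulfide bond
print(new)      # the new, larger peptide
```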
Figure 6–13 diagram (procedure → result and conclusion):
polypeptide with –S–S– bond, amino acid analysis → polypeptide has 38 amino acids; trypsin will cleave at one R (Arg) and two K (Lys) to give four fragments; cyanogen bromide will cleave at two M (Met) to give three fragments
react with FDNB, hydrolyze, separate amino acids → 2,4-dinitrophenylasparagine detected; N (Asn) is the amino-terminal residue
reduce disulfide bonds (–S–S– to –SH HS–), cleave with trypsin, separate fragments, sequence on sequenator → T-2 placed at amino terminus because it begins with N (Asn); T-3 placed at carboxyl terminus because it does not end with R (Arg) or K (Lys)
cleave with cyanogen bromide, separate fragments, sequence on sequenator → C-3 overlaps with T-1 and T-4, allowing them to be ordered
sequence established → amino terminus to carboxyl terminus
The approach outlined above is not the only way to obtain amino acid sequences. The development of rapid DNA sequencing methods (Chapter 12), the elucidation of the genetic code (Chapter 26), and the development of techniques for the isolation of genes (Chapter 28) make it possible to deduce the sequence of a polypeptide by determining the sequence of nucleotides in its gene (Fig. 6–14). The two techniques are complementary. When the gene is available, sequencing the DNA can be faster and more accurate than sequencing the protein. If the gene has not been isolated, direct sequencing of peptides is necessary, and this can provide information (e.g., the location of disulfide bonds) not available in a DNA sequence. In addition, a knowledge of the amino acid sequence can greatly facilitate the isolation of the corresponding gene (Chapter 28).
amino acid sequence (protein), DNA sequence (gene)
Figure 6–14  Correspondence of DNA and amino acid sequences. Each amino acid is encoded by a specific sequence of three nucleotides (triplet) in DNA. The genetic code is described in detail in Chapter 26.
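Deducing a polypeptide sequence from its gene can be sketched as a triplet-by-triplet lookup. The sketch assumes the standard genetic code, a coding strand read 5′ to 3′ starting at the first base, and a hypothetical DNA sequence; the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence into one-letter amino acids,
    stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical gene fragment: ATG GCT AAA CTG TGG TAA
print(translate("ATGGCTAAACTGTGGTAA"))  # MAKLW
```

A translation of this kind gives the order of residues only; as noted above, it cannot reveal features such as the location of disulfide bonds.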
The sequence of amino acids in a protein can offer insights into its three-dimensional structure and its function, cellular location, and evolution. Most of these insights are derived by searching for similarities with other known sequences. Thousands of sequences are known and available in computerized data bases. The comparison of a newly obtained sequence with this large bank of stored sequences often reveals relationships both surprising and enlightening.
The relationship between amino acid sequence and three-dimensional structure, and between structure and function, is not understood in detail. However, a growing number of protein families are being revealed that have at least some shared structural and functional features that can be readily identified on the basis of amino acid sequence similarities alone. For example, there are four major families of proteases, several families of naturally occurring protease inhibitors, a large number of closely related protein kinases, and a similar large number of related protein phosphatases. Individual proteins are generally assigned to families by the degree of similarity in amino acid sequence (identical to other members of the family across 30% or more of the sequence), and proteins in these families generally share at least some structural and functional characteristics. Some families are defined, however, by identities involving only a few amino acids that are critical to a certain function. Many membrane-bound protein receptors share important structural features and have similar amino acid sequences, even though the extracellular molecules they bind are quite different. Even the immunoglobulin family includes a host of extracellular and cell-surface proteins in addition to antibodies.
The similarities may involve the entire protein or may be confined to relatively small segments of it. A number of similar substructures (domains) occur in many functionally unrelated proteins. An example is a 40 to 45 amino acid sequence called the EGF (epidermal growth factor) domain that makes up part of the structure of urokinase, the low-density lipoprotein receptor, several proteins involved in blood clotting, and many others. These domains often fold up into structural configurations that have an unusual degree of stability or that are specialized for a certain environment. Evolutionary relationships can also be inferred from the structural and functional similarities within protein families.
Certain amino acid sequences often serve as signals that determine the cellular location, chemical modification, and half-life of a protein. Special signal sequences, usually at the amino terminus, are used to target certain proteins for export from the cell, while other proteins are distributed to the nucleus, the cell surface, the cytosol, and other cellular locations. Other sequences act as attachment sites for prosthetic groups, such as glycosyl groups in glycoproteins and lipids in lipoproteins. Some of these signals are well characterized, and are easily recognized if they occur in the sequence of a newly discovered protein.
The probability that information about a new protein can be deduced from its primary structure improves constantly with the almost daily addition to the number of published amino acid sequences stored in shared databanks.
Figure 6–15  The amino acid sequence of human cytochrome c. Amino acid substitutions found at different positions in the cytochrome c of other species are listed below the sequence of the human protein. The amino acids are color-coded to help distinguish conservative and nonconservative substitutions: invariant amino acids are shaded in yellow, conservative amino acid substitutions are shaded in blue, and nonconservative substitutions are unshaded. X is an unusual amino acid, trimethyllysine. The one-letter abbreviations for amino acids are used here (see Table 5–1).
Several important conclusions have come from study of the amino acid sequences of homologous proteins from different species. Homologous proteins are those that are evolutionarily related. They usually perform the same function in different species; an example is hemoglobin, which has the same oxygen-transport function in different vertebrates. Homologous proteins from different species often have polypeptide chains that are identical or nearly identical in length. Many positions in the amino acid sequence are occupied by the same amino acid in all species and are thus called invariant residues. But in other positions there may be considerable variation in the amino acid from one species to another; these are called variable residues.
The functional significance of sequence homology can be illustrated by cytochrome c, an iron-containing mitochondrial protein that transfers electrons during biological oxidations in eukaryotic cells. The polypeptide chain of this protein has a molecular weight of about 13,000 and has about 100 amino acid residues in most species. The amino acid sequences of cytochrome c from over 60 different species have been determined, and 27 positions in the chain of amino acid residues are invariant in all species tested (Fig. 6–15), suggesting that they are the most important residues specifying the biological activity of cytochrome c. The residues in other positions in the chain exhibit some interspecies variation. There are clear gradations in the number of changes observed in the variable residues. In some positions, all substitutions involve similar amino acid residues (e.g., Arg will replace Lys, both of which are positively charged); these are called conservative substitutions. At other positions the substitutions are more random. As we will show in the next chapter, the polypeptide chains of proteins are folded into characteristic and specific conformations and these conformations depend on amino acid sequence. Clearly, the invariant residues are more critical to the structure and function of a protein than the variable ones. Recognizing which amino acids fall into each category is an important step in deciphering the complicated question of how amino acid sequence is translated into a specific three-dimensional structure.
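The three categories of positions can be made concrete with a small classifier. The sequences below are hypothetical and already aligned, and the grouping of amino acids into "similar" classes is an assumption chosen for illustration; other groupings are equally defensible and would change which substitutions count as conservative.

```python
# Assumed similarity classes (one-letter codes); an illustrative choice only.
GROUPS = [
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
    set("AVLIM"),  # aliphatic / nonpolar
    set("FWY"),    # aromatic
    set("C"), set("G"), set("P"),
]


def classify_position(column):
    """Classify one aligned position across species."""
    residues = set(column)
    if len(residues) == 1:
        return "invariant"
    if any(residues <= group for group in GROUPS):
        return "conservative"
    return "nonconservative"


def classify_alignment(sequences):
    """Classify every position of equal-length, aligned sequences."""
    return [classify_position(column) for column in zip(*sequences)]


# Hypothetical aligned segments from three species.
aligned = ["GDKEKA", "GDREKS", "GERDKT"]
print(classify_alignment(aligned))
```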
The variable amino acids provide information of another sort. Evolution is sometimes regarded as a theory that is accepted but difficult to test, yet the phylogenetic trees established by taxonomy have been tested and experimentally confirmed through biochemistry. The examination of sequences of cytochrome c and other homologous proteins has led to an important conclusion: the number of residues that differ in homologous proteins from any two species is in proportion to the phylogenetic difference between those species. For example, 48 amino acid residues differ in the cytochrome c molecules of the horse and of yeast, which are very widely separated species, whereas only two residues differ in the cytochrome c of the much more closely related duck and chicken. In fact, the cytochrome c molecule has identical amino acid sequences in the chicken and the turkey, and in the pig, cow, and sheep. Information on the number of residue differences between homologous proteins of different species allows the construction of evolutionary maps that show the origin and sequence of development of different animals and plants during the evolution of species (Fig. 6–16). The relationships established by taxonomy and biochemistry agree well.
mammals, Homo sapiens, chimp, monkey, mouse, rabbit, kangaroo, horse, pig, sheep, cow, dog, seal, bat, hippopotamus; birds and reptiles, ostrich, emu, chicken, turkey, penguin, pigeon, duck, turtle, rattlesnake; bony fishes, tuna, carp; cartilaginous fishes, dogfish, lamprey; amphibians, bullfrog; starfish, earthworm; insects, moth, hawkmoth, honey bee, fly, locust; fungi, yeast, Candida, Neurospora, Humicola; plants, sunflower, sesame, wheat, rice, spinach, ginkgo
Figure 6–16  Main branches of the evolutionary tree constructed from the number of amino acid differences between cytochrome c molecules of different species. The numbers represent the number of residues by which the cytochrome c of a given line of organism differs from its ancestors.
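The raw data behind such a tree are pairwise counts of differing residues. The sketch below computes those counts for hypothetical aligned segments; the species labels and sequences are invented and are not real cytochrome c data. Building the tree itself from the counts is a separate step that is not shown.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_differences(named_sequences):
    """Difference count for every pair of named sequences."""
    return {
        (name_1, name_2): count_differences(seq_1, seq_2)
        for (name_1, seq_1), (name_2, seq_2) in combinations(named_sequences.items(), 2)
    }


# Hypothetical aligned segments from four species.
segments = {
    "species_1": "GDVEKGKKIF",
    "species_2": "GDVEKGKKIF",
    "species_3": "GDVAKGKKIF",
    "species_4": "GSAEKGATLF",
}
for pair, n in pairwise_differences(segments).items():
    print(pair, n)
```

In this toy set, species_1 and species_2 are identical (as the text reports for chicken and turkey), while species_4 differs from the others at five or six positions and would sit on a more distant branch.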
Cells generally contain thousands of different proteins, each with a different function or biological activity. These functions include enzymatic catalysis, molecular transport, nutrition, cell or organismal motility, structural roles, organismal defense, regulation, and many others. Proteins consist of very long polypeptide chains having from 100 to over 2,000 amino acid residues joined by peptide linkages. Some proteins have several polypeptide chains, which are then referred to as subunits. Simple proteins yield only amino acids on hydrolysis; conjugated proteins contain in addition some other component, such as a metal ion or organic prosthetic group.
Proteins are purified by taking advantage of properties in which they differ, such as size, shape, binding affinities, charge, etc. Purification also requires a method for quantifying or assaying a particular protein in the presence of others. Proteins can be both separated and visualized by electrophoretic methods. Antibodies that specifically bind a certain protein can be used to detect and locate that protein in a solution, a gel, or even in the interior of a cell.
All proteins are made from the same set of 20 amino acids. Their differences in function result from differences in the composition and sequence of their amino acids. The amino acid sequences of polypeptide chains can be established by fragmenting them into smaller pieces using several specific reagents, and determining the amino acid sequence of each fragment by the Edman degradation procedure. The sequencing of suitably sized peptide fragments has been automated. The peptide fragments are then placed in the correct order by finding sequence overlaps between fragments generated by different methods. Protein sequences can also be deduced from the nucleotide sequence of the corresponding gene in the DNA. The amino acid sequence can be compared with the thousands of known sequences, often revealing insights into the structure, function, cellular location, and evolution of the protein.
Homologous proteins from different species show sequence homology: certain positions in the polypeptide chains contain the same amino acids, regardless of the species. In other positions the amino acids may differ. The invariant residues are evidently essential to the function of the protein. The degree of similarity between amino acid sequences of homologous proteins from different species correlates with the evolutionary relationship of the species.
Further Reading
See Chapter 5 for additional useful references.
Properties of Proteins
Creighton, T.E. (1984) Proteins: Structures and Molecular Properties, W.H. Freeman and Company, New York. 
Dickerson, R.E. & Geis, I. (1983) Proteins: Structure, Function, and Evolution, 2nd edn, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 
A beautifully illustrated introduction to proteins.
Doolittle, R.F. (1985) Proteins. Sci. Am. 253 (October), 88–99. 
An overview that highlights evolutionary relationships.
Srinivasan, P.R., Fruton, J.S., & Edsall, J.T. (eds) (1979) The Origins of Modern Biochemistry: A Retrospect on Proteins. Ann. N.Y. Acad. Sci. 325.
A collection of very interesting articles on the history of protein research.
Structure and Function of Proteins. (1989) Trends Biochem. Sci. 14 (July). 
A special issue devoted to reviews on protein chemistry and protein structure.
Working with Proteins
Hirs, C.H.W. & Timasheff, S.N. (eds) (1983) Methods in Enzymology, Vol. 91, Part I: Enzyme Structure, Academic Press, Inc., New York. 
An excellent collection of authoritative articles on techniques in protein chemistry. Includes information on sequencing.
Kornberg, A. (1990) Why purify enzymes? In Methods in Enzymology, Vol. 182: Guide to Protein Purification (Deutscher, M.P., ed), pp. 1–5, Academic Press, Inc., New York.
The critical role of classical biochemical methods in a new age.
O’Farrell, P.H. (1975) High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021. 
An interesting attempt to count all the proteins in the E. coli cell.
Plummer, D.T. (1987) An Introduction to Practical Biochemistry, 3rd edn, McGraw-Hill, London.
Good descriptions of many techniques for beginning students.
Scopes, R.K. (1987) Protein Purification: Principles and Practice, 2nd edn, Springer-Verlag, New York. 
Tonegawa, S. (1985) The molecules of the immune system. Sci. Am. 253 (October), 122–131. 
The Covalent Structure of Proteins
Dickerson, R.E. (1972) The structure and history of an ancient protein. Sci. Am. 226 (April), 58–72. 
A nice summary of information gleaned from interspecies comparisons of cytochrome c sequences.
Doolittle, R. (1981) Similar amino acid sequences: chance or common ancestry? Science 214, 149–159. 
A good discussion of what can be learned by comparing amino acid sequences.
Hunkapiller, M.W., Strickler, J.E., & Wilson, K.J. (1984) Contemporary methodology for protein structure determination. Science 226, 304–311. 
Reidhaar-Olson, J.F. & Sauer, R.T. (1988) Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241, 53–57. 
A systematic study of possible amino acid substitutions in a short segment of one protein.
Wilson, A.C. (1985) The molecular basis of evolution. Sci. Am. 253 (October), 164–173. 
1. How Many β-Galactosidase Molecules Are Present in an E. coli Cell?  E. coli is a rod-shaped bacterium 2 μm long and 1 μm in diameter. When grown on lactose (a sugar found in milk), the bacterium synthesizes the enzyme β-galactosidase (Mr 450,000), which catalyzes the breakdown of lactose. The average density of the bacterial cell is 1.2 g/mL, and 14% of its total mass is soluble protein, of which 1.0% is β-galactosidase. Calculate the number of β-galactosidase molecules in an E. coli cell grown on lactose.
2. The Number of Tryptophan Residues in Bovine Serum Albumin  A quantitative amino acid analysis reveals that bovine serum albumin contains 0.58% by weight of tryptophan, which has a molecular weight of 204.
       (a) Calculate the minimum molecular weight of bovine serum albumin (i.e., assuming there is only one tryptophan residue per protein molecule).
       (b) Gel filtration of bovine serum albumin gives a molecular weight estimate of about 70,000. How many tryptophan residues are present in a molecule of serum albumin?
3. The Molecular Weight of Ribonuclease  Lysine makes up 10.5% of the weight of ribonuclease. Calculate the minimum molecular weight of ribonuclease. The ribonuclease molecule contains ten lysine residues. Calculate the molecular weight of ribonuclease.
4. The Size of Proteins  What is the approximate molecular weight of a protein containing 682 amino acids in a single polypeptide chain?
5. Net Electric Charge of Peptides  A peptide isolated from the brain has the sequence
Determine the net charge on the molecule at pH 3. What is the net charge at pH 5.5? At pH 8? At pH 11? Estimate the pI for this peptide. (Use pKa values for side chains and terminal amino and carboxyl groups as given in Table 5–1.)
6. The Isoelectric Point of Pepsin  Pepsin of gastric juice (pH ≈ 1.5) has a pI of about 1, much lower than that of other proteins (see Table 6–5). What functional groups must be present in relatively large numbers to give pepsin such a low pI? What amino acids can contribute such groups?
7. The Isoelectric Point of Histones  Histones are proteins of eukaryotic cell nuclei. They are tightly bound to deoxyribonucleic acid (DNA), which has many phosphate groups. The pI of histones is very high, about 10.8. What amino acids must be present in relatively large numbers in histones? In what way do these residues contribute to the strong binding of histones to DNA?
8. Solubility of Polypeptides  One method for separating polypeptides makes use of their differential solubilities. The solubility of large polypeptides in water depends upon the relative polarity of their R groups, particularly on the number of ionized groups: the more ionized groups there are, the more soluble the polypeptide. Which of each pair of polypeptides below is more soluble at the indicated pH?
       (a) (Gly)20 or (Glu)20 at pH 7.0
       (b) (Lys–Ala)3 or (Phe–Met)3 at pH 7.0
       (c) (Ala–Ser–Gly)5 or (Asn–Ser–His)5 at pH 6.0
       (d) (Ala–Asp–Gly)5 or (Asn–Ser–His)5 at pH 3.0
9. Purification of an Enzyme  A biochemist discovers and purifies a new enzyme, generating the purification table below:

       (a) From the information given in the table, calculate the specific activity of the enzyme solution after each purification procedure.
       (b) Which of the purification procedures used for this enzyme is most effective (i.e., gives the greatest increase in purity)?
       (c) Which of the purification procedures is least effective?
       (d) Is there any indication in this table that the enzyme is now pure? What else could be done to estimate the purity of the enzyme preparation?
10. Fragmentation of a Polypeptide Chain by Proteolytic Enzymes  Trypsin and chymotrypsin are specific enzymes that catalyze the hydrolysis of polypeptides at specific locations (Table 6–7). The sequence of the B chain of insulin is shown below. Note that the cystine cross-linkage between the A and B chains has been cleaved through the action of performic acid (see Fig. 6–12).
Indicate the points in the B chain that are cleaved by (a) trypsin and (b) chymotrypsin. Note that these proteases will not remove single amino acids from either end of a polypeptide chain.
11. Sequence Determination of the Brain Peptide Leucine Enkephalin  A group of peptides that influence nerve transmission in certain parts of the brain has been isolated from normal brain tissue. These peptides are known as opioids, because they bind to specific receptors that bind opiate drugs, such as morphine and naloxone. Opioids thus mimic some of the properties of opiates. Some researchers consider these peptides to be the brain’s own pain killers. Using the information below, determine the amino acid sequence of the opioid leucine enkephalin. Explain how your structure is consistent with each piece of information.
       (a) Complete hydrolysis by 1 M HCl at 110 °C followed by amino acid analysis indicated the presence of Gly, Leu, Phe, and Tyr, in a 2:1:1:1 molar ratio.
       (b) Treatment of the peptide with 1-fluoro-2,4-dinitrobenzene followed by complete hydrolysis and chromatography indicated the presence of the 2,4-dinitrophenyl derivative of tyrosine. No free tyrosine could be found.
       (c) Complete digestion of the peptide with pepsin followed by chromatography yielded a dipeptide containing Phe and Leu, plus a tripeptide containing Tyr and Gly in a 1:2 ratio.
12. Structure of a Peptide Antibiotic from Bacillus brevis  Extracts from the bacterium Bacillus brevis contain a peptide with antibiotic properties. Such peptide antibiotics form complexes with metal ions and apparently disrupt ion transport across the cell membrane, killing certain bacterial species. The structure of the peptide has been determined from the following observations.
       (a) Complete acid hydrolysis of the peptide followed by amino acid analysis yielded equimolar amounts of Leu, Orn, Phe, Pro, and Val. Orn is ornithine, an amino acid not present in proteins but present in some peptides. It has the structure

       (b) The molecular weight of the peptide was estimated as about 1,200.
       (c) When treated with the enzyme carboxypeptidase, the peptide failed to undergo hydrolysis.
       (d) Treatment of the intact peptide with 1-fluoro-2,4-dinitrobenzene, followed by complete hydrolysis and chromatography, yielded only free amino acids and the following derivative:

(Hint: Note that the 2,4-dinitrophenyl derivative involves the amino group of a side chain rather than the α-amino group.)
       (e) Partial hydrolysis of the peptide followed by chromatographic separation and sequence analysis yielded the di- and tripeptides below (the amino-terminal amino acid is always at the left):

       Leu–Phe     Phe–Pro     Orn–Leu     Val–Orn
       Val–Orn–Leu     Phe–Pro–Val     Pro–Val–Orn
Given the above information, deduce the amino acid sequence of the peptide antibiotic. Show your reasoning. When you have arrived at a structure, go back and demonstrate that it is consistent with each experimental observation.
Chapter 7
The Three-Dimensional Structure of Proteins
Figure 7–1  The structure of the enzyme chymotrypsin, a globular protein. A molecule of glycine (blue) is shown for size comparison.
The covalent backbone of proteins is made up of hundreds of individual bonds. If free rotation were possible around even a fraction of these bonds, proteins could assume an almost infinite number of three-dimensional structures. Each protein has a specific chemical or structural function, however, strongly suggesting that each protein has a unique three-dimensional structure (Fig. 7–1). The simple fact that proteins can be crystallized provides strong evidence that this is the case. The ordered arrays of molecules in a crystal can generally form only if the molecular units making up the crystal are identical. The enzyme urease (Mr 483,000) was among the first proteins crystallized, by James Sumner in 1926. This accomplishment demonstrated dramatically that even very large proteins are discrete chemical entities with unique structures, and it revolutionized thinking about proteins.
In this chapter, we will explore the three-dimensional structure of proteins, emphasizing several principles. First, the three-dimensional structure of a protein is determined by its amino acid sequence. Second, the function of a protein depends upon its three-dimensional structure. Third, the three-dimensional structure of a protein is unique, or nearly so. Fourth, the most important forces stabilizing the specific three-dimensional structure maintained by a given protein are noncovalent interactions. Finally, even though the structure of proteins is complicated, several common patterns can be recognized.
The relationship between the amino acid sequence and the three-dimensional structure of a protein is an intricate puzzle that has yet to be solved in detail. Polypeptides with very different amino acid sequences sometimes assume similar structures, and similar amino acid sequences sometimes yield very different structures. To find and understand patterns in this biochemical labyrinth requires a renewed appreciation for fundamental principles of chemistry and physics.
The spatial arrangement of atoms in a protein is called a conformation. The term conformation refers to a structural state that can, without breaking any covalent bonds, interconvert with other structural states. A change in conformation could occur, for example, by rotation about single bonds. Of the innumerable conformations that are theoretically possible in a protein containing hundreds of single bonds, one generally predominates. This is usually the conformation that is thermodynamically the most stable, having the lowest Gibbs’ free energy (G). Proteins in their functional conformation are called native proteins.
What principles determine the most stable conformation of a protein? Although protein structures can seem hopelessly complex, close inspection reveals recurring structural patterns. The patterns involve different levels of structural complexity, and we now turn to a biochemical convention that serves as a framework for much of what follows in this chapter.
Figure 7–2  Levels of structure in proteins. The primary structure consists of a sequence of amino acids linked together by covalent peptide bonds, and includes any disulfide bonds. The resulting polypeptide can be coiled into an α helix, one form of secondary structure. The helix is a part of the tertiary structure of the folded polypeptide, which is itself one of the subunits that make up the quaternary structure of the multimeric protein, in this case hemoglobin.
Conceptually, protein structure can be considered at four levels (Fig. 7–2). Primary structure includes all the covalent bonds between amino acids and is normally defined by the sequence of peptide-bonded amino acids and locations of disulfide bonds. The relative spatial arrangement of the linked amino acids is unspecified.
primary structure, amino acids; secondary structure, α helix; tertiary structure, polypeptide chain; quaternary structure, assembled subunits
Polypeptide chains are not free to take up any three-dimensional structure at random. Steric constraints and many weak interactions stipulate that some arrangements will be more stable than others. Secondary structure refers to regular, recurring arrangements in space of adjacent amino acid residues in a polypeptide chain. There are a few common types of secondary structure, the most prominent being the α helix and the β conformation. Tertiary structure refers to the spatial relationship among all amino acids in a polypeptide; it is the complete three-dimensional structure of the polypeptide. The boundary between secondary and tertiary structure is not always clear. Several different types of secondary structure are often found within the three-dimensional structure of a large protein. Proteins with several polypeptide chains have one more level of structure: quaternary structure, which refers to the spatial relationship of the polypeptides, or subunits, within the protein.
Continued advances in the understanding of protein structure, folding, and evolution have made it necessary to define two additional structural levels intermediate between secondary and tertiary structure. A stable clustering of several elements of secondary structure is sometimes referred to as supersecondary structure. The term is used to describe particularly stable arrangements that occur in many different proteins and sometimes many times in a single protein. A somewhat higher level of structure is the domain. This refers to a compact region, including perhaps 40 to 400 amino acids, that is a distinct structural unit within a larger polypeptide chain. A polypeptide that is folded into a dumbbell-like shape might be considered to have two domains, one at either end. Many domains fold independently into thermodynamically stable structures. A large polypeptide chain can contain several domains that often are readily distinguishable within the overall structure (Fig. 7–3). In some cases the individual domains have separate functions. As we will see, important patterns exist at each of these levels of structure that provide clues to understanding the overall structure of large proteins.
Figure 7–3  The different structural domains in the polypeptide troponin C, a calcium-binding protein associated with muscle. The separate calcium-binding domains, indicated in blue and purple, are connected by a long α helix, shown in white.
The native conformation of a protein is only marginally stable; the difference in free energy between the folded and unfolded states in typical proteins under physiological conditions is in the range of only 20 to 65 kJ/mol. A given polypeptide chain can theoretically assume countless different conformations, and as a result the unfolded state of a protein is characterized by a high degree of conformational entropy. This entropy, and the hydrogen-bonding interactions of many groups in the polypeptide chain with solvent (water), tend to maintain the unfolded state. The chemical interactions that counteract these effects and stabilize the native conformation include disulfide bonds and the weak (noncovalent) interactions described in Chapter 4: hydrogen bonds, and hydrophobic, ionic, and van der Waals interactions. An appreciation of the role of these weak interactions is especially important to understanding how polypeptide chains fold into specific secondary, tertiary, and quaternary structures.
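To put the 20 to 65 kJ/mol range in perspective, the following minimal sketch converts a folding stability into a ratio of folded to unfolded molecules. It assumes a simple two-state model (each molecule is either fully folded or fully unfolded), the standard relation ΔG = –RT ln K, and a temperature of 37 °C; none of these assumptions is stated in the text above, and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def folded_to_unfolded_ratio(stability_kj, temp_k=310.15):
    """Ratio [folded]/[unfolded] in an assumed two-state model.

    stability_kj is the free energy by which the folded state is
    favored over the unfolded state, in kJ/mol (positive = folded
    state more stable). temp_k defaults to 37 degrees C (assumed).
    """
    return math.exp(stability_kj * 1000.0 / (R * temp_k))

def fraction_folded(stability_kj, temp_k=310.15):
    """Fraction of molecules in the folded state (two-state model)."""
    ratio = folded_to_unfolded_ratio(stability_kj, temp_k)
    return ratio / (1.0 + ratio)

if __name__ == "__main__":
    # The two ends of the stability range quoted in the text.
    for stability in (20.0, 65.0):
        print(stability,
              folded_to_unfolded_ratio(stability),
              fraction_folded(stability))
```

Under these assumptions even the low end of the range leaves the great majority of molecules folded. The stability is nonetheless marginal in the sense that it is comparable to the free energy of only a few weak interactions.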
Every time a bond is formed between two atoms, some free energy is released in the form of heat or entropy. In other words, the formation of bonds is accompanied by a favorable (negative) change in free energy. The ΔG for covalent bond formation is generally in the range of –200 to –460 kJ/mol. For weak interactions, ΔG = –4 to –30 kJ/mol. Although covalent bonds are clearly much stronger, weak interactions predominate as a stabilizing force in protein structure because of their number. In general, the protein conformation with the lowest free energy (i.e., the most stable) is the one with the maximum number of weak interactions.
The stability of a protein is not simply the sum of the free energies of formation of the many weak interactions within it, however. We have already noted that the stability of proteins is marginal. Every hydrogen-bonding group in a polypeptide chain was hydrogen bonded to water prior to folding. For every hydrogen bond formed in a protein, hydrogen bonds (of similar strength) between the same groups and water were broken. The net stability contributed by a given weak interaction, or the difference in free energies of the folded and unfolded state, is close to zero. We must therefore explain why the native conformation of a protein is favored.
The contribution of weak interactions to protein stability can be understood in terms of the properties of water (Chapter 4). Pure water contains a network of hydrogen-bonded water molecules. No other molecule has the hydrogen-bonding potential of water, and other molecules present in an aqueous solution will disrupt the hydrogen bonding of water to some extent. Optimizing the hydrogen bonding of water around a hydrophobic molecule results in the formation of a highly structured shell or solvation layer of water in the immediate vicinity, resulting in an unfavorable decrease in the entropy of water. The association among hydrophobic or nonpolar groups results in a decrease in this structured solvation layer, or a favorable increase in entropy. As described in Chapter 4, this entropy term is the major thermodynamic driving force for the association of hydrophobic groups in aqueous solution, and hydrophobic amino acid side chains therefore tend to be clustered in a protein’s interior, away from water.
The formation of hydrogen bonds and ionic interactions in a protein is also driven largely by this same entropic effect. Polar groups can generally form hydrogen bonds with water and hence are soluble in water. However, the number of hydrogen bonds per unit mass is generally greater for pure water than for any other liquid or solution, and there are limits to the solubility of even the most polar molecules because of the net decrease in hydrogen bonding that occurs when they are present. Therefore, a solvation shell of structured water will also form to some extent around polar molecules. Even though the energy of formation of an intramolecular hydrogen bond or ionic interaction between two polar groups in a macromolecule is largely canceled out by the elimination of such interactions between the same groups and water, the release of structured water when the intramolecular interaction is formed provides an entropic driving force for folding. Most of the net change in free energy that occurs when weak interactions are formed within a protein is therefore derived from the increase in entropy in the surrounding aqueous solution.
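The entropic argument of the preceding paragraphs can be condensed into a short free-energy bookkeeping sketch. It assumes the standard relation ΔG = ΔH – TΔS; the subscripts "fold," "chain," and "water" are labels introduced here for illustration only, and the treatment is qualitative.

```latex
\begin{align*}
\Delta G_{\text{fold}} &= \Delta H_{\text{fold}} - T\,\Delta S_{\text{fold}} \\
\Delta H_{\text{fold}} &\approx 0
  \quad \text{(interactions formed within the protein replace similar interactions with water)} \\
\Delta S_{\text{fold}} &= \Delta S_{\text{chain}} + \Delta S_{\text{water}},
  \qquad \Delta S_{\text{chain}} < 0, \quad \Delta S_{\text{water}} > 0 \\
\Delta G_{\text{fold}} &\approx -T\left(\Delta S_{\text{chain}} + \Delta S_{\text{water}}\right) < 0
  \quad \text{when } \Delta S_{\text{water}} > \lvert \Delta S_{\text{chain}} \rvert
\end{align*}
```

In this simplified picture the native conformation is favored only when the entropy gained by the release of structured water outweighs the conformational entropy lost by the chain, which is consistent with the marginal stability of folded proteins.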
Of the different types of weak interactions, hydrophobic interactions are particularly important in stabilizing a protein conformation; the interior of a protein is generally a densely packed core of hydrophobic amino acid side chains. It is also important that any polar or charged groups in the protein interior have suitable partners for hydrogen bonding or ionic interactions. One hydrogen bond makes only a small apparent contribution to the stability of a native structure, but the presence of a single hydrogen-bonding group without a partner in the hydrophobic core of a protein can be so destabilizing that conformations containing such a group are often thermodynamically untenable.
Most of the structural patterns outlined in this chapter reflect these two simple rules: (1) hydrophobic residues must be buried in the protein interior and away from water, and (2) the number of hydrogen bonds must be maximized. Insoluble proteins and proteins within membranes (Chapter 10) follow somewhat different rules because of their function or their environment, but weak interactions are still critical structural elements.
Several types of secondary structure are particularly stable and occur widely in proteins. The most prominent are the α helix and β conformations described below. Using fundamental chemical principles and a few experimental observations, Linus Pauling and Robert Corey predicted the existence of these secondary structures in 1951, several years before the first complete protein structure was elucidated.
In considering secondary structure, it is useful to classify proteins into two major groups: fibrous proteins, having polypeptide chains arranged in long strands or sheets, and globular proteins, with polypeptide chains folded into a spherical or globular shape. Fibrous proteins play important structural roles in the anatomy and physiology of vertebrates, providing external protection, support, shape, and form. They may constitute one-half or more of the total body protein in larger animals. Most enzymes and peptide hormones are globular proteins. Globular proteins tend to be structurally complex, often containing several types of secondary structure; fibrous proteins usually consist largely of a single type of secondary structure. Because of this structural simplicity, certain fibrous proteins played a key role in the development of the modern understanding of protein structure and provide particularly clear examples of the relationship between structure and function; they are considered in some detail after the general discussion of secondary structure.
Pauling and Corey began their work on protein structure in the late 1930s by first focusing on the structure of the peptide bond. The α carbons of adjacent amino acids are separated by three covalent bonds, arranged Cα–C–N–Cα. X-ray diffraction studies of crystals of amino acids and of simple dipeptides and tripeptides demonstrated that the amide C–N bond in a peptide is somewhat shorter than the C–N bond in a simple amine and that the atoms associated with the bond are coplanar. This indicated a resonance or partial sharing of two pairs of electrons between the carbonyl oxygen and the amide nitrogen (Fig. 7–4a).
Figure 7–4  (a) The planar peptide group. Each peptide bond has some double-bond character due to resonance and cannot rotate. The carbonyl oxygen has a partial negative charge and the amide nitrogen a partial positive charge, setting up a small electric dipole. Note that the oxygen and hydrogen atoms in the plane are on opposite sides of the C–N bond. This is the trans configuration. Virtually all peptide bonds in proteins occur in this configuration, although an exception is noted in Fig. 7–10. (b) Three bonds separate sequential Cα carbons in a polypeptide chain. The N–Cα and Cα–C bonds can rotate, with the angles of rotation designated Φ and ψ, respectively. (c) Limited rotation can occur around two of the three types of bonds in a polypeptide chain. The C–N bonds in the planar peptide groups (shaded in blue), which make up one-third of all the backbone bonds, are not free to rotate. Other single bonds in the backbone may also be rotationally hindered, depending on the size and charge of the R groups. (d) By convention, Φ and ψ are both defined as 0° when the two peptide bonds flanking an α carbon are in the same plane. In a protein, this conformation is prohibited by steric overlap between a carbonyl oxygen and an α-amino hydrogen atom.
The oxygen has a partial negative charge and the nitrogen a partial positive charge, setting up a small electric dipole. The four atoms of the peptide group lie in a single plane, in such a way that the oxygen atom of the carbonyl group and the hydrogen atom of the amide nitrogen are trans to each other. From these studies Pauling and Corey concluded that the amide C–N bonds are unable to rotate freely because of their partial double-bond character. The backbone of a polypeptide chain can thus be pictured as a series of rigid planes separated by substituted methylene groups, –CH(R)– (Fig. 7–4c). The rigid peptide bonds limit the number of conformations that can be assumed by a polypeptide chain.
Rotation is permitted about the N–Cα and the Cα–C bonds. By convention the angles of rotation about these bonds (dihedral, or torsion, angles) are labeled Φ (phi) for the N–Cα bond and ψ (psi) for the Cα–C bond. Again by convention, both Φ and ψ are defined as 0° in the conformation in which the two peptide bonds connected to a single α carbon are in the same plane, as shown in Figure 7–4d. In principle, Φ and ψ can have any value between –180° and +180°, but many values of Φ and ψ are prohibited by steric interference between atoms in the polypeptide backbone and amino acid side chains. The conformation in which Φ and ψ are both 0° is prohibited for this reason; it is used merely as a reference point for describing the angles of rotation.
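The angles of rotation described above can be computed from atomic coordinates. The sketch below is one common way to do so, not a method given in this chapter: it measures the signed angle between two bonds as viewed down the central bond joining them. It assumes the widely used definitions in which Φ is measured over the four backbone atoms C (of the preceding residue), N, Cα, and C, and ψ over N, Cα, C, and N (of the following residue). The function names and all coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed angle of rotation about the p1-p2 bond, in degrees.

    Returns 0 when p0 and p3 lie on the same side of the central bond
    in one plane, and +/-180 when they lie on opposite sides.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along central bond
    # Project the two flanking bonds onto the plane perpendicular
    # to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Rotation about the N-C(alpha) bond (assumed atom ordering)."""
    return dihedral_degrees(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Rotation about the C(alpha)-C bond (assumed atom ordering)."""
    return dihedral_degrees(n, ca, c, n_next)

if __name__ == "__main__":
    # Made-up coordinates: four atoms in one plane, with the flanking
    # atoms on the same side of the central bond (the 0 degree reference).
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applied to the backbone atoms of each residue in a known structure, functions of this kind yield one (Φ, ψ) pair per residue. The 0° case in the usage example corresponds to the planar reference conformation, which the text notes is sterically prohibited in real proteins.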
Every possible secondary structure is described complet