Chapter 33 — Amino Acids, Peptides, and Proteins: The Chemistry of Life's Machines

Open Organic Chemistry Project

In This Chapter

33.1 Amino acid structure and the 20 proteinogenic amino acids
33.2 Zwitterions and the isoelectric point
33.3 Peptide bond formation
33.4 Solid-phase peptide synthesis (SPPS)
33.5 Protein structure levels
33.6 Protein folding: thermodynamics
33.7 AlphaFold and the structure prediction revolution
33.8 Enzyme catalysis: the serine protease example
33.9 Spectroscopy of proteins
33.10 Summary

"Proteins are the workers of the cell. Every reaction in your body that does anything — moves a muscle, perceives a photon, copies DNA, breaks down a sugar — is catalyzed by a protein. Master amino acid chemistry, and you master the molecular workforce of biology." — paraphrase from a biochemistry text

"A protein is a polyamide. Its sequence is written by 20 different chemical letters. Read the sequence, predict the structure, predict the function — that's the central problem of molecular biology, and it has been solved (mostly) in the last decade by AlphaFold."

This chapter applies Part VI carbonyl chemistry to the second great class of biological macromolecules: amino acids and proteins. Amino acids are α-amino carboxylic acids; their condensation through amide bonds (Section 26.6) gives peptides and proteins. Proteins fold into three-dimensional structures, and those structures determine catalytic, structural, and regulatory functions.

Carbohydrates (Ch 32) are biology's energy currency and structural material. Proteins (Ch 33) are biology's catalysts, motors, and machinery. The chemistry of both rests on the carbonyl framework of Part VI.

By the end of this chapter you should be able to:

- Recognize the 20 standard amino acids and classify their side chains.
- Predict the protonation state of each amino acid (and short peptide) at any pH using pKas.
- Calculate the isoelectric point (pI) of an amino acid or peptide.
- Draw and understand peptide bond formation and the SPPS workflow (Boc/Fmoc strategy).
- Recognize the four levels of protein structure (primary, secondary, tertiary, quaternary).
- Explain protein folding using hydrophobic collapse, hydrogen bonding, and disulfide bonds.
- Understand enzyme catalysis with the serine protease catalytic triad as a model.
- Appreciate how AlphaFold has changed structural biology.

33.1 Amino acid structure and the 20 proteinogenic amino acids

An amino acid is an organic molecule containing both an amine group (-NH₂) and a carboxylic acid group (-COOH). In biology, the proteinogenic amino acids are α-amino acids: the amine and carboxyl are on the same carbon (the α-carbon). The general structure:

$$H_2N-CHR-COOH$$

where R is the side chain (different for each amino acid). The α-carbon has 4 substituents: $-H$, $-NH_2$, $-COOH$, $-R$ — making it a stereocenter (except for glycine, where R = H).

L-configuration

All natural proteinogenic amino acids are L-configured (with one important exception: glycine has R = H, so it is achiral). The L-designation comes from the analogy with L-glyceraldehyde, the configuration where the α-amine is on the left in Fischer projection.

The (R)/(S) designation depends on side-chain priority — most amino acids are (S), but cysteine (with its sulfur) is (R) due to priority differences. The L vs (R)/(S) systems can disagree, but L is the universal biological designation.

Why L? The chemical reason is unclear (chirality of biology was likely set early in evolution and propagated). Bacteria do produce some D-amino acids (e.g., D-Ala in cell walls), but proteins use exclusively L.

The 20 proteinogenic amino acids

The 20 standard amino acids are encoded directly by DNA codons. Memorize at least their 3-letter codes and side-chain properties.

Group	Names	3-Letter	Properties
Nonpolar aliphatic	glycine, alanine, valine, leucine, isoleucine, methionine, proline	Gly, Ala, Val, Leu, Ile, Met, Pro	Hydrophobic; cluster inside proteins
Aromatic	phenylalanine, tyrosine, tryptophan	Phe, Tyr, Trp	Hydrophobic; tyrosine has pKa ~10 (slightly acidic)
Polar uncharged	serine, threonine, cysteine, asparagine, glutamine	Ser, Thr, Cys, Asn, Gln	Hydrogen-bond donors/acceptors
Positively charged	lysine, arginine, histidine	Lys, Arg, His	Side-chain pKaH: ~10.5 (Lys), ~12 (Arg), ~6 (His)
Negatively charged	aspartate, glutamate	Asp, Glu	Side-chain pKa ~4; deprotonated at pH 7

Memorize:

- Gly (G): no side chain (just H).
- Ala (A): -CH₃ side chain.
- Pro (P): the cyclic amino acid; the α-N is part of a 5-membered ring (forces a kink in the protein backbone).
- Cys (C): -CH₂SH (thiol; can form disulfide bonds).
- Met (M): -CH₂CH₂SCH₃ (encoded by the start codon, AUG).
- Ser, Thr (S, T): hydroxyl-containing.
- Asp, Glu (D, E): carboxylate-containing.
- Lys, Arg (K, R): basic; long side chains ending in an amine (Lys) or a guanidine (Arg).
- His (H): imidazole; pKa near physiological pH.
- Phe, Tyr, Trp (F, Y, W): aromatic.

This vocabulary is essential for the rest of biology.

33.2 Zwitterions and the isoelectric point

Free amino acids in water exist as zwitterions: the α-carboxylic acid is deprotonated to -COO⁻ (pKa ~2), and the α-amine is protonated to -NH₃⁺ (pKa ~9.5). The molecule has no net charge — but two charges, one positive (on N) and one negative (on the carboxylate).

The two pKas are far apart, so the zwitterion is stable across a wide pH range. At low pH, the COOH is also protonated → net +1. At high pH, the amine is also deprotonated → net -1.
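The charge states described above follow from the Henderson-Hasselbalch relation. The sketch below is a minimal illustration, assuming each ionizable group titrates independently; the function name `net_charge` is hypothetical, and the pKa values (2.34 and 9.60) are the glycine values quoted in this section.

```python
def net_charge(pH, acid_pkas, base_pkas):
    """Average net charge from independent Henderson-Hasselbalch terms.

    acid_pkas: pKa values of groups that are neutral when protonated (e.g. -COOH).
    base_pkas: pKa values of conjugate acids that carry +1 when protonated (e.g. -NH3+).
    """
    # Each acid contributes -1 times its deprotonated fraction.
    negative = sum(1.0 / (1.0 + 10 ** (pka - pH)) for pka in acid_pkas)
    # Each base contributes +1 times its protonated fraction.
    positive = sum(1.0 / (1.0 + 10 ** (pH - pka)) for pka in base_pkas)
    return positive - negative


glycine = {"acid_pkas": [2.34], "base_pkas": [9.60]}
for pH in (1.0, 5.97, 12.0):
    print(f"pH {pH:5.2f}: net charge {net_charge(pH, **glycine):+.3f}")
```

The computed charge is close to +1 at pH 1, essentially zero near pH 6, and close to -1 at pH 12, matching the qualitative picture in the text.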

Calculating pI

The isoelectric point (pI) is the pH at which the molecule has zero net charge. For an amino acid with no ionizable side chain:

$$pI = \frac{1}{2}(pK_{a,COOH} + pK_{a,NH_3^+})$$

For glycine: pI = (2.34 + 9.60)/2 = 5.97. For alanine: similar, pI ≈ 6.0.

For amino acids with acidic side chains (Asp, Glu), the pI is lower because the side-chain COOH is a second acidic group. The net-neutral species now sits between the two carboxyl deprotonations, so the pI is the average of those two pKas:

- Aspartate: pI = (1.99 + 3.86)/2 = 2.93. The side-chain COOH (pKa 3.86) falls between the α-COOH (1.99) and the α-NH₃⁺.

For amino acids with basic side chains (Lys, Arg, His), the pI is higher because the net-neutral species sits between the deprotonations of the two protonated nitrogen groups:

- Lysine: pI = (8.95 + 10.53)/2 = 9.74. The side-chain pKaH (10.53) lies above that of the α-amine (8.95).
- Arginine: pI ≈ 10.76.
- Histidine: pI ≈ 7.59.
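The rule used in these examples (average the two pKas that flank the net-neutral species) can be written as a short function. This is a sketch, assuming the groups deprotonate in order of increasing pKa; `isoelectric_point` is a hypothetical name. The α-NH₃⁺ pKa for aspartate (9.90) and the α-COOH pKa for lysine (2.18) are assumed typical values that the text does not give; they do not enter either result.

```python
def isoelectric_point(acid_pkas, base_pkas):
    """pI as the mean of the two pKa values that flank the net-neutral species."""
    pkas = sorted(list(acid_pkas) + list(base_pkas))
    # The fully protonated form has charge +len(base_pkas). Each deprotonation
    # lowers the charge by 1, so the neutral species appears after
    # len(base_pkas) deprotonations.
    n = len(base_pkas)
    return (pkas[n - 1] + pkas[n]) / 2.0


examples = {
    "glycine": ([2.34], [9.60]),
    "aspartate": ([1.99, 3.86], [9.90]),   # 9.90 is an assumed value
    "lysine": ([2.18], [8.95, 10.53]),     # 2.18 is an assumed value
}
for name, (acids, bases) in examples.items():
    print(f"{name}: pI = {isoelectric_point(acids, bases):.3f}")
```

For glycine the function reduces to the simple two-pKa formula; for aspartate and lysine it picks out the same pairs of pKas used in the worked values above.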

Why pI matters

At the pI:

- The molecule has zero net charge.
- Solubility is minimized (at the pI, electrostatic repulsion between molecules is reduced; some amino acids precipitate).
- Electrophoresis: at the pI, the molecule does not migrate in an electric field.

Isoelectric focusing uses a gel with a pH gradient to separate proteins by pI. Each protein migrates until it reaches the position where the pH equals its pI, then stops because it no longer carries a net charge. The result is a very high-resolution separation.

33.3 Peptide bond formation

The peptide bond is an amide between the α-COOH of one amino acid and the α-NH₂ of the next. The chemistry of amide formation is covered in Section 26.6.

The forward reaction:

$$R_1-COOH + H_2N-R_2 \to R_1-CO-NH-R_2 + H_2O$$

In water, this reaction is slow and unfavorable (low equilibrium constant for amide formation from free COOH + amine). To make a peptide bond, the COOH must be activated (Section 26.6):

- In biology: aminoacyl-tRNA + ribosome (the COOH is first converted to an aminoacyl-AMP, then transferred to a tRNA as an aminoacyl ester, which is the activated form for peptide bond formation).
- In vitro: DCC, EDC, HBTU, HATU, or related coupling reagents.

Properties of the peptide bond

From Ch 24 case study 2, recall:

1. Planarity: the 6 atoms (Cα-C-O-N-H-Cα) lie nearly coplanar due to the C-N partial double bond from resonance.
2. Restricted rotation: the C-N bond is partly double; rotation is hindered (~20 kcal/mol barrier).
3. Trans preference: the two Cα atoms across the peptide bond are usually trans (~99%).
4. N-H is a good H-bond donor: pKa ~17.
5. Slow hydrolysis: half-life ~600 years at neutral pH (without enzyme).
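To get a feel for what a ~20 kcal/mol rotation barrier means in time units, the barrier can be converted to a rate constant with the Eyring equation. This is a rough sketch, not part of the chapter's derivations: it assumes the quoted barrier can be treated as a free energy of activation and takes the transmission coefficient as 1. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 1.987204e-3      # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal_per_mol, temperature_k=298.15):
    """First-order rate constant (1/s) for a given free-energy barrier."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal_per_mol / (R * temperature_k))


def half_life(rate_constant):
    """Half-life (s) of a first-order process."""
    return math.log(2) / rate_constant


k_rot = eyring_rate(20.0)
print(f"k = {k_rot:.3g} 1/s, half-life = {half_life(k_rot):.3g} s")
```

Under these assumptions the rotation is slow on the molecular timescale (a half-life of tens of seconds near room temperature), which is why the peptide bond behaves as a rigid planar unit during ordinary backbone motions.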

The peptide chain backbone

A peptide chain has the repeating pattern: ...Cα-CO-NH-Cα-CO-NH-Cα-CO-NH... The Cα atoms carry the side chains; the backbone amide groups can hydrogen-bond.

The phi (φ) and psi (ψ) angles (rotations around N-Cα and Cα-C) can vary, within steric limits. The omega (ω) angle (around the peptide bond C-N) is held near 180° (trans) by the partial double-bond character of that bond. Different φ/ψ combinations give different secondary structures (Section 33.5).

33.4 Solid-phase peptide synthesis (SPPS)

Solid-phase peptide synthesis (SPPS), developed by Bruce Merrifield (Nobel 1984), is the standard method for synthesizing peptides up to ~50 amino acids in length. It allows a peptide to be built one amino acid at a time on a solid resin support, with each addition being a separate chemical step.

The principle

A growing peptide chain is attached to an insoluble polymer (resin) bead. Each cycle:

1. Deprotection: remove the temporary protecting group from the chain's N-terminus.
2. Coupling: react with the next protected amino acid (its α-amine still protected, but its α-COOH activated).
3. Wash: remove excess reagents and byproducts.
4. Repeat.

After all amino acids are added, the peptide is cleaved from the resin with a strong acid (TFA in the Fmoc strategy; anhydrous HF in the classic Boc strategy), and any remaining side-chain protections are removed.
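Because every residue requires its own cycle, small per-cycle losses compound, which is one reason SPPS is practical mainly for shorter peptides. The sketch below shows the arithmetic under simplifying assumptions: every cycle has the same yield, and the first residue is already attached to the resin. The function name and the yield values are illustrative, not measured data.

```python
def overall_yield(n_residues, step_yield):
    """Fraction of chains carrying the full sequence after all cycles."""
    n_cycles = n_residues - 1  # first residue is assumed preloaded on the resin
    return step_yield ** n_cycles


for step_yield in (0.95, 0.99, 0.995):
    for n in (10, 50, 100):
        fraction = overall_yield(n, step_yield)
        print(f"{step_yield:.3f} per cycle, {n:3d} residues: {fraction:.1%}")
```

At 99% per cycle, a 50-residue peptide is obtained in roughly 60% overall yield; at 95% per cycle the same peptide would be a minor component of the crude product.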

Boc strategy

Use Boc (tert-butyloxycarbonyl) to protect each amino acid's α-amine.

- Activation: each amino acid's α-COOH is activated (e.g., as a hydroxysuccinimide ester, or with DCC/HBTU).
- Coupling: the resin-bound peptide's free amine attacks the activated COOH → new peptide bond.
- Deprotection: TFA (a moderately strong acid) removes Boc from the N-terminus, exposing the next amine for coupling.

Fmoc strategy

Use Fmoc (9-fluorenylmethyloxycarbonyl) instead of Boc. The advantages:

- Fmoc is removed by base (piperidine), not acid. This means side-chain protecting groups (often acid-labile, like t-butyl or trityl) survive the Fmoc removal.
- The whole synthesis can be milder.

The Fmoc strategy is now the dominant SPPS method for peptide drug synthesis.

A typical Fmoc cycle

1. Start: resin-bound peptide-Fmoc.
2. Treat with 20% piperidine in DMF: removes Fmoc, generates free α-amine.
3. Wash with DMF.
4. Add next amino acid (Fmoc-protected, with side chain protected if needed) and coupling reagent (HBTU + HOBt + base, or HATU + base).
5. Coupling: 5–60 minutes at room temperature.
6. Wash.
7. (Optional: capping with acetic anhydride to block any unreacted amines.)
8. Repeat for next amino acid.

After all coupling steps, cleave from resin with TFA (95% TFA + scavengers) → free peptide.
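The Fmoc cycle can be summarized as a loop over the sequence. The sketch below only generates a list of operations as text; it is a bookkeeping aid, not a synthesizer protocol. It relies on two points not spelled out above: SPPS builds the chain from the C-terminus toward the N-terminus, and the final N-terminal Fmoc is removed before cleavage. The function name `fmoc_spps_plan` is hypothetical, and the optional capping step is omitted.

```python
def fmoc_spps_plan(sequence):
    """Ordered operations for Fmoc SPPS of `sequence` (written N-to-C, one-letter codes)."""
    residues = list(sequence)[::-1]  # synthesis proceeds from the C-terminal residue
    steps = [f"load Fmoc-{residues[0]} onto resin"]
    for aa in residues[1:]:
        steps.append("deprotect: 20% piperidine in DMF")
        steps.append("wash")
        steps.append(f"couple: Fmoc-{aa} + coupling reagent + base")
        steps.append("wash")
    steps.append("deprotect: 20% piperidine in DMF")  # remove the last N-terminal Fmoc
    steps.append("cleave: 95% TFA + scavengers")
    return steps


for number, step in enumerate(fmoc_spps_plan("GIV"), start=1):
    print(number, step)
```

For the tripeptide Gly-Ile-Val the plan starts from Val on the resin and couples Ile, then Gly, which makes the reversed direction of chemical synthesis relative to the written sequence explicit.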

Examples of SPPS-made drugs

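Insulin and insulin analogs (Humulin, Lantus, Novolog): included here as peptide drugs, although commercial insulin is produced by recombinant expression rather than SPPS.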
GLP-1 analogs: liraglutide (Victoza), semaglutide (Ozempic / Wegovy).
Octreotide: a somatostatin analog for acromegaly.
Oxytocin and vasopressin analogs: for various indications.
Bivalirudin: an anticoagulant.
Calcitonin: for osteoporosis.

The peptide drug market has grown dramatically in the last decade with the success of GLP-1 agonists (semaglutide alone is a > $20 billion/year drug as of 2024).

33.5 Protein structure levels

Proteins have a hierarchy of structure:

Primary structure: the amino acid sequence

The order of amino acids from N-terminus to C-terminus, written conventionally left to right. Encoded directly by DNA codons.

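Example: the A chain of bovine insulin (one of two chains in mature insulin) has the sequence: GIVEQCCASVCSLYQLENYCN

(21 amino acids; the B chain has 30; the two chains are linked by disulfide bonds. The human A chain differs at positions 8 and 10, with Thr and Ile in place of Ala and Val.)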
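A primary structure is just a string, so simple questions about it can be answered by counting. The sketch below classifies the A-chain sequence quoted above using the five side-chain groups from the table in Section 33.1. It uses the standard one-letter codes, some of which (V, L, I, N, Q) are not listed explicitly in this chapter; the names `SIDE_CHAIN_GROUPS` and `classify` are hypothetical.

```python
from collections import Counter

# One-letter codes grouped as in the Section 33.1 table.
SIDE_CHAIN_GROUPS = {
    "nonpolar aliphatic": "GAVLIMP",
    "aromatic": "FYW",
    "polar uncharged": "STCNQ",
    "positively charged": "KRH",
    "negatively charged": "DE",
}


def classify(sequence):
    """Count residues of `sequence` in each side-chain group."""
    counts = Counter()
    for residue in sequence:
        for group, letters in SIDE_CHAIN_GROUPS.items():
            if residue in letters:
                counts[group] += 1
                break
        else:
            raise ValueError(f"unknown residue: {residue}")
    return counts


a_chain = "GIVEQCCASVCSLYQLENYCN"
print(len(a_chain), "residues,", a_chain.count("C"), "cysteines")
for group, count in classify(a_chain).items():
    print(f"{group}: {count}")
```

The count confirms the 21-residue length and shows four cysteines, the residues available for the disulfide bonds mentioned above.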

Secondary structure: local backbone conformations

Two main types, both formed by backbone hydrogen bonding (the C=O of one amide H-bonds to the N-H of another):

α-helix: a right-handed helix with ~3.6 amino acids per turn, ~5.4 Å pitch. Backbone H-bonds within the same helix (CO of residue i bonds to NH of residue i+4).
β-sheet: extended chains running side-by-side, with backbone H-bonds between adjacent strands. Can be parallel (both strands N-to-C in same direction) or antiparallel (opposite directions).
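The α-helix parameters above fix two other useful numbers: the rise along the axis per residue and the rotation per residue. A small sketch of that arithmetic, using only the ~3.6 residues per turn and ~5.4 Å pitch quoted in the text (the function names are illustrative):

```python
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4  # advance along the helix axis per full turn


def rise_per_residue():
    """Axial advance per residue, in angstroms."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def rotation_per_residue_deg():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues):
    """Approximate axial length of an ideal helix of n residues, in angstroms."""
    return n_residues * rise_per_residue()


print(f"rise per residue: {rise_per_residue():.2f} A")
print(f"rotation per residue: {rotation_per_residue_deg():.0f} degrees")
print(f"20-residue helix: about {helix_length(20):.0f} A long")
```

Each residue advances the helix by about 1.5 Å and turns it by about 100°, so residues i and i+4 end up on nearly the same face of the helix, consistent with the i → i+4 hydrogen-bond pattern.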

The Ramachandran plot (φ vs ψ angles) shows the allowed conformations: α-helix at φ ≈ -60°, ψ ≈ -45°; β-sheet at φ ≈ -120°, ψ ≈ +120°.

Tertiary structure: 3D fold of a single polypeptide

The 3D arrangement of a single polypeptide. Driven by:

- Hydrophobic collapse: water excludes nonpolar side chains; they cluster in the protein interior.
- Polar/charged residues prefer the surface (where they can hydrogen-bond with water).
- Disulfide bonds (Cys-Cys, oxidative S-S formation) stabilize the structure.
- Hydrogen bonds and salt bridges within the structure stabilize specific folds.

Most proteins fold to a unique 3D structure determined by their primary sequence.

Quaternary structure: assembly of multiple chains

Some proteins are composed of multiple polypeptide chains (subunits). Examples:

- Hemoglobin (4 subunits: 2 α + 2 β).
- Antibodies (4 chains: 2 heavy + 2 light).
- Insulin (2 chains, A and B, linked by disulfide bonds).

The subunits are held together by non-covalent interactions (or by disulfide bonds in some cases).

33.6 Protein folding: thermodynamics

In the Anfinsen experiment (1961, Nobel 1972), ribonuclease A was denatured in 8 M urea with its disulfide bonds reduced, then renatured by removing the urea and reoxidizing the cysteines. The protein refolded to its native, active state, showing that the primary sequence contains the information needed to specify the 3D structure.

The driving forces:

1. Hydrophobic effect: the dominant force. Hydrophobic side chains cluster in the protein interior, releasing water from their surfaces (an entropy gain).
2. Hydrogen bonding: backbone amides and polar side chains form hydrogen bonds that stabilize specific conformations.
3. Electrostatics: salt bridges (oppositely charged side chains) and electrostatic complementarity stabilize specific geometries.
4. Van der Waals: weaker but additive; the tightly-packed protein interior maximizes vdW interactions.
5. Disulfide bonds (in some proteins): covalent S-S bonds between Cys residues, set during the oxidative folding process.

The folded (native) state is typically only ~5–15 kcal/mol lower in free energy than the unfolded state. Proteins are therefore only marginally stable: they are easily denatured by heat, pH change, denaturants (urea, guanidinium), or other stresses.
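A stability of ~5–15 kcal/mol can be translated into populations with a simple two-state model. The sketch below assumes only two states (folded and unfolded) in equilibrium at 25 °C, which is a simplification for many real proteins; `fraction_unfolded` is a hypothetical name.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)


def fraction_unfolded(stability_kcal_per_mol, temperature_k=298.15):
    """Unfolded fraction in a two-state model.

    stability_kcal_per_mol is G(unfolded) - G(folded): positive when the
    folded state is favored.
    """
    k_unfold = math.exp(-stability_kcal_per_mol / (R * temperature_k))
    return k_unfold / (1.0 + k_unfold)


for stability in (0.0, 1.0, 5.0, 15.0):
    print(f"{stability:4.1f} kcal/mol: fraction unfolded = {fraction_unfolded(stability):.2e}")
```

At 5 kcal/mol only a small fraction of molecules is unfolded at any moment, but a perturbation worth a few kcal/mol (comparable to a handful of hydrogen bonds) is enough to shift the balance substantially, which is what "marginally stable" means in practice.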

Levinthal's paradox

If a protein had to search all possible conformations at random, it would take far longer than the age of the universe to find the native fold (10⁶⁰ s for a 100-residue protein by one estimate; the exact number depends on the assumptions). Yet proteins fold in milliseconds to seconds. How?

Answer: proteins do not search at random. They fold along pathways through partially folded intermediates (including "molten globule" states), and the energy landscape is funneled toward the native state.
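The size of the random-search time is easy to estimate, and the estimate shows how strongly it depends on the inputs. The sketch below uses assumed round numbers that are not from this chapter: 3 or 10 conformations per residue and 10¹³ conformations sampled per second. Other choices give different exponents, but every reasonable choice exceeds the age of the universe by many orders of magnitude.

```python
import math

AGE_OF_UNIVERSE_S = 4.4e17  # about 13.8 billion years, in seconds


def levinthal_log10_seconds(n_residues, states_per_residue=3, samples_per_second=1e13):
    """log10 of the time (s) needed to visit every conformation once."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    return log10_conformations - math.log10(samples_per_second)


for states in (3, 10):
    exponent = levinthal_log10_seconds(100, states_per_residue=states)
    print(f"{states} states per residue: about 10^{exponent:.0f} s")
print(f"age of the universe: about 10^{math.log10(AGE_OF_UNIVERSE_S):.1f} s")
```

Working in log10 avoids overflow and makes the comparison direct: the search time exponent for a 100-residue chain is far above the ~17.6 of the age of the universe under either assumption.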

Misfolding and disease

Many diseases involve protein misfolding:

- Alzheimer's disease: amyloid-β plaques (misfolded peptide aggregates).
- Parkinson's disease: α-synuclein aggregates.
- Prion diseases (mad cow, Creutzfeldt-Jakob): misfolded prion protein recruits normal protein to misfold.
- Cystic fibrosis: misfolded CFTR is degraded before reaching its functional location.

Drug discovery for these diseases often targets the misfolding or aggregation process itself.

33.7 AlphaFold and the structure prediction revolution

For about 60 years, predicting a protein's 3D structure from its sequence (the "protein folding problem") was one of the hardest open problems in biology. Computational methods improved slowly, with progress tracked by the CASP competitions.

Then in 2020, AlphaFold 2 (DeepMind) achieved near-experimental accuracy on the CASP14 benchmark. AlphaFold uses a deep learning model trained on the Protein Data Bank (~200,000 known structures) to predict structures from sequence. The model integrates:

- Multiple sequence alignment (homology information).
- Coevolutionary signal (residues that mutate together likely contact each other).
- Geometric reasoning (pairwise distances, dihedral angles).
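The idea behind the coevolutionary signal can be illustrated with a toy calculation. The sketch below computes the mutual information between columns of a tiny, made-up alignment. Mutual information is one classic way to score column covariation; it is used here only to illustrate the concept and is not a description of how AlphaFold processes alignments. The alignment `toy_msa` and the function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical four-sequence alignment: positions 1 and 3 change together
# (K with E, R with D), while position 2 never changes.
toy_msa = ["AKLE", "AKLE", "GRLD", "GRLD"]
columns = list(zip(*toy_msa))

print("columns 1 and 3:", mutual_information(columns[1], columns[3]))
print("columns 1 and 2:", mutual_information(columns[1], columns[2]))
```

Columns that vary together score high, while a conserved column carries no covariation signal at all, whatever it is paired with. In real alignments, strongly covarying pairs are candidates for residues in spatial contact.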

By 2022, AlphaFold had predicted structures for ~200 million proteins, covering nearly every catalogued protein sequence from sequenced organisms. The AlphaFold Database is freely available at https://alphafold.ebi.ac.uk/.

The impact: a structural model that once required months or years of experimental work can now be predicted in minutes. Drug discovery, vaccine design, and basic biology have been transformed. AlphaFold's developers Demis Hassabis and John Jumper shared one half of the 2024 Nobel Prize in Chemistry for protein structure prediction; the other half went to David Baker (Rosetta) for computational protein design.

33.8 Enzyme catalysis: the serine protease example

Enzymes are the catalysts of biology. Most catalyze reactions 10⁶ – 10¹² times faster than the same reactions in solution. The mechanism: bring substrates together, stabilize transition states, provide acid/base or nucleophilic catalysis using amino acid side chains.

Serine protease catalytic triad

The serine protease family (chymotrypsin, trypsin, elastase, etc.) shares a catalytic mechanism using three amino acids:

- Ser: the nucleophile (its OH attacks the substrate carbonyl).
- His: the general acid/base (its imidazole transfers protons).
- Asp: the orienting/stabilizing residue (it positions His correctly; "the lever").

The mechanism (for hydrolysis of a peptide bond by chymotrypsin):

1. Substrate binds: the peptide enters the active site; the substrate carbonyl is positioned next to Ser195.
2. First nucleophilic attack: His (oriented by Asp) deprotonates Ser-OH as the Ser oxygen attacks the substrate carbonyl C → tetrahedral intermediate.
3. Leaving group departs: the tetrahedral intermediate collapses and the amine of the C-terminal half of the substrate leaves (His donates a proton to it). The acyl half is now covalently attached to Ser as an "acyl-enzyme intermediate."
4. Water enters: a water molecule binds in the space vacated by the departed amine.
5. Second nucleophilic attack: His (with Asp's help) deprotonates water as its oxygen attacks the acyl-Ser carbonyl → second tetrahedral intermediate.
6. Release: the second intermediate collapses, expelling the Ser oxygen (which His reprotonates); the carboxylic acid (the new C-terminus of the peptide's first half) is released; the enzyme is regenerated.

Net: one peptide bond hydrolyzed; enzyme is regenerated. Rate enhancement: ~10¹⁰ over background. The chemistry is exactly nucleophilic acyl substitution (Family II, Ch 26), but enzyme-catalyzed.

Mechanism Map 33.1: Serine protease catalysis.

Three contributions to the rate enhancement:

1. Acid/base catalysis: His + Asp can deprotonate Ser to make a stronger nucleophile, and protonate the leaving group's amine.
2. Covalent catalysis: Ser becomes covalently attached to the substrate (acyl-enzyme); the acyl-enzyme is then hydrolyzed by water in a second step.
3. Substrate binding/orientation: the substrate is held so that the Ser oxygen is pre-organized to attack the carbonyl along the Bürgi-Dunitz trajectory.

Combined, these give a 10⁹-10¹² rate enhancement, bringing peptide hydrolysis down from a timescale of centuries to milliseconds or seconds.
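The size of this effect is easier to appreciate as a half-life. The sketch below scales the ~600-year uncatalyzed half-life quoted in Section 33.3 by the rate enhancement. It is a deliberately crude estimate: it treats the enhancement as a simple multiplier on a first-order rate constant and ignores substrate binding and enzyme concentration. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 3.156e7


def catalyzed_half_life_s(uncatalyzed_half_life_years, rate_enhancement):
    """Half-life (s) after dividing the uncatalyzed half-life by the enhancement."""
    return uncatalyzed_half_life_years * SECONDS_PER_YEAR / rate_enhancement


for enhancement in (1e9, 1e10, 1e12):
    t_half = catalyzed_half_life_s(600, enhancement)
    print(f"enhancement {enhancement:.0e}: half-life about {t_half:.3g} s")
```

With these inputs the half-life drops from centuries to between tens of milliseconds and tens of seconds, depending on where in the 10⁹-10¹² range the enzyme falls.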

Other enzyme classes

The serine protease family is one of dozens of enzyme classes. Other examples:

- Cysteine proteases (e.g., papain): use Cys-SH instead of Ser-OH as the nucleophile.
- Aspartic proteases (e.g., HIV protease, pepsin): use two Asp side chains for general acid/base catalysis, activating water directly without a covalent intermediate.
- Metalloproteases (e.g., thermolysin, MMPs): use a Zn²⁺ ion to activate water for hydrolysis.
- Glycosidases: hydrolyze glycosidic bonds (Ch 32) using similar acid/base + nucleophilic catalysis.
- Kinases (e.g., protein kinases): transfer a phosphoryl group from ATP to a substrate hydroxyl.

Each enzyme class has a characteristic catalytic mechanism, but they all use amino acid side chains as the chemistry tools. Master the side-chain pKas, nucleophilicities, and binding properties, and you can rationalize most enzyme mechanisms.

33.9 Spectroscopy of proteins

Proteins are characterized by:

- UV absorbance: Trp and Tyr absorb near 280 nm (Phe absorbs much more weakly, near 257 nm). Absorbance at 280 nm is used to quantify protein concentration.
- Circular dichroism (CD): distinguishes α-helix vs β-sheet vs random coil by chiroptical signal.
- NMR: ¹H, ¹³C, ¹⁵N spectra; can determine protein structure for proteins up to ~50 kDa.
- Mass spectrometry: protein mass, peptide mapping (after tryptic digest), modifications.
- X-ray crystallography: gold standard for atomic-resolution structure.
- Cryo-electron microscopy (cryo-EM): increasingly used; can image larger complexes.
- AlphaFold: a computational prediction of structure from sequence rather than a measurement (no experiment needed).
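Quantifying protein by UV absorbance is an application of the Beer-Lambert law, A = ε·c·l. The sketch below estimates ε at 280 nm from the Trp and Tyr content of a sequence and converts an absorbance reading to a molar concentration. The per-residue absorptivities (about 5500 M⁻¹cm⁻¹ for Trp and 1490 M⁻¹cm⁻¹ for Tyr) are commonly used approximate values that are assumed here, not taken from this chapter; the small contribution of disulfide bonds is ignored, and the function names are hypothetical.

```python
# Assumed approximate molar absorptivities at 280 nm, in 1/(M*cm).
EPSILON_280 = {"W": 5500.0, "Y": 1490.0}


def extinction_coefficient_280(sequence):
    """Estimated molar absorptivity at 280 nm from Trp and Tyr counts."""
    return sum(EPSILON_280.get(residue, 0.0) for residue in sequence)


def molar_concentration(absorbance, sequence, path_length_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    epsilon = extinction_coefficient_280(sequence)
    if epsilon == 0:
        raise ValueError("no Trp or Tyr: A280 cannot be used for this sequence")
    return absorbance / (epsilon * path_length_cm)


a_chain = "GIVEQCCASVCSLYQLENYCN"  # insulin A chain from Section 33.5
print("epsilon(280):", extinction_coefficient_280(a_chain), "1/(M*cm)")
print("c at A280 = 0.298:", molar_concentration(0.298, a_chain), "M")
```

The A chain contains two Tyr and no Trp, so its estimated absorptivity is small; a sequence with neither residue cannot be quantified this way, which the function reports as an error.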

33.10 Summary

The 20 standard amino acids: nonpolar (Gly, Ala, Val, Leu, Ile, Met, Pro), aromatic (Phe, Tyr, Trp), polar uncharged (Ser, Thr, Cys, Asn, Gln), positively charged (Lys, Arg, His), negatively charged (Asp, Glu).
All proteinogenic amino acids except glycine are L-configured at the α-C.
Zwitterion: at neutral pH, the α-COOH is deprotonated (-COO⁻) and α-NH₂ is protonated (-NH₃⁺); net charge zero.
Isoelectric point (pI): the pH at which net charge is zero. For neutral side chains, pI ≈ (pKa of COOH + pKa of NH₃⁺) / 2.
Peptide bond = amide between two amino acids. Forms by activation (e.g., DCC) of the α-COOH and attack by the α-amine.
Solid-phase peptide synthesis (SPPS) uses Boc or Fmoc protection on a resin bead. Fmoc strategy with HBTU/HATU coupling is now standard.
Protein structure levels: primary (sequence), secondary (α-helix, β-sheet), tertiary (3D fold), quaternary (subunit assembly).
Protein folding is mostly thermodynamic; driven by hydrophobic collapse, supplemented by H-bonding, electrostatics, and disulfide bonds.
AlphaFold (2020-2022) revolutionized structure prediction; structures are now available for ~200 million proteins.
Enzyme catalysis uses amino acid side chains. Serine proteases use Ser/His/Asp catalytic triad to hydrolyze peptide bonds via nucleophilic acyl substitution + acid/base + covalent catalysis.
Misfolding causes disease: Alzheimer's, Parkinson's, prion diseases, cystic fibrosis (CFTR).
Peptide drugs (insulin, GLP-1 analogs, octreotide, bivalirudin) are a major drug class; many are made by SPPS at industrial scale, while insulin is produced recombinantly.

Chapter 34 turns to lipids and biosynthesis — the third great class of biomolecules.