Appendix E — Computational Chemistry Software Setup

Reference for the computational exercises across the book, plus enough method-selection guidance to extend your work beyond what we assigned. Computational chemistry is now standard infrastructure in modern organic — treat it as a sixth spectroscopic tool.


1. Why computational chemistry matters in modern organic

Three concrete uses appear throughout this book:

  • Predicting reactivity — HOMO/LUMO surfaces (Ch 2, 19), electrostatic potential maps (Ch 3), Fukui indices for site selectivity in EAS (Ch 21).
  • Transition state energies — locating TS structures to rationalize selectivity (Ch 10 SN2 vs SN1 partitioning, Ch 19 endo/exo Diels-Alder, Ch 39 sigmatropic stereospecificity).
  • NMR shift prediction — DFT GIAO calculations to assign ambiguous diastereomers and natural product structures (Ch 6, Ch 38). Modern GIAO/DP4+ analysis frequently distinguishes regio- and stereoisomers when experimental NMR is ambiguous.

Other routine uses: IR frequency assignment, conformational scanning, pKa estimation, dipole moment, partial charges, NBO bonding analysis, NCI (non-covalent interaction) surfaces.

Limits to remember: computed values are estimates. Typical errors are ±2 kcal/mol for DFT relative energies on organic systems, ±0.1-0.3 ppm for ¹H NMR shifts, and ±2-5 ppm for ¹³C. Accuracy generally improves with method cost and gets harder to maintain as systems grow, so pick the method to match the question.
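
To put a ±2 kcal/mol uncertainty in perspective, the sketch below converts an energy gap into the ratio it implies through the Boltzmann factor exp(ΔΔG/RT). It assumes 298.15 K and a simple two-state picture; the function name ratio_from_ddg is ours, not part of any package.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ratio_from_ddg(ddg_kcal, temp_k=298.15):
    """Return the major:minor ratio implied by an energy gap in kcal/mol."""
    return math.exp(abs(ddg_kcal) / (R_KCAL * temp_k))

# Illustrative gaps only, not results from any exercise
for ddg in (0.5, 1.0, 2.0, 3.0):
    print(f"gap = {ddg:.1f} kcal/mol -> ratio ~ {ratio_from_ddg(ddg):.1f}:1")
```

A 2 kcal/mol gap corresponds to roughly 29:1 at room temperature, so an error of that size can turn a predicted strong preference into a near coin flip.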


2. Tool tiers

Free / open source

Tool | Purpose | Notes
Avogadro | GUI builder + visualizer + FF optimization | Drives most of the book's computational exercises
ORCA | DFT, MP2, CCSD(T), TS search | Free for academic use; binaries from orcaforum.kofo.mpg.de
NWChem | Multi-purpose QM, plane-wave DFT | DOE-supported; strong for solid state, weaker UX
PSI4 | Python-driven QM | Excellent for scripting; conda-installable
xTB / CREST | GFN-xTB semi-empirical | Fast conformer searches; Grimme group
RDKit | Cheminformatics, 2D/3D, FF | Python; used in our scripts/generate_structures.py
Open Babel | File format conversion, FF | CLI swiss-army knife
Jmol / PyMOL / VMD | Visualization | PyMOL excellent for protein-ligand views

Academic (license required)

Tool | Strength
Gaussian | Industry standard; widest method coverage; GaussView GUI
GAMESS | Free for academic, written request; strong MCSCF/CASSCF
Spartan | Best teaching GUI; bundled curricula
Schrödinger (Jaguar, Glide, Maestro) | Pharma-grade docking + QM
Q-Chem | Strong excited-state, range-separated DFT
Molpro | High-accuracy correlated methods

Web / cloud

  • WebMO — browser front-end to Gaussian/GAMESS/Q-Chem; departmental install
  • Chemcraft — Windows visualization for output files (free for non-commercial)
  • IQmol — free GUI tied to Q-Chem
  • Chemcompute.org — free shared GAMESS/PSI4 access for students

For this book's exercises, the free stack (Avogadro + ORCA + RDKit) covers every problem.


3. Setting up Avogadro

avogadro.cc. Versions referenced here: Avogadro 1.2.x (classic, stable) and Avogadro 2.0+ (rewrite, faster, plugin-based). The book's screenshots use 1.2.x; Avogadro 2 offers the same operations, though menu locations can differ slightly between releases.

Windows

  1. Download the .exe installer from avogadro.cc.
  2. Run installer. Default install path: C:\Program Files\Avogadro.
  3. Launch from Start menu.

macOS

  1. Download .dmg.
  2. Drag Avogadro.app to /Applications.
  3. On first launch: right-click → Open (bypasses Gatekeeper warning).

Linux

sudo apt install avogadro          # Debian/Ubuntu
sudo dnf install avogadro          # Fedora
flatpak install flathub org.openchemistry.Avogadro2   # Avogadro 2 via Flatpak

First-run sanity check

  1. Click empty canvas → places C atom.
  2. Drag → forms bond to new C.
  3. Add Hs: Build → Add Hydrogens (or toolbar H button).
  4. Optimize: Extensions → Optimize Geometry (default UFF).
  5. Energy reported in status bar.

Force-field choice within Avogadro

  • UFF — Universal; covers full periodic table; less accurate for organics.
  • MMFF94 — Best default for closed-shell organics. Use this.
  • GAFF — Better for biomolecules and ligands.
  • Ghemical — Legacy; avoid.

4. Installing ORCA

orcaforum.kofo.mpg.de. Free for academic and personal non-commercial use; registration required before download. Commercial users need a license. Versions current at time of writing: ORCA 6.x.

Download

  1. Register on the ORCA forum.
  2. Download platform binary (Windows: .zip; Linux: .tar.xz; macOS: ARM/x86 builds).
  3. Extract to a permanent path, e.g. C:\orca\ or /opt/orca/.

Environment variables (Windows PowerShell)

$env:Path = "C:\orca;$env:Path"
$env:OMP_NUM_THREADS = "4"

Add the same Path entry permanently in System Properties → Environment Variables.

Linux/macOS .bashrc or .zshrc

export PATH=/opt/orca:$PATH
export LD_LIBRARY_PATH=/opt/orca:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=4
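
A quick way to confirm that the PATH entry took effect is to ask Python where it finds the executable. This is a minimal sketch using only the standard library; the helper name find_executable is ours.

```python
import shutil

def find_executable(name="orca"):
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)

path = find_executable("orca")
if path is None:
    print("orca not found on PATH; re-check the PATH entry and open a new shell")
else:
    print(f"orca found at {path}")
```

On some Linux desktops a screen reader that is also named orca lives in /usr/bin, so check that the reported path points into your ORCA directory. The full path is worth noting in any case, since ORCA expects to be called by its full path for parallel runs.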

First input file — water DFT optimization

File water.inp:

! B3LYP def2-SVP Opt
%pal nprocs 4 end
* xyz 0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
*

Run:

orca water.inp > water.out

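Output appears in water.out. The optimized geometry is written to water.xyz and the orbitals to water.gbw. Thermochemistry appears near the end of the .out file only when Freq is also requested, which this first input does not do. For parallel runs (%pal), ORCA expects to be invoked by its full path, e.g. /opt/orca/orca water.inp > water.out.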
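
Pulling the final energy out of water.out by hand gets tedious quickly. The sketch below assumes the two marker strings ORCA normally prints, "FINAL SINGLE POINT ENERGY" and "ORCA TERMINATED NORMALLY"; the function name and the sample excerpt are ours, and the energies in the excerpt are placeholders rather than real results.

```python
import re

def parse_orca_output(text):
    """Extract the last reported energy (Hartree) and the termination flag."""
    energies = re.findall(r"FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)", text)
    return {
        "energy_hartree": float(energies[-1]) if energies else None,
        "terminated_normally": "ORCA TERMINATED NORMALLY" in text,
    }

# Illustrative excerpt only; the numbers are made-up placeholders
sample = """
FINAL SINGLE POINT ENERGY       -1.000000000000
FINAL SINGLE POINT ENERGY       -1.250000000000
                             ****ORCA TERMINATED NORMALLY****
"""
print(parse_orca_output(sample))
```

For a real run, pass the file contents instead, e.g. parse_orca_output(open("water.out").read()). An optimization prints one energy per step, which is why the last match is the one to keep.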

Opt + Freq + single-point (typical workflow)

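# Job 1 (opt.inp): optimize and compute frequencies with the small basis
! B3LYP def2-SVP Opt Freq

%pal nprocs 8 end
%maxcore 3000

* xyz 0 1
[coordinates]
*

# Job 2 (sp.inp): single point with the larger basis on the optimized geometry
! B3LYP def2-TZVP

%pal nprocs 8 end
%maxcore 3000

* xyzfile 0 1 opt.xyz

ORCA merges every ! line within a job into one keyword list, so a second ! line does not start a second calculation; the two basis-set keywords would simply conflict. Run the single point as its own job that reads the optimized geometry (opt.xyz, written by the first job). This is the standard "optimize cheap, refine energy" pattern. The two steps can also be chained in a single file with $new_job.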
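
The "optimize cheap, refine energy" pattern is usually finished by adding the thermal correction from the cheap level to the electronic energy from the expensive one. The sketch below shows that arithmetic under our own naming; all energies are placeholder values in Hartree, not results for any real molecule.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per Hartree

def composite_gibbs(e_sp_high, e_low, g_low):
    """High-level electronic energy plus the low-level thermal correction (Hartree)."""
    return e_sp_high + (g_low - e_low)

def relative_kcal(g_a, g_b):
    """Free energy of state b relative to state a, in kcal/mol."""
    return (g_b - g_a) * HARTREE_TO_KCAL

# Placeholder numbers in Hartree, purely illustrative
g_reactant = composite_gibbs(e_sp_high=-100.500, e_low=-100.300, g_low=-100.250)
g_ts = composite_gibbs(e_sp_high=-100.470, e_low=-100.272, g_low=-100.224)
print(f"Barrier ~ {relative_kcal(g_reactant, g_ts):.1f} kcal/mol")
```

Here e_low and g_low come from the Opt Freq job and e_sp_high from the larger-basis single point on the same geometry.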


5. Setting up RDKit in Python

Conda

conda create -n rdkit-env python=3.11
conda activate rdkit-env
conda install -c conda-forge rdkit

Pip

pip install rdkit

The pure-pip wheel works on Linux/macOS/Windows since RDKit 2022.09.

Minimum example — SMILES to 3D to FF-optimized

from rdkit import Chem
from rdkit.Chem import AllChem, Draw

# 2-bromobutane (stereocentre left unspecified in this SMILES)
mol2d = Chem.MolFromSmiles('CCC(C)Br')
mol = Chem.AddHs(mol2d)

# Embed 3D coords using ETKDG (Riniker-Landrum); -1 signals failure
if AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()) == -1:
    raise RuntimeError('3D embedding failed')

# Optimize with MMFF94
AllChem.MMFFOptimizeMolecule(mol)

# Write .xyz for ORCA / Avogadro
Chem.MolToXYZFile(mol, '2-bromobutane.xyz')

# 2D depiction, drawn from the hydrogen-free molecule for a cleaner picture
Draw.MolToFile(mol2d, '2-bromobutane.png', size=(400, 400))

Conformer search (ETKDG)

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC1CCCCC1'))
params = AllChem.ETKDGv3()
params.numThreads = 4
cids = AllChem.EmbedMultipleConfs(mol, numConfs=50, params=params)

# Optimize each conformer, collect energies
results = AllChem.MMFFOptimizeMoleculeConfs(mol, numThreads=4)
energies = [(cid, e) for cid, (status, e) in zip(cids, results) if status == 0]
energies.sort(key=lambda x: x[1])
best_cid, best_e = energies[0]
print(f"Lowest-E conformer: id={best_cid}, E={best_e:.3f} kcal/mol")
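
The lowest-energy conformer is rarely the whole story; several conformers within about 1-2 kcal/mol usually share the population. The sketch below turns a list of relative energies into Boltzmann populations. It assumes 298.15 K and treats the force-field energies as if they were free energies, which is a rough approximation; the function name and example energies are ours.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_populations(energies_kcal, temp_k=298.15):
    """Return fractional populations for a list of energies in kcal/mol."""
    e_min = min(energies_kcal)
    weights = [math.exp(-(e - e_min) / (R_KCAL * temp_k)) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative energies, not output from the search above
example = [0.0, 0.5, 1.5, 3.0]
for e, p in zip(example, boltzmann_populations(example)):
    print(f"E = {e:.1f} kcal/mol -> population {100 * p:.1f}%")
```

With the search above, the input would be [e for _, e in energies].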

6. Method selection guide

Method class | When to use | Typical cost (relative) | Comment
MM / Force field (MMFF, UFF, GAFF, OPLS) | Conformer searches, large molecule pre-opt, FF-relevant scans | 1 | No electronic structure: no bond breaking, no excited states
Semi-empirical (PM6, PM7, GFN2-xTB, AM1) | Geometry pre-opt of 100-1000 atom systems; rough TS scans | 10-100 | GFN2-xTB now competitive with low-cost DFT for organics
DFT | The default for organic chemistry: energies, geometries, IR, NMR, TS | 10³-10⁴ | See functional table below
MP2 | When DFT fails for dispersion or anion stability; small benchmarks | 10⁴-10⁵ | Scales O(N⁵); double-hybrid DFT is often better value
CCSD(T) | Gold standard for small-molecule energies, benchmarks | 10⁶+ | Practical limit ~20 heavy atoms; basis set extrapolation typical

DFT functional choices

Functional / method | Use case
B3LYP | Generic workhorse; underestimates dispersion, so pair with D3(BJ) or D4 correction
B3LYP-D3(BJ) | Adds empirical dispersion to B3LYP; now near-default for organic geometries
ωB97X-D | Range-separated + dispersion; excellent for thermochemistry, kinetics
M06-2X | Truhlar's functional, strong for kinetics, noncovalent interactions
PBE0 | Hybrid GGA, fast, reliable
B97-3c, r²SCAN-3c | "3c" composite methods (Grimme): DFT + basis + corrections bundled, cheap and accurate
TPSS, M06-L | Pure (non-hybrid) functionals; cheaper, OK for geometries
DLPNO-CCSD(T) | Not a functional: local coupled cluster, listed here as the step beyond DFT; extends CCSD(T) accuracy to 100+ atoms

For 90% of organic questions in this book: B3LYP-D3(BJ)/def2-SVP for geometry + frequencies, ωB97X-D/def2-TZVP for single-point energies. For NMR: mPW1PW91/6-311+G(2d,p) GIAO is a widely cited recipe.
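
A GIAO calculation reports isotropic shieldings, not chemical shifts. The conversion is a subtraction from the shielding of the reference compound (TMS) computed at the same level of theory. The sketch below shows only that step; the function name is ours and the shielding values are placeholders, not computed results.

```python
def shielding_to_shift(sigma, sigma_ref):
    """Chemical shift (ppm) from isotropic shieldings: delta = sigma_ref - sigma."""
    return sigma_ref - sigma

# Placeholder shieldings in ppm; compute both at the same functional and basis
sigma_tms_h = 31.5      # hypothetical TMS proton shielding
sigma_aryl_h = 24.3     # hypothetical aromatic proton shielding
print(f"delta(1H) ~ {shielding_to_shift(sigma_aryl_h, sigma_tms_h):.2f} ppm")
```

Using a reference computed with a different functional, basis set, or solvent model is a common source of systematic shift errors.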


7. Basis set guide

Basis | Quality | Use case
STO-3G | Minimal | Pedagogy only; don't publish
3-21G, 6-31G | Split-valence | Quick scans, very rough
6-31G(d) = 6-31G* | Polarization on heavies | Old default; OK for geometry
6-31+G(d) | + diffuse on heavies | Anions, lone-pair-heavy systems
6-311+G(d,p) | Triple-zeta, polarization on H + heavies, diffuse | Standard energy/property basis
def2-SVP | Karlsruhe split-valence + polarization | Modern default geometry
def2-TZVP | Triple-zeta + polarization | Energies, thermochemistry
def2-TZVPP | Larger polarization | Tighter benchmarks
def2-QZVPP | Quadruple-zeta | Near-CBS benchmarks
cc-pVDZ, cc-pVTZ, cc-pVQZ | Dunning correlation-consistent | CCSD(T) extrapolations
aug-cc-pVTZ | + diffuse | Anions, polarizabilities

Heuristic: def2-SVP for geometry, def2-TZVP for energy, add diffuse functions when treating anions, hydrogen-bonded clusters, or excited states.


8. Common calculation types

Type | ORCA keyword | What you get
Geometry optimization | ! Opt | Local minimum on PES; .xyz of optimized structure
Frequency | ! Freq | Vibrational frequencies, IR intensities, ZPE, thermochemistry (S, H, G)
Single-point energy | (no Opt) | Electronic energy at fixed geometry
Transition state | ! OptTS (initial Hessian via %geom Calc_Hess true end) | Saddle point; exactly one imaginary frequency
Transition state by interpolation | QST2 / QST3 (Gaussian keywords; the closest ORCA analogue is ! NEB-TS) | TS interpolated between two/three reference structures
IRC | ! IRC | Reaction path forward + backward from TS to reactants/products
NMR shielding | ! NMR | Isotropic shielding tensors; δ = σ(TMS) − σ
NBO | ! NBO, optionally with a %nbo block (needs the external NBO program) | Natural bond orbital populations, hyperconjugation analysis
NCI | Usually post-processed from the ORCA density with an external tool (e.g. NCIPLOT, Multiwfn) | Non-covalent interaction surfaces (Johnson, Yang and co-workers)
Excited state (TD-DFT) | %tddft NRoots 10 end | Vertical excitation energies, oscillator strengths
Solvation | ! CPCM(water) | Implicit solvent correction (PCM, SMD, COSMO)

TS validation: always run Freq after OptTS and check that exactly one imaginary frequency is present and its mode visually corresponds to the bond making/breaking motion. Then run IRC to verify it connects the intended reactant and product.
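
The frequency check is easy to automate once the frequencies are in a list. The sketch below counts imaginary modes, relying on the convention that ORCA prints them as negative numbers; the function name and the example frequencies are ours.

```python
def classify_stationary_point(frequencies_cm1, tol=0.0):
    """Classify a stationary point by its number of imaginary (negative) modes."""
    n_imag = sum(1 for f in frequencies_cm1 if f < -tol)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "first-order saddle point (TS candidate)"
    return f"higher-order saddle point ({n_imag} imaginary modes)"

# Illustrative frequency lists in cm^-1, not computed results
print(classify_stationary_point([120.5, 890.2, 1650.0]))
print(classify_stationary_point([-450.0, 210.3, 1020.7]))
print(classify_stationary_point([-300.0, -45.0, 980.1]))
```

Passing this check does not replace visualizing the imaginary mode or running the IRC; it only confirms the count.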


9. Computational exercise solutions (cross-reference)

Exercise | Chapter | Method | Expected result
Methane, ethane build | Ch 1 | Avogadro UFF | Tetrahedral C (Td symmetry for methane); C-H 1.09 Å; H-C-H 109.5°
Ethane rotation barrier | Ch 1 | MMFF94 dihedral scan | ~2.9 kcal/mol staggered → eclipsed
Ethylene HOMO/LUMO | Ch 2 | Avogadro ext. or ORCA HF/STO-3G | HOMO = π; LUMO = π*; π* has an extra node at the C-C midpoint
Electrostatic potential, HCl | Ch 3 | DFT B3LYP/6-31G(d) | δ⁻ red on Cl, δ⁺ blue on H
Cyclohexane chair | Ch 5 | MMFF94 opt | Chair lower than twist-boat by ~5 kcal/mol
Methyl A-value | Ch 5 | MMFF94 axial vs equatorial | ΔE ≈ 1.7 kcal/mol (eq favored)
IR prediction of acetone | Ch 6 | B3LYP/6-31G(d) Freq | C=O stretch ~1715 cm⁻¹ (after 0.96 scale)
¹H NMR of toluene | Ch 6 | mPW1PW91/6-311+G(2d,p) GIAO | δ ~7.2 (aryl), 2.3 (CH₃)
SN2 TS for Cl⁻ + CH₃Br | Ch 10 | ωB97X-D/def2-TZVP OptTS | Trigonal bipyramidal C; one imaginary frequency, printed as ~-450 cm⁻¹
Carbocation stabilities | Ch 11 | DFT isodesmic | Stability 3° > 2° > 1° > methyl (relative energy rises in that order)
Diels-Alder endo/exo | Ch 19 | M06-2X/def2-SVP | endo TS ~1-2 kcal/mol lower
Aromatic substitution σ⁺ | Ch 21 | B3LYP charges, Fukui | Para preferred for OMe; meta for NO₂
Aldol TS Zimmerman-Traxler | Ch 28 | B3LYP TS | Chair TS with Z-enolate gives syn product

For each: build in Avogadro → preoptimize MMFF94 → export .xyz → wrap with ORCA input → run → analyze.
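
The "wrap with ORCA input" step can be scripted. The sketch below takes the text of an .xyz file and returns a minimal ORCA input; the function name, default keyword line, and core count are our assumptions and should be adjusted per exercise.

```python
def xyz_to_orca_input(xyz_text, keywords="B3LYP D3BJ def2-SVP Opt Freq",
                      charge=0, multiplicity=1, nprocs=4):
    """Wrap the atom lines of an .xyz file in a minimal ORCA input."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = [ln.strip() for ln in lines[2:2 + n_atoms]]  # skip count + comment lines
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the .xyz header")
    return "\n".join([
        f"! {keywords}",
        f"%pal nprocs {nprocs} end",
        f"* xyz {charge} {multiplicity}",
        *atoms,
        "*",
    ]) + "\n"

# Water starting geometry from section 4, in .xyz layout
water_xyz = """3
water
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161
"""
print(xyz_to_orca_input(water_xyz, keywords="B3LYP def2-SVP Opt"))
```

Write the returned string to a .inp file and run it as in section 4. Remember to set charge and multiplicity explicitly for ions and radicals, since the .xyz format does not carry them.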


10. Sanity-checking results

Computational chemistry rewards skepticism. Run through this checklist on any new result before trusting it:

Symptom | Cause | Fix
Multiple imaginary frequencies after Opt | Not a minimum; higher-order saddle point | Displace along the imaginary modes, reoptimize
One imaginary freq after Opt (not TS) | Small spurious mode from numerical noise or loose convergence, or a true first-order saddle point | Tighten convergence (! TightOpt); displace along the mode and reopt
No imaginary freq after OptTS | Not a TS; the search fell into a minimum | Restart from displaced TS guess
SCF convergence failure | Bad initial guess; near-degeneracy | Use ! SlowConv or ! VerySlowConv; read converged orbitals from a cheaper calculation (! MORead with %moinp)
Wildly wrong bond length (>0.1 Å off) | Symmetry constraint accidentally imposed, or wrong charge/multiplicity | Check * line: charge, multiplicity correct?
Relative energy off by several kcal/mol vs expectation (more for large or stacked systems) | Forgot dispersion correction on B3LYP | Use B3LYP-D3(BJ) or switch functional
Atomic charges look nonsensical | Mulliken artifact | Use Hirshfeld, CM5, or NBO charges instead
Frequencies look fine but G° wrong | Solvation omitted, or temperature left at the 298.15 K default | Add ! CPCM(solvent); set the temperature in the %freq block (Temp)
TS connects wrong reactants/products | Wrong saddle | Run IRC; if wrong, restart from better guess
"Wrong" stereochemical preference after opt | Compared local minima, not global ones | Run conformer search first (xTB/CREST or RDKit ETKDG)

Always visualize the result. A geometry that converges to the right energy but looks distorted is usually a bug.


11. Reading the literature

When citing computational results, papers report at minimum:

  • Method: functional + basis set, e.g., "B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP" (energy level // geometry level).
  • Software + version: "ORCA 5.0.4" or "Gaussian 16, rev C.01."
  • Solvation model: "SMD(toluene)" or "gas phase."
  • Thermal corrections: "Gibbs free energies at 298.15 K, 1 atm."

Typical error bars for organic chemistry DFT:

  • Bond lengths: ±0.01-0.02 Å
  • Bond angles: ±1-2°
  • Reaction enthalpies: ±2-3 kcal/mol (good functional); ±5+ (poor choice)
  • Activation barriers: ±1-3 kcal/mol
  • ¹H NMR shifts: ±0.1-0.3 ppm after referencing
  • ¹³C NMR shifts: ±2-5 ppm
  • IR frequencies: scale factor ~0.96-0.97 needed; ±20 cm⁻¹ residual
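
Applying the IR scale factor is a single multiplication, sketched below. The raw value is a hypothetical harmonic frequency chosen for illustration, and 0.96 is simply the lower end of the range quoted above; the appropriate factor depends on the functional and basis set.

```python
def scale_frequencies(freqs_cm1, factor=0.96):
    """Multiply harmonic frequencies by an empirical scale factor."""
    return [f * factor for f in freqs_cm1]

# Hypothetical unscaled C=O stretch in cm^-1, for illustration only
raw = [1786.0]
for unscaled, scaled in zip(raw, scale_frequencies(raw)):
    print(f"{unscaled:.0f} cm^-1 unscaled -> {scaled:.0f} cm^-1 scaled")
```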

Foundational papers to cite for method validation:

  • Becke 1993; Lee, Yang, Parr 1988 (B3LYP)
  • Grimme et al. 2010 / 2011 (D3, BJ damping)
  • Weigend & Ahlrichs 2005 (def2 basis sets)
  • Zhao & Truhlar 2008 (M06 family)
  • Chai & Head-Gordon 2008 (ωB97X-D)
  • Riniker & Landrum 2015 (ETKDG)
  • Grimme et al. 2017 (GFN-xTB)


Computational chemistry is fast, cheap, and increasingly trustworthy — but only when you understand what you asked the computer to do.