Appendix C: Environment Setup Guide
When to use this appendix: Chapter 2 walks you through setting up your data science toolkit step by step. This appendix provides additional detail for edge cases, platform-specific instructions, and troubleshooting common problems. If everything in Chapter 2 went smoothly, you may never need this page. If something went wrong, start here.
C.1 Installing Anaconda
Anaconda is a free distribution of Python that includes all the data science libraries used in this book (NumPy, pandas, matplotlib, seaborn, scikit-learn, and Jupyter). It also includes conda, a package and environment manager that handles dependencies so you do not have to.
Windows
- Download the Anaconda Distribution installer (formerly called Individual Edition) from anaconda.com/download. Choose the 64-bit version (the only option for modern machines).
- Run the installer. Accept the license agreement.
- Choose "Just Me" unless you have a specific reason to install for all users (which requires admin privileges).
- Installation directory. Accept the default (C:\Users\YourName\anaconda3) unless your username contains spaces or non-ASCII characters. If it does, install to a path without spaces, such as C:\anaconda3.
- Add to PATH. The installer asks whether to add Anaconda to your system PATH. The official recommendation is "No" (use Anaconda Prompt instead), but if you plan to use the standard Command Prompt or PowerShell, checking "Yes" is convenient. If you skip this, you must use the Anaconda Prompt for all commands.
- Verify. Open Anaconda Prompt (find it in your Start menu) and type:
conda --version
python --version
You should see version numbers (e.g., conda 24.x.x and Python 3.12.x).
macOS
- Download the installer from anaconda.com/download. Choose the graphical installer (.pkg) for simplicity.
- Run the installer. Follow the prompts, accepting defaults.
- Shell initialization. The installer will offer to initialize conda for your shell. Say yes. This modifies your ~/.zshrc (or ~/.bash_profile on older macOS) so that the conda command works in Terminal.
- Verify. Open Terminal and type:
conda --version
python --version
- If conda is not recognized after installation, run:
~/anaconda3/bin/conda init zsh
Then close and reopen Terminal.
Linux
- Download the Linux installer (.sh file) from anaconda.com/download.
- Run the installer from a terminal:
bash Anaconda3-2024.xx-Linux-x86_64.sh
(Replace the filename with the actual file you downloaded.)
- Accept the license. Press Enter to scroll through, then type yes.
- Accept the default location (~/anaconda3).
- Initialize conda when asked. Type yes.
- Restart your terminal or run source ~/.bashrc.
- Verify with conda --version and python --version.
C.2 Setting Up JupyterLab
JupyterLab comes pre-installed with Anaconda. To launch it:
jupyter lab
This opens JupyterLab in your default web browser. If nothing happens, check the terminal output for a URL that looks like http://localhost:8888/lab?token=... and paste it into your browser manually.
Useful JupyterLab settings
- Autosave: JupyterLab autosaves every 120 seconds by default. You can also press Ctrl+S (Windows/Linux) or Cmd+S (macOS) at any time.
- Dark mode: Settings menu > Theme > JupyterLab Dark.
- Line numbers: View menu > Show Line Numbers.
- Increase font size: Settings menu > Theme > Increase Content Font Size (or Ctrl+Shift+=).
Essential keyboard shortcuts
| Action | Windows/Linux | macOS |
|---|---|---|
| Run cell and move down | Shift+Enter | Shift+Enter |
| Run cell and stay | Ctrl+Enter | Cmd+Enter |
| Insert cell below | B (in command mode) | B |
| Insert cell above | A (in command mode) | A |
| Delete cell | D, D (press D twice in command mode) | D, D |
| Switch to command mode | Esc | Esc |
| Switch to edit mode | Enter | Enter |
| Save | Ctrl+S | Cmd+S |
| Restart kernel | 0, 0 (in command mode) | 0, 0 |
| Toggle comment | Ctrl+/ | Cmd+/ |
C.3 Managing Environments with conda
An environment is an isolated collection of packages. Using environments prevents package conflicts between projects.
# Create a new environment with a specific Python version
conda create --name ds-book python=3.12
# Activate the environment
conda activate ds-book
# Install packages
conda install numpy pandas matplotlib seaborn scikit-learn jupyterlab
# Install a package from conda-forge (a community channel with more packages)
conda install -c conda-forge plotly
# List installed packages
conda list
# List all environments
conda env list
# Deactivate (return to base)
conda deactivate
# Remove an environment
conda env remove --name ds-book
# Export environment to file (for sharing/reproducibility)
conda env export > environment.yml
# Recreate environment from file
conda env create -f environment.yml
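If you want to inspect your environments from inside a script or notebook, the following is a minimal sketch that calls conda env list with its --json flag and parses the result. The function name list_conda_envs is made up for this example, and the sketch assumes conda is on your PATH; when it is not, the function simply returns an empty list.

```python
import json
import shutil
import subprocess


def list_conda_envs():
    """Return the paths of all conda environments, or [] if conda is unavailable."""
    conda = shutil.which("conda")  # None when conda is not on PATH
    if conda is None:
        return []
    try:
        result = subprocess.run(
            [conda, "env", "list", "--json"],
            capture_output=True, text=True, check=True, timeout=60,
        )
        # The JSON output holds the environment paths under the "envs" key
        return json.loads(result.stdout).get("envs", [])
    except (subprocess.SubprocessError, OSError, ValueError):
        return []


for path in list_conda_envs():
    print(path)
```

Each printed path is the folder of one environment; the last component of the path is usually the environment's name.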
conda vs. pip
Both install Python packages. Here are the guidelines:
- Use conda first for packages available through conda. It handles non-Python dependencies (C libraries, etc.) that pip cannot.
- Use pip only for packages not available through conda (e.g., some newer or niche packages).
- Do not mix them casually. If you must use pip inside a conda environment, install all conda packages first, then use pip. Running conda after pip can sometimes overwrite packages in unpredictable ways.
- pip command inside conda: Use pip install package_name while the conda environment is active.
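When you are unsure whether a package in the active environment came from conda or pip, the package metadata can sometimes tell you. The sketch below reads the INSTALLER file that installers may write next to a package. pip records "pip" there; many conda packages record "conda", but this is not guaranteed, so treat a missing or unexpected value as "unknown" rather than as proof. The helper name installer_of is hypothetical.

```python
from importlib import metadata


def installer_of(package_name):
    """Return the tool recorded as having installed a package, or None if unknown."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment
    text = dist.read_text("INSTALLER")  # None when the file is absent
    return text.strip() if text else None


for name in ["numpy", "pandas", "plotly"]:
    print(f"{name}: {installer_of(name)}")
```

For an authoritative answer, conda list also shows a channel column, where pip-installed packages are typically marked as pypi.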
C.4 Google Colab as an Alternative
If you cannot install Anaconda (company laptop restrictions, Chromebook, limited disk space), Google Colab is a free cloud-based Jupyter environment.
Getting started with Colab
- Go to colab.research.google.com.
- Sign in with a Google account.
- Click "New Notebook."
- You now have a Jupyter-style notebook running on Google's servers.
Key differences from local Jupyter
| Feature | Local Jupyter | Google Colab |
|---|---|---|
| Requires installation | Yes | No |
| Internet required | No | Yes |
| Pre-installed libraries | Depends on your setup | NumPy, pandas, matplotlib, scikit-learn, seaborn all included |
| File access | Direct access to local files | Must upload files or mount Google Drive |
| GPU access | Requires local GPU | Free GPU/TPU available (limited) |
| Session persistence | Indefinite | Sessions disconnect after idle time (~90 min) |
| Customization | Full control | Limited |
Uploading data to Colab
# Option 1: Upload from your computer
from google.colab import files
uploaded = files.upload() # Opens a file picker dialog
# Option 2: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Then access files at /content/drive/MyDrive/...
# Option 3: Download from URL
import pandas as pd
df = pd.read_csv("https://example.com/data.csv")
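If you move notebooks between Colab and a local installation, it can help to let the notebook work out where it is running. The sketch below uses a common heuristic: the google.colab module is already loaded in a Colab runtime and absent elsewhere. The function names and the Drive folder are assumptions made for this example; adjust the paths to match your own layout.

```python
import sys
from pathlib import Path


def running_in_colab():
    """Return True when the google.colab module has been loaded by the runtime."""
    return "google.colab" in sys.modules


def data_dir():
    """Pick a data folder: Google Drive in Colab, a local data/raw folder otherwise."""
    if running_in_colab():
        # Hypothetical Drive folder; requires drive.mount('/content/drive') first
        return Path("/content/drive/MyDrive/data-science-book/data/raw")
    return Path("data") / "raw"


print(data_dir())
```

You could then build file paths as data_dir() / "your_file.csv" so the same cell runs in both places.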
Installing additional packages in Colab
!pip install plotly # Use ! prefix for shell commands in Colab
!pip install ibm-watson # Install any pip package
C.5 Troubleshooting Common Problems
"conda is not recognized as a command"
Windows: You are using Command Prompt or PowerShell, but Anaconda was not added to PATH. Solutions:
1. Use Anaconda Prompt instead (find it in the Start menu).
2. Or add Anaconda to your PATH manually: search for "Edit the system environment variables" in Windows settings, click "Environment Variables," find Path under User variables, and add C:\Users\YourName\anaconda3 and C:\Users\YourName\anaconda3\Scripts.
macOS/Linux: conda was not initialized for your shell. Run:
~/anaconda3/bin/conda init zsh # macOS
~/anaconda3/bin/conda init bash # Linux
Then close and reopen your terminal.
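If some Python interpreter is available (for example, one installed separately from Anaconda), you can use it to see what your PATH actually contains. This is a rough diagnostic sketch, not an official tool; the function names are invented for this example.

```python
import os
import shutil


def find_conda():
    """Return the path of the conda executable found on PATH, or None."""
    return shutil.which("conda")


def path_entries_mentioning(keyword="conda"):
    """Return PATH entries whose text contains the keyword (case-insensitive)."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return [e for e in entries if keyword.lower() in e.lower()]


print("conda executable:", find_conda())
for entry in path_entries_mentioning():
    print("PATH entry:", entry)
```

If the first line prints None and no PATH entries are listed, the shell has no way to find conda, which matches the error message above.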
"Kernel not found" or "No kernel named python3"
This happens when JupyterLab cannot find a Python kernel for your notebook.
# Install the IPython kernel in your environment
conda activate ds-book
conda install ipykernel
python -m ipykernel install --user --name ds-book --display-name "Python (ds-book)"
Then restart JupyterLab and select the new kernel from the kernel picker.
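To confirm that the kernel was registered, you can list the kernels Jupyter knows about. The sketch below uses KernelSpecManager from the jupyter_client package, which is installed alongside JupyterLab; if it is not importable, the function returns an empty dictionary. The function name available_kernels is made up for this example.

```python
def available_kernels():
    """Return {kernel name: resource directory}, or {} if jupyter_client is missing."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}
    return KernelSpecManager().find_kernel_specs()


for name, location in sorted(available_kernels().items()):
    print(f"{name:20s} {location}")
```

After running the commands above, you would expect an entry named ds-book in this listing.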
Package conflicts ("Solving environment: failed")
When conda cannot find a compatible set of package versions:
# Try installing from conda-forge
conda install -c conda-forge problematic-package
# Or create a fresh environment
conda create --name fresh-env python=3.12 numpy pandas matplotlib
# Or update conda itself (newer releases include solver improvements)
conda update conda
"ModuleNotFoundError: No module named 'pandas'"
This usually means you are running Python from a different environment than the one where pandas is installed.
# Check which Python is running
which python # macOS/Linux
where python # Windows
# Check which environment is active
conda env list # The asterisk (*) marks the active one
# Activate the right environment
conda activate ds-book
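The same check can be run from inside the notebook that raised the error, which is often more reliable because it reports on the kernel's interpreter rather than the terminal's. This is a small sketch; the function name environment_report is invented, and CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated (it may be empty if the kernel was started another way).

```python
import importlib.util
import os
import sys


def environment_report(package="pandas"):
    """Collect facts about the running interpreter and one package's location."""
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "package_location": spec.origin if spec else None,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If executable does not point inside the environment where you installed pandas, the notebook is using a different kernel than you intended.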
Jupyter notebook is slow or unresponsive
- Restart the kernel: Kernel menu > Restart Kernel. This clears all variables from memory.
- Clear outputs: Edit menu > Clear All Outputs. Large outputs (especially images) consume memory.
- Check for infinite loops: If a cell has been running for too long, click the stop button or press I, I in command mode.
- Large datasets: If your dataset is very large (millions of rows), consider working with a sample while developing your analysis: df_sample = df.sample(10000).
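The df.sample call above assumes the full dataset already fits in memory. When the file is too large to load at all, one alternative is to draw the sample while reading. The sketch below uses reservoir sampling (a different technique from df.sample, shown here with only the standard library); the function name sample_rows and the demo data are made up for illustration.

```python
import csv
import io
import random


def sample_rows(lines, k, seed=0):
    """Return the header and k data rows chosen uniformly at random in one pass."""
    reader = csv.reader(lines)
    header = next(reader)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = row     # replace with probability k / (i + 1)
    return header, reservoir


# Tiny in-memory stand-in for a large CSV file (made-up data)
demo = io.StringIO("id,value\n" + "\n".join(f"{n},{n * n}" for n in range(1000)))
header, rows = sample_rows(demo, k=5)
print(header, len(rows))
```

With a real file, you would pass an open file object instead, for example with open(path, newline="") as f: sample_rows(f, 10000).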
"Permission denied" errors
- Windows: Run Anaconda Prompt as Administrator (right-click > "Run as administrator").
- macOS/Linux: Do not use sudo with conda. If you installed Anaconda with sudo by accident, reinstall it under your user account.
SSL certificate errors
This can happen behind corporate firewalls or VPNs:
# Tell conda to not verify SSL (temporary workaround)
conda config --set ssl_verify false
# Better: install your company's CA certificate bundle
conda config --set ssl_verify /path/to/certificate.pem
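Before changing any settings, it can be useful to see where Python currently looks for trusted certificates. The following sketch prints the defaults reported by the standard ssl module, plus two environment variables that are commonly used to point tools at a custom CA bundle (SSL_CERT_FILE for OpenSSL, REQUESTS_CA_BUNDLE for the requests library). The function name ca_settings is invented for this example, and whether your company's tooling sets these variables is an assumption you would need to check.

```python
import os
import ssl


def ca_settings():
    """Report where Python and common tools look for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "default_cafile": paths.cafile,
        "default_capath": paths.capath,
        # Environment variables read by OpenSSL and by the requests library
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
    }


for name, value in ca_settings().items():
    print(f"{name}: {value}")
```

If your IT department provides a certificate file, its path is what you would pass to the ssl_verify setting shown above.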
Updating everything
# Update conda itself
conda update conda
# Update all packages in the current environment
conda update --all
# Update a specific package
conda update pandas
C.6 Recommended Directory Structure
For the exercises and projects in this book, we recommend organizing your files like this:
data-science-book/
    data/
        raw/              # Original, unmodified data files
        processed/        # Cleaned and transformed data
    notebooks/
        ch01/
        ch02/
        ...
    scripts/              # Reusable Python scripts
    output/               # Saved figures, reports
    environment.yml       # Conda environment specification
    README.md             # Project description
Create this structure at the start of the course and stick with it. In Chapter 33, you will learn to put it under version control with git.
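If you would rather not create these folders by hand, the layout described in this section can be sketched with pathlib. The function name create_project is made up for this example, only the first two chapter folders are created (add more as you go), and the two files are created empty as placeholders. The demonstration at the bottom builds the tree in a temporary directory so that running it leaves nothing behind.

```python
import tempfile
from pathlib import Path

FOLDERS = [
    "data/raw",
    "data/processed",
    "notebooks/ch01",
    "notebooks/ch02",
    "scripts",
    "output",
]
FILES = ["environment.yml", "README.md"]


def create_project(root="data-science-book"):
    """Create the recommended folders and empty placeholder files under root."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)  # existing file contents are not changed
    return root


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    root = create_project(Path(tmp) / "data-science-book")
    for path in sorted(root.rglob("*")):
        print(path.relative_to(root).as_posix())
```

To create the real project, call create_project() from the folder where you want data-science-book to live.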
C.7 Verifying Your Setup
Run this cell in a Jupyter notebook to verify that all the packages used in this book are installed and working:
import sys
print(f"Python version: {sys.version}")
import numpy as np
print(f"NumPy version: {np.__version__}")
import pandas as pd
print(f"pandas version: {pd.__version__}")
import matplotlib
print(f"matplotlib version: {matplotlib.__version__}")
import seaborn as sns
print(f"seaborn version: {sns.__version__}")
import sklearn
print(f"scikit-learn version: {sklearn.__version__}")
import scipy
print(f"SciPy version: {scipy.__version__}")
print("\nAll core packages imported successfully.")
If any import fails, install the missing package:
conda install package-name
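The verification cell above stops at the first failed import, so you only learn about one missing package at a time. As an alternative, this sketch tries every import and collects all the failures before suggesting a single install command. The mapping and the function name missing_packages are written for this example; note that scikit-learn is imported as sklearn but installed under its full name.

```python
import importlib

# Import name -> conda package name (they differ for scikit-learn)
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the conda package names whose modules cannot be imported."""
    missing = []
    for module_name, package_name in required.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(package_name)
    return missing


missing = missing_packages()
if missing:
    print("Run: conda install " + " ".join(missing))
else:
    print("All core packages imported successfully.")
```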
If you encounter a problem not covered here, search the error message online. Stack Overflow and the official conda documentation at docs.conda.io are excellent resources. And remember: every experienced data scientist has spent hours debugging environment issues. You are not alone.