Chapter 7: Strings: Text Processing and Manipulation

Learning Objectives

Access individual characters and substrings using indexing and slicing
Explain why strings are immutable and work effectively within that constraint
Use essential string methods (split, join, strip, replace, find, upper, lower, startswith, endswith, count) to process text
Iterate over strings character by character, word by word, and line by line
Validate user input using string inspection methods (isdigit, isalpha, isalnum)
Format output precisely using f-string format specifications
Use escape characters and raw strings for special text handling

In This Chapter

Chapter Overview
7.1 Strings Are Everywhere
7.2 String Indexing
7.3 String Slicing
7.4 Strings Are Immutable
7.5 Essential String Methods
7.6 Processing Text
7.7 Input Validation with Strings
7.8 Advanced Formatting with f-Strings
7.9 Escape Characters and Raw Strings
7.10 Project Checkpoint: TaskFlow v0.6
🐛 Debugging Walkthrough: Immutability TypeError
Chapter Summary

"The string is the duct tape of programming — it holds everything together." — Practical programmer wisdom

Chapter Overview

Think about the last application you used. Maybe you searched for something on Google, sent a text message, filled out a form, or scrolled through social media. Every single one of those interactions involved text — strings of characters being parsed, validated, transformed, compared, and displayed.

Strings are the most common data type in real-world software. Not integers. Not floating-point numbers. Strings. Web servers parse URL strings. Databases store and query text fields. Machine learning pipelines clean messy text data before analysis. Bioinformaticians process DNA sequences that are just long strings of A, C, G, and T. Every command-line tool you've ever used takes string arguments and produces string output.

You've been using strings since Chapter 2, of course — every print() call, every input() prompt, every f-string you formatted. But we've been treating strings like simple containers for text. In this chapter, we crack them open and learn what they can really do.

This chapter introduces one of Python's most important concepts: immutability. Understanding that strings cannot be changed in place — only replaced with new strings — is a threshold concept that reshapes how you think about data. It trips up nearly every beginner, and it's the foundation for understanding how Python handles data more broadly.

In this chapter, you will learn to:
- Access individual characters and substrings using indexing and slicing
- Explain why strings are immutable and work effectively within that constraint
- Use essential string methods to process text like a professional
- Iterate over strings in multiple ways for different tasks
- Validate user input before your program tries to use it
- Format output with precise alignment, decimal places, and separators
- Handle special characters with escape sequences and raw strings

🔄 Spaced Review: This chapter builds directly on Chapter 5 (loops — you'll iterate over strings) and Chapter 3 (type casting — remember str()?). You'll also use the functions you learned to write in Chapter 6 throughout every example.

🏃 Fast Track: If you're comfortable with basic string indexing and just need methods and formatting, skim 7.2-7.3 and jump to 7.5.

🔬 Deep Dive: The case studies for this chapter explore FASTA file parsing (bioinformatics) and how everyday apps like autocorrect, search engines, and spam filters rely on string processing.

7.1 Strings Are Everywhere

Let's start with a question: what percentage of code in a typical web application deals with strings?

The answer varies and is hard to measure, but it's surprisingly high: by rough, informal estimates, 60-70% of the logic often involves string operations. Parsing HTTP headers, validating email addresses, sanitizing user input, constructing SQL queries, formatting dates, generating HTML, processing JSON. Strings are the universal interface between systems, between users and programs, and between different parts of the same program.

Here's a quick tour of strings in the wild:

# Bioinformatics: a DNA sequence is just a string
dna = "ATCGATCGATCG"

# Web development: URLs are strings that encode routing information
url = "https://example.com/users/42/profile?tab=settings"

# Data science: CSV files are strings with structure
csv_row = "Patel,Anika,Biology,University of Michigan"

# System administration: log files are strings with timestamps
log_entry = "2025-03-14 08:23:17 ERROR Database connection timeout"

# Natural language processing: all text starts as a string
tweet = "Just finished my CS homework! #python #coding"

Every one of these examples requires different string operations. By the end of this chapter, you'll know how to handle all of them.

Dr. Anika Patel — the biology researcher you met back in Chapter 1 — works with DNA sequence files in a format called FASTA. Each sequence has a header line starting with > followed by lines of nucleotide characters. Her daily work is essentially string processing: parsing headers, counting nucleotides, searching for patterns. We'll use her work as a running example throughout this chapter.

7.2 String Indexing

A string is a sequence of characters. Each character sits at a numbered position called an index. Python uses zero-based indexing, which means the first character is at index 0, not index 1.

gene = "ATCGATCG"
#       01234567

print(gene[0])    # Output: A
print(gene[1])    # Output: T
print(gene[4])    # Output: A
print(gene[7])    # Output: G

Why zero-based? It's a convention inherited from C and the way memory addresses work — the index represents the offset from the start of the string. The first character has zero offset. You'll get used to it, and eventually it'll feel natural.

Negative Indexing

Python offers a slick shortcut for counting from the end: negative indices. Index -1 is the last character, -2 is second-to-last, and so on.

gene = "ATCGATCG"
#       01234567
#      -8      -1

print(gene[-1])   # Output: G   (last character)
print(gene[-2])   # Output: C   (second to last)
print(gene[-8])   # Output: A   (same as gene[0])

This is genuinely useful. When you need the last character of a string and you don't know (or don't care) how long it is, my_string[-1] is cleaner than my_string[len(my_string) - 1].
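
To make that equivalence concrete, here is a small sketch (the variable names are just for illustration) showing that a negative index -k lands on the same character as len(gene) - k:

```python
gene = "ATCGATCG"
n = len(gene)          # 8

for k in range(1, n + 1):
    # gene[-k] and gene[n - k] name the same position
    print(f"gene[{-k}] = gene[{n - k}] = {gene[-k]}")
```

The first line printed is gene[-1] = gene[7] = G, and the last is gene[-8] = gene[0] = A.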

IndexError: Going Out of Bounds

What happens when you try to access an index that doesn't exist?

gene = "ATCGATCG"  # 8 characters, indices 0–7
print(gene[8])      # IndexError: string index out of range

Python raises an IndexError. This is a common mistake, especially when you forget that a string of length n has valid indices 0 through n-1. If gene has 8 characters, the last valid index is 7, not 8.

🐛 Debugging Tip: When you see IndexError: string index out of range, check two things: (1) Is your index off by one? (2) Is the string shorter than you expected? Print len(your_string) to verify.
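
If you would rather guard against the error than debug it afterwards, one option is to check the index against len() first. The helper below, char_at, is a hypothetical name invented for this sketch and is not a built-in:

```python
def char_at(text, index, default=""):
    """Return text[index], or default if index is out of range."""
    # Valid indices run from -len(text) up to len(text) - 1
    if -len(text) <= index < len(text):
        return text[index]
    return default


gene = "ATCGATCG"
print(char_at(gene, 7))        # Output: G
print(char_at(gene, 8, "?"))   # Output: ?
print(char_at("", 0, "?"))     # Output: ?
```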

🔄 Check Your Understanding #1

What does the following code print?

message = "Hello, World!"
print(message[7])
print(message[-6])

Answer

`W` and `W`. Index 7 counts from the start (H=0, e=1, l=2, l=3, o=4, ,=5, space=6, W=7). Index -6 counts from the end (!=−1, d=−2, l=−3, r=−4, o=−5, W=−6). They're the same character.

7.3 String Slicing

Indexing gets you one character. Slicing gets you a substring — a piece of the original string. The syntax is string[start:stop:step].

greeting = "Hello, World!"

print(greeting[0:5])     # Output: Hello
print(greeting[7:12])    # Output: World
print(greeting[:5])      # Output: Hello    (start defaults to 0)
print(greeting[7:])      # Output: World!   (stop defaults to end)
print(greeting[:])       # Output: Hello, World!  (copy entire string)

The critical rule: the start index is inclusive, but the stop index is exclusive. greeting[0:5] gives you characters at indices 0, 1, 2, 3, 4 — five characters total. This feels odd at first, but it has a nice property: the length of the slice equals stop - start.
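
A quick way to convince yourself of the inclusive/exclusive rule is to check it in code. This sketch assumes both indices fall inside the string and the step is left at its default of 1:

```python
greeting = "Hello, World!"

start, stop = 7, 12
piece = greeting[start:stop]
print(piece)                        # Output: World
print(len(piece) == stop - start)   # Output: True

# Because stop is exclusive, cutting a string at any index i loses nothing
i = 5
print(greeting[:i] + greeting[i:] == greeting)   # Output: True
```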

The Step Parameter

The optional third parameter controls the step size — how many positions to advance between each character:

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(alphabet[0:10:2])   # Output: ACEGI   (every other character)
print(alphabet[::3])      # Output: ADGJMPSVY  (every third character)
print(alphabet[::-1])     # Output: ZYXWVUTSRQPONMLKJIHGFEDCBA  (reversed!)

That last one — [::-1] — is the classic Python idiom for reversing a string. You'll see it everywhere.
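
One common use of the reversal idiom is a palindrome check. The function name is_palindrome is our own choice for this sketch, and the check is deliberately simple: it is case-sensitive and does not ignore spaces or punctuation.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]


print(is_palindrome("racecar"))   # Output: True
print(is_palindrome("python"))    # Output: False
```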

Common Slicing Patterns

Here are patterns you'll use constantly:

filename = "experiment_results_2025.csv"

# Get file extension
extension = filename[-4:]        # ".csv"

# Get filename without extension
name_only = filename[:-4]        # "experiment_results_2025"

# Get first n characters
first_ten = filename[:10]        # "experiment"

# Get last n characters
last_eight = filename[-8:]       # "2025.csv"

Slicing Never Raises IndexError

Here's something that surprises most beginners: slicing with out-of-range indices doesn't crash.

short = "Hi"
print(short[0:100])    # Output: Hi   (no error!)
print(short[50:100])   # Output:      (empty string, no error)

Python simply returns whatever characters fall within the requested range. This is different from indexing — short[100] would raise an IndexError, but short[0:100] quietly returns all available characters. This is by design; it makes slicing more forgiving when you don't know the exact length of your data.
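
This forgiving behavior is handy when you cut a string into fixed-size pieces, because the last piece is allowed to be short. Here is a minimal sketch; chunks is a hypothetical helper name, and the three-letter size simply mimics reading a DNA sequence in groups of three:

```python
def chunks(sequence, size):
    """Split sequence into pieces of at most `size` characters."""
    pieces = []
    for start in range(0, len(sequence), size):
        # The final slice may run past the end; slicing just returns
        # whatever characters are left.
        pieces.append(sequence[start:start + size])
    return pieces


print(chunks("ATCGATCGAT", 3))   # Output: ['ATC', 'GAT', 'CGA', 'T']
```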

🐛 Debugging Walkthrough: Off-by-One in Slicing

A student writes code to extract the area code from a phone number:

phone = "(555) 867-5309"
area_code = phone[1:3]
print(area_code)    # Output: 55 (wrong! Expected 555)

The bug: they wanted characters at indices 1, 2, and 3, but phone[1:3] only gives indices 1 and 2. The stop index is exclusive. The fix: phone[1:4].

This off-by-one error is the most common slicing mistake. When you're not getting the characters you expect, count the indices on your fingers and remember: stop is exclusive.

7.4 Strings Are Immutable

Here's the moment that trips up every beginner. You have a string, and you want to change one character:

name = "Jython"
name[0] = "P"    # TypeError: 'str' object does not support item assignment

Python raises a TypeError. You cannot change a string in place. This is because strings are immutable: once created, their contents cannot be modified. Not a single character. Not a slice. Not at all.

🚪 Threshold Concept: Immutability

Before: "I can change the third character of a string."

After: "Strings are immutable — I create a NEW string with the changes I want."

This is a fundamental shift in how you think about data. Instead of modifying existing strings, you build new strings from old ones. The original string remains unchanged (and Python's garbage collector eventually cleans it up if nothing references it anymore).

🧩 Productive Struggle

Before reading on, try to fix this code. The goal is to change "Jython" to "Python":

name = "Jython"
# How do you make name equal "Python"?

Spend a minute thinking about it. What tools do you already have?

Solution

You create a new string by combining pieces:

name = "Jython"
name = "P" + name[1:]    # Concatenate "P" with "ython"
print(name)               # Output: Python

Or use the `replace()` method (which we'll learn in Section 7.5):

name = "Jython"
name = name.replace("J", "P")
print(name)               # Output: Python

In both cases, you're not modifying the original string — you're creating a brand-new string and reassigning the variable `name` to point to it. The old `"Jython"` string still exists briefly in memory until Python cleans it up.

Why Immutability?

You might wonder: why would Python be designed this way? Isn't it inconvenient?

There are real engineering reasons:

Safety. When you pass a string to a function, you know the function can't alter your original string. This eliminates an entire category of bugs.
Efficiency. Python can optimize memory by reusing identical strings. If two variables hold "hello", Python can point them both to the same object in memory. This is only safe because neither can be changed.
Hashability. Strings can be used as dictionary keys and in sets (you'll learn about these in Chapter 9) precisely because they're immutable. Mutable objects can't be safely used as keys because their hash might change.
Thread safety. In concurrent programs, immutable objects can be shared between threads without locks. You won't need this for a while, but it matters in production systems.

For now, the practical takeaway is simple: every "change" to a string creates a new string. Get comfortable with patterns like text = text.upper() and result = text[:5] + "new_part" + text[8:].
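
The slice-and-concatenate pattern can be wrapped in a small function. The name replace_at is made up for this sketch, and it assumes index is a valid, non-negative position:

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    # Everything before index + the new character + everything after it
    return text[:index] + new_char + text[index + 1:]


name = "Jython"
fixed = replace_at(name, 0, "P")
print(fixed)   # Output: Python
print(name)    # Output: Jython  (the original is untouched)
```

Notice that the function never modifies its argument. It can't, so it builds and returns a new string instead.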

🔄 Spaced Review (Ch 3): Remember from Chapter 3 that variables are name tags, not boxes? This is where that mental model pays off. When you write name = name.replace("J", "P"), you're not changing the object — you're pointing the name tag name at a different object.

7.5 Essential String Methods

Strings come loaded with methods — functions that belong to the string object and operate on its contents. Python strings have over 40 built-in methods. We'll focus on the ones you'll use most.

Remember: because strings are immutable, none of these methods change the original string. They return a new value (a new string, a list, a number, or True/False) and leave the original untouched.

Changing Case

title = "the great gatsby"
print(title.upper())        # Output: THE GREAT GATSBY
print(title.lower())        # Output: the great gatsby
print(title.title())        # Output: The Great Gatsby
print(title.capitalize())   # Output: The great gatsby

# The original is unchanged
print(title)                # Output: the great gatsby

Case conversion is essential for case-insensitive comparisons — a pattern you'll use all the time:

user_input = input("Continue? (yes/no): ")    # User might type "YES", "Yes", "yes"
if user_input.lower() == "yes":
    print("Continuing...")
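
The same idea can be packaged as a reusable function. In this sketch, is_yes is a hypothetical helper; accepting "y" as a shorthand and calling strip() (covered later in this section) are our own additions, not requirements:

```python
def is_yes(answer):
    """Return True for any capitalization of 'yes' or 'y'."""
    return answer.strip().lower() in ("yes", "y")


for reply in ["YES", "Yes", " y ", "no", "yes please"]:
    print(f"{reply!r}: {is_yes(reply)}")
```

The first three replies print True and the last two print False, because the whole cleaned-up answer has to match.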

Searching

text = "Dr. Patel studies DNA sequences in her laboratory"

# find() returns the index of the first occurrence, or -1 if not found
print(text.find("DNA"))          # Output: 18
print(text.find("RNA"))          # Output: -1

# count() returns how many times a substring appears
print(text.count("a"))           # Output: 3

# startswith() and endswith() return True/False
print(text.startswith("Dr."))    # Output: True
print(text.endswith("lab"))      # Output: False
print(text.endswith("laboratory"))  # Output: True

# in operator (not a method, but essential for searching)
print("DNA" in text)             # Output: True
print("RNA" in text)             # Output: False

💡 Tip: Use find() when you need the position of a substring. Use in when you just need to know if it's present. There's also index(), which works like find() but raises a ValueError instead of returning -1. Prefer find() unless you want the exception.
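
find() also accepts an optional second argument, the index at which to start searching. That makes it possible to locate every match, not just the first. The helper below is a sketch with an invented name (find_all), and it reports non-overlapping matches only:

```python
def find_all(text, sub):
    """Return the starting index of every non-overlapping match of sub."""
    positions = []
    if not sub:
        return positions          # an empty search term matches nothing useful
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we found
        start = text.find(sub, start + len(sub))
    return positions


print(find_all("ATCGATCGATCG", "ATC"))   # Output: [0, 4, 8]
print(find_all("ATCGATCGATCG", "GGG"))   # Output: []
```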

Splitting and Joining

split() and join() are arguably the most powerful string methods. They convert between strings and lists.

# split() breaks a string into a list of substrings
csv_line = "Patel,Anika,Biology,University of Michigan"
fields = csv_line.split(",")
print(fields)    # Output: ['Patel', 'Anika', 'Biology', 'University of Michigan']
print(fields[2])  # Output: Biology

# split() with no argument splits on whitespace (spaces, tabs, newlines)
sentence = "The  quick   brown fox"
words = sentence.split()
print(words)     # Output: ['The', 'quick', 'brown', 'fox']
# Note: multiple spaces are treated as one separator

# join() does the opposite — combines a list into a string
words = ["Hello", "World"]
result = " ".join(words)
print(result)    # Output: Hello World

# join() with different separators
print(", ".join(words))   # Output: Hello, World
print("-".join(words))    # Output: Hello-World
print("".join(words))     # Output: HelloWorld

The join() syntax feels backwards at first — separator.join(list) instead of list.join(separator). Think of it as: "I'm a separator, and I'm joining these pieces together."
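
Because split() and join() are opposites, chaining them is a compact way to tidy up spacing. This is a small sketch; normalize_spaces is a name we made up:

```python
def normalize_spaces(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


print(normalize_spaces("  The  quick   brown\tfox \n"))
# Output: The quick brown fox

# split(",") keeps empty fields; split() with no argument does not
print("a,,b".split(","))    # Output: ['a', '', 'b']
print("a  b".split())       # Output: ['a', 'b']
```

The last two lines highlight a difference worth remembering: with an explicit separator, two separators in a row produce an empty string in the result.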

Stripping Whitespace

messy = "   Hello, World!   \n"

print(messy.strip())       # Output: Hello, World!   (both sides)
print(messy.lstrip())      # Output: Hello, World!   \n  (left only)
print(messy.rstrip())      # Output:    Hello, World!  (right only)

# strip() is essential when reading user input or file data
user_input = input("Enter your name: ")  # User types "  Alice  "
clean_name = user_input.strip()          # "Alice"
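
strip(), lstrip(), and rstrip() also accept an optional argument naming the characters to remove. One thing to watch for: the argument is treated as a set of characters, not as a prefix or suffix. The sketch below shows both the feature and the trap:

```python
print("***Alert***".strip("*"))     # Output: Alert
print("question?!".rstrip("?!"))    # Output: question

# Gotcha: ".txt" means "any of the characters . t x", so the final
# "t" of "report" is stripped as well
print("report.txt".rstrip(".txt"))  # Output: repor

# To drop an exact suffix, check for it and slice it off
filename = "report.txt"
stem = filename
if filename.endswith(".txt"):
    stem = filename[:-len(".txt")]
print(stem)                         # Output: report
```

Python 3.9 and later also provide removeprefix() and removesuffix() for exactly this job.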

Replacing

text = "I love Java. Java is great!"
new_text = text.replace("Java", "Python")
print(new_text)   # Output: I love Python. Python is great!

# Replace with a limit (third argument)
new_text = text.replace("Java", "Python", 1)
print(new_text)   # Output: I love Python. Java is great!

# Remove characters by replacing with empty string
phone = "(555) 867-5309"
digits_only = phone.replace("(", "").replace(")", "").replace(" ", "").replace("-", "")
print(digits_only)  # Output: 5558675309
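
Chaining replace() works when you know every character you want to remove. When you don't, it can be easier to keep only what you want. The sketch below uses isdigit(), which is covered in Section 7.7; the function name digits_only is our own:

```python
def digits_only(text):
    """Return just the digit characters from text, in order."""
    kept = []
    for char in text:
        if char.isdigit():
            kept.append(char)
    return "".join(kept)


print(digits_only("(555) 867-5309"))    # Output: 5558675309
print(digits_only("+1 555.867.5309"))   # Output: 15558675309
```

The second call handles punctuation that the chained replace() version never anticipated.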

Text Adventure: Processing Player Commands

Let's see these methods in action. In Crypts of Pythonia, the text adventure game, we need to process whatever the player types into a standardized format:

def process_command(raw_input):
    """Clean and parse a player command."""
    # Strip whitespace and convert to lowercase
    cleaned = raw_input.strip().lower()

    # Split into words
    words = cleaned.split()

    if not words:
        return None, None

    # First word is the action, rest is the target
    action = words[0]
    target = " ".join(words[1:]) if len(words) > 1 else None

    return action, target


# Test with messy player input
commands = [
    "  GO north  ",
    "TAKE rusty sword",
    "  look  ",
    "use   health   potion",
]

for cmd in commands:
    action, target = process_command(cmd)
    print(f"Action: {action!r:12s} Target: {target!r}")

Output:
Action: 'go'         Target: 'north'
Action: 'take'       Target: 'rusty sword'
Action: 'look'       Target: None
Action: 'use'        Target: 'health potion'

Notice how strip(), lower(), and split() work together to normalize wildly inconsistent input into a clean, predictable format. This is real-world string processing in miniature.

Quick Reference: Common String Methods

Method	What It Does	Returns	Example
`upper()`	All uppercase	new str	`"hi".upper()` → `"HI"`
`lower()`	All lowercase	new str	`"HI".lower()` → `"hi"`
`strip()`	Remove leading/trailing whitespace	new str	`" hi ".strip()` → `"hi"`
`split(sep)`	Break into list	list	`"a,b".split(",")` → `["a","b"]`
`join(list)`	Combine list into string	new str	`",".join(["a","b"])` → `"a,b"`
`replace(old, new)`	Replace occurrences	new str	`"ab".replace("a","x")` → `"xb"`
`find(sub)`	Index of first match, or -1	int	`"abc".find("b")` → `1`
`count(sub)`	Number of occurrences	int	`"aaba".count("a")` → `3`
`startswith(s)`	Starts with prefix?	bool	`"abc".startswith("ab")` → `True`
`endswith(s)`	Ends with suffix?	bool	`"abc".endswith("bc")` → `True`
`title()`	Title Case	new str	`"hi there".title()` → `"Hi There"`
`capitalize()`	First char uppercase	new str	`"hi there".capitalize()` → `"Hi there"`

7.6 Processing Text

Now that you know the individual methods, let's combine them with the loops from Chapter 5 to process text in three different ways.

Iterating Character by Character

A for loop over a string gives you one character at a time:

def count_nucleotides(sequence):
    """Count each nucleotide in a DNA sequence."""
    counts = {"A": 0, "T": 0, "C": 0, "G": 0}
    for char in sequence.upper():
        if char in counts:
            counts[char] += 1
    return counts


dna = "ATCGATCGATCG"
result = count_nucleotides(dna)
print(result)    # Output: {'A': 3, 'T': 3, 'C': 3, 'G': 3}

This is Dr. Patel's bread-and-butter operation — counting nucleotides is the "Hello, World!" of bioinformatics.
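
When you only need a total for one or two characters, count() from Section 7.5 can replace the loop. As an illustration, here is a sketch that computes the fraction of G and C characters in a sequence. The function name gc_content and the choice to return 0.0 for an empty sequence are assumptions made for this example:

```python
def gc_content(sequence):
    """Return the fraction of G and C characters in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0                # avoid dividing by zero
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)


print(gc_content("ATCGATCGATCG"))    # Output: 0.5
print(f"{gc_content('ggca'):.0%}")   # Output: 75%
```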

Iterating Word by Word

Split the string first, then iterate:

def word_frequency(text):
    """Count how often each word appears."""
    words = text.lower().split()
    freq = {}
    for word in words:
        # Strip punctuation from each word
        clean_word = word.strip(".,!?;:'\"()")
        if clean_word:
            freq[clean_word] = freq.get(clean_word, 0) + 1
    return freq


sample = "To be or not to be, that is the question."
result = word_frequency(sample)
for word, count in sorted(result.items()):
    print(f"  {word}: {count}")

Output:
  be: 2
  is: 1
  not: 1
  or: 1
  question: 1
  that: 1
  the: 1
  to: 2

🔄 Spaced Review (Ch 5): This pattern — looping over a sequence to build up a result — is exactly the accumulator pattern from Chapter 5. Here the accumulator is a dictionary instead of a number, but the structure is the same.

Iterating Line by Line

Multi-line strings (using triple quotes) or strings read from files contain newline characters. Use splitlines() or split("\n") to process them:

def parse_fasta_header(fasta_text):
    """Extract sequence headers from FASTA-formatted text."""
    headers = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Remove the '>' and strip whitespace
            header = line[1:].strip()
            headers.append(header)
    return headers


fasta_data = """>gi|12345|ref|NM_001.2| BRCA1 gene
ATCGATCGATCGATCGATCG
GCTAGCTAGCTAGCTAGCTA
>gi|67890|ref|NM_002.1| TP53 gene
TTTTAAAACCCCGGGG
>gi|11111|ref|NM_003.3| MYC gene
AAAGGGCCCTTTTAAA"""

headers = parse_fasta_header(fasta_data)
for h in headers:
    print(h)

Output:
gi|12345|ref|NM_001.2| BRCA1 gene
gi|67890|ref|NM_002.1| TP53 gene
gi|11111|ref|NM_003.3| MYC gene

This is a simplified version of what Dr. Patel does every day. Real FASTA files can contain millions of sequences, but the parsing logic is exactly this.
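
A natural next step is to keep the sequence that follows each header as well. The sketch below is one possible approach, not a full FASTA parser: parse_fasta is a name chosen for this example, it assumes every header is unique, and it joins multi-line sequences into a single string.

```python
def parse_fasta(fasta_text):
    """Return a dict mapping each header to its full sequence."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = ""
        elif current is not None:
            records[current] += line      # builds a new, longer string


    return records


sample = """>seq1 demo
ATCG
GGCC
>seq2 demo
TTAA"""

for header, seq in parse_fasta(sample).items():
    print(header, len(seq), seq)
# Output:
# seq1 demo 8 ATCGGGCC
# seq2 demo 4 TTAA
```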

7.7 Input Validation with Strings

Here's a scenario you've encountered if you've built any interactive programs: you ask the user for a number, and they type "forty-two". Your program calls int("forty-two") and crashes with a ValueError.

The fix is to validate first, convert second. Python strings have built-in methods for checking what kind of characters they contain:

test_strings = ["42", "3.14", "hello", "Hello42", "   ", ""]

for s in test_strings:
    print(f"{s!r:10s}  isdigit={str(s.isdigit()):5s}  "
          f"isalpha={str(s.isalpha()):5s}  "
          f"isalnum={str(s.isalnum()):5s}")

Output:
'42'        isdigit=True   isalpha=False  isalnum=True
'3.14'      isdigit=False  isalpha=False  isalnum=False
'hello'     isdigit=False  isalpha=True   isalnum=True
'Hello42'   isdigit=False  isalpha=False  isalnum=True
'   '       isdigit=False  isalpha=False  isalnum=False
''          isdigit=False  isalpha=False  isalnum=False

Key things to notice:
- isdigit() returns True only if every character is a digit. It doesn't handle decimals, negatives, or spaces.
- isalpha() returns True only if every character is a letter. Spaces and numbers disqualify it.
- isalnum() returns True if every character is a letter or a digit.
- Empty strings return False for all three.

Grade Calculator: Validating Score Input

Let's apply this to the grade calculator running example. We need to make sure the user enters an actual number before we try to compute with it:

def get_valid_score(prompt):
    """Keep asking until the user enters a valid integer score (0-100)."""
    while True:
        raw = input(prompt).strip()

        if not raw:
            print("  Please enter a score — don't leave it blank.")
            continue

        if not raw.isdigit():
            print(f"  '{raw}' is not a valid number. Enter digits only (0-100).")
            continue

        score = int(raw)  # Safe to convert now — we know it's all digits
        if score < 0 or score > 100:
            print(f"  {score} is out of range. Enter a score between 0 and 100.")
            continue

        return score


# Usage
score = get_valid_score("Enter test score: ")
print(f"Score recorded: {score}")

Example session:
Enter test score:
  Please enter a score — don't leave it blank.
Enter test score: forty-two
  'forty-two' is not a valid number. Enter digits only (0-100).
Enter test score: -5
  '-5' is not a valid number. Enter digits only (0-100).
Enter test score: 85
Score recorded: 85

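⚠️ Caveat: isdigit() doesn't handle negative numbers (the minus sign isn't a digit) or decimal points. It also accepts a few Unicode digit characters, such as the superscript "²", that int() rejects; the stricter isdecimal() method avoids that edge case. For more sophisticated numeric validation, you'll learn about try/except in Chapter 11, which is the Pythonic way to handle this. For now, isdigit() cleanly handles the most common case: non-negative integers typed with ordinary digits.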
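
Until you reach try/except, you can stretch the string-inspection approach a little further. The sketch below accepts an optional leading minus sign and at most one decimal point. The name looks_like_number is hypothetical, and the function uses isdecimal(), a stricter relative of isdigit() that is True only for characters int() and float() can convert. Treat it as an illustration of combining string methods, not as a complete number parser:

```python
def looks_like_number(text):
    """Return True for strings such as '42', '-5', '3.14', or '-0.5'."""
    text = text.strip()
    if text.startswith("-"):
        text = text[1:]                  # allow one leading minus sign
    digits = text.replace(".", "", 1)    # allow at most one decimal point
    # isdecimal() is False for "" and for anything other than digits
    return digits.isdecimal()


for candidate in ["42", "-5", "3.14", "forty-two", "1.2.3", "-", ""]:
    print(f"{candidate!r:12s} {looks_like_number(candidate)}")
```

The first three candidates print True; the rest print False.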

🔄 Check Your Understanding #2

What does " 123 ".strip().isdigit() return? What about "12.5".isdigit()?

Answer

`" 123 ".strip().isdigit()` returns `True`. The `strip()` removes whitespace, leaving `"123"`, and `isdigit()` confirms all characters are digits. `"12.5".isdigit()` returns `False`. The decimal point `.` is not a digit.

7.8 Advanced Formatting with f-Strings

You've been using basic f-strings since Chapter 3: f"Hello, {name}!". But f-strings have a powerful formatting mini-language that lets you control exactly how values are displayed.

The syntax is {value:format_spec}, where the format spec comes after a colon.

Width and Alignment

# Right-aligned in a field of 10 characters (default for numbers)
print(f"{'Price':>10s}: {'Amount':>10s}")
print(f"{'='*10:s}: {'='*10:s}")
print(f"{9.99:>10.2f}: {3:>10d}")
print(f"{149.50:>10.2f}: {1:>10d}")
print(f"{1099.00:>10.2f}: {2:>10d}")

Output:
     Price:     Amount
==========: ==========
      9.99:          3
    149.50:          1
   1099.00:          2

The alignment characters:
- < left-align (default for strings)
- > right-align (default for numbers)
- ^ center

# Alignment examples
name = "Python"
print(f"|{name:<20s}|")   # Left-aligned
print(f"|{name:>20s}|")   # Right-aligned
print(f"|{name:^20s}|")   # Centered
print(f"|{name:*^20s}|")  # Centered with fill character

Output:
|Python              |
|              Python|
|       Python       |
|*******Python*******|

Decimal Places

pi = 3.141592653589793

print(f"Default:    {pi}")           # 3.141592653589793
print(f"2 decimals: {pi:.2f}")       # 3.14
print(f"4 decimals: {pi:.4f}")       # 3.1416   (rounds!)
print(f"0 decimals: {pi:.0f}")       # 3

Thousands Separator

population = 8045311

print(f"Population: {population:,}")        # 8,045,311
print(f"Population: {population:_}")        # 8_045_311
print(f"Budget: ${2_500_000.75:,.2f}")      # $2,500,000.75

Percentage

ratio = 0.8567

print(f"Pass rate: {ratio:.1%}")     # 85.7%
print(f"Pass rate: {ratio:.0%}")     # 86%

Combining Format Specs

Format specs can be combined. The general pattern is {value:[fill][align][width][,][.precision][type]}, written with no spaces between the parts:

# Practical example: formatted report
students = [
    ("Alice Chen", 92.567, 0.945),
    ("Bob Martinez", 87.333, 0.892),
    ("Carol Washington", 95.100, 0.971),
]

print(f"{'Student':<20s} {'Average':>8s} {'Attendance':>11s}")
print("-" * 41)
for name, avg, attend in students:
    print(f"{name:<20s} {avg:>8.1f} {attend:>11.1%}")

Output:
Student              Average  Attendance
-----------------------------------------
Alice Chen               92.6       94.5%
Bob Martinez             87.3       89.2%
Carol Washington         95.1       97.1%

This is what professional output looks like — clean columns, consistent alignment, appropriate precision. You'll use this pattern in the TaskFlow project below.
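
To see how the pieces of a format spec stack up, here is a short sketch that applies fill, alignment, width, a thousands separator, and precision to a single number. The square brackets are there only to make the padding visible:

```python
value = 1234.5678

print(f"[{value:*>12,.2f}]")   # Output: [****1,234.57]
print(f"[{value:<12.1f}]")     # Output: [1234.6      ]
print(f"[{value:^12.0f}]")     # Output: [    1235    ]

# Width and precision can come from variables via nested braces
width, places = 10, 3
print(f"[{value:>{width}.{places}f}]")   # Output: [  1234.568]
```

The nested-brace form is useful when the column width is computed at run time, for example from the longest name in a list.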

7.9 Escape Characters and Raw Strings

Some characters can't be typed directly into a string. You need escape characters — special sequences that start with a backslash (\).

Common Escape Characters

# Newline: \n
print("Line one\nLine two")
# Output:
# Line one
# Line two

# Tab: \t
print("Name\tAge\tCity")
print("Alice\t30\tNew York")
# Output:
# Name    Age     City
# Alice   30      New York

# Backslash: \\
print("C:\\Users\\Documents\\file.txt")
# Output: C:\Users\Documents\file.txt

# Quote inside a string: \" or \'
print("She said, \"Hello!\"")
# Output: She said, "Hello!"
print('It\'s a beautiful day')
# Output: It's a beautiful day

Escape Character Reference

Escape	Meaning	Example Output
`\n`	Newline	Line break
`\t`	Tab	Horizontal tab
`\\`	Literal backslash	`\`
`\"`	Double quote	`"`
`\'`	Single quote	`'`
`\0`	Null character	(non-printing character)

Raw Strings

Sometimes you need a string with lots of backslashes — file paths on Windows, regular expressions (Chapter 22), or LaTeX formulas. Escaping every backslash gets tedious:

# Without raw string — need to double every backslash
path = "C:\\Users\\Patel\\Documents\\sequences\\data.fasta"

# With raw string — backslashes are literal
path = r"C:\Users\Patel\Documents\sequences\data.fasta"

print(path)  # Output: C:\Users\Patel\Documents\sequences\data.fasta

A raw string is created by prefixing the string with r or R. Inside a raw string, backslashes are treated as literal characters — no escape processing happens.

# Regular string: \n is a newline
print("Hello\nWorld")
# Output:
# Hello
# World

# Raw string: \n is literally backslash-n
print(r"Hello\nWorld")
# Output: Hello\nWorld

💡 Tip: You'll use raw strings extensively in Chapter 22 when we cover regular expressions. For now, just know they exist and that they're useful for Windows file paths and any string with literal backslashes.
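
If you want to see what the r prefix changes, compare string lengths. The path in this sketch is invented so that it happens to contain \n and \t:

```python
broken = "C:\new\table"      # \n and \t are read as newline and tab
escaped = "C:\\new\\table"   # every backslash doubled
raw = r"C:\new\table"        # raw string, backslashes kept as typed

print(len(broken))           # Output: 10
print(len(raw))              # Output: 12
print(escaped == raw)        # Output: True

# A raw string changes how the literal is read, not the type of the value
print(len("\n"), len(r"\n"))   # Output: 1 2
```

One limitation to be aware of: a raw string cannot end with a single backslash, so a literal like r"C:\data\" is a syntax error.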

🔄 Check Your Understanding #3

What does the following code print?

print("A\tB\tC")
print(r"A\tB\tC")
print("Line1\nLine2")
print(len("Hello\n"))

Answer

A   B   C
A\tB\tC
Line1
Line2
6

The first `print` outputs tab-separated characters. The second, being a raw string, outputs the literal backslash-t characters. The third outputs two lines. The `len("Hello\n")` is 6 because `\n` is a single character (newline), so the string contains H-e-l-l-o-newline = 6 characters.

7.10 Project Checkpoint: TaskFlow v0.6

Time to put everything together. In Chapter 6, we refactored TaskFlow into functions. Now we'll add two new features that use string processing:

Search tasks by keyword — case-insensitive using lower()
Formatted task display — aligned columns using f-string formatting

Here's the updated version. The new pieces are search_tasks() and the aligned table in list_tasks():

"""
TaskFlow v0.6 — Task manager with search and formatted display.

New in v0.6:
  - search_tasks(): case-insensitive keyword search
  - Formatted display with aligned columns
"""

# --- Task storage ---
tasks = []


def add_task():
    """Prompt for a task description and priority, then add to the list."""
    description = input("Task description: ").strip()
    if not description:
        print("  Task description cannot be empty.")
        return

    priority = input("Priority (high/medium/low): ").strip().lower()
    if priority not in ("high", "medium", "low"):
        print(f"  '{priority}' is not valid. Using 'medium'.")
        priority = "medium"

    tasks.append({"description": description, "priority": priority})
    print(f"  Added: '{description}' [{priority}]")


def list_tasks():
    """Display all tasks in a formatted table."""
    if not tasks:
        print("  No tasks yet.")
        return

    # Header
    print(f"\n  {'#':<4s} {'Description':<35s} {'Priority':>10s}")
    print(f"  {'-'*4} {'-'*35} {'-'*10}")

    # Task rows
    for i, task in enumerate(tasks, start=1):
        desc = task["description"]
        pri = task["priority"]

        # Truncate long descriptions
        if len(desc) > 33:
            desc = desc[:30] + "..."

        print(f"  {i:<4d} {desc:<35s} {pri:>10s}")

    print(f"\n  Total: {len(tasks)} task(s)")


def delete_task():
    """Delete a task by its number."""
    list_tasks()
    if not tasks:
        return

    raw = input("Delete task number: ").strip()
    if not raw.isdigit():
        print(f"  '{raw}' is not a valid number.")
        return

    num = int(raw)
    if num < 1 or num > len(tasks):
        print(f"  No task #{num}. Enter 1-{len(tasks)}.")
        return

    removed = tasks.pop(num - 1)
    print(f"  Deleted: '{removed['description']}'")


def search_tasks():
    """Search tasks by keyword (case-insensitive)."""
    keyword = input("Search keyword: ").strip().lower()
    if not keyword:
        print("  Please enter a search term.")
        return

    matches = []
    for i, task in enumerate(tasks, start=1):
        if keyword in task["description"].lower():
            matches.append((i, task))

    if not matches:
        print(f"  No tasks matching '{keyword}'.")
        return

    print(f"\n  Found {len(matches)} match(es) for '{keyword}':")
    print(f"  {'#':<4s} {'Description':<35s} {'Priority':>10s}")
    print(f"  {'-'*4} {'-'*35} {'-'*10}")

    for num, task in matches:
        desc = task["description"]
        if len(desc) > 33:
            desc = desc[:30] + "..."
        print(f"  {num:<4d} {desc:<35s} {task['priority']:>10s}")


def show_menu():
    """Display the main menu."""
    print("\n--- TaskFlow v0.6 ---")
    print("1. Add task")
    print("2. List tasks")
    print("3. Delete task")
    print("4. Search tasks")
    print("5. Quit")


def main():
    """Main loop for the TaskFlow application."""
    print("Welcome to TaskFlow v0.6!")
    print("Now with search and formatted display.\n")

    while True:
        show_menu()
        choice = input("\nChoose (1-5): ").strip()

        if choice == "1":
            add_task()
        elif choice == "2":
            list_tasks()
        elif choice == "3":
            delete_task()
        elif choice == "4":
            search_tasks()
        elif choice == "5":
            print("Goodbye!")
            break
        else:
            print(f"  '{choice}' is not a valid option.")


if __name__ == "__main__":
    main()

What's New in v0.6

Let's break down the string techniques used:

strip() on every input() call — cleans up accidental whitespace
lower() for case-insensitive priority and search — "BUY groceries".lower() becomes "buy groceries", matching a search for "buy" or "groceries"
isdigit() to validate the delete number before calling int()
f-string alignment (:<4d, :<35s, :>10s) for clean column output
String truncation with slicing (desc[:30] + "...") for long descriptions
in operator for substring search (keyword in task["description"].lower())
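
To try several of these techniques in isolation, here is a minimal sketch that separates the string logic from input() and print() so it can be run directly in the interpreter. The helper names truncate, format_row, and find_matches are hypothetical and not part of TaskFlow, and the sample tasks are made up. The column widths and the 33-character cutoff mirror the listing above.

```python
def truncate(desc, limit=33, keep=30):
    """Return desc unchanged, or its first `keep` characters plus '...' if too long."""
    if len(desc) > limit:
        return desc[:keep] + "..."
    return desc


def format_row(num, task):
    """Build one table row using the same widths as list_tasks()."""
    desc = truncate(task["description"])
    return f"  {num:<4d} {desc:<35s} {task['priority']:>10s}"


def find_matches(tasks, keyword):
    """Return (number, task) pairs whose description contains keyword, ignoring case."""
    keyword = keyword.strip().lower()
    matches = []
    if not keyword:
        return matches
    for i, task in enumerate(tasks, start=1):
        if keyword in task["description"].lower():
            matches.append((i, task))
    return matches


# Made-up sample data for illustration
sample = [
    {"description": "BUY groceries", "priority": "high"},
    {"description": "Write the quarterly report for the finance team", "priority": "low"},
]

# Extra spaces and mixed case in the keyword do not affect the result
for num, task in find_matches(sample, "  Buy "):
    print(format_row(num, task))
```

Because these helpers take arguments and return values instead of prompting the user, each one can be checked on its own, for example by calling truncate() with a 34-character string to see exactly where the cutoff applies.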

💡 Looking Ahead: In Chapter 8, we'll convert tasks from dictionaries to tuples and add sorting by priority. In Chapter 9, we'll use dictionaries more fully for category filtering. And in Chapter 10, we'll save tasks to a file — making TaskFlow persistent.

🐛 Debugging Walkthrough: Immutability TypeError

Here's a common debugging scenario. A student writes this code to censor a word in a sentence:

def censor(text, word):
    """Replace a word with asterisks."""
    position = text.find(word)
    if position != -1:
        for i in range(position, position + len(word)):
            text[i] = "*"     # TypeError!
    return text

result = censor("The password is secret123", "secret123")

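The error: TypeError: 'str' object does not support item assignment

The cause: text[i] = "*" tries to overwrite a single character in place. Strings are immutable, so Python rejects the assignment.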

The fix: Use replace() instead:

def censor(text, word):
    """Replace a word with asterisks."""
    replacement = "*" * len(word)
    return text.replace(word, replacement)

result = censor("The password is secret123", "secret123")
print(result)  # Output: The password is *********

Or, using slicing and concatenation:

def censor(text, word):
    """Replace a word with asterisks."""
    position = text.find(word)
    if position == -1:
        return text
    stars = "*" * len(word)
    return text[:position] + stars + text[position + len(word):]

result = censor("The password is secret123", "secret123")
print(result)  # Output: The password is *********

Both approaches create a new string rather than trying to modify the original. The replace() version is cleaner and handles multiple occurrences automatically, whereas the slicing version censors only the first occurrence that find() locates.
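
A minimal sketch of how the two fixes behave when the word appears more than once. The names censor_all and censor_first are hypothetical, chosen here only to keep both versions side by side, and the sample sentence is made up.

```python
def censor_all(text, word):
    """Censor every occurrence of word using replace()."""
    return text.replace(word, "*" * len(word))


def censor_first(text, word):
    """Censor only the first occurrence of word using find() and slicing."""
    position = text.find(word)
    if position == -1:
        return text
    stars = "*" * len(word)
    return text[:position] + stars + text[position + len(word):]


original = "secret plans stay secret"
print(censor_all(original, "secret"))    # ****** plans stay ******
print(censor_first(original, "secret"))  # ****** plans stay secret
print(original)                          # secret plans stay secret
```

The last line confirms that neither function touched the original string. Each one built and returned a new string.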

Chapter Summary

Strings are sequences of characters, and they're the most common data type in real-world software. Here's what you've learned:

Indexing accesses individual characters: s[0] (first), s[-1] (last)
Slicing extracts substrings: s[start:stop:step], with stop being exclusive
Strings are immutable — you can't change them in place, only create new ones
String methods like split(), join(), strip(), replace(), find(), upper(), and lower() are your daily tools for text processing
Inspection methods like isdigit(), isalpha(), and isalnum() validate input
f-string format specs control width, alignment, decimal places, and separators
Escape characters (\n, \t, \\) represent special characters; raw strings (r"...") disable escape processing
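
As a quick recap, the sketch below exercises one example from each item in the list. The sample values are made up for illustration, and the expected results appear in the comments.

```python
raw_title = "  TaskFlow v0.6  "
title = raw_title.strip()              # "TaskFlow v0.6"

first, last = title[0], title[-1]      # "T" and "6"
name = title[:8]                       # "TaskFlow" (index 8 is excluded)
backwards = title[::-1]                # "6.0v wolFksaT"

words = title.lower().split()          # ["taskflow", "v0.6"]
joined = "-".join(["a", "b", "c"])     # "a-b-c"

whole = "42".isdigit()                 # True
decimal = "4.2".isdigit()              # False, because "." is not a digit

amount = f"{1234567.891:>15,.2f}"      # "   1,234,567.89"

tabbed = "a\tb"                        # 3 characters: a, tab, b
literal = r"a\tb"                      # 4 characters: a, backslash, t, b

print(first, last, name, backwards)
print(words, joined, whole, decimal)
print(amount)
print(len(tabbed), len(literal))
```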

Immutability, the threshold concept of this chapter, will come up again when you learn about tuples in Chapter 8. It becomes even more important when you study mutability and aliasing, the threshold concept of that chapter. The contrast between immutable strings and mutable lists is one of the most important distinctions in Python.

What's next: Chapter 8 introduces lists and tuples — mutable and immutable sequences. You'll see how the indexing and slicing you learned here apply to all sequence types, and you'll encounter the flip side of immutability: what happens when objects can be changed in place.