Case Study 02: Dependency Hell and How AI Helped Escape It

Resolving Complex Dependency Conflicts with AI Guidance

Background

Marcus Chen was the lead developer of HealthTrack, an open-source health data analytics platform built in Python. The project had grown organically over three years, accumulating 47 direct dependencies and hundreds of transitive dependencies. It used Flask for the web interface, pandas and NumPy for data processing, scikit-learn for statistical analysis, Plotly for visualizations, and SQLAlchemy for database access.

The trouble started when Marcus attempted to upgrade their machine learning pipeline to use the latest version of scikit-learn, which included a new anomaly detection algorithm requested by several users. What should have been a simple version bump turned into a week-long odyssey through dependency conflicts that nearly forced a major architectural refactoring.

The Initial Conflict

Marcus started with what seemed like a straightforward change:

# Original requirements.txt (excerpt)
scikit-learn==1.2.2
pandas==1.5.3
numpy==1.24.3
scipy==1.10.1
plotly==5.14.1
flask==2.3.2
sqlalchemy==2.0.15
celery==5.3.1

He updated scikit-learn:

$ pip install scikit-learn==1.4.0

The first error appeared immediately:

ERROR: pip's dependency resolver does not currently take into account
all the packages that are installed.

Package requirements conflict:
- scikit-learn 1.4.0 requires numpy>=1.19.5,<2.0
- scikit-learn 1.4.0 requires scipy>=1.6.0
- pandas 1.5.3 requires numpy>=1.20.3
  (compatible, but pandas 1.5.3 is not compatible with numpy>=1.26)
- plotly 5.14.1 requires tenacity>=6.2.0
- some-ml-plugin 0.3.1 requires scikit-learn>=1.0,<1.3

The last line was the critical one: some-ml-plugin, a community plugin that Marcus had added six months earlier, had an upper bound constraint on scikit-learn that prevented the upgrade.
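Finding which packages constrain a given dependency is exactly the question a reverse dependency lookup answers. A minimal sketch of that lookup, with the dependency data hard-coded from the error output above (in practice it would come from a tool such as pipdeptree or importlib.metadata, so the dictionary here is purely illustrative):

```python
# Sketch: find which packages place a constraint on a given dependency.
# The dependency data below is hard-coded from the pip error excerpt;
# in a real project it would be read from pipdeptree output.

DEPENDENCIES = {
    "scikit-learn": {"numpy": ">=1.19.5,<2.0", "scipy": ">=1.6.0"},
    "some-ml-plugin": {"scikit-learn": ">=1.0,<1.3", "numpy": ">=1.20.0"},
    "pandas": {"numpy": ">=1.20.3"},
    "plotly": {"tenacity": ">=6.2.0"},
}

def reverse_deps(target):
    """Return {package: constraint} for every package that depends on target."""
    return {
        pkg: reqs[target]
        for pkg, reqs in DEPENDENCIES.items()
        if target in reqs
    }

print(reverse_deps("scikit-learn"))  # {'some-ml-plugin': '>=1.0,<1.3'}
```

Applied to this data, the lookup immediately isolates some-ml-plugin as the only package pinning scikit-learn.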

First AI Consultation: Understanding the Conflict Web

Marcus turned to AI with a comprehensive description:

I'm trying to upgrade scikit-learn from 1.2.2 to 1.4.0 in a Python
project with 47 direct dependencies. I'm hitting dependency conflicts.

Here's my current requirements.txt:
[full requirements.txt - 47 lines]

Here's the pip error when I try to upgrade:
[full error output]

And here's the output of pipdeptree showing the dependency tree:
[pipdeptree output - showing the conflict chain]

The key conflict seems to be:
1. scikit-learn 1.4.0 works with numpy>=1.19.5
2. But some-ml-plugin 0.3.1 requires scikit-learn<1.3
3. And pandas 1.5.3 has compatibility issues with newer numpy

I need scikit-learn 1.4.0 for its new anomaly detection features.
What are my options?

The AI provided a systematic analysis:

"You have a dependency conflict diamond. Here's the structure:

Your project
├── scikit-learn 1.4.0 (desired)
│   ├── numpy >=1.19.5
│   └── scipy >=1.6.0
├── some-ml-plugin 0.3.1
│   └── scikit-learn >=1.0, <1.3   ← CONFLICT
├── pandas 1.5.3
│   └── numpy >=1.20.3 (but tested only up to 1.24.x)
└── plotly 5.14.1
    └── (no numpy dependency)

You have three strategic options:

1. Replace some-ml-plugin — if you can find an alternative or fork the plugin
2. Upgrade pandas alongside scikit-learn — pandas 2.0+ supports newer numpy
3. Pin compatible versions — find a version combination that satisfies all constraints

I recommend starting with option 3 to understand the constraint space, then moving to option 2 if needed."
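Option 3 amounts to intersecting version ranges. A simplified sketch of that arithmetic, treating versions as integer tuples with only lower and upper bounds (real resolvers such as pip's handle the full specifier grammar, but the shape of this particular conflict reduces to exactly this):

```python
# Sketch: intersect simple lower/upper version bounds to see whether any
# scikit-learn version can satisfy all constraints at once. Versions are
# compared as tuples of ints; this ignores pre-releases and other
# specifier forms that a real resolver would handle.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def intersect(bounds):
    """bounds: list of (min_inclusive, max_exclusive_or_None) version pairs.
    Returns (lo, hi, empty) where empty means no version satisfies all."""
    lo = max(parse(b[0]) for b in bounds)
    uppers = [parse(b[1]) for b in bounds if b[1] is not None]
    hi = min(uppers) if uppers else None
    empty = hi is not None and lo >= hi
    return lo, hi, empty

# Constraints on scikit-learn from the conflict above:
#   Marcus wants        >=1.4.0
#   some-ml-plugin pins >=1.0, <1.3
lo, hi, empty = intersect([("1.4.0", None), ("1.0", "1.3")])
print(empty)  # True: no version can satisfy both
```

The empty intersection is the formal statement of the diamond: no amount of version pinning can reconcile `>=1.4.0` with `<1.3`.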

The Cascade Effect

Marcus tried option 3 first. He asked AI to help find compatible versions:

Can you help me find a version combination where scikit-learn>=1.4.0
works with all my other dependencies? Here are the version constraints
from PyPI for each package:

scikit-learn 1.4.0: numpy>=1.19.5, scipy>=1.6.0, threadpoolctl>=2.0.0
some-ml-plugin 0.3.1: scikit-learn>=1.0,<1.3, numpy>=1.20.0
pandas 2.0.0: numpy>=1.22.4, python-dateutil>=2.8.2, pytz>=2020.1
pandas 2.1.0: numpy>=1.23.2, python-dateutil>=2.8.2, pytz>=2020.1

The AI quickly determined that no version of some-ml-plugin was compatible with scikit-learn 1.4.0. The plugin had not been updated in eight months and its maintainer had not responded to GitHub issues about the version constraint.

Marcus then tried option 2 — upgrading pandas alongside scikit-learn. This triggered a second cascade:

$ pip install scikit-learn==1.4.0 pandas==2.1.0
ERROR: Cannot install pandas==2.1.0 and celery==5.3.1 because:
  pandas 2.1.0 requires numpy>=1.23.2
  celery 5.3.1 requires click>=8.1.2
  click 8.1.7 requires importlib-metadata; python_version < "3.8"
  But: flask 2.3.2 requires click>=8.1.3 (OK)

  Additional conflict:
  some-data-validator 1.2.0 requires pandas>=1.3,<2.0

Another upper-bound constraint — this time some-data-validator required pandas 1.x. Marcus shared this growing web of conflicts with AI:

The dependency situation is getting worse. Now I have TWO packages
with upper-bound constraints blocking my upgrade:

1. some-ml-plugin 0.3.1: scikit-learn<1.3 (blocks sklearn upgrade)
2. some-data-validator 1.2.0: pandas<2.0 (blocks pandas upgrade)

Both are community packages with low maintenance activity. I use
some-ml-plugin for exactly one function (custom_feature_selector)
and some-data-validator for its schema validation.

What's the best strategy here? I'm starting to think I need to
replace these packages rather than trying to find compatible versions.

The AI-Guided Resolution Strategy

The AI proposed a phased approach:

"Given that both blocking packages are low-activity dependencies used for limited functionality, I recommend:

Phase 1: Replace some-ml-plugin
The custom_feature_selector function likely wraps scikit-learn's SelectKBest or mutual_info_classif. If you share the function's code, I can help you write a direct implementation using scikit-learn's built-in features, eliminating the dependency entirely.

Phase 2: Replace some-data-validator
For schema validation, consider:
- pydantic (v2) — fast, well-maintained, no pandas dependency
- pandera — specifically designed for pandas DataFrame validation, supports pandas 2.x
- cerberus — lightweight, no pandas version constraints

Phase 3: Upgrade simultaneously
Once the blocking dependencies are removed, upgrade scikit-learn, pandas, and numpy together to compatible versions.

Phase 4: Add version management tooling
Switch from requirements.txt to pyproject.toml with pip-tools or poetry for better dependency resolution and lock file management."

Phase 1: Replacing some-ml-plugin

Marcus shared the plugin function he used:

from some_ml_plugin import custom_feature_selector

# Used in our pipeline like this:
selected_features = custom_feature_selector(
    X_train, y_train,
    method='mutual_info',
    n_features=20,
    threshold=0.05
)

He asked AI to help write a replacement:

Here's how I use custom_feature_selector. Can you help me write a
replacement using scikit-learn 1.4.0's built-in feature selection?
I need it to support mutual information scoring, selecting top N
features, and filtering by a minimum score threshold.

The AI provided a clean implementation using sklearn.feature_selection.SelectKBest with mutual_info_classif, which Marcus tested against the old implementation to verify identical results. The replacement was 35 lines of well-documented code that used no external dependencies beyond scikit-learn itself.
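The core of such a replacement is combining top-N selection with a minimum-score threshold. A sketch of that selection logic alone, with the scoring step factored out (in the real implementation the scores would come from sklearn.feature_selection.mutual_info_classif(X_train, y_train); the function below is illustrative, not the actual code Marcus shipped):

```python
# Sketch of the replacement's core logic: pick the top n_features by
# score, then drop any whose score falls below the threshold. In the
# real implementation the scores come from scikit-learn's
# mutual_info_classif; here they are passed in so the selection logic
# stands on its own.

def select_features(scores, n_features, threshold):
    """Return indices of the top n_features whose score is >= threshold,
    in descending score order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:n_features] if scores[i] >= threshold]

scores = [0.31, 0.02, 0.18, 0.09, 0.25]
print(select_features(scores, n_features=3, threshold=0.05))  # [0, 4, 2]
```

Testing this against the plugin's output on the same inputs, as Marcus did, is what makes the swap safe.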

Phase 2: Replacing some-data-validator

For schema validation, Marcus chose pandera because it was specifically designed for pandas DataFrames and supported pandas 2.x:

# Old approach with some-data-validator
from some_data_validator import validate_schema

schema = {
    'patient_id': {'type': 'int', 'required': True},
    'heart_rate': {'type': 'float', 'min': 30, 'max': 250},
    'timestamp': {'type': 'datetime', 'required': True}
}
validate_schema(df, schema)

# New approach with pandera
import pandera as pa

schema = pa.DataFrameSchema({
    'patient_id': pa.Column(int, nullable=False),
    'heart_rate': pa.Column(float, pa.Check.in_range(30, 250)),
    'timestamp': pa.Column('datetime64[ns]', nullable=False)
})
schema.validate(df)

Marcus asked AI to help translate all 12 schema definitions from the old format to pandera, which the AI completed accurately.
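Because the old format is a plain dict, translating 12 schemas is mechanical enough to script. A sketch of a converter that normalizes the old some-data-validator rules into a spec from which pandera Columns can be built (the intermediate dict format here is an invention for illustration; emitting pa.Column(...) calls directly would work the same way):

```python
# Sketch: normalize old some-data-validator schema dicts into a spec
# mirroring the pandera example above. The intermediate format is
# illustrative; a real converter would emit pa.Column objects directly.

TYPE_MAP = {"int": int, "float": float, "datetime": "datetime64[ns]"}

def convert(old_schema):
    new = {}
    for name, rules in old_schema.items():
        spec = {
            "dtype": TYPE_MAP[rules["type"]],
            "nullable": not rules.get("required", False),
        }
        if "min" in rules or "max" in rules:
            spec["in_range"] = (rules.get("min"), rules.get("max"))
        new[name] = spec
    return new

old = {
    "patient_id": {"type": "int", "required": True},
    "heart_rate": {"type": "float", "min": 30, "max": 250},
    "timestamp": {"type": "datetime", "required": True},
}
print(convert(old)["heart_rate"])
# {'dtype': <class 'float'>, 'nullable': True, 'in_range': (30, 250)}
```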

Phase 3: The Coordinated Upgrade

With both blocking dependencies removed, Marcus attempted the upgrade:

I've removed some-ml-plugin and some-data-validator. Now I need to
upgrade scikit-learn to 1.4.0 and pandas to 2.1.0. Here's my updated
requirements.txt (minus the two removed packages):

[updated requirements.txt]

Can you suggest a compatible set of versions for all packages?

The AI analyzed the constraint graph and suggested:

scikit-learn==1.4.0
pandas==2.1.4
numpy==1.26.3
scipy==1.12.0
pandera==0.18.0
flask==2.3.2      # No change needed
sqlalchemy==2.0.25 # Minor update for bug fixes
celery==5.3.6     # Minor update
plotly==5.18.0     # Updated for pandas 2.x compatibility

The installation succeeded. But the work was not done — Marcus needed to verify that the application still worked correctly with the new versions.

Phase 4: Post-Upgrade Debugging

After the upgrade, 23 of 340 tests failed. Marcus categorized the failures with AI's help:

After upgrading pandas from 1.5.3 to 2.1.4 and scikit-learn from
1.2.2 to 1.4.0, 23 tests are failing. Here are the error categories:

Category 1 (15 tests): FutureWarning turned into errors
  - df.append() deprecated, use pd.concat()
  - df.swaplevel() signature changed

Category 2 (5 tests): dtype changes
  - Integer columns now use nullable Int64 instead of int64
  - Some comparisons fail due to NA vs NaN differences

Category 3 (3 tests): sklearn API changes
  - n_features_in_ attribute now required
  - Renamed parameters in some estimators

Can you help me address each category systematically?

The AI provided migration guides for each category, which Marcus worked through over the course of a day. The pandas 1.x to 2.x migration required the most changes, primarily replacing deprecated DataFrame.append() calls with pd.concat() and updating dtype handling.

Phase 5: Preventing Future Dependency Pain

Finally, Marcus asked AI for advice on preventing this situation from recurring:

This dependency upgrade took almost a week. How can I prevent this
level of pain in the future? The project has 47 direct dependencies
and is growing.

The AI provided comprehensive recommendations:

  1. Switch to pyproject.toml with pip-compile — Separate abstract requirements (what you need) from concrete dependencies (exact versions). Use pip-compile to generate a lock file.

  2. Set up automated dependency update checks — Use Dependabot or Renovate to create PRs for dependency updates weekly, catching conflicts early when they are small.

  3. Minimize direct dependencies — Audit all 47 dependencies. Remove unused packages, replace large packages used for small features with lightweight alternatives or custom code.

  4. Avoid packages with upper-bound constraints — When evaluating new dependencies, check their version constraints. Packages that pin upper bounds on popular libraries are likely to cause conflicts.

  5. Run CI with dependency matrix — Test against both minimum and maximum supported versions of key dependencies (numpy, pandas, scikit-learn) to catch compatibility issues early.

  6. Create a dependency policy document — Define criteria for adding new dependencies: maintenance activity, version constraint philosophy, test coverage, and license compatibility.

Marcus implemented all six recommendations. He reduced the dependency count from 47 to 38 by removing unused packages and replacing three packages with custom implementations. He set up Renovate for automated updates and added a CI matrix for numpy and pandas versions.

Key Lessons

Upper-bound version constraints in dependencies are the primary cause of dependency hell. Both blocking packages used < constraints that prevented upgrades. When evaluating dependencies, check their constraint philosophy.

AI can analyze complex constraint graphs faster than humans. The dependency web involved dozens of packages with overlapping constraints. AI identified the blocking packages and suggested a resolution strategy within minutes.

Sometimes the right fix is to remove a dependency, not to resolve the conflict. Both blocking packages were used for minor features that could be implemented directly. The custom implementations were smaller, faster, and removed the constraint entirely.

Coordinate major upgrades. Upgrading one package at a time through a web of interdependencies is like trying to untangle a knot by pulling one string. Upgrading related packages together (scikit-learn + numpy + scipy + pandas) is more likely to find a compatible combination.

Prevention is cheaper than cure. The week spent resolving this conflict could have been avoided with automated dependency update checking that would have surfaced the conflicts months earlier, when the solutions would have been simpler.

The Role of AI Throughout

Throughout this ordeal, AI served as:

  • Constraint analyzer — Mapping the dependency graph and identifying blocking packages
  • Migration guide — Providing specific code changes for pandas and scikit-learn API updates
  • Alternative researcher — Suggesting replacement packages with compatible constraints
  • Strategy advisor — Recommending a phased approach rather than trying to solve everything at once
  • Prevention consultant — Providing a comprehensive strategy to avoid future dependency issues

Marcus estimated that AI reduced the resolution time from potentially two weeks to five days, primarily by eliminating the manual research phase of understanding each package's version constraints and finding compatible combinations. The AI's ability to reason about version constraint intersections across multiple packages simultaneously was particularly valuable — this is a task that is tedious and error-prone for humans but natural for AI.

The experience also changed how Marcus evaluated new dependencies. He now checks three things before adding any package: maintenance activity (last commit date, issue response time), version constraints (does it use upper bounds?), and alternatives (could he implement the needed functionality in less than 100 lines?). This discipline has kept the dependency count stable and prevented similar conflicts from developing.