Wednesday, July 2, 2025

🔍 Testing an ML Model ≠ Testing Traditional Code


 Testing a Machine Learning (ML) model is very different from testing traditional software because:

  • The output is probabilistic: correctness is judged statistically over many inputs, not by one exact expected value.

  • The behavior depends on data patterns, not just logic.

To test an ML model effectively, you need a multi-layered strategy combining functional, data-driven, and performance-based testing.


✅ 1. Unit Testing the ML Pipeline (Code-Level)

🔍 What to Test:

  • Data preprocessing methods (normalization, encoding)

  • Feature extraction logic

  • Model loading and inference function

💡 Example:

# Check that the normalization function scales every value into [0, 1]

assert all(0 <= v <= 1 for v in normalize(data))
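
As a slightly fuller sketch of the same idea, the snippet below unit-tests a min-max scaler in plain Python. The `normalize` helper here is a hypothetical stand-in (the post does not show the real one), and the constant-input behavior is an assumption you would adapt to your own pipeline.

```python
def normalize(values):
    """Min-max scale a list of numbers into [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero and map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_normalize_range():
    scaled = normalize([3.0, 7.5, -2.0, 10.0])
    # Every value lands inside [0, 1], and the extremes hit the bounds
    assert all(0.0 <= v <= 1.0 for v in scaled)
    assert min(scaled) == 0.0 and max(scaled) == 1.0


def test_normalize_constant_input():
    # Edge case: all values identical
    assert normalize([5.0, 5.0, 5.0]) == [0.0, 0.0, 0.0]


# pytest would discover the test_* functions; they are called directly here
test_normalize_range()
test_normalize_constant_input()
```

Checking the whole output plus one degenerate case tends to catch more bugs than asserting on a single element.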


✅ 2. Input/Output Testing (Functional Testing)

🔍 What to Test:

  • Given a known input, does the model return an expected class/label/output?

  • Test with edge-case inputs (nulls, NaNs, unexpected formats)

💡 Example:

# Input: image of a cat
# Expected Output: "cat" label
assert model.predict(cat_image) == "cat"
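
The edge-case bullet can be sketched the same way. The `predict` function below is a toy stand-in for a real inference function, and the choice to raise `ValueError` on bad input is an assumption; some teams prefer returning an explicit "unknown" label instead.

```python
import math


def predict(features):
    """Toy stand-in for a model's inference function (hypothetical)."""
    if features is None or len(features) == 0:
        raise ValueError("features must be a non-empty sequence")
    if any(f is None or math.isnan(f) for f in features):
        raise ValueError("features must not contain None or NaN")
    return "cat" if sum(features) / len(features) > 0.5 else "dog"


def rejects(bad_input):
    """Return True if predict() raises ValueError for bad_input."""
    try:
        predict(bad_input)
    except ValueError:
        return True
    return False


# Known input -> expected label
assert predict([0.9, 0.8, 0.7]) == "cat"

# Edge cases should fail loudly rather than return a silent guess
for bad in (None, [], [0.1, None], [0.2, float("nan")]):
    assert rejects(bad)
```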


✅ 3. Accuracy / Precision / Recall Testing (Performance Testing)

🔍 What to Test:

  • Use a test dataset with known labels

  • Compare predictions vs ground truth

  • Track metrics: accuracy, precision, recall, F1-score

💡 Example:

from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)  # y_test: ground-truth labels for X_test

assert accuracy > 0.90 
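
Accuracy alone can hide problems on imbalanced data, so it often helps to gate on precision, recall, and F1 as well. The sketch below computes them by hand in plain Python so the arithmetic is visible; the labels and the 0.70 thresholds are made up for illustration. In practice `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only (not real results)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# Thresholds are assumptions; pick them from your own requirements
assert precision >= 0.70
assert recall >= 0.70
assert f1 >= 0.70
```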


✅ 4. Regression Testing

🔍 What to Test:

  • When retraining or updating the model, ensure performance doesn’t degrade

  • Compare new vs old model predictions on same test set

💡 Example:

assert new_model_accuracy >= old_model_accuracy - 0.01
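
Aggregate accuracy can stay flat while individual predictions flip, so a regression check may also want to list the examples the old model got right and the new one gets wrong. A minimal sketch, with made-up predictions and the same 0.01 tolerance as above:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def regressions(y_true, old_pred, new_pred):
    """Indices the old model got right but the new model gets wrong."""
    return [
        i
        for i, (t, o, n) in enumerate(zip(y_true, old_pred, new_pred))
        if o == t and n != t
    ]


# Illustrative data only: both models scored on the same frozen test set
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
old_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]
new_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]

TOLERANCE = 0.01  # allowed drop in accuracy, as in the example above

assert accuracy(y_true, new_pred) >= accuracy(y_true, old_pred) - TOLERANCE

# Surface per-example regressions for manual review
flipped = regressions(y_true, old_pred, new_pred)
print("newly broken examples:", flipped)
```

Keeping the test set frozen between runs matters here; otherwise the two accuracy numbers are not comparable.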


✅ 5. Robustness Testing

🔍 What to Test:

  • Slightly modified inputs (noise, adversarial examples)

  • Input format changes, missing values

💡 Example:

import numpy as np
assert np.allclose(model.predict(perturbed_input), model.predict(original_input))
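
One way to turn this into a repeatable test is noise injection: perturb each input many times and measure how often the prediction stays the same. The sketch below uses a toy threshold classifier; the noise scale, trial count, seed, and the 0.95 stability bar are all assumptions for illustration.

```python
import random


def predict(features):
    """Toy threshold classifier standing in for a real model (hypothetical)."""
    return 1 if sum(features) / len(features) > 0.5 else 0


def add_noise(features, scale, rng):
    """Add zero-mean Gaussian noise to every feature."""
    return [f + rng.gauss(0.0, scale) for f in features]


def stability(inputs, scale=0.01, trials=20, seed=0):
    """Fraction of noisy predictions that match the clean prediction."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    same = 0
    total = 0
    for x in inputs:
        base = predict(x)
        for _ in range(trials):
            same += int(predict(add_noise(x, scale, rng)) == base)
            total += 1
    return same / total


inputs = [[0.9, 0.8, 0.95], [0.1, 0.2, 0.05], [0.7, 0.9, 0.8]]

# Small noise should almost never change the predicted class
assert stability(inputs) >= 0.95
```

Inputs that sit close to a decision boundary will flip more often, so a stability score below the bar is a prompt to inspect those cases, not necessarily a bug.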


✅ 6. Bias & Fairness Testing

🔍 What to Test:

  • Check model behavior across different demographic groups

  • Ensure prediction quality (e.g., accuracy, error rates) is comparable across those groups

💡 Example:

# Accuracy across gender groups

assert abs(male_accuracy - female_accuracy) < 0.05 
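
The per-group accuracies in that assertion have to come from somewhere. A minimal sketch of computing them, using made-up labels and anonymous group names "a" and "b"; accuracy parity is only one of several possible fairness criteria, and the 0.05 gap is the same illustrative threshold as above.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for each distinct group value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Illustrative data only
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
assert max_accuracy_gap(per_group) < 0.05
```

Using the maximum gap generalizes the two-group check to any number of groups.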


✅ 7. Integration & Deployment Testing

🔍 What to Test:

  • Can the model be deployed?

  • Can it process input in production format (e.g., JSON via REST API)?

  • Is the latency acceptable?
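
💡 Example:

The checks above can be sketched without a running server by testing the request handler directly. Everything here is hypothetical: the `handle_request` function, the `{"features": [...]}` payload shape, and the 50 ms latency budget are assumptions, and a real deployment test would also call the live endpoint.

```python
import json
import time


def predict(features):
    """Toy stand-in for the deployed model (hypothetical)."""
    return "cat" if sum(features) / len(features) > 0.5 else "dog"


def handle_request(body):
    """Hypothetical handler: JSON string in, JSON string out."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or not features:
            raise ValueError("features must be a non-empty list")
        return json.dumps({"label": predict(features)})
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing keys and wrong types all become an error body
        return json.dumps({"error": str(exc)})


def p95_latency_ms(body, runs=200):
    """Approximate 95th-percentile handler latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        handle_request(body)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[int(0.95 * (len(timings) - 1))]


good_body = '{"features": [0.9, 0.8]}'

# Production-format input produces a well-formed response
assert json.loads(handle_request(good_body)) == {"label": "cat"}

# Bad input yields an error payload instead of crashing the service
assert "error" in json.loads(handle_request("not json"))

# Latency budget is an assumption; set it from your own requirements
assert p95_latency_ms(good_body) < 50.0
```

Measuring a percentile instead of the mean keeps occasional slow requests from hiding behind fast ones.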


🧠 Summary of ML Model Testing Layers

Layer               | What to Test                  | Tool Examples
Unit Testing        | Preprocessing, feature logic  | pytest, unittest
Functional Testing  | Input → Output checks         | assert, Golden datasets
Performance Metrics | Accuracy, Precision, F1       | sklearn.metrics
Regression Testing  | New vs Old Model consistency  | Custom scripts
Robustness          | Noisy, missing, edge inputs   | Fuzzing, noise injection
Bias/Fairness       | Group-wise comparison         | AIF360, Fairlearn, custom code
Integration         | API, latency, resource usage  | Postman, API tests, load tests

Complete script to test model can be found here
