Testing Protocol
This document provides step-by-step testing protocols for validating PyAMA functionality across all packages and components.
Overview
PyAMA testing follows a multi-layered approach:
Unit Tests: Individual function and class testing
Integration Tests: Component interaction testing
Visual Tests: Algorithm verification with visual output
Client Tests: API/UI integration testing for pyama-react
Performance Tests: Load and stress testing
Test Organization
Tests are organized within each package directory. Each package has its own tests/ subdirectory. The main test suite is currently in pyama/tests/:
{package}/tests/ (e.g., pyama/tests/)
├── _plots/ # Generated visual test outputs
├── _results/ # Test artifacts (gitignored)
├── __init__.py
├── conftest.py # Pytest configuration
├── analysis/ # Analysis model tests
│ ├── test_event.py
│ └── test_kinetic.py
├── features/ # Feature extraction tests
│ ├── __init__.py
│ ├── test_area.py
│ ├── test_intensity_total.py
│ └── test_particle_num.py
├── processing/ # Processing workflow tests
│ ├── test_merge.py
│ ├── test_normalization.py
│ ├── test_seg.py
│ └── test_track.py
└── utils/ # Utility function tests
├── __init__.py
└── progress.py
Running Tests
All Tests
# Run complete test suite
uv run pytest
# Run with coverage
uv run pytest --cov=pyama --cov-report=html
Specific Categories
# Unit tests only
uv run pytest tests/unit/
# Integration tests
uv run pytest tests/integration/
# Visual tests (require manual inspection)
uv run pytest pyama/tests/processing/test_seg.py -v
# Performance tests
uv run pytest tests/performance/
Visual Testing Guidelines
Plot Generation
All visual tests save plots to {package}/tests/_plots/ (e.g., pyama/tests/_plots/ for core tests):
def test_segmentation_round_cells():
"""Test segmentation on round synthetic cells."""
# Generate synthetic data
np.random.seed(42) # Deterministic RNG
# Create test image
image = generate_round_cells(n_cells=10, noise_level=0.1)
# Run segmentation
mask = segment_cells(image, method="logstd")
# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(image, cmap='gray')
axes[0].set_title('Original Image')
axes[1].imshow(mask, cmap='binary')
axes[1].set_title('Segmentation Result')
# Add boundaries
for i, prop in enumerate(regionprops(mask)):
if prop.area > 50: # Filter small objects
y0, x0, y1, x1 = prop.bbox
rect = Rectangle((x0, y0), x1-x0, y1-y0,
fill=False, edgecolor='red', linewidth=2)
axes[1].add_patch(rect)
# Save plot (adjust package name as needed)
plot_dir = os.getenv('PYAMA_PLOT_DIR', 'pyama/tests/_plots')
os.makedirs(plot_dir, exist_ok=True)
plt.savefig(f'{plot_dir}/segmentation_round_cells.pdf', dpi=150)
plt.close(fig) # important: close figure
# Assertions
n_cells = len([r for r in regionprops(mask) if r.area > 50])
assert n_cells == 10, f"Expected 10 cells, got {n_cells}"
Test Requirements
From AGENTS.md protocol rules:
Essential Tests Only
Event detection: noisy step up/down with event lines
Particle counting: many Gaussian particles with bounding boxes
Output Location
Always save to
{package}/tests/_plots/(e.g.,pyama/tests/_plots/for core tests)Override with
PYAMA_PLOT_DIRenvironment variable
Deterministic RNG
np.random.seed(42) # Or any fixed seed
Robust Assertions
# Good: Count matches expected assert len(detected_cells) >= expected_min # Bad: Tight numerical tolerance assert abs(mean_intensity, 2.345, 0.001) # Too strict
No OS-Specific Paths
# Bad: Linux temp tempfile.mktemp() # Don't use # Good: Current directory Path("test_output").mkdir(exist_ok=True)
PyAMA-React Client Testing Protocol
The desktop client should be tested end-to-end against a running pyama API server.
Manual Testing Checklist
Connection and startup
Client starts successfully via
bun run devBackend connection indicator reflects API availability
Processing workflow
ND2/CZI file selection works
Channel/feature config can be edited
Task creation succeeds and progress updates stream correctly
Task lifecycle
Task list refreshes status (
pending/running/succeeded/failed)Cancel operation works and updates UI state
Error handling
Network/API failures produce actionable UI messages
Invalid input is blocked with clear validation feedback
Integration Testing
Workflow Integration
# tests/integration/test_complete_workflow.py
def test_end_to_end_workflow():
"""Test complete workflow from ND2 to fitted results."""
# Setup
with TemporaryDirectory() as tmpdir:
output_dir = Path(tmpdir) / "output"
merged_dir = Path(tmpdir) / "merged"
# Step 1: Process ND2
config = create_test_config(output_dir)
success = run_complete_workflow(
metadata=test_metadata,
config=config,
fov_start=0,
fov_end=4, # Small batch
batch_size=2,
n_workers=2
)
assert success, "Processing workflow failed"
# Step 2: Verify outputs
fov_dirs = list(output_dir.glob("fov_*"))
assert len(fov_dirs) == 5
for fov_dir in fov_dirs:
trace_file = fov_dir / "test_fov_*_traces.csv"
assert trace_file.exists(), f"Missing trace in {fov_dir}"
# Verify CSV structure
df = pd.read_csv(trace_file)
required_columns = ["fov", "cell", "frame", "good"]
for col in required_columns:
assert col in df.columns, f"Missing column: {col}"
# Step 3: Merge results
sample_yaml = output_dir / "samples.yaml"
create_sample_file(sample_yaml)
run_merge(sample_yaml, output_dir, merged_dir)
# Verify merged files
merged_files = list(merged_dir.glob("*_merged.csv"))
assert len(merged_files) > 0
# Step 4: Analyze results
for merged_file in merged_files:
df = pd.read_csv(merged_file)
model = get_model("maturation")
result = fit_model(model, df['time'], df['value'])
assert result['success'], f"Fitting failed for {merged_file}"
assert result['r_squared'] > 0.5, "Poor fit quality"
API Integration
# tests/integration/test_api_workflow.py
def test_api_complete_workflow():
"""Test API workflow endpoints."""
client = TestClient(app)
# Step 1: Load metadata
response = client.post(
"/api/v1/processing/load-metadata",
json={"file_path": TEST_ND2_PATH}
)
assert response.status_code == 200
metadata = response.json()["data"]
# Step 2: Start workflow
config = {
"microscopy_path": TEST_ND2_PATH,
"output_dir": TEST_OUTPUT_DIR,
"channels": {
"phase": {"channel": 0, "features": ["area"]},
"fluorescence": []
},
"parameters": {"fov_start": 0, "fov_end": 4}
}
response = client.post("/api/v1/processing/workflow/start", json=config)
assert response.status_code == 200
job_id = response.json()["data"]["job_id"]
# Step 3: Monitor completion
for _ in range(60): # 60 second timeout
response = client.get(f"/api/v1/processing/workflow/status/{job_id}")
status = response.json()["data"]["status"]
if status == "completed":
break
elif status == "failed":
pytest.fail("Workflow failed")
time.sleep(1)
# Step 4: Get results
response = client.get(f"/api/v1/processing/workflow/results/{job_id}")
assert response.status_code == 200
results = response.json()["data"]
assert len(results["traces"]) == 5 # FOVs 0-4
Performance Testing
Memory Usage
# tests/performance/test_memory.py
def test_memory_usage_large_dataset():
"""Test memory usage with large datasets."""
import psutil
import os
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss / 1024 / 1024 # MB
# Process large dataset
config = create_large_dataset_config(n_fovs=50, n_frames=100)
with memory_monitor() as memory_log:
success = run_complete_workflow(
metadata=large_metadata,
config=config,
batch_size=2,
n_workers=4
)
peak_memory = max(memory_log)
memory_increase = peak_memory - initial_memory
# Should not exceed reasonable limits
assert memory_increase < 4096, f"Memory usage too high: {memory_increase} MB"
assert success, "Large dataset processing failed"
Processing Speed
# tests/performance/test_speed.py
def test_processing_speed():
"""Benchmark processing speed."""
import time
sizes = [(10, 50), (20, 100), (50, 200)] # (FOVs, frames)
speed_results = []
for n_fovs, n_frames in sizes:
start_time = time.time()
run_complete_workflow(
metadata=create_test_metadata(n_fovs=n_fovs, n_frames=n_frames),
config=test_config,
n_workers=4
)
elapsed = time.time() - start_time
cells_per_second = (n_fovs * n_frames * AVG_CELLS_PER_FOV) / elapsed
speed_results.append((n_fovs, n_frames, cells_per_second))
print(f"{n_fovs}x{n_frames}: {cells_per_second:.1f} cells/sec")
# Verify scaling is reasonable
assert speed_results[2][2] > speed_results[0][2], "No speed improvement with larger batches"
Data Validation
Synthetic Data Generation
# tests/utils/test_data.py
class SyntheticDataGenerator:
"""Generate test microscopy data with known properties."""
@staticmethod
def create_cell_tracks(n_cells: int, n_frames: int):
"""Create synthetic cell trajectories."""
tracks = []
for cell_id in range(n_cells):
# Random walk with drift
x = np.cumsum(np.random.randn(n_frames) * 0.5)
y = np.cumsum(np.random.randn(n_frames) * 0.5)
# Add linear drift
x += np.linspace(0, 10, n_frames)
y += np.linspace(0, 5, n_frames)
tracks.append({
'cell_id': cell_id,
'positions': np.column_stack([x, y])
})
return tracks
@staticmethod
def create_fluorescence_trace():
"""Create synthetic fluorescence with maturation kinetics."""
t = np.linspace(0, 30, 180) # 30 hours, 180 points
# Maturation model: f(t) = A * (1 - exp(-kt)) + B
A = 2.0 # Amplitude
k = 0.1 # Rate constant
B = 0.5 # Baseline
# Add noise
signal = A * (1 - np.exp(-k * t)) + B
noise = np.random.randn(len(t)) * 0.1
return t, signal + noise
CSV Validation
# tests/validation/test_csv.py
def validate_trace_csv(filepath: Path) -> bool:
"""Validate trace CSV format and content."""
try:
df = pd.read_csv(filepath)
# Required columns
required = ['fov', 'cell', 'frame', 'good']
for col in required:
if col not in df.columns:
return False
# Data types
assert df['fov'].dtype in [int, 'int64']
assert df['cell'].dtype in [int, 'int64']
assert df['frame'].dtype in [int, 'int64']
assert df['good'].dtype == bool
# Value ranges
assert df['frame'].min() >= 0
assert df['cell'].min() >= 1
assert len(df) > 0
return True
except Exception:
return False
Continuous Integration
GitHub Actions Workflow
# .github/workflows/test.yml
name: Test Suite
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.11', '3.12']
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install UV
run: pip install uv
- name: Install dependencies
run: uv sync --all-extras
- name: Run tests
run: uv run pytest --cov=pyama --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
Test Categories in CI
Unit Tests: Fast, run on all PRs
Integration Tests: Medium speed, run on PRs
Performance Tests: Slow, run on main branch
Visual Tests: Manual verification, artifacts saved
Test Data Management
Synthetic Test Data
Small datasets checked into repository
Large datasets generated on-the-fly
Deterministic seed for reproducibility
Real Test Data
Anonymized experimental data
Stored in separate repository
Accessed via Git LFS or download server
Test Artifacts
Plots saved to
_plots/Test reports in HTML format
Performance benchmarks as JSON
Debugging Tests
Debug Mode
# Run single test with debugging
uv run pytest pyama/tests/processing/test_merge.py -v -s --pdb
# Enable debug logging
PYAMA_LOG_LEVEL=DEBUG uv run pytest
Test Output
# In test files
import logging
logger = logging.getLogger(__name__)
def test_something():
logger.info("Starting test")
# ... test code ...
logger.debug(f"Intermediate result: {result}")
Common Issues
Flaky Tests
Use deterministic seeds
Add retry logic for network calls
Increase timeouts
Environment Specific
Use temp directories
Avoid hardcoded paths
Test on multiple platforms
Resource Exhaustion
Clean up resources in tearDown
Use timeouts for long operations
Monitor memory usage
Contributing to Tests
When adding new features:
Add Unit Tests
Test new functions/classes
Cover edge cases
Mock external dependencies
Add Integration Tests
Test feature in context
Verify end-to-end workflows
Include error conditions
Update Documentation
Add test examples
Document testing procedures
Update checklists
Performance Monitoring
Add benchmarks for significant changes
Monitor memory usage
Document performance characteristics
Test Metrics and Targets
Coverage Targets
Core packages: > 90% coverage
Client/UI packages: > 80% coverage
Utilities: > 95% coverage
Performance Targets
Small dataset (< 10 FOVs): < 5 minutes
Medium dataset (10-50 FOVs): < 30 minutes
Large dataset (> 50 FOVs): < 2 hours
Quality Targets
All tests pass on CI
Zero flaky tests
Memory usage < 4GB for typical datasets
This comprehensive testing protocol ensures PyAMA remains reliable, performant, and maintainable across all its components.