# Testing Protocol

This document provides step-by-step testing protocols for validating PyAMA functionality across all packages and components.

## Overview

PyAMA testing follows a multi-layered approach:
- **Unit Tests**: Individual function and class testing
- **Integration Tests**: Component interaction testing  
- **Visual Tests**: Algorithm verification with visual output
- **Client Tests**: API/UI integration testing for pyama-react
- **Performance Tests**: Load and stress testing

## Test Organization

Tests are organized within each package directory. Each package has its own `tests/` subdirectory. The main test suite is currently in `pyama/tests/`:

```
{package}/tests/  (e.g., pyama/tests/)
├── _plots/                    # Generated visual test outputs
├── _results/                  # Test artifacts (gitignored)
├── __init__.py
├── conftest.py                # Pytest configuration
├── analysis/                  # Analysis model tests
│   ├── test_event.py
│   └── test_kinetic.py
├── features/                  # Feature extraction tests
│   ├── __init__.py
│   ├── test_area.py
│   ├── test_intensity_total.py
│   └── test_particle_num.py
├── processing/                # Processing workflow tests
│   ├── test_merge.py
│   ├── test_normalization.py
│   ├── test_seg.py
│   └── test_track.py
└── utils/                     # Utility function tests
    ├── __init__.py
    └── progress.py
```

## Running Tests

### All Tests

```bash
# Run complete test suite
uv run pytest

# Run with coverage
uv run pytest --cov=pyama --cov-report=html
```

### Specific Categories

```bash
# Unit tests only
uv run pytest tests/unit/

# Integration tests
uv run pytest tests/integration/

# Visual tests (require manual inspection)
uv run pytest pyama/tests/processing/test_seg.py -v

# Performance tests
uv run pytest tests/performance/
```

## Visual Testing Guidelines

### Plot Generation

All visual tests save plots to `{package}/tests/_plots/` (e.g., `pyama/tests/_plots/` for core tests):

```python
def test_segmentation_round_cells():
    """Test segmentation on round synthetic cells."""
    # Generate synthetic data
    np.random.seed(42)  # Deterministic RNG
    
    # Create test image
    image = generate_round_cells(n_cells=10, noise_level=0.1)
    
    # Run segmentation
    mask = segment_cells(image, method="logstd")
    
    # Visualize results
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].imshow(image, cmap='gray')
    axes[0].set_title('Original Image')
    
    axes[1].imshow(mask, cmap='binary')
    axes[1].set_title('Segmentation Result')
    
    # Add boundaries
    for i, prop in enumerate(regionprops(mask)):
        if prop.area > 50:  # Filter small objects
            y0, x0, y1, x1 = prop.bbox
            rect = Rectangle((x0, y0), x1-x0, y1-y0, 
                           fill=False, edgecolor='red', linewidth=2)
            axes[1].add_patch(rect)
    
    # Save plot (adjust package name as needed)
    plot_dir = os.getenv('PYAMA_PLOT_DIR', 'pyama/tests/_plots')
    os.makedirs(plot_dir, exist_ok=True)
    plt.savefig(f'{plot_dir}/segmentation_round_cells.pdf', dpi=150)
    plt.close(fig)  # important: close figure
    
    # Assertions
    n_cells = len([r for r in regionprops(mask) if r.area > 50])
    assert n_cells == 10, f"Expected 10 cells, got {n_cells}"
```

### Test Requirements

From `AGENTS.md` protocol rules:

1. **Essential Tests Only**
   - Event detection: noisy step up/down with event lines
   - Particle counting: many Gaussian particles with bounding boxes

2. **Output Location**
   - Always save to `{package}/tests/_plots/` (e.g., `pyama/tests/_plots/` for core tests)
   - Override with `PYAMA_PLOT_DIR` environment variable

3. **Deterministic RNG**
   ```python
   np.random.seed(42)  # Or any fixed seed
   ```

4. **Robust Assertions**
   ```python
   # Good: Count matches expected
   assert len(detected_cells) >= expected_min
   
   # Bad: Tight numerical tolerance
   assert abs(mean_intensity, 2.345, 0.001)  # Too strict
   ```

5. **No OS-Specific Paths**
   ```python
   # Bad: Linux temp
   tempfile.mktemp()  # Don't use
   
   # Good: Current directory
   Path("test_output").mkdir(exist_ok=True)
   ```

## PyAMA-React Client Testing Protocol

The desktop client should be tested end-to-end against a running `pyama` API server.

### Manual Testing Checklist

1. **Connection and startup**
   - [ ] Client starts successfully via `bun run dev`
   - [ ] Backend connection indicator reflects API availability
2. **Processing workflow**
   - [ ] ND2/CZI file selection works
   - [ ] Channel/feature config can be edited
   - [ ] Task creation succeeds and progress updates stream correctly
3. **Task lifecycle**
   - [ ] Task list refreshes status (`pending/running/succeeded/failed`)
   - [ ] Cancel operation works and updates UI state
4. **Error handling**
   - [ ] Network/API failures produce actionable UI messages
   - [ ] Invalid input is blocked with clear validation feedback

## Integration Testing

### Workflow Integration

```python
# tests/integration/test_complete_workflow.py
def test_end_to_end_workflow():
    """Test complete workflow from ND2 to fitted results."""
    
    # Setup
    with TemporaryDirectory() as tmpdir:
        output_dir = Path(tmpdir) / "output"
        merged_dir = Path(tmpdir) / "merged"
        
        # Step 1: Process ND2
        config = create_test_config(output_dir)
        success = run_complete_workflow(
            metadata=test_metadata,
            config=config,
            fov_start=0,
            fov_end=4,  # Small batch
            batch_size=2,
            n_workers=2
        )
        assert success, "Processing workflow failed"
        
        # Step 2: Verify outputs
        fov_dirs = list(output_dir.glob("fov_*"))
        assert len(fov_dirs) == 5
        
        for fov_dir in fov_dirs:
            trace_file = fov_dir / "test_fov_*_traces.csv"
            assert trace_file.exists(), f"Missing trace in {fov_dir}"
            
            # Verify CSV structure
            df = pd.read_csv(trace_file)
            required_columns = ["fov", "cell", "frame", "good"]
            for col in required_columns:
                assert col in df.columns, f"Missing column: {col}"
        
        # Step 3: Merge results
        sample_yaml = output_dir / "samples.yaml"
        create_sample_file(sample_yaml)
        
        run_merge(sample_yaml, output_dir, merged_dir)
        
        # Verify merged files
        merged_files = list(merged_dir.glob("*_merged.csv"))
        assert len(merged_files) > 0
        
        # Step 4: Analyze results
        for merged_file in merged_files:
            df = pd.read_csv(merged_file)
            model = get_model("maturation")
            result = fit_model(model, df['time'], df['value'])
            assert result['success'], f"Fitting failed for {merged_file}"
            assert result['r_squared'] > 0.5, "Poor fit quality"
```

### API Integration

```python
# tests/integration/test_api_workflow.py
def test_api_complete_workflow():
    """Test API workflow endpoints."""
    
    client = TestClient(app)
    
    # Step 1: Load metadata
    response = client.post(
        "/api/v1/processing/load-metadata",
        json={"file_path": TEST_ND2_PATH}
    )
    assert response.status_code == 200
    metadata = response.json()["data"]
    
    # Step 2: Start workflow
    config = {
        "microscopy_path": TEST_ND2_PATH,
        "output_dir": TEST_OUTPUT_DIR,
        "channels": {
            "phase": {"channel": 0, "features": ["area"]},
            "fluorescence": []
        },
        "parameters": {"fov_start": 0, "fov_end": 4}
    }
    
    response = client.post("/api/v1/processing/workflow/start", json=config)
    assert response.status_code == 200
    job_id = response.json()["data"]["job_id"]
    
    # Step 3: Monitor completion
    for _ in range(60):  # 60 second timeout
        response = client.get(f"/api/v1/processing/workflow/status/{job_id}")
        status = response.json()["data"]["status"]
        
        if status == "completed":
            break
        elif status == "failed":
            pytest.fail("Workflow failed")
        
        time.sleep(1)
    
    # Step 4: Get results
    response = client.get(f"/api/v1/processing/workflow/results/{job_id}")
    assert response.status_code == 200
    
    results = response.json()["data"]
    assert len(results["traces"]) == 5  # FOVs 0-4
```

## Performance Testing

### Memory Usage

```python
# tests/performance/test_memory.py
def test_memory_usage_large_dataset():
    """Test memory usage with large datasets."""
    import psutil
    import os
    
    process = psutil.Process(os.getpid())
    initial_memory = process.memory_info().rss / 1024 / 1024  # MB
    
    # Process large dataset
    config = create_large_dataset_config(n_fovs=50, n_frames=100)
    
    with memory_monitor() as memory_log:
        success = run_complete_workflow(
            metadata=large_metadata,
            config=config,
            batch_size=2,
            n_workers=4
        )
    
    peak_memory = max(memory_log)
    memory_increase = peak_memory - initial_memory
    
    # Should not exceed reasonable limits
    assert memory_increase < 4096, f"Memory usage too high: {memory_increase} MB"
    assert success, "Large dataset processing failed"
```

### Processing Speed

```python
# tests/performance/test_speed.py
def test_processing_speed():
    """Benchmark processing speed."""
    import time
    
    sizes = [(10, 50), (20, 100), (50, 200)]  # (FOVs, frames)
    speed_results = []
    
    for n_fovs, n_frames in sizes:
        start_time = time.time()
        
        run_complete_workflow(
            metadata=create_test_metadata(n_fovs=n_fovs, n_frames=n_frames),
            config=test_config,
            n_workers=4
        )
        
        elapsed = time.time() - start_time
        cells_per_second = (n_fovs * n_frames * AVG_CELLS_PER_FOV) / elapsed
        
        speed_results.append((n_fovs, n_frames, cells_per_second))
        print(f"{n_fovs}x{n_frames}: {cells_per_second:.1f} cells/sec")
    
    # Verify scaling is reasonable
    assert speed_results[2][2] > speed_results[0][2], "No speed improvement with larger batches"
```

## Data Validation

### Synthetic Data Generation

```python
# tests/utils/test_data.py
class SyntheticDataGenerator:
    """Generate test microscopy data with known properties."""
    
    @staticmethod
    def create_cell_tracks(n_cells: int, n_frames: int):
        """Create synthetic cell trajectories."""
        tracks = []
        
        for cell_id in range(n_cells):
            # Random walk with drift
            x = np.cumsum(np.random.randn(n_frames) * 0.5)
            y = np.cumsum(np.random.randn(n_frames) * 0.5)
            
            # Add linear drift
            x += np.linspace(0, 10, n_frames)
            y += np.linspace(0, 5, n_frames)
            
            tracks.append({
                'cell_id': cell_id,
                'positions': np.column_stack([x, y])
            })
        
        return tracks
    
    @staticmethod
    def create_fluorescence_trace():
        """Create synthetic fluorescence with maturation kinetics."""
        t = np.linspace(0, 30, 180)  # 30 hours, 180 points
        
        # Maturation model: f(t) = A * (1 - exp(-kt)) + B
        A = 2.0  # Amplitude
        k = 0.1  # Rate constant
        B = 0.5  # Baseline
        
        # Add noise
        signal = A * (1 - np.exp(-k * t)) + B
        noise = np.random.randn(len(t)) * 0.1
        
        return t, signal + noise
```

### CSV Validation

```python
# tests/validation/test_csv.py
def validate_trace_csv(filepath: Path) -> bool:
    """Validate trace CSV format and content."""
    try:
        df = pd.read_csv(filepath)
        
        # Required columns
        required = ['fov', 'cell', 'frame', 'good']
        for col in required:
            if col not in df.columns:
                return False
        
        # Data types
        assert df['fov'].dtype in [int, 'int64']
        assert df['cell'].dtype in [int, 'int64']
        assert df['frame'].dtype in [int, 'int64']
        assert df['good'].dtype == bool
        
        # Value ranges
        assert df['frame'].min() >= 0
        assert df['cell'].min() >= 1
        assert len(df) > 0
        
        return True
        
    except Exception:
        return False
```

## Continuous Integration

### GitHub Actions Workflow

```yaml
# .github/workflows/test.yml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']
    
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    
    - name: Install UV
      run: pip install uv
    
    - name: Install dependencies
      run: uv sync --all-extras
    
    - name: Run tests
      run: uv run pytest --cov=pyama --cov-report=xml
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
```

### Test Categories in CI

1. **Unit Tests**: Fast, run on all PRs
2. **Integration Tests**: Medium speed, run on PRs
3. **Performance Tests**: Slow, run on main branch
4. **Visual Tests**: Manual verification, artifacts saved

## Test Data Management

### Synthetic Test Data

- Small datasets checked into repository
- Large datasets generated on-the-fly
- Deterministic seed for reproducibility

### Real Test Data

- Anonymized experimental data
- Stored in separate repository
- Accessed via Git LFS or download server

### Test Artifacts

- Plots saved to `_plots/`
- Test reports in HTML format
- Performance benchmarks as JSON

## Debugging Tests

### Debug Mode

```bash
# Run single test with debugging
uv run pytest pyama/tests/processing/test_merge.py -v -s --pdb

# Enable debug logging
PYAMA_LOG_LEVEL=DEBUG uv run pytest
```

### Test Output

```python
# In test files
import logging

logger = logging.getLogger(__name__)

def test_something():
    logger.info("Starting test")
    # ... test code ...
    logger.debug(f"Intermediate result: {result}")
```

### Common Issues

1. **Flaky Tests**
   - Use deterministic seeds
   - Add retry logic for network calls
   - Increase timeouts

2. **Environment Specific**
   - Use temp directories
   - Avoid hardcoded paths
   - Test on multiple platforms

3. **Resource Exhaustion**
   - Clean up resources in tearDown
   - Use timeouts for long operations
   - Monitor memory usage

## Contributing to Tests

When adding new features:

1. **Add Unit Tests**
   - Test new functions/classes
   - Cover edge cases
   - Mock external dependencies

2. **Add Integration Tests**
   - Test feature in context
   - Verify end-to-end workflows
   - Include error conditions

3. **Update Documentation**
   - Add test examples
   - Document testing procedures
   - Update checklists

4. **Performance Monitoring**
   - Add benchmarks for significant changes
   - Monitor memory usage
   - Document performance characteristics

## Test Metrics and Targets

### Coverage Targets
- Core packages: > 90% coverage
- Client/UI packages: > 80% coverage  
- Utilities: > 95% coverage

### Performance Targets
- Small dataset (< 10 FOVs): < 5 minutes
- Medium dataset (10-50 FOVs): < 30 minutes
- Large dataset (> 50 FOVs): < 2 hours

### Quality Targets
- All tests pass on CI
- Zero flaky tests
- Memory usage < 4GB for typical datasets

This comprehensive testing protocol ensures PyAMA remains reliable, performant, and maintainable across all its components.