# Workflow Pipeline Reference

This reference document provides complete technical details of the pyama processing workflow, including algorithm specifications, data formats, and implementation details for creating plugins or reproducing the pipeline in other systems.

## Overview

The pyama processing workflow processes time-lapse microscopy images through seven sequential steps to extract cell traces with quantitative features. The workflow operates on individual Fields of View (FOVs) and processes data in configurable batches for efficiency.

**Processing Order:**

1. **Copying** - Extract frames from microscopy files (ND2/CZI) to Zarr format
2. **Segmentation** - Identify cell boundaries using phase contrast images (requires PC channel)
3. **Tracking** - Track cells across time points using consistent cell IDs (requires PC channel)
4. **Background Estimation** - Estimate background fluorescence using tiled interpolation (requires FL channels)
5. **Cropping** - Extract cell bounding box crops from tracked segmentation (requires PC channel)
6. **Extraction** - Extract quantitative features and generate trace CSV files (always runs if PC configured)
7. **Caching** - Generate visualization levels (/1/) for all channels after processing completes

**Channel-Conditional Behavior:**

- **No PC channel configured**: Segmentation, tracking, cropping, and extraction are skipped
- **No FL channels configured**: Background estimation is skipped automatically
- **PC channel with no features**: Extraction still outputs base fields (cell, frame, position, bbox) for tracking
- **Copy-only mode**: Processing stages 2-7 are skipped when `config.params.copy_only` is True

## Input Requirements

### Microscopy Data

- **Input format**: ND2 or CZI files containing time-lapse microscopy images
- **Multiple FOVs**: Supported and processed in parallel
- **Time-lapse**: Each FOV contains multiple time frames
- **Multi-channel**: One phase contrast and one or more fluorescence channels

### Channel Configuration

- **Phase Contrast (PC) Channel**: Required for segmentation. One channel specified for cell boundary detection.
- **Fluorescence (FL) Channels**: Optional, one or more channels specified for feature extraction.

### Processing Context

- Output directory paths
- Channel configurations (`Channels` dataclass)
- Processing parameters
- Configuration saved to `processing_config.yaml`
- FOV outputs discovered by naming convention (`fov_XXX/`)

## Workflow Steps

### Step 1: CopyingService

**Purpose**: Extract raw image frames from microscopy file formats and save to Zarr format for efficient access.

#### Input

- Microscopy file path (ND2 or CZI)
- FOV index range
- Channel configuration

#### Processing Algorithm

For each FOV and specified channel:

1. Load frames sequentially from microscopy file
2. Create or open Zarr store: `fov_{fov:03d}/images.zarr`
3. Write channel data to `{pc|fl}_ch_{channel_id}/0/` where:
   - `0` = full resolution level
   - Data type: `uint16` for raw pixel values
   - Dimensions: `(T, H, W)` where `T` = number of time frames, `H,W` = image dimensions
4.
Channels are stored as separate groups in the Zarr hierarchy

#### Output Format

- Phase contrast: `fov_{fov:03d}/images.zarr/pc_ch_{pc_id}/0/`
- Fluorescence: `fov_{fov:03d}/images.zarr/fl_ch_{fl_id}/0/`

#### Implementation Notes

- Runs sequentially per batch to avoid I/O bottlenecks
- Zarr format provides efficient chunked storage and compression
- Existing channels are detected and skipped (supports resuming)
- Visualization level (/1/) is generated later by CachingService

### Step 2: SegmentationService

**Purpose**: Identify cell boundaries in each frame from phase contrast images using the LOG-STD (log of local standard deviation) method.

**Channel Requirements**: Requires PC channel. Skips with warning if no PC channel configured.

#### Input

- Phase contrast stack from Step 1: `fov_{fov}/images.zarr/pc_ch_{pc_id}/0/` - `(T, H, W)` of `uint16`

#### LOG-STD Algorithm

For each time frame `t`:

1. **Local Standard Deviation**:
   ```
   # uniform_filter is scipy.ndimage.uniform_filter; cast to float to avoid uint16 overflow
   image = image.astype(np.float64)
   local_mean = uniform_filter(image, size=window_size)
   local_var = uniform_filter(image**2, size=window_size) - local_mean**2
   local_var = np.maximum(local_var, eps)  # eps: small positive constant, avoids log(0)
   logstd = 0.5 * np.log(local_var)
   ```
2. **Automatic Thresholding**:
   - Build histogram of log-STD values
   - Find valley threshold between background/cell modes
   - Binary mask: `binary = logstd > threshold`
3. **Morphological Cleanup**:
   ```
   # binary_closing/opening/fill_holes: scipy.ndimage; disk, remove_small_objects: skimage.morphology
   mask = binary_closing(binary, structure=disk(7), iterations=3)
   mask = remove_small_objects(mask)
   mask = binary_fill_holes(mask)
   mask = binary_opening(mask, structure=disk(7), iterations=3)
   ```

#### Output

- Labeled segmentation: `fov_{fov}/images.zarr/seg_labeled_ch_{pc_id}/0/`
- Format: `(T, H, W)` of `uint16`
- `0` = background, `1-N` = cell IDs (frame-specific)
- Visualization level (/1/) generated later by CachingService

#### Algorithm Characteristics

- Computes per-pixel local intensity variation
- LOG-STD is effective for phase contrast, where cell boundaries create local intensity changes
- Parameters: window size (`logstd_window_size`, default: 3), number of morphological iterations (`morph_iterations`, default: 3)

### Step 3: TrackingService

**Purpose**: Track cells across time frames by assigning consistent cell IDs using Intersection over Union (IoU).

**Channel Requirements**: Requires PC channel. Skips with warning if no PC channel configured.

#### Input

- Labeled segmentation from Step 2: `fov_{fov}/images.zarr/seg_labeled_ch_{pc_id}/0/` - `(T, H, W)` of `uint16`

#### IoU-based Hungarian Assignment Algorithm

**Per-frame Processing**:

1. **Extract Regions**:
   ```python
   from skimage.measure import regionprops_table

   props = regionprops_table(labeled_frame, properties=['label', 'area', 'bbox'])
   ```
2. **IoU Cost Matrix**:
   ```python
   import numpy as np

   # current_regions / prev_regions: per-region records exposing a .bbox attribute
   # calculate_iou: helper returning the IoU of two bounding boxes
   # Calculate IoU for all current vs previous region pairs
   cost_matrix = np.zeros((n_current, n_previous))
   for i, current in enumerate(current_regions):
       for j, prev in enumerate(prev_regions):
           iou = calculate_iou(current.bbox, prev.bbox)
           cost_matrix[i, j] = 1.0 - iou  # Convert IoU (similarity) to cost
   ```
3.
**Hungarian Assignment**:
   ```python
   from scipy.optimize import linear_sum_assignment

   row_ind, col_ind = linear_sum_assignment(cost_matrix)

   # Apply minimum IoU threshold
   for r, c in zip(row_ind, col_ind):
       if (1.0 - cost_matrix[r, c]) < min_iou:
           # Mark as new cell
           assign_new_id(r)
       else:
           # Assign previous ID
           assign_previous_id(r, c)

   # Current regions left unassigned (more current than previous regions) are new cells
   for r in set(range(n_current)) - set(row_ind):
       assign_new_id(r)
   ```
4. **Cell ID Management**:
   - Frame 0: Assign new IDs 1, 2, 3, ...
   - Frame n: Matched cells inherit IDs, new cells get new IDs
   - Disappeared cells: terminate trace

#### Output

- Labeled tracking: `fov_{fov}/images.zarr/seg_tracked_ch_{pc_id}/0/`
- Format: `(T, H, W)` of `uint16`
- `0` = background, `1-N` = cell IDs (consistent across frames)
- Visualization level (/1/) generated later by CachingService

#### Implementation Details

- IoU is computed on bounding boxes as an approximation, for performance
- A minimum `min_iou` threshold (default 0.1) filters out low-quality matches
- The Hungarian algorithm guarantees a globally optimal assignment
- Supports `min_size` and `max_size` filtering

### Step 4: BackgroundEstimationService

**Purpose**: Estimate background fluorescence using tiled interpolation for each frame.

**Channel Requirements**: Requires FL channels. Skips automatically if no FL channels configured.

#### Input

- Labeled segmentation from Step 2: `fov_{fov}/images.zarr/seg_labeled_ch_{pc_id}/0/` - `(T, H, W)` of `uint16`
- Raw fluorescence from Step 1: `fov_{fov}/images.zarr/fl_ch_{fl_id}/0/` - `(T, H, W)` of `uint16`

#### Tiled Interpolation Algorithm

For each fluorescence channel and frame `t`:

1. **Mask Foreground**:
   ```
   # Dilate the cell mask so that halo pixels around cells are excluded from the background
   dilated = binary_dilation(seg_labeled > 0, disk(10))
   masked = np.where(dilated, np.nan, fluorescence_image)
   ```
2. **Tile Medians**:
   - Divide frame into overlapping tiles (typical: 50-100 px)
   - Compute median of non-NaN pixels in each tile
   - Handle tiles with insufficient background via interpolation
3.
**Interpolate Background**:
   ```
   from scipy.interpolate import RectBivariateSpline

   # Assumes tile_medians has shape (len(tile_centers_y), len(tile_centers_x)).
   # RectBivariateSpline expects 1-D, strictly ascending axes and z of shape (len(x), len(y)).
   spline = RectBivariateSpline(tile_centers_y, tile_centers_x, tile_medians)

   # Interpolate to full resolution; the result has shape (H, W)
   background = spline(np.arange(H), np.arange(W))
   ```

#### Output

- Background stack: `fov_{fov}/images.zarr/fl_background_ch_{fl_id}/0/`
- Format: `(T, H, W)` of `float32`
- Ready for correction during extraction
- Visualization level (/1/) generated later by CachingService

#### Algorithm Notes

- Each fluorescence channel processed independently
- Background saved separately for flexible correction weights
- Tile size configurable (default: 50-100 px with overlap)
- Interpolation preserves spatial variation patterns

### Step 5: CroppingService

**Purpose**: Extract cell bounding box crops from tracked segmentation for efficient feature extraction.

**Channel Requirements**: Requires PC channel. Skips with warning if no PC channel configured.

#### Input

- Tracked segmentation from Step 3: `fov_{fov}/images.zarr/seg_tracked_ch_{pc_id}/0/` - `(T, H, W)` of `uint16`
- Phase contrast from Step 1: `fov_{fov}/images.zarr/pc_ch_{pc_id}/0/` - `(T, H, W)` of `uint16`
- Fluorescence channels (optional): `fov_{fov}/images.zarr/fl_ch_{fl_id}/0/` - `(T, H, W)` of `uint16`
- Background channels (optional): `fov_{fov}/images.zarr/fl_background_ch_{fl_id}/0/` - `(T, H, W)` of `float32`

#### Processing Algorithm

For each tracked cell:

1. Extract bounding boxes across all frames where cell is present
2. Crop regions from all configured channels (PC + FL)
3. Apply background correction to FL crops if available
4. Store crops in per-cell structure: `fov_{fov}/cells.zarr/{channel_name}/{cell_id}/0/`

#### Output Format

- Cell crops: `fov_{fov}/cells.zarr/{channel_name}/{cell_id}/0/` - `(T, H, W)` of `float32` (normalized [0,1])
- Visualization level: `fov_{fov}/cells.zarr/{channel_name}/{cell_id}/1/` - `(T, H, W)` of `uint8`
- Metadata: `fov_{fov}/cells.zarr/metadata/` - cell_ids, bboxes, valid_frames

#### Implementation Notes

- Works with PC-only data: crops PC channel only
- With FL configured: crops both PC and FL channels, applies background if available
- Creates cells.zarr with per-cell structure for efficient feature extraction
- Essential for the extraction step, which reads these cropped regions

### Step 6: ExtractionService

**Purpose**: Extract quantitative features for each tracked cell at each time point and generate CSV traces.

**Channel Requirements**: Always runs if PC channel is configured, even with empty features. Creates empty CSV if no channels configured.

#### Input

- Cell crops from Step 5: `fov_{fov}/cells.zarr/{channel_name}/{cell_id}/0/` - `(T, H, W)` of `float32`
- Metadata from Step 5: `fov_{fov}/cells.zarr/metadata/` - bboxes, valid_frames
- Feature configuration list from config

#### Feature Extraction Algorithm

For each FOV, cell, and time point:

1. **Load Cell Crop**:
   ```python
   # Load cropped region for this cell and frame
   cell_crop = cells_zarr[f"{channel_name}/{cell_id}/0"][frame_idx]
   bbox = metadata["bboxes"][cell_idx, frame_idx]  # [y0, x0, y1, x1]
   ```
2.
**Base Features (Always Computed)**:
   ```python
   # Base fields are always extracted, regardless of channel configs
   row['fov'] = fov_id
   row['cell'] = cell_id
   row['frame'] = frame_index
   row['good'] = metadata["valid_frames"][cell_idx, frame_idx]
   row['position_x'] = (bbox[1] + bbox[3]) / 2  # Bounding-box center x
   row['position_y'] = (bbox[0] + bbox[2]) / 2  # Bounding-box center y
   row['bbox_x0'] = bbox[1]  # x0
   row['bbox_y0'] = bbox[0]  # y0
   row['bbox_x1'] = bbox[3]  # x1
   row['bbox_y1'] = bbox[2]  # y1
   ```
3. **Channel-Specific Features**:
   ```python
   # Phase contrast features (if configured); regionprops is skimage.measure.regionprops
   if pc_features:
       mask = cell_crop > 0  # Binary mask from crop
       if 'area' in pc_features:
           row[f'area_ch_{pc_channel}'] = np.sum(mask)
       if 'aspect_ratio' in pc_features:
           ellipse = regionprops(mask.astype(int))[0]
           row[f'aspect_ratio_ch_{pc_channel}'] = ellipse.major_axis_length / ellipse.minor_axis_length

   # Fluorescence features with background correction (if configured)
   if fl_features:
       raw_intensity = np.sum(cell_crop)
       if background_available:
           background_crop = cells_zarr[f"fl_background_ch_{fl_id}/{cell_id}/0"][frame_idx]
           background_intensity = np.sum(background_crop * background_weight)
           corrected_intensity = raw_intensity - background_intensity
       else:
           corrected_intensity = raw_intensity
       row[f'intensity_total_ch_{fl_id}'] = corrected_intensity
   ```
4. **Background Correction**:
   ```python
   # Configurable weight from config.params.background_weight
   background_weight = np.clip(config.params.background_weight, 0.0, 1.0)

   # Applied during extraction from background crops
   # (background_intensity here is the unweighted sum over the background crop)
   corrected_intensity = raw_intensity - background_weight * background_intensity
   ```

#### Quality Filtering

1. **Trace Length Filter**:
   ```python
   min_frames = params.get('min_frames', 30)
   trace_lengths = calculate_trace_lengths(traces)
   filtered = traces[trace_lengths >= min_frames]
   ```
2. **Border Filter**:
   ```python
   border_margin = params.get('border_margin', 50)

   def on_border(mask):
       """Return True if any mask pixel lies within border_margin of the frame edge."""
       return bool(
           np.any(mask[:border_margin, :])
           or np.any(mask[-border_margin:, :])
           or np.any(mask[:, :border_margin])
           or np.any(mask[:, -border_margin:])
       )

   # border_cells: IDs of cells for which on_border() is True
   # Remove border cells entirely
   filtered = filtered[~filtered['cell'].isin(border_cells)]
   ```

#### Output Format

**Per-FOV CSV**: `fov_{fov:03d}/{basename}_fov_{fov:03d}_traces.csv`

```
fov,cell,frame,good,position_x,position_y,bbox_x0,bbox_y0,bbox_x1,bbox_y1,area_ch_0,aspect_ratio_ch_0,intensity_total_ch_1
0,0,0,True,100.5,200.3,85,165,115,235,450,1.234,1234.5
0,0,1,True,101.2,199.8,86,166,116,236,455,1.236,1356.2
```

**Column Naming Convention**:

- Base columns: fov, cell, frame, good, position_x/y, bbox_*
- Feature columns: `{feature}_ch_{channel_id}` (e.g., `intensity_total_ch_1`)
- Base fields are always included if PC channel is configured, even with empty features

### Step 7: CachingService

**Purpose**: Generate visualization levels (/1/) for all channels after processing completes.

**Channel Requirements**: Runs for all channels that have /0/ level data.

#### Input

- All channels in `fov_{fov}/images.zarr/` with /0/ level
- All channels in `fov_{fov}/cells.zarr/` with /0/ level

#### Processing Algorithm

For each channel:

1. Check if /0/ exists and /1/ is missing
2. Read /0/ data: `(T, H, W)` array
3. Normalize to uint8:
   - Intensity channels: Percentile normalization (1st-99th percentile) across entire stack
   - Segmentation channels: Scale proportionally to [0, 255] based on max label
4. Downsample by 2x using `generate_half_resolution()`
5.
Write to /1/ level: `(T, H/2, W/2)` of `uint8`

#### Output Format

- Visualization level: `{channel_name}/1/` - `(T, H/2, W/2)` of `uint8`
- Compression: LZ4 for fast access
- Chunks: `(1, min(256, H/2), min(256, W/2))`

#### Implementation Notes

- Unifies caching logic that was previously scattered across services
- Runs after all processing completes for efficiency
- Normalization ensures consistent scaling across all frames
- Non-critical: Workflow continues even if caching fails

## Output Structure

```
output_dir/
├── processing_config.yaml              # Metadata, channels, parameters
├── fov_000/
│   ├── images.zarr/                    # Image data (all channels)
│   │   ├── pc_ch_0/
│   │   │   ├── 0/                      # Full resolution (T, H, W) uint16
│   │   │   └── 1/                      # Visualization level (T, H/2, W/2) uint8
│   │   ├── fl_ch_1/
│   │   │   ├── 0/                      # Full resolution (T, H, W) uint16
│   │   │   └── 1/                      # Visualization level (T, H/2, W/2) uint8
│   │   ├── seg_labeled_ch_0/
│   │   │   ├── 0/                      # Labeled segmentation (T, H, W) uint16
│   │   │   └── 1/                      # Visualization level (T, H/2, W/2) uint8
│   │   ├── seg_tracked_ch_0/
│   │   │   ├── 0/                      # Tracked segmentation (T, H, W) uint16
│   │   │   └── 1/                      # Visualization level (T, H/2, W/2) uint8
│   │   └── fl_background_ch_1/
│   │       ├── 0/                      # Background estimate (T, H, W) float32
│   │       └── 1/                      # Visualization level (T, H/2, W/2) uint8
│   ├── cells.zarr/                     # Per-cell crops
│   │   ├── metadata/
│   │   │   ├── cell_ids                # (N,) int32
│   │   │   ├── bboxes                  # (N, T, 4) int32 [y0, x0, y1, x1]
│   │   │   └── valid_frames            # (N, T) bool
│   │   ├── pc_ch_0/
│   │   │   └── {cell_id}/
│   │   │       ├── 0/                  # Normalized crop (T, H, W) float32 [0,1]
│   │   │       └── 1/                  # Visualization level (T, H/2, W/2) uint8
│   │   └── fl_ch_1/
│   │       └── {cell_id}/
│   │           ├── 0/                  # Normalized crop (T, H, W) float32 [0,1]
│   │           └── 1/                  # Visualization level (T, H/2, W/2) uint8
│   └── basename_fov_000_traces.csv     # Combined feature traces
├── fov_001/
│   └── ...
```

## Batch Processing Implementation

### Thread Pool Executor Pattern

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pyama.processing.workflow.run import run_single_worker

def run_batch(fov_batch, config, n_workers, metadata, output_dir, cancel_event):
    """Process a batch of FOVs in parallel."""
    # Sequential copying (I/O bound)
    copy_service = CopyingService()
    copy_service.process_all_fovs(
        metadata=metadata,
        config=config,
        output_dir=output_dir,
        fov_start=fov_batch[0],
        fov_end=fov_batch[-1],
        cancel_event=cancel_event,
    )

    # Copy-only mode: skip processing stages
    if config.params.copy_only:
        return

    # Parallel processing (CPU bound)
    worker_ranges = _split_worker_ranges(fov_batch, n_workers)
    overall_success = True
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = {
            executor.submit(
                run_single_worker,
                fov_range,
                metadata,
                config,
                output_dir,
                cancel_event,
            ): fov_range
            for fov_range in worker_ranges
        }

        # Wait for completion with progress tracking
        for future in as_completed(futures):
            fov_range, successful, failed, message = future.result()
            update_progress(successful, failed, message)
            if failed:
                overall_success = False  # Any failed FOV disables the caching pass

    # Generate visualization cache after all processing
    if overall_success:
        caching_service = CachingService()
        caching_service.process_all_fovs(
            metadata=metadata,
            config=config,
            output_dir=output_dir,
            fov_start=fov_batch[0],
            fov_end=fov_batch[-1],
            cancel_event=cancel_event,
        )
```

### Memory Management

```python
def run_single_worker(fovs, metadata, config, output_dir, cancel_event):
    """Worker function for parallel FOV processing."""
    try:
        # Initialize services
        segmentation = SegmentationService(method=config.params.segmentation_method)
        tracking = TrackingService(method=config.params.tracking_method)
        background_estimation = BackgroundEstimationService()
        cropping = CroppingService()
        extraction = ExtractionService()

        # Process each service sequentially for FOV range
        segmentation.process_all_fovs(metadata, config, output_dir, fovs[0], fovs[-1], cancel_event)
        if cancel_event and cancel_event.is_set():
            return (fovs, 0, len(fovs), "Cancelled")

        tracking.process_all_fovs(metadata, config, output_dir, fovs[0], fovs[-1], cancel_event)
        if cancel_event and cancel_event.is_set():
            return (fovs, 0, len(fovs), "Cancelled")

        background_estimation.process_all_fovs(metadata, config, output_dir, fovs[0], fovs[-1], cancel_event)
        if cancel_event and cancel_event.is_set():
            return (fovs, 0, len(fovs), "Cancelled")

        cropping.process_all_fovs(metadata, config, output_dir, fovs[0], fovs[-1], cancel_event)
        if cancel_event and cancel_event.is_set():
            return (fovs, 0, len(fovs), "Cancelled")

        extraction.process_all_fovs(metadata, config, output_dir, fovs[0], fovs[-1], cancel_event)

        return (fovs, len(fovs), 0, "Completed")

    except Exception as e:
        logger.error(f"Error processing FOVs {fovs[0]}-{fovs[-1]}: {e}")
        return (fovs, 0, len(fovs), str(e))
```

### Cancellation Support

Cancellation is handled via `threading.Event` within the task runner:

```python
# Usage in workflow
if cancel_event.is_set():
    logger.info("Cancellation requested, cleaning up")
    cleanup_partial_results()
    return False
```

## Data Type Specifications

### Image Arrays (Zarr Format)

| Stage | Zarr Path | Data Type | Dimensions | Notes |
|-------|-----------|-----------|------------|-------|
| Raw Images | `images.zarr/{pc\|fl}_ch_{id}/0/` | uint16 | (T, H, W) | Chunked Zarr arrays |
| Segmentation | `images.zarr/seg_labeled_ch_{id}/0/` | uint16 | (T, H, W) | Labeled mask (untracked) |
| Tracking | `images.zarr/seg_tracked_ch_{id}/0/` | uint16 | (T, H, W) | Cell IDs, 0=background |
| Background | `images.zarr/fl_background_ch_{id}/0/` | float32 | (T, H, W) | Estimate per channel |
| Visualization | `images.zarr/{channel}/1/` | uint8 | (T, H/2, W/2) | Downsampled for display |
| Cell Crops | `cells.zarr/{channel}/{cell_id}/0/` | float32 | (T, H, W) | Normalized [0,1] |
| Cell Viz | `cells.zarr/{channel}/{cell_id}/1/` | uint8 | (T, H/2, W/2) | Downsampled for display |

### CSV Schemas

**Processing
Traces** (per-FOV):

- Feature columns are suffixed with the channel ID (`{feature}_ch_{channel_id}`)
- Frame-based, time computed after loading
- Includes quality flag (`good` column)

**Merged Traces** (per-sample):

- Same format as processing traces
- Multiple FOVs combined
- Includes sample metadata in headers

**Fitted Results** (post-analysis):

- One row per cell
- Includes model type, R², parameters
- Additional columns per model parameters

## Algorithm Parameters

### Segmentation Parameters

```python
segmentation_params = {
    'logstd_window_size': 3,    # Neighborhood for std computation
    'morph_size': 7,            # Structuring element size
    'morph_iterations': 3,      # Number of opening/closing iterations
    'min_object_size': 50,      # Minimum cell size in pixels
    'max_object_size': 10000,   # Maximum cell size in pixels
}
```

### Tracking Parameters

```python
tracking_params = {
    'min_iou': 0.1,         # Minimum IoU for cell matching
    'min_frames': 30,       # Minimum trace length
    'border_margin': 50,    # Exclusion margin (pixels)
}
```

### Extraction Parameters

```python
extraction_params = {
    'background_weight': 1.0,   # Background correction weight [0-1]
    'frame_interval': 10.0,     # Minutes per frame (default)
    'time_mapping': None,       # Custom frame->time mapping (dict)
    'features': {
        'phase': ['area', 'aspect_ratio'],
        'fluorescence': ['intensity_total', 'intensity_mean']
    }
}
```

## Performance Characteristics

### Memory Usage

| Dataset Size | Approximate RAM Usage | Notes |
|--------------|----------------------|-------|
| 10 FOVs, 50 frames | 1-2 GB | Single workstation |
| 100 FOVs, 180 frames | 8-12 GB | Requires 16GB+ RAM |
| 500+ FOVs | 32GB+ | Consider distributed processing |

### Processing Speed

| Operation | Speed (per FOV) | Parallel Scaling |
|-----------|------------------|------------------|
| Copying (sequential) | 2-5 sec | No parallelization |
| Segmentation | 10-30 sec | 4-8 threads optimal |
| Tracking | 5-15 sec | Linear up to CPU count |
| Background Estimation | 5-20 sec | CPU-bound, parallel |
| Cropping | 2-10 sec | CPU-bound, parallel |
| Extraction | 2-8 sec | CPU-bound, parallel |
| Caching | 1-5 sec | Sequential (post-processing) |

### Optimization Strategies

1. **Memory Mapping**: Use `mmap_mode='r'` for large arrays
2. **Batch Size**: Tune based on RAM availability
3. **Worker Count**: Match to CPU cores (typically 4-8)
4. **SSD Storage**: Improves I/O for large datasets

## Extension Points

### Custom Features

```python
def extract_custom_feature(image, mask, context):
    """User-defined feature extraction."""
    # Implement custom logic
    return feature_value

# Register in feature system
PHASE_FEATURES['custom_feature'] = extract_custom_feature
```

### Alternative Algorithms

Replace core algorithms while maintaining interface:

- Segmentation: watershed, deep learning
- Tracking: Kalman filter, graph-based
- Feature extraction: custom metrics

### Integration Hooks

```python
class CustomPreprocessor:
    """Pre-process frames before segmentation."""

    def process(self, image):
        # Custom preprocessing
        return processed_image

# Inject into workflow
workflow.register_preprocessor(CustomPreprocessor())
```

## Implementation Guidelines

### Plugin Development

1. **Follow Interface Contracts**: Maintain input/output shapes
2. **Handle Errors Gracefully**: Return status codes, not exceptions
3. **Consider Performance**: Use vectorized operations where possible
4. **Document Parameters**: Include bounds, defaults, units
5. **Provide Tests**: Visual verification for image-based operations

### Quality Assurance

1. **Deterministic RNG**: Use fixed seeds for reproducibility
2. **Parameter Validation**: Check bounds before processing
3. **Progress Reporting**: Provide meaningful status updates
4. **Cleanup on Failure**: Preserve partial results for debugging
5. **Logging**: Include sufficient diagnostic information

This reference provides complete technical specifications for implementing, extending, or reproducing the PyAMA processing pipeline in any environment.
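
As an illustration of the "Parameter Validation" and "Handle Errors Gracefully" guidelines, the following is a minimal sketch of a bounds check that reports problems instead of raising. The `Bound` class, `ASSUMED_BOUNDS` table, and `validate_params` function are hypothetical names introduced here, and the bounds are assumptions chosen for illustration (only the `[0, 1]` range for `background_weight` is stated in this reference); they are not taken from the pyama code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Inclusive numeric range for one parameter (hypothetical helper)."""
    low: float
    high: float


# Assumed bounds, for illustration only; not read from the pyama source.
ASSUMED_BOUNDS = {
    "min_iou": Bound(0.0, 1.0),            # IoU is a ratio, so it lies in [0, 1]
    "background_weight": Bound(0.0, 1.0),  # documented above as [0-1]
    "min_frames": Bound(1, float("inf")),
    "border_margin": Bound(0, float("inf")),
    "morph_iterations": Bound(1, float("inf")),
}


def validate_params(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, bound in ASSUMED_BOUNDS.items():
        if name not in params:
            continue  # absent keys are assumed to fall back to defaults elsewhere
        value = params[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
        elif not (bound.low <= value <= bound.high):
            problems.append(f"{name}: {value} outside [{bound.low}, {bound.high}]")
    return problems


if __name__ == "__main__":
    print(validate_params({"min_iou": 0.1, "min_frames": 30, "border_margin": 50}))
    print(validate_params({"min_iou": 1.5, "min_frames": 0}))
```

Returning a list of messages rather than raising lets a plugin surface every invalid parameter in one pass and hand the caller a status it can log or display, which is one way to read the "status codes, not exceptions" guideline.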