=========================================
Pipeline Architecture: From Lab to Field
=========================================

The package provides **two distinct processing pipelines** designed for
different stages of the research workflow. Start with post-facto analysis
in the lab, then deploy adaptive sensor fusion in the field.

Post-Facto Pipeline
===================

**Lab Analysis & Validation**

The post-facto pipeline is designed for batch analysis where the complete
dataset is available. This is the recommended starting point for new
deployments.

Purpose
-------

- Batch analysis of recovered biologger data
- Algorithm validation and scientific publications
- R compatibility for historical comparisons
- Establishing calibration parameters for future adaptive processing

Characteristics
---------------

- **Memory**: Full dataset loaded into memory
- **Processing**: 8-stage pipeline with non-causal algorithms
- **Accuracy**: Highest accuracy through full-dataset calibration

Calibration Modes
-----------------

``batch_compute``
   Two-pass processing that computes calibrations from the full dataset.
   The first pass collects sensor data; the second pass applies the
   computed calibrations.

``fixed``
   Single-pass processing with locked parameters from prior analysis. Use
   when calibration values are already known from previous runs.

R Compatibility
---------------

The ``batch_compute`` mode replicates the gRumble R package's algorithms:

- Numerical agreement with the R reference: <0.1° error on orientation
- Two-pass processing matches R's ``colMeans()`` and ``MagOffset()`` functions
- Scientific reproducibility for historical comparisons and validation studies

Adaptive Sensor Fusion Pipeline
===============================

**Real-Time Deployment**

The adaptive sensor fusion pipeline is designed for real-time processing
where data arrives sample-by-sample. It uses fully causal algorithms with
no lookahead.
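The contrast between the two calibration philosophies can be sketched in a few
lines of Python. This is a minimal illustration only: ``batch_offset``,
``CausalOffset``, and the EMA constant are assumptions for the sketch, not the
package's actual API.

.. code-block:: python

   def batch_offset(samples):
       """batch_compute-style calibration: needs the complete dataset
       up front (non-causal), in the spirit of R's colMeans()."""
       return sum(samples) / len(samples)


   class CausalOffset:
       """progressive-style calibration: consumes one sample at a time,
       with O(1) memory and no lookahead (EMA update)."""

       def __init__(self, alpha=0.05):
           self.alpha = alpha      # EMA smoothing factor
           self.estimate = None    # running offset estimate

       def update(self, x):
           if self.estimate is None:
               self.estimate = float(x)  # seed from the first sample
           else:
               self.estimate += self.alpha * (x - self.estimate)
           return x - self.estimate      # calibrated sample, emitted immediately

Given a long enough stream, the causal estimate converges toward the batch
value, which is the property the digital twin workflow is designed to verify.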
Purpose
-------

- On-tag processing for next-generation PSAT deployments with SD card logging
- Embedded systems and edge computing (resource-constrained environments)
- Digital twin testing using historical data
- Foundation for future satellite uplink and store-and-forward architectures

Characteristics
---------------

- **Memory**: Fixed memory footprint, O(1) space complexity
- **Processing**: Fully causal 8-stage pipeline (no lookahead)
- **Latency**: Immediate output per sample

Algorithms
----------

Current implementations:

- Variance-based attachment angle calibration with convergence detection
- Hard-iron magnetometer calibration via sphere-fitting
- Tilt-compensated heading calculation
- Multi-scale activity-adaptive depth smoothing

In progress:

- Madgwick filter (quaternion-based orientation)
- Kalman filter (optimal state estimation)

Calibration Modes
-----------------

``progressive``
   Online EMA-based calibration that adapts during deployment. Converges
   within the first 2-3 minutes of data collection. Default mode for
   adaptive processing.

``fixed``
   Use pre-computed calibration parameters from post-facto analysis.
   Fastest processing with no calibration overhead.

Digital Twin Workflow
=====================

Run the adaptive sensor fusion pipeline on historical CSV files to validate
real-time algorithms before field deployment:

.. code-block:: bash

   # Validate adaptive calibration against post-facto batch results
   python -m biologger_pseudotrack --config data/config_streaming_Rcompat.yml

This workflow enables:

- Testing adaptive algorithms without deploying to physical hardware
- Comparing streaming vs batch processing results
- Tuning calibration parameters for specific species/deployments
- Validating that adaptive processing converges to batch-computed values

Pipeline Comparison
===================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Aspect
     - Post-Facto
     - Adaptive Sensor Fusion
   * - **When to Use**
     - Lab analysis, publications
     - Field deployment, real-time
   * - **Data Availability**
     - Complete dataset
     - Sample-by-sample
   * - **Memory**
     - Full dataset in memory
     - O(1) fixed footprint
   * - **Calibration**
     - ``batch_compute`` or ``fixed``
     - ``progressive`` or ``fixed``
   * - **Accuracy**
     - Highest (non-causal)
     - Near-equivalent (causal)
   * - **R Compatibility**
     - Exact match available
     - Approximate match

Recommended Workflow
====================

1. **Recover tag data** from deployment
2. **Run post-facto pipeline** with ``batch_compute`` mode
3. **Validate results** against the R reference implementation (if applicable)
4. **Extract calibration parameters** for the deployment
5. **Configure adaptive pipeline** with ``fixed`` mode using extracted parameters
6. **Test adaptive pipeline** on the same data (digital twin validation)
7. **Deploy adaptive pipeline** to field systems with confidence

Roadmap
=======

Data Offloading Architectures
-----------------------------

Future versions will support multiple data retrieval pathways:

- **Marine store-and-forward nodes**: Underwater acoustic modems for
  opportunistic data transfer when tagged animals pass near fixed receivers
- **Satellite uplink**: Direct-to-satellite transmission for priority metrics
  (behavioral state summaries, position estimates)
- **Hybrid architectures**: Onboard filtering to prioritize high-value data
  for bandwidth-constrained uplinks

GPU Acceleration & Parallelization
----------------------------------

Planned compute enhancements for high-fidelity real-time processing:

- **GPU-accelerated sensor fusion**: CUDA/OpenCL backends for matrix
  operations in Kalman and particle filters
- **Online Monte Carlo methods**: Sequential Monte Carlo (particle filters)
  for probabilistic state estimation with GPU-parallelized particle
  propagation
- **Ensemble dead-reckoning**: Parallel trajectory hypotheses with real-time
  pruning based on constraint satisfaction

Companion Processor Architecture
--------------------------------

Deployment architectures pairing minimal on-tag compute with external
processing:

**On-Animal Companion ("Backpack")**

For larger marine animals where a secondary processor package is feasible:

- **On-tag**: Lightweight sensor acquisition, basic filtering, data buffering
- **Companion processor**: Full sensor fusion, INS/dead-reckoning, behavioral
  classification
- **Benefit**: Real-time 3D trajectory estimation without on-tag compute
  constraints

**On-Ship Simulation**

Ship-based processing driven by sparse tag updates received via
store-and-forward nodes or satellite surface pings:

- **Pre-deployment**: Simulate expected trajectories and behavioral patterns
  to validate tag configuration
- **During deployment**: Real-time trajectory reconstruction from periodic
  position fixes and behavioral summaries
- **Post-deployment**: Rapid preliminary analysis before tag recovery,
  informing retrieval strategy
- **Data sources**: Acoustic modem check-ins, satellite surface
  transmissions, opportunistic proximity detections

AI-Assisted Analysis
--------------------

Domain-specialized AI agents for post-facto analysis:

- **Custom knowledge agents**: LLM agents trained on marine biologging
  literature, species-specific behavioral ontologies, and sensor fusion
  methodology
- **Automated interpretation**: Natural language querying of processed
  datasets ("When did the animal exhibit foraging behavior at depth?")
- **Analysis workflows**: AI-assisted hypothesis generation, anomaly
  detection, and quality control
- **Integration**: Built on the ``kanoa`` framework for domain-specific
  agent development

Immersive Visualization
-----------------------

Post-facto analysis enhancements:

- **3D trajectory rendering**: Real-time visualization of reconstructed
  animal paths with behavioral state overlay
- **Unity integration**: Export pseudotracks to Unity for immersive VR/AR
  analysis of diving behavior
- **Simulation playback**: Replay sensor fusion pipeline with adjustable
  parameters for algorithm tuning
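The hand-off at the heart of the recommended workflow (batch-compute a
parameter, lock it, replay it through a fixed-mode stream) can be sketched as
follows. The names ``batch_calibration`` and ``fixed_mode_stream`` are
illustrative assumptions, not the package's API.

.. code-block:: python

   def batch_calibration(samples):
       """Step 2: post-facto batch_compute derives the offset from the
       complete recovered dataset."""
       return sum(samples) / len(samples)


   def fixed_mode_stream(samples, offset):
       """Steps 5-6: the adaptive pipeline in ``fixed`` mode applies the
       locked parameter sample-by-sample, with no calibration overhead."""
       for x in samples:
           yield x - offset


   # Digital-twin validation: replay the same recovered data through the
   # fixed-mode stream before deploying to field hardware.
   recovered = [4.0, 6.0, 5.0, 5.0, 7.0, 3.0]
   offset = batch_calibration(recovered)
   residuals = list(fixed_mode_stream(recovered, offset))

Because the offset was computed from the same data, the residuals are
zero-mean, which is the sanity check a digital-twin run would assert before
trusting the fixed parameters in the field.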