=========================================
Pipeline Architecture: From Lab to Field
=========================================

The package provides **two distinct processing pipelines** designed for
different stages of the research workflow. Start with post-facto analysis
in the lab, then deploy adaptive sensor fusion in the field.

Post-Facto Pipeline
===================

**Lab Analysis & Validation**

The post-facto pipeline is designed for batch analysis where the complete
dataset is available. This is the recommended starting point for new
deployments.

Purpose
-------

- Batch analysis of recovered biologger data
- Algorithm validation and scientific publications
- R compatibility for historical comparisons
- Establishing calibration parameters for future adaptive processing

Characteristics
---------------

- **Memory**: Full dataset loaded into memory
- **Processing**: 8-stage pipeline with non-causal algorithms
- **Accuracy**: Highest accuracy through full-dataset calibration

Calibration Modes
-----------------

``batch_compute``
   Two-pass processing that computes calibrations from the full dataset.
   The first pass collects sensor data; the second pass applies the
   computed calibrations.

``fixed``
   Single-pass processing with locked parameters from prior analysis. Use
   when calibration values are already known from previous runs.

R Compatibility
---------------

The ``batch_compute`` mode replicates the gRumble R package's algorithms:

- Numerical agreement with the R reference: <0.1° error on orientation
- Two-pass processing matches R's ``colMeans()`` and ``MagOffset()`` functions
- Scientific reproducibility for historical comparisons and validation studies

Adaptive Sensor Fusion Pipeline
===============================

**Real-Time Deployment**

The adaptive sensor fusion pipeline is designed for real-time processing
where data arrives sample-by-sample. It uses fully causal algorithms with
no lookahead.
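The contrast between the two calibration philosophies can be sketched in a few
lines of Python. This is a minimal illustration only: ``batch_offset``,
``CausalOffset``, and the EMA constant are assumptions for the sketch, not the
package's actual API.

.. code-block:: python

   def batch_offset(samples):
       """batch_compute-style calibration: needs the complete dataset
       up front (non-causal), in the spirit of R's colMeans()."""
       return sum(samples) / len(samples)


   class CausalOffset:
       """progressive-style calibration: consumes one sample at a time,
       with O(1) memory and no lookahead (EMA update)."""

       def __init__(self, alpha=0.05):
           self.alpha = alpha      # EMA smoothing factor
           self.estimate = None    # running offset estimate

       def update(self, x):
           if self.estimate is None:
               self.estimate = float(x)  # seed from the first sample
           else:
               self.estimate += self.alpha * (x - self.estimate)
           return x - self.estimate      # calibrated sample, emitted immediately

Given a long enough stream, the causal estimate converges toward the batch
value, which is the property the digital twin workflow is designed to verify.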
Purpose
-------

- On-tag processing for next-generation PSAT deployments with SD card logging
- Embedded systems and edge computing (resource-constrained environments)
- Digital twin testing using historical data
- Foundation for future satellite uplink and store-and-forward architectures

Characteristics
---------------

- **Memory**: Fixed memory footprint, O(1) space complexity
- **Processing**: Fully causal 8-stage pipeline (no lookahead)
- **Latency**: Immediate output per sample

Algorithms
----------

Current implementations:

- Variance-based attachment angle calibration with convergence detection
- Hard-iron magnetometer calibration via sphere-fitting
- Tilt-compensated heading calculation
- Multi-scale activity-adaptive depth smoothing

In progress:

- Madgwick filter (quaternion-based orientation)
- Kalman filter (optimal state estimation)

Calibration Modes
-----------------

``progressive``
   Online EMA-based calibration that adapts during deployment. Converges
   within the first 2-3 minutes of data collection. Default mode for
   adaptive processing.

``fixed``
   Use pre-computed calibration parameters from post-facto analysis.
   Fastest processing with no calibration overhead.

Digital Twin Workflow
=====================

Run the adaptive sensor fusion pipeline on historical CSV files to validate
real-time algorithms before field deployment:

.. code-block:: bash

   # Validate adaptive calibration against post-facto batch results
   python -m biologger_pseudotrack --config data/config_streaming_Rcompat.yml

This workflow enables:

- Testing adaptive algorithms without deploying to physical hardware
- Comparing streaming vs batch processing results
- Tuning calibration parameters for specific species/deployments
- Validating that adaptive processing converges to batch-computed values

Pipeline Comparison
===================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Aspect
     - Post-Facto
     - Adaptive Sensor Fusion
   * - **When to Use**
     - Lab analysis, publications
     - Field deployment, real-time
   * - **Data Availability**
     - Complete dataset
     - Sample-by-sample
   * - **Memory**
     - Full dataset in memory
     - O(1) fixed footprint
   * - **Calibration**
     - ``batch_compute`` or ``fixed``
     - ``progressive`` or ``fixed``
   * - **Accuracy**
     - Highest (non-causal)
     - Near-equivalent (causal)
   * - **R Compatibility**
     - Exact match available
     - Approximate match

Recommended Workflow
====================

1. **Recover tag data** from deployment
2. **Run post-facto pipeline** with ``batch_compute`` mode
3. **Validate results** against the R reference implementation (if applicable)
4. **Extract calibration parameters** for the deployment
5. **Configure adaptive pipeline** with ``fixed`` mode using extracted parameters
6. **Test adaptive pipeline** on the same data (digital twin validation)
7. **Deploy adaptive pipeline** to field systems with confidence

Roadmap
=======

Data Offloading Architectures
-----------------------------

Future versions will support multiple data retrieval pathways:

- **Marine store-and-forward nodes**: Underwater acoustic modems for
  opportunistic data transfer when tagged animals pass near fixed receivers
- **Satellite uplink**: Direct-to-satellite transmission for priority metrics
  (behavioral state summaries, position estimates)
- **Hybrid architectures**: Onboard filtering to prioritize high-value data
  for bandwidth-constrained uplinks

GPU Acceleration & Parallelization
----------------------------------

Planned compute enhancements for high-fidelity real-time processing:

- **GPU-accelerated sensor fusion**: CUDA/OpenCL backends for matrix
  operations in Kalman and particle filters
- **Online Monte Carlo methods**: Sequential Monte Carlo (particle filters)
  for probabilistic state estimation with GPU-parallelized particle
  propagation
- **Ensemble dead-reckoning**: Parallel trajectory hypotheses with real-time
  pruning based on constraint satisfaction

Companion Processor Architecture
--------------------------------

Deployment architectures pairing minimal on-tag compute with external
processing:

**On-Animal Companion ("Backpack")**

For larger marine animals where a secondary processor package is feasible:

- **On-tag**: Lightweight sensor acquisition, basic filtering, data buffering
- **Companion processor**: Full sensor fusion, INS/dead-reckoning, behavioral
  classification
- **Benefit**: Real-time 3D trajectory estimation without on-tag compute
  constraints

**On-Ship Simulation**

Ship-based processing driven by sparse tag updates received via
store-and-forward nodes or satellite surface pings:

- **Pre-deployment**: Simulate expected trajectories and behavioral patterns
  to validate tag configuration
- **During deployment**: Real-time trajectory reconstruction from periodic
  position fixes and behavioral summaries
- **Post-deployment**: Rapid preliminary analysis before tag recovery,
  informing retrieval strategy
- **Data sources**: Acoustic modem check-ins, satellite surface
  transmissions, opportunistic proximity detections

AI-Assisted Analysis
--------------------

Domain-specialized AI agents for post-facto analysis:

- **Custom knowledge agents**: LLM agents trained on marine biologging
  literature, species-specific behavioral ontologies, and sensor fusion
  methodology
- **Automated interpretation**: Natural language querying of processed
  datasets ("When did the animal exhibit foraging behavior at depth?")
- **Analysis workflows**: AI-assisted hypothesis generation, anomaly
  detection, and quality control
- **Integration**: Built on the ``kanoa`` framework for domain-specific
  agent development

Immersive Visualization
-----------------------

Post-facto analysis enhancements:

- **3D trajectory rendering**: Real-time visualization of reconstructed
  animal paths with behavioral state overlay
- **Unity integration**: Export pseudotracks to Unity for immersive VR/AR
  analysis of diving behavior
- **Simulation playback**: Replay sensor fusion pipeline with adjustable
  parameters for algorithm tuning
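The hand-off at the heart of the recommended workflow (batch-compute a
parameter, lock it, replay it through a fixed-mode stream) can be sketched as
follows. The names ``batch_calibration`` and ``fixed_mode_stream`` are
illustrative assumptions, not the package's API.

.. code-block:: python

   def batch_calibration(samples):
       """Step 2: post-facto batch_compute derives the offset from the
       complete recovered dataset."""
       return sum(samples) / len(samples)


   def fixed_mode_stream(samples, offset):
       """Steps 5-6: the adaptive pipeline in ``fixed`` mode applies the
       locked parameter sample-by-sample, with no calibration overhead."""
       for x in samples:
           yield x - offset


   # Digital-twin validation: replay the same recovered data through the
   # fixed-mode stream before deploying to field hardware.
   recovered = [4.0, 6.0, 5.0, 5.0, 7.0, 3.0]
   offset = batch_calibration(recovered)
   residuals = list(fixed_mode_stream(recovered, offset))

Because the offset was computed from the same data, the residuals are
zero-mean, which is the sanity check a digital-twin run would assert before
trusting the fixed parameters in the field.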