Pipeline Architecture: From Lab to Field

The package provides two distinct processing pipelines designed for different stages of the research workflow. Start with post-facto analysis in the lab, then deploy adaptive sensor fusion in the field.

Post-Facto Pipeline

Lab Analysis & Validation

The post-facto pipeline is designed for batch analysis where the complete dataset is available. This is the recommended starting point for new deployments.

Purpose

  • Batch analysis of recovered biologger data

  • Algorithm validation and scientific publications

  • R-compatibility for historical comparisons

  • Establishing calibration parameters for future adaptive processing

Characteristics

  • Memory: Full dataset loaded into memory

  • Processing: 8-stage pipeline with non-causal algorithms

  • Accuracy: Highest accuracy through full-dataset calibration

Calibration Modes

batch_compute

Two-pass processing that computes calibrations from the full dataset. The first pass collects sensor data; the second pass applies the computed calibrations.
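A minimal sketch of the two-pass idea (function and variable names here are illustrative, not the package's actual API): pass 1 scans the complete dataset to derive per-axis offsets, analogous to R's colMeans(); pass 2 subtracts them from every sample.

```python
import numpy as np

def batch_compute_offsets(samples):
    """Two-pass calibration sketch (hypothetical names, not the real API)."""
    offsets = samples.mean(axis=0)      # pass 1: full-dataset statistics
    return samples - offsets, offsets   # pass 2: apply the calibration

raw = np.array([[0.2, -0.1, 9.9],
                [0.4,  0.1, 9.7],
                [0.3,  0.0, 9.8]])
calibrated, offsets = batch_compute_offsets(raw)
# Each axis of the calibrated output is zero-mean
```

Because the offsets are computed from the entire recording, this mode is non-causal: it needs the full dataset up front, which is why it belongs in the lab rather than on the tag.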

fixed

Single-pass processing with locked parameters from prior analysis. Use when calibration values are already known from previous runs.
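A fixed-mode run might carry parameters forward from a prior batch_compute analysis via the config file. The keys below are illustrative only; the package's actual schema may differ.

```yaml
# Hypothetical config fragment -- key names are illustrative,
# not the package's actual schema.
calibration:
  mode: fixed
  accel_offset: [0.012, -0.034, 0.008]   # from a prior batch_compute run
  mag_hard_iron: [12.4, -3.1, 7.9]       # hard-iron offsets, microtesla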

R Compatibility

The batch_compute mode replicates the gRumble R package’s algorithms:

  • Near-exact numerical agreement: <0.1° orientation error vs. the R reference

  • Two-pass processing matches R’s colMeans() and MagOffset() functions

  • Scientific reproducibility for historical comparisons and validation studies

Adaptive Sensor Fusion Pipeline

Real-Time Deployment

The adaptive sensor fusion pipeline is designed for real-time processing where data arrives sample-by-sample. It uses fully causal algorithms with no lookahead.

Purpose

  • On-tag processing for next-generation PSAT deployments with SD card logging

  • Embedded systems and edge computing (resource-constrained environments)

  • Digital twin testing using historical data

  • Foundation for future satellite uplink and store-and-forward architectures

Characteristics

  • Memory: Fixed memory footprint, O(1) space complexity

  • Processing: Fully causal 8-stage pipeline (no lookahead)

  • Latency: Immediate output per sample
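The O(1) memory claim can be illustrated with a standard streaming-statistics pattern (Welford's algorithm, shown here as a generic sketch rather than the package's implementation): mean and variance are maintained per sample without buffering the dataset.

```python
class RunningStats:
    """O(1)-memory running mean/variance via Welford's algorithm."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

stats = RunningStats()
for depth in [10.2, 10.4, 10.1, 10.3]:
    stats.update(depth)
# stats.mean ≈ 10.25; memory use is constant regardless of sample count
```

The same constant-memory discipline applies to every stage of the causal pipeline, which is what makes it viable on resource-constrained tags.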

Algorithms

Current implementations:

  • Variance-based attachment angle calibration with convergence detection

  • Hard-iron magnetometer calibration via sphere-fitting

  • Tilt-compensated heading calculation

  • Multi-scale activity-adaptive depth smoothing
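As one concrete example, tilt-compensated heading can be computed causally from a single accelerometer/magnetometer sample. The sketch below assumes a NED-style axis convention and is illustrative, not the package's exact implementation.

```python
import math

def tilt_compensated_heading(ax, ay, az, mx, my, mz):
    """Heading from one accel+mag sample (NED-style convention assumed).
    Roll/pitch come from the accelerometer; the magnetometer reading is
    rotated into the horizontal plane before taking the heading angle."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    # Rotate magnetometer readings into the horizontal plane
    mx_h = mx * math.cos(pitch) + mz * math.sin(pitch)
    my_h = (mx * math.sin(roll) * math.sin(pitch)
            + my * math.cos(roll)
            - mz * math.sin(roll) * math.cos(pitch))
    return math.degrees(math.atan2(-my_h, mx_h)) % 360.0

# Level tag pointing at magnetic north: heading ≈ 0°
print(tilt_compensated_heading(0.0, 0.0, 9.81, 20.0, 0.0, -45.0))
```

Each call uses only the current sample, so the computation is fully causal with no lookahead.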

In progress:

  • Madgwick filter (quaternion-based orientation)

  • Kalman filter (optimal state estimation)

Calibration Modes

progressive

Online EMA-based calibration that adapts during deployment. Converges within the first 2-3 minutes of data collection. Default mode for adaptive processing.
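The EMA update can be sketched as below; the smoothing factor and function names are illustrative, not the package's actual parameters.

```python
def make_progressive_offset(alpha=0.05):
    """EMA-based online offset estimator (sketch; names are illustrative)."""
    state = {"offset": None}
    def update(x):
        if state["offset"] is None:
            state["offset"] = x                      # seed on first sample
        else:
            state["offset"] += alpha * (x - state["offset"])
        return x - state["offset"]                   # calibrated sample
    return update

cal = make_progressive_offset(alpha=0.05)
for raw in [1.0] * 200:       # sensor with a constant bias of 1.0
    corrected = cal(raw)
# The estimated offset converges to the true bias, so corrected ≈ 0.0
```

Each update touches only the current sample and one stored value, so progressive calibration preserves the pipeline's O(1) memory footprint.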

fixed

Use pre-computed calibration parameters from post-facto analysis. Fastest processing with no calibration overhead.

Digital Twin Workflow

Run the adaptive sensor fusion pipeline on historical CSV files to validate real-time algorithms before field deployment:

# Validate adaptive calibration against post-facto batch results
python -m biologger_pseudotrack --config data/config_streaming_Rcompat.yml

This workflow enables:

  • Testing adaptive algorithms without deploying to physical hardware

  • Comparing streaming vs batch processing results

  • Tuning calibration parameters for specific species/deployments

  • Validating that adaptive processing converges to batch-computed values
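The last step, checking that adaptive estimates converge to batch-computed values, amounts to a per-parameter tolerance comparison. A minimal sketch (the 2% tolerance is illustrative, not a package default):

```python
import math

def converged(adaptive_value, batch_value, rel_tol=0.02):
    """True if a streaming calibration estimate agrees with the
    batch-computed reference within a relative tolerance."""
    return math.isclose(adaptive_value, batch_value, rel_tol=rel_tol)

# e.g. one axis of a hard-iron offset: streaming vs batch estimate
print(converged(12.31, 12.40))   # within 2% of the batch value
```

Running such checks over every calibrated parameter gives a pass/fail summary for a digital twin run before any hardware is deployed.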

Pipeline Comparison

| Aspect | Post-Facto | Adaptive Sensor Fusion |
| --- | --- | --- |
| When to Use | Lab analysis, publications | Field deployment, real-time |
| Data Availability | Complete dataset | Sample-by-sample |
| Memory | Full dataset in memory | O(1) fixed footprint |
| Calibration | batch_compute or fixed | progressive or fixed |
| Accuracy | Highest (non-causal) | Near-equivalent (causal) |
| R Compatibility | Exact match available | Approximate match |

Roadmap

Data Offloading Architectures

Future versions will support multiple data retrieval pathways:

  • Marine store-and-forward nodes: Underwater acoustic modems for opportunistic data transfer when tagged animals pass near fixed receivers

  • Satellite uplink: Direct-to-satellite transmission for priority metrics (behavioral state summaries, position estimates)

  • Hybrid architectures: Onboard filtering to prioritize high-value data for bandwidth-constrained uplinks

GPU Acceleration & Parallelization

Planned compute enhancements for high-fidelity real-time processing:

  • GPU-accelerated sensor fusion: CUDA/OpenCL backends for matrix operations in Kalman and particle filters

  • Online Monte Carlo methods: Sequential Monte Carlo (particle filters) for probabilistic state estimation with GPU-parallelized particle propagation

  • Ensemble dead-reckoning: Parallel trajectory hypotheses with real-time pruning based on constraint satisfaction

Companion Processor Architecture

Deployment architectures pairing minimal on-tag compute with external processing:

On-Animal Companion (“Backpack”)

For larger marine animals where a secondary processor package is feasible:

  • On-tag: Lightweight sensor acquisition, basic filtering, data buffering

  • Companion processor: Full sensor fusion, INS/dead-reckoning, behavioral classification

  • Benefit: Real-time 3D trajectory estimation without on-tag compute constraints

On-Ship Simulation

Ship-based processing driven by sparse tag updates received via store-and-forward nodes or satellite surface pings:

  • Pre-deployment: Simulate expected trajectories and behavioral patterns to validate tag configuration

  • During deployment: Real-time trajectory reconstruction from periodic position fixes and behavioral summaries

  • Post-deployment: Rapid preliminary analysis before tag recovery, informing retrieval strategy

  • Data sources: Acoustic modem check-ins, satellite surface transmissions, opportunistic proximity detections

AI-Assisted Analysis

Domain-specialized AI agents for post-facto analysis:

  • Custom knowledge agents: LLM agents trained on marine biologging literature, species-specific behavioral ontologies, and sensor fusion methodology

  • Automated interpretation: Natural language querying of processed datasets (“When did the animal exhibit foraging behavior at depth?”)

  • Analysis workflows: AI-assisted hypothesis generation, anomaly detection, and quality control

  • Integration: Built on the kanoa framework for domain-specific agent development

Immersive Visualization

Post-facto analysis enhancements:

  • 3D trajectory rendering: Real-time visualization of reconstructed animal paths with behavioral state overlay

  • Unity integration: Export pseudotracks to Unity for immersive VR/AR analysis of diving behavior

  • Simulation playback: Replay sensor fusion pipeline with adjustable parameters for algorithm tuning