Pipeline Architecture: From Lab to Field

The package provides two distinct processing pipelines designed for different stages of the research workflow. Start with post-facto analysis in the lab, then deploy adaptive sensor fusion in the field.

Post-Facto Pipeline

Lab Analysis & Validation

The post-facto pipeline is designed for batch analysis where the complete dataset is available. This is the recommended starting point for new deployments.

Purpose

  • Batch analysis of recovered biologger data

  • Algorithm validation and scientific publications

  • R-compatibility for historical comparisons

  • Establishing calibration parameters for future adaptive processing

Characteristics

  • Memory: Full dataset loaded into memory

  • Processing: 8-stage pipeline with non-causal algorithms

  • Accuracy: Highest accuracy through full-dataset calibration

Calibration Modes

batch_compute

Two-pass processing that computes calibrations from the full dataset. The first pass collects sensor data; the second pass applies the computed calibrations.
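A minimal sketch of the two-pass idea (function and variable names here are illustrative, not the package's actual API): pass 1 scans the complete dataset to derive per-axis offsets, analogous to R's colMeans(); pass 2 subtracts them from every sample.

```python
import numpy as np

def batch_compute_offsets(samples):
    """Two-pass calibration sketch (hypothetical names, not the real API)."""
    offsets = samples.mean(axis=0)      # pass 1: full-dataset statistics
    return samples - offsets, offsets   # pass 2: apply the calibration

raw = np.array([[0.2, -0.1, 9.9],
                [0.4,  0.1, 9.7],
                [0.3,  0.0, 9.8]])
calibrated, offsets = batch_compute_offsets(raw)
# Each axis of the calibrated output is zero-mean
```

Because the offsets are computed from the entire recording, this mode is non-causal: it needs the full dataset up front, which is why it belongs in the lab rather than on the tag.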

fixed

Single-pass processing with locked parameters from prior analysis. Use when calibration values are already known from previous runs.
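A fixed-mode run might carry parameters forward from a prior batch_compute analysis via the config file. The keys below are illustrative only; the package's actual schema may differ.

```yaml
# Hypothetical config fragment -- key names are illustrative,
# not the package's actual schema.
calibration:
  mode: fixed
  accel_offset: [0.012, -0.034, 0.008]   # from a prior batch_compute run
  mag_hard_iron: [12.4, -3.1, 7.9]       # hard-iron offsets, microtesla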

R Compatibility

The batch_compute mode replicates the gRumble R package’s algorithms:

  • Near-exact numerical agreement: <0.1° orientation error vs. the R reference

  • Two-pass processing matches R’s colMeans() and MagOffset() functions

  • Scientific reproducibility for historical comparisons and validation studies

Adaptive Sensor Fusion Pipeline

Real-Time Deployment

The adaptive sensor fusion pipeline is designed for real-time processing where data arrives sample-by-sample. It uses fully causal algorithms with no lookahead.

Purpose

  • On-tag processing for next-generation PSAT deployments with SD card logging

  • Embedded systems and edge computing (resource-constrained environments)

  • Digital twin testing using historical data

  • Foundation for future satellite uplink and store-and-forward architectures

Characteristics

  • Memory: Fixed memory footprint, O(1) space complexity

  • Processing: Fully causal 8-stage pipeline (no lookahead)

  • Latency: Immediate output per sample
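The O(1) memory claim can be illustrated with a standard streaming-statistics pattern (Welford's algorithm, shown here as a generic sketch rather than the package's implementation): mean and variance are maintained per sample without buffering the dataset.

```python
class RunningStats:
    """O(1)-memory running mean/variance via Welford's algorithm."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

stats = RunningStats()
for depth in [10.2, 10.4, 10.1, 10.3]:
    stats.update(depth)
# stats.mean ≈ 10.25; memory use is constant regardless of sample count
```

The same constant-memory discipline applies to every stage of the causal pipeline, which is what makes it viable on resource-constrained tags.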

Algorithms

Current implementations:

  • Variance-based attachment angle calibration with convergence detection

  • Hard-iron magnetometer calibration via sphere-fitting

  • Tilt-compensated heading calculation

  • Multi-scale activity-adaptive depth smoothing
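As one concrete example, tilt-compensated heading can be computed causally from a single accelerometer/magnetometer sample. The sketch below assumes a NED-style axis convention and is illustrative, not the package's exact implementation.

```python
import math

def tilt_compensated_heading(ax, ay, az, mx, my, mz):
    """Heading from one accel+mag sample (NED-style convention assumed).
    Roll/pitch come from the accelerometer; the magnetometer reading is
    rotated into the horizontal plane before taking the heading angle."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    # Rotate magnetometer readings into the horizontal plane
    mx_h = mx * math.cos(pitch) + mz * math.sin(pitch)
    my_h = (mx * math.sin(roll) * math.sin(pitch)
            + my * math.cos(roll)
            - mz * math.sin(roll) * math.cos(pitch))
    return math.degrees(math.atan2(-my_h, mx_h)) % 360.0

# Level tag pointing at magnetic north: heading ≈ 0°
print(tilt_compensated_heading(0.0, 0.0, 9.81, 20.0, 0.0, -45.0))
```

Each call uses only the current sample, so the computation is fully causal with no lookahead.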

In progress:

  • Madgwick filter (quaternion-based orientation)

  • Kalman filter (optimal state estimation)

Calibration Modes

progressive

Online EMA-based calibration that adapts during deployment. Converges within the first 2-3 minutes of data collection. Default mode for adaptive processing.
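The EMA update can be sketched as below; the smoothing factor and function names are illustrative, not the package's actual parameters.

```python
def make_progressive_offset(alpha=0.05):
    """EMA-based online offset estimator (sketch; names are illustrative)."""
    state = {"offset": None}
    def update(x):
        if state["offset"] is None:
            state["offset"] = x                      # seed on first sample
        else:
            state["offset"] += alpha * (x - state["offset"])
        return x - state["offset"]                   # calibrated sample
    return update

cal = make_progressive_offset(alpha=0.05)
for raw in [1.0] * 200:       # sensor with a constant bias of 1.0
    corrected = cal(raw)
# The estimated offset converges to the true bias, so corrected ≈ 0.0
```

Each update touches only the current sample and one stored value, so progressive calibration preserves the pipeline's O(1) memory footprint.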

fixed

Use pre-computed calibration parameters from post-facto analysis. Fastest processing with no calibration overhead.

Digital Twin Workflow

Run the adaptive sensor fusion pipeline on historical CSV files to validate real-time algorithms before field deployment:

# Validate adaptive calibration against post-facto batch results
python -m biologger_pseudotrack --config data/config_streaming_Rcompat.yml

This workflow enables:

  • Testing adaptive algorithms without deploying to physical hardware

  • Comparing streaming vs batch processing results

  • Tuning calibration parameters for specific species/deployments

  • Validating that adaptive processing converges to batch-computed values
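The last step, checking that adaptive estimates converge to batch-computed values, amounts to a per-parameter tolerance comparison. A minimal sketch (the 2% tolerance is illustrative, not a package default):

```python
import math

def converged(adaptive_value, batch_value, rel_tol=0.02):
    """True if a streaming calibration estimate agrees with the
    batch-computed reference within a relative tolerance."""
    return math.isclose(adaptive_value, batch_value, rel_tol=rel_tol)

# e.g. one axis of a hard-iron offset: streaming vs batch estimate
print(converged(12.31, 12.40))   # within 2% of the batch value
```

Running such checks over every calibrated parameter gives a pass/fail summary for a digital twin run before any hardware is deployed.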

Pipeline Comparison

| Aspect | Post-Facto | Adaptive Sensor Fusion |
| --- | --- | --- |
| When to Use | Lab analysis, publications | Field deployment, real-time |
| Data Availability | Complete dataset | Sample-by-sample |
| Memory | Full dataset in memory | O(1) fixed footprint |
| Calibration | batch_compute or fixed | progressive or fixed |
| Accuracy | Highest (non-causal) | Near-equivalent (causal) |
| R Compatibility | Exact match available | Approximate match |

Roadmap

Data Offloading Architectures

Future versions will support multiple data retrieval pathways:

  • Marine store-and-forward nodes: Underwater acoustic modems for opportunistic data transfer when tagged animals pass near fixed receivers

  • Satellite uplink: Direct-to-satellite transmission for priority metrics (behavioral state summaries, position estimates)

  • Hybrid architectures: Onboard filtering to prioritize high-value data for bandwidth-constrained uplinks

GPU Acceleration & Parallelization

Planned compute enhancements for high-fidelity real-time processing:

  • GPU-accelerated sensor fusion: CUDA/OpenCL backends for matrix operations in Kalman and particle filters

  • Online Monte Carlo methods: Sequential Monte Carlo (particle filters) for probabilistic state estimation with GPU-parallelized particle propagation

  • Ensemble dead-reckoning: Parallel trajectory hypotheses with real-time pruning based on constraint satisfaction

Companion Processor Architecture

Deployment architectures pairing minimal on-tag compute with external processing:

On-Animal Companion (“Backpack”)

For larger marine animals where a secondary processor package is feasible:

  • On-tag: Lightweight sensor acquisition, basic filtering, data buffering

  • Companion processor: Full sensor fusion, INS/dead-reckoning, behavioral classification

  • Benefit: Real-time 3D trajectory estimation without on-tag compute constraints

On-Ship Simulation

Ship-based processing driven by sparse tag updates received via store-and-forward nodes or satellite surface pings:

  • Pre-deployment: Simulate expected trajectories and behavioral patterns to validate tag configuration

  • During deployment: Real-time trajectory reconstruction from periodic position fixes and behavioral summaries

  • Post-deployment: Rapid preliminary analysis before tag recovery, informing retrieval strategy

  • Data sources: Acoustic modem check-ins, satellite surface transmissions, opportunistic proximity detections

AI-Assisted Analysis

Domain-specialized AI agents for post-facto analysis:

  • Custom knowledge agents: LLM agents trained on marine biologging literature, species-specific behavioral ontologies, and sensor fusion methodology

  • Automated interpretation: Natural language querying of processed datasets (“When did the animal exhibit foraging behavior at depth?”)

  • Analysis workflows: AI-assisted hypothesis generation, anomaly detection, and quality control

  • Integration: Built on the kanoa framework for domain-specific agent development

Immersive Visualization

Post-facto analysis enhancements:

  • 3D trajectory rendering: Real-time visualization of reconstructed animal paths with behavioral state overlay

  • Unity integration: Export pseudotracks to Unity for immersive VR/AR analysis of diving behavior

  • Simulation playback: Replay sensor fusion pipeline with adjustable parameters for algorithm tuning