# Pipeline Architecture: From Lab to Field
The package provides two distinct processing pipelines designed for different stages of the research workflow. Start with post-facto analysis in the lab, then deploy adaptive sensor fusion in the field.
## Post-Facto Pipeline

*Lab Analysis & Validation*

The post-facto pipeline is designed for batch analysis where the complete dataset is available. This is the recommended starting point for new deployments.

### Purpose

- Batch analysis of recovered biologger data
- Algorithm validation and scientific publications
- R compatibility for historical comparisons
- Establishing calibration parameters for future adaptive processing

### Characteristics

- **Memory:** Full dataset loaded into memory
- **Processing:** 8-stage pipeline with non-causal algorithms
- **Accuracy:** Highest accuracy through full-dataset calibration
### Calibration Modes

- **`batch_compute`**: Two-pass processing that computes calibrations from the full dataset. The first pass collects sensor data; the second pass applies the computed calibrations.
- **`fixed`**: Single-pass processing with locked parameters from a prior analysis. Use when calibration values are already known from previous runs.
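As a sketch only, selecting between these modes might look like the following config fragment. The key names here are illustrative assumptions, not the package's actual schema; consult the shipped YAML files (e.g. `data/config_streaming_Rcompat.yml`) for the real one.

```yaml
# Hypothetical schema -- key names are illustrative, not the package's actual config
pipeline: post_facto
calibration:
  mode: batch_compute        # two-pass: compute calibrations, then apply them
  # For mode: fixed, supply values locked in from a prior batch_compute run, e.g.:
  # fixed:
  #   accel_offset: [0.012, -0.034, 0.981]
  #   mag_offset: [12.5, -3.2, 41.0]
```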
### R Compatibility

The `batch_compute` mode replicates the gRumble R package's algorithms:

- Near-perfect numerical agreement: <0.1° orientation error vs. the R reference
- Two-pass processing matches R's `colMeans()` and `MagOffset()` functions
- Scientific reproducibility for historical comparisons and validation studies
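The two-pass structure can be sketched in a few lines. The function names below are illustrative, not the package's API, and the sketch assumes calibration reduces to per-axis mean offsets (the role `colMeans()` plays in R); the real pipeline also applies magnetometer and orientation corrections in the second pass.

```python
def batch_compute_offsets(samples):
    """Pass 1: per-axis mean offsets over the full dataset
    (the batch analogue of R's colMeans())."""
    n = len(samples)
    axes = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(axes)]

def apply_offsets(samples, offsets):
    """Pass 2: subtract the computed offsets from every sample."""
    return [[v - o for v, o in zip(s, offsets)] for s in samples]

# Two-pass run over a toy 3-axis dataset:
data = [[0.1, 0.0, 1.0], [0.3, 0.2, 1.0], [0.2, 0.1, 1.0]]
offsets = batch_compute_offsets(data)       # first pass: full dataset in memory
calibrated = apply_offsets(data, offsets)   # second pass: apply calibrations
```

Both passes touch the entire dataset, which is why this mode needs the full recording in memory but achieves the highest accuracy.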
## Adaptive Sensor Fusion Pipeline

*Real-Time Deployment*

The adaptive sensor fusion pipeline is designed for real-time processing where data arrives sample by sample. It uses fully causal algorithms with no lookahead.

### Purpose

- On-tag processing for next-generation PSAT deployments with SD card logging
- Embedded systems and edge computing (resource-constrained environments)
- Digital twin testing using historical data
- Foundation for future satellite uplink and store-and-forward architectures

### Characteristics

- **Memory:** Fixed memory footprint, O(1) space complexity
- **Processing:** Fully causal 8-stage pipeline (no lookahead)
- **Latency:** Immediate output per sample
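These properties can be illustrated with a toy causal smoother that keeps only a fixed-length window of past samples, so memory stays constant and every input yields an immediate output. The class name is hypothetical, not the package's API.

```python
from collections import deque

class CausalDepthSmoother:
    """Fixed-memory causal smoother: retains only the last `window`
    samples (constant space for a fixed window) and emits one output
    per input sample, using no future data."""

    def __init__(self, window=5):
        self.buf = deque(maxlen=window)   # old samples are evicted automatically

    def update(self, depth):
        self.buf.append(depth)            # causal: only past samples are seen
        return sum(self.buf) / len(self.buf)

smoother = CausalDepthSmoother(window=3)
outputs = [smoother.update(d) for d in [10.0, 12.0, 11.0, 13.0]]
```

Contrast this with the post-facto pipeline, which may look arbitrarily far ahead because the whole record is already on disk.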
### Algorithms

Current implementations:

- Variance-based attachment angle calibration with convergence detection
- Hard-iron magnetometer calibration via sphere fitting
- Tilt-compensated heading calculation
- Multi-scale activity-adaptive depth smoothing

In progress:

- Madgwick filter (quaternion-based orientation)
- Kalman filter (optimal state estimation)
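As an illustration of the tilt-compensated heading step listed above, the standard accelerometer-levelled formulation looks like the sketch below. The axis and sign conventions here (aerospace-style roll/pitch from gravity) are assumptions and may differ from the package's actual implementation.

```python
import math

def tilt_compensated_heading(ax, ay, az, mx, my, mz):
    """Causal per-sample heading: estimate roll/pitch from the
    accelerometer, rotate the magnetometer vector into the horizontal
    plane, then take atan2 of the horizontal components."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, ay * math.sin(roll) + az * math.cos(roll))
    # De-rotate the magnetometer reading into the horizontal plane
    bx = (mx * math.cos(pitch)
          + my * math.sin(pitch) * math.sin(roll)
          + mz * math.sin(pitch) * math.cos(roll))
    by = my * math.cos(roll) - mz * math.sin(roll)
    return math.degrees(math.atan2(-by, bx)) % 360.0

heading_north = tilt_compensated_heading(0, 0, 1, 0.4, 0.0, 0.6)   # level, facing north
heading_east = tilt_compensated_heading(0, 0, 1, 0.0, -0.4, 0.6)   # level, facing east
```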
### Calibration Modes

- **`progressive`**: Online EMA-based calibration that adapts during deployment, converging within the first 2-3 minutes of data collection. This is the default mode for adaptive processing.
- **`fixed`**: Uses pre-computed calibration parameters from post-facto analysis. Fastest processing, with no calibration overhead.
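A minimal sketch of the progressive idea, assuming a single-axis offset tracked with an exponential moving average and a simple step-size convergence test. The class, smoothing factor, and threshold are illustrative, not the package's implementation.

```python
class ProgressiveCalibrator:
    """Illustrative EMA-based online calibrator (not the package's API):
    tracks a running offset estimate and flags convergence once successive
    updates move the estimate by less than `tol`."""

    def __init__(self, alpha=0.1, tol=5e-3):
        self.alpha = alpha      # EMA smoothing factor
        self.tol = tol          # convergence threshold on per-step change
        self.offset = None
        self.converged = False

    def update(self, value):
        """Consume one raw sample; return the causally calibrated sample."""
        if self.offset is None:
            self.offset = value                      # seed from first sample
        else:
            new = (1 - self.alpha) * self.offset + self.alpha * value
            self.converged = abs(new - self.offset) < self.tol
            self.offset = new
        return value - self.offset

cal = ProgressiveCalibrator()
for raw in [1.02, 0.98, 1.01, 0.99] * 50:   # noisy samples around a 1.0 bias
    corrected = cal.update(raw)
```

The estimate adapts only from past samples, so it can run on-tag with constant memory; `fixed` mode skips this loop entirely by loading the offsets up front.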
### Digital Twin Workflow

Run the adaptive sensor fusion pipeline on historical CSV files to validate real-time algorithms before field deployment:

```shell
# Validate adaptive calibration against post-facto batch results
python -m biologger_pseudotrack --config data/config_streaming_Rcompat.yml
```
This workflow enables:

- Testing adaptive algorithms without deploying to physical hardware
- Comparing streaming vs. batch processing results
- Tuning calibration parameters for specific species and deployments
- Validating that adaptive processing converges to batch-computed values
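That last check can be done mechanically: run a causal estimator over a historical record and compare its final value against the batch statistic computed from the same data. A toy version, using a synthetic record and an assumed EMA estimator (not the package's API):

```python
# Toy digital-twin check: does a causal streaming estimate converge
# to the batch-computed value on the same historical record?
samples = [1.0 + 0.01 * ((i * 7) % 5 - 2) for i in range(1000)]  # synthetic biased record

# Post-facto: statistic over the complete dataset
batch_offset = sum(samples) / len(samples)

# Adaptive: one sample at a time, EMA with no lookahead
streaming_offset = samples[0]
for s in samples[1:]:
    streaming_offset += 0.02 * (s - streaming_offset)

divergence = abs(streaming_offset - batch_offset)
```

If `divergence` stays below an application-specific tolerance, the adaptive configuration is a safe stand-in for batch calibration on that deployment.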
## Pipeline Comparison

| Aspect | Post-Facto | Adaptive Sensor Fusion |
|---|---|---|
| When to Use | Lab analysis, publications | Field deployment, real-time |
| Data Availability | Complete dataset | Sample-by-sample |
| Memory | Full dataset in memory | O(1) fixed footprint |
| Calibration | `batch_compute` or `fixed` | `progressive` or `fixed` |
| Accuracy | Highest (non-causal) | Near-equivalent (causal) |
| R Compatibility | Exact match available | Approximate match |
## Recommended Workflow

1. Recover tag data from the deployment
2. Run the post-facto pipeline in `batch_compute` mode
3. Validate results against the R reference implementation (if applicable)
4. Extract calibration parameters for the deployment
5. Configure the adaptive pipeline in `fixed` mode using the extracted parameters
6. Test the adaptive pipeline on the same data (digital twin validation)
7. Deploy the adaptive pipeline to field systems with confidence
## Roadmap

### Data Offloading Architectures

Future versions will support multiple data retrieval pathways:

- **Marine store-and-forward nodes:** Underwater acoustic modems for opportunistic data transfer when tagged animals pass near fixed receivers
- **Satellite uplink:** Direct-to-satellite transmission for priority metrics (behavioral state summaries, position estimates)
- **Hybrid architectures:** Onboard filtering to prioritize high-value data for bandwidth-constrained uplinks
### GPU Acceleration & Parallelization

Planned compute enhancements for high-fidelity real-time processing:

- **GPU-accelerated sensor fusion:** CUDA/OpenCL backends for matrix operations in Kalman and particle filters
- **Online Monte Carlo methods:** Sequential Monte Carlo (particle filters) for probabilistic state estimation with GPU-parallelized particle propagation
- **Ensemble dead-reckoning:** Parallel trajectory hypotheses with real-time pruning based on constraint satisfaction
### Companion Processor Architecture

Deployment architectures pairing minimal on-tag compute with external processing:

#### On-Animal Companion ("Backpack")

For larger marine animals where a secondary processor package is feasible:

- **On-tag:** Lightweight sensor acquisition, basic filtering, data buffering
- **Companion processor:** Full sensor fusion, INS/dead-reckoning, behavioral classification
- **Benefit:** Real-time 3D trajectory estimation without on-tag compute constraints
#### On-Ship Simulation

Ship-based processing driven by sparse tag updates received via store-and-forward nodes or satellite surface pings:

- **Pre-deployment:** Simulate expected trajectories and behavioral patterns to validate tag configuration
- **During deployment:** Real-time trajectory reconstruction from periodic position fixes and behavioral summaries
- **Post-deployment:** Rapid preliminary analysis before tag recovery, informing retrieval strategy
- **Data sources:** Acoustic modem check-ins, satellite surface transmissions, opportunistic proximity detections
### AI-Assisted Analysis

Domain-specialized AI agents for post-facto analysis:

- **Custom knowledge agents:** LLM agents trained on marine biologging literature, species-specific behavioral ontologies, and sensor fusion methodology
- **Automated interpretation:** Natural language querying of processed datasets ("When did the animal exhibit foraging behavior at depth?")
- **Analysis workflows:** AI-assisted hypothesis generation, anomaly detection, and quality control
- **Integration:** Built on the kanoa framework for domain-specific agent development
### Immersive Visualization

Post-facto analysis enhancements:

- **3D trajectory rendering:** Real-time visualization of reconstructed animal paths with behavioral state overlay
- **Unity integration:** Export pseudotracks to Unity for immersive VR/AR analysis of diving behavior
- **Simulation playback:** Replay the sensor fusion pipeline with adjustable parameters for algorithm tuning