EfficientNetV2-L
Mixed FP16 · A100
BLOOFINZ-2022
19 Discrete Casts
Logit-Adjusted CE
Eastern Indian Ocean

Phytoplankton
Classification

Goes Lab · Lamont-Doherty Earth Observatory · Columbia University

Deep-learning pipeline for autonomous FlowCAM-based identification of marine phytoplankton species. Trained on 12,315 library images · inference on 69,851 discrete cast particles · BLOOFINZ-2022 eastern Indian Ocean expedition · R/V Roger Revelle.

🌊
Library Images
12,315
↑ Library 1 + 2
🔬
Species Classes
0
8 in training
Val Accuracy
60.62%
↑ Best at epoch 19 (Stage 2, epoch 7)
Parameters
118M
662K trainable
🚢
Discrete Casts
19
69,851 particles
F1
Macro F1
0.50
110× class imbalance
01
BLOOFINZ-2022 Cruise Overview
Eastern Indian Ocean · Argo Basin · Feb–Mar 2022 · R/V Roger Revelle · 19 Discrete Cast stations · FlowCAM imaging flow cytometry
Total Casts
19 stations
BFZ-Cast7 → BFZ-Cast143
Raw Particles
69,851 imaged
FlowCAM flow-through
Sampled (v9)
3,800 images
200 per cast · fast inference
High Confidence
524 labeled
≥ 80% confidence threshold
CSV Columns
70 features
ESD, biovolume, RGB, chains
Depth Range
5–10 m
Surface + subsurface
Segment Format
.jpg crops
Binary-masked particle images
Collection Date
2022 Feb
Eastern Indian Ocean expedition
Discrete Cast Folders
Expedition Summary
Cruise ID: BLOOFINZ-2022
Platform: Research Vessel (R/V Roger Revelle)
Region: Eastern Indian Ocean (Argo Basin)
Instrument: FlowCAM (imaging flow cytometry)
Image type: Brightfield + fluorescence
Calibration factor: 1.3699 µm/pixel
Lab: Goes Lab · LDEO · Columbia
● ALL CAST STATIONS
02
Training Dataset
Class distribution across 2 merged library sources · 12,315 images combined · Library 1 (10,473 images) split into 8,378 train / 2,095 validation
Class Distribution — Library 1 (val split · 2,095 samples)
Balance Ratio
110× imbalance
Sqrt sampling
Stratified split
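A minimal sketch of what "sqrt sampling" could mean here, assuming each class is drawn with probability proportional to the square root of its image count. The class names and counts below are hypothetical and only reproduce the stated 110× ratio; the dashboard does not give the actual sampler.

```python
import math


def sqrt_class_probs(class_counts):
    """Return per-class sampling probabilities proportional to sqrt(count)."""
    roots = {name: math.sqrt(count) for name, count in class_counts.items()}
    total = sum(roots.values())
    return {name: root / total for name, root in roots.items()}


# Hypothetical counts chosen only to give a 110x imbalance.
counts = {"common_class": 1100, "rare_class": 10}
probs = sqrt_class_probs(counts)

# Ratio of how often the common class is drawn relative to the rare class.
effective_ratio = probs["common_class"] / probs["rare_class"]
print(f"raw imbalance: 110x, after sqrt sampling: {effective_ratio:.1f}x")
```

Under this assumption a 110× raw imbalance shrinks to roughly the square root of 110 at sampling time, which softens the imbalance without fully flattening the class distribution.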
Multi-Library
Library 1: 10,473 img · 8 classes
Library 2: 1,842 img · 5 classes
Combined: 12,315 images
Extra classes: Pennate Diatoms · Richelia
Train/Val Split
Train: 8,378 (80%)
Validation: 2,095 (20%)
Strategy: Stratified
Steps/epoch: 262
Batch size: 32
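The split sizes and the step count are mutually consistent, which a few lines of arithmetic can confirm. This is only a check of the reported numbers, assuming the 80/20 split was applied to the 10,473 Library 1 images and that the last partial batch counts as a step.

```python
import math

library_1_images = 10_473
batch_size = 32

# 20% validation, rounded to the nearest whole image.
val_images = round(library_1_images * 0.20)
train_images = library_1_images - val_images

# A final partial batch still counts as one step.
steps_per_epoch = math.ceil(train_images / batch_size)

print(train_images, val_images, steps_per_epoch)
```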
Augmentation Pipeline
RandomFlip horizontal
RandomRotation ±10°
RandomZoom ±10%
RandomTranslation ±6%
No contrast aug ← microscopy
03
Discrete Cast Inference Results
V9 model applied to BLOOFINZ-2022 FlowCAM segments · 200 images/cast · 19 casts sampled
Inference Summary
Total sampled: 3,800 images
High-conf (≥80%): 524 (13.8%)
Confidence threshold: 0.80
Model used: v9_infer.keras
Runtime (A100): ~3 minutes
Batch size: 64 (fast mode)
Full dataset: 69,851 particles
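The high-confidence filter can be sketched as a softmax over the model's logits followed by a cut at 0.80 on the top probability. The function names, class names, and example logits are hypothetical; the real pipeline presumably runs this on batched Keras outputs rather than Python lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def high_confidence(logit_rows, class_names, threshold=0.80):
    """Return (row index, class name, probability) for rows that pass the threshold."""
    kept = []
    for index, row in enumerate(logit_rows):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((index, class_names[best], probs[best]))
    return kept


# Hypothetical logits: one confident particle and one ambiguous particle.
names = ["Nanophytoplankton", "Synechococcus", "Mineral particles"]
rows = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
kept = high_confidence(rows, names)
print(kept)
```

With the reported counts, 524 of 3,800 sampled images pass this cut, which is the 13.8% shown in the summary.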
High-Confidence Class Distribution (≥80%)
Confidence Analysis
Low Confidence Explanation — Why Only 13.8% Pass ≥80% Threshold
Domain Gap
Library vs Real-World
V9 trained on curated library images. Discrete cast particles are raw FlowCAM scans with varying focus, orientation, and debris — introducing a visual domain gap.
Debris & Noise
Non-biological particles
Mineral particles, detritus, and aggregates appear frequently in real ocean samples. These ambiguous particles naturally receive low classification confidence.
Next Step
Discrete retraining
Pseudo-labeling the 524 high-confidence particles and retraining on discrete-cast data is the planned next step, as directed by the professor. It is expected to narrow the domain gap and improve real-world accuracy.
04
Model Architecture
EfficientNetV2-L · 384×384 input · 118M params · Mixed FP16 · Two-stage transfer learning
Forward Pass
Input · 384×384×3
Augment · Flip·Rot·Zoom
Preprocess · ENv2 normalize
EfficientNetV2-L · 117.7M params · frozen in Stage 1
GAP · 1280-dim
BN + Dropout 0.4 · 1280-dim
Dense (Swish) · 512-dim
Dropout 0.3 · 512-dim
Logits (fp32) · 8 classes out
Stage 1: Head only · LR 2e-4 cosine · 12 epochs · backbone frozen
Stage 2: Top 5% unfrozen · LR 5e-6 constant · 10 epochs · BN frozen
Loss: SparseCatCE from_logits · AdamW wd 1e-4
Logit-adjusted CE with sqrt-inverse class priors
Batch 32 · Steps 262/epoch · Mixed FP16 · A100
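One way to read "logit-adjusted CE with sqrt-inverse class priors" is as standard logit adjustment with a temperature of 0.5: the term 0.5 · log(prior) is added to each logit before the cross-entropy, which is the same as adding the log of the square-rooted prior. That reading is an assumption, as are the function names and the example counts; the dashboard does not show the training code.

```python
import math


def log_prior_adjustment(class_counts, tau=0.5):
    """Per-class offset tau * log(prior); tau=0.5 is an assumed 'sqrt' tempering."""
    total = sum(class_counts)
    return [tau * math.log(count / total) for count in class_counts]


def logit_adjusted_ce(logits, label, adjustment):
    """Cross-entropy on logits shifted by the per-class prior offsets."""
    adjusted = [z + a for z, a in zip(logits, adjustment)]
    peak = max(adjusted)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_sum_exp - adjusted[label]


# Hypothetical two-class case with a 110x imbalance and an undecided model.
adjustment = log_prior_adjustment([1100, 10])
loss_common = logit_adjusted_ce([0.0, 0.0], label=0, adjustment=adjustment)
loss_rare = logit_adjusted_ce([0.0, 0.0], label=1, adjustment=adjustment)
print(f"common: {loss_common:.3f}  rare: {loss_rare:.3f}")
```

The undecided model pays a much larger loss on the rare class, so training has to push rare-class logits up by a margin. At inference the raw, unadjusted logits are used.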
05
Training History
Stage 1 (ep 1–12) + Stage 2 fine-tune (ep 13–22) · 22 total epochs · data from metrics.json
Accuracy — Train vs Validation
Loss — Train vs Validation
Stage 1 — Head Training
Epochs: 1–12
Best val_acc: 59.86%
LR schedule: Cosine 2e-4 → 2e-5
Trainable params: 662K (head only)
Loss at ep 12: 0.964
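The 662K trainable-parameter figure can be reproduced from the head dimensions in the architecture section. This assumes the batch-norm scale and shift are trainable in Stage 1 and that both dense layers use a bias; dropout and pooling add no parameters.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Weights plus optional bias for a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)


# BatchNormalization on the 1280-dim pooled features: gamma and beta only
# (moving mean and variance are non-trainable).
bn_trainable = 2 * 1280
hidden = dense_params(1280, 512)   # Dense Swish, 512-dim
logits = dense_params(512, 8)      # 8-class logits

head_trainable = bn_trainable + hidden + logits
print(head_trainable)
```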
Stage 2 — Gentle Fine-Tune
Epochs: 13–22
Best val_acc: 60.62%
LR: Constant 5e-6
Unfrozen layers: Top 5% of backbone
BN layers: All frozen
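The Stage 2 freezing rule can be sketched as a helper that works on any list of layer-like objects with a `trainable` attribute, such as a Keras backbone's `layers` list. It assumes "top 5%" means the last 5% of layers by count, which the dashboard does not confirm. The stub classes below are stand-ins so the sketch runs without TensorFlow.

```python
import math


def unfreeze_top_fraction(layers, fraction=0.05):
    """Unfreeze the last `fraction` of layers, keeping batch-norm layers frozen.

    Returns the number of layers left trainable.
    """
    n_top = math.ceil(len(layers) * fraction)
    cutoff = len(layers) - n_top
    unfrozen = 0
    for index, layer in enumerate(layers):
        is_batch_norm = type(layer).__name__ == "BatchNormalization"
        layer.trainable = index >= cutoff and not is_batch_norm
        if layer.trainable:
            unfrozen += 1
    return unfrozen


# Stand-in layer classes (hypothetical); real code would pass backbone.layers.
class Conv2D:
    trainable = False


class BatchNormalization:
    trainable = False


stub_layers = [Conv2D() if i % 2 == 0 else BatchNormalization() for i in range(100)]
n_unfrozen = unfreeze_top_fraction(stub_layers)
print(n_unfrozen)
```

Keeping batch-norm layers frozen avoids updating their statistics on small fine-tuning batches, which fits the "gentle fine-tune" description.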
Best Val Accuracy
Epoch 19 · Stage 2
06
Model Evaluation
Confusion matrix · per-class F1 · precision · recall · 2,095 validation samples
Confusion Matrix
F1 Score Radar
Per-Class Metrics — Precision · Recall · F1 · Support
Class · Precision · Recall · F1 Score · Support · F1 Bar · Status
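The per-class columns and the macro F1 can all be derived from the confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the 2×2 matrix is a made-up example, not the model's results.

```python
def per_class_metrics(confusion):
    """Return a list of (precision, recall, f1, support) tuples, one per class."""
    n_classes = len(confusion)
    rows = []
    for c in range(n_classes):
        true_pos = confusion[c][c]
        support = sum(confusion[c])
        predicted = sum(confusion[r][c] for r in range(n_classes))
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, support))
    return rows


def macro_f1(confusion):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    rows = per_class_metrics(confusion)
    return sum(row[2] for row in rows) / len(rows)


# Hypothetical 2-class confusion matrix (rows = true, columns = predicted).
example = [[8, 2], [1, 9]]
metrics = per_class_metrics(example)
print(metrics, macro_f1(example))
```

Because macro F1 weights every class equally, a few poorly separated classes can hold it near 0.50 even when overall accuracy is higher.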
07
Model Confidence Analysis
Softmax probability distribution across classes on discrete cast inference
Confidence by Class (High-Conf Only)
Discrete vs Library Performance
V9 was trained on clean library images. Discrete cast particles are raw ocean samples — more challenging.
Library val_acc: 60.62%
Discrete high-conf rate: 13.8%
Dominant class (discrete): Nanophytoplankton (294)
Rarest class (discrete): Synechococcus (1)
Improvement path: Retrain on Discrete pseudo-labels
Top-3 Accuracy Insight
Top-3 accuracy on the library validation set reached about 98.2%, so the correct class is almost always among the top 3 predictions even when the top-1 prediction is wrong.
Val Top-1 acc: 60.62%
Val Top-3 acc: ~98.2%
Biggest confusion pair: Nanophyto ↔ Picocyano
Best performing: Mineral particles (F1 0.96)
Most confused: Crocosphaera (F1 0.08)
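Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch, with made-up scores and labels:

```python
def top_k_accuracy(score_rows, labels, k=3):
    """Fraction of rows whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for three particles over four classes.
scores = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
]
true_labels = [0, 1, 3]

top1 = top_k_accuracy(scores, true_labels, k=1)
top3 = top_k_accuracy(scores, true_labels, k=3)
print(top1, top3)
```

A large gap between top-1 and top-3, as reported above, suggests the model narrows particles to the right group but struggles to rank close look-alikes.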
08
ESD & Morphological Parameters
Equivalent Spherical Diameter · biovolume · carbon · aspect ratio · all from FlowCAM CSV columns
Derivation Formulas — based on Menden-Deuer & Lessard 2000
ESD (Diameter)
D_ESD = √(4A/π)
FlowCAM column: "Diameter (ESD)" · units: µm
Biovolume
V = (π/6) × ESD³
Sphere approximation · FlowCAM: "Volume (ESD)" · µm³
Carbon Content
C = 0.109 × V^0.991
Menden-Deuer & Lessard 2000 · units: pgC/cell
Size Class
<2 · 2–20 · >20 µm
Pico · Nano · Micro phytoplankton size classes
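The four derivations above can be sketched as small functions. The coefficients and size cut-offs are the ones stated in this section; the function names are hypothetical, and the pixel conversion assumes the area is given in pixels and scaled by the 1.3699 µm/pixel calibration factor from the expedition summary.

```python
import math

CALIBRATION_UM_PER_PIXEL = 1.3699  # from the expedition summary


def area_px_to_um2(area_px, um_per_pixel=CALIBRATION_UM_PER_PIXEL):
    """Convert a particle area from pixels to square micrometres."""
    return area_px * um_per_pixel ** 2


def esd_from_area(area_um2):
    """Diameter of the circle with the same area: D = sqrt(4A / pi)."""
    return math.sqrt(4.0 * area_um2 / math.pi)


def biovolume(esd_um):
    """Sphere approximation: V = (pi / 6) * ESD^3, in cubic micrometres."""
    return math.pi / 6.0 * esd_um ** 3


def carbon_pg(volume_um3):
    """Carbon per cell in pgC, using C = 0.109 * V^0.991 as stated above."""
    return 0.109 * volume_um3 ** 0.991


def size_class(esd_um):
    """Pico (<2 um), nano (2-20 um), or micro (>20 um)."""
    if esd_um < 2:
        return "pico"
    if esd_um <= 20:
        return "nano"
    return "micro"


esd = esd_from_area(area_px_to_um2(10.0))  # hypothetical 10-pixel particle
print(f"{esd:.2f} um, {biovolume(esd):.1f} um^3, "
      f"{carbon_pg(biovolume(esd)):.2f} pgC, {size_class(esd)}")
```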
ESD (µm) · Diameter
FlowCAM col: "Diameter (ESD)"
Volume (µm³) · Biovolume
FlowCAM col: "Volume (ESD)"
Carbon (pgC/cell)
Derived: 0.109 × V^0.991
Aspect Ratio
FlowCAM col: "Aspect Ratio"
ESD Distribution by Species — Literature Values (µm)
Mean ± Std · Menden-Deuer & Lessard 2000 · Chisholm et al. 1992 · Selph et al. 2022
Biovolume vs Carbon Content — Allometric Scaling
C = 0.109 × V^0.991 · pgC/cell vs µm³ per species
09
Carbon Biomass Estimator
Interactive calculator · Menden-Deuer & Lessard 2000 · pgC/cell from ESD
Interactive Carbon Calculator
36.3
µm³ / cell · Biovolume
4.01
pgC / cell · Carbon
4.01
µgC / mL · Biomass
V = (π/6) × ESD³  ·  C = 0.109 × V^0.991  ·  Menden-Deuer & Lessard 2000
Representative Size Classes
Prochlorococcus (Pico, <2 µm)
ESD ~0.6 µm · V ~0.11 µm³ · C ~0.015 pgC/cell
Smallest free-living photosynthetic organism. Dominant in oligotrophic waters.
Synechococcus (Pico, 1–3 µm)
ESD ~1.5 µm · V ~1.77 µm³ · C ~0.21 pgC/cell
Ubiquitous marine picocyanobacterium. Important in nutrient-rich upwelling zones.
Nanophytoplankton (2–20 µm)
ESD ~5–15 µm · V varies · C 1–50 pgC/cell
Major contributors to primary production. Includes flagellates, small diatoms.
Nitzschia (Micro diatom)
ESD ~10–50 µm · High biovolume · C ~100+ pgC/cell
Pennate diatom. Chain-forming. High carbon per cell, driven by large cell volume rather than by the silica frustule.
10
Optical & Fluorescence Properties
RGB channel analysis · scatter · fluorescence channels Ch1/Ch2 · from FlowCAM CSV columns
Color Channel Composition
Key Optical Columns (FlowCAM)
Average Red: mean pixel intensity, R channel
Average Green: chlorophyll proxy signal
Average Blue: scatter / transparency proxy
Ch2/Ch1 Ratio: fluorescence emission ratio
Scatter Area/Peak: side scatter · cell size proxy
Optical Classification Significance
Chlorophyll Fluorescence
Green channel (Average Green) and Ch2 correlate with chlorophyll-a content. High values indicate healthy, photosynthetically active cells.
Phycoerythrin
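Phycoerythrin fluorescence (Ch1) distinguishes Synechococcus from Prochlorococcus: Synechococcus has phycoerythrin, Prochlorococcus does not. Phycoerythrin emits in the orange (around 575 nm) rather than the red, so the Ch1 filter assignment should be checked against the instrument configuration.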
Side Scatter (SSC)
Scatter Area/Peak captures cell complexity and internal structure. Used to separate mineral particles from biological cells.
Per-Species Optical Fingerprints — Normalized FlowCAM Channels (literature + probe values)
Prochlorococcus · Synechococcus
Nanophyto · Picocyano
Nitzschia · Mineral particles
Radar axes: Red channel · Green channel · Blue channel · Scatter Area · Ch2/Ch1 ratio · Intensity · Values normalized 0–100
11
Depth Profile Analysis
Sample depth extracted from cast folder names · BFZ-CAST-XXX-YM → Y meters depth
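Depth extraction from a folder name can be sketched with a regular expression. The pattern follows the BFZ-CAST-XXX-YM convention stated above and also tolerates the shorter BFZ-Cast143 style seen in the cruise overview. The exact folder spellings are an assumption, and the example names are made up.

```python
import re

# Cast number, then depth in metres followed by "M"; case-insensitive.
CAST_PATTERN = re.compile(r"^BFZ-CAST-?(\d+)-(\d+(?:\.\d+)?)M$", re.IGNORECASE)


def parse_cast_folder(name):
    """Return (cast_number, depth_m), or None when the name carries no depth."""
    match = CAST_PATTERN.match(name.strip())
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


# Hypothetical folder names.
for folder in ["BFZ-CAST-007-5M", "BFZ-Cast143-10M", "BFZ-Cast7"]:
    print(folder, parse_cast_folder(folder))
```

Returning None for names without a depth suffix keeps those casts visible as "depth unknown" instead of silently assigning a default.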
Known Cast Depths (from folder names)
Depth Distribution
Predicted Species at Depth — Discrete Cast Inference (high-conf ≥80%)
5m vs 10m · stacked bar · species predicted by V9 across all casts · exact cast-level data pending Anima
12
Geospatial Distribution
Eastern Indian Ocean · Argo Basin · R/V Roger Revelle · Feb–Mar 2022 · real GPS positions from BLOOFINZ_MET.xlsx
Eastern Indian Ocean — Argo Basin · 19 Cast Stations · Feb–Mar 2022
5m depth casts
10m depth casts
Mixed/both depths
Cycle stations
Cruise Metadata
Vessel: R/V Roger Revelle
Dates: Feb–Mar 2022
Region: Eastern Indian Ocean
Sub-region: Argo Basin
Lat range: ~17°S – 24°S
Lon range: ~110°E – 121°E
Strategy: Adaptive sampling
Ecological Context
Oligotrophic Waters
Argo Basin is nutrient-poor (low NO₃, PO₄). Dominated by pico/nanoplankton (Prochlorococcus, Synechococcus). N₂ fixation hotspot.
Bluefin Tuna Habitat
Southern Bluefin Tuna spawn here. BLOOFINZ = Bluefin Larvae in Oligotrophic Ocean Foodwebs, Investigations of Nutrients to Zooplankton.
Real GPS — BLOOFINZ_MET.xlsx
Positions derived from R/V Roger Revelle underway GPS data. Swapped Lat/Lon columns corrected. Click any station for SST, salinity, and fluorescence.
Sea Surface Conditions — Real R/V Revelle Underway Data
SST (°C) + Fluorescence · from BLOOFINZ_MET.xlsx · swapped Lat/Lon corrected · 19 cast stations
Daily Chlorophyll-a & Fv/Fm — 133,806 CLS Measurements · Full Cruise Track
Calibrated Chl-a (µg/L) + Photosynthetic Efficiency (Fv/Fm) · BLOOFINZ_CLS_FINAL.xlsx · Jan 29 – Mar 3 2022
● BIODIVERSITY METRICS — ALL CAST STATIONS
Station · Depth · Shannon H′ · Simpson D · Pielou J′ · Richness · Dominant %
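The table's columns can be computed from per-class particle counts at a station. This sketch assumes natural-log Shannon entropy and the Gini-Simpson form of Simpson's index (1 minus the sum of squared proportions). The dashboard does not say which Simpson variant it uses, and the counts are made up.

```python
import math


def diversity_metrics(counts):
    """Shannon H', Simpson D (Gini-Simpson), Pielou J', richness, dominant %."""
    present = [c for c in counts if c > 0]
    if not present:
        raise ValueError("at least one class must have a non-zero count")
    total = sum(present)
    props = [c / total for c in present]
    richness = len(present)
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)
    # Evenness is undefined for a single class; report 0.0 in that case.
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "shannon": shannon,
        "simpson": simpson,
        "pielou": pielou,
        "richness": richness,
        "dominant_pct": 100.0 * max(props),
    }


# Hypothetical station with four equally abundant classes.
even_station = diversity_metrics([10, 10, 10, 10])
print(even_station)
```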
13
Feature Histograms
Yeshitha's ESD distribution analysis · BTR-1, BTR-3, BTR-4 · Real FlowCAM data · 12,838 particles · BLOOFINZ-2022 Discrete Casts
BTR-1 — ESD Distribution Analysis 3,130 particles · 2022-03-04
BTR-1 ESD Distribution Histogram
BTR-3 — ESD Distribution Analysis 7,055 particles
BTR-3 ESD Distribution Histogram
BTR-4 — ESD Distribution Analysis 2,653 particles
BTR-4 ESD Distribution Histogram
Source: Yeshitha · Results-Histograms · Discrete-Updated-March 2026 · BLOOFINZ-2022 · ESD (Equivalent Spherical Diameter) Distribution Analysis · Diameter (ABD)
Morphological & Optical Distributions Computed from BTR-1 + BTR-3 + BTR-4 raw CSV data · 12,838 particles
Cell Size Distribution (ESD µm)
87% in 0–10 µm · pico- and nano-sized particles dominant
Fluorescence · Avg Green (chl-a proxy)
Average Green channel · oligotrophic signal
Pixel Intensity (Grayscale)
FlowCAM Intensity column · mean 105.8
Aspect Ratio Distribution
Mean 0.615 · moderately elongated on average
Circularity Distribution
Mean 0.834 · 43% perfectly circular
Elongation Distribution
69% single-cell · diatom long tail
14
Long Chain Finder
Ranked by Particles Per Chain · FlowCAM direct measurement · biological significance for nutrient stress and bloom dynamics
🔗
Why Chains Form
Phytoplankton form chains when daughter cells remain attached after division — a response to nutrient limitation (especially Fe, NO₃) or as a competitive strategy in turbulent environments.
🌊
Bloom Dynamics
Chain length tends to track growth phase. Longer chains can signal rapid division in nutrient-rich waters, while short chains or single cells may indicate senescence or nutrient limitation.
📏
FlowCAM "Particles Per Chain"
FlowCAM directly measures chain length during imaging. The column "Particles Per Chain" counts individual cells per chain aggregate — this is the exact biological signal used here.
🌿
Trichodesmium — Special Significance for Goes Lab
Trichodesmium is a nitrogen-fixing cyanobacterium that forms long trichome bundles (puff or raft colonies) visible as chains in FlowCAM. It is a major focus of LDEO/Goes Lab research in the Argo Basin (Eastern Indian Ocean) — responsible for substantial N₂ fixation in oligotrophic waters. Long chain detections of Crocosphaera or Cyanobacteria classes may indicate Trichodesmium or related colonial diazotrophs. Chain length and colony morphology are key distinguishing features that should be cross-referenced with Ch1/Ch2 fluorescence ratios for confirmation.
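Ranking particles by chain length can be sketched with the standard `csv` module. Only the "Particles Per Chain" column name comes from this page; the "Particle ID" column, the function name, and the sample rows are hypothetical.

```python
import csv
import io


def longest_chains(csv_text, min_cells=2, top_n=10, column="Particles Per Chain"):
    """Return up to top_n (cells, row) pairs, sorted by chain length, longest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chains = []
    for row in reader:
        try:
            cells = int(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric chain length
        if cells >= min_cells:
            chains.append((cells, row))
    chains.sort(key=lambda item: item[0], reverse=True)
    return chains[:top_n]


# Hypothetical FlowCAM export excerpt.
sample = """Particle ID,Particles Per Chain
1,1
2,7
3,not-a-number
4,3
"""
ranked = longest_chains(sample)
print([(cells, row["Particle ID"]) for cells, row in ranked])
```

The longest chains returned here are the candidates to cross-reference against the Ch1/Ch2 fluorescence ratios mentioned above.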
Search Parameters
Particles Per Chain · direct FlowCAM measurement
Phytoplankton Classification · Goes Lab (LDEO) · Columbia University · Model v9 · EfficientNetV2-L · TF 2.19