Concepts¶
If this is your first underwater-robotics project, this page is a survival glossary. Every other page assumes you've at least skimmed these terms. Read it linearly. Each concept builds on the previous.
ROS 2 in 90 seconds¶
The Robot Operating System (ROS) isn't an OS. It's a middleware for distributed robot software. You run lots of small programs ("nodes") that talk to each other over a network. ROS 2 is the second major version; this workspace uses the Humble distribution.
The four communication primitives you'll see everywhere:
| Primitive | Shape | When to use | Example in this workspace |
|---|---|---|---|
| Topic | Pub / sub, one-way, streaming | High-rate data (images, IMU, odometry) | /bluerov/odom, /bluerov/front_cam/image |
| Service | Request / response, blocking, one-shot | "Convert X for me right now" | /bluerov/convert_to_controls_pose |
| Action | Request → feedback → result, long-running | "Drive to this pose; tell me when you arrive" | /bluerov/controls |
| Parameter | Key-value config on a node | Tuning at launch / runtime | use_sim_time, camera_info_topic |
How to inspect what's running
Inside the container:
ros2 node list # which nodes are alive
ros2 topic list # which topics exist
ros2 topic hz <topic> # how often is it being published
ros2 topic echo <topic> # print the messages
ros2 service list
ros2 action list
ros2 param list /<node> # parameters of one node
ros2 run tf2_ros tf2_echo <parent> <child> # check a TF
These five commands solve ~80% of "why isn't this working" questions.
Coordinate frames and TF¶
A frame is a 3D coordinate system attached to something. The world, the robot's body, a camera, a torpedo panel. ROS tracks the relationships between frames in a tree called TF (transform).
In this workspace:
| Frame | What it's attached to | Convention |
|---|---|---|
map |
The simulated world origin | ENU. +x east, +y north, +z up |
base_link |
The robot's body centre | FLU. +x forward, +y left, +z up |
front_cam_optical |
Front camera, looking forward | Standard ROS optical: +z out of lens |
bottom_cam_optical |
Bottom camera, looking down | Same |
bin/yolo, torpedo/yolo, Task04_Tagging_01_optical, … |
Detected objects | Set by the pose estimator that broadcasts them |
The TF tree at any moment lets you answer: "where is this point, expressed in that frame?". Without anyone hand-coding the maths.
Three frame conventions you must keep straight
| Name | +x |
+y |
+z |
|---|---|---|---|
ENU (East, North, Up): world / map |
east | north | up |
| NED (North, East, Down): used by ArduSub internally | north | east | down |
FLU (Forward, Left, Up): vehicle / base_link |
forward | left | up |
A 2 m forward goal in FLU is (2, 0, 0). In NED that same direction
might be (0, 2, 0) if the vehicle faces east. Mixing them up is
the single most common source of "the robot drove the wrong way"
bugs. See Conventions.
Why a TF tree instead of just hand-coding offsets
Because vehicles rotate. The camera's pose relative to the world
changes every tick. Asking "where is the bin in base_link?" is
only useful if someone keeps the lookup table up to date. Which is
what TF does, automatically, every time a publisher broadcasts a
transform.
Behaviour trees and py_trees¶
A behaviour tree (BT) is a structured way to write mission logic. It's a tree of nodes; each tick, the root asks its children to run. The two control-flow nodes you'll see most:
- Sequence: runs children left-to-right. Succeeds only if every child succeeded. Stops at the first failure.
- Selector (a.k.a. Fallback): runs children left-to-right. Succeeds as soon as any child succeeds. Used to express "try plan A; if it fails, try plan B."
Leaves do the actual work. Call a service, send an action goal, set a blackboard key, wait, etc.
Why a BT instead of a state machine?
State machines (FSMs) get tangled fast: every state needs explicit
transitions to every recoverable state. BTs encode the same logic
more compactly: a Selector is "try this, then try that"; a
Sequence is "do these in order". Recovery shows up as
selector-branches, not as N² transition arrows.
py_trees is the Python BT library
this workspace uses. py_trees_ros is its ROS-2 integration:
action-client helpers, blackboard subscribers, etc. You'll see
py_trees.composites.Sequence, py_trees.composites.Selector,
py_trees.decorators.Retry, etc. in mission code.
memory=True. The one BT detail you must remember¶
Sequence(memory=True) and Selector(memory=True) remember which
child they're currently on. Without memory=True, every tick re-runs
from child 0. That breaks any leg that takes more than one tick (which
is all of them. goto legs take seconds).
Always use memory=True for sequences and selectors in this
workspace.
The blackboard¶
py_trees has a shared key-value store called the blackboard.
Behaviours read and write keys to share state. E.g. one behaviour
estimates a pose, writes it to /bluerov/torpedo/pose, and a later
goto.FromBlackboard reads that key to build the goal.
In this workspace, every mission key is namespaced (/bluerov/bins/...,
/bluerov/torpedo/...) so the bin and torpedo trees can't clobber each
other's state.
Anatomy of a "mission" in this workspace¶
1. tmuxp launches 5 panes per mission
2. Each pane runs a ros2 launch <something>.launch.py
3. The BT pane spins up the py_trees tree
4. Each tree leg either:
- sends a Locomotion action goal (move)
- calls a service (e.g. toggle a vision template)
- reads/writes the blackboard
- waits / retries / fails
5. The Locomotion action server translates each goal into MAVROS setpoints
6. MAVROS sends MAVLink to ArduSub SITL
7. ArduSub computes thruster commands and sends them through the
ArduPilot Gazebo plugin
8. Gazebo simulates the thrusters and physics; that updates the model's
position
9. ground_truth_to_mavros publishes the new position back as TF and
odometry, closing the loop
The whole loop runs at ~1 Hz on the setpoint side and 10 Hz on the BT side. Slow enough that you can watch it tick.
MAVROS, MAVLink, ArduSub, SITL. What are they?¶
This workspace simulates a BlueROV2, an off-the-shelf ROV that runs ArduSub firmware. Real BlueROV2 hardware uses the same firmware. So the simulation talks to the firmware exactly the way a real vehicle would.
| Term | What it is |
|---|---|
| ArduPilot | The open-source autopilot project (planes, copters, ROVs). |
| ArduSub | ArduPilot's flavour for underwater vehicles. The actual firmware running on the BlueROV2. |
| SITL (Software-in-the-Loop) | A native build of the firmware that runs on your PC instead of on the autopilot board. Same code, different target. We use ArduSub SITL. |
| MAVLink | The lightweight serial protocol the firmware speaks. Used over UDP in sim. |
| MAVROS | A ROS package that translates MAVLink ↔ ROS. So mission code in ROS doesn't have to speak MAVLink directly. |
| GUIDED mode | ArduPilot's flight mode that accepts position-setpoint commands. The mission code arms the vehicle and switches it to GUIDED before sending setpoints. |
The hard rule: mission code only talks to MAVROS: never directly to ArduSub or Gazebo. That's what makes the same mission code work on the real vehicle.
Gazebo and the ArduPilot Gazebo plugin¶
Gazebo is the physics simulator. We use Gazebo Harmonic (one of the new "gz-sim" releases. Not the old gazebo-classic).
The ArduPilot Gazebo plugin is a Gazebo system plugin that bridges ArduSub SITL ↔ Gazebo over a JSON / UDP protocol. ArduSub sends desired thruster outputs; Gazebo sends back sensor data and the vehicle's true pose.
You'll never edit this plugin yourself, but you should know it exists: when "the vehicle is in Gazebo but ArduSub thinks it's disconnected", this plugin is almost always the suspect.
Vision. YOLO, pose estimators, point correspondences¶
In RoboSub, you have to find competition targets visually. The detection pipeline in this workspace:
- YOLO (yolo_ros_trt): neural net that finds objects in camera images. Outputs 2D bounding boxes / segmentation masks.
- Pose estimator (pose_estimator): takes the 2D YOLO output plus the camera intrinsics and runs PnP (Perspective-n-Point) to produce a 3D pose of the object. Broadcasts the pose as a TF frame (so the rest of the system can use it).
- Image matching (image_matching):
matches a specific template image (e.g.
Task04_Tagging_01.png) against the camera frame using XFeat. Sends "here are the 2D ↔ 3D point correspondences" to a separate pose estimator that produces another TF. - Cluster_tf (filters): collects a few seconds of noisy per-frame TFs and outputs a single stable averaged TF. The BT consumes this stable TF rather than the jittery raw one.
Mental model: vision is just another TF broadcaster
From the BT's perspective, vision = "some other process is broadcasting a TF that gives me the target's 3D pose". The BT doesn't care whether it came from YOLO + PnP, template matching, a fiducial marker, or sonar. As long as a TF appears in the tree with the right name, the BT can plan around it.
Cluster_tf. What does "clustering" actually do?¶
A pose estimator broadcasts a new TF every frame (often 30 Hz). Detections are noisy. The pose jitters by centimetres frame-to-frame. You don't want to drive a 2.5-cm-tolerance alignment off a jittery single-frame estimate.
cluster_tf buffers a few seconds of TFs, runs HDBSCAN on the
positions to throw out outliers (e.g. detections from before the
vehicle yawed to face the panel), and broadcasts the average of the
largest inlier cluster as a single new TF. The BT then uses that.
| Input | Output |
|---|---|
torpedo/yolo. Noisy, 30 Hz |
torpedo/yolo/clustered. Stable, single TF |
bin/yolo. Noisy, 30 Hz |
bin/yolo/clustered. Stable, single TF |
You only see the input source frames when their broadcaster is alive and publishing. Missing input = clustering fails with "0 valid transforms collected".
Anchor frames¶
Most goals in this workspace are "drive to a target pose". But which part of the robot do you want on the target? The body centre? The dropper? The shooter? The camera?
That's what an anchor frame answers. goto takes an
anchor_frame_name parameter; the conversion service shifts the
commanded pose so the anchor (not base_link) lands on the target.
anchor_frame_name="base_link"(default): body centre on target.anchor_frame_name="dropper_link". Drop opening on target. Used by bin.anchor_frame_name="torpedo_shooter_left_link". Left shooter on target. Used by torpedo.anchor_frame_name="front_cam_optical". Camera lens on target. Used to "look at" something rather than "arrive at" it.
Sim time vs wall time¶
Gazebo publishes /clock. A simulated wall clock that ticks with the
simulation, not with your computer's real-time clock. Most ROS nodes
in this workspace are set with use_sim_time: true so timestamps on
TFs and topics all share that simulated clock.
If you forget use_sim_time: true on a node, it'll stamp its messages
with real wall time (seconds since 1970 = ~1.78e9), while everyone
else is on Gazebo time (~7000). The two never line up → every
synchronisation is broken → things look mysteriously frozen.
How everything connects, one more time¶
flowchart LR
CAM[Camera image] --> YOLO
YOLO --> POSE[Pose estimator]
POSE -- broadcasts TF --> TFT[TF tree]
TFT --> CLUSTER[cluster_tf]
CLUSTER -- stable TF --> TFT
TFT --> BT
BT -- Locomotion goal --> LAS[Locomotion action server]
LAS -- setpoint @ 1 Hz --> MAVROS
MAVROS --> ArduSub
ArduSub -- thruster cmd --> GZ[Gazebo plugin]
GZ -- physics --> POSE
GZ -- ground truth --> GT[ground_truth_to_mavros]
GT -- map→base_link TF --> TFT
GT -- odom --> MAVROS
The arrows close into a loop. That's the closed-loop control system the mission code is sitting on top of.
Where to go next¶
- Architecture: the same diagram but with per-layer responsibilities and contracts.
- Running the sim: copy-paste commands to actually launch everything.
- Conventions: frames, signs, units, naming rules.
- Strategies: what each mission actually does, leg by leg.