Coordinate Frames
A real robot doesn’t have just one coordinate system — it has many. The world has a fixed frame, the robot base has its own frame, each joint defines a frame, the camera sees in its own frame, and the gripper tip has yet another. Every piece of data — positions, velocities, forces — is expressed relative to some frame, and confusing which frame you’re in is one of the most common (and dangerous) bugs in robotics software.
This lesson is about managing that complexity rigorously.
What Is a Coordinate Frame?
A coordinate frame (or reference frame) is an origin point plus a set of orthogonal basis vectors that define directions. In 3D, a frame consists of:
- An origin — the “zero point”
- Three unit vectors — the axis directions
A point $p$ can be described in frame {A} or in frame {B}, and the coordinates $p^A$ and $p^B$ will generally be different numbers representing the same physical point.
We write $p^A$ to mean “the coordinates of point $p$ expressed in frame {A}.”
Transforms Between Frames
The homogeneous transformation $T^A_B$ (or equivalently ${}^A T_B$) converts a point from frame {B} coordinates to frame {A} coordinates:

$$p^A = T^A_B \, p^B$$
This matrix encodes both the rotation and translation of frame {B} relative to frame {A}:

$$T^A_B = \begin{bmatrix} R^A_B & t^A_B \\ 0 & 1 \end{bmatrix}$$

where:
- $R^A_B$ is the rotation: its columns are the unit vectors of {B}’s axes expressed in {A}
- $t^A_B$ is the position of {B}’s origin expressed in {A}
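The block structure above maps directly to code. A minimal numpy sketch (the helper names `make_transform` and `apply_transform` are illustrative, not from any particular library):

```python
import numpy as np

def make_transform(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply_transform(T, p):
    """Apply T to a 3D point by lifting it to homogeneous coordinates [x, y, z, 1]."""
    ph = np.append(p, 1.0)
    return (T @ ph)[:3]

# Example: frame {B} is rotated 90 degrees about z and offset by (1, 0, 0) in {A}.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T_A_B = make_transform(Rz90, np.array([1.0, 0.0, 0.0]))

p_B = np.array([1.0, 0.0, 0.0])    # a point at (1, 0, 0) in {B}
p_A = apply_transform(T_A_B, p_B)  # -> (1, 1, 0) in {A}
```

The rotation maps {B}’s x-axis onto {A}’s y-axis, then the translation shifts the result by {B}’s origin offset.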
Notation Varies Across Textbooks
There is no universal standard for transform notation. Common conventions include:
| Notation | Meaning |
|---|---|
| $T^A_B$ or ${}^A T_B$ | Transforms points from {B} to {A} |
| $T_{AB}$ | Ambiguous — some books mean {B} → {A}, others {A} → {B} |
In this course, $T^A_B$ always means “converts from {B} to {A}.” Read it as: “the transform of frame {B} expressed in frame {A}.” Always verify the convention when reading papers or using libraries.
Chaining Transforms
To go from frame {C} to frame {A} via frame {B}, chain the transforms:

$$T^A_C = T^A_B \, T^B_C$$

Read right-to-left: first convert from {C} to {B}, then from {B} to {A}.
The subscript/superscript indices “cancel” like fractions: the inner {B} in $T^A_B \, T^B_C$ drops out, leaving $T^A_C$. This mnemonic helps catch mistakes — if adjacent indices don’t match, the chain is wrong.
For a full kinematic chain with $n+1$ frames (base frame 0 through frame $n$):

$$T^0_n = T^0_1 \, T^1_2 \cdots T^{n-1}_n$$
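The “indices cancel” rule can even be enforced mechanically. A sketch in numpy, where each transform carries its (parent, child) frame labels and the composition asserts that adjacent labels match (the `chain` and `trans` helpers are hypothetical, for illustration):

```python
import numpy as np

def trans(x, y, z):
    """Pure-translation homogeneous transform (helper for the example)."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def chain(*links):
    """Compose labeled transforms ((parent, child, T), ...) left to right,
    asserting that adjacent frame labels match -- the 'indices cancel' rule."""
    top, child, total = links[0]
    for parent, next_child, T in links[1:]:
        assert parent == child, f"frame mismatch: {child!r} vs {parent!r}"
        total = total @ T
        child = next_child
    return top, child, total

# T_A^B chained with T_B^C yields T_A^C; a mismatched pair would raise.
T_A_B = trans(1.0, 0.0, 0.0)
T_B_C = trans(0.0, 1.0, 0.0)
top, bottom, T_A_C = chain(("A", "B", T_A_B), ("B", "C", T_B_C))
# T_A_C translates by (1, 1, 0)
```

Treating frame labels as checkable data, rather than a comment, catches wrong-order chains at runtime instead of producing a silently wrong pose.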
Inverse Transforms
To go the other direction: $T^B_A = (T^A_B)^{-1}$.
For homogeneous transforms, the inverse has a cheap closed form (from Module 3):

$$(T^A_B)^{-1} = \begin{bmatrix} (R^A_B)^\top & -(R^A_B)^\top t^A_B \\ 0 & 1 \end{bmatrix}$$

No need for general matrix inversion — just transpose the rotation and adjust the translation.
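The closed form is a few lines of numpy. A sketch (the function name is illustrative), checked against general matrix inversion:

```python
import numpy as np

def invert_transform(T):
    """Closed-form inverse of a homogeneous transform:
    transpose the rotation, then rotate-and-negate the translation."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# Example: 90-degree rotation about z plus an offset of (1, 2, 3).
T = np.array([[0.0, -1.0, 0.0, 1.0],
              [1.0,  0.0, 0.0, 2.0],
              [0.0,  0.0, 1.0, 3.0],
              [0.0,  0.0, 0.0, 1.0]])

# The closed form agrees with np.linalg.inv, at a fraction of the cost.
assert np.allclose(invert_transform(T), np.linalg.inv(T))
```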
Common Frames in Robotics
A typical robotic system involves several standard frames:
World Frame
The global or map frame {W}. Fixed to the environment. All global planning and mapping happens here. Gravity points along $-z$ (or $+z$, depending on convention).
Base Frame
Fixed to the robot’s base. For a mobile robot, it moves with the robot. For a fixed manipulator, it’s often coincident with the world frame.
Joint Frames
Each joint in a serial manipulator defines a frame according to the DH convention (from Module 3). The chain $T^0_n = T^0_1 \, T^1_2 \cdots T^{n-1}_n$ gives forward kinematics.
Tool/End-Effector Frame
Attached to the gripper or tool tip. This is what you want to control — positioning the tool frame at the desired pose in the world frame is the goal of motion planning.
Sensor Frames
Each sensor has its own frame:
- Camera frame {C}: origin at the optical center, z-axis along the viewing direction
- LiDAR frame {L}: origin at the scanner, measurements in its local coordinates
- IMU frame {I}: measures accelerations and angular velocities in its own axes
Robotics Application: Camera-to-World Transform
A camera mounted on a robot arm sees an object at position $p^C$ (in meters) in its local frame. To find the object’s world position:

$$p^W = T^W_B \, T^B_E \, T^E_C \, p^C$$

The chain: camera → end-effector → base → world. Each transform comes from:
- $T^E_C$: camera extrinsic calibration (measured once, fixed)
- $T^B_E$: robot forward kinematics (computed from joint encoders)
- $T^W_B$: robot localization (from SLAM or known placement)
An error in any of these transforms propagates to the final world-frame position.
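The chain is a single line of matrix products once the three transforms are in hand. A sketch with identity rotations and made-up offsets (every numeric value here is hypothetical, purely for illustration; real values come from calibration, FK, and localization):

```python
import numpy as np

def hom(t):
    """Translation-only homogeneous transform (helper for the example)."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# Hypothetical transforms (identity rotations for readability):
T_E_C = hom([0.0, 0.0, 0.05])   # camera 5 cm out along the flange (calibration)
T_B_E = hom([0.4, 0.0, 0.6])    # end-effector pose from forward kinematics
T_W_B = hom([2.0, 1.0, 0.0])    # robot base pose in the world (localization)

p_C = np.array([0.3, 0.1, 2.0, 1.0])   # object in camera frame, homogeneous
p_W = T_W_B @ T_B_E @ T_E_C @ p_C      # chain: camera -> EE -> base -> world
# p_W[:3] -> (2.7, 1.1, 2.65)
```

Note how the product reads right-to-left, exactly matching the camera → end-effector → base → world chain in the text.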
Frame Trees
In a complex system, frames form a tree structure rooted at the world frame. Every frame has exactly one parent, and the path from any frame to any other frame is unique.
{World}
├── {Map}
│ └── {Odom}
│ └── {Base}
│ ├── {Joint1}
│ │ └── {Joint2}
│ │ └── {EE}
│ │ └── {Tool}
│ ├── {Camera}
│ ├── {LiDAR}
│ └── {IMU}
└── {Target}
To transform between any two frames, walk up the tree to their common ancestor, then back down:

$$T^A_B = (T^W_A)^{-1} \, T^W_B$$

Or equivalently: $T^A_B = T^A_W \, T^W_B$.
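The tree walk can be sketched with a dictionary mapping each child frame to its parent and the parent-to-child transform (translation-only transforms and the `tree`/`lookup` helpers are illustrative assumptions, not a real API):

```python
import numpy as np

def hom(t):
    """Translation-only homogeneous transform (helper for the example)."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# Frame tree: child -> (parent, T_parent_child).
tree = {
    "base":   ("world", hom([1.0, 0.0, 0.0])),
    "camera": ("base",  hom([0.0, 0.0, 0.5])),
    "lidar":  ("base",  hom([0.0, 0.0, 0.8])),
}

def to_world(frame):
    """Return T_world_frame by walking up the tree to the root."""
    T = np.eye(4)
    while frame != "world":
        parent, T_parent_frame = tree[frame]
        T = T_parent_frame @ T      # prepend each parent link
        frame = parent
    return T

def lookup(target, source):
    """T_target_source = (T_world_target)^-1 @ T_world_source."""
    return np.linalg.inv(to_world(target)) @ to_world(source)

T_cam_lidar = lookup("camera", "lidar")   # translates by (0, 0, 0.3)
```

This is essentially what a frame manager does on every query: resolve both frames to a common ancestor, then combine.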
Robotics Application: ROS tf2 — The Standard Frame Manager
In ROS (Robot Operating System), the tf2 library manages the frame tree automatically:
- Each node publishes transforms (e.g., the SLAM node publishes the map-to-odom transform)
- Any node can query the transform between any two frames at any time
- tf2 handles the chaining, inversion, and time synchronization internally
This is why consistent frame conventions matter in practice. If one node publishes its transform using a different convention than what another node expects, the result is a subtle, hard-to-debug spatial error — the robot reaches to the wrong location, the map drifts, or the obstacle detector gives false positives.
Extrinsic Calibration
Extrinsic calibration is the process of determining the fixed transform between two frames — most commonly between a sensor and the robot body.
Hand-Eye Calibration
A camera is mounted on a robot’s end-effector. You know:
- $T^B_E$ from forward kinematics (changes with joint angles)
- $T^C_{target}$ from the camera (sees a calibration target)
You need to find the unknown fixed transform $X = T^E_C$ (camera relative to end-effector). The key observation: for any pose $i$, the chain from calibration target to base through the camera and end-effector must equal the fixed (unknown) target-to-base transform:

$$T^B_{E,i} \, X \, T^C_{target,i} = T^B_{target} \quad \text{(constant for all } i\text{)}$$

By taking two poses and eliminating the unknown constant, you get the classic $AX = XB$ problem, where $A$ is the relative end-effector motion and $B$ is the relative camera observation. It requires at least two poses with non-parallel rotation axes, and in practice you use 10+ poses with a least-squares solver.
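The $AX = XB$ structure can be verified numerically by simulating the setup: fix a ground-truth hand-eye transform $X$, generate camera observations of a fixed target from two arm poses, and form $A$ and $B$ from the relative motions. All numeric values here are hypothetical, chosen only to exercise the identity:

```python
import numpy as np

def hom(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(rad):
    c, s = np.cos(rad), np.sin(rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Ground-truth hand-eye transform X = T_E_C (what calibration must recover).
X = hom(rot_z(0.3), [0.05, 0.0, 0.02])
T_B_target = hom(rot_z(0.0), [1.0, 0.0, 0.0])   # fixed calibration target

def camera_sees(T_B_E):
    """Simulated observation: target pose expressed in the camera frame."""
    return np.linalg.inv(T_B_E @ X) @ T_B_target

# Two arm poses with different rotations (required for a solvable problem):
T_B_E1 = hom(rot_z(0.0), [0.4, 0.0, 0.5])
T_B_E2 = hom(rot_z(0.8), [0.3, 0.2, 0.5])
T1, T2 = camera_sees(T_B_E1), camera_sees(T_B_E2)

A = np.linalg.inv(T_B_E2) @ T_B_E1   # relative end-effector motion
B = T2 @ np.linalg.inv(T1)           # relative camera observation
# The unknown X satisfies A @ X == X @ B
```

A real calibration inverts this: given many $(A_i, B_i)$ pairs from measurements, solve for the $X$ that best satisfies all the equations.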
Why Calibration Matters
A 1° rotation error in the extrinsic calibration of a camera mounted 1 meter from the robot base translates to ~1.7 cm of position error at that range. For a camera looking at objects 5 meters away, the same 1° error becomes ~8.7 cm. Calibration accuracy directly limits task accuracy.
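These figures follow from the chord length of a small rotation: a point at range $d$ moves by $2d\sin(\theta/2) \approx d\theta$ for $\theta$ in radians. A quick check:

```python
import math

def rotation_error_at_distance(err_deg, distance_m):
    """Displacement of a point at the given range caused by a small rotation
    error: chord length 2*d*sin(theta/2), approximately d*theta for small theta."""
    theta = math.radians(err_deg)
    return 2.0 * distance_m * math.sin(theta / 2.0)

e1 = rotation_error_at_distance(1.0, 1.0)   # ~0.0175 m (1.7 cm) at 1 m
e5 = rotation_error_at_distance(1.0, 5.0)   # ~0.0873 m (8.7 cm) at 5 m
```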
LiDAR-to-IMU Calibration
For autonomous vehicles, the LiDAR and IMU frames must be precisely aligned. Errors here cause the point cloud to “smear” when the vehicle turns, corrupting the map. The fixed transform is typically found by:
- Collecting data while performing varied motions (turns, accelerations)
- Optimizing to minimize inconsistency between the LiDAR-based motion estimate and the IMU-based motion estimate
Velocity and Force Transformations
Transforms don’t just apply to positions. Velocities and forces also need frame conversions, but the rules differ.
Velocity Transform (Adjoint)
A twist $\mathcal{V}^B$ (linear + angular velocity) in frame {B} transforms to frame {A} via the adjoint of $T^A_B$:

$$\mathcal{V}^A = \mathrm{Ad}_{T^A_B} \, \mathcal{V}^B$$

The adjoint is a $6 \times 6$ matrix — not the same as simply applying $T^A_B$ to a position vector. This is because angular velocity doesn’t transform like a point.
Force/Wrench Transform
A wrench (force + torque) transforms with the inverse transpose of the adjoint: $\mathcal{F}^A = \mathrm{Ad}_{T^A_B}^{-\top} \, \mathcal{F}^B$. Forces and velocities are dual — they transform differently to preserve the power relationship $P = \mathcal{F}^\top \mathcal{V}$.
Don't Transform Velocities Like Positions
A common mistake: applying a homogeneous transform directly to a velocity vector. This is wrong. Velocities are not points — they don’t have a position component, and the translational part of $T$ doesn’t apply the same way. Use the adjoint representation, or carefully separate the rotation (which does apply) from the translation (which generates a cross-product coupling term).
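A minimal sketch of the adjoint, assuming the (angular, linear) twist ordering used in Lynch and Park’s Modern Robotics convention — other libraries order the twist as (linear, angular), so check before reusing:

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def adjoint(T):
    """6x6 adjoint of a homogeneous transform, for twists ordered
    (angular velocity, linear velocity)."""
    R, t = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, 3:] = R
    Ad[3:, :3] = skew(t) @ R   # the cross-product coupling term
    return Ad

# A pure rotation of 1 rad/s about z, seen from a frame offset by (1, 0, 0):
T = np.eye(4)
T[:3, 3] = [1.0, 0.0, 0.0]
twist_B = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
twist_A = adjoint(T) @ twist_B
# -> (0, 0, 1, 0, -1, 0): the offset induces a linear velocity component
```

Note the naive approach (rotating the linear part and ignoring the offset) would report zero linear velocity, which is exactly the bug the warning above describes.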
Common Pitfalls
1. Frame Mismatch
Combining data from different frames without transforming first. Every vector and matrix has an implicit frame — treat frame labels as rigorously as you treat units (meters vs. millimeters).
2. Transform Direction
Applying $T^A_B$ when you needed $T^B_A$ (or vice versa). If the result looks mirrored or the robot moves in the opposite direction, check the transform direction.
3. Pre-Multiply vs. Post-Multiply
For body-fixed (intrinsic) operations, post-multiply: $T_{new} = T \, T_{\Delta}$. For world-fixed (extrinsic) operations, pre-multiply: $T_{new} = T_{\Delta} \, T$.
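The difference is easy to see numerically. In this sketch (helper names are illustrative), a pose rotated 90° about z takes a “move 1 m along x” step: post-multiplying steps along the body’s own x-axis (which now points along world +y), pre-multiplying steps along the world x-axis:

```python
import numpy as np

def rot_z(deg):
    """Homogeneous transform for a rotation about the z-axis."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def trans(t):
    """Translation-only homogeneous transform."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

pose = trans([1.0, 0.0, 0.0]) @ rot_z(90)   # at x=1, rotated 90 deg about z
step = trans([1.0, 0.0, 0.0])               # "move 1 m along x"

body_fixed  = pose @ step   # ends at (1, 1, 0): stepped along the body's x-axis
world_fixed = step @ pose   # ends at (2, 0, 0): stepped along the world x-axis
```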
4. Stale Transforms
On a moving robot, transforms change over time. Using a camera image from time $t_1$ with a robot pose from time $t_2$ introduces error proportional to the robot’s velocity and the time gap $|t_2 - t_1|$.
Worked 2D Example
(Interactive demo: frame {B} relative to {A}, with a movable point expressed in {B}.) The demo’s transform is

$$T^A_B = \begin{bmatrix} 0.71 & -0.71 & 2.00 \\ 0.71 & 0.71 & 1.00 \\ 0.00 & 0.00 & 1.00 \end{bmatrix}$$

a 45° rotation plus a translation of $(2, 1)$. Try this: move frame {B} around and watch how the same physical point has different coordinates in each frame. The transform $T^A_B$ converts coordinates from {B} to {A}: multiply $T^A_B$ by the point’s {B}-coordinates to get its {A}-coordinates.
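The demo’s transform can also be checked numerically. A quick sketch applying the 45°-rotation-plus-translation matrix to a point given in {B}:

```python
import numpy as np

# 2D homogeneous transform from the demo: 45-degree rotation, translation (2, 1).
T_A_B = np.array([[0.71, -0.71, 2.00],
                  [0.71,  0.71, 1.00],
                  [0.00,  0.00, 1.00]])

p_B = np.array([1.0, 0.0, 1.0])   # a point at (1, 0) in {B}, homogeneous form
p_A = T_A_B @ p_B                 # -> (2.71, 1.71) in {A}
```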
Practice Problems
-
Frames {A}, {B}, {C} are arranged with known transforms $T^A_B$ and $T^B_C$. What is $T^A_C$?
-
A camera sees an object at $p^C$ (in meters). The camera-to-base transform $T^B_C$ is known. What is $p^B$?
-
You have transforms $T^W_B$, $T^B_E$, and $T^E_C$. Write the expression for $T^W_C$ (from {C} to {W}).
-
A LiDAR measures a wall point at $p^L$. The LiDAR is mounted 0.2 m above and 0.1 m forward of the robot base, with no rotation offset. Write $T^B_L$ and give $p^B$ in terms of $p^L$.
-
A mobile robot has base-to-world transform $T^W_B$ and a camera with extrinsic $T^B_C$. The camera sees an AprilTag at $p^C$. Write the full expression for the tag position in the world frame.
Answers
-
$T^A_C = T^A_B \, T^B_C$. The rotation of the result is $R^A_B R^B_C$ and the translation is $R^A_B t^B_C + t^A_B$: the first transform rotates the second transform’s translation into {A} and adds its own offset. The point at {C}’s origin maps to $t^A_C$ in {A}.
-
$p^B = T^B_C \, p^C = R^B_C \, p^C + t^B_C$ m.
-
$T^W_C = T^W_B \, T^B_E \, T^E_C$. The inner indices cancel ({B} with {B}, {E} with {E}), leaving $T^W_C$.
-
$T^B_L$ has identity rotation and translation $(0.1, 0, 0.2)$ m, so $p^B = T^B_L \, p^L = p^L + (0.1, 0, 0.2)$ m.
-
$p^W = T^W_B \, T^B_C \, p^C$.
Key Takeaways
- Every measurement lives in some frame — always track which one
- $T^A_B$ converts points from frame {B} to frame {A}; chain by matching inner indices
- Frames form a tree; transforming between any two frames follows a unique path through the tree
- Extrinsic calibration determines the fixed transforms between sensors and the robot body
- Velocities and forces transform differently from positions — use the adjoint, not the raw transform
- Frame errors are among the most common and hardest-to-debug issues in robotics software
Next Steps
You can now manage multiple coordinate frames confidently. The final lesson in this module tackles rotation representations — axis-angle, rotation vectors, and quaternions — representations that overcome Euler angle limitations and enable smooth interpolation for motion planning.