Module 4: Advanced Lesson 2 of 3

Coordinate Frames

A real robot doesn’t have just one coordinate system — it has many. The world has a fixed frame, the robot base has its own frame, each joint defines a frame, the camera sees in its own frame, and the gripper tip has yet another. Every piece of data — positions, velocities, forces — is expressed relative to some frame, and confusing which frame you’re in is one of the most common (and dangerous) bugs in robotics software.

This lesson is about managing that complexity rigorously.

What Is a Coordinate Frame?

A coordinate frame (or reference frame) is an origin point plus a set of orthogonal basis vectors that define directions. In 3D, a frame $\{A\}$ consists of:

  • an origin point $O_A$
  • three mutually orthogonal unit axes $\hat{x}_A, \hat{y}_A, \hat{z}_A$

A point $\mathbf{p}$ can be described in frame $\{A\}$ or frame $\{B\}$, and the coordinates will generally be different numbers representing the same physical point.

We write ${}^A\mathbf{p}$ to mean “the coordinates of point $\mathbf{p}$ expressed in frame $\{A\}$.”

Transforms Between Frames

The homogeneous transformation $T_B^A$ (or equivalently ${}^A T_B$) converts a point from frame $\{B\}$ coordinates to frame $\{A\}$ coordinates:

$${}^A\mathbf{p} = T_B^A \; {}^B\mathbf{p}$$

This $4 \times 4$ matrix encodes both the rotation and translation of frame $\{B\}$ relative to frame $\{A\}$:

$$T_B^A = \begin{bmatrix} R_B^A & \mathbf{d}_B^A \\ \mathbf{0}^T & 1 \end{bmatrix}$$

where:

  • $R_B^A$ is the $3 \times 3$ rotation matrix giving the orientation of $\{B\}$ in $\{A\}$
  • $\mathbf{d}_B^A$ is the position of $\{B\}$'s origin expressed in $\{A\}$
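This block structure can be assembled and applied directly; a minimal numpy sketch, with an invented 90° rotation and offset for frame $\{B\}$:

```python
import numpy as np

def make_transform(R, d):
    """Assemble the 4x4 homogeneous transform T_B^A from R_B^A and d_B^A."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = d
    return T

def transform_point(T, p):
    """Apply T to a 3D point by lifting it to homogeneous coordinates."""
    return (T @ np.append(p, 1.0))[:3]

# Example: {B} is rotated 90 deg about z and offset by (1, 0, 0) relative to {A}
R_AB = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T_AB = make_transform(R_AB, [1.0, 0.0, 0.0])   # T_B^A

p_B = np.array([0.0, 2.0, 0.0])    # point expressed in {B}
p_A = transform_point(T_AB, p_B)   # same point in {A}: R @ p_B + d = (-1, 0, 0)
```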

Notation Varies Across Textbooks

There is no universal standard for transform notation. Common conventions include:

| Notation | Meaning |
| --- | --- |
| $T_B^A$ or ${}^AT_B$ | Transforms points from $\{B\}$ to $\{A\}$ |
| $T_{AB}$ | Ambiguous — some books mean $A \to B$, others mean $B \to A$ |

In this course, $T_B^A$ always means “converts from $\{B\}$ to $\{A\}$.” Read it as: “the transform of frame $\{B\}$ expressed in frame $\{A\}$.” Always verify the convention when reading papers or using libraries.

Chaining Transforms

To go from frame $\{C\}$ to frame $\{A\}$ via frame $\{B\}$, chain the transforms:

$$T_C^A = T_B^A \cdot T_C^B$$

Read right-to-left: first convert from $\{C\}$ to $\{B\}$, then from $\{B\}$ to $\{A\}$.

The subscript/superscript indices “cancel” like fractions: $T_{\cancel{B}}^A \cdot T_C^{\cancel{B}} = T_C^A$. This mnemonic helps catch mistakes — if adjacent indices don’t match, the chain is wrong.

For a full kinematic chain with $n$ frames:

$$T_n^0 = T_1^0 \cdot T_2^1 \cdot T_3^2 \cdots T_n^{n-1}$$
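Chaining is plain matrix multiplication. A quick numpy check (frame values invented for illustration) that composing transforms equals converting a point stage by stage:

```python
import numpy as np

def make_T(R, d):
    """4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = d
    return T

def Rz(t):
    """3x3 rotation about the z axis by angle t (radians)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

T_AB = make_T(Rz(np.pi / 2), [1.0, 0.0, 0.0])  # T_B^A: {B} in {A}
T_BC = make_T(Rz(np.pi / 4), [0.0, 2.0, 0.0])  # T_C^B: {C} in {B}

T_AC = T_AB @ T_BC  # T_C^A: the inner {B} indices "cancel"

p_C = np.array([0.5, 0.0, 0.0, 1.0])  # a point in {C}, homogeneous
# One chained transform equals the two-step conversion C -> B -> A
assert np.allclose(T_AC @ p_C, T_AB @ (T_BC @ p_C))
```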

Inverse Transforms

To go the other direction: $T_A^B = (T_B^A)^{-1}$.

For homogeneous transforms, the inverse has a cheap closed-form (from Module 3):

$$(T_B^A)^{-1} = \begin{bmatrix} (R_B^A)^T & -(R_B^A)^T \mathbf{d}_B^A \\ \mathbf{0}^T & 1 \end{bmatrix}$$

No need for general matrix inversion — just transpose the rotation and adjust the translation.
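A sketch of the closed-form inverse, checked against numpy's general matrix inverse (the example transform is arbitrary):

```python
import numpy as np

def inverse_transform(T):
    """Closed-form inverse: transpose the rotation, re-map the translation."""
    R, d = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ d
    return Ti

# Arbitrary example: 90 deg about z plus a translation
T = np.array([[0.0, -1.0, 0.0, 1.0],
              [1.0,  0.0, 0.0, 2.0],
              [0.0,  0.0, 1.0, 3.0],
              [0.0,  0.0, 0.0, 1.0]])

assert np.allclose(inverse_transform(T), np.linalg.inv(T))
assert np.allclose(inverse_transform(T) @ T, np.eye(4))
```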

Common Frames in Robotics

A typical robotic system involves several standard frames:

World Frame $\{W\}$

The global or map frame. Fixed to the environment. All global planning and mapping happens here. Gravity points along $-z$ (or $-y$, depending on convention).

Base Frame $\{B\}$

Fixed to the robot’s base. For a mobile robot, it moves with the robot. For a fixed manipulator, it’s often coincident with the world frame.

Joint Frames $\{J_i\}$

Each joint in a serial manipulator defines a frame according to the DH convention (from Module 3). The chain $T_{J_1}^B \cdot T_{J_2}^{J_1} \cdots T_{J_n}^{J_{n-1}}$ gives forward kinematics.

Tool/End-Effector Frame $\{T\}$

Attached to the gripper or tool tip. This is what you want to control — positioning $\{T\}$ at the desired pose in $\{W\}$ is the goal of motion planning.

Sensor Frames

Each sensor has its own frame:

  • Camera frame $\{C\}$ — optical measurements, typically with $z$ pointing along the optical axis
  • LiDAR frame $\{L\}$ — range returns
  • IMU frame $\{I\}$ — accelerations and angular rates

Robotics Application: Camera-to-World Transform

A camera mounted on a robot arm sees an object at position ${}^C\mathbf{p} = (0.1, -0.05, 0.8)$ m in its local frame. To find the object’s world position:

$${}^W\mathbf{p} = T_B^W \cdot T_{EE}^B \cdot T_C^{EE} \cdot {}^C\mathbf{p}$$

The chain: camera → end-effector → base → world. Each transform comes from:

  • $T_C^{EE}$: camera extrinsic calibration (measured once, fixed)
  • $T_{EE}^B$: robot forward kinematics (computed from joint encoders)
  • $T_B^W$: robot localization (from SLAM or known placement)

An error in any of these transforms propagates to the final world-frame position.
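In code, the whole chain is one matrix product. The transforms below are invented placeholder values (pure translations) just to make the example runnable; in a real system they come from calibration, forward kinematics, and localization as listed above:

```python
import numpy as np

def trans(d):
    """Pure-translation homogeneous transform (stand-in for real poses)."""
    T = np.eye(4)
    T[:3, 3] = d
    return T

T_W_B  = trans([2.0, 0.0, 0.0])   # T_B^W  : localization (placeholder)
T_B_EE = trans([0.0, 0.0, 1.0])   # T_EE^B : forward kinematics (placeholder)
T_EE_C = trans([0.0, 0.0, 0.1])   # T_C^EE : extrinsic calibration (placeholder)

p_C = np.array([0.1, -0.05, 0.8, 1.0])   # camera observation, homogeneous
p_W = T_W_B @ T_B_EE @ T_EE_C @ p_C      # world-frame position
print(p_W[:3])  # x = 2.1, y = -0.05, z = 1.9
```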

Frame Trees

In a complex system, frames form a tree structure rooted at the world frame. Every frame has exactly one parent, and the path from any frame to any other frame is unique.

{World}
├── {Map}
│   └── {Odom}
│       └── {Base}
│           ├── {Joint1}
│           │   └── {Joint2}
│           │       └── {EE}
│           │           └── {Tool}
│           ├── {Camera}
│           ├── {LiDAR}
│           └── {IMU}
└── {Target}

To transform between any two frames, walk up the tree to their common ancestor, then back down:

$$T_{\text{Target}}^{\text{Camera}} = (T_{\text{Camera}}^{\text{World}})^{-1} \cdot T_{\text{Target}}^{\text{World}}$$

Or equivalently: $T_{\text{Target}}^{\text{Camera}} = T_{\text{World}}^{\text{Camera}} \cdot T_{\text{Target}}^{\text{World}}$.
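This lookup can be sketched in a few lines. The toy class below (names and structure are my own, not any library's API) composes each frame's path through the root, which is equivalent to meeting at the common ancestor in a rooted tree:

```python
import numpy as np

class FrameTree:
    """Toy frame tree: each frame stores T_frame^parent (child -> parent coords)."""

    def __init__(self):
        self.parent = {}
        self.T_to_parent = {}

    def add(self, frame, parent, T):
        self.parent[frame] = parent
        self.T_to_parent[frame] = np.asarray(T, dtype=float)

    def to_root(self, frame):
        """Accumulate T_frame^root by walking up toward the root."""
        T = np.eye(4)
        while frame in self.parent:
            T = self.T_to_parent[frame] @ T
            frame = self.parent[frame]
        return T

    def lookup(self, target, source):
        """T_source^target: converts point coordinates from source to target."""
        return np.linalg.inv(self.to_root(target)) @ self.to_root(source)

def trans(d):
    T = np.eye(4)
    T[:3, 3] = d
    return T

tree = FrameTree()
tree.add("Base",   "World", trans([1.0, 0.0, 0.0]))
tree.add("Camera", "Base",  trans([0.0, 1.0, 0.0]))
tree.add("Target", "World", trans([3.0, 0.0, 0.0]))

# Target origin seen from the camera: up to {World}, back down to {Camera}
p = tree.lookup("Camera", "Target") @ np.array([0.0, 0.0, 0.0, 1.0])
print(p[:3])  # x = 2, y = -1, z = 0
```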

Robotics Application: ROS tf2 — The Standard Frame Manager

In ROS (Robot Operating System), the tf2 library manages the frame tree automatically:

  • Each node publishes transforms (e.g., the SLAM node publishes $T_{\text{Base}}^{\text{Map}}$)
  • Any node can query the transform between any two frames at any time
  • tf2 handles the chaining, inversion, and time synchronization internally

This is why consistent frame conventions matter in practice. If one node publishes its transform using a different convention than what another node expects, the result is a subtle, hard-to-debug spatial error — the robot reaches to the wrong location, the map drifts, or the obstacle detector gives false positives.

Extrinsic Calibration

Extrinsic calibration is the process of determining the fixed transform between two frames — most commonly between a sensor and the robot body.

Hand-Eye Calibration

A camera is mounted on a robot’s end-effector. You know:

  • $T_{EE,i}^B$: the end-effector pose in the base frame at each calibration pose $i$ (from forward kinematics)
  • $T_{\text{target},i}^C$: the pose of a calibration target as seen by the camera at each pose $i$

You need to find the unknown fixed transform $T_C^{EE}$ (camera relative to end-effector). The key observation: for any pose $i$, the chain from calibration target to base through the camera and end-effector must equal the fixed (unknown) target-to-base transform:

$$T_{EE,i}^B \cdot T_C^{EE} \cdot T_{\text{target},i}^C = T_{\text{target}}^B = \text{const}$$

By taking two poses $i, j$ and eliminating the unknown constant, you get the classic $AX = XB$ problem, where $A$ is the relative end-effector motion and $B$ is the relative camera observation. It requires at least two poses with non-parallel rotation axes, and in practice you use 10+ poses with a least-squares solver.
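A synthetic numpy check of the $AX = XB$ identity: pick a known hand-eye transform $X$ and a fixed target pose (all values invented), generate two camera observations from the closed chain, and confirm the identity holds:

```python
import numpy as np

def make_T(R, d):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = d
    return T

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

X          = make_T(Rz(0.3), [0.05, 0.00, 0.10])  # true T_C^EE (pretend unknown)
T_target_B = make_T(Rz(1.0), [1.00, 0.50, 0.00])  # fixed target pose in base frame

def camera_obs(T_EE_B):
    """T_target^C implied by the closed chain T_EE^B * X * T_target^C = T_target^B."""
    return np.linalg.inv(T_EE_B @ X) @ T_target_B

# Two end-effector poses
T1 = make_T(Rz(0.2), [0.4, 0.0, 0.3])
T2 = make_T(Rz(0.9), [0.1, 0.2, 0.5])

A = np.linalg.inv(T2) @ T1                           # relative end-effector motion
B = camera_obs(T2) @ np.linalg.inv(camera_obs(T1))   # relative camera observation
assert np.allclose(A @ X, X @ B)                     # the AX = XB identity
```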

Why Calibration Matters

A 1° rotation error in the extrinsic calibration of a camera mounted 1 meter from the robot base translates to ~1.7 cm of position error at the base. For a camera looking at objects 5 meters away, that same 1° error becomes ~8.7 cm. Calibration accuracy directly limits task accuracy.
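The arithmetic behind those numbers: position error from a pure rotation error grows with the lever arm.

```python
import numpy as np

# err ≈ distance * tan(angle); for small angles, ≈ distance * angle (radians)
angle = np.deg2rad(1.0)
for dist_m in (1.0, 5.0):
    err_cm = 100.0 * dist_m * np.tan(angle)
    print(f"1 deg error at {dist_m} m: {err_cm:.1f} cm")
# 1 deg error at 1.0 m: 1.7 cm
# 1 deg error at 5.0 m: 8.7 cm
```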

LiDAR-to-IMU Calibration

For autonomous vehicles, the LiDAR and IMU frames must be precisely aligned. Errors here cause the point cloud to “smear” when the vehicle turns, corrupting the map. The fixed transform $T_L^I$ is typically found by:

  1. Collecting data while performing varied motions (turns, accelerations)
  2. Optimizing $T_L^I$ to minimize inconsistency between the LiDAR-based motion estimate and the IMU-based motion estimate

Velocity and Force Transformations

Transforms don’t just apply to positions. Velocities and forces also need frame conversions, but the rules differ.

Velocity Transform (Adjoint)

A twist (linear + angular velocity) in frame $\{B\}$ transforms to frame $\{A\}$ via the adjoint of $T_B^A$:

$$\mathcal{V}^A = \text{Ad}_{T_B^A} \; \mathcal{V}^B$$

The adjoint is a $6 \times 6$ matrix — not the same as simply applying $T$ to a position vector. This is because angular velocity doesn’t transform like a point.

Force/Wrench Transform

A wrench (force + torque) transforms with the inverse transpose of the adjoint. Forces and velocities are dual — they transform differently to preserve the power relationship $P = \mathcal{F}^T \mathcal{V}$.

Don't Transform Velocities Like Positions

A common mistake: applying a homogeneous transform directly to a velocity vector. This is wrong. Velocities are not points — they don’t have a position component, and the translational part of $T$ doesn’t apply the same way. Use the adjoint representation, or carefully separate the rotation (which does apply) from the translation (which generates a cross-product coupling term).
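A sketch of the adjoint, using the $(\omega, v)$ twist ordering common in the screw-theory literature (e.g., Lynch and Park); other texts order the twist as $(v, \omega)$, which reshuffles the blocks:

```python
import numpy as np

def skew(p):
    """3x3 skew-symmetric matrix such that skew(p) @ q == np.cross(p, q)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def adjoint(T):
    """6x6 adjoint of T for twists ordered (angular, linear)."""
    R, d = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, :3] = skew(d) @ R
    Ad[3:, 3:] = R
    return Ad

# {B} offset 1 m along x from {A}, no rotation; body spins about z at 1 rad/s
T_AB = np.eye(4)
T_AB[:3, 3] = [1.0, 0.0, 0.0]
V_B = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # (wx, wy, wz, vx, vy, vz)

V_A = adjoint(T_AB) @ V_B
# Angular part is unchanged; the offset couples rotation into a linear term:
print(V_A)  # angular (0, 0, 1), linear (0, -1, 0)
```

Note how a naive `T_AB @ position`-style transform could never produce that coupling term — which is exactly the mistake described above.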

Common Pitfalls

1. Frame Mismatch

Combining data from different frames without transforming first. Every vector and matrix has an implicit frame — treat frame labels as rigorously as you treat units (meters vs. millimeters).

2. Transform Direction

Applying $T_B^A$ when you needed $T_A^B$ (or vice versa). If the result looks mirrored or the robot moves in the opposite direction, check the transform direction.

3. Pre-Multiply vs. Post-Multiply

For body-fixed (intrinsic) operations, post-multiply: $T_{\text{new}} = T_{\text{old}} \cdot T_{\text{delta}}$. For world-fixed (extrinsic) operations, pre-multiply: $T_{\text{new}} = T_{\text{delta}} \cdot T_{\text{old}}$.
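A quick numpy illustration of the difference (transform values invented): post-multiplying by a rotation spins the frame in place about its own origin, while pre-multiplying swings it around the world origin.

```python
import numpy as np

def Rz4(t):
    """4x4 homogeneous rotation about the z axis."""
    c, s = np.cos(t), np.sin(t)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

T_old = np.eye(4)
T_old[:3, 3] = [1.0, 0.0, 0.0]   # frame sitting at (1, 0, 0)
T_delta = Rz4(np.pi / 2)         # 90 deg rotation

body  = T_old @ T_delta   # body-fixed: origin stays at (1, 0, 0)
world = T_delta @ T_old   # world-fixed: origin swings to (0, 1, 0)

print(body[:3, 3], world[:3, 3])
```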

4. Stale Transforms

On a moving robot, transforms change over time. Using a camera image from time $t_1$ with a robot pose from time $t_2$ introduces error proportional to the robot’s velocity and the time gap.

[Interactive demo: move frame $\{B\}$ relative to $\{A\}$ and watch how the same physical point has different coordinates in each frame. The transform $T_B^A$ converts coordinates from $\{B\}$ to $\{A\}$: multiply $T_B^A$ by the point’s $\{B\}$-coordinates to get its $\{A\}$-coordinates.]

Practice Problems

  1. Frames $\{A\}$, $\{B\}$, $\{C\}$ are arranged such that $T_B^A = \begin{bmatrix} R_z(90°) & (1, 0, 0)^T \\ \mathbf{0}^T & 1\end{bmatrix}$ and $T_C^B = \begin{bmatrix} I & (0, 2, 0)^T \\ \mathbf{0}^T & 1\end{bmatrix}$. What is $T_C^A$?

  2. A camera sees an object at ${}^C\mathbf{p} = (0, 0, 1.5)$ m. The camera-to-base transform is $T_C^B = \begin{bmatrix} I & (0.5, 0, 0.8)^T \\ \mathbf{0}^T & 1\end{bmatrix}$. What is ${}^B\mathbf{p}$?

  3. You have transforms $T_B^A$, $T_C^B$, and $T_D^C$. Write the expression for $T_A^D$ (from $\{A\}$ to $\{D\}$).

  4. A LiDAR measures a wall point at ${}^L\mathbf{p} = (3, 0, 0)$. The LiDAR is mounted 0.2 m above and 0.1 m forward of the robot base, with no rotation offset. Write $T_L^B$ and compute ${}^B\mathbf{p}$.

  5. A mobile robot has base-to-world transform $T_B^W$ and a camera with $T_C^B$. The camera sees an AprilTag at ${}^C\mathbf{p}_{\text{tag}}$. Write the full expression for the tag position in the world frame.

Answers
  1. $T_C^A = T_B^A \cdot T_C^B$. The rotation $R_z(90°)$ maps $(0, 2, 0)$ to $(-2, 0, 0)$, and adding the translation $(1, 0, 0)$ gives $T_C^A$ with rotation $R_z(90°)$ and translation $(-1, 0, 0)^T$. The point at $\{C\}$’s origin maps to $(-1, 0, 0)$ in $\{A\}$.

  2. ${}^B\mathbf{p} = T_C^B \cdot {}^C\mathbf{p}$: with identity rotation, ${}^B\mathbf{p} = (0, 0, 1.5) + (0.5, 0, 0.8) = (0.5, 0, 2.3)$ m.

  3. $T_A^D = (T_D^A)^{-1}$ where $T_D^A = T_B^A \cdot T_C^B \cdot T_D^C$. So $T_A^D = (T_B^A \cdot T_C^B \cdot T_D^C)^{-1} = (T_D^C)^{-1} (T_C^B)^{-1} (T_B^A)^{-1}$.

  4. $T_L^B = \begin{bmatrix} I & (0.1, 0, 0.2)^T \\ \mathbf{0}^T & 1\end{bmatrix}$. ${}^B\mathbf{p} = T_L^B \cdot {}^L\mathbf{p} = (3.1, 0, 0.2)$ m.

  5. ${}^W\mathbf{p}_{\text{tag}} = T_B^W \cdot T_C^B \cdot {}^C\mathbf{p}_{\text{tag}}$.
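The numeric answers above can be verified with a few lines of numpy:

```python
import numpy as np

def make_T(R, d):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = d
    return T

Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

# Problem 1: chain T_C^A = T_B^A @ T_C^B
T_AB = make_T(Rz90, [1.0, 0.0, 0.0])
T_BC = make_T(np.eye(3), [0.0, 2.0, 0.0])
T_AC = T_AB @ T_BC
assert np.allclose(T_AC[:3, 3], [-1.0, 0.0, 0.0])

# Problem 2: camera point into the base frame
T_BC2 = make_T(np.eye(3), [0.5, 0.0, 0.8])
p_B = T_BC2 @ np.array([0.0, 0.0, 1.5, 1.0])
assert np.allclose(p_B[:3], [0.5, 0.0, 2.3])

# Problem 4: LiDAR point into the base frame
T_BL = make_T(np.eye(3), [0.1, 0.0, 0.2])
p_B4 = T_BL @ np.array([3.0, 0.0, 0.0, 1.0])
assert np.allclose(p_B4[:3], [3.1, 0.0, 0.2])
```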

Key Takeaways

  1. Every measurement lives in some frame — always track which one
  2. $T_B^A$ converts points from frame $\{B\}$ to frame $\{A\}$; chain by matching inner indices
  3. Frames form a tree; transforming between any two frames follows a unique path through the tree
  4. Extrinsic calibration determines the fixed transforms between sensors and the robot body
  5. Velocities and forces transform differently from positions — use the adjoint, not the raw transform
  6. Frame errors are among the most common and hardest-to-debug issues in robotics software

Next Steps

You can now manage multiple coordinate frames confidently. The final lesson in this module tackles rotation representations — axis-angle, rotation vectors, and quaternions — representations that overcome Euler angle limitations and enable smooth interpolation for motion planning.