Before I continue, your notation is a bit wonky. Look up some university lecture slides on the pinhole camera projection model to refine these concepts in your head. First you're using u and v as the total size of the image, but then you use them in conjunction with d as a coordinate.

For this comment, I will use u and v as pixel coordinates since that's a common notation (and we don't need to care about the image size in this question). I will use x as a real-world 3D point, and y as a 2D point after projecting onto the image plane, but still in units of meters, not yet converted to pixels. Forget about the 'extrinsics' for a second and assume a 3D point x in a camera's coordinate frame.
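To make the notation concrete, here is a minimal sketch of the standard pinhole projection in Python. It is only an illustration of how x, y, and (u, v) relate, not code from the question: the function names, the focal length, the pixel size, and the principal point are all values I made up for the example, and I assume square pixels and no lens distortion.

```python
def project_to_image_plane(x, focal_length_m):
    """Project a 3D point x = (X, Y, Z), given in the camera frame,
    onto the image plane. The result y is still in meters."""
    X, Y, Z = x
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Similar triangles: scale by focal length over depth.
    return (focal_length_m * X / Z, focal_length_m * Y / Z)


def meters_to_pixels(y, pixel_size_m, principal_point_px):
    """Convert an image-plane point y (meters) to pixel coordinates (u, v).
    Assumes square pixels; principal_point_px is where the optical axis
    hits the sensor, in pixels."""
    y1, y2 = y
    cu, cv = principal_point_px
    return (y1 / pixel_size_m + cu, y2 / pixel_size_m + cv)


# Hypothetical numbers, chosen only for illustration.
x = (0.2, 0.1, 2.0)                      # 3D point in the camera frame, meters
y = project_to_image_plane(x, focal_length_m=0.004)
u, v = meters_to_pixels(y, pixel_size_m=2e-6, principal_point_px=(320.0, 240.0))
print(y, (u, v))
```

With these made-up numbers, y comes out at roughly (0.0004, 0.0002) meters and (u, v) at approximately (520, 340) pixels. Note that the image size never appears; only the principal point and pixel size are needed to go from meters to pixels.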