I have a Kinect camera that can move around a certain object. I have computed corresponding 3D points in two consecutive images and obtained the 3x3 rotation matrix and 3x1 translation vector that map the first point cloud onto the second, but I need to obtain the camera pose (the location and orientation (yaw, pitch, roll) of the camera) over time and then track it. I don't know how to use the obtained matrices to compute the camera pose.

My gut feeling says that the rotation matrix and the translation vector alone don't carry enough information to do this. The distance of the camera (or its projection parameters, to get non-arbitrary units) would also be needed.

Thanks Jami, but I have the distance between the camera and the 3D points at the first position (the camera center is (0,0,0)); then I move the camera and again have the distance between the camera (at its new position) and the new point cloud at the second position. I have also computed the 3x3 rotation matrix and 3x1 translation vector to register the point clouds together.

Hi Mahdi, this is definitely possible. If you have two sets of 3D points, the search for a rotation matrix and a translation is called "registration". ICP (Iterative Closest Point) is the standard approach for this. Check out the PCL C++ implementation if you would like to do some experiments with it: http://pointclouds.org/documentation/tutorials/iterative_closest_point.php
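If you want to see what ICP's inner step does without pulling in PCL, here is a minimal numpy sketch of the closed-form rigid alignment (the SVD/Kabsch solution) that ICP runs once correspondences are fixed; the function name and setup are just illustrative, not part of any library:

```python
import numpy as np

def rigid_align(src, dst):
    """Find R (3x3) and t (3,) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered correspondences
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With noisy data the same routine returns the best-fit rotation in the least-squares sense; ICP simply alternates this step with nearest-neighbor matching.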

Hi Eugen. Thank you. I registered the two point clouds with each other using feature-point methods or ICP, and I have the rotation and translation matrices. What I want is the camera pose, which is a different thing.

Hmm, I think you need to be clearer about what you want to achieve.
You said you are able to register two point clouds (given by two consecutive Kinect frames, correct?). That means you have found the relative camera pose of your destination point cloud with respect to your source point cloud. Isn't that what you want?
You should maybe take a look at a paper like KinectFusion, which tracks the camera motion over a full Kinect sequence via point-to-plane ICP. All frames are registered in the same coordinate frame (the one given by the first input frame). There is an open-source GPU implementation of the tracker in the PCL library (the module is called kinfu).
Hi Mahdi,
as Damien said, I would also assume that by finding the rotation and translation between two point clouds, you also have the camera movement between these two frames (it's the same transformation, or its inverse).
Yes, he says what you said, but he didn't consider the coordinate frames of the 3D points.
The coordinates of the first point cloud are given with respect to the camera center at the first position (0,0,0), and the coordinates of the second point cloud are given with respect to the camera center at the second position (again (0,0,0)). So we have two origins. I think the relative camera pose is different from the relative rotation matrix between the two point clouds.
Hi Madhi,
Sorry, I still don't really understand the problem, but I will try to give more details. The relative rotation and translation between two point clouds are directly linked to the camera motion. Knowing the camera motion and the camera pose of the previous acquisition, you can easily compute the camera pose of the current acquisition.
Let's assume that you have two depth images D0 and D1 taken at two different camera positions (but not so far from each other that ICP cannot operate).
Both resulting point clouds (PC0 and PC1) are expressed with respect to their respective camera centers.
The result of ICP is the transformation matrix that expresses one point cloud in the coordinate frame of the other. Let's assume that you run ICP in such a way that you can express the point cloud PC1 in the coordinate frame of PC0; let T1>0 be this 4x4 transformation matrix. Note that the inverse of T1>0 is the matrix T0>1 that transforms the point cloud PC0 into the coordinate frame of PC1 (the inverse of such a matrix is really easy to compute).
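For reference, the inverse of a rigid transform has a closed form, so no general matrix inversion is needed; a small numpy sketch (illustrative name):

```python
import numpy as np

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1]: the inverse is [R.T, -R.T @ t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    Tinv = np.eye(4)
    Tinv[:3, :3] = R.T
    Tinv[:3, 3] = -R.T @ t
    return Tinv
```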
Now let's assume that the camera pose C0 (the one for D0) is centered at (0,0,0) and that its viewing ray points along the negative Z axis.
Then the camera center C1, expressed in the coordinate frame of C0, will be
T1>0 * (0;0;0;1)
(the camera center is the origin of its own frame, so we just map it into the frame of C0), and its viewing ray will be
T1>0 * (0;0;-1;0)
Note that if you want to process a sequence of n frames, you will need to accumulate the transformation matrices.
Hope that it helps. 
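The recipe above can be sketched in a few lines of numpy (names like camera_trajectory are illustrative; the input list is assumed to hold the ICP result T(i)>(i-1) for each consecutive pair, and yaw/pitch/roll come out in one common Z-Y-X convention):

```python
import numpy as np

def camera_trajectory(T_rel_list):
    """Accumulate relative transforms T(i)->(i-1) into poses in frame 0.

    Returns, per frame, the camera center and viewing direction expressed
    in the coordinate frame of the first camera (the "world" frame here)."""
    T_world = np.eye(4)                 # frame 0 defines the world
    centers, view_dirs = [], []
    for T_rel in [np.eye(4)] + list(T_rel_list):
        T_world = T_world @ T_rel       # T(i)->0 = T(i-1)->0 @ T(i)->(i-1)
        centers.append(T_world @ np.array([0.0, 0.0, 0.0, 1.0]))
        view_dirs.append(T_world @ np.array([0.0, 0.0, -1.0, 0.0]))
    return centers, view_dirs

def yaw_pitch_roll(R):
    """Z-Y-X Euler angles from a 3x3 rotation matrix (one common convention)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```

Drift accumulates with this kind of chaining, which is exactly why KinectFusion-style systems register each new frame against a global model instead of only against the previous frame.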
Thank you, Damien Lefloch. I computed the RPE (relative pose error) on the Freiburg RGB-D dataset.

Hi again. I converted all frames into the coordinate frame of the first frame, but however I proceed, my fit gets worse because of cumulative errors in the rotation and translation. How can I prevent these errors?

Tell me if I am interpreting your problem correctly:
1) You said your system gives you the relative pose between two successive camera positions. Say X1 is the coordinate of a 3D point in the camera frame at time t1 and X2 its coordinate at time t2; then you have (R,T) such that:
X1 = R.X2 + T.
2) If (R1,T1), (R2,T2), etc. are the poses of the camera with respect to an arbitrary world coordinate frame (e.g. one in which the 3D points are not moving), then the (Ri,Ti) are exactly what you are looking for.
3) So if X is the coordinate of that same 3D point in the world frame, we have:
X1 = R1.X + T1 and
X2 = R2.X + T2,
hence X1 = R.(R2.X + T2) + T, and thus
R^T.(X1 - T) = [R2 T2].(X;1).
The last equation can be rearranged into Ax = b, with x containing the entries of R2 and T2, A built from the known world coordinates X, and b containing the stacked values R^T.(X1 - T). The system can be solved with a least-squares technique, provided you supply enough 3D points X.
R and T are, as you stated, given by your system. X is also easy to obtain once the world coordinate frame is chosen. The only not-so-easy part is ensuring that X is tracked across time (which I assume is the case, since you are able to compute the relative pose from the point clouds, i.e. the X1, X2, ...).
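That stacking can be sketched in numpy roughly like this (solve_pose is an illustrative name; it assumes R, T and the tracked points are already available):

```python
import numpy as np

def solve_pose(X, X1, R, T):
    """Solve R^T (X1_i - T) = R2 @ X_i + T2 for R2 (3x3) and T2 (3,).

    X:  (N, 3) points in the world frame.
    X1: (N, 3) the same points in the camera frame at time t1.
    The unknown vector x stacks the 9 entries of R2 (row-major) and T2."""
    N = X.shape[0]
    # Rows of (X1 - T) @ R are exactly R^T (X1_i - T)
    b = (X1 - T) @ R
    A = np.zeros((3 * N, 12))
    for i in range(N):
        for r in range(3):
            A[3 * i + r, 3 * r:3 * r + 3] = X[i]   # row r of R2
            A[3 * i + r, 9 + r] = 1.0              # T2[r]
    x, *_ = np.linalg.lstsq(A, b.reshape(-1), rcond=None)
    return x[:9].reshape(3, 3), x[9:]
```

Note that with noisy data the least-squares R2 will not be exactly orthogonal; you can project it back to the nearest rotation with an SVD if needed.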