A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy

1Yale University, *Equal Contribution

Top row shows third-person view of each task. Bottom row shows the viewpoint-arm view.

Abstract

Robotic manipulation tasks often rely on static cameras for perception, which can limit flexibility, particularly in scenarios like robotic surgery and cluttered environments where mounting static cameras is impractical. Ideally, robots could jointly learn a policy for dynamic viewpoint selection and manipulation. However, it remains unclear which state-action space is most suitable for this complex learning process. To enable manipulation with dynamic viewpoints, we conduct a comparative study of state-action spaces for policy learning and their impacts on the performance of visuomotor policies that integrate viewpoint selection with manipulation. Specifically, we examine the configuration space of the robotic system, the end-effector space with a dual-arm Inverse Kinematics (IK) solver, and the reduced end-effector space with a look-at IK solver that optimizes rotation for viewpoint selection. We also assess variants with different rotation representations. Our results demonstrate that state-action spaces utilizing Euler angles with the look-at IK achieve superior task success rates compared to other spaces. Further analysis suggests that these performance differences are driven by inherent variations in the high-frequency components across different state-action spaces and rotation representations.

Video

State-Action Spaces

Overview

We explore different state-action spaces, including configuration space, end-effector space, and look-at space. We also include variants of these state-action spaces with different rotation representations in our comparison.
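As a minimal illustration of what varying the rotation representation means (the specific conventions below, such as "xyz" intrinsic Euler angles, are illustrative assumptions, not the paper's exact choices), the same orientation can be expressed as Euler angles, a unit quaternion, a rotation matrix, or a 6D vector built from the first two matrix columns:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# One orientation, several representations (conventions here are assumed).
euler = np.array([0.3, -0.5, 1.2])       # roll, pitch, yaw in radians
rot = R.from_euler("xyz", euler)

quat = rot.as_quat()                     # 4D unit quaternion (x, y, z, w)
matrix = rot.as_matrix()                 # 3x3 rotation matrix
six_d = matrix[:, :2].T.flatten()        # 6D rep: first two matrix columns

# Round-tripping through the quaternion recovers the Euler angles
# (unique here because the angles lie in the principal range).
recovered = R.from_quat(quat).as_euler("xyz")
```

Policies predicting actions in these different parameterizations see differently shaped targets, which is one source of the performance gaps the study measures.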


Dual-Arm End-Effector IK

To facilitate policy learning in the end-effector space, we use a dual-arm end-effector IK solver. Dual-arm end-effector IK computes the joint configurations for both robot arms to achieve specified positions and orientations of their end-effectors simultaneously within the workspace.
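The idea of solving for both arms simultaneously can be sketched by stacking each arm's position error into one damped least-squares problem. This is an illustrative toy (planar 2-link arms, made-up link lengths and targets), not the paper's actual solver:

```python
import numpy as np

L1, L2 = 1.0, 1.0  # link lengths (assumed)

def fk(q):
    """End-effector (x, y) of a planar 2-link arm with joints q = (q0, q1)."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """Position Jacobian of the planar 2-link arm."""
    s01, c01 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * np.sin(q[0]) - L2 * s01, -L2 * s01],
        [ L1 * np.cos(q[0]) + L2 * c01,  L2 * c01],
    ])

def dual_arm_ik(q, targets, iters=300, damping=1e-2):
    """Damped least-squares IK for two arms at once; q = (q_arm1, q_arm2)."""
    q = np.array(q, dtype=float)
    for _ in range(iters):
        # Stack per-arm errors and a block-diagonal Jacobian, then take
        # a damped Gauss-Newton step on both arms jointly.
        e = np.concatenate([targets[0] - fk(q[:2]), targets[1] - fk(q[2:])])
        J = np.zeros((4, 4))
        J[:2, :2] = jacobian(q[:2])
        J[2:, 2:] = jacobian(q[2:])
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(4), e)
        q += 0.5 * dq  # conservative step for stability
    return q

targets = [np.array([1.2, 0.8]), np.array([-0.5, 1.5])]
q = dual_arm_ik([0.1, 0.1, 0.1, 0.1], targets)
```

A real dual-arm solver would additionally track end-effector orientations, joint limits, and collision constraints; the stacked-residual structure stays the same.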

End-Effector Space.

Look-At IK

To further reduce the dimensionality of the state-action space, we introduce a look-at IK solver for the look-at space. Look-at IK extends the dual-arm end-effector IK by automatically determining the orientation of the viewpoint end-effector so that it focuses on the manipulation task.
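The core of a look-at constraint is constructing an orientation from just a viewpoint position and a target point, so the policy no longer has to predict rotations for the viewpoint arm. A minimal sketch, assuming an OpenCV-style camera frame (x right, y down, z forward) and a world +z up vector:

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Rotation matrix whose z-axis (assumed camera forward) points
    from `eye` toward `target`. Degenerate if the view direction is
    parallel to `up`."""
    z = target - eye
    z = z / np.linalg.norm(z)      # camera forward
    x = np.cross(z, up)
    x = x / np.linalg.norm(x)      # camera right
    y = np.cross(z, x)             # camera down (right-handed frame)
    return np.column_stack([x, y, z])

# Illustrative values: viewpoint above the workspace, looking at a point
# near the table surface.
Rm = look_at(np.array([0.5, -0.5, 1.0]), np.array([0.0, 0.0, 0.2]))
```

With the orientation determined this way, the viewpoint arm's action reduces to a 3D position (plus the look-at target), which is the dimensionality reduction the look-at space provides.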

Look-At Space.

BibTeX

@article{sun2024comparative,
  title={A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy},
  author={Sun, Xiatao and Fan, Francis and Chen, Yinxing and Rakita, Daniel},
  journal={arXiv preprint arXiv:2409.14615},
  year={2024}
}