Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics

Grasp detection methods typically aim to detect a set of free-floating hand poses that can grasp the object. However, not all detected grasp poses are executable because of physical constraints. Although invalid grasp poses can easily be filtered out in post-processing, such a two-stage approach is computationally inefficient, especially when the constraints are difficult to satisfy. In this work, we propose an approach that takes the following two constraints into account during the grasp detection stage: (i) the picked object must be placeable in a predefined configuration without in-hand manipulation, and (ii) the robot must be able to reach both the pick and the place poses under joint-limit and collision-avoidance constraints. Our key idea is to train an SE(3) grasp diffusion network to estimate the noise in the form of a spatial velocity, and to constrain the denoising process with multi-target differential inverse kinematics subject to inequality constraints. As a result, the states are guaranteed to be reachable, and placement can be performed without collision. In addition to an improved success rate, we experimentally confirmed that our approach is more efficient and more consistent in computation time than a naive two-stage approach.
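The core mechanism, denoising in joint space by passing each predicted velocity through a constrained differential IK step, can be illustrated with a toy example. This is a minimal sketch under strong simplifying assumptions, not the authors' implementation: a hypothetical 2-link planar arm stands in for the manipulator; a hypothetical `toy_velocity_field` stands in for the trained SE(3) diffusion network and returns a planar translational velocity instead of a full spatial velocity; only one target and joint-limit inequality constraints are modeled (no place target, no collision avoidance); and the box-constrained least-squares problem is solved by projected gradient descent, which may differ from the solver used in the paper. All function names are illustrative.

```python
import numpy as np

# Hypothetical 2-link planar arm (stand-in for the real manipulator).
L1, L2 = 1.0, 1.0  # link lengths

def fk(q):
    """End-effector position for joint angles q = [q0, q1]."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """2x2 positional Jacobian d fk / d q."""
    s0, c0 = np.sin(q[0]), np.cos(q[0])
    s01, c01 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s0 - L2 * s01, -L2 * s01],
        [L1 * c0 + L2 * c01, L2 * c01],
    ])

def constrained_ik_step(q, dx, q_min, q_max, damping=1e-3, iters=200):
    """Differential IK with joint-limit inequality constraints:
    min_dq 0.5*||J dq - dx||^2 + 0.5*damping*||dq||^2
    s.t.   q_min <= q + dq <= q_max,
    solved by projected gradient descent on the box-constrained QP."""
    J = jacobian(q)
    H = J.T @ J + damping * np.eye(len(q))     # QP Hessian
    g = J.T @ dx                               # linear term
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1 / Lipschitz constant of the gradient
    lo, hi = q_min - q, q_max - q              # joint limits expressed as bounds on dq
    dq = np.zeros_like(q)
    for _ in range(iters):
        dq = np.clip(dq - step * (H @ dq - g), lo, hi)  # gradient step, then project
    return q + dq

def toy_velocity_field(x, target):
    """Hypothetical stand-in for the trained network: a velocity that moves
    the end effector toward a target point."""
    return target - x

def constrained_denoise(q0, target, q_min, q_max, steps=50, dt=0.2):
    """Iterate 'denoising' updates in joint space so every state respects the limits."""
    q = np.array(q0, dtype=float)
    for _ in range(steps):
        v = toy_velocity_field(fk(q), target)               # predicted velocity
        q = constrained_ik_step(q, v * dt, q_min, q_max)    # constrained update
    return q

if __name__ == "__main__":
    q_min = np.array([-np.pi / 2, 0.1])
    q_max = np.array([np.pi / 2, 2.5])
    # A target that would require q0 beyond its upper limit.
    q = constrained_denoise([0.3, 1.0], np.array([-1.5, 0.5]), q_min, q_max)
    inside = np.all((q >= q_min - 1e-9) & (q <= q_max + 1e-9))
    print("final joints:", q, "within limits:", bool(inside))
```

Because each update is applied to joint angles and projected onto the joint-limit box, every intermediate state is, by construction, a configuration the arm can attain. The abstract attributes this property to constraining the denoising process instead of filtering afterwards. Under the same assumptions, a multi-target variant would stack the task-space residuals and constraints for the pick and place configurations into a single QP.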
@article{ko2025_2504.19502,
  title={Simultaneous Pick and Place Detection by Combining SE(3) Diffusion Models with Differential Kinematics},
  author={Tianyi Ko and Takuya Ikeda and Koichi Nishiwaki},
  journal={arXiv preprint arXiv:2504.19502},
  year={2025}
}