3D-ViTac

Learning Fine-Grained Manipulation with
Visuo-Tactile Sensing

Binghao Huang1, Yixuan Wang1, Xinyi Yang2, Yiyue Luo3, Yunzhu Li1
1Columbia University, 2University of Illinois Urbana-Champaign, 3University of Washington

Conference on Robot Learning (CoRL) 2024

Our visuo-tactile policy can handle fragile objects and perform precise in-hand manipulation!

(The robot keeps retrying until it successfully grasps the grapes.)

Abstract

Tactile and visual perception are both crucial for humans to perform fine-grained interactions with their environment. Developing similar multi-modal sensing capabilities for robots can significantly enhance and expand their manipulation skills. This paper introduces 3D-ViTac, a multi-modal sensing and learning system designed for dexterous bimanual manipulation. Our system features tactile sensors equipped with dense sensing units, each covering an area of 3 mm^2. These sensors are low-cost and flexible, providing detailed and extensive coverage of physical contacts, effectively complementing visual information. To integrate tactile and visual data, we fuse them into a unified 3D representation space that preserves their 3D structures and spatial relationships. The multi-modal representation can then be coupled with diffusion policies for imitation learning. Through concrete hardware experiments, we demonstrate that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies, particularly in safe interactions with fragile items and executing long-horizon tasks involving in-hand manipulation.

Summary Video

Tactile Sensor Hardware Design

Tactile Hardware Introduction

Load Test and Flexibility


Visuo-Tactile Points Representation

The blue points are observed by the camera, while the purple points come from the tactile sensors. Both sets of points are projected into the same 3D space for policy learning.
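As a rough sketch of this idea (not the authors' actual pipeline), the projection can be illustrated as follows: assuming each tactile pad's pose in the camera/world frame is known (e.g., from the gripper's forward kinematics), the pad's taxel grid is lifted to 3D and concatenated with the visual point cloud, with an extra feature channel distinguishing the two modalities. The function name, the 4th-channel convention, and the 3 mm taxel pitch are illustrative assumptions.

```python
import numpy as np

def fuse_visuo_tactile_points(camera_points, taxel_values, pad_pose, pitch=0.003):
    """Project one tactile pad's taxel grid into the shared 3D frame and
    concatenate it with the visual point cloud. (Hypothetical sketch.)

    camera_points: (N, 3) visual points already in the shared frame.
    taxel_values:  (H, W) pressure readings from one sensor pad.
    pad_pose:      (4, 4) homogeneous transform of the pad in the same frame.
    pitch:         taxel spacing in meters (~3 mm, matching the ~3 mm^2 units).
    """
    h, w = taxel_values.shape
    # Lay the taxels out on a planar grid in the pad's local frame (z = 0).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    local = np.stack(
        [xs * pitch, ys * pitch, np.zeros((h, w))], axis=-1
    ).reshape(-1, 3)
    # Rigidly transform the taxel grid into the shared 3D frame.
    world = (pad_pose[:3, :3] @ local.T).T + pad_pose[:3, 3]
    # 4th channel: 0 for visual points, pressure reading for tactile points.
    visual = np.concatenate([camera_points, np.zeros((len(camera_points), 1))], axis=1)
    tactile = np.concatenate([world, taxel_values.reshape(-1, 1)], axis=1)
    return np.concatenate([visual, tactile], axis=0)  # shape (N + H*W, 4)
```

The fused array can then be fed to a point-cloud policy backbone; keeping the tactile points in the same metric frame as the camera points is what preserves their spatial relationship to the observed scene.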

3D Data Processing

Visuo-Tactile Points


Capability Experiments

Tasks Requiring Fine-Grained Force Information

(1) Egg Steaming. The robot first uses its right hand to open the egg tray. It then grasps an egg and places it into an egg cooker. Finally, the left hand relocates and secures the cooker's cover over the egg.
(2) Fruit Preparation. The robot uses its left hand to grasp the plate and place it on the table. Subsequently, both robot arms collaborate to open the plastic bag. Then, the right arm grasps a grape or several grapes and places them on the plate.

Tasks Requiring In-Hand State Information

(3) Hex Key Collection. The right hand grasps the hex key and passes it to the left hand, which then performs an in-hand adjustment of the key's position. Subsequently, the robot must accurately insert the hex key into the hole in the box.
(4) Sandwich Serving. First, the right hand is required to grasp the serving spoon. Second, the left hand needs to hold the pot handle and then tilt the pot. The right hand should then retrieve the fried egg from the pot and serve it on the bread.

Policy Robustness: Comparison with Baselines

Comparison: w/ Tactile vs. w/o Tactile

Ours: Successful In-Hand Adjustment

Baselines: Failed In-Hand Adjustment

Ours: Safely Grasps the Fruit

Baselines: Fruit Breaks when Grasping Multiple



More Policy Rollout: Fine-grained Single Arm Tasks

Retrieve Light Bulb

Fruit Collection

Insertion

Insertion



Human Interference Test

Special Task

We collected demonstrations in which the robot keeps trying to grasp a single grape from the bag, and the learned policy regrasps until exactly one grape is held in hand.

BibTeX

@inproceedings{huang20243dvitac,
title={3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing},
author={Huang, Binghao and Wang, Yixuan and Yang, Xinyi and Luo, Yiyue and Li, Yunzhu},
booktitle={Proceedings of the Conference on Robot Learning (CoRL)},
year={2024}
}