Touch in the Wild

Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Xinyue Zhu*, Binghao Huang*, Yunzhu Li

*Equal contribution

Best Demo Award at the RSS 2025 Workshop on Robot Hardware-Aware Intelligence


In-the-wild tactile interaction data can help with fine-grained manipulation!

(Under human disturbances and visual occlusions, the robot relies on tactile feedback to guide its decisions.)

Summary Video

Visuo-Tactile Hardware Design

Reproduction Resources

Hardware
Codebase

Tactile Sensor
Fabrication

Visuo-Tactile Gripper Design

In-the-Wild Data Collection

We collected over 2,700 demonstrations covering 43 manipulation tasks across 12 indoor and outdoor environments, yielding more than 2.6 million visuo-tactile pairs for visuo-tactile pretraining and downstream imitation learning.

Visuo-Tactile Pretraining & Downstream Imitation Learning

We pretrain on a large corpus of image-tactile pairs using a cross-attention mechanism. The model learns to reconstruct tactile images conditioned on masked tactile inputs and associated camera images. This pretraining yields a joint visuo-tactile representation, which is then combined with robot proprioceptive states and used as input for downstream manipulation tasks.
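The masked-reconstruction objective described above can be sketched in a few lines of NumPy. This is only an illustrative forward pass under our own assumptions, not the released implementation: the patch layout, the single-head attention, the 50% mask ratio, the random weights, and every name (`init_params`, `masked_reconstruction_loss`, `mask_token`) are hypothetical. It shows masked tactile tokens querying camera-image tokens through cross-attention, with the reconstruction error measured on the hidden patches only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over the context tokens."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(n_patches, patch_dim, d_model, rng):
    """Random weights for the sketch (a real model would learn these)."""
    s = 0.1
    return {
        "embed": rng.normal(0, s, (patch_dim, d_model)),
        "mask_token": rng.normal(0, s, (d_model,)),
        "pos": rng.normal(0, s, (n_patches, d_model)),
        "w_q": rng.normal(0, s, (d_model, d_model)),
        "w_k": rng.normal(0, s, (d_model, d_model)),
        "w_v": rng.normal(0, s, (d_model, d_model)),
        "decode": rng.normal(0, s, (d_model, patch_dim)),
    }


def masked_reconstruction_loss(tactile_patches, image_tokens, params, mask_ratio, rng):
    """Hide a subset of tactile patches and reconstruct them with help from image tokens.

    tactile_patches: (n_patches, patch_dim) flattened tactile image patches.
    image_tokens:    (n_image_tokens, d_model) features from some image encoder (assumed given).
    """
    n = tactile_patches.shape[0]
    n_masked = max(1, int(round(mask_ratio * n)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_masked, replace=False)] = True

    tokens = tactile_patches @ params["embed"]   # embed every patch
    tokens[mask] = params["mask_token"]          # replace hidden patches with a mask token
    tokens = tokens + params["pos"]              # positions tell masked tokens where they sit

    # Tactile tokens (queries) read from camera-image tokens (keys and values).
    fused = tokens + cross_attention(
        tokens, image_tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    recon = fused @ params["decode"]             # predict raw patch values

    # The error is computed on the masked patches only.
    loss = np.mean((recon[mask] - tactile_patches[mask]) ** 2)
    return loss, recon, mask


rng = np.random.default_rng(0)
params = init_params(n_patches=16, patch_dim=8, d_model=32, rng=rng)
tactile = rng.random((16, 8))                 # stand-in for a patchified tactile image
image_tokens = rng.normal(size=(10, 32))      # stand-in for encoded camera-image tokens
loss, recon, mask = masked_reconstruction_loss(tactile, image_tokens, params, 0.5, rng)
print("masked patches:", int(mask.sum()), "reconstruction MSE:", float(loss))
```

In a trained model, the fused tokens (rather than the reconstructed patches) would serve as the joint visuo-tactile representation that is concatenated with proprioceptive states for the downstream policy.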

Autonomous Policy Execution

Tasks Requiring In-Hand State Information

(1) Test Tube Collection. The robot must pick up a test tube from a box, reorient it in-hand using the test tube rack, and then insert it precisely into the rack.

(2) Pencil Insertion. The robot needs to insert a pencil into a sharpener. Because the pencil is initially grasped with a tilt, the robot must first reorient it so that it is parallel to the gripper before making the precise insertion.

Tasks Requiring Fine-Grained Force Information

(3) Fluid Transfer. The robot uses a pipette to transfer fluid between containers. It must grasp the pipette firmly and apply just enough pressure to extract liquid without dropping it. The robot then moves above the other container and gently squeezes the pipette to release the fluid into it.

(4) Whiteboard Erasing. The robot uses a soft eraser to remove two strokes of text from the whiteboard. It must apply the right amount of pressure to erase the marker ink without exceeding force limits that could damage the system. The task requires consistent and controlled force application throughout.

Robustness Evaluation

(1) Test Tube Collection

Under human disturbances, the policy uses tactile feedback to decide whether to reorient or insert the test tube, based on whether the tactile signal indicates that the tube is tilted or upright.
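To illustrate why a tactile image is enough to tell a tilted tube from an upright one, here is a minimal hand-written sketch. The learned policy makes this decision implicitly; it does not use an explicit rule like this. The 16x16 pressure grid, the 10-degree threshold, and the names `contact_angle_deg` and `choose_action` are all assumptions made for the example. The sketch estimates the principal axis of the contact patch from pressure-weighted second moments and compares its angle to the vertical.

```python
import numpy as np


def contact_angle_deg(pressure):
    """Angle (degrees, 0 to 90) between the contact patch's principal axis and the
    vertical (row) axis of the tactile grid. Returns None when there is no contact."""
    total = pressure.sum()
    if total <= 0:
        return None
    rows, cols = np.indices(pressure.shape)
    dr = rows - (rows * pressure).sum() / total   # offsets from the pressure centroid
    dc = cols - (cols * pressure).sum() / total
    cov = np.array([
        [(pressure * dr * dr).sum(), (pressure * dr * dc).sum()],
        [(pressure * dr * dc).sum(), (pressure * dc * dc).sum()],
    ]) / total
    _, eigvecs = np.linalg.eigh(cov)              # eigenvalues come back in ascending order
    major = eigvecs[:, -1]                        # (row, col) direction of largest spread
    return float(np.degrees(np.arctan2(abs(major[1]), abs(major[0]))))


def choose_action(pressure, tilt_threshold_deg=10.0):
    """Hypothetical rule: reorient when the contact axis is tilted, otherwise insert."""
    angle = contact_angle_deg(pressure)
    if angle is None:
        return "no_contact"
    return "reorient" if angle > tilt_threshold_deg else "insert"


upright = np.zeros((16, 16))
upright[2:14, 8] = 1.0                            # contact line along the vertical axis
tilted = np.zeros((16, 16))
tilted[np.arange(2, 14), np.arange(2, 14)] = 1.0  # contact line along the diagonal

print(choose_action(upright))
print(choose_action(tilted))
```

A camera view can lose this cue when the gripper or a hand occludes the tube, whereas the contact pattern stays available at the fingertips.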

(2) Whiteboard Erasing

Under human disturbances, the policy reliably detects and erases any newly written text in real time.

Policy Robustness: Comparison with Baselines

Performance Comparison: Ours vs. Vision-Only Policies

Ours: Successful Reorientation and Insertion

Vision-Only Baseline: Repeated Reorientation

Ours: Successful Fluid Expulsion

Vision-Only Baseline: Skips Fluid Expulsion

Ours: Clean Erase

Vision-Only Baseline: Unclean Erase

Ours: Reliable Reorientation with Precise In-Hand Information

Vision-Only Baseline: Missed Reorientation

Pretraining Ablations

BibTeX

@inproceedings{zhu2025touch,
  title={Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper},
  author={Zhu, Xinyue and Huang, Binghao and Li, Yunzhu},
  booktitle={RSS 2025 Workshop on Robot Hardware-Aware Intelligence},
  year={2025},
}

Related Works

3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing
Binghao Huang¹, Yixuan Wang¹, Xinyi Yang², Yiyue Luo³, Yunzhu Li¹
Conference on Robot Learning (CoRL), 2024

VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Binghao Huang¹, Jie Xu², Iretiayo Akinola², Wei Yang², Balakumar Sundaralingam², Rowland O'Flaherty², Dieter Fox², Xiaolong Wang²,³, Arsalan Mousavian², Yu-Wei Chao²†, Yunzhu Li¹†

Learning the Signatures of the Human Grasp Using a Scalable Tactile Glove
Subramanian Sundaram, Petr Kellnhofer, Yunzhu Li, Jun-Yan Zhu, Antonio Torralba, and Wojciech Matusik
Nature 569, 698–702 (2019), 5-year Impact Factor: 54.637
Collected by MIT Museum
Covered by MIT News, Nature News & Views, Nature communities, The Economist, PBS NOVA, BBC Radio, and New Scientist