Touch in the Wild

Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Xinyue Zhu*, Binghao Huang*, Yunzhu Li

*Equal contribution

Best Demo Award at the RSS 2025 Workshop on Robot Hardware-Aware Intelligence

In-the-wild tactile interaction data can help with fine-grained manipulation!

(Under human disturbances and visual occlusions, the robot relies on tactile feedback to guide its decisions.)


Summary Video

Visuo-Tactile Hardware Design

Visuo-Tactile Gripper Design

Sensor Fabrication and PCB

In-the-Wild Data Collection

We collected over 2,700 demonstrations covering 43 manipulation tasks across 12 indoor and outdoor environments. This yields more than 2.6 million visuo-tactile pairs, which we use for visuo-tactile pretraining and downstream imitation learning.

Visuo-Tactile Pretraining & Downstream Imitation Learning

We pretrain on a large corpus of image-tactile pairs using a cross-attention mechanism. The model learns to reconstruct tactile images conditioned on masked tactile inputs and associated camera images. This pretraining yields a joint visuo-tactile representation, which is then combined with robot proprioceptive states and used as input for downstream manipulation tasks.
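The masked-reconstruction objective described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the token width, the 50% mask ratio, the zero-valued mask token, the residual connection, and every function name below are hypothetical, and the attention has no learned weights, so the printed loss only shows where the training signal would come from.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends over all key tokens."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def mask_tokens(tokens, mask_ratio, rng):
    """Zero out a random subset of tokens; return the masked copy and the masked indices."""
    n_masked = max(1, round(mask_ratio * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    hidden = set(masked_idx)
    masked = [[0.0] * len(t) if i in hidden else list(t) for i, t in enumerate(tokens)]
    return masked, masked_idx


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed only on the tokens that were masked out."""
    errors = [(p - t) ** 2 for i in masked_idx for p, t in zip(pred[i], target[i])]
    return sum(errors) / len(errors)


rng = random.Random(0)
dim = 8  # hypothetical token width
# Random stand-ins for tactile-patch and camera-image embeddings (assumed, not real data).
tactile_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
image_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10)]

masked_tactile, masked_idx = mask_tokens(tactile_tokens, mask_ratio=0.5, rng=rng)
# Tactile tokens act as queries; camera-image tokens supply keys and values.
context = cross_attention(masked_tactile, image_tokens, image_tokens)
# Assumed residual connection: visible tactile content plus visual context.
reconstruction = [[m + c for m, c in zip(mt, ct)] for mt, ct in zip(masked_tactile, context)]
loss = masked_mse(reconstruction, tactile_tokens, masked_idx)
print(f"masked tokens: {len(masked_idx)}, reconstruction loss: {loss:.3f}")
```

In a trained model, learned projections would sit around the attention step, and the loss on the masked tokens would push the network to fill in missing tactile content from the camera view.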

Pretraining Illustration


Autonomous Policy Execution

Tasks Requiring In-Hand State Information

(1) Test Tube Collection. The robot must pick up a test tube from a box, reorient it in-hand by pressing it against the test tube rack, and then precisely insert it into the rack.
(2) Pencil Insertion. The robot needs to insert a pencil into a sharpener. Because the pencil is initially grasped with a tilt, the robot must first reorient it so that it is parallel to the gripper before making the precise insertion.

Tasks Requiring Fine-Grained Force Information

(3) Fluid Transfer. The robot uses a pipette to transfer fluid between containers. It must grasp the pipette firmly and apply just enough pressure to draw up liquid without dropping the pipette. The robot then moves above the other container and gently squeezes to release the fluid into it.
(4) Whiteboard Erasing. The robot uses a soft eraser to remove two strokes of text from the whiteboard. It must apply the right amount of pressure to erase the marker ink without exceeding force limits that could damage the system. The task requires consistent and controlled force application throughout.


Robustness Evaluation


(1) Test Tube Collection

Under human disturbances, the policy uses tactile feedback to decide whether to reorient or insert the test tube, based on whether the tactile signal shows the tube to be tilted or upright.
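To make concrete what kind of in-hand information a tactile image can carry, here is a small hand-written sketch that estimates the orientation of a contact patch from second-order image moments. It is not the learned policy, which maps observations to actions end to end. The function names, the 15-degree tolerance, and the assumption that an upright tube appears as a vertical contact patch are all hypothetical.

```python
import math


def contact_axis_degrees(pressure):
    """Orientation of the contact patch's principal axis, in degrees from the column (x) axis."""
    total = sum(sum(row) for row in pressure)
    if total <= 0:
        raise ValueError("no contact detected")
    # Pressure-weighted centroid of the contact patch.
    cx = sum(w * x for row in pressure for x, w in enumerate(row)) / total
    cy = sum(w * y for y, row in enumerate(pressure) for w in row) / total
    # Second-order central moments (unnormalized; only their ratios matter here).
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(pressure):
        for x, w in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += w * dx * dx
            mu02 += w * dy * dy
            mu11 += w * dx * dy
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


def suggest_action(pressure, tolerance_deg=15.0):
    """Hypothetical rule: insert when the patch is near vertical, otherwise reorient."""
    # Row indices grow downward, so the sign of the angle is ignored via abs().
    deviation = abs(90.0 - abs(contact_axis_degrees(pressure)))
    return "insert" if deviation <= tolerance_deg else "reorient"


# Toy 5x5 pressure maps: a vertical contact line and a diagonal one.
upright = [[0, 0, 1, 0, 0] for _ in range(5)]
tilted = [[1 if x == y else 0 for x in range(5)] for y in range(5)]

print(suggest_action(upright))
print(suggest_action(tilted))
```

A learned policy does not need an explicit rule like this, but the sketch shows why tilt is recoverable from touch alone even when the camera view is occluded.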

(2) Whiteboard Erasing

Under human disturbances, the policy reliably detects and erases any newly written text in real time.



Policy Robustness: Comparison with Baselines

Performance Comparison: Ours vs. Vision-Only Policies

Ours: Successful Reorientation and Insertion

Vision-Only Baseline: Repeated Reorientation

Ours: Successful Fluid Expulsion

Vision-Only Baseline: Skips Fluid Expulsion

Ours: Clean Erase

Vision-Only Baseline: Unclean Erase

Ours: Reliable Reorientation with Precise In-Hand Information

Vision-Only Baseline: Missed Reorientation



Pretraining Ablations


BibTeX

@inproceedings{zhu2025touch,
title={Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper},
author={Zhu, Xinyue and Huang, Binghao and Li, Yunzhu},
booktitle={RSS 2025 Workshop on Robot Hardware-Aware Intelligence},
year={2025},
}

Related Works