Touch in the Wild

Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Xinyue Zhu*, Binghao Huang*, Yunzhu Li

*Equal contribution

     

Best Demo Award at RSS 2025 Workshop on Robot Hardware-Aware Intelligence

In-the-wild tactile interaction data can help with fine-grained manipulation!

(Under human disturbances and visual occlusions, the robot relies on tactile feedback to guide its decisions.)


Summary Video

Visuo-Tactile Hardware Design

Visuo-Tactile Gripper Design

Sensor Fabrication and PCB

In-the-Wild Data Collection

We collected over 2,700 demonstrations covering 43 manipulation tasks across 12 indoor and outdoor environments, yielding more than 2.6 million visuo-tactile pairs for visuo-tactile pretraining and downstream imitation learning.

Visuo-Tactile Pretraining & Downstream Imitation Learning

We pretrain on a large corpus of image-tactile pairs using a cross-attention mechanism. The model learns to reconstruct tactile images conditioned on masked tactile inputs and associated camera images. This pretraining yields a joint visuo-tactile representation, which is then combined with robot proprioceptive states and used as input for downstream manipulation tasks.
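The pretraining objective described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it uses a single attention head, random untrained weights, a zero vector as the mask token, and arbitrary token counts and dimensions. All names (`cross_attention`, `masked_reconstruction_loss`, `params`, the 7-dimensional proprioceptive vector) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: tactile tokens query camera-image tokens."""
    q = queries @ w_q                          # (n_tactile, d)
    k = context @ w_k                          # (n_image, d)
    v = context @ w_v                          # (n_image, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_tactile, n_image)
    return softmax(scores, axis=-1) @ v        # (n_tactile, d)


def masked_reconstruction_loss(tactile_tokens, image_tokens, params, mask_ratio, rng):
    """Mask some tactile tokens, fuse with image tokens, reconstruct the masked ones."""
    n = tactile_tokens.shape[0]
    num_masked = max(1, int(round(mask_ratio * n)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=num_masked, replace=False)] = True

    # Replace masked tactile tokens with the (assumed) mask token.
    visible = np.where(mask[:, None], params["mask_token"], tactile_tokens)

    # Residual fusion: masked tactile input conditioned on the camera image.
    fused = visible + cross_attention(
        visible, image_tokens, params["w_q"], params["w_k"], params["w_v"]
    )

    # Linear decoder; mean squared error on the masked positions only.
    recon = fused @ params["w_out"]
    loss = float(np.mean((recon[mask] - tactile_tokens[mask]) ** 2))
    return loss, fused, mask


rng = np.random.default_rng(0)
d = 16                                         # assumed token dimension
tactile_tokens = rng.normal(size=(8, d))       # assumed 8 tactile patches
image_tokens = rng.normal(size=(12, d))        # assumed 12 image patches
params = {
    "w_q": rng.normal(size=(d, d)) * 0.1,
    "w_k": rng.normal(size=(d, d)) * 0.1,
    "w_v": rng.normal(size=(d, d)) * 0.1,
    "w_out": rng.normal(size=(d, d)) * 0.1,
    "mask_token": np.zeros(d),
}

loss, fused, mask = masked_reconstruction_loss(
    tactile_tokens, image_tokens, params, mask_ratio=0.5, rng=rng
)

# Downstream: pool the joint representation and append proprioception
# (a 7-dimensional state is assumed here purely for illustration).
proprio = rng.normal(size=7)
policy_input = np.concatenate([fused.mean(axis=0), proprio])
```

In a real training setup the weights would be optimized to minimize `loss` over the pretraining corpus, and `policy_input` would feed an imitation-learning policy; the sketch only shows how the pieces connect.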

Pretraining Illustration


Autonomous Policy Execution

Tasks Requiring In-Hand State Information

(1) Transparent Tube Collection. The robot must pick up a test tube from a box, reorient it in-hand using the test tube rack, and then precisely insert it into the rack.
(2) Pencil Insertion. The robot needs to insert a pencil into a sharpener. Since the pencil is initially grasped upright, the robot must first reorient it carefully before performing a precise insertion.

Tasks Requiring Fine-Grained Force Information

(3) Fluid Transfer. The robot uses a pipette to transfer water between containers. It must grasp the pipette firmly and apply just enough pressure to extract liquid without dropping it. The robot then needs to move above the other container and gently squeeze to release the water into it.
(4) Whiteboard Erasing. The robot uses a soft eraser to remove two strokes of text from the whiteboard. It must apply the right amount of pressure to erase the marker ink without exceeding force limits that could damage the system. The task requires consistent and controlled force application throughout.

Policy Robustness: Comparison with Baselines

Performance Comparison: Ours vs. Vision-Only Policies

Ours: Successful Reorientation and Insertion

Vision-Only Baseline: Repeated Reorientation

Ours: Successfully Expels Fluid

Vision-Only Baseline: Skips Expelling Fluid

Ours: Clean Erase

Vision-Only Baseline: Unclean Erase

Ours: Reliable Reorientation with Precise In-Hand Info

Vision-Only Baseline: Missed Reorientation



Pretraining Ablations


BibTeX

@article{zhu2025touch,
title={Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper},
author={Zhu, Xinyue and Huang, Binghao and Li, Yunzhu},
booktitle={RSS 2025 Workshop on Robot Hardware-Aware Intelligence},
year={2025}
}

Related Works