Tactile Sensing in Robot Manipulation (2020–2026)
A multi-source survey across hardware, simulation, perception, policy learning, and foundation models
This survey reviews five years of progress in tactile sensing for robotic manipulation, organized across hardware, simulation, perception, policy learning, and foundation models. We draw on roughly fourteen independent review papers from eight research traditions, using cross-source corroboration where reviewers converge and flagging disagreement where they don't. We supplement these with primary research papers from 2020 through early 2026, including major hardware milestones (DIGIT, Digit 360, AnySkin, FlexiTac, 3D-ViTac), simulators (Tacchi family, DiffTactile, TacIPC, Taccel), tactile foundation models (T3, Sparsh family, UniTouch, TVL, Octopi, AnyTouch), and emerging touch-augmented vision-language-action models.
Introduction
For decades, vision-only manipulation dominated, but the field has hit a wall on contact-rich tasks: insertion, in-hand reorientation, deformable object handling, occluded grasping, and force-modulated tasks. Vision cannot see what the gripper itself occludes, and it cannot directly measure forces. As the ManiSkill-ViTac 2025 challenge organizers framed it, performance in contact-rich scenarios remains far from satisfactory because most works rely on vision alone and lack the ability to detect contact status for decision making [1].
The framings used by the most-cited reviews converge on the same central question. Li, Kroemer, Su, Veiga, Kaboli and Ritter, in their 2020 IEEE T-RO review, frame tactile sensing as a hierarchy: raw signals → contact-level information → object-level information → action-level information [2]. Luo, Bimbo, Dahiya and Liu organize the field around three object properties (shape, surface material, and object pose) and emphasize the role of tactile sensing in combination with other modalities [3]. Both reviews, written years apart in different research traditions, agree on the practical motivation: tactile is the modality that closes the gap between knowing where an object is and knowing what is happening at the contact.
A persistent tension runs through every honest review. The 2026 IJRR survey on imitation learning for contact-rich tasks puts it bluntly: despite efforts, tactile sensors remain a technology mostly confined to research applications, with rare industrial use-cases [4].
Why now
Two structural drivers explain the recent acceleration. The industrial humanoid wave (Tesla Optimus, Figure, 1X, Unitree, Apptronik, Sanctuary) needs touch because human-form factories and homes were built around tactile interaction. Independently, the LLM/VLM/VLA boom created a natural appetite for adding a tactile modality to foundation models, the same way audio and depth were added in the 2022–2024 multimodal wave.
The numbers are indicative (keyword choices vary across bibliometric reviews), but the shape is consistent. Lepora notes that 15 reviews on tactile robotics were published in 2024 alone [5]. A separate analysis of dexterous-manipulation publications through January 2025 identified eight distinct research categories and noted that haptic and tactile interfaces are under-represented relative to their importance [7].
The information hierarchy
The clearest conceptual scaffold for this survey comes from the Li-Kroemer hierarchy, which we adopt as a recurring lens.
The hierarchy is useful diagnostically: when a paper claims "tactile foundation model," ask which level it operates at. T3 and Sparsh sit between raw and contact. Octopi and UniTouch reach the object level. VLA-Touch and Tactile-VLA bridge all four.
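To make the diagnostic concrete, the hierarchy can be written down as an ordered type. The sketch below is purely illustrative: the level assignments restate the sentence above, and `MODEL_LEVELS` is a hypothetical mapping we introduce here, not something the cited papers define.

```python
from enum import IntEnum

class TactileLevel(IntEnum):
    """The four levels of the Li-Kroemer information hierarchy."""
    RAW = 0      # raw sensor signals (images, taxel readings)
    CONTACT = 1  # contact-level: forces, slip, local geometry
    OBJECT = 2   # object-level: shape, material, pose
    ACTION = 3   # action-level: grasp adjustment, policy output

# Hypothetical placement of the models named above, restating the prose;
# each entry is the (lowest, highest) level the model operates at.
MODEL_LEVELS = {
    "T3":        (TactileLevel.RAW, TactileLevel.CONTACT),
    "Sparsh":    (TactileLevel.RAW, TactileLevel.CONTACT),
    "Octopi":    (TactileLevel.RAW, TactileLevel.OBJECT),
    "UniTouch":  (TactileLevel.RAW, TactileLevel.OBJECT),
    "VLA-Touch": (TactileLevel.RAW, TactileLevel.ACTION),
}

def spans_all_levels(lo: TactileLevel, hi: TactileLevel) -> bool:
    """True when a model's range covers the whole hierarchy."""
    return lo == TactileLevel.RAW and hi == TactileLevel.ACTION
```

Reading a paper with this lens in hand makes the "which level?" question mechanical rather than rhetorical.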
Taxonomy of the field
Five research pillars emerge across reviews. The pillars feed each other: a useful new sensor needs a simulator, a perception model, and a learnable policy to matter, plus an application to ground it.
Hardware: the sensor zoo
There is genuine disagreement across reviews about how to slice the sensor space. Three independent reviews give three different vision-based-tactile-sensor (VBTS) taxonomies. Shimonomura's 2019 review classifies camera-based tactile sensors into three types based on contact-module hardware [8]. Shah et al.'s 2021 JINT review proposes three design paradigms: waveguide-type, marker displacement-based, and reflective membrane designs [9]. Lepora's 2025 classification argues that both prior taxonomies overlook designs combining these principles [10]. Materials-science reviews give a different slice again: Jin et al.'s 2023 review of flexible tactile sensors organizes the area around six sensing mechanisms [11].
The point is not to pick a winner among taxonomies, but to acknowledge that the underlying design space is still actively being explored.
Sensing principles compared
VBTS wins on resolution; magnetic wins on durability; piezoresistive wins on cost and area; piezoelectric and triboelectric win on frequency response. Different applications genuinely demand different transduction principles, which is part of why the field has not converged on a single sensor.
What's actually winning
| Family | Sources | Strength | Standing problem |
|---|---|---|---|
| Vision-based (GelSight, DIGIT, TacTip, GelSlim, Digit 360) | [8]–[10], [14], [15] | High spatial resolution; ML-friendly images | Bulky; gel wear; latency |
| Magnetic (ReSkin, AnySkin, eFlesh, uSkin) | [4], [16] | Robust, replaceable, cross-instance generalization | Lower resolution than VBTS |
| Flexible piezoresistive (STAG-glove, FlexiTac, 3D-ViTac) | [11], [13], [17], [18] | Large area, low cost, full-finger coverage | Lower force resolution |
| Capacitive / piezoelectric e-skins | [6], [11], [19] | Whole-body coverage; low power | Wiring complexity; resolution trade-off |
| Triboelectric / self-powered | [6], [11] | Self-powered, vibration sensing | Calibration drift; no DC response |
| MEMS-based force/pressure | [12] | Industrial maturity; small footprint | Less expressive than VBTS |
The big practical hardware story since 2020 has five chapters that the reviews mostly agree on:
- DIGIT (Lambeta et al., 2020) miniaturized vision-based sensing for multi-fingered hands, with an open-sourced design that significantly lowered the barrier to entry [20].
- AnySkin (Bhirangi et al., 2024) showed that decoupling the sensing electronics from the elastomer interface produces cross-instance policy generalization [16].
- 3D-ViTac (Huang et al., CoRL 2024) showed that dense flexible piezoresistive arrays at 3 mm² resolution outperform vision-only policies on long-horizon and fragile-object tasks [17].
- Digit 360 (Meta + GelSight, 2024) introduced a fingertip with over 18 sensing features [21].
- Industrial deployment reached scale: Amazon Vulcan manipulates 75% of the 1M items at the Spokane warehouse using force feedback and AI [22].
How a vision-based tactile sensor works
The trick that makes VBTS work is recasting a contact-mechanics problem as a vision problem. Once it's an image, the entire deep-learning toolkit becomes available. That observation unlocked the 2017β2024 VBTS wave.
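A minimal sketch of that recasting, assuming a GelSight-style sensor that provides a no-contact reference frame. The function names and the threshold value are illustrative, not any sensor's actual API: the gel deforms under contact, so pixels that differ from the reference frame mark the contact region.

```python
import numpy as np

def contact_mask(reference: np.ndarray, frame: np.ndarray,
                 threshold: float = 10.0) -> np.ndarray:
    """Recast contact sensing as image processing: pixel-wise change
    relative to the no-contact reference frame marks gel deformation."""
    diff = np.abs(frame.astype(np.float32) - reference.astype(np.float32))
    if diff.ndim == 3:
        # Collapse color channels; any channel change counts as deformation.
        diff = diff.max(axis=-1)
    return diff > threshold

def contact_area_fraction(mask: np.ndarray) -> float:
    """Fraction of the sensing surface currently in contact."""
    return float(mask.mean())

# Synthetic example: a flat reference image and a pressed disk.
ref = np.full((64, 64, 3), 120, dtype=np.uint8)
pressed = ref.copy()
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2
pressed[disk] += 40  # brightness shift where the gel deforms

mask = contact_mask(ref, pressed)
```

From here the difference image, or the mask, feeds directly into standard CNN/ViT pipelines, which is the deep-learning payoff the paragraph describes.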
The standardization problem
Multiple reviews independently flag the same hardware problem: standardization. The TLV-CoRe authors observe that tactile sensors are not yet fully standardized; tactile images differ significantly under identical touch conditions due to camera type, lighting, color, and illumination [24]. The UniVTAC authors (2026) note that the current infrastructure for visuo-tactile manipulation remains underdeveloped, with the scarcity of large-scale tactile data severely limiting tactile-centric representation models [25].
Simulation and sim-to-real
Sim-to-real for vision is hard but well-trodden. For tactile it is structurally different, because what must transfer is a physical contact response, not an appearance. The 2025 Tactile Robotics Outlook reviews how data-driven simulation has matured, noting that approaches like image-to-image translation via GANs work for surface tracing and bimanual manipulation but at the cost of a substantial sim-to-real gap [26].
Simulator landscape
| Year | Simulator | Method | Contribution |
|---|---|---|---|
| 2022 | Taxim | Example-based image rendering | Fast GelSight image generation |
| 2022 | TACTO | Pyrender + soft body approx | First open-source flexible VBTS sim |
| 2023 | Tacchi | Material Point Method | Particle-based elastomer; low GPU cost |
| 2024 | DiffTactile | Differentiable FEM | Physics-accurate, gradient-based learning |
| 2024 | TacIPC | Incremental Potential Contact | Numerical stability for friction |
| 2024 | FOTS | Optical | Fast sim-to-real for motor skills |
| 2024 | TacEx | Soft-body in Isaac Sim | Bridges robotics and tactile rendering |
| 2025 | Tacchi 2.0 | Dynamic MPM | Captures press, slip, rotate dynamics |
| 2025 | Taccel | High-performance GPU | Scaling vision-based tactile robotics |
| 2026 | Tac2Real | GPU visuotactile | Online RL with zero-shot real deployment |
The transfer pipeline
The bottleneck has shifted from the simulator (now adequate for many tasks) to the transfer pipeline. Even with a perfect simulator and a real sensor, transfer is mediated by domain randomization, image translation, latent alignment, real fine-tuning, and per-sensor calibration. Most papers use two or three together.
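As one hedged illustration of the first of those stages, domain randomization over simulated tactile images might look like the following. The perturbation ranges are invented for illustration; real pipelines tune them per sensor.

```python
import numpy as np

def randomize_tactile_image(img: np.ndarray,
                            rng: np.random.Generator) -> np.ndarray:
    """One randomization draw over a simulated tactile image of shape
    (H, W, 3) with values in [0, 1]. Each perturbation mimics a nuisance
    factor that differs between the simulator and a real sensor."""
    out = img.astype(np.float32).copy()
    out *= rng.uniform(0.8, 1.2)                  # global brightness
    out *= rng.uniform(0.9, 1.1, size=3)          # per-channel LED tint
    out += rng.normal(0.0, 0.02, size=out.shape)  # sensor noise
    shift = rng.integers(-2, 3, size=2)           # slight gel misalignment
    out = np.roll(out, shift, axis=(0, 1))
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
sim = rng.uniform(0.3, 0.7, size=(32, 32, 3)).astype(np.float32)
aug = randomize_tactile_image(sim, rng)
```

The other stages (image translation, latent alignment, fine-tuning, calibration) are learned rather than sampled, but they target the same nuisance factors this sketch perturbs by hand.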
Policy learning: a critical view
We aim to be more candid in this section than surveys typically are.
Xie & Correll's 2025 "Towards Forceful Robotic Foundation Models" survey reaches a pointed conclusion: while there are tasks like pouring, peg-in-hole insertion, and handling delicate objects where force matters, the performance of imitation learning models is generally not at a level of dynamics where force truly matters [37].
Methods landscape
What works and what is early
Sim-to-real RL for in-hand manipulation works for specific tasks. DLR's purely tactile in-hand manipulation work used off-policy deep RL trained for 600 CPU hours, then transferred to a humanoid hand, achieving 46+ full rotations of a cube in a single run [38].
Imitation learning + visuotactile fusion has dominated since 2022, because diffusion policies and ALOHA-style teleoperation made demonstration collection cheap. 3D-ViTac shows that dense flexible piezoresistive sensors fused with vision in a 3D representation enable bimanual manipulation that significantly outperforms vision-only policies on fragile and long-horizon tasks [17].
Force-aware diffusion is the most recent direction. FARM (2025) integrates high-dimensional tactile data to infer tactile-conditioned force signals, which define a matching force-based action space [41].
Contact-grounding policies are the structural response to Xie & Correll's critique. CGP (2026) argues that most prior work uses tactile signals as additional observations rather than modeling the contact state or how action outputs interact with low-level controller dynamics [43]. FBI (2025) takes a complementary direction: extract tactile information from temporal object motion flow via a dynamics-aware latent model [44].
Foundation models for touch
Three families of tactile foundation models coexist. They have different scaling laws and different bottlenecks.
Encoder family
T3 (Zhao et al., 2024) introduced the FoTa dataset (over 3 million data points from 13 sensors and 11 tasks) and showed zero-shot transferability across sensor-task pairings [45]. Sparsh (Higuera et al., CoRL 2024) curated ~661K samples and trained foundation models using MAE, DINO/DINOv2, and JEPA [46]. Sparsh-X (2025) extends this to a unified backbone fusing image, audio, motion, and pressure signals [47]. AnyTouch (Feng et al., 2025) introduced TacQuad, with 72,606 contact frames from four different visuo-tactile sensors [48].
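The MAE recipe that Sparsh builds on can be sketched generically: split a tactile image into patches and hide most of them, so the encoder must learn contact structure to reconstruct the rest. The code below is a generic numpy sketch of the masking step only, not the Sparsh implementation.

```python
import numpy as np

def mask_patches(img: np.ndarray, patch: int = 8,
                 mask_ratio: float = 0.75, rng=None):
    """MAE-style masking: cut an (H, W, C) image into non-overlapping
    patches, keep a random (1 - mask_ratio) subset for the encoder, and
    return the kept patches plus the boolean mask the decoder in-paints."""
    rng = rng or np.random.default_rng()
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    patches = (img[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch * patch * c))
    n_keep = int(round(len(patches) * (1.0 - mask_ratio)))
    keep = rng.permutation(len(patches))[:n_keep]
    mask = np.ones(len(patches), dtype=bool)
    mask[keep] = False  # False = visible to the encoder
    return patches[keep], mask

rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64, 3))
visible, mask = mask_patches(img, patch=8, mask_ratio=0.75, rng=rng)
```

The self-supervised signal is exactly the masked patches: no labels, which is why these encoder-family models can consume unlabeled contact data at scale.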
Binding family
UniTouch (CVPR 2024) connects touch to vision, language, and sound by aligning UniTouch embeddings to pretrained image embeddings, with learnable sensor-specific tokens [49]. The TVL dataset (ICML 2024 oral) provides 44K in-the-wild vision-touch pairs with English labels, and models trained on it report a ~29% improvement over prior models [50]. Octopi (NUS, 2024) reasons about tactile inputs: for example, identifying the softer of two avocados via touch and inferring that it is ripe via commonsense [51].
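The binding recipe behind this family is essentially CLIP-style contrastive alignment: freeze a pretrained image encoder and train the touch encoder so that paired embeddings agree. Below is a minimal numpy sketch of the symmetric InfoNCE objective; it is illustrative only, not any paper's implementation.

```python
import numpy as np

def info_nce(touch_emb: np.ndarray, image_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired (touch, image) embeddings:
    each touch embedding should be closest to its own image embedding
    among all images in the batch, and vice versa."""
    t = touch_emb / np.linalg.norm(touch_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature   # (B, B); positives on the diagonal
    labels = np.arange(len(logits))
    ls_t2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_v2t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss = -(ls_t2v[labels, labels].mean() + ls_v2t[labels, labels].mean())
    return float(0.5 * loss)

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 32))            # frozen pretrained embeddings
aligned = info_nce(image_emb, image_emb)        # touch perfectly aligned
random_t = info_nce(rng.normal(size=(8, 32)), image_emb)  # untrained touch
```

Training drives the touch encoder from the `random_t` regime toward the `aligned` regime, at which point the frozen image space's neighbors in language and sound come along for free.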
Policy family
VLA-Touch enhances generalist robot policies with tactile sensing without fine-tuning the base VLA [52]. Tactile-VLA goes further by unlocking the VLA's physical knowledge for tactile generalization: force modifiers such as "gently", learned on one task, transfer to a new task that was trained only on motion [53].
The data scarcity problem
The honest picture for foundation models with touch is that a "GPT moment" for tactile manipulation does not yet exist. The biggest single dataset is FoTa, at 3M points. This is roughly 1000× smaller than the equivalent vision corpora.
Xie & Correll observe that data collection of tactile robot data is an active research problem. The next breakthrough may come from data-collection methods (Touch in the Wild, OpenTouch, FlexiTac-equipped UMI) rather than from architectures.
Application landscape
A useful diagnostic when reading any individual tactile-manipulation paper is asking which application driver it is targeting, and whether the metric of success aligns with that application.
Two things stand out. First, research and industrial tactile manipulation are not yet the same field: industrial deployments mostly use force/torque sensors with simple control rather than the high-resolution VBTS dominating the academic literature. Second, the transitional cluster (sub-mm assembly, tool use, fragile-item handling) is where the two will collide first, probably 2027–2029.
Research labs and lineages
Tactile manipulation is genuinely global, but lineages cluster geographically in ways that reflect funding traditions and equipment availability.
Two patterns are worth flagging. First, the regional split is real: North American groups dominate VBTS and foundation models, European groups dominate the biomimetic and active-touch traditions, and Asian groups dominate materials science, simulation, and emerging dexterous-hand work. Second, the industry side is concentrated, with Amazon, Toyota Research, GelSight Inc., and the humanoid startups doing most of the deployment-driven work.
Key papers (2020β2026)
A non-exhaustive shortlist of papers that recur across recent surveys, organized by impact area.
| Paper | Year | Venue | Why it matters |
|---|---|---|---|
| DIGIT (Lambeta et al.) | 2020 | RA-L | Open-source, miniaturized GelSight; the workhorse VBTS |
| Touch and Go | 2022 | NeurIPS D&B | First in-the-wild paired vision + touch dataset |
| 3D-ViTac | 2024 | CoRL | Dense flexible visuotactile + 3D fusion + diffusion policy |
| DiffTactile | 2024 | ICLR | Differentiable physics-based tactile sim |
| AnySkin | 2024 | arXiv | Plug-and-play, cross-instance generalizable magnetic sensor |
| Digit 360 (Meta + GelSight) | 2024 | – | First multimodal commercial fingertip sensor |
| T3 | 2024 | NeurIPS / CoRL | Tactile foundation transformer + FoTa dataset |
| Sparsh / Sparsh-X | 2024–25 | CoRL | SSL touch representations across sensor types |
| UniTouch | 2024 | CVPR | Binds touch to vision, language, and sound |
| TVL | 2024 | ICML (oral) | Touch + vision + language alignment |
| Octopi / Octopi-1.5 | 2024–25 | RSS | Tactile-language model for property reasoning |
| Tactile-VLA / VLA-Touch | 2025 | arXiv | VLAs with native touch grounding |
| FARM | 2025 | arXiv | Force-aware diffusion policy with GelSight Mini |
| Text2Touch | 2025 | arXiv | LLM-designed rewards for tactile in-hand RL |
| TactileAloha | 2025 | RA-L | Tactile-augmented bimanual ACT |
| Touch in the Wild | 2025 | NeurIPS | Portable visuotactile gripper, large-scale in-the-wild data |
| VT-Refine | 2025 | CoRL | Bimanual assembly via visuotactile sim fine-tuning |
| OpenTouch | 2025 | arXiv | First in-the-wild full-hand egocentric tactile dataset |
| ManiFeel | 2025 | CoRL | First reproducible visuotactile policy benchmark |
| FBI (Flow Before Imitation) | 2025 | arXiv | Learns tactile from visual flow; works without sensors |
| Tacchi 2.0 / Taccel | 2025 | arXiv | Dynamic MPM / GPU-scale tactile sim |
| eFlesh | 2025 | arXiv | Customizable magnetic touch via cut-cell microstructures |
| FlexiTac | 2026 | – | Open-source scalable flexible tactile platform |
| CGP (Contact-Grounded Policy) | 2026 | arXiv | Reframes visuotactile policy learning as contact grounding |
| UniVTAC | 2026 | arXiv | Unified visuotactile sim platform + 8-task benchmark |
Open challenges
Each challenge below is corroborated across multiple independent reviews.
Sensor heterogeneity / standardization
Tactile sensors are not yet fully standardized; tactile images differ significantly under identical touch conditions due to camera type, lighting, color, and illumination. The current infrastructure for visuo-tactile manipulation remains underdeveloped, with scarcity of large-scale tactile data severely limiting tactile-centric representation models. Jin et al.'s materials review identifies six different sensing mechanisms in active use, with no convergence in sight.
Sim-to-real on tactile signals
Direct zero-shot sim-to-real transfer is challenging due to complex nonlinear deformation of soft sensors, motivating two-stage pipelines.
Data scarcity
Data collection of tactile robot data is an active research problem, and the community remains in an exploratory phase where developing new practical data collection methods is as important as refining existing ones.
Generalization across objects
The heart of tactile-based manipulation: how do you train a tactile-based policy that can manipulate unseen and diverse objects?
Whole-body / large-area touch is under-developed
A 2025 humanoid loco-manipulation survey explicitly highlights whole-body tactile sensing as a crucial modality, observing that most robotic hands prioritize dexterity at the expense of payload capacity.
Force is not the same as tactile
Xie & Correll's most useful conceptual point: force and touch are abstract quantities that can be inferred through a wide range of modalities; proprioception, force-torque sensors, and tactile sensors all reveal aspects of the same underlying physical interaction. Conflating these in surveys can mislead.
Where the field disagrees
Reading multiple reviews together surfaces three live disagreements that single-source surveys obscure.
Disagreement 1: Is more sensing always better?
Xie & Correll observe that force and touch are abstract quantities that can be inferred through a wide range of modalities and are often measured and controlled implicitly, implying that better policy learning, not more sensing, may be the bottleneck. Materials-science reviews advocate higher resolution, larger area, more modalities. The FBI paper is on the smarter-learning side; 3D-ViTac's 1024-taxel approach is on the denser-sensing side. Both work. The field has not settled this.
Disagreement 2: Is there a "winning" tactile sensor architecture?
Lepora's 2025 VBTS classification argues marker-based and intensity-based vision sensors are both viable for high-resolution tasks. AnySkin's authors argue magnetic, replaceable skins are the practical winner. 3D-ViTac's authors argue dense flexible piezoresistive arrays are the practical winner. There is no consensus, and the practical winner is likely task-dependent.
Disagreement 3: Is industrial deployment a sign of progress or a distraction?
Amazon Vulcan's force feedback sensors are simple six-axis force-torque, not high-resolution tactile. Reviews from the academic side note that high-resolution tactile sensing β the kind dominating research papers β is barely deployed industrially. The same word, "tactile," covers very different things in research vs. industry.
Forecast
Predictions are calibrated to the strength of evidence behind each claim.
High confidence (multiple independent reviews agree)
- Touch foundation models will grow rapidly through 2027 and likely plateau as the field hits the data-scaling wall around 2028.
- Sim-to-real for tactile will keep improving but at a slower pace than RGB sim-to-real did, because contact physics is fundamentally harder.
- Visuotactile benchmarks (ManiFeel, UniVTAC, ManiSkill-ViTac) will become standard reporting expectations within two years.
- Industrial deployment will use force/torque + simple tactile, not high-resolution VBTS, for at least 3–5 more years.
Medium confidence (one or two reviews supportive)
- Magnetic skins will become the default for deployment-oriented research because of cross-instance generalization.
- Dense flexible piezoresistive arrays will become the default for fingertip + full-finger coverage in research because they integrate cleanly with vision.
- Contact-grounded policies will be the dominant policy framework by 2027–2028.
Contested or speculative
- Whether vision-only "inferred tactile" (FBI-style) will partially replace physical sensors. Genuine open question.
- When whole-body / large-area e-skin will hit production. Could be 2027 with humanoid push, or 2030+ given skin durability and wiring problems.
- Whether a "GPT moment" for tactile manipulation is possible at all. Xie & Correll are skeptical; foundation-model optimists are more sanguine.
Concluding diagnostic
A useful test for any new tactile-manipulation paper, derived from synthesizing the reviews above:
The seven-question test
- Which level of the Li-Kroemer hierarchy does this operate at?
- Which sensing principle, and what trade-offs did it accept?
- Is the simulatorβtransferβreal pipeline complete or partial?
- Does the policy operate in a regime where force is the binding constraint, or only where vision would already suffice?
- Is this in the research-driven, transitional, or deployed cluster of applications?
- Which family of foundation models β encoder, binding, or policy β does it interface with, if any?
- What data did it train on, and how does that compare to the 1000× gap with vision?
A paper that addresses all seven explicitly is doing the field a service. A paper that addresses none of them is hard to position.
Methodological notes
The references below span roughly fourteen review papers from at least eight research traditions plus primary research papers from 2020 through early 2026. Where reviews disagree the disagreement is shown in the prose; where they converge the convergence is documented with multiple citations. Bibliometric numbers are estimated rather than measured. Two honest limitations: this survey works primarily from review snippets and abstracts rather than full-text PDFs, and the 2026 picture will not crystallize until autumn ICRA/IROS/RSS proceedings are available.
A LaTeX/IEEE version of this survey with full BibTeX references is available on request.
References
Numbered in citation order, IEEE style.
- [1] C. Li, R. Dang, X. Li, Z. Wu, J. Xu, H. Kasaei, R. Calandra, N. Lepora, S. Luo, H. Su, and R. Chen, "ManiSkill-ViTac 2025: Challenge on manipulation skill learning with vision and tactile sensing," arXiv preprint arXiv:2411.12503, 2024.
- [2] Q. Li, O. Kroemer, Z. Su, F. F. Veiga, M. Kaboli, and H. J. Ritter, "A review of tactile information: Perception and action through touch," IEEE Transactions on Robotics, vol. 36, no. 6, pp. 1619–1634, 2020.
- [3] S. Luo, J. Bimbo, R. Dahiya, and H. Liu, "Robotic tactile perception of object properties: A review," Mechatronics, vol. 48, pp. 54–67, 2017.
- [4] T. Tsuji, Y. Kato, G. Solak, H. Zhang, T. PetriΔ, F. Nori, and A. Ajoudani, "A survey on imitation learning for contact-rich tasks in robotics," The International Journal of Robotics Research, 2026.
- [5] N. F. Lepora, "Tactile robotics: Past and future," arXiv preprint arXiv:2512.01106, 2025.
- [6] Y. Wang et al., "Recent advances and challenges of tactile sensing for robotics: From fundamentals to applications," Materials Today Physics, 2025.
- [7] A. Welte and R. Rayyes, "Interactive imitation learning for dexterous robotic manipulation: Challenges and perspectives," Frontiers in Robotics and AI, vol. 12, 2025.
- [8] K. Shimonomura, "Tactile image sensors employing camera: A review," Sensors, vol. 19, no. 18, p. 3933, 2019.
- [9] U. H. Shah, R. Muthusamy, D. Gan, Y. Zweiri, and L. Seneviratne, "On the design and development of vision-based tactile sensors," Journal of Intelligent and Robotic Systems, 2021.
- [10] N. F. Lepora, "Classification of vision-based tactile sensors: A review," arXiv preprint arXiv:2509.02478, 2025.
- [11] J. Jin, S. Wang et al., "Progress on flexible tactile sensors in robotic applications on object properties recognition, manipulation and human-machine interactions," Soft Science, 2023.
- [12] I. S. Bayer, "MEMS-based tactile sensors: Materials, processes and applications in robotics," Micromachines, vol. 13, no. 12, p. 2051, 2022.
- [13] L. Yu and D. Liu, "Recent progress in tactile sensing and machine learning for texture perception in humanoid robotics," Interdisciplinary Materials, vol. 4, pp. 235–248, 2025.
- [14] A. C. Abad and A. Ranasinghe, "Visuotactile sensors with emphasis on GelSight sensor: A review," IEEE Sensors Journal, vol. 20, no. 14, pp. 7628–7638, 2020.
- [15] N. F. Lepora, "Soft biomimetic optical tactile sensing with the TacTip: A review," IEEE Sensors Journal, vol. 21, no. 19, pp. 21131–21143, 2021.
- [16] R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto, "AnySkin: Plug-and-play skin sensing for robotic touch," arXiv preprint arXiv:2409.08276, 2024.
- [17] B. Huang, Y. Wang, X. Yang, Y. Luo, and Y. Li, "3D-ViTac: Learning fine-grained manipulation with visuo-tactile sensing," in Conference on Robot Learning (CoRL), 2024.
- [18] B. Huang and Y. Li, "FlexiTac: An open-source, scalable tactile solution for robotic systems," 2026, https://flexitac.github.io/.
- [19] Y. Liu et al., "A neuromorphic robotic electronic skin with active pain and injury perception," Proceedings of the National Academy of Sciences, 2025.
- [20] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3838–3845, 2020.
- [21] Meta AI and GelSight, "Digit 360: An artificial fingertip-shaped tactile sensor," 2024, https://ai.meta.com/blog/fair-robotics-open-source/.
- [22] Amazon Robotics, "Meet Amazon's vulcan: A warehouse robot with a sense of touch," 2025, https://www.aboutamazon.com/news/operations/amazon-vulcan-robot-pick-stow-touch.
- [23] W. Yuan, S. Dong, and E. H. Adelson, "GelSight: High-resolution robot tactile sensors for estimating geometry and force," Sensors, vol. 17, no. 12, p. 2762, 2017.
- [24] Anonymous, "Collaborative representation learning for alignment of tactile, language, and vision modalities," arXiv preprint arXiv:2511.11512, 2025.
- [25] Various authors, "UniVTAC: A unified simulation platform for visuo-tactile manipulation data generation, learning, and benchmarking," arXiv preprint arXiv:2602.10093, 2026.
- [26] Various authors, "Tactile robotics: An outlook," arXiv preprint arXiv:2508.11261, 2025.
- [27] Various authors, "Tac2Real: Reliable and GPU visuotactile simulation for online reinforcement learning and zero-shot real-world deployment," arXiv preprint arXiv:2603.28475, 2026.
- [28] Z. Si and W. Yuan, "Taxim: An example-based simulation model for GelSight tactile sensors," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2361–2368, 2022.
- [29] S. Wang, M. Lambeta, P.-W. Chou, and R. Calandra, "TACTO: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3930–3937, 2022.
- [30] Z. Chen, S. Zhang, S. Luo, F. Sun, and B. Fang, "Tacchi: A pluggable and low computational cost elastomer deformation simulator for optical tactile sensors," IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1239–1246, 2023.
- [31] Z. Si, G. Zhang, Q. Ben, B. Romero, Z. Xian, C. Liu, and C. Gan, "DiffTactile: A physics-based differentiable tactile simulator for contact-rich robotic manipulation," in International Conference on Learning Representations (ICLR), 2024.
- [32] W. Du, W. Xu, J. Ren, Z. Yu, and C. Lu, "TacIPC: Intersection- and inversion-free FEM-based elastomer simulation for optical tactile sensors," arXiv preprint, 2024.
- [33] Y. Zhao, K. Qian, B. Duan, and S. Luo, "FOTS: A fast optical tactile simulator for sim2real learning of tactile-motor robot manipulation skills," IEEE Robotics and Automation Letters, 2024.
- [34] Anonymous, "TacEx: GelSight tactile simulation in Isaac Sim β combining soft-body and visuotactile simulators," arXiv preprint arXiv:2411.04776, 2024.
- [35] Y. Sun et al., "Tacchi 2.0: A low computational cost and comprehensive dynamic contact simulator for vision-based tactile sensors," arXiv preprint arXiv:2503.09100, 2025.
- [36] Anonymous, "Taccel: Scaling up vision-based tactile robotics via high-performance GPU simulation," arXiv preprint arXiv:2504.12908, 2025.
- [37] W. Xie and N. Correll, "Towards forceful robotic foundation models: A literature survey," arXiv preprint arXiv:2504.11827, 2025.
- [38] J. Pitz, L. Röstel, L. Sievers, and B. Bäuml, "Learning purely tactile in-hand manipulation with a torque-controlled hand," arXiv preprint arXiv:2204.03698, 2022.
- [39] W. Hu, B. Huang, W. W. Lee, S. Yang, Y. Zheng, and Z. Li, "Dexterous in-hand manipulation of slender cylindrical objects through deep reinforcement learning with tactile sensing," Robotics and Autonomous Systems, vol. 186, p. 104904, 2025.
- [40] N. Gu, K. Kosuge, and M. Hayashibe, "TactileAloha: Learning bimanual manipulation with tactile sensing," IEEE Robotics and Automation Letters, vol. 10, no. 8, pp. 8348–8355, 2025.
- [41] E. Helmut, N. Funk, T. Schneider, C. de Farias, and J. Peters, "Tactile-conditioned diffusion policy for force-aware robotic manipulation," arXiv preprint arXiv:2510.13324, 2025.
- [42] Anonymous, "Text2Touch: Tactile in-hand manipulation with LLM-designed reward functions," arXiv preprint arXiv:2509.07445, 2025.
- [43] Various authors, "Contact-grounded policy: Dexterous visuotactile policy with generative contact grounding," arXiv preprint arXiv:2603.05687, 2026.
- [44] Various authors, "FBI: Learning dexterous in-hand manipulation with dynamic visuotactile shortcut policy," arXiv preprint arXiv:2508.14441, 2025.
- [45] J. Zhao, Y. Ma, L. Wang, and E. H. Adelson, "Transferable tactile transformers for representation learning across diverse sensors and tasks," arXiv preprint arXiv:2406.13640, 2024.
- [46] C. Higuera, A. Sharma, C. K. Bodduluri, T. Fan, P. Lancaster, M. Kalakrishnan, M. Kaess, B. Boots, M. Lambeta, T. Wu, and M. Mukadam, "Sparsh: Self-supervised touch representations for vision-based tactile sensing," in Conference on Robot Learning (CoRL), 2024.
- [47] Anonymous, "Sparsh-X: Tactile beyond pixels: Multisensory touch representations for robot manipulation," arXiv preprint arXiv:2506.14754, 2025.
- [48] A. Feng et al., "AnyTouch: Learning unified static-dynamic representation across multiple visuo-tactile sensors," arXiv preprint arXiv:2502.12191, 2025.
- [49] F. Yang, C. Feng, Z. Chen, H. Park, D. Wang, Y. Dou, Z. Zeng, X. Chen, R. Gangopadhyay, A. Owens, and A. Wong, "Binding touch to everything: Learning unified multimodal tactile representations," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26340–26353.
- [50] L. Fu, G. Datta, H. Huang, W. C.-H. Panitch, J. Drake, J. Ortiz, M. Mukadam, M. Lambeta, R. Calandra, and K. Goldberg, "A touch, vision, and language dataset for multimodal alignment," in International Conference on Machine Learning (ICML), 2024.
- [51] S. Yu, K. Lin, A. Xiao, J. Duan, and H. Soh, "Octopi: Object property reasoning with large tactile-language models," in Robotics: Science and Systems (RSS), 2024.
- [52] J. Bi, K. Y. Ma, C. Hao, M. Z. Shou, and H. Soh, "VLA-Touch: Enhancing vision-language-action models with dual-level tactile feedback," arXiv preprint arXiv:2507.17294, 2025.
- [53] Anonymous, "Tactile-VLA: Unlocking vision-language-action model's physical knowledge for tactile generalization," arXiv preprint arXiv:2507.09160, 2025.
- [54] Various authors, "OmniVTLA: Vision-tactile-language-action model with semantic-aligned tactile sensing," arXiv preprint arXiv:2508.08706, 2025.
- [55] Y. R. Song, J. Li, R. Fu et al., "OPENTOUCH: Bringing full-hand touch to real-world interaction," arXiv preprint arXiv:2512.16842, 2025.
- [56] X. Zhu, B. Huang, and Y. Li, "Touch in the wild: Learning fine-grained manipulation with a portable visuo-tactile gripper," arXiv preprint arXiv:2507.15062, 2025.
- [57] F. Yang, C. Ma, J. Zhang, J. Zhu, W. Yuan, and A. Owens, "Touch and go: Learning from human-collected vision and touch," in NeurIPS Datasets and Benchmarks, 2022.
- [58] Anonymous, "VT-Refine: Learning bimanual assembly with visuo-tactile feedback via simulation fine-tuning," arXiv preprint arXiv:2510.14930, 2025.
- [59] Q. K. Luu, P. Zhou, Z. Xu, Z. Zhang, Q. Qiu, and Y. She, "ManiFeel: Benchmarking and understanding visuotactile manipulation policy learning," arXiv preprint arXiv:2505.18472, 2025.
- [60] V. Pattabiraman, Z. Huang, D. Panozzo, D. Zorin, L. Pinto, and R. Bhirangi, "eFlesh: Highly customizable magnetic touch sensing using cut-cell microstructures," arXiv preprint arXiv:2506.09994, 2025.
- [61] Anonymous, "Crossing the reality gap in tactile-based learning," arXiv preprint arXiv:2305.09870, 2023.
- [62] Various authors, "Semantic-contact fields for category-level generalizable tactile tool manipulation," arXiv preprint arXiv:2602.13833, 2026.
- [63] Various authors, "Sim2Real manipulation on unknown objects with tactile-based reinforcement learning," arXiv preprint arXiv:2403.12170, 2024.
- [64] Various authors, "Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning," arXiv preprint arXiv:2501.02116, 2025.