Tactile Sensing in Robot Manipulation (2020–2026)

A multi-source survey across hardware, simulation, perception, policy learning, and foundation models

Reading time · ~25 min Sources · 64 references Last updated · May 2026

This survey reviews five years of progress in tactile sensing for robotic manipulation, organized across hardware, simulation, perception, policy learning, and foundation models. We draw on roughly fourteen independent review papers from eight research traditions, using cross-source corroboration where reviewers converge and flagging disagreement where they don't. We supplement these with primary research papers from 2020 through early 2026, including major hardware milestones (DIGIT, Digit 360, AnySkin, FlexiTac, 3D-ViTac), simulators (Tacchi family, DiffTactile, TacIPC, Taccel), tactile foundation models (T3, Sparsh family, UniTouch, TVL, Octopi, AnyTouch), and emerging touch-augmented vision-language-action models.

A consistent finding across reviews is that tactile sensing is clearly important for contact-rich manipulation, yet remains rarely deployed industrially. We adopt this tension as the survey's organizing lens.

Introduction

For decades, vision-only manipulation dominated, but the field has hit a wall on contact-rich tasks: insertion, in-hand reorientation, deformable object handling, occluded grasping, and force-modulated tasks. Vision cannot see what the gripper itself occludes, and it cannot directly measure forces. As the ManiSkill-ViTac 2025 challenge organizers framed it, performance in contact-rich scenarios remains far from satisfactory because most works rely on vision alone and lack the ability to detect contact status for decision making [1].

The framings used by the most-cited reviews converge on the same central question. Li, Kroemer, Su, Veiga, Kaboli, and Ritter, in their 2020 IEEE T-RO review, frame tactile sensing as a hierarchy: raw signals → contact-level information → object-level information → action-level information [2]. Luo, Bimbo, Dahiya, and Liu organize the field around three object properties (shape, surface material, and object pose) and emphasize the role of tactile sensing in combination with other modalities [3]. Both reviews, written years apart and in different research traditions, agree on the practical motivation: tactile is the modality that closes the gap between knowing where an object is and knowing what is happening at the contact.

A persistent tension runs through every honest review. The 2026 IJRR survey on imitation learning for contact-rich tasks puts it bluntly: despite efforts, tactile sensors remain a technology mostly confined to research applications with rare industrial use-cases [4].

Why now

Two structural drivers explain the recent acceleration. The industrial humanoid wave (Tesla Optimus, Figure, 1X, Unitree, Apptronik, Sanctuary) needs touch because the factories and homes these human-form robots target were built around tactile interaction. Independently, the LLM/VLM/VLA boom created a natural appetite for adding a tactile modality to foundation models, the same way audio and depth were added in the 2022–2024 multimodal wave.

Figure 1. Estimated annual and cumulative publications mentioning "tactile" and "robot manipulation." Numbers are indicative; the shape is consistent across bibliometric sections of multiple reviews.

The numbers are indicative (keyword choices vary across bibliometric reviews), but the shape is consistent. Lepora notes that 15 reviews on tactile robotics were published in 2024 alone [5]. A separate analysis of dexterous-manipulation publications through January 2025 identified eight distinct research categories and noted that haptic and tactile interfaces are under-represented relative to their importance [7].

The information hierarchy

The clearest conceptual scaffold for this survey comes from the Li-Kroemer hierarchy, which we adopt as a recurring lens.

The four levels, bottom to top:

  • Raw signal: pressure array, gel image, magnetometer, voltage
  • Contact level: contact location, force vector, slip, normal/shear
  • Object level: shape, material, pose, in-hand state, friction
  • Action level: grasp planning, exploration, regrasping, slip recovery

Example systems along the pipeline from sensors to policies: DIGIT and AnySkin (sensors), slip nets (perception), Tac2Pose (models), FARM and CGP (policies). Each level builds on the one below, but not every system explicitly models all four. A common critique: end-to-end policies skip the middle levels and lose interpretability.
Figure 2. Tactile information hierarchy adapted from Li, Kroemer et al., 2020.

The hierarchy is useful diagnostically: when a paper claims "tactile foundation model," ask which level it operates at. T3 and Sparsh sit between raw and contact. Octopi and UniTouch reach the object level. VLA-Touch and Tactile-VLA bridge all four.
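
As a concrete version of that diagnostic, the toy snippet below tags systems named in this survey with the hierarchy levels they operate at (Python; the level assignments simply restate the sentences above, and the dictionary is purely illustrative).

```python
# Toy encoding of the Li-Kroemer hierarchy as a diagnostic tag.
from enum import IntEnum

class TactileLevel(IntEnum):
    RAW = 0        # pressure array, gel image, magnetometer readings
    CONTACT = 1    # contact location, force vector, slip, normal/shear
    OBJECT = 2     # shape, material, pose, in-hand state, friction
    ACTION = 3     # grasp planning, exploration, regrasping, slip recovery

# Level assignments restated from the text above (not an official mapping).
SYSTEM_LEVELS = {
    "T3":          (TactileLevel.RAW, TactileLevel.CONTACT),
    "Sparsh":      (TactileLevel.RAW, TactileLevel.CONTACT),
    "Octopi":      (TactileLevel.OBJECT,),
    "UniTouch":    (TactileLevel.OBJECT,),
    "Tactile-VLA": tuple(TactileLevel),   # bridges all four levels
}
```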

Taxonomy of the field

Five research pillars emerge across reviews. The pillars feed each other: a useful new sensor needs a simulator, a perception model, and a learnable policy to matter, plus an application to ground it.

The five pillars, with example methods under each:

  1. Hardware (sensors and skins): vision-based (GelSight), magnetic (AnySkin), flexible (FlexiTac), capacitive e-skin, multimodal (Digit 360)
  2. Simulation (sim-to-real bridge): FEM (DiffTactile), MPM (Tacchi), GPU sim (Taccel), IPC (TacIPC), image-based (Taxim)
  3. Perception (contact, slip, pose): slip and incipient slip, force/shear fields, object pose (Tac2Pose), material and texture, shape from touch
  4. Policy learning (RL, IL, diffusion): sim-to-real RL, imitation/BC, diffusion policies, force-aware control, contact grounding
  5. Foundation models (VLA, TLM): T3, Sparsh, UniTouch/TVL, Octopi (TLM), VLA-Touch, Tactile-VLA

Shared application drivers: humanoids, dexterous in-hand, warehouse, surgery, prosthetics, cable/cloth/food handling.
Figure 3. Five-pillar taxonomy of tactile manipulation research, with example methods under each pillar and the shared application drivers.

Hardware: the sensor zoo

There is genuine disagreement across reviews about how to slice the sensor space. Three independent reviews give three different vision-based tactile sensor (VBTS) taxonomies. Shimonomura's 2019 review classifies camera-based tactile sensors into three types based on contact-module hardware [8]. Shah et al.'s 2021 JINT review proposes three design paradigms: waveguide-type, marker displacement-based, and reflective membrane designs [9]. Lepora's 2025 classification argues that both prior taxonomies overlook designs that combine these principles [10]. Materials-science reviews give a different slice again: Jin et al.'s 2023 review of flexible tactile sensors organizes the area around six sensing mechanisms [11].

The point is not to pick a winner among taxonomies, but to acknowledge that the underlying design space is still actively being explored.

Sensing principles compared

Figure 4. Qualitative comparison of six transduction principles for tactile sensing across six dimensions (resolution, area coverage, durability, force range, frequency response, low cost). No principle dominates everywhere.

VBTS wins on resolution; magnetic wins on durability; piezoresistive wins on cost and area; piezoelectric and triboelectric win on frequency response. Different applications genuinely demand different transduction principles, which is part of why the field has not converged on a single sensor.

What's actually winning

| Family | Sources | Strength | Standing problem |
| --- | --- | --- | --- |
| Vision-based (GelSight, DIGIT, TacTip, GelSlim, Digit 360) | [8]–[10], [14], [15] | High spatial resolution; ML-friendly images | Bulky; gel wear; latency |
| Magnetic (ReSkin, AnySkin, eFlesh, uSkin) | [4], [16] | Robust, replaceable, cross-instance generalization | Lower resolution than VBTS |
| Flexible piezoresistive (STAG glove, FlexiTac, 3D-ViTac) | [11], [13], [17], [18] | Large area, low cost, full-finger coverage | Lower force resolution |
| Capacitive / piezoelectric e-skins | [6], [11], [19] | Whole-body coverage; low power | Wiring complexity; resolution trade-off |
| Triboelectric / self-powered | [6], [11] | Self-powered; vibration sensing | Calibration drift; no DC response |
| MEMS-based force/pressure | [12] | Industrial maturity; small footprint | Less expressive than VBTS |

The big practical hardware story since 2020 has five chapters that the reviews mostly agree on:

  1. DIGIT (Lambeta et al., 2020) miniaturized vision-based sensing for multi-fingered hands, with an open-sourced design that significantly lowered the barrier to entry [20].
  2. AnySkin (Bhirangi et al., 2024) showed that decoupling the sensing electronics from the elastomer interface produces cross-instance policy generalization [16].
  3. 3D-ViTac (Huang et al., CoRL 2024) showed that dense flexible piezoresistive arrays at 3 mm² resolution outperform vision-only policies on long-horizon and fragile-object tasks [17].
  4. Digit 360 (Meta + GelSight, 2024) introduced a fingertip with over 18 sensing features [21].
  5. Industrial deployment reached scale: Amazon Vulcan manipulates 75% of the 1M items at its Spokane warehouse using force feedback and AI [22].

How a vision-based tactile sensor works

[Figure: an object presses into a reflective gel membrane, illuminated by RGB LEDs and imaged by an internal camera behind the contact zone.] Photometric stereo recovers surface normals from the three-color illumination, letting a single camera image reconstruct sub-millimeter 3D geometry of the contact. Design variants: marker-based (TacTip, GelSlim 3), intensity-based (GelSight, DIGIT), and multimodal fingertips (Digit 360).
Figure 5. Cross-section of a vision-based tactile sensor. The contact-mechanics problem is recast as a vision problem.

The trick that makes VBTS work is recasting a contact-mechanics problem as a vision problem. Once it's an image, the entire deep-learning toolkit becomes available. That observation unlocked the 2017–2024 VBTS wave.
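
To make the recasting concrete, here is a minimal photometric-stereo sketch under the classic assumptions (a Lambertian gel surface and three known, calibrated LED directions). Real sensors such as GelSight typically use per-pixel calibration lookup tables rather than this idealized linear model, and the light directions below are invented for illustration.

```python
# Minimal photometric-stereo sketch for a VBTS image.
import numpy as np

def normals_from_rgb(img, light_dirs):
    """img: (H, W, 3) float intensities; light_dirs: (3, 3), one unit vector per LED."""
    h, w, _ = img.shape
    intensities = img.reshape(-1, 3)                  # one RGB triple per pixel
    # Lambertian model: I_c = albedo * (L_c . n), so solve L @ (albedo * n) = I.
    g = np.linalg.solve(light_dirs, intensities.T).T  # (H*W, 3) unnormalized normals
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    n = g / np.clip(norms, 1e-8, None)                # unit surface normals
    return n.reshape(h, w, 3)

# Hypothetical calibration: red, green, blue LEDs from three oblique directions.
L = np.array([[ 0.50,  0.000, 0.866],
              [-0.25,  0.433, 0.866],
              [-0.25, -0.433, 0.866]])
frame = np.random.rand(240, 320, 3)                   # stand-in for a real gel image
normals = normals_from_rgb(frame, L)                  # (240, 320, 3) normal map
```

Integrating the resulting gradient field (e.g., with a Poisson solver) then yields the depth map; the point of the sketch is only that the recovery step is ordinary per-pixel linear algebra on an image.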

The standardization problem

Multiple reviews independently flag the same hardware problem: standardization. The TLV-CoRe authors observe that tactile sensors are not yet fully standardized; tactile images differ significantly under identical touch conditions due to camera type, lighting, color, and illumination [24]. The UniVTAC authors (2026) note that the current infrastructure for visuo-tactile manipulation remains underdeveloped, with the scarcity of large-scale tactile data severely limiting tactile-centric representation models [25].

Simulation and sim-to-real

Sim-to-real for vision is hard but well-trodden. For tactile it is structurally different because what transfers is a physical contact response, not an appearance. The 2025 Tactile Robotics Outlook reviews how data-driven simulation has matured, noting that approaches like image-to-image translation via GANs work for surface tracing and bimanual manipulation but at the cost of a substantial sim-to-real gap [26].

Simulator landscape

| Year | Simulator | Method | Contribution |
| --- | --- | --- | --- |
| 2022 | Taxim | Example-based image rendering | Fast GelSight image generation |
| 2022 | TACTO | Pyrender + soft-body approximation | First open-source flexible VBTS sim |
| 2023 | Tacchi | Material Point Method | Particle-based elastomer; low GPU cost |
| 2024 | DiffTactile | Differentiable FEM | Physics-accurate, gradient-based learning |
| 2024 | TacIPC | Incremental Potential Contact | Numerical stability for friction |
| 2024 | FOTS | Optical simulation | Fast sim-to-real for motor skills |
| 2024 | TacEx | Soft body in Isaac Sim | Bridges robotics and tactile rendering |
| 2025 | Tacchi 2.0 | Dynamic MPM | Captures press, slip, rotate dynamics |
| 2025 | Taccel | High-performance GPU sim | Scaling vision-based tactile robotics |
| 2026 | Tac2Real | GPU visuotactile sim | Online RL with zero-shot real deployment |

The transfer pipeline

[Figure: three-column pipeline.] Simulation side: soft-body contact (FEM, MPM, IPC), tactile rendering (Taxim, FOTS, TacEx), differentiable physics (DiffTactile), GPU-scale simulation (Taccel, Tac2Real), and policy training (RL or BC in sim). Transfer techniques: domain randomization (friction, stiffness, noise), image translation (sim-to-real GANs), latent alignment (shared encoders), real fine-tuning (small in-domain datasets), and per-sensor calibration. Real-world side: the physical sensor (DIGIT, AnySkin, etc.), robot platform (Allegro, Leap, Franka), closed-loop control (compliance, force), real evaluation (benchmarks, success rates), and residual/online RL for adaptation in deployment. No simulator currently dominates: each trades fidelity, speed, differentiability, and ease of integration differently, so the choice depends on the downstream policy.
Figure 6. The sim-to-real pipeline for tactile manipulation. The middle column (transfer techniques) is the most important practically.

The bottleneck has shifted from the simulator (now adequate for many tasks) to the transfer pipeline. Even with a perfect simulator and a real sensor, transfer is mediated by domain randomization, image translation, latent alignment, real fine-tuning, and per-sensor calibration. Most papers use two or three together.
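
A minimal sketch of the first of those techniques, per-episode domain randomization, is below; the parameter names and ranges are illustrative rather than taken from any specific simulator.

```python
# Hedged sketch of per-episode domain randomization for tactile sim-to-real.
import random

# Illustrative physics/rendering parameters and plausible-looking ranges.
RANDOMIZATION_RANGES = {
    "friction_coeff":    (0.3, 1.2),    # gel-object Coulomb friction
    "gel_stiffness_kpa": (50.0, 150.0), # elastomer stiffness
    "led_intensity":     (0.8, 1.2),    # per-channel illumination scale
    "pixel_noise_std":   (0.0, 0.02),   # additive image noise
    "latency_ms":        (5.0, 30.0),   # sensor-to-controller delay
}

def sample_episode_params(rng=random):
    """Draw one set of simulation parameters at the start of each training episode."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

params = sample_episode_params()  # pass into the simulator's reset/config step
```

The design intuition is that a policy trained under a distribution of contact physics treats the real sensor as just one more sample from that distribution.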

Policy learning: a critical view

I want to be more candid in this section than surveys typically are.

Xie & Correll's 2025 "Towards Forceful Robotic Foundation Models" survey reaches a pointed conclusion: while there are tasks like pouring, peg-in-hole insertion, and handling delicate objects where force matters, the performance of imitation learning models is generally not at a level of dynamics where force truly matters [37].

Sobering take: most "tactile-augmented" imitation learning has not yet reached the regime where the tactile signal is what is actually limiting performance.

Methods landscape

[Figure: methods plotted by deployment maturity (early to mature) and force-regime relevance (low to high).] Plotted methods: sim-to-real RL for in-hand manipulation (DLR, OpenAI, UCL stick), visuotactile imitation (3D-ViTac, TactileAloha), force-aware diffusion (FARM, VT-Refine), contact grounding (CGP, FBI, SC-Fields), tactile VLAs (OmniVTLA, Tactile-VLA), tactile foundation models (T3, Sparsh, AnyTouch), and LLM-designed rewards (Text2Touch). Annotations: most "tactile-augmented" IL has not been shown in regimes where force is the binding constraint (Xie & Correll 2025); force-aware diffusion and contact grounding are the responses to that critique.
Figure 7. Tactile policy-learning methods plotted by deployment maturity and force-regime relevance.

What works and what is early

Sim-to-real RL for in-hand manipulation works for specific tasks. DLR's purely tactile in-hand manipulation work used off-policy deep RL trained in 600 CPU hours, then transferred to a humanoid hand achieving 46+ full rotations of a cube in a single run [38].

Imitation learning + visuotactile fusion has dominated since 2022 because diffusion policies and ALOHA-style teleoperation made demonstration collection cheap. 3D-ViTac shows that dense flexible piezoresistive sensors fused with vision in a 3D representation enable bimanual manipulation that significantly outperforms vision-only policies on fragile and long-horizon tasks [17].

Force-aware diffusion is the most recent direction. FARM (2025) infers force signals from high-dimensional tactile data and uses them to define a matching force-based action space [41].
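
A hedged sketch of the general pattern (a diffusion-policy denoiser conditioned on a tactile embedding) follows; the architecture is illustrative and far smaller than FARM's actual networks, which are described in [41].

```python
# Illustrative tactile-conditioned denoiser for a diffusion policy (PyTorch).
import torch
import torch.nn as nn

class TactileConditionedDenoiser(nn.Module):
    def __init__(self, act_dim=7, obs_dim=64, tact_dim=32, hidden=256):
        super().__init__()
        # Stand-in tactile encoder: flatten a gel image and project it.
        self.tactile_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(tact_dim), nn.ReLU())
        self.net = nn.Sequential(
            nn.Linear(act_dim + obs_dim + tact_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))  # predicts the noise component to remove

    def forward(self, noisy_action, obs, tactile_img, t):
        z_tac = self.tactile_enc(tactile_img)            # (B, tact_dim) tactile conditioning
        x = torch.cat([noisy_action, obs, z_tac, t], dim=-1)
        return self.net(x)

# One denoising step on dummy data (shapes only; no trained weights).
model = TactileConditionedDenoiser()
a_t = torch.randn(8, 7)             # noisy action sample
obs = torch.randn(8, 64)            # proprioception/vision features
tac = torch.randn(8, 3, 32, 32)     # gel-image batch
t = torch.rand(8, 1)                # diffusion timestep, normalized to [0, 1]
eps_hat = model(a_t, obs, tac, t)   # predicted noise, (8, 7)
```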

Contact-grounding policies are the structural response to Xie & Correll's critique. CGP (2026) argues that most prior work uses tactile signals as additional observations rather than modeling the contact state or how action outputs interact with low-level controller dynamics [43]. FBI (2025) takes a complementary direction: extract tactile information from temporal object motion flow via a dynamics-aware latent model [44].

Foundation models for touch

Three families of tactile foundation models coexist. They have different scaling laws and different bottlenecks.

[Figure: three families side by side.] Encoder family (T3, Sparsh, AnyTouch): multi-sensor tactile input, sensor-specific tokens, a shared transformer, and a tactile representation feeding task heads; pretrained via SSL (MAE, DINO, JEPA) on millions of frames across diverse sensors. Binding family (UniTouch, TVL, Octopi, TLV-CoRe): touch, vision, and language aligned in a shared latent space with contrastive losses, enabling open-vocabulary tasks and property reasoning; borrows CLIP-style recipes. Policy family (VLA-Touch, Tactile-VLA, OmniVTLA): vision and language input into a VLA backbone plus tactile encoder, producing force-aware robot actions; touch is wired into the action-generation step, not just perception. All three families coexist; the policy family is the newest and most contested.
Figure 8. Three families of tactile foundation models. Encoders trained by family 1 are often plugged into family 2 or family 3 systems.

Encoder family

T3 (Zhao et al., 2024) introduced the FoTa dataset (over 3 million data points from 13 sensors and 11 tasks) and showed zero-shot transferability across sensor-task pairings [45]. Sparsh (Higuera et al., CoRL 2024) curated ~661K samples and trained foundation models using MAE, DINO/DINOv2, and JEPA [46]. Sparsh-X (2025) extends this to a unified backbone fusing image, audio, motion, and pressure signals [47]. AnyTouch (Feng et al., 2025) introduced TacQuad, with 72,606 contact frames from four different visuo-tactile sensors [48].
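
The shared pattern across these encoders (sensor-specific tokens feeding a shared transformer trunk) can be sketched in a few lines; the dimensions and patching scheme below are illustrative, not taken from T3, Sparsh, or AnyTouch.

```python
# Illustrative multi-sensor tactile encoder: per-sensor token, shared trunk.
import torch
import torch.nn as nn

class MultiSensorTactileEncoder(nn.Module):
    def __init__(self, n_sensors=13, dim=256):
        super().__init__()
        self.patch_embed = nn.LazyLinear(dim)             # projects raw patches to tokens
        self.sensor_token = nn.Embedding(n_sensors, dim)  # one learned token per sensor type
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)  # shared across all sensors

    def forward(self, patches, sensor_id):
        x = self.patch_embed(patches)                    # (B, n_patches, dim)
        tok = self.sensor_token(sensor_id).unsqueeze(1)  # (B, 1, dim)
        x = torch.cat([tok, x], dim=1)                   # prepend the sensor token
        return self.trunk(x)[:, 0]                       # its output is the representation

enc = MultiSensorTactileEncoder()
patches = torch.randn(4, 196, 768)       # e.g., 14x14 patches from a tactile image
sensor_id = torch.tensor([0, 0, 5, 12])  # which of the 13 sensor types produced each image
rep = enc(patches, sensor_id)            # (4, 256) sensor-agnostic tactile representations
```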

Binding family

UniTouch (CVPR 2024) connects touch to vision, language, and sound by aligning UniTouch embeddings to pretrained image embeddings, with learnable sensor-specific tokens [49]. The TVL dataset (ICML 2024 oral) provides 44K in-the-wild vision-touch pairs with English labels, yielding ~29% improvement over prior models [50]. Octopi (NUS, 2024) reasons about tactile inputs, for example identifying the softer of two avocados via touch and inferring that it is ripe via commonsense [51].
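
The binding recipe is essentially CLIP's symmetric contrastive loss applied to paired touch and image embeddings; the sketch below shows that loss in isolation (UniTouch and TVL differ in detail, e.g., sensor tokens and language labels, per [49], [50]).

```python
# CLIP-style symmetric InfoNCE loss for touch-to-vision alignment (PyTorch).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(touch_emb, image_emb, temperature=0.07):
    """touch_emb, image_emb: (B, D) embeddings of paired touch/image samples."""
    t = F.normalize(touch_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)   # typically from a frozen pretrained image tower
    logits = t @ v.T / temperature       # (B, B) similarity matrix
    targets = torch.arange(t.shape[0], device=t.device)  # diagonal = matched pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = contrastive_alignment_loss(torch.randn(16, 512), torch.randn(16, 512))
```

Because the image tower is usually frozen, touch inherits whatever structure the vision-language latent space already has, which is what enables the open-vocabulary behavior described above.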

Policy family

VLA-Touch enhances generalist robot policies with tactile sensing without fine-tuning the base VLA [52]. Tactile-VLA goes further by unlocking the VLA's physical knowledge for tactile generalization: it generalizes language-based force control by applying force modifiers like "gently," learned from one task, to a new task that was trained only on motion [53].

The data scarcity problem

The honest picture for foundation models with touch is that a "GPT moment" for tactile manipulation does not yet exist. The biggest single dataset is FoTa at 3M points. This is roughly 1000× smaller than the equivalent vision corpora.

Figure 9. Logarithmic comparison of dataset sizes: tactile datasets are roughly 1000–10000× smaller than mature vision corpora.

Xie & Correll observe that data collection of tactile robot data is an active research problem. The next breakthrough may come from data-collection methods (Touch in the Wild, OpenTouch, FlexiTac-equipped UMI) rather than from architectures.

Application landscape

A useful diagnostic when reading any individual tactile-manipulation paper is asking which application driver it is targeting, and whether the metric of success aligns with that application.

[Figure: applications plotted by deployment maturity (research to industrial) and tactile dependence (low to high).] Research-driven, high tactile dependence: in-hand reorientation (cubes, sticks, articulated objects), cable and cloth handling (deformables), surgical assistance (tissue, needle insertion), and sub-mm assembly (connector insertion). Transitional: tool use (screwdriver, hammer), fragile and fresh items (fruit, glass, electronics), and prosthetics (wearable feedback). Deployed: warehouse pick and stow (Amazon Vulcan), industrial assembly (force-controlled robots), and bin picking with slip detection (F/T-sensor based). Early or sleeper: humanoid loco-manipulation (whole-body contact) and material classification (texture, hardness). Annotations: industrial deployment uses simpler sensing (F/T); research and industry are not yet the same field.
Figure 10. Application landscape showing where research and industry diverge.

Two things to read off this map. First, research and industrial tactile manipulation are not yet the same field. Industrial deployments mostly use force/torque sensors with simple control rather than the high-resolution VBTS dominating academic literature. Second, the transitional cluster (sub-mm assembly, tool use, fragile-item handling) is where the two will collide first, probably 2027–2029.

Research labs and lineages

Tactile manipulation is genuinely global, but lineages cluster geographically in ways that reflect funding traditions and equipment availability.

  • North America: MIT (Adelson; GelSight family), Meta FAIR (DIGIT, Digit 360, Sparsh), CMU (Kroemer; manipulation, IL), NYU (Pinto; AnySkin, eFlesh), Columbia/UIUC (Y. Li; 3D-ViTac, FlexiTac), UW/Berkeley (STAG glove, TVL).
  • Europe: Bristol (Lepora; TacTip, Tactile Gym), KCL (Luo; Tacchi, FOTS), Bielefeld (Ritter; active touch, hands), DLR (Bäuml; pure-tactile in-hand), TUM (Kaboli; active tactile, e-skin), IIT Genoa (iCub skin, biomimetics).
  • Asia-Pacific: Tsinghua (Xu; ManiSkill-ViTac), SUSTech/HKU (TacGNN, in-hand), NUS (Soh; Octopi, VLA-Touch), Tencent Robotics X (industrial dexterity), Tohoku (Kosuge; TactileAloha, IL), Beihang/CAS (materials, e-skin).
  • Industry tactile programs: Amazon Robotics (Vulcan stow and pick), Toyota Research (Soft-Bubble, UMI), GelSight Inc. (commercial sensors), Wonik/Sanctuary (Allegro, dexterous hands), BMW (Kaboli; industrial e-skin), and humanoid companies (Figure, 1X, Apptronik).
Figure 11. Major tactile manipulation research and industry programs grouped by region.

Two patterns are worth flagging. First, the four-region split is real: North American groups dominate VBTS and foundation models, European groups dominate biomimetic and active-touch traditions, Asian groups dominate materials science, simulation, and emerging dexterous-hand work. Second, the industry side is concentrated, with Amazon, Toyota Research, GelSight Inc., and the humanoid startups doing most of the deployment-driven work.

Key papers (2020–2026)

A non-exhaustive shortlist of papers that recur across recent surveys, organized by impact area.

| Paper | Year | Venue | Why it matters |
| --- | --- | --- | --- |
| DIGIT (Lambeta et al.) | 2020 | RA-L | Open-source, miniaturized GelSight; the workhorse VBTS |
| Touch and Go | 2022 | NeurIPS D&B | First in-the-wild paired vision + touch dataset |
| 3D-ViTac | 2024 | CoRL | Dense flexible visuotactile + 3D fusion + diffusion policy |
| DiffTactile | 2024 | ICLR | Differentiable physics-based tactile sim |
| AnySkin | 2024 | arXiv | Plug-and-play, cross-instance generalizable magnetic sensor |
| Digit 360 (Meta + GelSight) | 2024 | — | First multimodal commercial fingertip sensor |
| T3 | 2024 | NeurIPS / CoRL | Tactile foundation transformer + FoTa dataset |
| Sparsh / Sparsh-X | 2024–25 | CoRL | SSL touch representations across sensor types |
| UniTouch | 2024 | CVPR | Touch-to-vision/language/sound binding |
| TVL | 2024 | ICML (oral) | Touch + vision + language alignment |
| Octopi / Octopi-1.5 | 2024–25 | RSS | Tactile-language model for property reasoning |
| Tactile-VLA / VLA-Touch | 2025 | arXiv | VLAs with native touch grounding |
| FARM | 2025 | arXiv | Force-aware diffusion policy with GelSight Mini |
| Text2Touch | 2025 | arXiv | LLM-designed rewards for tactile in-hand RL |
| TactileAloha | 2025 | RA-L | Tactile-augmented bimanual ACT |
| Touch in the Wild | 2025 | NeurIPS | Portable visuotactile gripper, large-scale in-the-wild data |
| VT-Refine | 2025 | CoRL | Bimanual assembly via visuotactile sim fine-tuning |
| OpenTouch | 2025 | arXiv | First in-the-wild full-hand egocentric tactile dataset |
| ManiFeel | 2025 | CoRL | First reproducible visuotactile policy benchmark |
| FBI (Flow Before Imitation) | 2025 | arXiv | Learns tactile from visual flow; works without sensors |
| Tacchi 2.0 / Taccel | 2025 | arXiv | Dynamic MPM / GPU-scale tactile sim |
| eFlesh | 2025 | arXiv | Customizable magnetic touch via cut-cell microstructures |
| FlexiTac | 2026 | — | Open-source scalable flexible tactile platform |
| CGP (Contact-Grounded Policy) | 2026 | arXiv | Reframes visuotactile policy learning as contact grounding |
| UniVTAC | 2026 | arXiv | Unified visuotactile sim platform + 8-task benchmark |

Open challenges

Each challenge below is corroborated across multiple independent reviews.

Sensor heterogeneity / standardization

Tactile sensors are not yet fully standardized; tactile images differ significantly under identical touch conditions due to camera type, lighting, color, and illumination. The current infrastructure for visuo-tactile manipulation remains underdeveloped, with scarcity of large-scale tactile data severely limiting tactile-centric representation models. Jin et al.'s materials review identifies six different sensing mechanisms in active use, with no convergence in sight.

Sim-to-real on tactile signals

Direct zero-shot sim-to-real transfer is challenging due to complex nonlinear deformation of soft sensors, motivating two-stage pipelines.

Data scarcity

Data collection of tactile robot data is an active research problem, and the community remains in an exploratory phase where developing new practical data collection methods is as important as refining existing ones.

Generalization across objects

The heart of tactile-based manipulation: how do you train a tactile-based policy that can manipulate unseen and diverse objects?

Whole-body / large-area touch is under-developed

A 2025 humanoid loco-manipulation survey explicitly highlights whole-body tactile sensing as a crucial modality, observing that most robotic hands prioritize dexterity at the expense of payload capacity.

Force is not the same as tactile

Xie & Correll's most useful conceptual point: force and touch are abstract quantities that can be inferred through a wide range of modalities. Proprioception, force-torque sensors, and tactile sensors all reveal aspects of the same underlying physical interaction, and conflating these in surveys can mislead.

Where the field disagrees

Reading multiple reviews together surfaces three live disagreements that single-source surveys obscure.

Disagreement 1: Is more sensing always better?

Xie & Correll observe that force and touch are abstract quantities that can be inferred through a wide range of modalities and are often measured and controlled implicitly, implying that better policy learning, not more sensing, may be the bottleneck. Materials-science reviews advocate higher resolution, larger area, more modalities. The FBI paper is on the smarter-learning side; 3D-ViTac's 1024-taxel approach is on the denser-sensing side. Both work. The field has not settled this.

Disagreement 2: Is there a "winning" tactile sensor architecture?

Lepora's 2025 VBTS classification argues marker-based and intensity-based vision sensors are both viable for high-resolution tasks. AnySkin's authors argue magnetic, replaceable skins are the practical winner. 3D-ViTac's authors argue dense flexible piezoresistive arrays are the practical winner. There is no consensus, and the practical winner is likely task-dependent.

Disagreement 3: Is industrial deployment a sign of progress or a distraction?

Amazon Vulcan's force feedback sensors are simple six-axis force-torque units, not high-resolution tactile skins. Reviews from the academic side note that high-resolution tactile sensing, the kind dominating research papers, is barely deployed industrially. The same word, "tactile," covers very different things in research vs. industry.

Forecast

Figure 12. Forecast 2020–2030 for tactile manipulation research interest by direction. Solid through 2026 = observed (sources cross-validated from multiple reviews); dashed = projection.

Predictions are calibrated to the strength of evidence behind each claim.

High confidence (multiple independent reviews agree)

  • Touch foundation models will grow rapidly through 2027 and likely plateau as the field hits the data-scaling wall around 2028.
  • Sim-to-real for tactile will keep improving but at a slower pace than RGB sim-to-real did, because contact physics is fundamentally harder.
  • Visuotactile benchmarks (ManiFeel, UniVTAC, ManiSkill-ViTac) will become standard reporting expectations within two years.
  • Industrial deployment will use force/torque + simple tactile, not high-resolution VBTS, for at least 3–5 more years.

Medium confidence (one or two reviews supportive)

  • Magnetic skins will become the default for deployment-oriented research because of cross-instance generalization.
  • Dense flexible piezoresistive arrays will become the default for fingertip + full-finger coverage in research because they integrate cleanly with vision.
  • Contact-grounded policies will be the dominant policy framework by 2027–2028.

Contested or speculative

  • Whether vision-only "inferred tactile" (FBI-style) will partially replace physical sensors. Genuine open question.
  • When whole-body / large-area e-skin will hit production. Could be 2027 with humanoid push, or 2030+ given skin durability and wiring problems.
  • Whether a "GPT moment" for tactile manipulation is possible at all. Xie & Correll are skeptical; foundation-model optimists are more sanguine.

Concluding diagnostic

A useful test for any new tactile-manipulation paper, derived from synthesizing the reviews above:

The seven-question test

  1. Which level of the Li-Kroemer hierarchy does this operate at?
  2. Which sensing principle, and what trade-offs did it accept?
  3. Is the simulator–transfer–real pipeline complete or partial?
  4. Does the policy operate in a regime where force is the binding constraint, or only where vision would already suffice?
  5. Is this in the research-driven, transitional, or deployed cluster of applications?
  6. Which family of foundation models β€” encoder, binding, or policy β€” does it interface with, if any?
  7. What data did it train on, and how does that compare to the 1000× gap with vision?

A paper that addresses all seven explicitly is doing the field a service. A paper that addresses none of them is hard to position.

Methodological notes

The references below span roughly fourteen review papers from at least eight research traditions plus primary research papers from 2020 through early 2026. Where reviews disagree the disagreement is shown in the prose; where they converge the convergence is documented with multiple citations. Bibliometric numbers are estimated rather than measured. Two honest limitations: this survey works primarily from review snippets and abstracts rather than full-text PDFs, and the 2026 picture will not crystallize until autumn ICRA/IROS/RSS proceedings are available.

A LaTeX/IEEE version of this survey with full BibTeX references is available on request.

References

Numbered in citation order, IEEE style.

  [1] C. Li, R. Dang, X. Li, Z. Wu, J. Xu, H. Kasaei, R. Calandra, N. Lepora, S. Luo, H. Su, and R. Chen, "ManiSkill-ViTac 2025: Challenge on manipulation skill learning with vision and tactile sensing," arXiv preprint arXiv:2411.12503, 2024.
  [2] Q. Li, O. Kroemer, Z. Su, F. F. Veiga, M. Kaboli, and H. J. Ritter, "A review of tactile information: Perception and action through touch," IEEE Transactions on Robotics, vol. 36, no. 6, pp. 1619–1634, 2020.
  [3] S. Luo, J. Bimbo, R. Dahiya, and H. Liu, "Robotic tactile perception of object properties: A review," Mechatronics, vol. 48, pp. 54–67, 2017.
  [4] T. Tsuji, Y. Kato, G. Solak, H. Zhang, T. Petrič, F. Nori, and A. Ajoudani, "A survey on imitation learning for contact-rich tasks in robotics," The International Journal of Robotics Research, 2026.
  [5] N. F. Lepora, "Tactile robotics: Past and future," arXiv preprint arXiv:2512.01106, 2025.
  [6] Y. Wang et al., "Recent advances and challenges of tactile sensing for robotics: From fundamentals to applications," Materials Today Physics, 2025.
  [7] A. Welte and R. Rayyes, "Interactive imitation learning for dexterous robotic manipulation: Challenges and perspectives," Frontiers in Robotics and AI, vol. 12, 2025.
  [8] K. Shimonomura, "Tactile image sensors employing camera: A review," Sensors, vol. 19, no. 18, p. 3933, 2019.
  [9] U. H. Shah, R. Muthusamy, D. Gan, Y. Zweiri, and L. Seneviratne, "On the design and development of vision-based tactile sensors," Journal of Intelligent and Robotic Systems, 2021.
  [10] N. F. Lepora, "Classification of vision-based tactile sensors: A review," arXiv preprint arXiv:2509.02478, 2025.
  [11] J. Jin, S. Wang et al., "Progress on flexible tactile sensors in robotic applications on object properties recognition, manipulation and human-machine interactions," Soft Science, 2023.
  [12] I. S. Bayer, "MEMS-based tactile sensors: Materials, processes and applications in robotics," Micromachines, vol. 13, no. 12, p. 2051, 2022.
  [13] L. Yu and D. Liu, "Recent progress in tactile sensing and machine learning for texture perception in humanoid robotics," Interdisciplinary Materials, vol. 4, pp. 235–248, 2025.
  [14] A. C. Abad and A. Ranasinghe, "Visuotactile sensors with emphasis on GelSight sensor: A review," IEEE Sensors Journal, vol. 20, no. 14, pp. 7628–7638, 2020.
  [15] N. F. Lepora, "Soft biomimetic optical tactile sensing with the TacTip: A review," IEEE Sensors Journal, vol. 21, no. 19, pp. 21131–21143, 2021.
  [16] R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto, "AnySkin: Plug-and-play skin sensing for robotic touch," arXiv preprint arXiv:2409.08276, 2024.
  [17] B. Huang, Y. Wang, X. Yang, Y. Luo, and Y. Li, "3D-ViTac: Learning fine-grained manipulation with visuo-tactile sensing," in Conference on Robot Learning (CoRL), 2024.
  [18] B. Huang and Y. Li, "FlexiTac: An open-source, scalable tactile solution for robotic systems," 2026, https://flexitac.github.io/.
  [19] Y. Liu et al., "A neuromorphic robotic electronic skin with active pain and injury perception," Proceedings of the National Academy of Sciences, 2025.
  [20] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3838–3845, 2020.
  [21] Meta AI and GelSight, "Digit 360: An artificial fingertip-shaped tactile sensor," 2024, https://ai.meta.com/blog/fair-robotics-open-source/.
  [22] Amazon Robotics, "Meet Amazon's Vulcan: A warehouse robot with a sense of touch," 2025, https://www.aboutamazon.com/news/operations/amazon-vulcan-robot-pick-stow-touch.
  [23] W. Yuan, S. Dong, and E. H. Adelson, "GelSight: High-resolution robot tactile sensors for estimating geometry and force," Sensors, vol. 17, no. 12, p. 2762, 2017.
  [24] Anonymous, "Collaborative representation learning for alignment of tactile, language, and vision modalities," arXiv preprint arXiv:2511.11512, 2025.
  [25] Various authors, "UniVTAC: A unified simulation platform for visuo-tactile manipulation data generation, learning, and benchmarking," arXiv preprint arXiv:2602.10093, 2026.
  [26] Various authors, "Tactile robotics: An outlook," arXiv preprint arXiv:2508.11261, 2025.
  [27] Various authors, "Tac2Real: Reliable and GPU visuotactile simulation for online reinforcement learning and zero-shot real-world deployment," arXiv preprint arXiv:2603.28475, 2026.
  [28] Z. Si and W. Yuan, "Taxim: An example-based simulation model for GelSight tactile sensors," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2361–2368, 2022.
  [29] S. Wang, M. Lambeta, P.-W. Chou, and R. Calandra, "TACTO: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3930–3937, 2022.
  [30] Z. Chen, S. Zhang, S. Luo, F. Sun, and B. Fang, "Tacchi: A pluggable and low computational cost elastomer deformation simulator for optical tactile sensors," IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1239–1246, 2023.
  [31] Z. Si, G. Zhang, Q. Ben, B. Romero, Z. Xian, C. Liu, and C. Gan, "DiffTactile: A physics-based differentiable tactile simulator for contact-rich robotic manipulation," in International Conference on Learning Representations (ICLR), 2024.
  [32] W. Du, W. Xu, J. Ren, Z. Yu, and C. Lu, "TacIPC: Intersection- and inversion-free FEM-based elastomer simulation for optical tactile sensors," arXiv preprint, 2024.
  [33] Y. Zhao, K. Qian, B. Duan, and S. Luo, "FOTS: A fast optical tactile simulator for sim2real learning of tactile-motor robot manipulation skills," IEEE Robotics and Automation Letters, 2024.
  [34] Anonymous, "TacEx: GelSight tactile simulation in Isaac Sim – combining soft-body and visuotactile simulators," arXiv preprint arXiv:2411.04776, 2024.
  [35] Y. Sun et al., "Tacchi 2.0: A low computational cost and comprehensive dynamic contact simulator for vision-based tactile sensors," arXiv preprint arXiv:2503.09100, 2025.
  [36] Anonymous, "Taccel: Scaling up vision-based tactile robotics via high-performance GPU simulation," arXiv preprint arXiv:2504.12908, 2025.
  [37] W. Xie and N. Correll, "Towards forceful robotic foundation models: A literature survey," arXiv preprint arXiv:2504.11827, 2025.
  [38] J. Pitz, L. Röstel, L. Sievers, and B. Bäuml, "Learning purely tactile in-hand manipulation with a torque-controlled hand," arXiv preprint arXiv:2204.03698, 2022.
  [39] W. Hu, B. Huang, W. W. Lee, S. Yang, Y. Zheng, and Z. Li, "Dexterous in-hand manipulation of slender cylindrical objects through deep reinforcement learning with tactile sensing," Robotics and Autonomous Systems, vol. 186, p. 104904, 2025.
  [40] N. Gu, K. Kosuge, and M. Hayashibe, "TactileAloha: Learning bimanual manipulation with tactile sensing," IEEE Robotics and Automation Letters, vol. 10, no. 8, pp. 8348–8355, 2025.
  [41] E. Helmut, N. Funk, T. Schneider, C. de Farias, and J. Peters, "Tactile-conditioned diffusion policy for force-aware robotic manipulation," arXiv preprint arXiv:2510.13324, 2025.
  [42] Anonymous, "Text2Touch: Tactile in-hand manipulation with LLM-designed reward functions," arXiv preprint arXiv:2509.07445, 2025.
  [43] Various authors, "Contact-grounded policy: Dexterous visuotactile policy with generative contact grounding," arXiv preprint arXiv:2603.05687, 2026.
  [44] Various authors, "FBI: Learning dexterous in-hand manipulation with dynamic visuotactile shortcut policy," arXiv preprint arXiv:2508.14441, 2025.
  [45] J. Zhao, Y. Ma, L. Wang, and E. H. Adelson, "Transferable tactile transformers for representation learning across diverse sensors and tasks," arXiv preprint arXiv:2406.13640, 2024.
  [46] C. Higuera, A. Sharma, C. K. Bodduluri, T. Fan, P. Lancaster, M. Kalakrishnan, M. Kaess, B. Boots, M. Lambeta, T. Wu, and M. Mukadam, "Sparsh: Self-supervised touch representations for vision-based tactile sensing," in Conference on Robot Learning (CoRL), 2024.
  [47] Anonymous, "Sparsh-X: Tactile beyond pixels: Multisensory touch representations for robot manipulation," arXiv preprint arXiv:2506.14754, 2025.
  [48] A. Feng et al., "AnyTouch: Learning unified static-dynamic representation across multiple visuo-tactile sensors," arXiv preprint arXiv:2502.12191, 2025.
  [49] F. Yang, C. Feng, Z. Chen, H. Park, D. Wang, Y. Dou, Z. Zeng, X. Chen, R. Gangopadhyay, A. Owens, and A. Wong, "Binding touch to everything: Learning unified multimodal tactile representations," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26340–26353.
  [50] L. Fu, G. Datta, H. Huang, W. C.-H. Panitch, J. Drake, J. Ortiz, M. Mukadam, M. Lambeta, R. Calandra, and K. Goldberg, "A touch, vision, and language dataset for multimodal alignment," in International Conference on Machine Learning (ICML), 2024.
  [51] S. Yu, K. Lin, A. Xiao, J. Duan, and H. Soh, "Octopi: Object property reasoning with large tactile-language models," in Robotics: Science and Systems (RSS), 2024.
  [52] J. Bi, K. Y. Ma, C. Hao, M. Z. Shou, and H. Soh, "VLA-Touch: Enhancing vision-language-action models with dual-level tactile feedback," arXiv preprint arXiv:2507.17294, 2025.
  [53] Anonymous, "Tactile-VLA: Unlocking vision-language-action model's physical knowledge for tactile generalization," arXiv preprint arXiv:2507.09160, 2025.
  [54] Various authors, "OmniVTLA: Vision-tactile-language-action model with semantic-aligned tactile sensing," arXiv preprint arXiv:2508.08706, 2025.
  [55] Y. R. Song, J. Li, R. Fu et al., "OpenTouch: Bringing full-hand touch to real-world interaction," arXiv preprint arXiv:2512.16842, 2025.
  [56] X. Zhu, B. Huang, and Y. Li, "Touch in the wild: Learning fine-grained manipulation with a portable visuo-tactile gripper," arXiv preprint arXiv:2507.15062, 2025.
  [57] F. Yang, C. Ma, J. Zhang, J. Zhu, W. Yuan, and A. Owens, "Touch and go: Learning from human-collected vision and touch," in NeurIPS Datasets and Benchmarks, 2022.
  [58] Anonymous, "VT-Refine: Learning bimanual assembly with visuo-tactile feedback via simulation fine-tuning," arXiv preprint arXiv:2510.14930, 2025.
  [59] Q. K. Luu, P. Zhou, Z. Xu, Z. Zhang, Q. Qiu, and Y. She, "ManiFeel: Benchmarking and understanding visuotactile manipulation policy learning," arXiv preprint arXiv:2505.18472, 2025.
  [60] V. Pattabiraman, Z. Huang, D. Panozzo, D. Zorin, L. Pinto, and R. Bhirangi, "eFlesh: Highly customizable magnetic touch sensing using cut-cell microstructures," arXiv preprint arXiv:2506.09994, 2025.
  [61] Anonymous, "Crossing the reality gap in tactile-based learning," arXiv preprint arXiv:2305.09870, 2023.
  [62] Various authors, "Semantic-contact fields for category-level generalizable tactile tool manipulation," arXiv preprint arXiv:2602.13833, 2026.
  [63] Various authors, "Sim2Real manipulation on unknown objects with tactile-based reinforcement learning," arXiv preprint arXiv:2403.12170, 2024.
  [64] Various authors, "Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning," arXiv preprint arXiv:2501.02116, 2025.