硅谷101

E217 | The Holy Grail of Robotics: Cracking the Code of Dexterous Hands

12/11/2025

Dexterous Hands:
Not Just Moving

“There's a saying in the industry: How fast a robot walks depends on its ‘brain’; but how delicate its work is depends entirely on those ‘hands’.”

HONGJUN / Silicon Valley 101

We've seen too many robot demos—vacuuming, taking out the trash, even pouring drinks at the Tesla event. It looks like these hands can do anything, but does that really represent ‘dexterity’?

HAOZHI / UC Berkeley PhD

Actually, those are simple operations under ‘human command.’ Like pouring a drink, the hand just presses down on the handle. The real challenge lies in: how to use fingers like a human for fine motor skills, and adapt to the different tools in millions of households.

01 Fine Motor Skills

It’s no longer just simple ‘grasping,’ but the coordination, rotation, and fine-tuning between fingers. This is a qualitative leap from being ‘powerful’ to being ‘perceptive’.

02 Generalization Capability

Opening a Coke in a lab isn't a win. Being able to open any model of Coke can, in any lighting and at any angle—that’s what you call generalization.

03 Hardware Reliability

Evan points out that hardware must operate stably for long periods without damage. Increasing Degrees of Freedom (DoF) adds complexity, and complexity is often the enemy of reliability.

“Opening a Coke isn't just about fingernails,
it’s the ultimate test of perception, force, and dual-arm coordination.”

Capability Evaluation Dimension Chart

Editor's Note

Engineering details on “Opening a Coke”

Evan mentioned that this involves an extremely difficult movement: in-hand manipulation. The robot needs to adjust the angle of the object with one hand while the other hand precisely aligns with the pull tab, all while sensing pressure in real time to prevent crushing the can.

As Haozhi said, current demos often reduce difficulty through ‘teleoperation.’ But in the long run, what we need is a universal algorithm capable of autonomous learning and adapting to various configurations.
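
The pressure-sensing requirement in this note can be sketched as a simple closed loop: tighten the grip in small steps until a target contact force is reached, and refuse any step that would cross a crush limit. This is a minimal illustration only. The linear contact model `simulated_can_force`, the force thresholds, and the step size are all hypothetical assumptions, not values from the episode or from any real hand.

```python
def simulated_can_force(squeeze_mm: float, stiffness_n_per_mm: float = 4.0) -> float:
    """Hypothetical contact model: force grows linearly with squeeze depth."""
    return max(0.0, squeeze_mm) * stiffness_n_per_mm


def close_grip(target_n: float, crush_limit_n: float,
               step_mm: float = 0.1, max_steps: int = 1000) -> tuple[float, float]:
    """Tighten in small steps until the target force is reached.

    Returns (squeeze_mm, force_n). A step is skipped, and the loop ends,
    if taking it would reach the crush limit.
    """
    if target_n >= crush_limit_n:
        raise ValueError("target force must stay below the crush limit")
    squeeze = 0.0
    force = simulated_can_force(squeeze)
    for _ in range(max_steps):
        if force >= target_n:
            break  # firm enough: stop tightening
        next_force = simulated_can_force(squeeze + step_mm)
        if next_force >= crush_limit_n:
            break  # one more step would crush the can
        squeeze += step_mm
        force = next_force
    return squeeze, force


# Assumed numbers: hold the can with about 6 N, never reach 10 N.
squeeze, force = close_grip(target_n=6.0, crush_limit_n=10.0)
```

A real controller would read a tactile or torque sensor instead of a model, and would also have to react to slip. The point of the sketch is only that the force reading, not the finger position, decides when to stop.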

02:37 - 05:58

NEXT CHAPTER

Behind the “Opening a Coke” Demo: The Gap Between Demonstration and Actual Capability →

— Step Into the Reality —

Don't be fooled by that bottle of Coke

We just finished talking about the “three gold standards” for evaluating dexterous hands, but the reality is that the industry is currently caught in a “Demo Fever.”

Demonstration Effect vs. Actual Capability

“I've seen too many ‘Coke opening’ videos. In the shot, the dexterous hand elegantly twists off the cap, the liquid splashes, and the crowd cheers. But is this really the ‘dexterity’ we're looking for?”

Exclusive Hot Take:

“Many demos are actually ‘special operations.’ They've undergone thousands of rounds of overfit training for specific cap sizes, specific friction, and specific starting poses. Swap it for a Sprite? It might just ‘crush’ it instantly.”

Capability Gap Visualization: Demo vs. Generalization

The Algorithm Team's “Generalization Anxiety”

For algorithm developers, the hardware just needs to be “good enough.” Their real pain is figuring out how to stop the hand from grabbing randomly like an idiot when faced with unfamiliar objects. They are pursuing a Foundation Model for Manipulation.

Hardware Manufacturers' “Single-Point Breakthrough”

“Can we just stack up the Degrees of Freedom first?”

The hardware camp is obsessed with motor torque, response frequency, and sensor density. Their logic is: without an ultimate body, even the strongest soul has nowhere to reside.

Dexterous Hands: The “Ultimate Everest” of Robot Hardware

Why are dexterous hands a hundred times harder to make than arms? First, there's the space constraint. You have to cram over a dozen motors, transmission mechanisms, circuit boards, and countless sensors into a space the size of an adult's palm. It’s like carving a masterpiece on a walnut shell.

Second is the paradox of force feedback. If you want to pick up an egg, you need extreme sensitivity; if you want to unscrew a rusted nut, you need massive explosive force. Most solutions on the market today struggle immensely between these two.

EDITOR'S NOTE

ILDA Hand (Integrated Linkage-driven Dexterous Anthropomorphic)

Developed by Hanyang University, it uses a complex linkage mechanism to achieve high degrees of freedom. It features incredible gripping force (it can even cut through metal wire), but the complexity of the linkage mechanism also brings huge challenges in maintenance and non-linear control.

Thus emerged the Linkage Drive solution. This approach is as precise as a mechanical watch, transmitting power through hard connections. The representative ILDA hand can even demonstrate 15 degrees of freedom. But its problem is also obvious: once a single joint jams, the whole hand might be toast.

“Current dexterous hands don't lack ‘fingers’,
but rather an interaction logic that is as
flexible and sensitive as human skin.”

Next Chapter

Looking at Schools of Thought via “Muscle”: Direct Drive Sharpa vs. Cable Drive Tesla →

Power Over Finesse vs Wire Ballet

From the brutalist aesthetics of direct drive to the extreme weight reduction of cable drive

Having just discussed the "old-school, no-nonsense" mechanical feel of linkage drives, we have to talk about today's "flavor of the month"—Direct Drive Solutions. Hands like the Sharpa take a very straightforward approach: one motor per joint, cutting out the middleman. The sensitivity and force control precision of this setup are second to none, but the trade-off is obvious—your hand ends up looking like a sledgehammer.

If you're chasing the ultimate "human-like" structure, you can't get around cable-driven systems. These are like our human tendons: the motors are tucked away in the forearm, controlling the fingers remotely via thin cables. Things get complicated here: the Shadow Hand uses bidirectional pull cables that work each joint back and forth like a two-person saw; meanwhile, Tesla is taking a much more radical path...

Direct Drive (Sharpa-style)

1:1 mapping between joints and motors, ultimate response speed

✓ Zero backlash, incredibly smooth control
✓ The "golden child" of simulation environments; algorithms run like a breeze
✗ Massive palm volume, hard to fit into tight spaces

BRUTE

Cable-driven (Shadow/Tesla-style)

Biomimetic tendon structure; rear-mounted motors free up terminal space

✓ Perfect weight distribution, fingers as light as a feather
✓ Capable of achieving ultra-high Degrees of Freedom (20+ DoF)
✗ Cable wear and hysteresis are a control engineer's nightmare

ELITE

Why does the research community
favor "Direct Drive"?

It's actually an open secret. People doing simulations fear "uncertainty" more than anything. In cable-driven systems, cables stretch, generate friction, and exhibit hysteresis, all of which are incredibly difficult to model in a physics engine.

"If you nail a direct-drive hand in a simulator, you've probably got a paper; if you go with cable-driven, you might spend half a year just tuning physics parameters."

Editor's Note / Glossary

Backlash

Refers to the clearance or lost motion in a mechanism caused by gaps between parts when the direction of motion is reversed. In dexterous hands, backlash is the archenemy of precision. By eliminating intermediate transmissions, direct drive solutions bring backlash down to nearly zero, which is why they are considered the "gold standard" in research.
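
To make "lost motion" concrete, here is a minimal sketch of the textbook play-operator model of backlash: the output only moves once the input has taken up the clearance in the current direction. The function name, the gap size, and the command sequence are illustrative assumptions, not measurements from any hand discussed here.

```python
def backlash_output(inputs: list[float], gap: float, start: float = 0.0) -> list[float]:
    """Play-operator model of backlash.

    `gap` is the total clearance. The output stays put while the input
    moves inside the dead band of width `gap` centered on the output.
    """
    half = gap / 2.0
    out = start
    trace = []
    for x in inputs:
        if x - out > half:      # input pushes the output forward
            out = x - half
        elif out - x > half:    # input drags the output backward
            out = x + half
        # otherwise the input is inside the dead band: no output motion
        trace.append(out)
    return trace


commanded = [0, 1, 2, 3, 2.5, 2, 1]          # motor side: forward, then reversing
with_gap = backlash_output(commanded, gap=1.0)
no_gap = backlash_output(commanded, gap=0.0)  # idealized direct drive

print(with_gap)  # [0.0, 0.5, 1.5, 2.5, 2.5, 2.5, 1.5]
print(no_gap)    # [0.0, 1.0, 2.0, 3.0, 2.5, 2.0, 1.0]
```

With a gap, the joint does not move at all during the first part of the reversal (the repeated 2.5 values). With zero gap, the output tracks the command exactly, which is the behavior a simulator can model without extra parameters.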

"Those $100,000 dexterous hands,
aren't even intended to make a profit."

Why is the Shadow Hand so expensive? It's not just the hardware cost. It’s actually a "top-tier filtering mechanism": it filters for the world's most elite labs. The manufacturer is providing a "research ticket," using a high unit price to cover extremely complex post-sales service costs. This isn't a consumer product; it's a supercar for the lab.

Next Chapter

Evan's First-hand Account: My Days Developing the Optimus Hand at Tesla →

From anatomy to surgical observation: How Musk's biomimetic route brute-forces technical challenges

Tesla Tales: Optimus - The Logic Behind its Birth

The crazy leap from anatomy labs to assembly lines

We just mentioned that those "sky-high price" dexterous hands on the market are filtering for top-tier clients, but Tesla operates on a completely different logic. During the development of Optimus, we weren't just making an "expensive tool"; we were reshaping the "extension of the human." That shift in logic began the moment Musk ushered us into a surgical observation room.

DEEP DIVE: Why must it be human-like? Musk's "Anatomical" Obsession

Many ask, why must Optimus look human? It's not just for aesthetics. Musk's insistence at the time was very direct: Since the world is designed for humans, the most efficient general-purpose robot must replicate the human structure.

To understand the essence of the "hand," our team actually went to observe human surgeries. Only when you see tendons weaving through narrow spaces and the layout of nerves with your own eyes do you realize how crude traditional "in-joint motor" designs really are.

This is where the "forearm motor migration" concept came from.

We moved all the heavy brushless motors out of the tiny finger joints and onto the "forearm," using a complex cable system to control the fingers. This made the hand incredibly light, but it also brought about hellish assembly difficulty.

Editor's Note: Forearm Motor Migration

Traditional direct-drive dexterous hands often cram miniature motors into the finger joints, but this limits grip strength and increases finger inertia. The Tesla solution places the power source in the "forearm," transmitted via "cable drive" to simulate the relationship between human muscles and tendons, thereby achieving extremely high power density.
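
The inertia argument in this note can be checked with back-of-the-envelope arithmetic. Treating a motor as a point mass, its moment of inertia about a pivot is I = m * r^2, so moving the same mass closer to the pivot cuts its contribution quadratically. The sketch below uses the elbow as the pivot; the motor mass and both distances are made-up illustrative numbers, not Optimus specifications.

```python
def point_mass_inertia(mass_kg: float, distance_m: float) -> float:
    """Moment of inertia of a point mass about a pivot: I = m * r^2."""
    return mass_kg * distance_m ** 2


MOTOR_MASS_KG = 0.05  # assumed mass of one small motor

# Assumed distances from the elbow pivot.
in_finger = point_mass_inertia(MOTOR_MASS_KG, 0.40)   # motor sits in a finger joint
in_forearm = point_mass_inertia(MOTOR_MASS_KG, 0.10)  # motor moved into the forearm

ratio = in_finger / in_forearm
print(f"in finger:  {in_finger:.4f} kg*m^2")
print(f"in forearm: {in_forearm:.4f} kg*m^2")
print(f"ratio: about {ratio:.0f}x")
```

Under these assumed numbers, a motor four times farther from the pivot contributes roughly sixteen times the inertia, which is why relocating many motors changes how the arm swings.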

Insight

"We're not just building machines; we're replicating the lever logic evolved over hundreds of millions of years."

Cable Drive vs. Direct Drive: The Mass Production Gamble

Cable Drive (Optimus Solution)

+ Extremely high power density, lightweight fingers
+ Highly compact structure, visually more human-like
- Wire rope wear and tension maintenance are extremely difficult
- Assembly man-hours are measured in 'days'

Direct Drive (General Industrial Solution)

+ Linear control, extremely high reliability
+ Modular production, easy to maintain
- Bulky fingers, difficult to achieve complex dexterity
- Weight distribution poses a major challenge to end-of-arm payload

Evan's Hot Take

“Everyone knows direct drive is easy to do, but Musk wants '0 to 1.' If we can't solve the cable lifespan issue before mass production, this hand will become Optimus' biggest Achilles' heel.”

The 'Death Valley' of Production Efficiency

Comparing the balance between 'assembly complexity' and 'functional ceiling' in traditional vs. Tesla solutions. As flexibility increases, assembly costs grow exponentially.

Exclusive Reveal

“Meta's robotics project is more like an
idealistic laboratory.”

Compared to Tesla's 'must-hit-the-line' urgency, what I saw at Meta was more about exploring the boundaries of algorithms. They aren't in a rush to have robots moving boxes; they care more about: if you give a robot a GPT-level 'brain,' just how fine-tuned can these hands become?

NEXT UP

Since hardware has already been pushed to the anatomical limit, where is the real bottleneck?
Data. Without high-quality grasping data, even the most perfect hand is just a pile of scrap metal.

We were just debating whether Meta's tendon-driven system is easy to repair, but honestly, the hardware 'arms race' is just a skirmish. The real storm is that the AI crowd suddenly realized: if GPT can understand text and Tesla's FSD can understand streets, why not use the same method to teach robots to 'perceive' the world with their hands?

Paradigm Shift

From 'hard-coded' to 'brute-force aesthetics': The embodied AI revolution inspired by GPT and FSD.

“Current robotics research is no longer about robotic arm kinematics, but rather a data feeding frenzy.”

You have to understand, we used to build dexterous hands by calculating every joint and every degree of freedom (DoF) meticulously. But GPT set an example for the world: as long as the model is large enough and there's enough data, logic will 'emerge' automatically.

This 'End-to-End' mindset has completely overturned the original set of complex control algorithms.

Especially after Tesla FSD v12, everyone finally got it. Since vision-based solutions can handle complex autonomous driving, then having a dexterous hand grab a cup or tie a shoelace is essentially a 'pixels-to-actions' mapping. The focus now isn't on how to design more precise reducers, but rather—where do we get all that training data?

Data Acquisition Difficulty: The Dimensional Itch

Why is data collection for dexterous hands so incredibly difficult?

Driving only requires controlling the steering wheel, gas, and brakes—at most 3 core dimensions. But a dexterous hand has over 20 degrees of freedom! Want to teach it to peel an egg? Sorry, the internet doesn't currently have this kind of 'action-labeled' video data. You can't just 'scrape' data for free like GPT crawls the web; every movement requires a human with equipment to teach it hand-over-hand. It's too slow—frustratingly slow.
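
A rough way to see why the jump from 3 dimensions to 20 matters: if each control dimension is split into a fixed number of levels, the number of distinct configurations grows as levels to the power of dimensions. The choice of 10 levels per dimension is an arbitrary assumption for illustration, and real learning systems do not enumerate a grid like this.

```python
def grid_size(levels_per_dim: int, dims: int) -> int:
    """Number of cells when every dimension is split into the same number of levels."""
    return levels_per_dim ** dims


driving = grid_size(10, 3)    # steering, gas, brake
hand = grid_size(10, 20)      # a 20-DoF dexterous hand

print(driving)           # 1000
print(hand)              # 100000000000000000000
print(hand // driving)   # 100000000000000000 (10**17 times more cells)
```

Even at this coarse resolution, covering the hand's configuration space by demonstration is out of reach, which is why each hand-taught example feels so expensive.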

Player's Guide: Who is Defining the Future?

The Academics

Such as Stanford, UC Berkeley

More focused on the universality of algorithms, attempting to pull off all sorts of slick maneuvers in virtual environments through 'reinforcement learning.' But the question is: can the physical rules in a simulator map perfectly to reality?

The Productists

Tesla, Figure AI

Don't worry about all that—get the hardware out first. Brute-force it with massive amounts of teleoperation data. Believers in brute-force aesthetics, they subscribe to the faith that 'sheer scale creates wonders.'

The Dark Horse

OpenAI (Robot Team Reboot)

Since LLMs already have logic, if you hook them up to 'touch' and 'vision,' will they learn to use hands overnight?

“Actually, I've always had a question: is relying solely on eyes (cameras) really enough? Many fine-tuned tasks are simply impossible to do without 'feeling' them.”

“Exactly! This is the value of tactile sensing. A dexterous hand without tactile sensing is like embroidering under heavy anesthesia; it can see where the needle is, but it can't feel the resistance.”

Tactile: The 'Last Piece of the Puzzle' for Dexterous Hands

Top teams today are outfitting fingertips with sensors like GelSight. These sensors don't just measure pressure; they perceive texture, slippage, and subtle deformations. This kind of data is highly proprietary and cannot be obtained from YouTube videos.

Editor's Note

GelSight

A vision-based tactile sensor. The principle is to use a camera to observe the deformation of a flexible silicone surface, thereby converting 'touch' into 'visual images,' allowing existing computer vision models to process tactile data seamlessly.
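
The "touch becomes an image" idea can be sketched in a few lines: compare the camera frame of the untouched gel with the frame during contact, and treat pixels that changed strongly as the contact patch. This is a heavily simplified stand-in. Real GelSight pipelines reconstruct surface geometry from colored illumination rather than thresholding a difference image, and the 3x3 frames and threshold below are invented for illustration.

```python
def contact_mask(reference, touched, threshold=20):
    """Mark pixels whose brightness changed by more than `threshold`."""
    return [
        [abs(t - r) > threshold for r, t in zip(ref_row, touch_row)]
        for ref_row, touch_row in zip(reference, touched)
    ]


def contact_area(mask):
    """Count pixels flagged as being in contact."""
    return sum(cell for row in mask for cell in row)


# Hypothetical grayscale frames from the camera behind the gel.
reference = [[100, 100, 100],
             [100, 100, 100],
             [100, 100, 100]]
touched = [[100, 105, 100],
           [130, 160, 128],
           [100,  99, 100]]

mask = contact_mask(reference, touched)
area = contact_area(mask)
print(area)  # 3
```

Because the result is just another image-like array, it can be passed to the same kinds of vision models that process ordinary camera frames.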

Next Chapter

The Dexterous Hand Data Pyramid: From Teleoperation to the Genie3 Video Model →

The Alchemy of Data

Since tactile sensing gives dexterous hands the possibility of "perception," the next question becomes razor-sharp: where exactly should this fine-grained manipulation data be "fed" from?

Layer 1: Human Teleoperation

This is the "purest" data; every frame contains the manipulation wisdom evolved by humans over millions of years. But the problem is: it's too expensive. You can't exactly hire ten thousand people to sit there every day wearing headsets and "fiddling" with parts, can you?

Gold Quality / Difficult to Scale

Layer 2: Physics Simulation

In God mode, we can run ten thousand environments in parallel. But the "Reality Gap" (Sim-to-Real) is like a wall; things like water or flexible objects in simulations often "fall apart" when they hit reality.

Infinite Throughput / Accuracy Challenges
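
A toy version of the reality gap: tune a pinch force in simulation using an assumed friction coefficient, then check it against a "real" surface that is more slippery. The physics is the standard two-finger friction condition (2 * mu * F must be at least m * g). The mass, both friction coefficients, and the 10% safety margin are hypothetical numbers chosen for illustration.

```python
G = 9.81  # gravitational acceleration, m/s^2


def min_pinch_force(mass_kg: float, mu: float) -> float:
    """Smallest normal force at which two-finger friction balances gravity."""
    return mass_kg * G / (2.0 * mu)


def holds(force_n: float, mass_kg: float, mu: float) -> bool:
    """True if friction from both fingers can carry the object's weight."""
    return 2.0 * mu * force_n >= mass_kg * G


MASS_KG = 0.35   # assumed: roughly a full can
MU_SIM = 0.8     # friction coefficient assumed by the simulator
MU_REAL = 0.4    # assumed real surface, e.g. a wet can

# "Policy" tuned in simulation: minimum force plus a 10% margin.
tuned_force = 1.1 * min_pinch_force(MASS_KG, MU_SIM)

holds_in_sim = holds(tuned_force, MASS_KG, MU_SIM)
holds_in_real = holds(tuned_force, MASS_KG, MU_REAL)
print(holds_in_sim, holds_in_real)  # True False
```

Running ten thousand copies of the simulated grasp would only confirm the same answer ten thousand times. The failure comes from the parameter the simulator got wrong, not from a lack of throughput.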

Layer 3: Internet Video

Hundreds of millions of manipulation videos on YouTube. Their volume is massive, but the biggest pitfall is—there are no "Action" labels. A robot can watch a video and learn that "the hand is moving," but it won't know "how much force to apply."

Massive Scale / Missing Labels

The Data Pyramid Trade-off: Quality vs. Scale

Is a video model like Genie3 really the savior for dexterous hands?

Right now, everyone is talking about Genie3 and how to distill robotic control strategies from video. The "sexy" logic here is: if robots can learn all human actions just by watching movies, we won't need expensive equipment anymore.

"My take might be a bit 'hardcore'—if you're just predicting pixels without modeling force and tactile feedback in the physical world, you'll forever be just imitating a 'shadow' rather than mastering a 'skill'."

What robots need is Action-conditioned Video Generation. In other words, the model needs to know what the deformation of an object in the next video frame should be when it performs a "pinch." This isn't just generation; it's learning a physics engine.
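
A one-dimensional toy makes the "action-conditioned" part concrete. Suppose the only thing observed is an object's width, and the only action is a pinch force. With action labels, a model of how much the object deforms per unit of force can be fitted and then queried for a new action. Without the `pinch_n` column, the same fit is impossible. The linear deformation model, the `Transition` record, and all numbers are assumptions for illustration, not how Genie3 or any real video model works.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    width_mm: float        # observed object width before the action
    pinch_n: float         # action label: pinch force applied
    next_width_mm: float   # observed width after the action


def fit_compliance(data: list[Transition]) -> float:
    """Least-squares fit of k in the assumed model: next_width = width - k * pinch."""
    num = sum(t.pinch_n * (t.width_mm - t.next_width_mm) for t in data)
    den = sum(t.pinch_n ** 2 for t in data)
    return num / den


def predict_next_width(width_mm: float, pinch_n: float, k: float) -> float:
    """Action-conditioned prediction: the next observation depends on the action."""
    return width_mm - k * pinch_n


# Hypothetical action-labeled data, as teleoperation might provide.
data = [
    Transition(66.0, 2.0, 65.0),
    Transition(66.0, 4.0, 64.0),
    Transition(65.0, 6.0, 62.0),
]

k = fit_compliance(data)
print(k)                                  # 0.5
print(predict_next_width(66.0, 8.0, k))   # 62.0
```

Internet video supplies the before and after widths but not the force in between, which is the missing-label problem described for Layer 3.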

Inside Berkeley

"Doing robotics at Berkeley, the most interesting thing is that 'anti-consensus' atmosphere. People don't blindly worship large models; instead, they spend a lot of time messing around with a pressure sensor that might only be worth a few dollars."

"Exactly, the research here is more like a mix of 'brute-force aesthetics' and 'exquisite design.' On one hand, we pursue universality, while on the other, we are obsessed with the limits of hardware."

The Final Verdict

The GPT moment
for dexterous hands lies not in how large the model is,
but in how fast the data loop is.

Only when teleoperation, simulation, and video can mesh together like gears, allowing robots to self-correct from failure, will that so-called "moment" truly arrive.

Editor's Note: Sim-to-Real

Refers to the process of transferring algorithms trained in computer simulation environments directly to the physical real world. This is one of the most difficult hurdles to cross in robotics.

Now that we've deconstructed hardware, perception, and data, this discussion on dexterous hands is nearing its conclusion.
In the next chapter, we will provide a final summary to see what the robots of the future will actually look like.

Related Episodes

Tesla vs. Waymo: The High-Stakes Battle for Self-Driving Supremacy

AI 2025-2026: Consensus, Conflict, and the Next Frontier of Scaling Laws

The €14 Billion Hermès Mystery: Heirs, Gardeners, and the War for Luxury
