Sunday Robotics: Scaling the Home Robot Revolution with Co-Founders Tony Zhao and Cheng Chi
Continuing from the introduction...
The "ChatGPT Moment" for Physical Reality
Sunday co-founders Tony Zhao and Cheng Chi step onto the stage with a vision that extends far beyond a single prototype. We aren't just talking about a robot; we are talking about a fundamental shift in how machines learn to exist in our world.
"If the robot is cheap, safe, and capable, everyone will want our robot. We see a future where we have more than 1,000,000,000 of these robots in people's homes within a decade."
Where are we now?
We are currently sitting in the gap between the technology breakthrough and the product revolution.
The "GPT" Moment
We have the recipe. The algorithms show signs of life.
Scaling Phase
Gathering data to prove the recipe scales like LLMs did.
The "ChatGPT" Moment
A usable consumer product (Memo) emerges.
Breaking the Loop
Classical robotics relied on a fragile "Sense → Plan → Act" loop. Modern AI robotics utilizes Imitation Learning to generalize behaviors across tasks.
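The contrast can be made concrete with a toy sketch. Below, the classical pipeline is three hand-engineered stages chained together, while the learned policy is a single function trained on demonstrations. All functions and numbers here are illustrative stand-ins, not anyone's actual system.

```python
import numpy as np

# Classical "Sense -> Plan -> Act" pipeline: three hand-engineered
# stages; if any stage's assumptions break, the whole chain fails.
def sense(world):           # estimate state from a raw observation
    return world + np.random.normal(0, 0.01, world.shape)

def plan(state, goal):      # compute a motion command toward the goal
    return goal - state

def act(command):           # clip to actuator limits and execute
    return np.clip(command, -1.0, 1.0)

# Imitation learning collapses the chain: one function, trained on
# human demonstrations, maps observations straight to actions.
def imitation_policy(observation, weights):
    return np.tanh(weights @ observation)

goal = np.array([0.5, -0.2])
world = np.zeros(2)
classical_action = act(plan(sense(world), goal))

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 2))   # stands in for trained parameters
learned_action = imitation_policy(world, weights)
```

The point of the sketch is structural: generalizing the classical pipeline means re-engineering each stage, while generalizing the learned policy means adding more demonstration data.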
The Secret Sauce
Diffusion Policy
Robotics used to require perfection. If a researcher didn't move exactly the right way during data collection, the model failed.
Diffusion Policy changed the game. It allows the robot to learn from "messy" data—capturing multiple modes of behavior. It means untrained people can now teach robots, unlocking the data scale needed for a foundation model.
"We can have multiple people, sometimes even untrained people, collecting data, and the result will still be great. That really unlocked scalable training." — Cheng Chi
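Why does "messy" data help? Demonstrations of the same task often contain multiple valid strategies, and a regressor that averages them produces an action that matches none. The diffusion idea is to start from noise and iteratively denoise toward one of the demonstrated modes. The sketch below captures only that sampling intuition: the "denoiser" is a hand-written toy that nudges toward the nearest mode, standing in for a trained noise-prediction network, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two demonstrated "modes" for the same task (e.g. two equally valid
# grasp trajectories). Purely illustrative values.
modes = np.array([[0.8, 0.2], [-0.8, 0.2]])

def toy_denoiser(action):
    """Stand-in for a trained noise-prediction network: report the
    offset from the nearest demonstrated mode as 'noise' to remove."""
    nearest = modes[np.argmin(np.linalg.norm(modes - action, axis=1))]
    return action - nearest

def sample_action(steps=50):
    action = rng.normal(size=2)                 # start from pure noise
    for t in np.linspace(1.0, 0.0, steps):
        action -= 0.2 * toy_denoiser(action)    # partial denoising step
        action += 0.05 * np.sqrt(t) * rng.normal(size=2)  # annealed noise
    return action

samples = np.array([sample_action() for _ in range(20)])
# Each sample lands near one of the two modes rather than their average,
# which is how multi-modal, "messy" demonstrations stay learnable.
```

Because sampling commits to one mode per rollout instead of blending them, demonstrations from many different (even untrained) people remain usable training signal.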
Up Next
We've established the vision and the software engine (Diffusion Policy). But software needs a body. How do we translate this policy into physical action? Coming up: The Role of ACT and ALOHA →
From Algorithms to Physicality
While Diffusion Policy provided the algorithmic brain, the physical bottleneck remained. How do you teach a robot dexterity without the nausea-inducing lag of VR teleoperation?
ALOHA & ACT: Gamifying Teleoperation
Before ALOHA, collecting robotic data felt like moving through molasses—VR setups introduced delays that disconnected the human operator from the robot's hand.
The Solution: A low-latency, bi-manual setup that feels less like a lab experiment and more like a video game. This immediate feedback loop allowed for the collection of truly dexterous data.
Why it matters: Once the data quality improved, the team could finally ditch simple MLPs for Transformers. It turned out deep learning wasn't failing in robotics because of the architecture—it was starving for high-fidelity data.
Concept: Action Chunking
Humans don't think in milliseconds. We perceive, plan a trajectory, and move.
- ✕ Predicting single actions every millisecond (Jittery)
- → Predicting whole trajectories (Smooth)
Result: Consistent, fluid motion akin to biological movement.
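A toy comparison makes the smoothness claim concrete. Below, per-step control re-infers each action independently, so inference noise lands on every timestep, while chunked control infers once and emits a whole trajectory. The 1-D setup and all constants are illustrative, not any real controller.

```python
import numpy as np

rng = np.random.default_rng(0)
start, target, horizon = 0.0, 1.0, 20

# Per-step prediction: each action is inferred independently, so
# per-inference noise shows up at every single timestep (jittery).
pos = start
single_step = [pos]
for _ in range(horizon):
    pos += (target - pos) * 0.2 + rng.normal(0, 0.05)  # noisy step
    single_step.append(pos)

# Action chunking: infer once, emit a whole smooth trajectory; the
# inference noise perturbs the plan endpoint, not every step.
offset = rng.normal(0, 0.05)                           # one inference's noise
chunk = start + (target + offset - start) * np.linspace(0, 1, horizon + 1)

def jitter(traj):
    """Mean absolute second difference: a crude jerkiness measure."""
    return np.abs(np.diff(traj, n=2)).mean()
```

Under this measure, `jitter(single_step)` is far larger than `jitter(chunk)`: the same noise budget, spent once per chunk instead of once per step, yields fluid motion.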
Enter UMI: Data in the Wild
Universal Manipulation Interface
Robots fail in the real world because they only learn in labs. To fix this, the team stripped the robot away entirely.
Using a 3D-printed gripper and a GoPro, they collected data in restaurants, cafeterias, and outdoors. No robot arm required—just pure observation and action data.
THE SUNLIGHT LESSON
The robot worked perfectly everywhere except under direct sunlight. Why? Because the data was collected during two weeks of rain. Distribution matching is non-negotiable.
Figure: The shift in data acquisition paradigm.
"Maybe we should start a company. This is actually working so well."
They had generated SOTA results with a $200k academic budget, outperforming capital-heavy approaches simply by solving the data bottleneck. The technology was ready to leave Stanford.
Bridging the Gap
We’ve explored the algorithms—ACT, ALOHA, and UMI—that allow machines to learn. But algorithms need a vessel. From a desk in a Stanford apartment to a growing team, the mission shifts from pure research to the messy reality of building Sunday: a robot designed not for a factory, but for your living room.
The Anti-Terminator.
Why the future of home robotics looks less like sci-fi chrome and more like a Pixar character.
Form Follows Friendliness
When designing a ubiquitous machine—something you see every morning while making coffee—industrial aesthetics fail. Sunday’s design philosophy rejects the "precise, stiff, blind" nature of factory robots.
"It should have a cute face... Instead of a Terminator doing your dishes, we want the robot to feel like it's out of a cartoon movie."
The Three-Finger Thesis: Why replicate the human hand perfectly? It’s costly and often unnecessary. By fusing fingers into a three-digit gripper, Sunday achieves 90% of human utility (grasping handles, opening dishwashers) at a fraction of the cost.
The Precision Paradox
Traditional robots are blind but mechanically perfect. Sunday flips the script.
- → Hardware: Cheap, compliant, "sloppy" actuators. Inherently safe but imprecise.
- → Software: High-fidelity AI Vision.
Result
"The eyes correct the body's mistakes."
The Hidden Grind: Iterating the Glove
Data Quality Moat
It looks easy to replicate, but the barrier to entry is the integration of hardware and data pipelines. The "Sunday Glove" (data collection device) underwent massive iteration to survive the creativity of 500+ human operators.
The Road to 2026
Now
Internal Prototyping & "Grinding out details"
2026
Beta Program Launch
Real homes. Real kids. Real chaos.
The 500-Person Fleet
We have the robot. We have the mission. Now, how do we feed it enough high-quality data to make it smart? →
From Hardware to Mind
Having established the physical philosophy and shipping timeline, the question remains: how does Sunday teach a machine to understand the chaos of the real world? The answer lies in a staggering volume of human motion.
The 10 Million Trajectory Brain
Data Scale
While academia often relies on isolated "pick up a cup" tasks, Sunday has collected nearly 10 million long-horizon trajectories in the wild—walking, navigating, and manipulating simultaneously.
The Great Methodology Divide
Not all robot brains are built the same. Sunday’s team discovered a counter-intuitive truth: the "Glove" form factor (a handheld imitation device) produces better data than rigid teleoperation rigs.
"The glove encourages people to do more natural movement... resulting in more intelligent behavior."
This creates a schism in robotics training:
- — Imitation Learning (The Glove): Superior for manipulation. Capturing the nuance of a hand grasping a soft cup is easier to demonstrate than to simulate.
- — Reinforcement Learning (RL): Superior for locomotion. Walking involves rigid body physics (feet hitting ground) which is easy to simulate, but the reactive behavior is hard to hand-code.
The Simulation Paradox
Why Sunday uses different brains for feet vs. hands
Manipulation requires modeling fluid dynamics and deformation (Hard Sim), whereas Locomotion requires complex reactive balance (Hard Behavior).
The Hidden Engineering Cost
Scale reveals fragility. At 10 million trajectories, you cannot manually check the data. Sunday had to build automated calibration systems to detect if a glove sensor is drifting before it pollutes the hive mind.
"You don't need a human staring at the data to know something is wrong."
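One common way to automate that kind of check is a rolling-statistics monitor: compare a sensor's recent readings against its historical baseline and flag outliers before the data enters training. The sketch below is an illustrative monitor under that assumption, not Sunday's actual calibration system; the function name and thresholds are hypothetical.

```python
import numpy as np

def drifting(readings, window=50, z_threshold=4.0):
    """Flag a sensor whose recent readings have drifted away from its
    historical baseline (illustrative monitor, thresholds arbitrary)."""
    readings = np.asarray(readings, dtype=float)
    baseline, recent = readings[:-window], readings[-window:]
    mu, sigma = baseline.mean(), baseline.std() + 1e-9
    z = abs(recent.mean() - mu) / sigma   # how far recent data has moved
    return z > z_threshold

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 0.1, 500)                            # stable sensor
faulty = np.concatenate([healthy[:-50], healthy[-50:] + 1.0])  # late offset
```

Here `drifting(healthy)` stays quiet while `drifting(faulty)` fires, which is exactly the "no human staring at the data" property: a drifting glove is quarantined automatically before it pollutes the shared dataset.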
Coming Up Next
With the data pipeline secured, we turn to the hard limits of reality: The Technical Challenges.
Bridging the Gap
Having established the sheer scale of Sunday’s training data, the conversation shifts to the crucible of execution. Data is the fuel, but the engine—the "training recipe" and the physical hardware—must now withstand the pressure of reality.
Hardware is Hard.
Why building a robot company requires avoiding "cute research" in favor of a brutal feedback loop.
Challenge 01
The Scaling Recipe
The field has just entered the realm of massive data availability. The challenge is no longer finding data, but defining the exact "training recipe" to extract robust behaviors from it.
"We want to avoid doing research that doesn't scale. No cute, fancy ideas. We focus on infrastructure first."
Challenge 02
The Hardware Friction
When the learning team pushes the software, the hardware breaks. This is why the "Full Stack" approach is non-negotiable.
- Mechanical Team builds hardware.
- Learning Team pushes performance envelope.
- Hardware fails -> Immediate internal iteration.
The Path to Ubiquity
2026: The Beta Program
Selected users receive real robots in their homes. The goal: understand human-robot interaction and define what chores are actually valuable.
2027 – 2028: Commercial Launch
Depending on beta results. High standards for reliability and cost. "It is definitely not a decade away."
"I just really hate dishes. The world we'll live in is going to be cleaner. The marginal cost of labor in homes goes to zero."
The Economics of Scale
Current Prototype vs. Target Consumer Price
Insight: The actuators are already cheap. The cost driver today is low-volume cladding (CNC/painting). Injection molding at scale brings material cost under $10k.
From Theory to Practice
Why past demos failed & Sunday's new approach →
Continuing the journey from technical hurdles to visible proof...
The Illusion of Competence: Why Most Robot Demos Lie to You.
Make zero assumptions.
No priors.
When you see a robot hand a cup to a person, human instinct fills in the gaps. We assume it can hand any cup to any person. Usually, it can only move that specific object to that specific coordinate.
The Complexity Ladder
Failure probability compounds with every interaction in a sequence.
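The compounding is easy to quantify: if each sub-task succeeds independently with probability p, a chore with n sub-tasks succeeds with probability p**n. The specific percentages below are illustrative, not measured figures from the episode.

```python
def sequence_success(p, n):
    """Probability an n-step chore completes, assuming each step
    succeeds independently with probability p (illustrative model)."""
    return p ** n

# A 99%-reliable skill chained over 30 steps finishes only ~74% of
# chores; at 90% per step, the 30-step chore almost never completes.
demo_grade = sequence_success(0.99, 30)   # ~0.74
lab_grade = sequence_success(0.90, 30)    # ~0.04
```

This is why a single impressive clip says little about long-horizon utility: per-step reliability must be extremely high before multi-minute chores become dependable.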
Sunday's "Real World" Trinity
Data-Driven Validation
1. The Messy Table (Long Horizon)
Cleaning a table isn't one task; it's a mobile manipulation symphony. Moving from high (table) to low (dishwasher), handling fragile glass, and disposing of food waste.
- Picking up two wine glasses in one hand (requires precise force).
- High stakes: Squeeze too hard, it shatters. Push wrong, it explodes.
2. Extreme Dexterity
Folding socks and operating espresso machines requires feeling, not just seeing.
"When you teleoperate, your hand is numb. You can apply infinite force and not know it."
Sunday's glove captures Force Closure—the loop of tactile feedback needed for soft items.
3. The AirBnB Test (Zero Shot)
The ultimate test of utility. The robot is dropped into a random AirBnB with zero training data from that specific home.
Continuing the Thread
Having moved past the failures of early demos and successfully showcasing Sunday's current capabilities, the conversation shifts to the engine behind this progress: the talent. Building a system that integrates hardware, software, and AI requires a specific, rare breed of engineer.
The Full Stack Roboticist
Redefining the Stack
Sunday isn't looking for specialists to stay in their lanes. They are defining a new archetype: the Full Stack Roboticist.
In traditional tech, "full stack" means database to frontend. In robotics, the stack is physical. To optimize the system, you cannot be boxed into a cubicle. You must traverse the boundaries between:
- Mechanical Engineering
- Electrical Systems
- Code & Logic
- Data Science
Multidisciplinary Convergence
"We train software engineers to become roboticists."
"I realized... the bottleneck is actually how the robot will move. That's programming. Then I realized there's machine learning... It's very natural for me to gradually expand my skill set because I'm always looking forward to build a robot."— Speaker 2, on the evolution of a roboticist
Why Join?
"Whatever you can imagine about robotics, consumer products, and machine learning, you can find it here."
Keep Listening
Enjoyed the deep dive? Subscribe to No Priors for weekly episodes on the frontier of AI and robotics.