
Sourcing the Robot Brain: Solving the Buy vs. Build Dilemma in Embodied AI

The era of training bespoke robotic control policies is ending. For robotics executives, the choice between building a proprietary Physical Intelligence model and licensing a foundation model will determine which companies remain solvent through the next decade.

The robotics industry is currently bleeding capital on a problem that has already shifted beneath its feet. For the last decade, the standard operating procedure for robotics companies was full-stack vertical integration: build the hardware, write the control logic, and train the vision systems. This approach is now a liability.


We are witnessing the commoditization of “robotic cognition.” With the emergence of Physical Intelligence (PI) foundation models—systems that understand physics, gravity, and manipulation as well as LLMs understand grammar—the competitive advantage of training your own small-scale model is vanishing. If you are a CTO or VP of Engineering in automation, you face a binary choice: continue burning millions on a “Build” strategy that yields inferior generalization, or pivot to a “Buy” strategy that risks vendor lock-in. This article dismantles the old logic and provides a decision framework for the era of Cortex-as-a-Service.


A robotic hand reaching out to grasp a glowing, digital neural network structure floating in mid-air.
GenAI Prompt: A hyper-realistic close-up of a mechanical robotic hand reaching towards a floating, ethereal digital brain. The brain is composed of blue and gold data streams. Dark industrial background. Cinematic lighting highlighting the gap between hardware and software.

1. The Regime Shift: The Death of Task-Specific Logic

For twenty years, robotics operated under the “Task-Specific Regime.” If you wanted a robot to fold laundry, you built a laundry-folding model. If you wanted it to weld, you built a welding model. The logic was brittle, hard-coded, or trained on narrow datasets (Reinforcement Learning from limited trials).


That era is over. We have entered the General Purpose Embodiment Regime.

Just as GPT-4 rendered custom NLP spam filters obsolete, Physical Intelligence Foundation Models (like those emerging from Google DeepMind, Covariant, or VILA) are rendering task-specific control policies obsolete. These new foundation models transfer learned skills across domains. A model that learns to manipulate a cup can infer how to manipulate a beaker zero-shot, with no additional training.


The shift is economic as much as it is technical. The capital expenditure required to train a state-of-the-art PI model—requiring petabytes of video and proprioceptive data—is rapidly outpacing the R&D budgets of even large robotics firms. The “Build” path is no longer just hard; it is becoming the exclusive domain of hyperscalers.


2. The Core Misbelief: “Our Data Is Our Moat”

The most dangerous lie currently circulating in boardroom strategy sessions is this: “We cannot buy a model because our proprietary sensor data is our competitive moat.”

This is a fundamental misunderstanding of how foundation models scale. In the era of small models, your specific dataset (e.g., warehouse bin picking logs) was indeed valuable. In the era of Foundation Models, your dataset is a drop in the ocean.

Physical Intelligence models thrive on diversity, not just depth. A model trained on millions of varied interactions (cooking, assembly, logistics, cleaning) will eventually outperform a model trained on billions of near-identical bin-picking interactions. Why? Because the generalist model builds a richer internal understanding of friction, contact, and geometry.


Unless you possess a fleet of 100,000 robots collecting diverse real-world data across heterogeneous environments, your data is not a moat. It is merely a calibration set. Holding onto the “Build” strategy to protect this data is like building your own search engine to protect your internal wikis.


3. The New Constraint: The Moravec Gap

If you choose to buy, you face a new constraint that didn’t exist in the pure software world: the Moravec Gap (an extension of Moravec’s Paradox).

LLMs can hallucinate without physical consequence. A PI model cannot. If a robot “hallucinates” a grasp, it breaks the product or the hardware. The constraint for the next five years isn’t intelligence; it is inference latency and reliability.

Foundation models are heavy. Running a multi-modal transformer that processes vision, language, and proprioception in real-time (sub-20ms loops) requires massive compute. The constraint is no longer “can the robot figure it out?” but “can the robot figure it out before the conveyor belt moves?”
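To make the latency constraint concrete, here is a minimal sketch of the standard workaround—a hierarchical loop. All names, timings, and the fake intent payload are illustrative assumptions, not any vendor’s API: the heavy “Cortex” inference runs asynchronously and publishes intents, while a fast local loop closes every 20ms on the last known intent and flags any deadline overrun.

```python
import threading
import time
import queue

# Hypothetical sketch: slow foundation-model inference ("Cortex") runs in a
# background thread; the inner control loop must close every 20 ms regardless.

CONTROL_PERIOD_S = 0.020  # 20 ms hard budget for the inner loop

intent_queue = queue.Queue(maxsize=1)

def cortex_worker():
    """Stand-in for remote foundation-model inference (hundreds of ms)."""
    while True:
        time.sleep(0.300)  # simulated API / heavy-transformer latency
        new_intent = {"target": "bin_3", "grasp": "pinch"}
        try:
            intent_queue.get_nowait()  # drop any stale intent
        except queue.Empty:
            pass
        intent_queue.put(new_intent)

def control_loop(cycles=50):
    """Fast local loop: always acts on the last known intent."""
    current_intent = {"target": None, "grasp": "hold"}
    missed = 0
    for _ in range(cycles):
        start = time.monotonic()
        try:
            current_intent = intent_queue.get_nowait()
        except queue.Empty:
            pass  # no fresh intent; keep executing the old plan
        # ... low-level kinematics / torque commands would run here ...
        elapsed = time.monotonic() - start
        if elapsed > CONTROL_PERIOD_S:
            missed += 1  # deadline overrun: unacceptable on real hardware
        time.sleep(max(0.0, CONTROL_PERIOD_S - elapsed))
    return missed

threading.Thread(target=cortex_worker, daemon=True).start()
print("missed deadlines:", control_loop())
```

The design choice is the point: the vendor’s model never sits inside the real-time loop. It only refreshes the plan the local controller is already executing, so API latency degrades plan freshness rather than safety.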


Companies opting for the “Buy” model must accept that they are trading control over the model architecture for a dependency on the provider’s API latency and edge-compute efficiency.

Visual representation of the Moravec Gap showing latency between cloud and edge.
GenAI Prompt: A split screen visualization. On the left, a massive glowing cloud server. On the right, a sleek industrial robot arm. A red laser beam connects them, labeled ‘Latency’. The robot arm is slightly blurred to indicate motion, while the cloud is static. High contrast, tech-noir style.

4. The New Mental Model: The “Cortex-Body Decoupling”

To navigate this decision, you need a new framework. Stop viewing the robot as a singular product. View it as a decoupled stack.

The Decoupling Framework

  • The Cortex (Buy/License): The high-level reasoning engine. It understands commands (“Pick up the red apple”) and physics. This is a commodity provided by OpenAI, Google, or specialized PI firms.
  • The Spinal Cord (Build/Fine-Tune): The mid-level controller. It translates the Cortex’s intent into specific kinematic movements for your specific hardware configuration. This is where your IP lives.
  • The Body (Commodity): Actuators, sensors, and chassis. Hardware is becoming standardized.

The Strategy: Buy the Cortex, Build the Spine.

Don’t try to train the model to understand what an apple is (Cortex). Train the model to know exactly how much torque your specific gripper needs to apply to not bruise the apple (Spine). This hybrid approach leverages the reasoning power of hyperscalers while retaining the reliability and hardware-specificity of internal engineering.
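A minimal sketch of what this split looks like in software, with every class, method, and calibration value hypothetical: the licensed Cortex returns a hardware-agnostic intent, and the in-house Spine translates it into torque limits calibrated for one specific gripper.

```python
from dataclasses import dataclass

# Hypothetical sketch of "Buy the Cortex, Build the Spine".
# All names and numbers are illustrative, not a real vendor API.

@dataclass
class CortexIntent:
    """High-level, hardware-agnostic output from a licensed foundation model."""
    action: str            # e.g. "grasp"
    object_class: str      # e.g. "apple"
    approach_pose: tuple   # (x, y, z) in the robot's base frame

class LicensedCortex:
    """Stand-in for a vendor API: language + vision in, intent out."""
    def plan(self, command: str) -> CortexIntent:
        # A real call would hit the vendor's endpoint; we fake the result.
        return CortexIntent(action="grasp", object_class="apple",
                            approach_pose=(0.42, 0.10, 0.05))

class ProprietarySpine:
    """Your IP: translate intent into commands for *your* gripper."""
    # Hardware-specific calibration table — this is the defensible part.
    MAX_TORQUE_NM = {"apple": 0.8, "beaker": 0.4, "default": 1.0}

    def execute(self, intent: CortexIntent) -> dict:
        torque = self.MAX_TORQUE_NM.get(intent.object_class,
                                        self.MAX_TORQUE_NM["default"])
        return {"action": intent.action,
                "pose": intent.approach_pose,
                "gripper_torque_nm": torque}

cortex = LicensedCortex()    # bought
spine = ProprietarySpine()   # built
command = spine.execute(cortex.plan("Pick up the red apple"))
print(command)
```

Note that the Cortex knows what an apple is; only the Spine knows that this gripper bruises apples above 0.8 Nm. Swapping foundation-model vendors means replacing one class, not retraining the stack.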


5. Signals from Reality

The market is already validating this split.

  • Figure 01 & OpenAI: Figure AI did not attempt to build the reasoning engine from scratch. They partnered with OpenAI for the visual-language reasoning, allowing them to focus entirely on the hardware dexterity and control loops (The Spine).
  • Covariant’s RFM-1: Covariant is positioning itself purely as the “Cortex” provider, selling the Robotics Foundation Model to hardware manufacturers who realize they cannot compete on data volume.
  • Tesla Optimus (The Exception): Tesla is the only outlier attempting the full “Build” stack. However, they have the “fleet constraint” solved—millions of cars collecting data. Unless you have Tesla’s scale, copying Tesla’s strategy is suicide.
Abstract diagram of the Cortex-Spine-Body framework.
GenAI Prompt: An isometric 3D exploded view of a robot. Top layer: A glowing sphere (The Cortex). Middle layer: A complex mechanical spinal column (The Spine). Bottom layer: Generic robot limbs (The Body). Arrows flow downwards. Clean white background, engineering blueprint aesthetic.

6. Risks & Trade-offs

Adopting the “Buy/License” model is not without peril. You trade CapEx risk (bankruptcy from training costs) for OpEx risk (margin erosion from API licensing fees).

Furthermore, you risk Skill Atrophy. If your engineering team stops training models, they lose the intuition required to debug them. When the foundation model fails, you are at the mercy of the vendor’s support ticket system. To mitigate this, successful firms maintain a “Shadow Build” capability—small, specialized models kept in reserve for critical failures or edge cases where the generalist model fails.
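One hedged sketch of the “Shadow Build” pattern, with all function names and thresholds illustrative: route requests to the licensed foundation model first, and fall through to the small in-house specialist on vendor outages or low-confidence plans.

```python
# Hypothetical "Shadow Build" fallback: prefer the licensed foundation
# model, but keep a small local specialist in reserve. Names are illustrative.

CONFIDENCE_FLOOR = 0.75

def foundation_model_grasp(observation):
    """Stand-in for the vendor API; may time out or return low confidence."""
    if observation.get("vendor_down"):
        raise TimeoutError("vendor API unreachable")
    return {"grasp": "pinch", "confidence": observation.get("conf", 0.9)}

def shadow_model_grasp(observation):
    """Small specialist kept in reserve: less general, but always local."""
    return {"grasp": "power", "confidence": 0.60, "source": "shadow"}

def plan_grasp(observation):
    try:
        plan = foundation_model_grasp(observation)
        if plan["confidence"] >= CONFIDENCE_FLOOR:
            plan["source"] = "foundation"
            return plan
    except TimeoutError:
        pass  # vendor outage: don't stop the line
    return shadow_model_grasp(observation)

print(plan_grasp({"conf": 0.9}))          # normal path: foundation model
print(plan_grasp({"vendor_down": True}))  # outage path: shadow model
```

Running the shadow model on a slice of live traffic, even when the vendor is healthy, is what keeps the team’s debugging intuition from atrophying.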


7. Executive Takeaway

The window to be a “Full Stack AI Robotics” company is closing. The future belongs to System Integrators of Intelligence.

If your company’s valuation is under $10 Billion, stop trying to build the brain. Focus on the hands. Focus on the integration. Focus on the application layer where the robot meets the real world. Secure a license for a Tier-1 Physical Intelligence model immediately, and redirect your R&D budget toward fine-tuning that model for your specific hardware kinematics. In the gold rush of embodied AI, the winners aren’t the ones inventing the shovel; they are the ones who know exactly where to dig.

