Chinese smartphone maker Oppo has unveiled X-OmniClaw, an open-source AI agent framework for Android that runs directly on your device rather than on cloud servers. Unlike mobile AI systems that operate on virtual Android instances in the cloud, X-OmniClaw uses your phone's own hardware (camera, microphone, and screen) for real-time context awareness.

- Figure 1 -

The framework is built on three core pillars: Omni Perception, Omni Action, and Omni Memory. Omni Perception combines camera feeds, screen content, and voice input into a single pipeline, so the agent can understand both what you're looking at and what's on your screen. Omni Action handles execution, using XML interface data and an on-device visual model to tap and scroll through apps accurately. Omni Memory maintains continuity across tasks and sessions and builds a long-term semantic memory from your photo gallery.
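To make the XML-driven part of Omni Action more concrete, here is a minimal Python sketch of how an agent might turn interface data into a tap coordinate. It is not taken from X-OmniClaw: the article does not specify the XML schema, so the sample below assumes a dump in the style of Android's `uiautomator dump` output (nested `node` elements with `text`, `clickable`, and `bounds` attributes). `SAMPLE_DUMP` and `find_tap_point` are hypothetical names used only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical UI dump, modeled on the `uiautomator dump` attribute style.
# The real X-OmniClaw interface data may look different.
SAMPLE_DUMP = """<hierarchy>
  <node text="" clickable="false" bounds="[0,0][1080,2400]">
    <node text="Search" clickable="true" bounds="[40,120][1040,240]" />
    <node text="Cart" clickable="true" bounds="[880,2200][1040,2360]" />
  </node>
</hierarchy>"""

# Bounds are written as "[left,top][right,bottom]" in screen pixels.
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_tap_point(dump_xml, label):
    """Return the (x, y) center of the first clickable node whose text
    matches `label` (case-insensitive), or None if there is no match."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        if node.get("text", "").lower() != label.lower():
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # Skip nodes with missing or malformed bounds.
        left, top, right, bottom = map(int, match.groups())
        return ((left + right) // 2, (top + bottom) // 2)
    return None


print(find_tap_point(SAMPLE_DUMP, "Search"))  # (540, 180)
```

A structured lookup like this only works when the target element is exposed in the interface data; the on-device visual model mentioned in the article would presumably cover cases where it is not.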

- Figure 2 -

Practical applications include identifying a product via camera and opening a shopping app to search for prices, or scanning the gallery to find parrot-themed photos and using a deeplink to open CapCut for video editing. A behavior cloning feature allows users to record a navigation path once, which the agent can then replay instantly.
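The record-once, replay-later idea behind the behavior cloning feature can be sketched in a few lines of Python. This is an illustrative assumption about the general technique, not Oppo's implementation: `Step`, `PathRecorder`, and the action names are hypothetical, and a real agent would send each step to the device instead of appending it to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded UI action, e.g. a tap at (x, y) or a scroll by (dx, dy)."""
    action: str
    args: Tuple[int, ...]


@dataclass
class PathRecorder:
    """Stores a navigation path so it can be replayed later."""
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, *args: int) -> None:
        # Append the action in the order the user performed it.
        self.steps.append(Step(action, tuple(args)))

    def replay(self, execute: Callable[[Step], None]) -> int:
        # Run every recorded step through the supplied executor, in order,
        # and report how many steps were replayed.
        for step in self.steps:
            execute(step)
        return len(self.steps)


# Usage: record a path once, then replay it through a stand-in executor.
recorder = PathRecorder()
recorder.record("tap", 540, 180)
recorder.record("scroll", 0, -600)

performed: List[Step] = []
recorder.replay(performed.append)
```

Passing the executor in as a callable keeps the recorded path independent of how actions are actually performed, which makes the same recording usable against a test double or a real device driver.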

- Figure 3 -

X-OmniClaw builds on the open-source HermesApp codebase and draws inspiration from OpenClaw's structured skill model. Oppo plans to release all assets and continue updating the project.