Thinking Machines Lab Inc., the AI research startup founded by former OpenAI CTO Mira Murati, has announced a research preview of its first 'interaction models.' These multimodal AI systems are designed to eliminate the awkward pauses common in current turn-based AI interactions.
The core innovation is a new model architecture enabling full-duplex communication. Instead of waiting for a user to finish speaking before processing a response, the system processes input and output in tiny 200-millisecond chunks. This allows it to react to visual and auditory cues in real time, even while speaking.
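To make the chunked, full-duplex idea concrete, here is a minimal sketch of a loop that reads one 200-millisecond input chunk per step and decides in that same step whether to keep talking or stop and listen. It is an illustration under stated assumptions, not Thinking Machines' implementation: the `Chunk` type, the `user_speaking` flag (standing in for raw audio and video), and the `duplex_loop` function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

CHUNK_MS = 200  # chunk length reported in the article


@dataclass
class Chunk:
    start_ms: int
    user_speaking: bool  # hypothetical stand-in for raw audio/video content


def make_chunks(flags: Iterable[bool]) -> List[Chunk]:
    """Build consecutive 200 ms chunks from a list of speaking flags."""
    return [Chunk(i * CHUNK_MS, flag) for i, flag in enumerate(flags)]


def duplex_loop(chunks: Iterable[Chunk], reply: List[str]) -> List[str]:
    """Emit one event per chunk, checking input even while a reply is pending."""
    pending = list(reply)
    events = []
    for chunk in chunks:
        if chunk.user_speaking:
            # Input is inspected every chunk, so an interruption pauses
            # the reply within one chunk instead of at the end of a turn.
            events.append("listen")
        elif pending:
            events.append("say:" + pending.pop(0))
        else:
            events.append("idle")
    return events


if __name__ == "__main__":
    flags = [True, False, False, True, False]
    print(duplex_loop(make_chunks(flags), ["a", "b", "c"]))
```

In this toy version the reply simply resumes after the interruption; a real system would presumably re-plan what to say based on what it just heard.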
The system uses a dual-model approach. TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts model, handles dialogue and immediate follow-ups. An asynchronous Background Model manages complex reasoning, web searches, and tool calls. The company uses encoder-free early fusion to process raw audio and video directly, which it says reduces latency.
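The division of labor between a fast conversational model and a slower asynchronous one can be sketched with Python's standard `asyncio` library. This is only a rough analogy, assuming the foreground keeps the exchange alive while a background task finishes; `background_task` and `converse` are hypothetical names, and the sleep calls stand in for real reasoning, search, or tool calls.

```python
import asyncio
from typing import List


async def background_task(query: str) -> str:
    # Hypothetical stand-in for slow reasoning, web search, or tool calls.
    await asyncio.sleep(0.05)
    return f"result for {query!r}"


async def converse(query: str) -> List[str]:
    transcript = []
    # Start the slow work without blocking the dialogue.
    task = asyncio.create_task(background_task(query))
    # The foreground responds immediately instead of waiting in silence.
    transcript.append("foreground: Let me check that.")
    while not task.done():
        await asyncio.sleep(0.01)  # one foreground tick; dialogue could continue here
    # Once the background work finishes, its result is folded into the conversation.
    transcript.append("foreground: " + task.result())
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse("latest safety rules")):
        print(line)
```

The point of the pattern is that the user hears an acknowledgment right away, and the heavier answer arrives when it is ready.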
Thinking Machines claims TML-Interaction-Small achieves a turn-taking latency of under 0.4 seconds on the FD-bench benchmark, outperforming Google's Gemini-3.1-flash-live (0.57 seconds) and GPT-realtime-2.0 (1.18 seconds).
Enterprise applications are a key focus. Models that see and react in real time could monitor video feeds in labs or factories, alerting humans to safety violations instantly. The model also has an internal sense of time, allowing for time-sensitive requests like 'alert me if this chemical reaction takes longer than the last one.'
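The 'longer than the last one' request boils down to comparing elapsed time against a remembered baseline. The sketch below shows that logic in isolation, assuming timestamps are supplied by the caller; the `DurationWatch` class is a hypothetical illustration and says nothing about how the model tracks time internally.

```python
from typing import Optional


class DurationWatch:
    """Alert once when the current run outlasts the previous run's duration."""

    def __init__(self, baseline_s: float) -> None:
        self.baseline_s = baseline_s  # duration of the previous run, in seconds
        self.started_at: Optional[float] = None
        self.alerted = False

    def start(self, now_s: float) -> None:
        """Mark the beginning of the run being watched."""
        self.started_at = now_s
        self.alerted = False

    def tick(self, now_s: float) -> bool:
        """Return True exactly once, on the first tick past the baseline."""
        if self.started_at is None or self.alerted:
            return False
        if now_s - self.started_at > self.baseline_s:
            self.alerted = True
            return True
        return False


if __name__ == "__main__":
    watch = DurationWatch(baseline_s=1.0)  # the last reaction took 1.0 s
    watch.start(now_s=0.0)
    for t in (0.8, 1.2, 1.4):
        print(t, watch.tick(t))
```

Passing the clock in as an argument keeps the example deterministic; a live system would read the time from its own input stream.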
Thinking Machines says TML-Interaction-Small is currently available only to select partners, with a public release expected later this year.