Here is a new brain model for humanoid robots that handles multi-agent task planning, spatial reasoning, and closed-loop execution. It uses interactive reasoning for long-horizon planning and accepts multi-image, long-video, and high-resolution visual inputs. These inputs are fed into an LLM decoder, which performs long chain-of-thought reasoning over them.
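To make "closed-loop execution" concrete, the following is a minimal sketch of a plan, act, observe loop. It is not the model's actual interface: `Observation`, `run_closed_loop`, `plan_fn`, and `act_fn` are hypothetical names introduced only for illustration, and the planner here is a toy stub standing in for the model's reasoning step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Hypothetical container for visual inputs (e.g. image or video frame IDs)."""
    frames: List[str]
    done: bool = False


def run_closed_loop(
    plan_fn: Callable[[Observation, List[str]], str],
    act_fn: Callable[[str], Observation],
    first_obs: Observation,
    max_steps: int = 10,
) -> List[str]:
    """Re-plan after every action until the task is done or the step budget runs out."""
    obs = first_obs
    history: List[str] = []
    for _ in range(max_steps):
        if obs.done:
            break
        step = plan_fn(obs, history)  # stand-in for the model's reasoning
        obs = act_fn(step)            # execute the step and observe the result
        history.append(step)
    return history


# Toy stubs (assumptions): the planner numbers its steps,
# and the environment reports success after the third step.
def toy_plan(obs: Observation, history: List[str]) -> str:
    return f"step-{len(history) + 1}"


def toy_act(step: str) -> Observation:
    return Observation(frames=[f"frame-after-{step}"], done=(step == "step-3"))


history = run_closed_loop(toy_plan, toy_act, Observation(frames=["frame-0"]))
print(history)
```

The point of the sketch is that the planner sees a fresh observation before each decision, so a failed or unexpected action can change the next step instead of being discovered only at the end of a fixed plan.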