An AI-powered database offers a model for extracting and structuring police records for public accessibility and ...
Abstract: We introduce RoadSocial, a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets limited by regional bias, ...
Abstract: Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS ...
The dataset, code, model, and benchmark are currently under review. Please stay tuned. The quality and diversity of instruction-based image editing datasets are continuously increasing, yet ...
This repository provides a PyTorch implementation of Unified World Model (UWM). UWM combines action diffusion and video diffusion to enable scalable pretraining on ...