object detection Archives

Winsage

April 25, 2025

Microsoft touts AI Dev Gallery for Windows

Microsoft has launched the AI Dev Gallery, an open-source application that helps Windows developers integrate AI features into their projects. First introduced as a concept in December 2024, it was officially showcased on April 22. The app provides sample applications, model downloads, and exportable source code, and is available in preview from the Microsoft Store. Developers can experiment with AI applications offline through a range of interactive samples, including Retrieval-Augmented Generation, chat interfaces, object detection, text-to-speech and speech-to-text conversion, and document summarization and analysis, all designed to run locally on the developer's machine.

AppWizard

April 2, 2025

Google made a dashcam app for cars with Android Automotive, but you can’t download it

Google has launched a dashcam application for Android Automotive that lets vehicles with built-in cameras record their surroundings. The app is open source and is meant to be integrated by car manufacturers, filling a gap for the many vehicles that lack a native dashcam feature. It is designed to work with existing camera hardware and requires system-level permissions to be integrated into the vehicle's operating system. Recordings are stored in the vehicle's internal storage, but Google recommends saving them to external removable storage to reduce wear on internal components. Manufacturers can customize various parameters of the app, including storage allocation and the user interface.

AppWizard

October 28, 2024

ROCKET-1 mines diamonds in Minecraft by seeing and tracking objects in real time

A team of researchers has developed ROCKET-1, a method that improves the precision of AI agents in virtual environments such as Minecraft by combining object detection and tracking with advanced AI models. The technique, called "visual-temporal context prompting," enhances interaction capabilities without relying on traditional language or diffusion models. ROCKET-1 operates within a hierarchical structure in which GPT-4o handles planning, Molmo identifies objects, and SAM-2 tracks and masks them in real time. The system was trained on OpenAI's "Contractor" dataset, which contains 1.6 billion images of human gameplay. To label this data, SAM-2 analyzes gameplay in reverse, identifying and marking the objects that players interacted with. ROCKET-1 has demonstrated high success rates across various tasks, reaching up to 100 percent in seven of them. It struggles, however, with objects that are outside its field of view or that it has not encountered before, and handling these cases requires additional computational effort.
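
The reverse labeling step described above can be illustrated with a toy sketch. This is not the researchers' code and it does not call SAM-2: the `Frame` class, the `relabel_backward` function, and the `max_lookback` parameter are all hypothetical names made up for this example, and a set of visible object names stands in for real image masks. The idea shown is only the general one: start at the frame where the player interacts with an object, then walk backward through earlier frames and mark that object for as long as it can still be followed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Frame:
    """Toy stand-in for one gameplay frame (hypothetical, not a real dataset schema)."""
    visible: Set[str]                  # names of objects visible in the frame
    interaction: Optional[str] = None  # object the player interacted with, if any


def relabel_backward(frames: List[Frame], max_lookback: int = 3) -> Dict[int, str]:
    """Map frame index -> object to mark, found by walking backward from interactions.

    A real pipeline would propagate a segmentation mask with a video tracker;
    here "tracking" just means the object is still listed as visible.
    """
    labels: Dict[int, str] = {}
    for t, frame in enumerate(frames):
        target = frame.interaction
        if target is None:
            continue
        # Step backward from the interaction frame, at most max_lookback frames.
        for k in range(t, max(t - max_lookback, 0) - 1, -1):
            if target not in frames[k].visible:
                break  # the toy tracker has lost the object
            # Keep an existing label: it belongs to a nearer upcoming interaction.
            labels.setdefault(k, target)
    return labels


gameplay = [
    Frame(visible={"tree"}),
    Frame(visible={"tree", "cow"}),
    Frame(visible={"tree", "cow"}),
    Frame(visible={"cow"}, interaction="cow"),
    Frame(visible={"tree"}),
    Frame(visible={"tree"}, interaction="tree"),
]

labels = relabel_backward(gameplay)
for index in sorted(labels):
    print(index, labels[index])
```

In this toy run, frames 1 to 3 are marked with the cow and frames 4 and 5 with the tree, while frame 0 stays unlabeled because the cow is not yet visible there.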