
Meta has announced a new AI model that can detect objects in pictures and videos even when those objects were not part of its training data. The model is called “Segment Anything,” and it lets users select items by clicking on them or by typing free-form text prompts. For example, typing “cat” will highlight all the felines in a given photo. The technology can also help reconstruct objects in 3D and can work from views captured by a mixed reality headset.
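To make the idea of a click prompt concrete, here is a minimal sketch of the interaction pattern: a click goes in and a mask of the selected item comes out. It is not Meta's model or its API. The function name `segment_from_click`, the `tolerance` parameter, and the tiny grayscale grid are all assumptions made for illustration, and the simple region-growing rule stands in for what a learned model would do.

```python
from collections import deque


def segment_from_click(image, click, tolerance=10):
    """Toy stand-in for click-prompted segmentation (hypothetical helper).

    Grows a region outward from the clicked pixel, keeping 4-connected
    neighbours whose brightness is within `tolerance` of the clicked pixel.
    Returns a mask of 0s and 1s with the same shape as `image`.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = click
    seed = image[r0][c0]

    mask = [[0] * cols for _ in range(rows)]
    mask[r0][c0] = 1
    queue = deque([(r0, c0)])

    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and abs(image[nr][nc] - seed) <= tolerance:
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask


# A 4x5 grayscale "photo" with one bright object on a dark background.
image = [
    [0,   0,   0, 0,  0],
    [0, 200, 205, 0,  0],
    [0, 198, 202, 0,  0],
    [0,   0,   0, 0, 90],
]

# "Click" on the object at row 1, column 1.
mask = segment_from_click(image, (1, 1))
for row in mask:
    print(row)
```

The printed mask marks only the four bright pixels around the click. A model like the one described here would produce a mask in the same spirit, but from learned features rather than a brightness threshold.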
Segment Anything is compatible with other models and can reduce the need for additional AI training. The model and a dataset are available for download with a non-commercial license, primarily for research purposes. Currently, Meta uses similar technology to moderate banned content, recommend posts, and tag photos.
While the model can handle prompts in real time, it may miss finer details and is not as accurate at detecting boundaries as some other models. It may also struggle when demanding image processing is involved. Despite these limitations, models like Segment Anything can help in situations where relying exclusively on training data is impractical. Social networks, for example, can use this technology to keep up with a rapidly growing volume of content.
Meta has a history of sharing AI breakthroughs, such as a translator for unwritten languages. The company also plans to create generative AI “personas” for its social apps. Segment Anything shows that Meta is a powerhouse in AI and demonstrates the company’s desire to generalize computer vision. This technology could give Meta an advantage over tech giants such as Google and Microsoft.
In conclusion, Meta’s “Segment Anything” AI model is a significant advancement in computer vision. It can detect objects it was not specifically trained on and can work alongside other models. While it has limitations, it can help in situations where relying solely on training data is impractical. The model and dataset are available for research purposes, which demonstrates Meta’s commitment to advancing AI.