1 result found

VLM-driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for que...
Created 2024-09-13
10 commits to main branch, last one 2 months ago