Statistics for topic vision
RepositoryStats tracks 633,121 GitHub repositories, of which 207 are tagged with the vision topic. The most common primary language for repositories using this topic is Python (82). Other languages include Jupyter Notebook (29), Swift (22), C++ (13), and TypeScript (11).
Stargazers over time for topic vision
Most starred repositories for topic vision
Trending repositories for topic vision
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message searc...
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens.
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
In this short tutorial you will learn how to set a camera feed capture in a SwiftUI app.
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this reposito...
Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
Convert PowerPoint files into semantically rich text using vision language models
Automate browser-based workflows with LLMs and Computer Vision
Create your custom OpenCV algorithms using a user-friendly node editor interface, inspired by Blender and Unreal Engine blueprints! Quickly prototype your vision using live previews as you edit.