19 results found
Primary language:
- Python (9)
- TypeScript (3)
- Jupyter Notebook (2)
- Go (1)
- HTML (1)
- JavaScript (1)
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Created 2023-12-20
30 commits to main branch, last one 11 months ago
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Created 2024-01-26
161 commits to main branch, last one 7 days ago
Control Any Computer Using LLMs.
Created 2024-01-25
166 commits to main branch, last one a day ago
Vision utilities for web interaction agents 👀
Created 2023-11-09
289 commits to main branch, last one 4 months ago
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
Created 2025-01-07
8 commits to main branch, last one about a month ago
Lightweight GPT-4 Vision processing over the webcam
Created 2023-11-07
17 commits to main branch, last one about a year ago
Prompts for GPT-4V and DALL-E3 to fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
Created 2023-09-30
44 commits to main branch, last one about a year ago
Draw your projects to life
Created 2023-11-08
60 commits to main branch, last one about a year ago
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Created 2024-06-06
3 commits to master branch, last one 7 months ago
Convert different model APIs into the OpenAI API format out of the box.
Created 2023-12-22
52 commits to main branch, last one 12 months ago
GPT-4V in Wonderland: LMMs as Smartphone Agents
Created 2023-11-13
7 commits to main branch, last one 7 months ago
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Created 2024-01-26
9 commits to main branch, last one about a year ago
The ultimate sketch-to-code app, built with GPT-4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It will instantly generate code and preview (sandb...
Created 2023-11-19
95 commits to main branch, last one 9 months ago
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
Created 2023-11-06
9 commits to main branch, last one about a year ago
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
Created 2024-05-08
19 commits to master branch, last one 9 months ago
Video Voiceover with gpt-4o-mini
Created 2023-11-12
11 commits to main branch, last one 4 months ago
Monitor the performance of OpenAI's o3-mini model over time.
Created 2023-11-14
601 commits to main branch, last one 14 hours ago
Mark web pages for use with vision-language models
Created 2024-04-29
87 commits to main branch, last one about a month ago
This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion...
Created 2024-03-05
66 commits to main branch, last one 3 months ago