Trending repositories for language Python
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
ASCII generator (image to text, image to image, video to video)
RAG that intelligently adapts to your use case, data, and queries
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/Docker
The official gpt4free repository | a varied collection of powerful language models
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI ...
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
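openai-captcha-detection is a tool that uses OpenAI for CAPTCHA recognition. Its stated CAPTCHA recognition accuracy is currently 100%. By calling the OpenAI API, the project can recognize the text in complex CAPTCHA images, helping developers automate CAPTCHA-handling scenarios.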
BiomedParse: A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities
A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.
Official repository for the Boltz-1 biomolecular interaction model
cuEquivariance is a math library that is a collection of low-level primitives and tensor ops to accelerate widely-used models, like DiffDock, MACE, Allegro and NEQUIP, based on equivariant neural netw...
RAG that intelligently adapts to your use case, data, and queries
A passive recording project that gives you complete control over your data.
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
ASCII generator (image to text, image to image, video to video)
A study that aims to transfer a single concept using the self-attention capability of the DIT model
Bypasses pay-walls and scrapes all the paid content on a creator's page.
ComfyUI API Integration, ComfyUI Automated Workflow, ComfyUI API one-click integration, ComfyUI automated workflows
Welcome to Hercules, the world's first open-source testing agent that's here to lift your testing burdens with the strength of a mythological hero.
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
MailSecOps is an email and mail gateway security testing tool. With this script, you can perform mail spoofing, relay tests and security checks for a specific domain. The tool also helps to verify SPF...
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
ASCII generator (image to text, image to image, video to video)
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/Docker
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
RAG that intelligently adapts to your use case, data, and queries
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI ...
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications.
BiomedParse: A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities
A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.
A study that aims to transfer a single concept using the self-attention capability of the DIT model
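🎬 卡卡字幕助手 (Kaka Subtitle Assistant) | VideoCaptioner - an LLM-based intelligent subtitle assistant that produces high-quality subtitled videos in one click, with no GPU required! Supports the full workflow of generation, sentence segmentation, optimization, and translation. Makes video subtitling simple and efficient!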
Python and TypeScript library for integrating the Stripe API into agentic workflows
Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers".
ComfyUI API Integration, ComfyUI Automated Workflow, ComfyUI API one-click integration, ComfyUI automated workflows
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
Fortinet Fortimanager Unauthenticated Remote Code Execution AKA FortiJump CVE-2024-47575
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
The first AI agent that builds third-party integrations through reverse engineering platforms' internal APIs.
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
RAG that intelligently adapts to your use case, data, and queries
The first base model for full-duplex conversational audio
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
Automate browser-based workflows with LLMs and Computer Vision
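📺 IPTV live TV source update tool 🚀: includes 💰 CCTV, 📡 satellite TV, ☘️ Guangdong and other provincial local channels, 🌊 Hong Kong, Macau, and Taiwan, 🎬 movies, 🎥 Migu, 🏀 sports, 🪁 animation, 🎮 games, 🎵 music, 🏛 classic theater; supports adding custom channels; supports multicast sources, hotel sources, subscription sources, and keyword search; updates automatically twice a day, with results usable in players such as TVBox; supports running via workflow, Docker (amd64/arm64), command line, or GUI | IPTV live TV sourc...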
Auto_Jobs_Applier_AI_Agent by AIHawk is an AI Agent that automates the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and per...
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
An open-source RAG-based tool for chatting with your documents.
A natural language interface for computers
Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
A flexible framework powered by ComfyUI for generating personalized Nobel Prize images.
Fast and accurate automatic speech recognition (ASR) for edge devices
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
The first base model for full-duplex conversational audio
Official code for paper: Chain of Ideas: Revolutionizing Research via Novel Idea Development with LLM Agents
An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications.
Free and source-available Apache 2.0 licensed lightweight workflow automation tool.
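🎬 卡卡字幕助手 (Kaka Subtitle Assistant) | VideoCaptioner - an LLM-based intelligent subtitle assistant that produces high-quality subtitled videos in one click, with no GPU required! Supports the full workflow of generation, sentence segmentation, optimization, and translation. Makes video subtitling simple and efficient!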
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Auto_Jobs_Applier_AI_Agent by AIHawk is an AI Agent that automates the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and per...
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. D...
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
Generate high-definition short videos with one click using large AI models.
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Real-time face swap and one-click video deepfake with only a single image
An opinionated list of awesome Python frameworks, libraries, software and resources.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
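Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant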
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
A collection of learning resources for curious software engineers
Auto_Jobs_Applier_AI_Agent by AIHawk is an AI Agent that automates the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and per...
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
Generate high-definition short videos with one click using large AI models.
The #1 open-source voice interface for desktop, mobile, and ESP32 chips.
Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and opensource models.
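YAYI 2 is a new-generation open-source large language model developed by 中科闻歌, pretrained on more than 2 trillion tokens of high-quality, multilingual corpus data. (Repo for YaYi 2 Chinese LLMs)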
Start building LLM-empowered multi-agent applications in an easier way.
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.