Trending repositories for topic nlp
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Bringing BERT into modernity via both architecture changes and scaling
500 AI Machine learning Deep learning Computer vision NLP Projects with code
Large Concept Models: Language modeling in a sentence representation space
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
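A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheap to train, covering base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.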
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. W...
🚀 An open-source SQL AI (Text-to-SQL) Agent that empowers data and product teams to chat with their data. 🤘
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
AdalFlow: The library to build & auto-optimize LLM applications.
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
SaprotHub: Making Protein Modeling Accessible to All Biologists
This repo aims to record resources on role-playing abilities in LLMs, including datasets, papers, applications, etc.
Steering vectors for transformer language models in PyTorch / Hugging Face
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
This is a continuously updated handbook that helps readers easily track the latest NL2SQL (Text-to-SQL) techniques in the literature and provides practical guidance for researchers and practitioners.
Building an assistant for the Boletín Oficial del Estado (BOE) using Retrieval-Augmented Generation (RAG)
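Text classification and sentiment analysis of Douban movie reviews: reviews are scraped from Douban with a crawler, cleaned, and tokenized, then used to train BERT, CNN, LSTM, and other models, with tensorboardX visualizing the training process. An NLP project for text classification, based on torch 1.7.1.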
A compute framework for building Search, RAG, Recommendations and Analytics over complex structured & unstructured data.
Awesome-llm-role-playing-with-persona: a curated list of resources for large language models for role-playing with assigned personas
Home of the AI workforce - Multi-agent system, AI agents & tools
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
Large Language Model Text Generation Inference
A curated list of LLM research and applications in education.
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
Clips AI is an open-source Python library that automatically converts long videos into clips.
The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation"
Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs, employs Iterative Active Learning for continuous improveme...
PyTorch/HuggingFace Implementation of URLTran: Improving Phishing URL Detection Using Transformers
🚀 A list of Haystack Integrations, maintained by the community or deepset.
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
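Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing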
CEO (ceo-py) is an intuitive and modular AI agent framework for task automation.
Natively pre-trained open-source Portuguese language models.
Mongolian spellchecking dictionary
VerifAI: an initiative to build an open-source, easy-to-deploy generative question-answering engine that can reference and verify answers for correctness (using a posteriori model)
List of ML conferences with important dates and accepted paper list
Explore a comprehensive collection of resources, tutorials, papers, tools, and best practices for fine-tuning Large Language Models (LLMs). Perfect for ML practitioners and researchers!
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
A reading list for large-model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Unified, efficient fine-tuning of RAG retrieval models, including embedding, ColBERT, and reranker models.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
📺 Discover the latest machine learning / AI courses on YouTube.
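👮‍♂️ The sensitive word tool for Java. (Sensitive words, prohibited words, illegal words, and profanity. A high-performance Java sensitive-word filtering framework implemented with the DFA algorithm. Please do not post content involving politics, advertising, marketing, circumventing the firewall, or violations of national laws and regulations. A high-performance sensitive-word detection and filtering component that also offers traditional/simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and more.)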
Phase 3 of the Chinese LLaMA & Alpaca large model project (Chinese Llama-3 LLMs), developed from Meta Llama 3
Superfast AI decision making and intelligent processing of multi-modal data.
Text analytics for LLM apps. Cluster messages to detect use cases, outliers, power users. Detect intents and run evals with LLM (OpenAI, MistralAI, Ollama, etc.)
ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
A package for parsing PDFs and analyzing their content using LLMs.
A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
A JAX-based library for designing and training transformer models from scratch.
Groqqle is a powerful web search and content summarization tool built with Python, leveraging Groq's LLM API for advanced natural language processing. It offers customizable web and news searches, ima...
1st Place Solution for LLM - Detect AI Generated Text Kaggle Competition