Category | Component | Owner | Closed-source or Proprietary | OSS License | Commercial Use | Model Size (B) | Release Date | Code/Paper | Stars | Description
Multi-Modal | ImageBind | Meta | License | No | Github | 5.9k | ImageBind: One Embedding Space To Bind Them All
Image | DeepFloyd IF | stability.ai | Github | 6.4k | text-to-image model with a high degree of photorealism and language understanding | |||||
Stable Diffusion Version 2 | stability.ai | MIT, unknown | Github | 23.5k | High-Resolution Image Synthesis with Latent Diffusion Models
DALL-E | OpenAI | Modified MIT | Yes | Github | 10.3k | PyTorch package for the discrete VAE used for DALL·E. | ||||
DALL·E 2 | OpenAI | Yes | product
DALLE2-pytorch | lucidrains | MIT | Yes | Github | 9.7k | Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch | ||||
Speech | Whisper | OpenAI | MIT | Yes | Github | 37.7k | Robust Speech Recognition via Large-Scale Weak Supervision | |||
MMS | Meta | Yes | paper | |||||||
Code model | Codex | OpenAI | Yes | 12 | 2021/7/1 | Paper | ||||
AlphaCode | 41 | Feb 2022 | Competition-Level Code Generation with AlphaCode | |||||||
StarCoder | BigCode | No | Apache | 15 | May 2023 | Github | 4.8k | Language model (LM) trained on source code and natural language text
CodeGen | Salesforce | No | ? | Github | 3.6k | model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. | ||||
Replit Code | replit | 3 | May 2023 | replit-code-v1-3b model is a 2.7B LLM trained on 20 languages from the Stack Dedup v1.2 dataset. | ||||||
CodeGen2 | Salesforce | BSD | Yes | 1, 3, 7, 16 | May 2023 | Github | Code models for program synthesis. | |||
CodeT5 and CodeT5+ | Salesforce | BSD | Yes | 16 | May 2023 | CodeT5 | CodeT5 and CodeT5+ models for Code Understanding and Generation from Salesforce Research. | |||
Language model |
GPT | June 2018 | GPT | Improving Language Understanding by Generative Pre-Training | ||||||
BERT | Oct 2018 | BERT | Bidirectional Encoder Representations from Transformers | |||||||
RoBERTa | 0.125 - 0.355 | July 2019 | RoBERTa | A Robustly Optimized BERT Pretraining Approach | ||||||
GPT-2 | 1.5 | Nov 2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | ||||||
T5 | 0.06 - 11 | Oct 2019 | Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ||||||
XLNet | Jun 2019 | XLNet | Generalized Autoregressive Pretraining for Language Understanding
ALBERT | 0.235 | Sep 2019 | ALBERT | A Lite BERT for Self-supervised Learning of Language Representations | ||||||
CTRL | 1.63 | Sep 2019 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | ||||||
GPT-3 | Azure | Yes | 175 | May 2020 | Paper | Language Models are Few-Shot Learners
GShard | 600 | Jun 2020 | Paper | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | ||||||
BART | Jul 2020 | BART | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | |||||||
mT5 | 13 | Oct 2020 | mT5 | mT5: A massively multilingual pre-trained text-to-text transformer | ||||||
PanGu-α | 13 | April 2021 | PanGu-α | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | ||||||
CPM-2 | 198 | Jun 2021 | CPM | CPM-2: Large-scale Cost-effective Pre-trained Language Models | ||||||
GPT-J 6B | EleutherAI | No | Yes | 6 | June 2021 | GPT-J-6B | A 6 billion parameter, autoregressive text generation model trained on The Pile. | |||
ERNIE 3.0 | Baidu | Yes | 10 | July 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | |||||
Jurassic-1 | 178 | Aug 2021 | Jurassic-1: Technical Details and Evaluation | |||||||
ERNIE 3.0 Titan | 260 | Dec 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
HyperCLOVA | 82 | Sep 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | |||||||
FLAN | 137 | 2021/10/1 | Paper | Finetuned Language Models Are Zero-Shot Learners | ||||||
GPT-3.5 | Azure | Yes |
GPT-4 | Azure | Yes | 2023/3/1
T0 | 11 | Oct 2021 | T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization | ||||||
Yuan 1.0 | 245 | Oct 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | |||||||
WebGPT | 175 | Dec 2021 | WebGPT: Browser-assisted question-answering with human feedback | |||||||
Gopher | 280 | Dec 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |||||||
GLaM | 1200 | Dec 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | |||||||
LaMDA | Bard | Yes | 137 | Jan 2022 | Paper | LaMDA: Language Models for Dialog Applications | ||||
MT-NLG | 530 | Jan 2022 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | |||||||
InstructGPT | 175 | Mar 2022 | Training language models to follow instructions with human feedback | |||||||
Chinchilla | 70 | Mar 2022 | Shows that, for a fixed compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data.
GPT-NeoX-20B | 20 | April 2022 | GPT-NeoX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | ||||||
Tk-Instruct | 11 | April 2022 | Tk-Instruct-11B | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | ||||||
PaLM | Yes | 540 | April 2022 | PaLM: Scaling Language Modeling with Pathways
OPT | Meta | No | Yes | 175 | May 2022 | OPT-13B, OPT-66B, Paper | OPT: Open Pre-trained Transformer Language Models
OPT-IML | 30, 175 | Dec 2022 | OPT-IML | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ||||||
GLM-130B | 130 | Oct 2022 | GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model | ||||||
AlexaTM | 20 | Aug 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | |||||||
Flan-T5 | 11 | Oct 2022 | Flan-T5-xxl | Scaling Instruction-Finetuned Language Models | ||||||
Sparrow | 70 | Sep 2022 | Improving alignment of dialogue agents via targeted human judgements | |||||||
UL2 | 20 | Oct 2022 | UL2, Flan-UL2 | UL2: Unifying Language Learning Paradigms | ||||||
U-PaLM | 540 | Oct 2022 | Transcending Scaling Laws with 0.1% Extra Compute | |||||||
BLOOM | BigScience | No | Yes | 176 | Nov 2022 | BLOOM, Paper | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
mT0 | 13 | Nov 2022 | mT0-xxl | Crosslingual Generalization through Multitask Finetuning | ||||||
Galactica | 0.125 - 120 | Nov 2022 | Galactica | Galactica: A Large Language Model for Science | ||||||
ChatGPT | Nov 2022 | A model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
LLaMA | Meta | No | No | 7, 13, 33, 65 | 2023/2/1 | Paper, LLaMA | LLaMA: Open and Efficient Foundation Language Models
PanGu-Σ | Yes | 1085 | 2023/3/1 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
BloombergGPT | 50 | March 2023 | BloombergGPT: A Large Language Model for Finance | |||||||
Cerebras-GPT | Cerebras | No | Yes | 0.111 - 13 | 2023/3 | hf | Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster | |||
oasst-sft-1-pythia-12b | LAION-AI | No | Yes | 12 | 2023/3 | HF | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | |||
Pythia | EleutherAI | No | Yes | 0.070 - 12 | 2023/3 | A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters.
StableLM | Stability AI | No | No | 3, 7 | April 2023 | Github | Stability AI's StableLM series of language models
Dolly 2.0 | Databricks | No | Yes | 3, 7, 12 | 2023/4 | Dolly | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
DLite | 0.124 - 1.5 | 2023/5 | HF | Lightweight instruction following models which exhibit ChatGPT-like interactivity. | ||||||
MPT-7B | MosaicML | No | Apache 2 | Yes | 7 | 2023/5/5 | blog | a GPT-style model, and the first in the MosaicML Foundation Series of models. | ||
h2oGPT | 12 | 2023/5 | HF | h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document(s) question-answer capabilities. | ||||||
LIMA | 65 | 2023/5 | A 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. | |||||||
RedPajama-INCITE | 3, 7 | 2023/5 | HF | A family of models including base, instruction-tuned & chat models. | ||||||
Gorilla | 7 | 2023/5 | Gorilla | Gorilla: Large Language Model Connected with Massive APIs | ||||||
Med-PaLM 2 | 2023/5 | Towards Expert-Level Medical Question Answering with Large Language Models | ||||||||
PaLM 2 | 2023/5 | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. | ||||||||
Falcon LLM | 7, 40 | 2023/5 | 7B, 40B | foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. | ||||||
Claude | Anthropic | Yes | ||||||||
GPT-Neo | EleutherAI | No | Yes |
GPT-NeoX | EleutherAI | No | Yes | 20 | 2022/2/1 | Paper
FastChat-T5-3B | LMSYS | No | Apache | Yes | 2023/4
OpenLLaMA | openlm-research | No | Yes |
OpenChatKit | Together | No | Yes | |||||||
YaLM | Yandex | No | Yes | 100 | 2022/6/1 | Github
ChatGLM-6B | Tsinghua | No | ChatGLM-6B | No | 6 | 2023/3/1 | Github
Alpaca | Stanford | No | No | |||||||
Vicuna | No | No | 13 | 2023/3/1 | Blog | |||||
StableVicuna | No | No | ||||||||
RWKV-4-Raven-7B | BlinkDL | No | No |
Alpaca-LoRA | tloen | No | No |
Koala | BAIR | No | No | 13 | 2023/4/1 | Blog
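Many of the open-weight models listed in the table above (Pythia, GPT-NeoX, MPT-7B, Falcon, RedPajama-INCITE, and others) are published on the Hugging Face Hub and can be loaded with the transformers library. A minimal sketch, assuming the EleutherAI/pythia-70m checkpoint as a small, CPU-friendly example (substitute any open model from the table; some, such as MPT-7B, additionally require trust_remote_code=True):

```python
# Minimal sketch: load one of the open models from the table via Hugging Face transformers.
# The checkpoint id below is an assumption; swap in any open checkpoint from the table.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"  # smallest Pythia checkpoint, runs on CPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation of a short continuation.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The closed entries in the table (GPT-4, Claude, PaLM 2, and similar) are reachable only through their vendors' APIs or products rather than downloadable weights.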
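For the Speech row above (OpenAI's Whisper, MIT-licensed), a minimal transcription sketch using the openai-whisper package; the model size ("base") and the audio file path are assumptions for illustration:

```python
# Minimal sketch: speech-to-text with OpenAI's open-source Whisper model.
# Requires `pip install openai-whisper` and ffmpeg on the PATH; "audio.mp3" is a placeholder path.
import whisper

model = whisper.load_model("base")      # other sizes: tiny, small, medium, large
result = model.transcribe("audio.mp3")  # language is auto-detected by default
print(result["text"])
```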