allenai/tulu-v2-sft-mixture
databricks/databricks-dolly-15k
HuggingFaceH4/ultrachat_200k
knkarthick/dialogsum
MaziyarPanahi/WizardLM_evol_instruct_V2_196k
openbmb/UltraFeedback
Open-Orca/OpenOrca
shibing624/medical
shareAI/CodeChat
- Hugging Face Hub: https://huggingface.co/datasets/shareAI/CodeChat
- Description: Mainly contains corpus samples for logical reasoning, code Q&A, and code generation.
shareAI/ShareGPT-Chinese-English-90k
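- Hugging Face Hub: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k
- Description: A high-quality Chinese-English parallel bilingual human-machine Q&A dataset covering user questions from real, complex scenarios.
- Size: 90k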
stingning/ultrachat
stack-exchange-paired
YeungNLP/ultrachat_200k
- Hugging Face Hub: https://huggingface.co/datasets/YeungNLP/ultrachat_200k
- Description: English instruction-tuning data open-sourced by the Zephyr project, produced by cleaning the ultrachat data.
YeungNLP/WizardLM_evol_instruct_V2_143k
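- Hugging Face Hub: https://huggingface.co/datasets/YeungNLP/WizardLM_evol_instruct_V2_143k
- Description: An English instruction-tuning dataset open-sourced by the WizardLM project. It uses the Evol-Instruct method to evolve instructions and increase their complexity, with the aim of improving a model's ability to follow complex instructions. Contains 143k examples.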