Datasets for Different Modalities
The keywords cover modalities such as text and video; widely applicable concepts, such as data formats and how the data was collected or transformed from other data (e.g., see text-to-...); and general usage guidance, such as data intended for pretraining, reinforcement-learning, chain of thought, etc.
Keywords
3D, Agents, Alignment, Arrow, Arxiv, Audio, Benchmark, Classification, Chain Of Thought, Chat, Crowd Sourced, CSV, Embeddings, Evaluation, Fine Tuning, Generated Data, Feature Extraction, Graph, Handwritten, Image, Instruction Following, LLM, JSON, Monolingual, Multi Lingual, Multimodal, Multiple Choice, Named Entity Recognition, News, NLP, Planning, Pretraining, Problem Solving, Prompt, Question Answering, RAG, Reasoning, Regression, Reinforcement Learning, Safety, Search, Security, Sentence Similarity, Sentence Transformers, Sentiment Analysis, Speech, Summarization, Tabular, Retrieval, Text To …, To Text, Translation, Tutorial, Unlearning, Video, Vision, Wikipedia
Datasets for the Modality Keywords
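Each set below pairs a primary keyword with a list of additional keywords, and some additional keywords appear in more than one set (for example, red-teaming is listed under both Safety and Security). The following is a minimal Python sketch of how a dataset's tags could be matched against such sets. The `KEYWORD_SETS` excerpt is copied from the lists on this page, but the `matching_sets` helper and its case-insensitive exact-match rule are assumptions made for illustration; the page does not describe how the catalog itself performs the matching.

```python
from typing import Dict, Iterable, List, Set

# Excerpt of the keyword sets on this page: primary keyword -> additional keywords.
KEYWORD_SETS: Dict[str, Set[str]] = {
    "3d": {"depth-estimation", "image-to-3d", "text-to-3d"},
    "alignment": {
        "acceptability-classification", "alignment-lab-ai", "explainability",
        "fairness", "grounding", "hallucination", "relevance",
    },
    "safety": {
        "deepfake", "deep-fake", "fairness", "hallucination", "hate-speech",
        "hate-speech-detection", "misinformation", "red-teaming", "toxicity",
    },
    "security": {"cybersecurity", "jailbreak", "red-teaming"},
}


def matching_sets(tags: Iterable[str]) -> List[str]:
    """Return the primary keywords whose set matches at least one tag.

    Assumption: a tag matches if it equals the primary keyword or one of the
    additional keywords, ignoring case and surrounding whitespace.
    """
    normalized = {tag.strip().lower() for tag in tags}
    return sorted(
        primary
        for primary, extras in KEYWORD_SETS.items()
        if primary in normalized or normalized & extras
    )


if __name__ == "__main__":
    # A tag shared by two sets places the dataset in both of them.
    print(matching_sets(["red-teaming", "text"]))
    print(matching_sets(["Hallucination"]))
```

Under these assumptions, a dataset can land in several sets at once, which is why the same dataset may show up in more than one section.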
3D (keyword: 3d)
This set includes the following additional keywords: depth-estimation, image-to-3d, text-to-3d
Click a row to see the description. Use the line below the table to resize it. See About These Datasets for important details.
Agents (keyword: agents)
This set includes the following additional keywords: agent, downstream-task, downstream-tasks, function-calling, language-agent
Alignment (keyword: alignment)
This set includes the following additional keywords: acceptability-classification, alignment-lab-ai, explainability, fairness, grounding, hallucination, relevance
Arrow (keyword: arrow)
Arxiv (keyword: arxiv)
(This set includes keywords that begin with arxiv:.)
Audio (keyword: audio)
This set includes the following additional keywords: audio-classification, audio-to-audio, speaker-identification, text-to-audio, voice, voice-activity-detection
Benchmark (keyword: benchmark)
This set includes the following additional keywords: alignment, aveni-bench, benchmarks, gsm8k, mteb, nli, test, testing
Chain Of Thought (keyword: chain-of-thought)
This set includes the following additional keywords: cot
Chat (keyword: chat)
This set includes the following additional keywords: argument, argumentation, chat-dataset, conversation, conversational, conversational-ai, conversations, debate, dialog, dialogue, dialogue-modeling, discussion, fictitious dialogues, multiple-turn-dialogue, roleplay, role-play
Classification (keyword: classification)
This set includes the following additional keywords: acceptability-classification, audio-classification, entity-linking-classification, image-classification, intent-classification, multi-class-classification, multi-class-image-classification, multi-input-text-classification, multi-label-classification, multi-label-image-classification, segmentation, semantic-segmentation, semantic-similarity-classification, semantic-similarity-scoring, sentiment-classification, sentiment-scoring, tabular-classification, tabular-multi-class-classification, tabular-multi-label-classification, text-classification, text-scoring, token classification, token-classification, topic-classification, video-classification, zero-shot-classification, zero-shot-image-classification
Crowd Sourced (keyword: crowdsourced)
CSV (keyword: csv)
Embeddings (keyword: embeddings)
This set includes the following additional keywords: embedding
Evaluation (keyword: evaluation)
This set includes the following additional keywords: eval, quality
Feature Extraction (keyword: feature-extraction)
This set includes the following additional keywords: image-feature-extraction
Fine Tuning (keyword: finetuning)
This set includes the following additional keywords: finetune, fine-tune, fine-tuning, instruct, instruction-finetuning, instruction-fine-tuning, instruction-following, instruction tuning, instruction-tuning, preference, preferences, sft, structured-fine-tuning
Generated Data (keyword: generated-data)
This set includes the following additional keywords: ai-generated, conditional-text-generation, code-generation, dialog-generation, explanation-generation, generation, generated, expert-generated, machine-generated, ocr, text generation, text-generation, text2text-generation, synthetic, synthetic-captions, synthetic-data, synthetic-dataset, synthgenai
Graph (keyword: graph)
This set includes the following additional keywords: graphs, graph-ml, knowledge graph, knowledge-graph, knowledge graphs, knowledge-graphs
Handwritten (keyword: handwritten)
Image (keyword: image)
This set includes the following additional keywords: anime, chart, caption, danbooru, diagram, geometry-diagram, images, image-captioning, image-captions, image-caption pairs, image-caption-pairs, image classification, image-classification, image-data, image-feature-extraction, image-generation, image-segmentation, image-text-dataset, image-text-to-text, image-to-image, image-to-text, image-to-video, multi-class-image-classification, object detection, object-detection, photo, photos, photograph, photographs, scientific-figure, super-resolution, text-to-image, unconditional-image-generation
Instruction Following (keyword: instruction-following)
This set includes the following additional keywords: instruct, instruction, instruction-finetuning, instruction-fine-tuning, instruction-tuning, multiturn, multi-turn
JSON (keyword: json)
This set includes the following additional keywords: jsonl
LLM (keyword: llm)
This set includes the following additional keywords: alpaca, large-language-model, large-language-models, language model, language-modeling, llms, masked-language-modeling
Monolingual (keyword: monolingual)
Multi Lingual (keyword: multilingual)
This set includes the following additional keywords: machine translation, multi-lingual, squad_v2_french_translated, translated
Multimodal (keyword: multimodal)
This set includes the following additional keywords: multimodality, multi-modal, multi-modal-qa
Multiple Choice (keyword: multiple-choice)
Named Entity Recognition (keyword: named-entity-recognition)
News (keyword: news)
This set includes the following additional keywords: news-articles-summarization
NLP (keyword: nlp)
This set includes the following additional keywords: explanation, explanation-generation, natural-language-inference, natural-language-processing, natural-language-understanding
Planning (keyword: planning)
Pretraining (keyword: pretraining)
This set includes the following additional keywords: long context, long-context, distillation, pretrain, preservation-loss-training
Problem Solving (keyword: problem-solving)
Prompt (keyword: prompt)
This set includes the following additional keywords: dfp, french prompts, prompts, prompt engineering, prompt-generation
Question Answering (keyword: question-answering)
This set includes the following additional keywords: abstractive-qa, camel, closed-book-qa, closed-domain-qa, document-question-answering, extractive-qa, Figure Q&A, Math Q&A, multiple-choice-qa, multi-modal-qa, open-domain-qa, open-book-qa, q-and-a, qa, qna, q&a, questions, question-generation, table-question-answering, visual-question-answering, vqa
RAG (keyword: rag)
This set includes the following additional keywords: retrieval augmented generation, retrieval-augmented-generation
Reasoning (keyword: reasoning)
This set includes the following additional keywords: reflection, step-by-step, logical-reasoning, mathematical-reasoning
Regression (keyword: regression)
This set includes the following additional keywords: tabular-regression
Reinforcement Learning (keyword: reinforcement-learning)
This set includes the following additional keywords: dpo, expert trajectory, human-feedback, rl, rlhf, rlaif
Retrieval (keyword: retrieval)
This set includes the following additional keywords: document-retrieval, entity-linking-retrieval, fact-checking, fact-checking-retrieval, information-retrieval, text-retrieval
Safety (keyword: safety)
This set includes the following additional keywords: deepfake, deep-fake, fairness, hallucination, hate-speech, hate-speech-detection, misinformation, red-teaming, toxicity
Search (keyword: search)
This set includes the following additional keywords: codesearchnet, search-queries, semantic-search
Security (keyword: security)
This set includes the following additional keywords: cybersecurity, jailbreak, red-teaming
Sentence Similarity (keyword: sentence-similarity)
Sentence Transformers (keyword: sentence-transformers)
Sentiment Analysis (keyword: sentiment-analysis)
This set includes the following additional keywords: emotion, emotions, sentiment-classification, sentiment, sentiments
Speech (keyword: speech)
This set includes the following additional keywords: automatic-speech-recognition, grammar, hate-speech, hate-speech-detection, linguistics, parts-of-speech, sarcasm-detection, speech-detection, speech-recognition, text-to-speech
Summarization (keyword: summarization)
This set includes the following additional keywords: news-articles-summarization, paraphrase, paraphrase-identification, summary, text-simplification
Tabular (keyword: tabular)
This set includes the following additional keywords: table, table-to-text
Text To ... (keyword: text-to-...)
This set includes the following additional keywords: image-text-to-text, text-to-audio, text-to-image, text-to-speech, text-to-sql, Text to Video, text-to-video, video-text-to-text
To Text (keyword: to-text)
This set includes the following additional keywords: data-to-text, image-caption pairs, image-caption-pairs, image-text-to-text, image-to-text, table-to-text, video-text-to-text, video-to-text
Translation (keyword: translation)
This set includes the following additional keywords: machine translation, translated
Tutorial (keyword: tutorial)
Unlearning (keyword: unlearning)
This set includes the following additional keywords: tofu
Video (keyword: video)
This set includes the following additional keywords: drone, image-to-video, likert, lvlm, movie, movies, synthetic-captions, Text to Video, text-to-video, video-classification, video-text-to-text, video-to-text, vision-language, vlm, vlms, youtube
Vision (keyword: vision)
This set includes the following additional keywords: computer-vision, computer vision
Wikipedia (keyword: wikipedia)
This set includes the following additional keywords: nanodbpedia, extended, wikipedia, wiki, wikidata, wikimedia/wit_base, wikisql