Build better
training data.
DataForge Studio is a workbench for LLM fine-tuning datasets. Import your raw data, edit it in a proper grid, check its quality, generate synthetic examples with your own AI keys, and export training-ready bundles. No server, no account. Everything stays in your browser.
Import anything
Load JSONL, CSV, Parquet, Excel, PDF or Word files, or stream a dataset straight from Hugging Face. Alpaca, ShareGPT and OpenAI formats are detected automatically.
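To give a feel for what automatic format detection can look like, here is a minimal sketch that guesses a record's schema from its keys. It is an illustration, not DataForge Studio's actual detector: the function name `detect_format` is hypothetical, and the key names follow the commonly used conventions for each format (`messages` with `role` for OpenAI chat, `conversations` with `from`/`value` for ShareGPT, `instruction`/`output` for Alpaca).

```python
from typing import Any


def detect_format(record: dict[str, Any]) -> str:
    """Guess the schema of one parsed record from its keys (hypothetical helper)."""
    # OpenAI chat format: a non-empty "messages" list of {"role": ..., "content": ...}
    messages = record.get("messages")
    if isinstance(messages, list) and messages and all(
        isinstance(m, dict) and "role" in m for m in messages
    ):
        return "openai"

    # ShareGPT format: a non-empty "conversations" list of {"from": ..., "value": ...}
    conversations = record.get("conversations")
    if isinstance(conversations, list) and conversations and all(
        isinstance(t, dict) and "from" in t and "value" in t for t in conversations
    ):
        return "sharegpt"

    # Alpaca format: flat "instruction" / optional "input" / "output" fields
    if "instruction" in record and "output" in record:
        return "alpaca"

    return "unknown"
```

A real detector would likely sample several rows and handle mixed or malformed files; this sketch only classifies a single record.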
Every dataset type that matters
Chat SFT with tool calls and reasoning traces, DPO preference pairs, KTO feedback and RL prompts with verifiable answers. Traces are stored as fields and rendered per model, <think> tags included.
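Storing a reasoning trace as its own field, then rendering it at export time, might look roughly like the sketch below. The field names (`reasoning`, `content`), the `style` parameter and the function `render_assistant` are assumptions made for illustration, not the tool's real schema.

```python
def render_assistant(message: dict[str, str], style: str = "think_tags") -> str:
    """Render an assistant turn, optionally wrapping its reasoning trace in <think> tags."""
    reasoning = message.get("reasoning", "")
    content = message["content"]

    # No trace stored, or the target model expects none: emit the answer only.
    if not reasoning or style == "none":
        return content

    # Models trained on <think>-style traces get the trace inlined before the answer.
    if style == "think_tags":
        return f"<think>\n{reasoning}\n</think>\n\n{content}"

    raise ValueError(f"unknown render style: {style}")
```

Keeping the trace in a separate field means the same record can be exported with or without it, depending on the target model.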
Quality you can act on
Seventeen checks, one-click cleaning, near-duplicate detection and benchmark contamination screening. Scores render as forge heat: cold steel to molten amber.
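As a rough illustration of near-duplicate detection, the sketch below compares texts by the Jaccard similarity of their word trigrams. This is one common technique chosen for the example; the checks DataForge Studio actually runs are not specified here, and the function names and the 0.8 threshold are assumptions.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    if len(words) < n:
        # Texts shorter than n words become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(texts: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose shingle sets are at least `threshold` similar."""
    sets = [shingles(t) for t in texts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise loop is quadratic in the number of rows, so a large dataset would need an approximate method such as MinHash instead.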
Your keys, your data
Bring your own OpenAI, Anthropic, Gemini, OpenRouter, Groq or Ollama key for synthetic generation and enhancement. Keys live in your browser. Nothing is uploaded, ever.