Fine-tuning datasets, forged locally

Build better
training data.

DataForge Studio is a workbench for LLM fine-tuning datasets. Import your raw data, edit it in a proper grid, check its quality, generate synthetic examples with your own AI keys, and export training-ready bundles. No server, no account. Everything stays in your browser.

Open the studio View source

free · open source · nothing leaves your machine

The DataForge Studio workbench: a dataset grid with the conversation inspector open on a tool-call example

Import anything

JSONL, CSV, Parquet, Excel, PDF or Word, or stream a dataset straight from Hugging Face. Alpaca, ShareGPT and OpenAI formats are detected automatically.

Every dataset type that matters

Chat SFT with tool calls and reasoning traces, DPO preference pairs, KTO feedback and RL prompts with verifiable answers. Traces are stored as fields and rendered per model, <think> tags included.

Quality you can act on

Seventeen checks, one-click cleaning, near-duplicate detection and benchmark contamination screening. Scores render as forge heat: cold steel to molten amber.

Your keys, your data

Bring your own OpenAI, Anthropic, Gemini, OpenRouter, Groq or Ollama key for synthetic generation and enhancement. Keys live in your browser. Nothing is uploaded, ever.

ImportDrop a file or pull from the Hub

RefineEdit, clean, dedup, generate

ExportAxolotl, TRL, Unsloth and more

Your next dataset, ready this afternoon.

Open the studio