Donated by Swarm & Bee → OpenDiabetic

Open, deed-backed diabetic data.
Donated to builders.

Built by the hive, cooked on our own sovereign compute, and donated to the community by Swarm & Bee — for better outcomes. Don't just download data — download proof. Our flagship sets ship with a notarized title deed: which model wrote it, a two-judge tribunal score, and a Merkle root. And every set is crystal clear — schema, real sample rows, size, and SHA-256, all visible before you cook. Open, no charge, no strings.

No PHI · education & synthetic instruction sets only · never patient records

110 datasets in the catalog
3.3M+ cited pairs across the catalog
6 deed-anchored sets
$0 · donated, no charge

Honest counts: 110 distinct downloadable files. The headline pair count (3.3M+) is the sum across all files. Some medical/CRE verticals include both a superset and specialty cuts of the same content, so the unique total is lower (the diabetic-medical core is 417,196 unique pairs). Each file lists its real row count and SHA-256, so you can verify any of it yourself. All counts are read from catalog.json.
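Why is the unique total lower than the sum? The same row can appear in a superset file and again in a specialty cut. Here is a minimal sketch for checking that yourself. It assumes each file is JSONL with one JSON object per line, and that two rows are duplicates when their canonical JSON (sorted keys, no extra whitespace) is identical. That dedup rule and the `medical-*.jsonl` file pattern are illustrative assumptions, not the catalog's published method. Under this rule, rows that differ in any field (for example a vertical tag) count as distinct.

```python
import hashlib
import json
from pathlib import Path


def row_key(line: str) -> str:
    """Hash a row's canonical JSON so key order and whitespace don't matter (assumed dedup rule)."""
    obj = json.loads(line)
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def count_rows(paths):
    """Return (row count summed across files, unique row count across files)."""
    total = 0
    seen = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                total += 1
                seen.add(row_key(line))
    return total, len(seen)


if __name__ == "__main__":
    # Hypothetical pattern: a superset plus specialty cuts drawn from it.
    files = sorted(Path(".").glob("medical-*.jsonl"))
    total, unique = count_rows(files)
    print(f"summed: {total:,}  unique: {unique:,}")
```

If the summed count is larger than the unique count, some rows appear in more than one of the files you passed in.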

⌨️ Off the shelf, into your pipeline

No portal, no signup, no API key. Every full set is a public, deed-backed URL — pull it with curl, verify the hash yourself, load it in one line.

# 1 · download any full set (each dataset card shows its exact URL + SHA-256)
curl -fL -o medical-cardiology-full.jsonl https://diabeticdatasets.com/dl/medical-cardiology

# 2 · verify it yourself: this MUST match the published SHA-256
sha256sum medical-cardiology-full.jsonl

# 3 · load it for fine-tuning (Python, Hugging Face datasets)
from datasets import load_dataset
ds = load_dataset("json", data_files="medical-cardiology-full.jsonl", split="train")

Open any dataset card in the catalog below and choose "Use it in your pipeline" for its exact one-liner and load code.

🖥️ Need compute to cook them? Fine-tuning needs GPUs — and we rent our own sovereign fleet on Vast.ai. Renting our rigs (or using our referral) funds OpenDiabetic's give-engine: compute income recycles into free help for diabetics — the same loop that cooked these datasets.
swarmrails · 2× RTX PRO 6000 · 194 GB · ID 141205
smash · RTX 5090 · ID 84859
defendable · RTX 5090 · ID 136859

Our rigs cook 24/7 — live availability & pricing on Vast.ai (swarmrails is mid-cook right now).

The flagship sets ship with a title deed

Provenance you can verify, not a "trust us." Each deed-backed pair carries five proofs.

① Origin: which Swarm model generated it, on what hardware, with which strategy.
② Quality: a two-judge tribunal scores it with written reasoning, with math and claims checked.
③ Process: attempts, generation time, and the full cook record.
④ Economics: energy and cost to produce, with honest unit economics.
⑤ Trust: a Merkle root over the set, tamper-evident and recomputable from the rows.
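The deed's exact Merkle construction isn't spelled out here, so what follows is a minimal sketch under stated assumptions. Each leaf is the SHA-256 of one raw JSONL line with its newline stripped. Each parent is the SHA-256 of its two child digests concatenated. An odd node at any level is paired with itself. If the published deed uses a different leaf encoding or padding rule, your root will not match it. The sketch only shows why a Merkle root is tamper-evident and recomputable from the rows.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows: list[bytes]) -> str:
    """Hex Merkle root over rows (assumed scheme: SHA-256 leaves, last node duplicated on odd levels)."""
    if not rows:
        raise ValueError("cannot build a Merkle tree over zero rows")
    level = [sha256(row) for row in rows]  # leaf hashes, one per row
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed padding rule for odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def merkle_root_of_file(path: str) -> str:
    """Hash each non-empty line of a JSONL file as one leaf (assumed leaf encoding)."""
    with open(path, "rb") as f:
        rows = [line.rstrip(b"\r\n") for line in f if line.strip()]
    return merkle_root(rows)


if __name__ == "__main__":
    original = merkle_root([b'{"q": "a"}', b'{"q": "b"}', b'{"q": "c"}'])
    tampered = merkle_root([b'{"q": "a"}', b'{"q": "B"}', b'{"q": "c"}'])
    # Editing a single row changes its leaf hash, which propagates up to the root.
    print(original != tampered)
```

The design choice that matters is that every row feeds the root. Altering, dropping, or reordering any row yields a different root. So a root you recompute from the rows can be compared against the one on the deed.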

Not every set is deed-backed — the cited, education-only sets (foot care, insulin, nutrition) are labeled open · OpenDiabetic original, with their source URLs in the rows. We never call a thing more than it is.

🏆 The capstone sets

The flagship drops we're proud of — the heart of the gold.

🔐 Verify it yourself

Crystal-clear isn't "trust us." Pick a dataset, choose the file you downloaded, and your own browser recomputes its SHA-256 and checks it against the published hash. The file never leaves your machine.

Prefer the command line?
sha256sum your-file.jsonl
# compare the output to the published SHA-256 on that dataset's card
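Here is the same check in Python, for scripting it into a pipeline. It is a minimal sketch using only the standard library. It streams the file in chunks, so large sets never need to fit in memory. It also compares case-insensitively against the hash you paste from the dataset card. The function names are illustrative, not part of any published tool.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks (1 MiB by default) and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, published: str) -> bool:
    """True if the file's SHA-256 equals the published hex digest (whitespace and case ignored)."""
    return file_sha256(path) == published.strip().lower()
```

Usage: `verify("medical-cardiology-full.jsonl", "<published sha256 from the card>")`. If it returns `False`, treat the file as corrupted or tampered and download it again.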

The full catalog

Every set: open the card to see schema, real sample rows, size, and SHA-256 before you download.
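Once a file is on disk, you can reproduce a rough version of that card view locally. The sketch below reads the first few rows of a JSONL file and collects the union of their top-level keys as an approximate schema. It also reports the file size in bytes. It assumes one JSON object per line and makes no assumption about field names, which vary by set. Note that keys missing from the first few rows won't show up.

```python
import json
import os
from itertools import islice


def peek(path: str, n: int = 3):
    """Return (file size in bytes, sorted top-level keys seen, first n non-blank rows)."""
    with open(path, encoding="utf-8") as f:
        non_blank = (line for line in f if line.strip())
        rows = [json.loads(line) for line in islice(non_blank, n)]
    keys = sorted({key for row in rows for key in row})
    return os.path.getsize(path), keys, rows
```

For example, `size, keys, rows = peek("medical-cardiology-full.jsonl")` gives you the size, a rough schema, and real sample rows. You can check these against the card before you cook.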

These were donated. If they help you, pay it forward.

Built by the hive, cooked on our own sovereign compute, and given to the community for better diabetic outcomes — no charge, and never will be. But if these datasets move your work forward, you can fund the next ones — and a ride to the doctor, diabetic shoes, or a home vault for someone who needs it. Only if you want to.

💛 Make a donation — diabeticdonation.com