Built by the hive, cooked on our own sovereign compute, and donated to the community by Swarm & Bee — for better outcomes. Don't just download data — download proof. Our flagship sets ship with a notarized title deed: which model wrote it, a two-judge tribunal score, and a Merkle root. And every set is crystal clear — schema, real sample rows, size, and SHA-256, all visible before you cook. Open, no charge, no strings.
No PHI · education & synthetic instruction sets only · never patient records
Honest counts: 110 distinct downloadable files. The headline number is the sum across the
catalog — some medical/CRE verticals include both a superset and specialty cuts of the same content, so the
unique total is lower (the diabetic-medical core is 417,196 unique). Each file lists its real row count + SHA-256;
verify any of it yourself. Counts read from catalog.json.
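The row-count claim can be checked locally as well as the hash. Here is a minimal sketch, assuming each file is JSONL with one JSON object per line and that you copy the published row count from the dataset card by hand. The helper names `count_rows` and `check_row_count` are illustrative and not part of any published tooling.

```python
import json
from pathlib import Path


def count_rows(path):
    """Count JSONL rows, checking that each non-blank line parses as JSON."""
    rows = 0
    with Path(path).open("r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: line {line_no} is not valid JSON") from exc
            rows += 1
    return rows


def check_row_count(path, published_rows):
    """Return True when the local file has exactly the published number of rows."""
    return count_rows(path) == published_rows
```

Call `check_row_count("your-file.jsonl", published_rows)` with the number shown on the card. A mismatch usually means a truncated download, which the SHA-256 check would also catch.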
No portal, no signup, no API key. Every full set is a public, deed-backed URL — pull it with
curl, verify the hash yourself, load it in one line.
# 1 · download any full set (each dataset card shows its exact URL + SHA-256)
curl -L -o medical-cardiology-full.jsonl https://diabeticdatasets.com/dl/medical-cardiology
# 2 · verify it yourself — this MUST match the published SHA-256
sha256sum medical-cardiology-full.jsonl
# 3 · load it and fine-tune (Hugging Face datasets)
from datasets import load_dataset
ds = load_dataset("json", data_files="medical-cardiology-full.jsonl", split="train")
Open any dataset in the catalog below, then select "Use it in your pipeline" for its exact one-liner and load code.
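Steps 2 and 3 can also be combined so that a file is never parsed unless its hash matches. The sketch below uses only the Python standard library. The function names `sha256_of_file` and `load_verified_jsonl` are hypothetical, and the published hash is assumed to be a hex string copied from the dataset card.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large sets never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified_jsonl(path, published_sha256):
    """Parse the JSONL file only if its hash matches the published value."""
    actual = sha256_of_file(path)
    if actual != published_sha256.strip().lower():
        raise ValueError(f"SHA-256 mismatch for {path}: got {actual}")
    with open(path, "r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Once the check passes, the same file can be handed to `load_dataset("json", ...)` as in step 3. The plain-list loader here is only a dependency-free stand-in for small files.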
ID 141205 · ID 84859 · ID 136859
Our rigs cook 24/7: live availability & pricing on Vast.ai (swarmrails is mid-cook right now).
Rent on Vast.ai →
Provenance you can verify, not a "trust us." Each deed-backed pair carries five proofs.
Not every set is deed-backed — the cited, education-only sets (foot care, insulin, nutrition) are labeled open · OpenDiabetic original, with their source URLs in the rows. We never call a thing more than it is.
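For readers unfamiliar with the Merkle root named in the deed: it is a single hash that commits to every row at once, so changing any row changes the root. The page does not state how the published roots are built, so the sketch below is only an illustration under loud assumptions: SHA-256 leaves over raw row bytes, pairwise concatenation, and duplication of the last node on odd levels. It will not necessarily reproduce a published root.

```python
import hashlib


def sha256(data):
    """Return the raw SHA-256 digest of a byte string."""
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Return the hex Merkle root over a list of byte strings (one per row)."""
    if not rows:
        raise ValueError("cannot build a Merkle root over zero rows")
    # Leaves: one hash per row (assumed leaf encoding: the raw row bytes).
    level = [sha256(row) for row in rows]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed rule: duplicate the last node when a level has odd length.
            level.append(level[-1])
        # Parent = hash of the two concatenated child digests.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```

For a JSONL file, `rows` could be the list of lines read in binary mode. Whether line endings are included in each leaf is one more detail the actual deed format would have to specify.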
The flagship drops we're proud of — the heart of the gold.
Crystal-clear isn't "trust us." Pick a dataset, choose the file you downloaded, and your own browser recomputes its SHA-256 and checks it against the published hash. The file never leaves your machine.
sha256sum your-file.jsonl
# compare the output to the published SHA-256 on the dataset card
Every set: open the card to see schema, real sample rows, size, and SHA-256 before you download.
Built by the hive, cooked on our own sovereign compute, and given to the community for better diabetic outcomes — no charge, and never will be. But if these datasets move your work forward, you can fund the next ones — and a ride to the doctor, diabetic shoes, or a home vault for someone who needs it. Only if you want to.
💛 Make a donation — diabeticdonation.com