From XPT to Dataset-JSON → and a Python tool to make it easier
Transform Clinical Datasets into CDISC Dataset-JSON v1.1 Effortlessly with Python
This post is powered by the open-source dsjson package, a lightweight Python package that converts clinical tabular datasets (e.g., SDTM/ADaM) and their metadata into CDISC Dataset-JSON v1.1 format.


If you’ve ever worked with regulatory submissions, you know the pain of SAS XPT files: 8-character variable names, no Unicode support, clunky metadata. They’ve been around since 1989 and, frankly, they don’t fit today’s data needs.
That’s why CDISC introduced Dataset-JSON: smaller, lighter, self-describing, and future-proof for API-driven pipelines. It’s not just a new file type; it’s the bridge to modern clinical data exchange.
When I first dug into Dataset-JSON, I realized creating these files by hand was painful: lining up data, adding metadata, double-checking conformance. That inspired me to build something better: a Python package, dsjson, that takes your SDTM/ADaM data plus metadata and turns it directly into a valid Dataset-JSON v1.1 file.
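```python
import pandas as pd
from dsjson import load_metadata, to_dataset_json

# Row data for the VS (Vital Signs) domain
rows = pd.read_csv("examples/vs.csv")
# Column metadata describing each variable in the dataset
columns = load_metadata("examples/columns_vs.csv", file_type="csv")
# Combine rows and metadata into a Dataset-JSON v1.1 dataset
ds = to_dataset_json(rows, columns, dataset_name="VS", dataset_label="Vital Signs")
```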
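For contrast, here is a rough sketch of what assembling such a file by hand involves, using only the Python standard library. It is not dsjson output and does not use the dsjson API. The top-level attribute names follow my reading of the Dataset-JSON v1.1 specification and should be checked against it; the OIDs, subject IDs, and values are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Column metadata, written out by hand (OIDs are hypothetical).
columns = [
    {"itemOID": "IT.VS.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "dataType": "string"},
    {"itemOID": "IT.VS.VSTESTCD", "name": "VSTESTCD",
     "label": "Vital Signs Test Short Name", "dataType": "string"},
    {"itemOID": "IT.VS.VSSTRESN", "name": "VSSTRESN",
     "label": "Numeric Result/Finding in Standard Units", "dataType": "float"},
]

# Row data as positional lists; values are illustrative only.
rows = [
    ["STUDY1-001", "SYSBP", 120.0],
    ["STUDY1-002", "SYSBP", 135.0],
]

# Every row must line up with the column metadata, or values
# silently end up under the wrong variable.
for row in rows:
    assert len(row) == len(columns)

dataset = {
    "datasetJSONCreationDateTime":
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
    "datasetJSONVersion": "1.1.0",
    "itemGroupOID": "IG.VS",
    "name": "VS",
    "label": "Vital Signs",
    "records": len(rows),
    "columns": columns,
    "rows": rows,
}

text = json.dumps(dataset, indent=2)
print(text)
```

Even in this toy case, the metadata, the row order, and the record count all have to be kept in sync manually, which is the bookkeeping a converter is meant to take over.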
In my latest blog post, I break down:
- Why Dataset-JSON matters and how it’s replacing XPT
- The dev journey behind building dsjson
- What the package can do today (and where it’s headed)

