From XPT to Dataset-JSON → and a Python tool to make it easier

Transform Clinical Datasets into CDISC Dataset-JSON v1.1 Effortlessly with Python

Dr. CliniData
August 23, 2025

This post is powered by the open-source dsjson package, a lightweight Python library that converts clinical tabular datasets (e.g., SDTM/ADaM) and their metadata into the CDISC Dataset-JSON v1.1 format.

If you’ve ever worked with regulatory submissions, you know the pain of SAS XPT files: 8-character variable names, no Unicode, clunky metadata. They’ve been around since 1989 and, frankly, they don’t fit today’s data needs.
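To make those limits concrete, here is a small sketch that flags names and labels that would not fit the classic XPT v5 rules (8-character names, 40-character labels, ASCII only). The helper `xpt_v5_issues` and the sample variables are made up for illustration and are not part of dsjson or any SAS tooling.

```python
def xpt_v5_issues(variables):
    """Return human-readable problems for XPT v5 naming limits.

    `variables` maps a variable name to its label (hypothetical input shape).
    """
    issues = []
    for name, label in variables.items():
        if len(name) > 8:
            issues.append(f"{name}: name longer than 8 characters")
        if len(label) > 40:
            issues.append(f"{name}: label longer than 40 characters")
        if not (name.isascii() and label.isascii()):
            issues.append(f"{name}: non-ASCII characters in name or label")
    return issues


# Illustrative variables only; VSORRESUNIT and VSTEMP are invented names.
example = {
    "USUBJID": "Unique Subject Identifier",
    "VSORRESUNIT": "Original Units",  # 11 characters, too long for v5
    "VSTEMP": "Temperature (°C)",     # the degree sign is not ASCII
}

for problem in xpt_v5_issues(example):
    print(problem)
```

A check like this only covers naming; it says nothing about value lengths or numeric encoding, which XPT constrains as well.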

That’s why CDISC introduced Dataset-JSON. Smaller, lighter, self-describing, and future-proof for API-driven pipelines. It’s not just a new file type — it’s the bridge to modern clinical data exchange.

When I first dug into Dataset-JSON, I realized creating these files by hand was painful — lining up data, adding metadata, double-checking conformity. That inspired me to build something better: a Python package, dsjson, that takes your SDTM/ADaM data + metadata and turns it directly into a valid Dataset-JSON v1.1 file.
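For a sense of what "by hand" involves, the sketch below assembles a small Dataset-JSON-style payload using only the standard library. It does not use dsjson: `build_dataset_json` is a hypothetical helper, the OIDs and values are invented, and the attribute names reflect one reading of the v1.1 specification, so treat it as an approximation and check the official schema before relying on it.

```python
import json
from datetime import datetime, timezone


def build_dataset_json(name, label, columns, rows):
    """Assemble a Dataset-JSON-style dict (illustrative, not schema-validated)."""
    width = len(columns)
    # Every row must line up with the column metadata, position by position.
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} values, expected {width}")
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "datasetJSONCreationDateTime": created,
        "datasetJSONVersion": "1.1.0",
        "itemGroupOID": f"IG.{name}",  # assumed OID convention
        "name": name,
        "label": label,
        "records": len(rows),
        "columns": columns,
        "rows": rows,
    }


# Invented metadata and data for a tiny Vital Signs example.
columns = [
    {"itemOID": "IT.VS.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "dataType": "string"},
    {"itemOID": "IT.VS.VSTESTCD", "name": "VSTESTCD",
     "label": "Vital Signs Test Short Name", "dataType": "string"},
    {"itemOID": "IT.VS.VSSTRESN", "name": "VSSTRESN",
     "label": "Numeric Result/Finding in Standard Units", "dataType": "float"},
]
rows = [
    ["STUDY1-001", "SYSBP", 120.0],
    ["STUDY1-001", "DIABP", 80.0],
]

payload = build_dataset_json("VS", "Vital Signs", columns, rows)
print(json.dumps(payload, indent=2))
```

Even in this toy version, the metadata and the row values have to be kept in sync manually, which is the bookkeeping a converter is meant to take over.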

👉 Read my full post on Dataset-JSON here

👉 Read my full post on the dsjson Python package

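import pandas as pd
from dsjson import load_metadata, to_dataset_json

# Observation rows for the VS (Vital Signs) domain
rows = pd.read_csv("examples/vs.csv")
# Column-level metadata for the same dataset, read from a CSV file
columns = load_metadata("examples/columns_vs.csv", file_type="csv")
# Combine rows and metadata into Dataset-JSON v1.1
ds = to_dataset_json(rows, columns, dataset_name="VS", dataset_label="Vital Signs")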