From XPT to Dataset-JSON → and a Python tool to make it easier

Transform Clinical Datasets into CDISC Dataset-JSON v1.1 Effortlessly with Python

This post is powered by dsjson, a lightweight open-source Python package that converts clinical tabular datasets (e.g., SDTM/ADaM) and their metadata into the CDISC Dataset-JSON v1.1 format.

If you’ve ever worked with regulatory submissions, you know the pain of SAS XPT files: variable names capped at 8 characters, no Unicode support, clunky metadata. The format has been around since 1989 and, frankly, it doesn’t fit today’s data needs.

That’s why CDISC introduced Dataset-JSON. It is lighter, self-describing (the column metadata travels in the same file as the data), and built for API-driven pipelines. It’s not just a new file type; it’s the bridge to modern clinical data exchange.
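To make "self-describing" concrete, here is a minimal sketch of what a Dataset-JSON v1.1 document can look like when built by hand with only the standard library. The attribute names (datasetJSONVersion, itemGroupOID, records, columns, rows, and so on) follow my reading of the v1.1 specification, so check them against the published schema before relying on them. The OIDs, subject IDs, and values are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Column metadata lives in the same file as the data it describes.
# OIDs and lengths here are illustrative assumptions, not real study metadata.
columns = [
    {"itemOID": "IT.VS.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "dataType": "string", "length": 12},
    {"itemOID": "IT.VS.VSTESTCD", "name": "VSTESTCD",
     "label": "Vital Signs Test Short Name", "dataType": "string", "length": 8},
    {"itemOID": "IT.VS.VSSTRESN", "name": "VSSTRESN",
     "label": "Numeric Result/Finding in Standard Units", "dataType": "float"},
]

# Each row is a plain list whose values line up with the columns by position.
rows = [
    ["STUDY01-001", "SYSBP", 120.0],
    ["STUDY01-001", "DIABP", 80.0],
]

dataset = {
    "datasetJSONCreationDateTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
    "datasetJSONVersion": "1.1.0",
    "itemGroupOID": "IG.VS",
    "records": len(rows),
    "name": "VS",
    "label": "Vital Signs",
    "columns": columns,
    "rows": rows,
}

# ensure_ascii=False keeps any non-ASCII text readable in the output.
text = json.dumps(dataset, indent=2, ensure_ascii=False)
print(text)
```

Nothing outside the file is needed to interpret a value: the name, label, and type of every column sit right next to the rows.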

When I first dug into Dataset-JSON, I realized that creating these files by hand was painful: lining up data, adding metadata, double-checking conformance. That inspired me to build something better: dsjson, a Python package that takes your SDTM/ADaM data plus metadata and turns it directly into a valid Dataset-JSON v1.1 file.

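import pandas as pd
from dsjson import load_metadata, to_dataset_json

# Dataset records, one row per observation
rows = pd.read_csv("examples/vs.csv")
# Column-level metadata for the VS dataset
columns = load_metadata("examples/columns_vs.csv", file_type="csv")
# Combine data and metadata into a Dataset-JSON v1.1 structure
ds = to_dataset_json(rows, columns, dataset_name="VS", dataset_label="Vital Signs")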
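For a feel of the by-hand double-checking that a tool takes off your plate, here is a rough sketch of a basic shape check. It assumes the Dataset-JSON content is available as a plain Python dict with columns, rows, and records keys (for example, after json.load on a file). check_shape is a hypothetical helper written for this post, not part of dsjson, and it is nowhere near a full conformance check.

```python
def check_shape(dataset):
    """Return a list of problems; an empty list means the shape is consistent."""
    problems = []
    names = [col["name"] for col in dataset["columns"]]

    # Column names must be unique, otherwise positions are ambiguous.
    if len(set(names)) != len(names):
        problems.append("duplicate column names")

    # Every row needs exactly one value per column.
    for i, row in enumerate(dataset["rows"]):
        if len(row) != len(names):
            problems.append(f"row {i}: expected {len(names)} values, found {len(row)}")

    # The declared record count should match the actual number of rows.
    if dataset.get("records") != len(dataset["rows"]):
        problems.append("records does not match the number of rows")

    return problems


# Illustrative sample with a deliberately short second row.
sample = {
    "name": "VS",
    "records": 2,
    "columns": [{"name": "USUBJID"}, {"name": "VSTESTCD"}],
    "rows": [["STUDY01-001", "SYSBP"], ["STUDY01-001"]],
}
problems = check_shape(sample)
print(problems)
```

Checks like these are easy to write and just as easy to forget, which is a good reason to let a package handle them.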

In my latest blog post, I break down:

  • Why Dataset-JSON matters and how it’s positioned to replace XPT

  • The dev journey behind building dsjson

  • What the package can do today (and where it’s headed)

Got opinions? Click the button and share them. We totally can't wait to hear all about it! 👇

THANKS FOR SURVIVING! SEE YOU NEXT WEEK!
