WeCureUs

Documentation

The WeCureUs query interface returns population statistics over data contributed by people living with multiple sclerosis. This page explains what the dataset contains and how to read the results. The full legal terms are in the Data Use Agreement.

Getting started

First, register and accept the Data Use Agreement. You will then receive an API key, which is displayed exactly once. Next, open the query builder and enter your API key; none of the data schema or query controls are shown until a valid key has been entered. Then build and submit a query. Your key is held only in your browser session and is sent with each query you submit.

Direct API access is also available for programmatic and reproducible use. The query builder is a convenience layer over the same HTTP endpoints, which you can call yourself with your API key as a bearer token. See the API Reference for endpoints, request and response shapes, and curl examples.
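As a rough illustration of the bearer-token pattern described above, the sketch below builds (but does not send) a request using only the Python standard library. The base URL, path, HTTP method, and payload fields are placeholders invented for this example, not the real interface; the API Reference has the actual endpoints and request shapes.

```python
import json
import urllib.request

# Hypothetical values for illustration only. The real base URL, path,
# method, and payload shape are documented in the API Reference.
BASE_URL = "https://api.example.invalid"
QUERY_PATH = "/query"


def build_query_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a request that carries the API key as a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + QUERY_PATH,
        data=body,
        method="POST",  # assumed; check the API Reference
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_query_request("YOUR_API_KEY", {"example_field": "example_value"})
# Once real values are filled in, urllib.request.urlopen(request) would send it.
```

Building the request separately from sending it makes it easy to confirm that the key is attached before any network call is made.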

What the dataset contains

The dataset is fully intersectional. Every question a participant has answered, every clinical record they have contributed, and every characteristic in their profile can be used to filter any query. There is no fixed list of permitted cross-references. If two things are present in the dataset, you can ask how they relate.

The dataset is large and growing. Participants provide hundreds of data points through questionnaire modules that span the dimensions of living with MS: symptoms, daily function, treatment, and the diagnostic experience. They also contribute clinical records directly, including radiology reports, lab results, and ancestry data. New modules and new participants are added over time, so the dataset is broader today than it was at launch.

Three types of data are queryable.

  • Questionnaire responses. Every question across all available modules. Question types are single-select, yes/no, year, numeric integer, and multi-select. Free-text responses are never aggregated or exposed.
  • Contributed records. Findings drawn from records participants contribute themselves: radiology findings (lesion location, contrast enhancement, overall trajectory, lesion count), lab results, and ancestry or genealogical data (including ancient steppe ancestry). Counts over records are always counts of distinct participants, never counts of records.
  • Participant characteristics. Profile dimensions established during enrollment: MS subtype, diagnosis year, birth year, biological sex, gender identity, postal code, and disease-modifying therapy (DMT) status.

Any type can filter any other. A cohort filter is a constraint drawn from any of these three types, and any query can carry one or more of them. You can request the distribution of a fatigue question among participants who contributed a radiology report showing a specific lesion location. You can request a lesion-location distribution among participants who answered a cognitive symptoms question a particular way. You can narrow any of this further by participant characteristics. Multiple filters combine, and each one narrows the cohort. The narrower the cohort, the more likely a result falls below the minimum threshold and is withheld or generalized, as described below.
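The way filters narrow a cohort can be pictured with a small in-memory model. This is only a sketch: the field names and values are invented, "combine" is taken here to mean that a participant must match every filter, and the real service applies its minimum threshold to results (cohorts this small would be withheld).

```python
from collections import Counter

# Toy participants. All field names and values are invented for illustration.
participants = [
    {"id": 1, "subtype": "RRMS", "lesion_location": "spinal", "fatigue": "severe"},
    {"id": 2, "subtype": "RRMS", "lesion_location": "spinal", "fatigue": "mild"},
    {"id": 3, "subtype": "SPMS", "lesion_location": "spinal", "fatigue": "severe"},
    {"id": 4, "subtype": "RRMS", "lesion_location": "brainstem", "fatigue": "severe"},
]


def apply_filters(people, filters):
    """Keep only participants who match every filter (assumed AND semantics)."""
    return [p for p in people if all(p.get(k) == v for k, v in filters.items())]


def distribution(people, field):
    """Count how many participants in the cohort gave each value of a field."""
    return Counter(p[field] for p in people)


# A contributed-record filter alone, then narrowed by a participant characteristic.
one_filter = apply_filters(participants, {"lesion_location": "spinal"})
two_filters = apply_filters(
    participants, {"lesion_location": "spinal", "subtype": "RRMS"}
)

print(len(one_filter), len(two_filters))  # 3 2
print(dict(distribution(two_filters, "fatigue")))
```

Each added filter can only keep the cohort the same size or shrink it, which is why heavily filtered queries are the ones most likely to be withheld or generalized.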

For the exact request and response shapes, the available module and dimension identifiers, and worked examples in both directions, see the API Reference.

How multi-select counts work

For a multi-select question, each option count is the number of distinct participants who selected that option. A participant who selected several options is counted once in each of those option counts, but only once in the cohort size. Because of this, the option counts can add up to more than the cohort size, and that is expected.
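A small worked example may make the arithmetic concrete. The participant identifiers and options below are invented, and the cohort is far too small to be reported by the real service; the point is only that the option counts sum to more than the cohort size.

```python
from collections import Counter

# Invented responses to one multi-select question:
# participant id -> the set of options that participant selected.
responses = {
    "p1": {"fatigue", "numbness"},
    "p2": {"fatigue"},
    "p3": {"fatigue", "numbness", "vision"},
}

# Each participant counts once toward the cohort size...
cohort_size = len(responses)

# ...and once toward every option they selected.
option_counts = Counter(
    option for selected in responses.values() for option in selected
)

print(cohort_size)                  # 3
print(sum(option_counts.values()))  # 6
```

Here three participants produce six option selections, so a percentage for an option should be computed against the cohort size, not against the sum of the option counts.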

K-anonymity and suppression

Every result is subject to a minimum cohort threshold. No result is returned for a cohort smaller than the threshold (currently five participants). When a whole result is withheld, the response marks it as suppressed and tells you the threshold in effect.

Within a result, any single answer chosen by fewer participants than the threshold is folded into a combined “below threshold” bucket rather than reported on its own. This protects participants who chose rare answers while preserving the rest of the distribution.
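The two rules above can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the function name and the shape of the returned dictionary are invented, and the source does not say how the combined bucket is handled when it is itself smaller than the threshold, so the sketch simply reports it.

```python
THRESHOLD = 5  # current minimum per the text above; subject to change


def apply_threshold(counts, cohort_size, threshold=THRESHOLD):
    """Withhold small cohorts; fold rare answers into one combined bucket."""
    # Rule 1: the whole result is withheld for a cohort below the threshold.
    if cohort_size < threshold:
        return {"suppressed": True, "threshold": threshold}

    # Rule 2: any answer chosen by fewer participants than the threshold
    # is folded into a single "below threshold" bucket.
    reported = {}
    below = 0
    for answer, n in counts.items():
        if n < threshold:
            below += n
        else:
            reported[answer] = n
    if below:
        reported["below threshold"] = below
    return {"suppressed": False, "threshold": threshold, "counts": reported}


# Invented single-select distribution over a cohort of 24 participants.
example = apply_threshold(
    {"mild": 12, "moderate": 7, "severe": 3, "none": 2}, cohort_size=24
)
print(example["counts"])  # {'mild': 12, 'moderate': 7, 'below threshold': 5}

too_small = apply_threshold({"mild": 3}, cohort_size=3)
print(too_small)  # {'suppressed': True, 'threshold': 5}
```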

Generalization and precision

For some dimensions, when a cohort cannot be reported at exact precision, the system returns a coarser granularity rather than withholding the result entirely. Individual birth years may be reported as five-year or ten-year bands; numeric values may be reported as ranges.

When generalization has been applied, the result includes a precision note such as “Results reported at 5-year band granularity.” Do not treat generalized results as if they were exact-precision data.
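To show what band granularity looks like in practice, the sketch below groups birth years into bands and picks the finest width at which every band meets a minimum count. Both the band alignment (for example 1985-1989) and the rule for choosing a width are guesses made for illustration; the source does not describe how the service decides when or how far to generalize.

```python
from collections import Counter


def band_label(year, width):
    """Label a year by its band; a width of 1 means exact precision."""
    if width == 1:
        return str(year)
    start = year - year % width  # assumed alignment, e.g. 1985-1989
    return f"{start}-{start + width - 1}"


def generalize(years, threshold=5, widths=(1, 5, 10)):
    """Return the finest width whose bands all meet the threshold (assumed rule)."""
    for width in widths:
        counts = Counter(band_label(y, width) for y in years)
        if all(n >= threshold for n in counts.values()):
            return width, dict(counts)
    return None, {}  # nothing reportable at any of the tried widths


# Ten invented birth years, one participant per year.
width, bands = generalize(list(range(1980, 1990)))
print(width)  # 5
print(bands)  # {'1980-1984': 5, '1985-1989': 5}
```

A result like this says only that five participants were born somewhere in each band. It says nothing about how they are spread across the individual years, which is why generalized results should not be read as exact-precision data.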

Citing WeCureUs data

All published work that uses WeCureUs aggregate data must include attribution. The Data Use Agreement specifies the required language. At minimum, your citation should state that the data is sourced from the WeCureUs platform, that it is self-reported and not independently verified, and that it is subject to k-anonymity protections. It should also give the minimum cohort threshold in effect at the time of your query.

Suggested form:

Data sourced from the WeCureUs patient-driven health information platform (wecureus.com). Aggregate results are subject to k-anonymity protections with a minimum cohort threshold of [threshold at time of query]. Results represent self-reported participant data and have not been independently verified against clinical records.

See the Data Use Agreement for the full attribution, notification, and limitations requirements.