Lacuna: Missingness as Signal

11 June 2026

A clinical trial recruits a hundred patients and follows them for a year. At the end, thirty have dropped out. The analysis proceeds on the seventy who remained.

Consider who the thirty are. A patient withdrawing from a Parkinson's trial because their symptoms are worsening. A participant in an antidepressant study who stops attending because the drug is not helping. An elderly subject in a cognitive decline study who does not return because they can no longer manage the journey. In each case the absence is not incidental to the outcome — it is driven by it. The patients who vanish are often precisely the ones whose outcomes matter most, and the standard analytical machinery either ignores this or assumes it away, biasing any downstream analysis.

This is not only a trial problem. A wearable device monitoring cardiac rhythm collects less data from the patients who are most symptomatic, because they are the ones who remove it, forget to charge it, or are admitted to hospital. A diagnostic algorithm trained on patients who completed a full medical assessment has never seen the ones who were too unwell to attend, or too well to be referred. A health system tracking readmission rates builds its models on the patients who came back; the ones who died at home or transferred elsewhere are absent from the training process. Wherever healthcare generates data, the sickest, the most vulnerable, and the most difficult to reach are the ones most likely to be missing from it. The missingness is the signal, and it is almost universally ignored. Lacuna is our method to extract that signal.

Who is missing, and why

The statistical framework for thinking about missingness was set down by Rubin in 1976 [1] and has not changed since. It distinguishes three regimes:

  1. Missing completely at random (MCAR): The missingness has nothing to do with the patient or the outcome — it is pure noise, independent of everything. A blood sample is lost in transit. A sensor malfunctions on a Tuesday. This is the benign case and the rarest in practice. Very little clinical data goes missing for reasons unrelated to the patient.
  2. Missing at random (MAR): A phrase more treacherous than it sounds. It does not mean the missingness is random. It means that, conditional on what you have observed, the missingness is independent of what you have not. A patient with more severe baseline symptoms may be more likely to drop out, but if you have measured that baseline severity, you can adjust for it. The missingness is explained by the observed record. This is the assumption that nearly all standard methods make: multiple imputation, mixed models, the default settings of every statistical package used in clinical research and every pipeline processing wearable data. It is also unverifiable. You cannot check from the data whether the observed variables fully explain who is missing, because the thing you would need to check against — the unobserved outcome — is precisely what you do not have.
  3. Missing not at random (MNAR): The probability of being missing depends on the outcome that would have been observed, even after conditioning on everything measured. The patient dropped out because they were getting worse. The device was removed because the symptoms made wearing it intolerable. The clinic visit was missed because the disease progressed beyond the point where attending was feasible. No baseline covariate captures the extent of that worsening. This is the regime that matters most in healthcare, and the one hardest to handle.
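The practical difference between the three regimes is easiest to see in simulation. The following is a minimal sketch, not a description of Lacuna itself: the data-generating model, the logistic missingness functions, and all names (`simulate`, `severity`, `outcome`) are illustrative assumptions.

```python
import math
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Return (true mean, complete-case mean) of the outcome for one mechanism.

    Hypothetical toy model: 'severity' is a measured baseline covariate and
    'outcome' is the endpoint, with higher values meaning worse.
    """
    rng = random.Random(seed)
    all_outcomes, observed_outcomes = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)
        outcome = severity + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            # Missingness ignores both the patient and the outcome.
            p_missing = 0.3
        elif mechanism == "MAR":
            # Missingness depends only on the measured covariate.
            p_missing = 1.0 / (1.0 + math.exp(-(severity - 1.0)))
        elif mechanism == "MNAR":
            # Missingness depends on the outcome that would have been seen.
            p_missing = 1.0 / (1.0 + math.exp(-(outcome - 1.0)))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_outcomes.append(outcome)
        if rng.random() >= p_missing:
            observed_outcomes.append(outcome)
    return statistics.fmean(all_outcomes), statistics.fmean(observed_outcomes)


for name in ("MCAR", "MAR", "MNAR"):
    true_mean, complete_case_mean = simulate(name)
    print(f"{name}: true={true_mean:.3f} complete-case={complete_case_mean:.3f}")
```

Under these assumptions the complete-case mean sits close to the truth only for MCAR. Under MAR and MNAR it comes out too low, which here means too healthy. The difference is that the MAR bias can be removed by adjusting for `severity`, while the MNAR bias cannot be removed using anything in the observed record.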

The quiet default

The overwhelming majority of analyses (in trials, in digital health, in clinical decision support, in real-world evidence) assume missing at random. Not because anyone believes it, but because it is tractable. Under MAR, the missing-data mechanism can be ignored for likelihood-based inference: fit the model on the observed data, perhaps with imputation, and the mathematics guarantees validity provided the assumption holds.
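The sense in which the mechanism can be ignored can be made precise. As a sketch in standard notation (the symbols are introduced here for illustration): write $Y_{\mathrm{obs}}$ and $Y_{\mathrm{mis}}$ for the observed and missing parts of the data, $R$ for the missingness indicators, $\theta$ for the parameters of the data model, and $\psi$ for the parameters of the missingness mechanism. MAR says that $f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = f(R \mid Y_{\mathrm{obs}}, \psi)$, so the observed-data likelihood factorises:

```latex
\begin{aligned}
L(\theta, \psi \mid Y_{\mathrm{obs}}, R)
  &= \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
          f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi)\, dY_{\mathrm{mis}} \\
  &= f(R \mid Y_{\mathrm{obs}}, \psi)
     \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}} \\
  &= f(R \mid Y_{\mathrm{obs}}, \psi)\, f(Y_{\mathrm{obs}} \mid \theta).
\end{aligned}
```

Provided $\theta$ and $\psi$ are distinct parameters, the first factor carries no information about $\theta$, and inference can proceed from $f(Y_{\mathrm{obs}} \mid \theta)$ alone. Under MNAR the second step fails: the mechanism stays inside the integral and has to be modelled or assumed.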

Convention has calcified around that proviso. The assumption rarely holds, but it is standard, and standard is easy to defend. In many settings, particularly in wearables and remote monitoring, missingness is not even formally addressed: the incomplete records are simply dropped, and the analysis proceeds on what remains. Dropping records amounts to assuming MCAR by omission, and it can bias results wherever the true mechanism is MAR or MNAR, which is to say almost always.

The cost runs in a specific direction. If patients with worse outcomes are more likely to be missing, any analysis that treats that missingness as ignorable — whether by assuming MAR or by dropping records entirely — will systematically overestimate how well an intervention works, how healthy a population is, how slowly a disease progresses, and how reliably a device captures the patients who need it most. The error is not random. It is structural, and it flatters.

A predictive model trained on data where the sickest patients have been quietly removed learns a distorted version of the disease. A trial whose primary analysis ignores informative dropout may report an effect size that would not survive honest accounting for who left. A wearable-derived biomarker validated on adherent users will perform differently in the patients it will actually be deployed to — those absent from the training data. A diagnostic model built on patients who completed the full pathway has never learned from the ones who fell out of it, and those are often the ones for whom the diagnosis matters most.

And the problem compounds. Missing data is not a one-time correction at the end of an analysis; it shapes the inputs to every downstream model. The representation, the imputation, the evaluation — each inherits the bias of the last. By the time a result is reported, the original distortion is buried beneath layers of apparently rigorous methodology that never questioned its foundation.

What the data can and cannot tell you

If MAR is unverifiable and likely wrong, the honest response is not to assume it anyway. It is to ask: how wrong would the assumption need to be before the conclusion breaks?

This is the logic of sensitivity analysis. In regulated settings, the ICH E9(R1) addendum [2] explicitly asks that the impact of missing data on primary inference be assessed. Outside of trials (in wearables, in remote monitoring, in health system analytics) the requirement is rarely formalised, but the need is identical and arguably more acute, because the missingness rates are often higher and less well characterised.

The tools in common use have real limitations. Reference-based imputation encodes a specific clinical story that is natural for some trial designs and implausible for others, without making the strength of the assumption transparent. The location-shift tipping point works for continuous outcomes but is incoherent for binary endpoints, where a shift can push a probability outside its natural bounds, and does not extend to regression coefficients or survival quantities. Most real-world evidence and digital health pipelines have no sensitivity framework at all; they assume MAR or discard incomplete records, and the downstream models inherit the consequences.
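The problem with a location shift on a binary endpoint can be shown in a few lines. This is an illustrative sketch: the function names and the numbers are assumptions, and the odds-scale tilt is shown only as one standard way to keep a probability inside its bounds, not as a statement of how any particular tool works.

```python
def shift_probability(p, delta):
    """Location shift: add delta directly to the event probability."""
    return p + delta


def tilt_probability(p, odds_ratio):
    """Odds-scale tilt: multiply the odds of the event by odds_ratio.

    Requires 0 < p < 1; the result always stays strictly between 0 and 1.
    """
    odds = p / (1.0 - p) * odds_ratio
    return odds / (1.0 + odds)


observed_rate = 0.9  # hypothetical event rate among observed patients
shifted = shift_probability(observed_rate, 0.2)  # exceeds 1: not a probability
tilted = tilt_probability(observed_rate, 3.0)    # still a valid probability
print(f"shifted: {shifted:.3f}, tilted: {tilted:.3f}")
```

The shifted value is no longer a probability, so a tipping point expressed as an additive shift has no coherent reading for this endpoint. The tilted value remains interpretable for any positive odds ratio.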

The distinction that matters is between what is identifiable and what is not. Some of the missingness in a dataset can be explained by observed covariates: age, baseline severity, treatment arm, device type, site. This is the MAR component, and it is estimable. The rest, the part that depends on the unobserved outcome itself, is structurally inaccessible. No quantity of data will identify it. It must be the subject of an assumption, and that assumption should be stated, varied, and stress-tested rather than fixed and forgotten.

Two stages

Lacuna’s architecture follows from that distinction. First, exhaust what the data can tell you: adjust for every identifiable source of differential missingness so that the observed patients, after correction, represent the full population in their measured characteristics. Second, probe what the data cannot tell you: introduce a parameter that encodes how much the missing patients might differ from the observed in terms of their outcome, after the measured characteristics have been accounted for.

The separation keeps the sensitivity parameter honest. Skip the first stage and the parameter absorbs both the identifiable bias and the unidentifiable bias; the result overstates fragility, because part of what looks like informative missingness is really covariate imbalance that could have been corrected. Exhaust the identifiable component first, and the sensitivity parameter governs only the variation that the observed data genuinely cannot explain. That is the proper subject of a sensitivity analysis: the part that requires an assumption, not the part that requires a calculation.
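A toy version of the two stages, for the mean of a continuous outcome, might look as follows. This is a sketch under stated assumptions rather than Lacuna's implementation: stage one is stood in for by simple stratification on one observed covariate, stage two by a shift of `delta` observed standard deviations applied to the missing patients, and the function name and record layout are hypothetical.

```python
import statistics
from collections import defaultdict


def two_stage_mean(records, delta):
    """Estimate the population mean outcome from incomplete records.

    records: list of (stratum, outcome) pairs, with outcome None when missing.
    delta:   assumed amount by which missing patients differ from observed
             patients in the same stratum, in units of the observed SD.
    Every stratum must contain at least one observed outcome.
    """
    by_stratum = defaultdict(list)
    for stratum, outcome in records:
        by_stratum[stratum].append(outcome)

    observed_sd = statistics.stdev(y for _, y in records if y is not None)

    total = 0.0
    for outcomes in by_stratum.values():
        observed = [y for y in outcomes if y is not None]
        n_missing = len(outcomes) - len(observed)
        stratum_mean = statistics.fmean(observed)
        # Stage 1 (identifiable): missing patients are represented by the
        # observed patients who share their measured characteristics.
        total += len(observed) * stratum_mean
        # Stage 2 (assumed): beyond that, missing patients may differ by delta.
        total += n_missing * (stratum_mean + delta * observed_sd)
    return total / len(records)


records = [
    ("mild", 1.0), ("mild", 3.0),
    ("severe", 5.0), ("severe", 7.0), ("severe", None), ("severe", None),
]
mar_adjusted = two_stage_mean(records, delta=0.0)
mnar_stressed = two_stage_mean(records, delta=0.5)
```

With `delta=0.0` the estimate is the covariate-adjusted one, which already differs from the naive mean of the observed outcomes because the severe stratum is under-represented among them. Only the further movement as `delta` varies reflects the part that has to be assumed.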

The tipping point

The central output is the tipping point: the minimum strength of the MNAR mechanism needed to overturn the conclusion.

For a continuous outcome, this translates to an implied mean difference between missing and observed patients, in units of the observed standard deviation. For a binary outcome, an odds ratio: how much more likely would the event need to be among the missing for the result to reverse? For ordinal outcomes, a proportional odds ratio. For survival endpoints, an implied hazard ratio between censored patients and those who remained under observation.

In each case the statistical quantity becomes a clinical question. Is the implied difference between missing and observed patients plausible, given the disease, the population, and the reasons people disappear from the data? A tipping point requiring an implausible mechanism — missing patients dramatically, unrealistically different from those observed — is a robust result. A tipping point within the range of clinically expected missingness behaviour is a fragile one, and should be reported as such.
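For a continuous outcome, the search for a tipping point can be sketched as a one-dimensional root find. The example is hypothetical throughout: the effect size, the missing fraction, the choice to shift only the treatment arm, and the use of the sign of the point estimate as "the conclusion" are simplifying assumptions (a real analysis would typically test a confidence bound or p-value instead).

```python
def tipping_point(conclusion_holds, lo=0.0, hi=5.0, tol=1e-4):
    """Approximate the smallest delta in [lo, hi] at which the conclusion fails.

    Assumes conclusion_holds(delta) is True for small delta and False for
    large delta, with a single switch in between. Returns None if the
    conclusion still holds at hi.
    """
    if not conclusion_holds(lo):
        return lo
    if conclusion_holds(hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if conclusion_holds(mid):
            lo = mid
        else:
            hi = mid
    return hi


OBSERVED_EFFECT = -0.4   # treatment minus control in SD units; negative = benefit
MISSING_FRACTION = 0.3   # share of the treatment arm with missing outcomes


def effect_under(delta):
    """Treatment effect if missing treated patients are delta SDs worse."""
    return OBSERVED_EFFECT + MISSING_FRACTION * delta


delta_star = tipping_point(lambda d: effect_under(d) < 0.0)
print(f"tipping point: {delta_star:.2f} SD")
```

In this toy setting the apparent benefit disappears only if the missing treated patients are roughly 1.3 standard deviations worse than the observed ones. Whether that is plausible is the clinical question.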

The judgement stays with the domain expert, where it belongs. The framework supplies the translation.

Our method

The reason this matters for what we build at TimeTrace is structural. Our work is the measurement of biological change over time from noisy, irregular health data — across trials, monitoring, diagnostics, and real-world evidence. If the missingness is informative — if the absent data carries systematic information about the outcome — then any model trained on what remains has learned from a biased sample of the biology. The representation is shaped by who showed up, not by who exists. Correcting for this is not a downstream cleanup; it is an upstream requirement, part of the measurement itself.

Lacuna is integrated into the TimeTrace platform as the module that handles this. It is the first module engaged and operates before any data reaches our models or mathematical frameworks, ensuring that what they learn from has not been silently filtered by the process the model is trying to measure. Like the other components of our system, it is built from the mathematics of the problem, because the structure of clinical missingness — its causes, its direction, its entanglement with the outcome — demands a method designed around it.

Missing data is not a gap in the record. It is a statement about the record, and reading it correctly is part of building measurements that work.

1. Rubin, D.B., 1976. Inference and missing data. Biometrika, 63(3), pp. 581–592.

2. ICH E9(R1): Addendum on Estimands and Sensitivity Analysis in Clinical Trials. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use, 2019.

Please cite this work as:

TimeTrace Labs. Lacuna: Missingness as Signal. TimeTrace Labs Blog, June 2026.

BibTeX:

@article{timetrace2026lacuna,
author = {{TimeTrace Labs}},
title = {Lacuna: Missingness as Signal},
journal = {TimeTrace Labs Blog},
year = {2026},
url = {https://www.timetracelabs.com/blog/lacuna}
}