Historical Analytics
Description
This page enumerates four approaches to performing analysis on historical data, that is, medical records containing SNOMED CT codes that were active when the record was written but are inactive by the time the analysis is performed. We discuss the pros and cons of each method and the situations in which each approach might be most appropriate.
There is further complexity (thanks @Matt Cordell): it is not just that a concept might become inactive, but that it might have moved within a hierarchy (or even between hierarchies!) between two releases, meaning that subsumption relationships that previously held true no longer do. This applies both to the expansion of ECL statements and to the expression constraints themselves, if they feature concepts which may have moved or become inactive. So as well as the results changing, the statement of the question itself might need to evolve over time.
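A minimal sketch of the effect described above, assuming two toy releases held as child-to-parents maps. The release labels and single-letter concept identifiers are made up for illustration and are not real SNOMED CT content; `descendants` is a hypothetical helper, not part of any SNOMED tooling.

```python
from typing import Dict, Set

# Hypothetical toy data: each release maps a concept to its direct parents.
# Identifiers are placeholders, not real SNOMED CT concept ids.
RELEASES: Dict[str, Dict[str, Set[str]]] = {
    "2019-01": {"B": {"A"}, "C": {"B"}, "D": {"A"}},
    "2020-01": {"B": {"A"}, "C": {"D"}, "D": {"X"}},  # C and D have moved
}


def descendants(parents: Dict[str, Set[str]], ancestor: str) -> Set[str]:
    """Return every concept that has `ancestor` somewhere above it."""
    # Invert the child -> parents map into parent -> children.
    children: Dict[str, Set[str]] = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)
    # Walk down from the ancestor, collecting everything reachable.
    found: Set[str] = set()
    stack = [ancestor]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


old = descendants(RELEASES["2019-01"], "A")
new = descendants(RELEASES["2020-01"], "A")
# Concepts that "<< A" selected in the older release but not in the newer one.
no_longer_subsumed = old - new
```

None of the concepts in `no_longer_subsumed` was inactivated, so historical associations alone would not explain why they dropped out of the result.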
In general, forming pertinent questions about historical cases requires knowledge of how SNOMED CT has evolved.
Developments in ECL, e.g. the addition of term-based constraints, will also have an impact, e.g. where the descriptions linked to concepts change over time.
Use Cases
From a discussion in Kuala Lumpur 2019, three example use cases: (1) comparing data represented using various releases → "upgrade" each concept to a current representation, if possible. (2) Checking consistency with any guidelines used at the time → check per release, do not "upgrade". (3) Patient cohort identification ("has this diagnosis"), where we want to include patient records that feature earlier equivalents of current concepts.
Approach
| Name | Description |
|---|---|
| | Multiple queries run against successive releases of SNOMED CT, with results collated. |
| | Create an enhanced transitive closure table containing inactive concepts at their last known position (but not moving children). |
| | Augmented solution: check the position of the replacement to determine which concepts were inactivated due to wrong placement. |
| | Form a superset of concepts by running an up-to-date query on the latest edition, then use the historical associations in reverse to include now-inactive concepts. |
| | Use a substitution set: a 1:1 map from active to inactive concepts, which could work in either direction but only where a definite 1:1 relationship exists. |
| | Update the EHRs to the latest concept using historical associations. |
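Two of the approaches in the table lean directly on historical associations and can be sketched compactly. This is an illustration under stated assumptions, not a reference implementation: the association tuples, the placeholder concept identifiers and the function names `reverse_superset` and `substitution_set` are all hypothetical, and treating "definite 1:1" as "exactly one SAME AS partner in each direction" is one possible reading.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (inactive concept, association type, target concept). Placeholder ids only.
Association = Tuple[str, str, str]

ASSOCIATIONS: List[Association] = [
    ("old1", "SAME AS", "new1"),
    ("old2", "MAY BE A", "new1"),
    ("old3", "SAME AS", "new2"),
    ("old4", "SAME AS", "new2"),
]


def reverse_superset(
    current: Set[str],
    associations: Iterable[Association],
    allowed_types: Set[str],
) -> Set[str]:
    """Add inactive concepts whose association target is already in the result.

    Repeats until nothing new is added, in case a target is itself a concept
    that was inactivated later and pulled in by an earlier pass.
    """
    associations = list(associations)
    result = set(current)
    changed = True
    while changed:
        changed = False
        for source, kind, target in associations:
            if kind in allowed_types and target in result and source not in result:
                result.add(source)
                changed = True
    return result


def substitution_set(associations: Iterable[Association]) -> Dict[str, str]:
    """Map source -> target, kept only where SAME AS is 1:1 in both directions."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    sources: Dict[str, Set[str]] = defaultdict(set)
    for source, kind, target in associations:
        if kind == "SAME AS":
            targets[source].add(target)
            sources[target].add(source)
    mapping: Dict[str, str] = {}
    for source, found in targets.items():
        if len(found) == 1:
            (target,) = found
            if len(sources[target]) == 1:
                mapping[source] = target
    return mapping
```

With this toy data, `reverse_superset({"new1"}, ASSOCIATIONS, {"SAME AS"})` adds only `old1`, while also allowing `"MAY BE A"` adds `old2`. `substitution_set(ASSOCIATIONS)` keeps `old1 → new1` and drops `old3` and `old4`, because both point at `new2`.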
Find specific questions that we expect each approach to be able to answer.
Use Cases
Finding cohorts of patients based on some criteria.
Retrospective studies
Forensic analysis - results are required to be the same as they would have been at some particular point in time.
Points to Note
The evolution of concepts will affect the statement of any query as well as the substrate that it is run across. Older queries will potentially need to be updated with newer, replacement concepts. Newer queries may need to be reworked when
Historical associations will have varying levels of appropriateness depending on the use case and their nature. 1:1 replacements taken from "SAME AS" historical associations give more confidence than picking replacement concepts via "MAY BE A" or "WAS A" associations.
When running a query against a historical substrate, a check must be performed that all concepts used in the query existed in SNOMED CT at that time. However, "existing" is not the same thing as "in use". An additional check may be performed to count the number of concepts which use particular attributes, or to confirm that the descendant count of a particular concept is roughly in line with expectations.
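The checks described above could be sketched as follows. Everything here is an assumption made for illustration: the per-release sets of known concepts, the six-digit placeholder identifiers, the helper names, and the 50% tolerance used for "roughly in line". The regular expressions are a rough way to pull identifiers out of an ECL string and are not a substitute for a real ECL parser.

```python
import re
from typing import Dict, Set

# Hypothetical per-release sets of concept ids known to that release.
# The ids are placeholders, not real SNOMED CT concepts.
KNOWN_BY_RELEASE: Dict[str, Set[str]] = {
    "20150131": {"100001", "100002"},
    "20200131": {"100001", "100002", "100003"},
}


def concepts_in_ecl(ecl: str) -> Set[str]:
    """Extract concept ids from an ECL string, ignoring |terms| between pipes."""
    without_terms = re.sub(r"\|[^|]*\|", " ", ecl)
    # SCTIDs are 6 to 18 digits, so short numbers such as cardinalities are skipped.
    return set(re.findall(r"\d{6,18}", without_terms))


def missing_concepts(ecl: str, release: str) -> Set[str]:
    """Concepts referenced by the query that the given release does not contain."""
    return concepts_in_ecl(ecl) - KNOWN_BY_RELEASE[release]


def count_in_line(observed: int, expected: int, tolerance: float = 0.5) -> bool:
    """True when an observed descendant count is within ±tolerance of expectation."""
    return abs(observed - expected) <= tolerance * expected


QUERY = "<< 100001 |Concept A| : 100003 |Attribute B| = << 100002 |Concept C|"
```

A non-empty result from `missing_concepts(QUERY, "20150131")` indicates that the query needs rewording before it can be run against that release. The count check is a separate, weaker signal that a concept or attribute existed but was not yet in meaningful use.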
In general, SNOMED CT improves over time. Most of these approaches are restricted to hierarchical questions.
Option for additional annotations in results to indicate possible dangers or confidence levels.
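One way such annotations might look, as a sketch only. The `AnnotatedMatch` structure, the labels "direct", "high", "low" and "unranked", and the placeholder ids are hypothetical. The only ordering taken from this page is that "SAME AS" gives more confidence than "MAY BE A" or "WAS A".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Assumed labels for illustration; only the relative ordering comes from the page.
CONFIDENCE = {"SAME AS": "high", "MAY BE A": "low", "WAS A": "low"}
ORDER = ["high", "low", "unranked"]


@dataclass(frozen=True)
class AnnotatedMatch:
    record_id: str
    concept: str
    via: Optional[str]  # association type used, or None for a direct match
    confidence: str


def annotate(
    records: Iterable[Tuple[str, str]],
    current_results: Set[str],
    associations: Iterable[Tuple[str, str, str]],
) -> List[AnnotatedMatch]:
    """Match records against a current result set, noting how each match arose."""
    associations = list(associations)
    matches: List[AnnotatedMatch] = []
    for record_id, concept in records:
        if concept in current_results:
            matches.append(AnnotatedMatch(record_id, concept, None, "direct"))
            continue
        # Association types linking this (inactive) concept to a current result.
        kinds = {
            kind
            for source, kind, target in associations
            if source == concept and target in current_results
        }
        if kinds:
            # Prefer the most trusted association; break ties by name.
            best = min(kinds, key=lambda k: (ORDER.index(CONFIDENCE.get(k, "unranked")), k))
            matches.append(
                AnnotatedMatch(record_id, concept, best, CONFIDENCE.get(best, "unranked"))
            )
    return matches
```

Keeping the association type alongside each match lets a consumer decide later whether to discard low-confidence rows, rather than baking that decision into the query.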
Historical associations only consider concepts being inactivated. They do not consider concepts changing modeling, or moving location, both of which would have an impact on query results changing over time.
All expressions need to be evaluated and checked in advance of any full-set processing, e.g. postcoordinated expressions (PCEs) and concepts dropping out of simple refsets over time, in both intensional and extensional definitions.
Failing to provide Historical Associations for inactivating concepts causes significant problems in this work area.
Applying fixes to the historical associations is a more useful solution for other use cases.
EHRs may not record what version (or even Edition) of SNOMED CT was in use for any given record or edit. That said, knowing additional detail about the timings of the record itself could help set boundaries on particular datasets. It is likely that EHRs will have used an older version of SNOMED CT than whatever was most current at that time.
Checks for ValueSet membership (e.g. concept is a member of << Substances) may fail over time. The calling application could consider re-querying the data, specifying a version of SNOMED CT that is more in line with the age of the record. That said, compliance checks are most usefully done at the point of creation. Checking once the record has been transmitted raises the question of what to do if it fails to comply!
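Choosing a version "more in line with the age of the record" could be sketched as below. The list of release dates, the function name `release_for_record` and the `adoption_lag_days` parameter are assumptions for illustration. The lag is one crude way to model the observation that systems often run an older version than the one most current at the time.

```python
import bisect
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical release effective dates available for querying, sorted ascending.
RELEASE_DATES: List[date] = [
    date(2018, 1, 31),
    date(2018, 7, 31),
    date(2019, 1, 31),
    date(2019, 7, 31),
]


def release_for_record(record_date: date, adoption_lag_days: int = 0) -> Optional[date]:
    """Latest release on or before the record date, less an assumed adoption lag.

    Returns None when the record predates every available release.
    """
    cutoff = record_date - timedelta(days=adoption_lag_days)
    # bisect_right gives the count of releases dated on or before the cutoff.
    index = bisect.bisect_right(RELEASE_DATES, cutoff)
    return RELEASE_DATES[index - 1] if index else None
```

The returned date would then be used to select the substrate for the membership check. Where the record's real version is unknown, running the check across a small window of neighbouring releases may be safer than trusting a single guess.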
In any given use case, it should be determined whether false positives or false negatives are more of a concern. Do we wish to cast a wide net and potentially include inappropriate results but with less risk of missing a case (e.g. for follow-up checks or pharmacovigilance), or vice versa?
Lexical (NLP) techniques can also be considered (potentially used in combination with other techniques) as a way to avoid concerns about modeling and hierarchy locations.
Outstanding Problems
When a concept is retired, there is no indication of whether it was simply utterly inappropriate (is there a requirement for additional metadata?)
Concepts that have been inactivated without a historical association.
Use of attributes where modeling has changed over time is a serious problem. The expressions would need to change over time to track the modeling that was used. Solutions that focus solely on hierarchical relationships are more clearly defined. One option is a hybrid approach: run an initial query over the most recent edition (and current modeling practice), then augment those results using purely hierarchical relationships for historic concepts.
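A sketch of the hybrid idea, assuming the attribute-based query has already been run on the current edition and that each inactive concept's parents at its last active release are available (in the spirit of the enhanced transitive closure approach). The function name, data shapes and placeholder ids are hypothetical. It also assumes that a subtype of a matching concept would itself have matched, which will not hold for every expression constraint.

```python
from typing import Dict, Set


def hybrid_results(
    current_results: Set[str],
    last_known_parents: Dict[str, Set[str]],
) -> Set[str]:
    """Augment a current-edition result with inactive concepts via hierarchy only.

    `last_known_parents` maps each inactive concept to the parents it had in
    the last release where it was active. An inactive concept is included when
    any of those parents is already in the result.
    """
    result = set(current_results)
    changed = True
    while changed:  # repeat so inactive children of inactive concepts are reached
        changed = False
        for inactive, parents in last_known_parents.items():
            if inactive not in result and parents & result:
                result.add(inactive)
                changed = True
    return result


# Placeholder data: i2 sits under i1, which sat under the current match c1.
augmented = hybrid_results(
    {"c1", "c2"},
    {"i1": {"c1"}, "i2": {"i1"}, "i3": {"zz"}},
)
```

Only the first step depends on attribute modeling, and it uses current modeling throughout, so the query text does not need a separate variant per historical modeling style.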
We have generally thought about records hailing from one particular country. For the use case where data is amalgamated from several countries, each with their own SNOMED CT Extension, we have yet to consider running ECL against (and therefore, necessarily classifying) the SNOMED CT World Wide Super Set.
Further Thoughts for The Future
We may need to add information into previous releases to make some of these solutions workable (EDIT: Like what? What was the line of thought here?)
How do the solutions outlined here need to be modified when either the data or the query is expressed using Post Coordinated concepts? Could something similar to historical associations be used to link Post Coordinated concepts to their closest equivalents?
The inactivation of a concept itself - especially due to duplication - could be flagged up as a cause for concern for historical queries. For example a historical query that selects << X. If Y is inactivated as a duplicate of X, then perhaps that query should be revisited as it is suggested that - historically - the query should have selected << X OR << Y.
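The flagging idea above might be sketched like this. The shape of the inputs is assumed: saved queries are reduced to the focus concept of a simple `<< X` constraint, and duplicate inactivations arrive as (inactivated, surviving) pairs. The names `revisit_suggestions`, `saved_focus` and `duplicates` are hypothetical.

```python
from typing import Dict, List, Tuple


def revisit_suggestions(
    saved_focus: Dict[str, str],
    duplicates: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Suggest a widened ECL for saved '<< X' queries affected by duplicates.

    `saved_focus` maps a query name to the focus concept X of a '<< X' query.
    `duplicates` lists (inactivated concept Y, surviving concept X) pairs.
    Only queries whose focus gained at least one duplicate are returned.
    """
    suggestions: Dict[str, str] = {}
    for name, focus in saved_focus.items():
        extras = sorted(y for y, x in duplicates if x == focus)
        if extras:
            suggestions[name] = " OR ".join(f"<< {concept}" for concept in [focus] + extras)
    return suggestions


flagged = revisit_suggestions({"q1": "X", "q2": "Z"}, [("Y", "X")])
```

The widened expression is only meaningful against a historical substrate in which Y was still active. Against the current release, `<< Y` would be expected to contribute nothing.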
| Status | In Progress |
|---|---|
Modeling Advisory Sub-Group
@Peter Williams
@Brandon Ulrich (Unlicensed)
@Daniel Karlsson (Unlicensed)
@michael lawley
@Former user (Deleted)
@Kin Wah Fung (Unlicensed)
Meeting Recordings
Supplemental Reading
See also presentation on use of Historical Associations here: Management of Concept Inactivation
Copyright © 2025, SNOMED International