Results of Analysis of SNOMED CT Extensions
Introduction
In January 2017, the Content Managers Advisory Group (CMAG) initiated an action to survey the national extensions that were available; until then, there had been limited information available to other Members and, presumably, to SNOMED International. The responses to this survey are available here. The results show that a variety of extensions are produced, ranging from subsets/refsets and language translations to clinical content development. Subsequent activities investigating collaboration on subsets may be pursued, but the CMAG was also interested in exactly what clinical content was in each extension, and how it might be shared. There should be very little clinical content that is exclusive to any one country; if content has been developed in one extension, it is likely to be globally relevant, and sharing it can reduce duplicated effort and the maintenance burden on extension builders.
The results described here draw on a combination of objective metrics (size of content), crude identification of duplicated effort, and some incidental quality observations. Further analysis of the content using description logic techniques is still underway; its results will be made available separately at a later date.
SQL snippets are included in this document for the author's future reference, but are unlikely to be useful to the general reader.
A summary of the results is available in the conclusion section at the end of this paper.
The cooperation of all Members is appreciated. Whilst every effort has been made to represent the extensions accurately, any inaccuracies are unintentional.
Summary of Extensions
14 NRCs responded to the survey, 9 of which indicated that they created clinical content extensions.
The Australian Edition also includes its national drug extension (the AMT), which has been excluded from this round of analysis, as no other extension appeared to include such content.
All extensions were based upon the July 2016 international release except one. This exception may produce some anomalies, but they are limited to the extension.
A raw analysis of the active concepts within each extension is given below.
Proportion of active concepts by module
| Module | Proportion currently active |
|---|---|
| SNOMED CT Netherlands NRC maintained module | 95.3% |
| US National Library of Medicine maintained module | 93.3% |
| módulo de la extensión de Uruguay | 95.6% |
| Canada Health Infoway English module | 68.3% |
| SNOMED CT Sweden NRC maintained module | 99.6% |
| Australian common model component extension | 33.3% |
| Danish module | 88.4% |
| SNOMED Clinical Terms Australian extension | 97.0% |
| SNOMED CT United Kingdom clinical extension module | 32.9% |
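A proportion like those in the table can be derived from an RF2 Concept snapshot. The sketch below assumes a tab-separated Concept file with the standard RF2 header (id, effectiveTime, active, moduleId, definitionStatusId). It groups rows by the moduleId on each concept's current row. The function name `proportion_active` is hypothetical, and this is not necessarily how the figures above were computed.

```python
import csv
import io
from collections import defaultdict


def proportion_active(concept_file):
    """Return {moduleId: share of concepts whose current row is active}.

    Assumes an RF2 Concept snapshot: tab-separated, header row
    id, effectiveTime, active, moduleId, definitionStatusId.
    """
    total = defaultdict(int)
    active = defaultdict(int)
    # RF2 files are plain tab-delimited text with no quoting
    reader = csv.DictReader(concept_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        module = row["moduleId"]
        total[module] += 1
        if row["active"] == "1":
            active[module] += 1
    return {module: active[module] / total[module] for module in total}


# Hypothetical three-row snapshot, for illustration only
sample = (
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    "1\t20160731\t1\tM1\t900000000000074008\n"
    "2\t20160731\t0\tM1\t900000000000074008\n"
    "3\t20160731\t1\tM2\t900000000000073002\n"
)
for module, share in proportion_active(io.StringIO(sample)).items():
    print(f"{module}: {share:.1%}")
```

For a real release file, pass the result of `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.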
All subsequent analysis was performed on active content only.
Extension changes against International Concept IDs
A total of 40 core concepts have been modified by extensions in some way.
Two were retired by an extension:
384612007|pT4a: Tumor directly invades other organs or structures (colon/rectum) (finding)|
384613002|pT4b: Tumor penetrates visceral peritoneum (colon/rectum) (finding)|
(A third concept was retired, but later reactivated.)
One concept had its definition status changed (to Defined) by an extension:
399733007|Excision of retroperitoneal lymph node (procedure)|
Eight of these appear to be attempts to address module assignment issues in the International release (i.e. concepts inactivated in a different module from the one in which they were created, such as metadata vs core).
The remainder are simply changes to moduleId, and represent either promotion of content from an extension to the International release, or possible errors.
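One way to reproduce this classification is to compare each extension's Concept snapshot against the International snapshot, keyed by concept ID. The sketch below is a hypothetical helper, not the method used for the counts above. It assumes both snapshots are already loaded as dictionaries, and it uses the standard definition status IDs (900000000000074008 |Primitive|, 900000000000073002 |Defined|).

```python
from typing import NamedTuple


class ConceptRow(NamedTuple):
    """One RF2 Concept snapshot row (all fields kept as strings)."""
    id: str
    effective_time: str
    active: str
    module_id: str
    definition_status_id: str


def classify_core_overrides(core, extension):
    """Label International concepts that an extension re-states.

    core, extension: dict mapping concept id -> ConceptRow.
    Returns dict concept id -> list of change labels.
    """
    changes = {}
    for cid, ext in extension.items():
        base = core.get(cid)
        if base is None:
            continue  # extension-only concept: not a change to core content
        labels = []
        if base.active == "1" and ext.active == "0":
            labels.append("retired")
        if base.definition_status_id != ext.definition_status_id:
            labels.append("definition status")
        if not labels and base.module_id != ext.module_id:
            labels.append("moduleId only")  # promotion or possible error
        if labels:
            changes[cid] = labels
    return changes


PRIMITIVE, DEFINED = "900000000000074008", "900000000000073002"
CORE, EXT = "900000000000207008", "EXT-MODULE"  # EXT-MODULE is a hypothetical placeholder
core = {
    "A": ConceptRow("A", "20160731", "1", CORE, PRIMITIVE),
    "B": ConceptRow("B", "20160731", "1", CORE, PRIMITIVE),
    "C": ConceptRow("C", "20160731", "1", CORE, PRIMITIVE),
}
extension = {
    "A": ConceptRow("A", "20161231", "0", EXT, PRIMITIVE),
    "B": ConceptRow("B", "20161231", "1", EXT, DEFINED),
    "C": ConceptRow("C", "20161231", "1", EXT, PRIMITIVE),
    "X": ConceptRow("X", "20161231", "1", EXT, PRIMITIVE),
}
print(classify_core_overrides(core, extension))
```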
```sql
-- Core concept IDs that appear with more than one moduleId in the aggregate,
-- excluding the Australian common model component module
select id, count(*) from X_Concepts
where id in (246089008,246221002,260670006,263512003,263513008,447564002,449609005,700043003,11000119105,41000179103,441000119109,601000119109,1111000119100,1561000119105,4181000179103,4191000179101,4201000179104,4211000179102,4221000179107,4231000179109,4241000179101,4251000179103,4261000179100,4271000179106,4281000179108,4301000179109,4311000179106,4321000179101,4331000179104,4341000179107,4351000179105,5461000179100,5471000179106,5481000179108,5491000179105,5531000179105)
and moduleId != 161771000036108
group by id
having count(distinct moduleId) > 1;
```
Extension Concepts
There appear to be around 52 unique semantic tags across the extension content, many of which are attributable to translations. Not all extensions provide English FSNs for extension content,¹ so semantic tags were manually translated and merged.
After normalisation, this comes to 32 semantic tags. The distribution of content is shown below.
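The tag extraction and merging step can be sketched as below. It assumes the semantic tag is the text inside the last pair of parentheses of an FSN, and that translated tags are merged through a manually maintained mapping. The mapping entry shown is a hypothetical example, not one taken from any extension.

```python
import re
from collections import Counter

# Semantic tag: the text inside the final "(...)" of an FSN
TAG = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(fsn):
    """Return the semantic tag of an FSN, or None if it has none."""
    match = TAG.search(fsn)
    return match.group(1).strip() if match else None


def normalised_tag_counts(fsns, tag_map):
    """Count FSNs per tag after mapping translated tags to English.

    tag_map: manually curated {translated tag (lower case): English tag}.
    Tags not in the map are kept as-is (lower-cased).
    """
    counts = Counter()
    for fsn in fsns:
        tag = semantic_tag(fsn)
        if tag is not None:
            tag = tag.lower()
            counts[tag_map.get(tag, tag)] += 1
    return counts


# Hypothetical mapping entry and FSNs, for illustration only
tag_map = {"aandoening": "disorder"}
fsns = [
    "Urosepsis (disorder)",
    "Voorbeeld (aandoening)",
    "pT4a: Tumor directly invades other organs or structures (colon/rectum) (finding)",
]
print(normalised_tag_counts(fsns, tag_map))
```

Anchoring the pattern at the end of the string means earlier parenthesised text, such as "(colon/rectum)", is not mistaken for the tag.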
9 modules are in use across the extensions:
| ModuleId | FSN | Country |
|---|---|---|
| 11000146104 | SNOMED CT Netherlands NRC maintained module | NL |
| 731000124108 | US National Library of Medicine maintained module | US |
| 5631000179106 | módulo de la extensión de Uruguay | UY |
| 20621000087109 | Canada Health Infoway English module | CA |
| 45991000052106 | SNOMED CT Sweden NRC maintained module | SE |
| 161771000036108 | Australian common model component extension | AU |
| 554471000005108 | Danish module | DK |
| 32506021000036107 | SNOMED Clinical Terms Australian extension | AU |
| 999000011000000103 | SNOMED CT United Kingdom clinical extension module | UK |
The type of content by hierarchy
Each top-level hierarchy is reviewed below for extension content.
Duplicates were found by comparing terms across extensions within a given hierarchy (for example, "look for duplicate terms within the procedure hierarchy"). Duplicates within a single module were ignored.
Analysis was performed on the complete aggregate of the extensions plus the International core.
The presence of duplication may indicate:
Extension concepts that are also in the core, whether added to the core before or after the extension concept was created.
Where a concept appears in the International release after its creation in an extension, it represents a maintenance burden for NRCs in the absence of a promotion process.
At least two countries producing similar, if not identical, content, which would suggest that the content is not necessarily country-specific.
The initial analysis was agnostic of description type; the analysis was then repeated on FSNs only, to increase the likelihood of detecting duplicate concepts.
A major limitation of this approach is that translations will almost inherently be unique, so the comparison depends on English terms.
It was discovered mid-analysis that a setting within the analysis database may have caused incorrect character rendering; however, this is not expected to affect the results of this analysis.
```sql
SET @Hierarchy = 404684003; -- 404684003 |Clinical finding|; change per hierarchy
select term,count(distinct moduleId) from X_Descriptions
where conceptId in (select distinct id from X_Concepts where active)
and moduleId != 900062011000036108 -- exclude AMT module
-- and moduleId not in(900000000000207008,900000000000012004) -- exclude international
and typeId = 900000000000003001 -- FSNs only
and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
and active = 1
group by term
having count(distinct moduleId) > 1;
-- candidates for consideration.
select * from X_Descriptions
-- active descriptions for active concepts
where active and conceptId in (select distinct id from X_Concepts where active)
-- target hierarchy
and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
and term in (select distinct term from X_Descriptions
where conceptId in (select distinct id from X_Concepts where active)
and moduleId != 900062011000036108 -- exclude AMT module
-- and moduleId not in(900000000000207008,900000000000012004)
and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
and active = 1
group by term
having count(distinct moduleId) > 1);
```
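Readers without a MySQL copy of the aggregate can reproduce the core of the first query with Python's built-in sqlite3 module. This is a sketch under stated assumptions. The input rows are hypothetical and are taken to be already restricted to active descriptions of active concepts in one hierarchy (the job of the X_TransitiveClosure sub-query above). The table name `d` is illustrative. Counting distinct moduleIds per term is what makes repeats within a single module drop out.

```python
import sqlite3

FSN = 900000000000003001      # Fully specified name description type
SYNONYM = 900000000000013009  # Synonym description type


def cross_module_duplicates(rows, fsn_only=True):
    """Return [(term, number of modules)] for terms used in more than one module.

    rows: (conceptId, moduleId, typeId, term) tuples, assumed to be active
    descriptions of active concepts already restricted to one hierarchy.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE d (conceptId INTEGER, moduleId INTEGER, typeId INTEGER, term TEXT)")
    con.executemany("INSERT INTO d VALUES (?, ?, ?, ?)", rows)
    where, params = ("WHERE typeId = ?", (FSN,)) if fsn_only else ("", ())
    sql = (
        f"SELECT term, COUNT(DISTINCT moduleId) FROM d {where} "
        "GROUP BY term HAVING COUNT(DISTINCT moduleId) > 1 ORDER BY term"
    )
    result = con.execute(sql, params).fetchall()
    con.close()
    return result


# Hypothetical rows: modules 1 and 2 stand in for two extensions
rows = [
    (10, 1, FSN, "Example finding (disorder)"),
    (20, 2, FSN, "Example finding (disorder)"),
    (30, 1, FSN, "Local procedure (procedure)"),
    (31, 1, FSN, "Local procedure (procedure)"),  # same module: ignored
    (40, 1, SYNONYM, "Shared synonym"),
    (50, 2, SYNONYM, "Shared synonym"),
]
print(cross_module_duplicates(rows))                  # FSNs only
print(cross_module_duplicates(rows, fsn_only=False))  # any description type
```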
Clinical finding
Potential Concept Duplication
26 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
US National Library of Medicine maintained module
SNOMED CT United Kingdom clinical extension module
SNOMED CT Netherlands NRC maintained module
SNOMED Clinical Terms Australian extension
SNOMED CT Sweden NRC maintained module
Danish module
All but the Danish module have some overlap with each other, as well as with the International release.
These are the identified FSNs.
For example:
371093006|Urosepsis (disorder)| has descriptions in the extensions of three countries that are the same as the 'en' description.
27830001|Brachial radiculitis (disorder)| has translations in two extensions that are different from the 'en' description, but differ from each other only in the case of the first character.
75049004|Jeune thoracic dystrophy (disorder)| has translations in two extensions that appear identical.
These may reflect different character encoding or punctuation conventions, or written languages that are genuinely similar (Danish and Swedish). A binary (case-sensitive) comparison, which no longer treats terms that differ only in case as duplicates, halved the number of duplicate terms identified. It is unclear to the author what standards and rules apply to translations: whether they are complete (all concepts), partial (only concepts of interest), or as necessary (only where the word differs).
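SQLite can illustrate the effect of a binary comparison. Its default text comparison is case-sensitive (BINARY), and its NOCASE collation folds ASCII letters only. The sketch assumes, without having confirmed it, that the analysis database used a case-insensitive collation, as MySQL's default `_ci` collations are. The terms and module labels are hypothetical.

```python
import sqlite3


def duplicate_term_count(rows, case_sensitive):
    """Count terms shared by more than one module.

    rows: (moduleId, term) pairs. BINARY compares exactly; NOCASE treats
    ASCII upper and lower case as equal (non-ASCII letters are not folded).
    """
    collation = "BINARY" if case_sensitive else "NOCASE"
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE d (moduleId TEXT, term TEXT)")
    con.executemany("INSERT INTO d VALUES (?, ?)", rows)
    sql = (
        "SELECT COUNT(*) FROM ("
        f" SELECT 1 FROM d GROUP BY term COLLATE {collation}"
        " HAVING COUNT(DISTINCT moduleId) > 1)"
    )
    (count,) = con.execute(sql).fetchone()
    con.close()
    return count


# Hypothetical terms: one pair differs only in the case of the first letter
rows = [
    ("A", "Example term"),
    ("B", "example term"),
    ("A", "Identical term"),
    ("B", "Identical term"),
]
print(duplicate_term_count(rows, case_sensitive=False))  # case-insensitive
print(duplicate_term_count(rows, case_sensitive=True))   # binary
```

Because NOCASE ignores case only for ASCII, a case-insensitive check on Danish or Swedish terms would need a Unicode-aware comparison, such as Python's `str.casefold()`.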
Procedure
Potential Concept Duplication
16 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
SNOMED CT Netherlands NRC maintained module
US National Library of Medicine maintained module
SNOMED Clinical Terms Australian extension
SNOMED CT United Kingdom clinical extension module
Canada Health Infoway English module
SNOMED CT Sweden NRC maintained module
These are the identified FSNs.
Special concept
Potential Concept Duplication
15 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
SNOMED CT Netherlands NRC maintained module
US National Library of Medicine maintained module
SNOMED CT United Kingdom clinical extension module
Canada Health Infoway English module
SNOMED CT core module
Danish module
SNOMED Clinical Terms Australian extension
These are the identified FSNs.
The mix of semantic tags in this set suggests a possible issue with the transitive closure queries and the history of the "aggregate release". Further investigation is required.
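One possible cause worth checking is how the closure is built. The sketch below derives ancestors from active IS A (116680003) relationships only. It assumes a single, consistent inferred relationship snapshot as input, and its function names are hypothetical. X_TransitiveClosure might mix closure rows from different releases or include inactive relationships. If so, concepts could appear under roots they no longer belong to, which is one unconfirmed explanation for an unexpected mix of semantic tags.

```python
from collections import defaultdict

IS_A = "116680003"


def build_parents(relationships):
    """Map each concept to its direct parents, using active IS A rows only.

    relationships: (sourceId, destinationId, typeId, active) tuples,
    assumed to come from one inferred relationship snapshot.
    """
    parents = defaultdict(set)
    for source, destination, type_id, active in relationships:
        if active == "1" and type_id == IS_A:
            parents[source].add(destination)
    return parents


def ancestors(concept, parents):
    """All ancestors of a concept; the 'seen' set also guards against cycles."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def in_hierarchy(concept, root, parents):
    """True if concept is the root itself or one of its descendants."""
    return concept == root or root in ancestors(concept, parents)


# Hypothetical chain 3 -> 2 -> 1, plus an inactive IS A row for concept 4
rels = [
    ("3", "2", IS_A, "1"),
    ("2", "1", IS_A, "1"),
    ("4", "2", IS_A, "0"),
]
parents = build_parents(rels)
print(sorted(ancestors("3", parents)))
print(in_hierarchy("4", "1", parents))
```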
Situation with explicit context
Potential Concept Duplication
8 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
SNOMED CT United Kingdom clinical extension module
SNOMED CT Netherlands NRC maintained module
SNOMED CT Sweden NRC maintained module
US National Library of Medicine maintained module
These are the identified FSNs.
Observable entity
Potential Concept Duplication
There are no FSNs duplicated across the content.
There are 476 duplicate synonyms across this set. The affected concepts are in the following extensions.
SNOMED CT Netherlands NRC maintained module
SNOMED CT core module
Danish module
SNOMED CT Sweden NRC maintained module
SNOMED CT United Kingdom clinical extension module
Event
Potential Concept Duplication
No FSNs are duplicated across the content.
17 synonyms are duplicated. The affected concepts are in the following extensions.
Danish module
SNOMED CT core module
SNOMED CT Sweden NRC maintained module
SNOMED CT Netherlands NRC maintained module
Qualifier value
Potential Concept Duplication
28 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
SNOMED CT United Kingdom clinical extension module
Canada Health Infoway English module
US National Library of Medicine maintained module
These are the identified FSNs.
Record artifact
Potential Concept Duplication
8 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
SNOMED CT Netherlands NRC maintained module
SNOMED CT United Kingdom clinical extension module
These are the identified FSNs.
Social context
Potential Concept Duplication
5 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
US National Library of Medicine maintained module
SNOMED Clinical Terms Australian extension
Canada Health Infoway English module
These are the identified FSNs.
Substance
Potential Concept Duplication
15 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
US National Library of Medicine maintained module
SNOMED Clinical Terms Australian extension
Canada Health Infoway English module
SNOMED CT core module
SNOMED CT United Kingdom clinical extension module
These are the identified FSNs.
Body structure
Potential Concept Duplication
No FSNs are duplicated across the content.
1,888 synonyms are duplicated across the content. The affected concepts are in the following extensions.
Danish module
SNOMED CT Sweden NRC maintained module
SNOMED CT core module
Lithuania
SNOMED Clinical Terms Australian extension
US National Library of Medicine maintained module
SNOMED CT United Kingdom clinical extension module
Staging and scales
Potential Concept Duplication
No FSNs are duplicated across the content.
464 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.
Danish module
SNOMED CT core module
SNOMED CT Sweden NRC maintained module
SNOMED CT United Kingdom clinical extension module
Pharmaceutical / biologic product
Potential Concept Duplication
12 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
Canada Health Infoway English module
US National Library of Medicine maintained module
SNOMED Clinical Terms Australian extension
These are the identified FSNs.
There is evidently an issue with the semantic tags and the transitive closure queries. This may be a problem with either the analysis or the content.
Organism
Potential Concept Duplication
2 FSNs are duplicated across the content; these are almost certainly candidates for promotion.
The affected concepts are in the following extensions.
US National Library of Medicine maintained module
SNOMED CT core module
Canada Health Infoway English module
These are the identified FSNs.
Environment or geographical location
Potential Concept Duplication
No FSNs are duplicated across the content.
353 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.
Danish module
SNOMED CT Sweden NRC maintained module
SNOMED CT core module
Lithuania
Specimen
Potential Concept Duplication
No FSNs are duplicated across the content.
Nine synonyms are duplicated. The affected concepts are in the following extensions.
Danish module
SNOMED CT Sweden NRC maintained module
SNOMED CT core module
US National Library of Medicine maintained module
SNOMED CT United Kingdom clinical extension module
Physical object
Potential Concept Duplication
No FSNs are duplicated across the content.
399 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.
Danish module
SNOMED CT Sweden NRC maintained module
SNOMED CT core module
Physical force
A single concept: U-V radiation in diagnosis NOS (physical force).
Extension Descriptions
Most of the description analysis was performed as part of identifying duplicate concepts. Below is a summary of the translations (extension descriptions for core concepts).
Extension Changes to Core Descriptions
178 International descriptions have some modification in an extension. The associated modules are:
Australian common model component extension
SNOMED Clinical Terms Australian extension
US National Library of Medicine maintained module
Relationship Extensions
5,136 core concepts have been changed within an extension. Some of these look like promotions; however, the majority do not appear to be.
Note: Some of the numbers comparing stated and inferred relationships look odd; this is likely a result of the crude aggregation of the extensions, and of some extension content having already been promoted to core.
Core relationships modified within an Extension
1,997 core relationships were modified by an extension, affecting 384 concepts.
A single concept, 425630003|Acute irritant contact dermatitis (disorder)| was modified by two NRCs.
Both inactivated all the relationships, but one recreated them in the subsequent release.
Other changes are summarised below.
Types of Relationships Modified
A large variety of relationship types (43) are involved in the edits. Most are IS A, and some are either not part of the approved concept model or are attributes specific to an extension.
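The following sketch profiles which relationship types an extension uses on core concepts, and which fall outside an expected set. In practice the allowed set would come from the concept model (MRCM). The set, the rows and the `LOCAL-ATTRIBUTE` label in the example are hypothetical placeholders.

```python
from collections import Counter


def relationship_type_profile(rows, core_ids, allowed_types):
    """Profile relationship types used by an extension on core concepts.

    rows: (sourceId, typeId) pairs from an extension's relationship file.
    core_ids: set of International concept IDs.
    allowed_types: typeIds permitted by the concept model (supplied by caller).
    Returns (Counter of typeIds, sorted list of typeIds not in allowed_types).
    """
    counts = Counter(type_id for source, type_id in rows if source in core_ids)
    unexpected = sorted(t for t in counts if t not in allowed_types)
    return counts, unexpected


IS_A, FINDING_SITE = "116680003", "363698007"
rows = [
    ("A", IS_A),
    ("A", IS_A),
    ("B", "LOCAL-ATTRIBUTE"),  # placeholder for an extension-specific attribute
    ("X", FINDING_SITE),       # X is not a core concept, so it is skipped
]
counts, unexpected = relationship_type_profile(rows, {"A", "B"}, {IS_A, FINDING_SITE})
print(counts.most_common())
print(unexpected)
```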
```sql
-- Number of distinct active core concepts that are the source of
-- extension relationships, per extension module
select moduleId, count(distinct sourceId) from X_Relationships
where moduleId not in (900000000000207008,900000000000012004,900062011000036108) -- exclude international+AMT
and sourceId in (select id from X_Concepts where moduleId in (900000000000207008,900000000000012004) and active)
group by moduleId;
```
Comparison Examples - Published (inferred) relationships for Core concepts
| Concept | Core | Extension |
|---|---|---|
| 371040005 | | |
| 321000119108 | | |
| 212385001 | | |

Note (321000119108): This example appears to be a promoted concept, but the local relationships were not inactivated upon promotion. Examples such as this are a use case for promoting both stated and inferred relationships, so that the maintenance burden on NRCs is reduced and authoring effort is recognised.
Additional Observations
The following are only examples of the observations made, and are by no means comprehensive.
Extensions vs Editions
Of the 9 releases looked at:
Four publish Editions
Three publish Extensions
One publishes three separate extensions.
One publishes an extension, "bundled" with the International Edition.
File naming
The file naming conventions do not appear to be consistent across the extensions.
sct2_Concept_Snapshot_AU1000036_20161231.txt
sct2_Concept_Snapshot_en-CanadianExtension_20161031.txt
sct2_Concept_Snapshot_DK1000005_20161130.txt
sct2_Concept_Snapshot_LT1000092_20151107.txt
sct2_Concept_Snapshot_NL_20160930.txt
sct2_Concept_Snapshot_SE1000052_20161130.txt
sct2_Concept_Snapshot_GB1000000_20161001.txt
sct2_Concept_Snapshot_US1000124_20160901.txt
sct2_Concept_Snapshot_es-UruguayExtension_20161215.txt
sct2_Concept_Snapshot_INT_20160731.txt
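A quick consistency check on these names can be scripted. The sketch assumes the RF2 pattern FileType_ContentType_ContentSubType_CountryNamespace_VersionDate.txt. It treats a CountryNamespace element as conforming when it is `INT` or a two-letter country code followed by a seven-digit namespace, which is the form most of the names above use. Whether the specification permits other forms, such as a bare country code, should be checked against the Release File Specification.

```python
import re

# Assumed pattern: FileType_ContentType_ContentSubType_CountryNamespace_VersionDate.txt
RF2_NAME = re.compile(
    r"^(?P<file_type>[a-z]+[0-9]*)_(?P<content_type>[A-Za-z]+)_"
    r"(?P<sub_type>[A-Za-z-]+)_(?P<country_ns>[^_]+)_(?P<version>\d{8})\.txt$"
)
# Assumed conforming CountryNamespace: INT, or e.g. AU1000036
COUNTRY_NS = re.compile(r"^(INT|[A-Z]{2}\d{7})$")


def nonconforming(filenames):
    """Return the names whose overall shape or CountryNamespace element
    does not match the assumed convention."""
    bad = []
    for name in filenames:
        match = RF2_NAME.match(name)
        if match is None or not COUNTRY_NS.match(match.group("country_ns")):
            bad.append(name)
    return bad


names = [
    "sct2_Concept_Snapshot_AU1000036_20161231.txt",
    "sct2_Concept_Snapshot_en-CanadianExtension_20161031.txt",
    "sct2_Concept_Snapshot_DK1000005_20161130.txt",
    "sct2_Concept_Snapshot_LT1000092_20151107.txt",
    "sct2_Concept_Snapshot_NL_20160930.txt",
    "sct2_Concept_Snapshot_SE1000052_20161130.txt",
    "sct2_Concept_Snapshot_GB1000000_20161001.txt",
    "sct2_Concept_Snapshot_US1000124_20160901.txt",
    "sct2_Concept_Snapshot_es-UruguayExtension_20161215.txt",
    "sct2_Concept_Snapshot_INT_20160731.txt",
]
print(nonconforming(names))
```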
Directory structure
Some variation was noticed in the directory structure within the published zip files.
Below are the paths to the snapshot Concept file in each release.
\SnomedCT_Release_AU1000036_20161231\RF2Release\Snapshot\Terminology
\SnomedCT_Canadian_EnglishExtension_Release_20161031\Snapshot\Terminology
\SnomedCT_ManagedServiceDK_Production_DK1000005_20161130\Snapshot\Terminology
\SnomedCT_RF2Release_LT1000092_20151107\Snapshot\Terminology
\SnomedCT_Netherlands_EditionRelease_20160930\Snapshot\Terminology
\SnomedCT_SE_Production_20161130T170000\Snapshot\Terminology
\SnomedCT_RF2Release_GB1000000_20161001\Snapshot\Terminology
\SnomedCT_RF2Release_US1000124_20160901\Snapshot\Terminology
\SnomedCT_Uruguay_Extension_Release_20161215\Snapshot\Terminology
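The top-level folder names differ, and the Australian release adds an extra RF2Release level, so a loader that hard-codes one layout will break. The more tolerant sketch below uses Python's zipfile module and matches on the trailing Snapshot/Terminology/ folders instead. Zip member names use forward slashes, even though the paths above are shown with backslashes. The example archive is built in memory and is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath


def find_concept_snapshots(member_names):
    """Return members ending in .../Snapshot/Terminology/sct2_Concept_Snapshot*.txt,
    whatever folders precede 'Snapshot'."""
    hits = []
    for name in member_names:
        path = PurePosixPath(name)
        if (
            len(path.parts) >= 3
            and path.parts[-3:-1] == ("Snapshot", "Terminology")
            and path.name.startswith("sct2_Concept_Snapshot")
            and path.suffix == ".txt"
        ):
            hits.append(name)
    return hits


# Hypothetical archive mixing two of the layouts listed above
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "SnomedCT_Release_AU1000036_20161231/RF2Release/Snapshot/Terminology/"
        "sct2_Concept_Snapshot_AU1000036_20161231.txt", "")
    archive.writestr(
        "SnomedCT_RF2Release_GB1000000_20161001/Snapshot/Terminology/"
        "sct2_Concept_Snapshot_GB1000000_20161001.txt", "")
    archive.writestr(
        "SnomedCT_RF2Release_GB1000000_20161001/Snapshot/Terminology/"
        "sct2_Description_Snapshot-en_GB1000000_20161001.txt", "")

with zipfile.ZipFile(buffer) as archive:
    for member in find_concept_snapshots(archive.namelist()):
        print(member)
```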
Specific file inclusions
The International release includes six files (Concept, Description, Relationship, StatedRelationship, Identifier and TextDefinition) within the "Terminology" folder.
The files are not consistently present in extensions.
| Concept | Description | Relationship | StatedRelationship |
|---|---|---|---|
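Given the file names from a release's Snapshot/Terminology folder, the gaps against the six International files can be listed as below. The sketch assumes the component name is the second underscore-separated element of each file name (e.g. Concept in sct2_Concept_Snapshot_...). The example names, including the XX0000000 namespace, are hypothetical.

```python
EXPECTED = [
    "Concept", "Description", "Relationship",
    "StatedRelationship", "Identifier", "TextDefinition",
]


def missing_components(filenames):
    """Return the expected Terminology components with no matching sct2 file."""
    present = set()
    for name in filenames:
        parts = name.split("_")
        if len(parts) > 2 and parts[0].startswith("sct2"):
            present.add(parts[1])
    return [component for component in EXPECTED if component not in present]


# Hypothetical release contents, for illustration only
names = [
    "sct2_Concept_Snapshot_XX0000000_20160930.txt",
    "sct2_Description_Snapshot-en_XX0000000_20160930.txt",
    "sct2_Relationship_Snapshot_XX0000000_20160930.txt",
]
print(missing_components(names))
```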
Copyright © 2026, SNOMED International