Results of Analysis of SNOMED CT Extensions

Introduction

In January 2017, the Content Managers Advisory Group (CMAG) initiated a survey of the available national extensions. Until then, only limited information about them had been available to other Members, and presumably to SNOMED International. The survey responses are available here. The results show that a variety of extensions are produced, ranging from subsets/refsets and language translations to clinical content development. Subsequent activities investigating collaboration on subsets may be pursued. However, the CMAG was also interested in exactly what clinical content each extension contains, and how this content might be shared. Very little clinical content should be exclusive to any one country. If content has been developed in one extension, it is likely to be globally relevant, and sharing it can reduce duplicated effort and the maintenance burden on extension builders.

The results described here combine objective metrics (size of content), a crude identification of duplicated effort, and some incidental quality observations. Further analysis of the content using description logic techniques is still underway. Its results will be made available separately at a later date.

SQL snippets are included in this document for the author's future reference and are unlikely to be useful to a public readership.

A summary of the results is available in the conclusion section at the end of this paper.

The cooperation of all Members is appreciated. While every effort has been made to represent the extensions accurately, any inaccuracies are unintentional.

Summary of Extensions

14 NRCs responded to the survey, 9 of which indicated that they create clinical content extensions.
The Australian Edition also includes its national drug extension (the AMT). This extension has been excluded from this round of analysis, because no other extension appeared to include such content.
All extensions except one were based on the July 2016 International release. This exception may produce some anomalies, but any anomalies are limited to that extension.

A raw analysis of the active concepts within each extension is given below.

Proportion of extension concepts currently active

NRC module | Proportion currently active
--- | ---
SNOMED CT Netherlands NRC maintained module | 95.3%
US National Library of Medicine maintained module | 93.3%
módulo de la extensión de Uruguay (Uruguay extension module) | 95.6%
Canada Health Infoway English module | 68.3%
SNOMED CT Sweden NRC maintained module | 99.6%
Australian common model component extension | 33.3%
Danish module | 88.4%
SNOMED Clinical Terms Australian extension | 97.0%
SNOMED CT United Kingdom clinical extension module | 32.9%
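Percentages like those above can be derived from each release's concept snapshot. The following is a minimal sketch that assumes the standard RF2 concept columns (id, effectiveTime, active, moduleId, definitionStatusId) in a tab-separated file with a header row. The function name `proportion_active` is hypothetical and is not part of the author's tooling.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def proportion_active(lines: Iterable[str]) -> Dict[str, float]:
    """Return {moduleId: share of concepts whose active flag is 1}.

    `lines` is assumed to be a tab-separated RF2 concept snapshot, header row
    included (id, effectiveTime, active, moduleId, definitionStatusId).
    """
    totals: Dict[str, int] = defaultdict(int)
    active: Dict[str, int] = defaultdict(int)
    # RF2 files are not quoted, so quote characters are treated as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        module_id = row["moduleId"]
        totals[module_id] += 1
        if row["active"] == "1":
            active[module_id] += 1
    return {m: active[m] / totals[m] for m in totals}
```

To use it, pass an open file object (e.g. `open(path, encoding="utf-8", newline="")`). `csv.DictReader` accepts any iterable of lines.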

All subsequent analysis was performed on active content only.

Extension changes against International Concept IDs

A total of 40 core concepts have been modified by extensions in some way.
Two were retired by an extension:

  • 384612007|pT4a: Tumor directly invades other organs or structures (colon/rectum) (finding)|

  • 384613002|pT4b: Tumor penetrates visceral peritoneum (colon/rectum) (finding)|

  • (A third concept was retired, but later reactivated)

One concept had its definition status changed (to Defined) by an extension:

  • 399733007|Excision of retroperitoneal lymph node (procedure)|

Eight of these appear to be attempts to address module assignment issues in the International release, i.e. concepts that were inactivated in a different module from the one in which they were created (metadata vs core).

The remainder are simply changes to moduleId. These represent either content promotion from an extension to the International release, or possible errors.

select id, count(*)
from X_Concepts
where id in (246089008,246221002,260670006,263512003,263513008,447564002,449609005,700043003,11000119105,41000179103,441000119109,601000119109,1111000119100,1561000119105,4181000179103,4191000179101,4201000179104,4211000179102,4221000179107,4231000179109,4241000179101,4251000179103,4261000179100,4271000179106,4281000179108,4301000179109,4311000179106,4321000179101,4331000179104,4341000179107,4351000179105,5461000179100,5471000179106,5481000179108,5491000179105,5531000179105)
  and moduleId != 161771000036108
group by id
having count(distinct moduleId) > 1;
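The query above finds concepts that appear under more than one moduleId. Below is a minimal Python sketch of the same grouping. It assumes rows taken from RF2 Full concept files, which keep every historical row (Snapshots do not). `concepts_with_module_changes` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Module excluded in the SQL above (Australian common model component extension).
EXCLUDED_MODULE = "161771000036108"


def concepts_with_module_changes(
    rows: Iterable[Tuple[str, str, str, str]],
    excluded_module: str = EXCLUDED_MODULE,
) -> Dict[str, List[str]]:
    """Return {conceptId: sorted moduleIds} for concepts seen in more than one module.

    Each row is assumed to be (id, effectiveTime, active, moduleId) from RF2 Full files.
    """
    modules = defaultdict(set)
    for concept_id, _effective_time, _active, module_id in rows:
        if module_id != excluded_module:
            modules[concept_id].add(module_id)
    # Equivalent of HAVING count(distinct moduleId) > 1.
    return {cid: sorted(mods) for cid, mods in modules.items() if len(mods) > 1}
```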

Extension Concepts

There appear to be around 52 unique semantic tags across the extension content, many of which are attributable to translations. Not all extensions provide English FSNs for extension content¹, so semantic tags were manually translated and merged.
After normalisation, this comes to 32 semantic tags. The distribution of content is shown below.
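Normalisation of this kind can be sketched as follows. The sketch takes the final parenthesised part of each FSN as its semantic tag, lower-cases it, and maps translated tags to English. `TAG_TRANSLATIONS` is a hypothetical two-entry stand-in for the manual mapping the author built. It is not that mapping.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# The semantic tag is the final parenthesised part of an FSN.
SEMANTIC_TAG = re.compile(r"\(([^()]+)\)\s*$")

# Hypothetical stand-in for the manual translation mapping; keys are lower-case.
TAG_TRANSLATIONS: Dict[str, str] = {
    "hallazgo": "finding",
    "procedimiento": "procedure",
}


def semantic_tag(fsn: str) -> Optional[str]:
    match = SEMANTIC_TAG.search(fsn)
    return match.group(1).strip() if match else None


def tag_distribution(fsns: Iterable[str], translations: Dict[str, str] = TAG_TRANSLATIONS) -> Counter:
    """Count FSNs per normalised (lower-cased, translated) semantic tag."""
    counts: Counter = Counter()
    for fsn in fsns:
        tag = semantic_tag(fsn)
        if tag is None:
            continue  # no tag found; such FSNs need manual review
        key = tag.lower()
        counts[translations.get(key, key)] += 1
    return counts
```

The regular expression anchors on the end of the string. FSNs that contain other parentheses, such as "(colon/rectum)", still yield the final tag.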

Nine modules are in use across the extensions:

ModuleId | FSN | Country
--- | --- | ---
11000146104 | SNOMED CT Netherlands NRC maintained module | NL
731000124108 | US National Library of Medicine maintained module | US
5631000179106 | módulo de la extensión de Uruguay | UY
20621000087109 | Canada Health Infoway English module | CA
45991000052106 | SNOMED CT Sweden NRC maintained module | SE
161771000036108 | Australian common model component extension | AU
554471000005108 | Danish module | DK
32506021000036107 | SNOMED Clinical Terms Australian extension | AU
999000011000000103 | SNOMED CT United Kingdom clinical extension module | UK

The type of content by hierarchy

Each top-level hierarchy is reviewed below for extension content.
Duplicates were found by comparing terms across extensions within a given hierarchy, for example by looking for duplicate terms within the Procedure hierarchy. Duplicates within a single module were ignored.
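The hierarchy restriction in the queries depends on a transitive closure table (X_TransitiveClosure in the SQL). Below is one way such a table could be derived from active IS A (116680003) relationships. This is a sketch under that assumption, with hypothetical function names; it is not necessarily how the author's table was built.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

IS_A = "116680003"


def ancestors_of(concept: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All ancestors of `concept`; the visited set also guards against cycles."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def transitive_closure(
    relationships: Iterable[Tuple[str, str, str, bool]],
) -> Set[Tuple[str, str]]:
    """Return (sourceId, destinationId) pairs, one per concept/ancestor pair.

    Relationships are assumed to be (sourceId, destinationId, typeId, active) tuples.
    """
    parents: Dict[str, Set[str]] = defaultdict(set)
    for source, destination, type_id, active in relationships:
        if active and type_id == IS_A:
            parents[source].add(destination)
    return {(c, a) for c in list(parents) for a in ancestors_of(c, parents)}
```

The members of a hierarchy are then the sourceIds of pairs whose destinationId is the hierarchy root. This mirrors `select sourceId from X_TransitiveClosure where destinationId = @Hierarchy`.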

Analysis was done on the complete aggregate of the extensions plus the International core.
The presence of duplication may indicate:

  1. Extension concepts that are also in the core, added either before or after the extension concept.

    • Where a concept appears in the International release after its creation in an extension, it represents a maintenance burden for NRCs in the absence of a promotion process.

  2. At least two countries producing similar, if not identical, content. This would suggest that the content is not necessarily country-specific.

The initial analysis was agnostic of description type. The analysis was then repeated on FSNs only, to increase the likelihood of detecting genuine duplicates.
A major limitation of this approach is that translations will be almost inherently unique, so the comparison depends on English terms.
It was discovered mid-analysis that a setting within the analysis database may have caused incorrect character rendering. However, this is not expected to affect the results of this analysis.

SET @Hierarchy = 404684003;

select term, count(distinct moduleId)
from X_Descriptions
where conceptId in (select distinct id from X_Concepts where active)
  and moduleId != 900062011000036108 -- exclude AMT module
  -- and moduleId not in (900000000000207008, 900000000000012004) -- exclude international
  and typeId = 900000000000003001
  and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
  and active = 1
group by term
having count(distinct moduleId) > 1;

-- candidates for consideration.
select * from X_Descriptions
-- active descriptions for active concepts
where active
  and conceptId in (select distinct id from X_Concepts where active)
  -- target hierarchy
  and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
  and term in (
    select distinct term
    from X_Descriptions
    where conceptId in (select distinct id from X_Concepts where active)
      and moduleId != 900062011000036108 -- exclude AMT module
      -- and moduleId not in (900000000000207008, 900000000000012004)
      and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
      and active = 1
    group by term
    having count(distinct moduleId) > 1
  );
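The two queries can be mirrored in Python. The following is a minimal sketch that assumes descriptions are already loaded as dictionaries with RF2 field names. It also assumes `hierarchy_members` already holds only active concepts below the target hierarchy (for example, taken from a transitive closure). The function name is hypothetical. With `fsn_only=True` the sketch matches the first query; with `fsn_only=False` it matches the unrestricted term subquery of the second.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Mapping

FSN_TYPE = "900000000000003001"  # Fully specified name
AMT_MODULE = "900062011000036108"  # excluded, as in the SQL


def cross_module_duplicates(
    descriptions: Iterable[Mapping],
    hierarchy_members: AbstractSet[str],
    fsn_only: bool = True,
    excluded_modules: AbstractSet[str] = frozenset({AMT_MODULE}),
) -> Dict[str, List[str]]:
    """Return {term: sorted moduleIds} for terms used in more than one module."""
    modules_by_term = defaultdict(set)
    for d in descriptions:
        if not d["active"] or d["moduleId"] in excluded_modules:
            continue
        if d["conceptId"] not in hierarchy_members:
            continue
        if fsn_only and d["typeId"] != FSN_TYPE:
            continue
        modules_by_term[d["term"]].add(d["moduleId"])
    # A term repeated inside a single module is ignored: it needs two or more modules.
    return {t: sorted(m) for t, m in modules_by_term.items() if len(m) > 1}
```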

 

Clinical finding

Potential Concept Duplication

26 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • US National Library of Medicine maintained module

  • SNOMED CT United Kingdom clinical extension module

  • SNOMED CT Netherlands NRC maintained module

  • SNOMED Clinical Terms Australian extension

  • SNOMED CT Sweden NRC maintained module

  • Danish module

All of these modules except the Danish module overlap with each other, as well as with the International release.
These are the identified FSNs.

There are 6,400 synonyms that are not unique across this set. There appear to be several reasons for this, though most seem to relate to translations.

For example:

  • 371093006|Urosepsis (disorder)| has descriptions in the extensions of three countries that are identical to the 'en' description.

  • 27830001|Brachial radiculitis (disorder)| has translations in two extensions that differ from the 'en' description, but differ from each other only in the case of the first character.

  • 75049004|Jeune thoracic dystrophy (disorder)| has translations in two extensions that appear identical.

These duplicates may be due to different character encoding or punctuation conventions, or to written languages that are genuinely similar (Danish and Swedish). A binary (case-sensitive) comparison no longer treats terms that differ only in case as duplicates, and it halved the number of duplicate terms identified. It is unclear to the author what standards and rules apply to translations. A translation might be complete (all concepts), partial (only concepts of interest), or as necessary (only where the word differs).
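The effect of the binary comparison can be illustrated with a small sketch. Case-folding here roughly stands in for a case-insensitive database collation, which is an assumption. The sketch does not model accent handling. The function name and the sample terms are hypothetical. The module IDs are the Danish and Swedish modules from the table above.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def duplicate_terms(rows: Iterable[Tuple[str, str]], case_sensitive: bool) -> Set[str]:
    """Return the (possibly case-folded) terms that occur in more than one module.

    Rows are assumed to be (moduleId, term) pairs. With case_sensitive=False,
    terms differing only in case are treated as the same term.
    """
    modules = defaultdict(set)
    for module_id, term in rows:
        key = term if case_sensitive else term.casefold()
        modules[key].add(module_id)
    return {key for key, mods in modules.items() if len(mods) > 1}
```

With one pair of terms differing only by initial case and one identical pair, the case-insensitive comparison reports two duplicates and the case-sensitive comparison reports one.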

Procedure

Potential Concept Duplication

16 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • SNOMED CT Netherlands NRC maintained module

  • US National Library of Medicine maintained module

  • SNOMED Clinical Terms Australian extension

  • SNOMED CT United Kingdom clinical extension module

  • Canada Health Infoway English module

  • SNOMED CT Sweden NRC maintained module

These are the identified FSNs.


Special concept

Potential Concept Duplication

15 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • SNOMED CT Netherlands NRC maintained module

  • US National Library of Medicine maintained module

  • SNOMED CT United Kingdom clinical extension module

  • Canada Health Infoway English module

  • SNOMED CT core module

  • Danish module

  • SNOMED Clinical Terms Australian extension

These are the identified FSNs.

The mix of semantic tags in this set suggests a possible issue with the transitive queries and the history of the "aggregate release". Further investigation is required.


Situation with explicit context

Potential Concept Duplication

8 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • SNOMED CT United Kingdom clinical extension module

  • SNOMED CT Netherlands NRC maintained module

  • SNOMED CT Sweden NRC maintained module

  • US National Library of Medicine maintained module

These are the identified FSNs.


Observable entity

Potential Concept Duplication

No FSNs are duplicated across the content.
There are 476 duplicate synonyms across this set. The affected concepts are in the following extensions:

  • SNOMED CT Netherlands NRC maintained module

  • SNOMED CT core module

  • Danish module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT United Kingdom clinical extension module

Event

Potential Concept Duplication

No FSNs are duplicated across the content.
17 synonyms are duplicated. The affected concepts are in the following extensions:

  • Danish module

  • SNOMED CT core module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT Netherlands NRC maintained module

Qualifier value

Potential Concept Duplication

28 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • SNOMED CT United Kingdom clinical extension module

  • Canada Health Infoway English module

  • US National Library of Medicine maintained module

These are the identified FSNs.


Record artifact

Potential Concept Duplication

8 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • SNOMED CT Netherlands NRC maintained module

  • SNOMED CT United Kingdom clinical extension module

These are the identified FSNs.


Social context

Potential Concept Duplication

5 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • US National Library of Medicine maintained module

  • SNOMED Clinical Terms Australian extension

  • Canada Health Infoway English module

These are the identified FSNs.


Substance

Potential Concept Duplication

15 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • US National Library of Medicine maintained module

  • SNOMED Clinical Terms Australian extension

  • Canada Health Infoway English module

  • SNOMED CT core module

  • SNOMED CT United Kingdom clinical extension module

These are the identified FSNs.


Body structure

Potential Concept Duplication

No FSNs are duplicated across the content.

1,888 synonyms are duplicated across the content. The affected concepts are in the following extensions:

  • Danish module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT core module

  • Lithuania

  • SNOMED Clinical Terms Australian extension

  • US National Library of Medicine maintained module

  • SNOMED CT United Kingdom clinical extension module

Staging and scales

Potential Concept Duplication

No FSNs are duplicated across the content.
464 synonyms are duplicated across the extensions. The affected concepts are in the following extensions:

  • Danish module

  • SNOMED CT core module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT United Kingdom clinical extension module

 

Pharmaceutical / biologic product

Potential Concept Duplication

12 FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • Canada Health Infoway English module

  • US National Library of Medicine maintained module

  • SNOMED Clinical Terms Australian extension

These are the identified FSNs.

There is clearly an issue with the semantic tags and the transitive queries. This may be a problem with either the analysis or the content.

Organism

Potential Concept Duplication

Two FSNs are duplicated across the content. These are almost certainly candidates for promotion.
The affected concepts are in the following extensions:

  • US National Library of Medicine maintained module

  • SNOMED CT core module

  • Canada Health Infoway English module

These are the identified FSNs.


Environment or geographical location

Potential Concept Duplication

No FSNs are duplicated across the content.
353 synonyms are duplicated across the extensions. The affected concepts are in the following extensions:

  • Danish module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT core module

  • Lithuania

Specimen

Potential Concept Duplication

No FSNs are duplicated across the content.
Nine synonyms are duplicated. The affected concepts are in the following extensions.

  • Danish module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT core module

  • US National Library of Medicine maintained module

  • SNOMED CT United Kingdom clinical extension module

Physical object

Potential Concept Duplication

No FSNs are duplicated across the content.
399 synonyms are duplicated across the extensions. The affected concepts are in the following extensions:

  • Danish module

  • SNOMED CT Sweden NRC maintained module

  • SNOMED CT core module

Physical force

Single concept: U-V radiation in diagnosis NOS (physical force)

Extension Descriptions

Most of the analysis of descriptions was performed while identifying duplicate concepts. Below is a summary of the translations (extension descriptions for core concepts).

Extension Changes to Core Descriptions

178 International descriptions have been modified in some way in an extension. The associated modules are:

  • Australian common model component extension

  • SNOMED Clinical Terms Australian extension

  • US National Library of Medicine maintained module
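A count like the one above can be obtained by looking for component IDs that have rows both in an International module and in some other module. The following is a minimal sketch that assumes (componentId, moduleId) pairs from RF2 Full files. The same logic applies to descriptions and to relationships. The function name is hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Tuple

INTERNATIONAL_MODULES = frozenset({"900000000000207008", "900000000000012004"})


def core_components_touched_by_extensions(
    rows: Iterable[Tuple[str, str]],
    international: AbstractSet[str] = INTERNATIONAL_MODULES,
) -> Dict[str, List[str]]:
    """Return {componentId: sorted non-International moduleIds} for components
    with at least one International row and at least one row elsewhere."""
    modules = defaultdict(set)
    for component_id, module_id in rows:
        modules[component_id].add(module_id)
    return {
        cid: sorted(mods - international)
        for cid, mods in modules.items()
        if mods & international and mods - international
    }
```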

Relationship Extensions

5,136 core concepts have been changed within an extension. Some of these look like promotions, but the majority do not appear to be.

Note: Some of the figures comparing stated and inferred relationships look odd. This is likely a result of the crude aggregation of extensions, and of some extension content having already been promoted to the core.

Core relationships modified within an Extension

1,997 core relationships were modified by an extension, affecting 384 concepts.

A single concept, 425630003|Acute irritant contact dermatitis (disorder)|, was modified by two NRCs.
Both inactivated all of its relationships, but one recreated them in the subsequent release.

Other changes are summarised below.
 

Types of Relationships Modified

A large variety (43) of relationship types are involved in the edits. Most are IS A. Some are not part of the approved concept model, or are attributes specific to an extension.

select moduleId, count(distinct sourceId)
from X_Relationships
where moduleId not in (900000000000207008, 900000000000012004, 900062011000036108) -- exclude international+AMT
  and sourceId in (select id from X_Concepts where moduleId in (900000000000207008, 900000000000012004) and active)
group by moduleId;
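Below is a Python sketch of the same per-module count. It also collects the relationship typeIds used, which is how a tally of relationship types such as the 43 noted above could be produced. It assumes (moduleId, sourceId, typeId) tuples and a precomputed set of active International concept IDs. The function name is hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, Set, Tuple

INTERNATIONAL_MODULES = frozenset({"900000000000207008", "900000000000012004"})
AMT_MODULE = "900062011000036108"


def extension_edits_on_core(
    relationships: Iterable[Tuple[str, str, str]],
    core_concepts: AbstractSet[str],
) -> Dict[str, Tuple[int, Set[str]]]:
    """Return {moduleId: (distinct core source concepts, typeIds used)}.

    Only relationships in extension modules (International and AMT excluded,
    as in the SQL) whose source is an active core concept are counted.
    """
    excluded = INTERNATIONAL_MODULES | {AMT_MODULE}
    sources: Dict[str, Set[str]] = defaultdict(set)
    types: Dict[str, Set[str]] = defaultdict(set)
    for module_id, source_id, type_id in relationships:
        if module_id in excluded or source_id not in core_concepts:
            continue
        sources[module_id].add(source_id)
        types[module_id].add(type_id)
    return {m: (len(sources[m]), types[m]) for m in sources}
```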

Comparison Examples - Published (inferred) relationships for Core concepts

Concepts compared (core vs extension relationships):

  • 371040005

  • 321000119108

    Note: This example appears to be a promoted concept, but the local relationships were not inactivated upon promotion. Examples such as this are a use case for promoting both stated and inferred relationships, so that the maintenance burden on NRCs is reduced and authoring effort is recognised.

  • 212385001

Additional Observations

The following are only examples of the observations made, and are by no means comprehensive.

Extensions vs Editions

Of the 9 releases examined:

  • Four publish Editions

  • Three publish Extensions

  • One publishes three separate extensions.

  • One publishes an extension "bundled" with the International Edition.

File naming

The file naming conventions do not appear to be consistent across the extensions:

  • sct2_Concept_Snapshot_AU1000036_20161231.txt

  • sct2_Concept_Snapshot_en-CanadianExtension_20161031.txt

  • sct2_Concept_Snapshot_DK1000005_20161130.txt

  • sct2_Concept_Snapshot_LT1000092_20151107.txt

  • sct2_Concept_Snapshot_NL_20160930.txt

  • sct2_Concept_Snapshot_SE1000052_20161130.txt

  • sct2_Concept_Snapshot_GB1000000_20161001.txt

  • sct2_Concept_Snapshot_US1000124_20160901.txt

  • sct2_Concept_Snapshot_es-UruguayExtension_20161215.txt

  • sct2_Concept_Snapshot_INT_20160731.txt
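The inconsistency can be checked mechanically. The pattern below is a simplified reading of the RF2 naming convention, which is an assumption and not the full specification. It expects `sct2_<ContentType>_<Snapshot|Full|Delta>_<CountryNamespace>_<YYYYMMDD>.txt`, where CountryNamespace is `INT` or a two-letter country code optionally followed by a seven-digit namespace. The function name is hypothetical.

```python
import re
from typing import List

# Simplified (assumed) form of the RF2 file naming convention.
RF2_NAME = re.compile(
    r"^sct2_(?P<content>[A-Za-z]+)_(?P<release>Snapshot|Full|Delta)"
    r"_(?P<namespace>INT|[A-Z]{2}(?:\d{7})?)_(?P<date>\d{8})\.txt$"
)


def nonconforming(names: List[str]) -> List[str]:
    """Return the file names that do not match the simplified pattern."""
    return [name for name in names if not RF2_NAME.match(name)]
```

Applied to the ten names listed above, only the Canadian and Uruguayan names fail this simplified pattern. The NL name passes, because the namespace part is optional here.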

Directory structure

Some variation was noticed in the directory structure within the published zip files.
Below are the paths to the snapshot concepts file in each release.

  • \SnomedCT_Release_AU1000036_20161231\RF2Release\Snapshot\Terminology

  • \SnomedCT_Canadian_EnglishExtension_Release_20161031\Snapshot\Terminology

  • \SnomedCT_ManagedServiceDK_Production_DK1000005_20161130\Snapshot\Terminology

  • \SnomedCT_RF2Release_LT1000092_20151107\Snapshot\Terminology

  • \SnomedCT_Netherlands_EditionRelease_20160930\Snapshot\Terminology

  • \SnomedCT_SE_Production_20161130T170000\Snapshot\Terminology

  • \SnomedCT_RF2Release_GB1000000_20161001\Snapshot\Terminology

  • \SnomedCT_RF2Release_US1000124_20160901\Snapshot\Terminology

  • \SnomedCT_Uruguay_Extension_Release_20161215\Snapshot\Terminology
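Because the directory layout varies, a loader could search the archive by file name instead of relying on a fixed path. The following is a minimal sketch using Python's `zipfile` module. Zip member names use forward slashes, even though the paths above are shown with backslashes. The function name and the regular expression are illustrative assumptions.

```python
import re
import zipfile
from typing import IO, List, Union

# Matches the concept snapshot file at any depth inside the archive.
CONCEPT_SNAPSHOT = re.compile(r"(^|/)sct2_Concept_Snapshot_[^/]*\.txt$")


def find_concept_snapshot(archive: Union[str, IO[bytes]]) -> List[str]:
    """Return the member paths of concept snapshot files in a release zip."""
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if CONCEPT_SNAPSHOT.search(name)]
```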

Specific file inclusions

The International release includes six files (Concept, Description, Relationship, StatedRelationship, Identifier and TextDefinition) within the "Terminology" folder.
These files are not consistently present in the extensions.
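Below is a sketch of a presence check for the six file types, given the file names found in a release's Terminology folder. It assumes names of the form `sct2_<Type>_Snapshot...`. Language-specific files such as `sct2_Description_Snapshot-en_...` are covered by accepting either `-` or `_` after the release type. The function name is hypothetical.

```python
import re
from typing import Iterable, List

EXPECTED = ("Concept", "Description", "Relationship",
            "StatedRelationship", "Identifier", "TextDefinition")


def missing_terminology_files(file_names: Iterable[str], release_type: str = "Snapshot") -> List[str]:
    """Return the expected file types with no matching file name."""
    pattern = re.compile(r"sct2_([A-Za-z]+)_" + re.escape(release_type) + r"[-_]")
    present = set()
    for name in file_names:
        match = pattern.match(name)
        if match:
            present.add(match.group(1))
    return [t for t in EXPECTED if t not in present]
```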

 

Files compared for each extension: Concept, Description, Relationship, StatedRelationship.

Copyright © 2026, SNOMED International