Results of Analysis of SNOMED CT Extensions

Introduction

In January 2017, the Content Managers Advisory Group (CMAG) initiated a survey of the available national extensions. Until then, little information about them had been available to other Members and, presumably, to SNOMED International. The survey responses are available at /wiki/spaces/cmag/pages/133989136. The results show that a variety of extensions are produced, ranging from subsets/refsets and language translations to clinical content development. Subsequent activities investigating collaboration on subsets may be pursued, but the CMAG was also interested in exactly what clinical content was in each extension, and how it might be shared. There should be very little clinical content that is exclusive to any one country; content developed by one extension is likely to be globally relevant, and sharing it can reduce the duplicated effort and maintenance burden of extension builders.

The results described here combine objective metrics (size of content), crude identification of duplicated effort, and some incidental quality observations. Further analysis of the content using description logic techniques is still underway; its results will be made available separately at a later date.

SQL snippets are included in the document for the author's future reference, but are unlikely to be useful to the general readership.

A summary of the results is available in the conclusion section at the end of this paper.

The cooperation of all Members is appreciated, and whilst all effort has been made to represent the extensions accurately, any inaccuracies are accidental.

Summary of Extensions

14 NRCs responded to the survey, with 9 indicating that they created clinical content extensions.
The Australian Edition also includes its national drug extension, which has been excluded from this round of analysis (as no other extension appeared to include such content).
All extensions except one were based upon the July 2016 International release. This exception may produce some anomalies, but they are limited to that extension.

A raw analysis of the active concepts within each extension.

Note: The UK extension (~30 thousand concepts) is about 6 times larger than the US extension (~5 thousand) and has been excluded from the graph as an outlier. Raw values are shown in the table below.

NRC   Size    Primitive   Primitive %   Defined   Defined %
AU    1339    1056        78.9%         283       21.1%
CA    2040    2030        99.5%         10        0.5%
DK    290     290         100.0%        0         0.0%
LT    0       0           N/A           0         N/A
NL    1041    686         65.9%         355       34.1%
SE    1175    1060        90.2%         115       9.8%
UK    29793   29793       100.0%        0         0.0%
US    5100    2764        54.2%         2336      45.8%
UY    415     371         89.4%         44        10.6%
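
The percentages in the table above follow directly from the raw counts. A minimal sketch of that arithmetic, using three rows from the table; the function name `share` is illustrative and not part of the original analysis:

```python
def share(part, total):
    """Percentage of `total` made up by `part`, or None when total is zero."""
    if total == 0:
        return None  # corresponds to the N/A cells for an empty extension
    return round(100 * part / total, 1)

# (size, primitive) pairs copied from the table above.
sizes = {"AU": (1339, 1056), "US": (5100, 2764), "LT": (0, 0)}

for nrc, (total, primitive) in sizes.items():
    defined = total - primitive
    print(nrc, share(primitive, total), share(defined, total))
```

Returning None for an empty extension avoids a division by zero and keeps the N/A case explicit.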

Ratio of active to inactive concepts

NRC (module)                                         Proportion currently active
SNOMED CT Netherlands NRC maintained module          95.3%
US National Library of Medicine maintained module    93.3%
módulo de la extensión de Uruguay                    95.6%
Canada Health Infoway English module                 68.3%
SNOMED CT Sweden NRC maintained module               99.6%
Australian common model component extension          33.3%
Danish module                                        88.4%
SNOMED Clinical Terms Australian extension           97.0%
SNOMED CT United Kingdom clinical extension module   32.9%

All further analysis was performed on active content only.

Extension changes against International Concept IDs

A total of 40 core concepts have been modified by extensions in some way.
Two were retired by an extension:

  • 384612007|pT4a: Tumor directly invades other organs or structures (colon/rectum) (finding)|
  • 384613002|pT4b: Tumor penetrates visceral peritoneum (colon/rectum) (finding)|
  • (A third concept was retired, but later reactivated.)

One concept had a change to definition status (marked Defined) by an extension

  • 399733007|Excision of retroperitoneal lymph node (procedure)|

Eight of these appear to be attempts to address issues with module assignment in the International release (i.e. concepts inactivated in a different module from the one in which they were created, metadata vs core).

The remainder are simply changes to moduleId, and represent either content promotion from an extension to the International release or a possible error.

select id,count(*) from X_Concepts
where id in (246089008,246221002,260670006,263512003,263513008,447564002,449609005,700043003,11000119105,41000179103,441000119109,601000119109,1111000119100,1561000119105,4181000179103,4191000179101,4201000179104,4211000179102,4221000179107,4231000179109,4241000179101,4251000179103,4261000179100,4271000179106,4281000179108,4301000179109,4311000179106,4321000179101,4331000179104,4341000179107,4351000179105,5461000179100,5471000179106,5481000179108,5491000179105,5531000179105)
and moduleId != 161771000036108 -- exclude Australian common model component extension
group by id
having count(distinct moduleId) > 1

Extension Concepts

There appear to be around 52 unique semantic tags across the extension content, many of which are attributable to translations. Not all extensions provide English FSNs for extension content (see endnote 1), so semantic tags were manually translated and merged.
After normalisation, this comes to 32 semantic tags. The distribution of content is shown below.

Semantic Tag                   Count
procedure                      11372
finding                        6925
observable entity              5609
disorder                       4783
situation                      2677
event                          2251
qualifier value                1903
record artifact                1361
regime/therapy                 846
assessment scale               549
occupation                     543
substance                      503
foundation metadata concept    414
morphologic abnormality        321
product                        320
person                         134
navigational concept           132
environment / location         114
organism                       114
body structure                 106
specimen                       95
administrative                 66
physical object                59
link assertion                 22
core metadata concept          19
ethnic group                   16
attribute                      7
social concept                 5
religion/philosophy            4
linkage concept                3
tumor staging                  2
cell                           1
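
The tag normalisation described above was done by hand, but the idea can be sketched in a few lines. The following is a minimal illustration, assuming the semantic tag is the text in the final pair of parentheses of an FSN. The `TAG_TRANSLATIONS` entries and function names are hypothetical examples, not the mapping actually used in the analysis.

```python
import re
from collections import Counter

# Hypothetical examples of translated tags mapped to English equivalents;
# the real mapping was built manually and is not given in this paper.
TAG_TRANSLATIONS = {
    "trastorno": "disorder",
    "procedimiento": "procedure",
}

# The semantic tag is assumed to be the last parenthesised group of the FSN.
TAG_PATTERN = re.compile(r"\(([^()]+)\)\s*$")

def semantic_tag(fsn):
    """Return the lower-cased text inside the final parentheses, or None."""
    match = TAG_PATTERN.search(fsn)
    return match.group(1).strip().lower() if match else None

def normalised_tag_counts(fsns):
    """Count FSNs per semantic tag after merging translated tags."""
    counts = Counter()
    for fsn in fsns:
        tag = semantic_tag(fsn)
        if tag is not None:
            counts[TAG_TRANSLATIONS.get(tag, tag)] += 1
    return counts
```

For example, `semantic_tag("Urosepsis (disorder)")` yields `disorder`, and an FSN with several parenthesised groups yields only the last one.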

 

Nine modules are in use across the extensions.

ModuleId              FSN                                                  Country
11000146104           SNOMED CT Netherlands NRC maintained module          NL
731000124108          US National Library of Medicine maintained module    US
5631000179106         módulo de la extensión de Uruguay                    UY
20621000087109        Canada Health Infoway English module                 CA
45991000052106        SNOMED CT Sweden NRC maintained module               SE
161771000036108       Australian common model component extension          AU
554471000005108       Danish module                                        DK
32506021000036107     SNOMED Clinical Terms Australian extension           AU
999000011000000103    SNOMED CT United Kingdom clinical extension module   UK

The type of content by hierarchy

Each top-level hierarchy is reviewed below for extension content.
Duplicates were found by comparing terms across extensions within a given hierarchy (for example, "look for duplicate terms within the procedure hierarchy"). Duplicates within a single module were ignored.

Analysis was done on the complete aggregate of the extensions plus the International core.
The presence of duplication may indicate:

  1. Extension concepts that are also in the core, added either before or after the extension concept was created.
    • Where the concept appears in the International release after its creation in an extension, it represents a maintenance burden for NRCs in the absence of a promotion process.
  2. At least two countries producing similar, if not the same, content, which would suggest that the content is not necessarily country specific.

The initial analysis was agnostic of description type; a further analysis was performed on FSNs only, to increase the likelihood of detecting duplicates.
A major limitation of the approach is that translations will (almost) inherently be unique, so the comparison depends on English terms.
It was discovered mid-analysis that a setting within the analysis database may have caused incorrect character rendering; however, this is not expected to affect the analysis.

SET @Hierarchy = 404684003;

select term,count(distinct moduleId) from X_Descriptions
where conceptId in (select distinct id from X_Concepts where active)
and moduleId != 900062011000036108 -- exclude AMT module
-- and moduleId not in(900000000000207008,900000000000012004) -- exclude international
and typeId = 900000000000003001
and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
and active = 1
group by term
having count(distinct moduleId) > 1;

-- candidates for consideration.
select * from X_Descriptions
-- active descriptions for active concepts
where active and conceptId in (select distinct id from X_Concepts where active)
-- target hierarchy
and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
and term in (select distinct term from X_Descriptions
					where conceptId in (select distinct id from X_Concepts where active)
					and moduleId != 900062011000036108 -- exclude AMT module
					-- and moduleId not in(900000000000207008,900000000000012004)
					and conceptId in (select sourceId from X_TransitiveClosure where destinationId = @Hierarchy)
					and active = 1
					group by term
					having count(distinct moduleId) > 1);
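
For readers without access to the analysis database, the grouping performed by the first query above can be sketched outside SQL. This is a minimal illustration, not the method actually run. The function name and the input shape (pairs of term and moduleId, already filtered to active descriptions of active concepts in one hierarchy) are assumptions, and the sample terms are invented.

```python
from collections import defaultdict

def terms_shared_across_modules(descriptions):
    """Map each term to its modules, keeping terms seen in more than one module.

    `descriptions` is an iterable of (term, module_id) pairs (assumed shape).
    """
    modules_by_term = defaultdict(set)
    for term, module_id in descriptions:
        modules_by_term[term].add(module_id)
    # Equivalent of: group by term having count(distinct moduleId) > 1
    return {t: m for t, m in modules_by_term.items() if len(m) > 1}

# Invented sample rows; the module identifiers are taken from the module table.
rows = [
    ("Term A (finding)", "731000124108"),
    ("Term A (finding)", "32506021000036107"),
    ("Term B (finding)", "731000124108"),
    ("Term B (finding)", "731000124108"),
]
for term, modules in terms_shared_across_modules(rows).items():
    print(term, sorted(modules))
```

Collecting modules in a set means repeats within one module are not counted, which matches the decision to ignore duplicates within a module.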

 

Clinical finding

Potential Concept Duplication

26 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • US National Library of Medicine maintained module
  • SNOMED CT United Kingdom clinical extension module
  • SNOMED CT Netherlands NRC maintained module
  • SNOMED Clinical Terms Australian extension
  • SNOMED CT Sweden NRC maintained module
  • Danish module

All but the Danish module have some overlap with each other, as well as with the International release.
These are the identified FSNs. There are 6,400 synonyms that are not unique across this set. There appear to be a number of reasons for this, though most seem to relate to translations.

For example:

  • 371093006|Urosepsis (disorder)| has descriptions in the extensions of three countries that are the same as the 'en' description.
  • 27830001|Brachial radiculitis (disorder)| has translations in two extensions that are different from the 'en' description, but differ from each other only by the case of the first character.
  • 75049004|Jeune thoracic dystrophy (disorder)| has translations in two extensions that appear identical.

These may have different character encodings or punctuation conventions, or the written languages may be genuinely similar (Danish and Swedish). A binary comparison (which treats terms that differ only in case as distinct) halved the number of duplicate terms identified. It is unclear (to the author) what the standards and rules for translations are: are they complete (all concepts), partial (only concepts of interest), or as necessary (only where the word is different)?
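
The effect of the comparison mode on the duplicate count can be illustrated with a small sketch. It assumes duplicates are found by grouping terms on a comparison key, and contrasts an exact (binary-style) key with a case-folded one. The function names and sample rows are invented for illustration and are not drawn from the analysis database.

```python
import unicodedata
from collections import defaultdict

def comparison_key(term, ignore_case):
    """Normalise Unicode composition; optionally fold case as well."""
    key = unicodedata.normalize("NFC", term)
    return key.casefold() if ignore_case else key

def duplicate_keys(descriptions, ignore_case):
    """Return the keys that occur in more than one module."""
    modules = defaultdict(set)
    for term, module_id in descriptions:
        modules[comparison_key(term, ignore_case)].add(module_id)
    return {k for k, m in modules.items() if len(m) > 1}

# Invented rows: one exact match, one pair differing only in case.
rows = [
    ("Urosepsis", "A"), ("Urosepsis", "B"),
    ("radiculitis", "A"), ("Radiculitis", "B"),
]
print(len(duplicate_keys(rows, ignore_case=False)))  # exact comparison
print(len(duplicate_keys(rows, ignore_case=True)))   # case-insensitive
```

In this sample the exact comparison finds one duplicate and the case-insensitive comparison finds two, the same direction of effect as reported above.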

Procedure

Potential Concept Duplication

16 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • SNOMED CT Netherlands NRC maintained module
  • US National Library of Medicine maintained module
  • SNOMED Clinical Terms Australian extension
  • SNOMED CT United Kingdom clinical extension module
  • Canada Health Infoway English module
  • SNOMED CT Sweden NRC maintained module

These are the identified FSNs.


Special concept

Potential Concept Duplication

15 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • SNOMED CT Netherlands NRC maintained module
  • US National Library of Medicine maintained module
  • SNOMED CT United Kingdom clinical extension module
  • Canada Health Infoway English module
  • SNOMED CT core module
  • Danish module
  • SNOMED Clinical Terms Australian extension

These are the identified FSNs. The mix of semantic tags in this set suggests a possible issue with the transitive queries and the history of the "aggregate release". Further investigation is required.


Situation with explicit context

Potential Concept Duplication

8 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • SNOMED CT United Kingdom clinical extension module
  • SNOMED CT Netherlands NRC maintained module
  • SNOMED CT Sweden NRC maintained module
  • US National Library of Medicine maintained module

These are the identified FSNs.


Observable entity

Potential Concept Duplication

There are no FSNs duplicated across the content.
There are 476 duplicate synonyms across this set. The affected concepts are in the following extensions.

  • SNOMED CT Netherlands NRC maintained module
  • SNOMED CT core module
  • Danish module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT United Kingdom clinical extension module

Event

Potential Concept Duplication

No FSNs are duplicated across the content.
17 synonyms are duplicated; the affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT core module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT Netherlands NRC maintained module

Qualifier value

Potential Concept Duplication

28 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • SNOMED CT United Kingdom clinical extension module
  • Canada Health Infoway English module
  • US National Library of Medicine maintained module

These are the identified FSNs.


Record artifact

Potential Concept Duplication

8 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • SNOMED CT Netherlands NRC maintained module
  • SNOMED CT United Kingdom clinical extension module

These are the identified FSNs.


Social context

Potential Concept Duplication

5 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • US National Library of Medicine maintained module
  • SNOMED Clinical Terms Australian extension
  • Canada Health Infoway English module

These are the identified FSNs.


Substance

Potential Concept Duplication

15 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • US National Library of Medicine maintained module
  • SNOMED Clinical Terms Australian extension
  • Canada Health Infoway English module
  • SNOMED CT core module
  • SNOMED CT United Kingdom clinical extension module

These are the identified FSNs.


Body structure

Potential Concept Duplication

No FSNs duplicated across the content.

1,888 synonyms are duplicated across the content; the affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT core module
  • Lithuania
  • SNOMED Clinical Terms Australian extension
  • US National Library of Medicine maintained module
  • SNOMED CT United Kingdom clinical extension module

Staging and scales

Potential Concept Duplication

No FSNs are duplicated across the content.
464 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT core module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT United Kingdom clinical extension module

 

Pharmaceutical / biologic product

Potential Concept Duplication

12 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • Canada Health Infoway English module
  • US National Library of Medicine maintained module
  • SNOMED Clinical Terms Australian extension

These are the identified FSNs. There is evidently an issue with the semantic tags and transitive queries. This may be a problem with the analysis or with the content.

Organism

Potential Concept Duplication

2 FSNs duplicated across the content, which are almost certainly candidates for promotion.
The affected concepts are in the following extensions.

  • US National Library of Medicine maintained module
  • SNOMED CT core module
  • Canada Health Infoway English module

These are the identified FSNs.


Environment or geographical location

Potential Concept Duplication

No FSNs duplicated across the content.
353 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT core module
  • Lithuania

Specimen

Potential Concept Duplication

No FSNs duplicated across the content.
Nine synonyms are duplicated. The affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT core module
  • US National Library of Medicine maintained module
  • SNOMED CT United Kingdom clinical extension module

Physical object

Potential Concept Duplication

No FSNs duplicated across the content.
399 synonyms are duplicated across the extensions. The affected concepts are in the following extensions.

  • Danish module
  • SNOMED CT Sweden NRC maintained module
  • SNOMED CT core module

Physical force

Single concept: U-V radiation in diagnosis NOS (physical force)
 

Extension Descriptions

Most analysis of descriptions was performed as part of identifying duplicate concepts. However, below is a summary of the translations (extension descriptions for core concepts).

Extension Changes to Core Descriptions

178 International descriptions have some modification in an extension. The associated modules are:

  • Australian common model component extension
  • SNOMED Clinical Terms Australian extension
  • US National Library of Medicine maintained module

Relationship Extensions

5,136 core concepts have been changed within an extension. Some of these look like promotions; however, the majority do not appear to be.
 

Note: Some of the numbers comparing stated and inferred relationships look odd; this is likely a result of the crude aggregation of extensions and of some extension content already having been promoted to core.

Core relationships modified within an Extension

1,997 core relationships were modified by an extension, affecting 384 concepts.

A single concept, 425630003|Acute irritant contact dermatitis (disorder)| was modified by two NRCs.
Both inactivated all the relationships, but one recreated them in the subsequent release.

Other changes are summarised below.
 

Types of Relationships Modified

A large variety (43) of relationship types are involved in the edits. Most are IS A; some are not part of the approved concept model or are attributes specific to an extension.
 

select count(distinct sourceId)  from X_Relationships
where moduleId not in(900000000000207008,900000000000012004,900062011000036108) -- exclude international+AMT
and sourceId in  (select id from X_Concepts where moduleId in(900000000000207008,900000000000012004) and active)
group by moduleId;

Comparison Examples - Published (inferred) relationships for Core concepts

ConceptCoreExtension
371040005

321000119108

Note: This example appears to be a promoted concept, but the local relationships haven't been inactivated upon promotion. Examples such as this are a use case for promoting both stated and inferred relationships, so that the maintenance burden on NRCs is reduced and authoring effort is recognised.

212385001

Additional Observations

The following observations are only examples of those made, and are by no means comprehensive.

Extensions vs Editions

Of the 9 releases looked at:

  • Four publish Editions
  • Three publish Extensions
  • One publishes three separate extensions.
  • One publishes an extension, "bundled" with the International Edition.

File naming

The file naming conventions do not appear to be consistent across the extensions.

  • sct2_Concept_Snapshot_AU1000036_20161231.txt
  • sct2_Concept_Snapshot_en-CanadianExtension_20161031.txt
  • sct2_Concept_Snapshot_DK1000005_20161130.txt
  • sct2_Concept_Snapshot_LT1000092_20151107.txt
  • sct2_Concept_Snapshot_NL_20160930.txt
  • sct2_Concept_Snapshot_SE1000052_20161130.txt
  • sct2_Concept_Snapshot_GB1000000_20161001.txt
  • sct2_Concept_Snapshot_US1000124_20160901.txt
  • sct2_Concept_Snapshot_es-UruguayExtension_20161215.txt
  • sct2_Concept_Snapshot_INT_20160731.txt
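
The inconsistency in the list above can be checked mechanically. The sketch below tests each name against one assumed pattern: a two-letter country code followed by a seven-digit namespace (or INT for the International release), then an eight-digit date. This pattern is simply the most common shape in the list and is an assumption made for illustration; it is not quoted from the release file naming specification.

```python
import re

# Assumed pattern (illustrative only): country code + 7-digit namespace, or INT.
NAME_PATTERN = re.compile(
    r"^sct2_Concept_Snapshot_(INT|[A-Z]{2}\d{7})_(\d{8})\.txt$"
)

# File names copied from the list above.
FILE_NAMES = [
    "sct2_Concept_Snapshot_AU1000036_20161231.txt",
    "sct2_Concept_Snapshot_en-CanadianExtension_20161031.txt",
    "sct2_Concept_Snapshot_DK1000005_20161130.txt",
    "sct2_Concept_Snapshot_LT1000092_20151107.txt",
    "sct2_Concept_Snapshot_NL_20160930.txt",
    "sct2_Concept_Snapshot_SE1000052_20161130.txt",
    "sct2_Concept_Snapshot_GB1000000_20161001.txt",
    "sct2_Concept_Snapshot_US1000124_20160901.txt",
    "sct2_Concept_Snapshot_es-UruguayExtension_20161215.txt",
    "sct2_Concept_Snapshot_INT_20160731.txt",
]

def nonconforming(names):
    """Return the names that do not match the assumed pattern."""
    return [n for n in names if NAME_PATTERN.match(n) is None]

for name in nonconforming(FILE_NAMES):
    print(name)
```

Under this assumed pattern, the Canadian, Netherlands and Uruguayan file names are the ones that differ from the rest.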

Directory structure

Some variation was noticed in the directory structure within the published zip files.
Below are the paths to the snapshot concepts file in each release.

  • \SnomedCT_Release_AU1000036_20161231\RF2Release\Snapshot\Terminology
  • \SnomedCT_Canadian_EnglishExtension_Release_20161031\Snapshot\Terminology
  • \SnomedCT_ManagedServiceDK_Production_DK1000005_20161130\Snapshot\Terminology
  • \SnomedCT_RF2Release_LT1000092_20151107\Snapshot\Terminology
  • \SnomedCT_Netherlands_EditionRelease_20160930\Snapshot\Terminology
  • \SnomedCT_SE_Production_20161130T170000\Snapshot\Terminology
  • \SnomedCT_RF2Release_GB1000000_20161001\Snapshot\Terminology
  • \SnomedCT_RF2Release_US1000124_20160901\Snapshot\Terminology
  • \SnomedCT_Uruguay_Extension_Release_20161215\Snapshot\Terminology

Specific file inclusions

The International release includes 6 files (Concept, Description, Relationship, StatedRelationship, Identifier and TextDefinition) within the "Terminology" folder.
These files are not consistently present in extensions.

               Concept  Description  Relationship  StatedRelationship  Identifier  TextDefinition
Australia      1        1            1             -                   1           -
Canada*        1        1            1             1                   -           1
Denmark**      1        2            1             1                   1           2
Lithuania      1        1            1             -                   -           -
Netherlands    1        1            1             -                   -           1
Sweden**       1        2            1             1                   1           -
UK             1        1            1             1                   -           -
USA            1        1            1             1                   1           1
Uruguay        1        1            1             1                   -           1

* Canada includes a French and English bundle.
** Sweden and Denmark include both an English and a native language Description file.

Denmark and the USA are the only countries to include all 6 files.
Further variations are present within the refset subdirectories.

Miscellaneous QA issues

The description file for one extension was found to be missing the language code for 268 entries. (The country was notified and has rectified this.)

Conclusion

  • The analysis described above reveals a wealth of information. There is evidence of duplicated content in almost every hierarchy, and its true extent is likely to be much greater given the primitive analysis techniques used.
  • There is value to the whole SNOMED CT community in introducing a process for content to be promoted through to core. This process should honour the identifiers issued by an extension builder, so as to minimise the maintenance burden on the originating extension and to recognise its effort. The current process is prohibitive to promoting content; consequently, content is duplicated across extensions and the potential maintenance debt grows.
  • Almost all NRCs are actively modifying core content, which shows the importance of clarifying the issues raised in the Discussion Paper - Allowance of Extensions to Modify Core Content (SNOMED International Response). It seems Members have taken a different interpretation of the license from that held by the governing body, and this discrepancy has never been recognised.
  • Variations exist in the artefacts published by NRCs. No comment is made about compliance with the Technical Specifications, but such variances may affect the portability of software that consumes SNOMED CT. Some form of certification/verification process asserting minimum conformance criteria would prove valuable.
  • All the issues described here may just as likely apply to other (affiliate) extensions, but this will remain unknown without systematic investigation.
  • The issues described are real, and Members are currently struggling to deal with them. Prolonging their resolution introduces a cost to all.

Endnotes

1. The author recalls a requirement that a US-spelling 'en' FSN should be created for all concepts, but was unable to identify this in the current specification. Is this still a requirement?
 

Copyright © 2025, SNOMED International