Spackman Perl Transform - End User Guide
This document describes the behavior of the current "Spackman Perl Script", the de facto standard for the representation of SNOMED CT in OWL. It then identifies several possible issues and makes recommendations on how the transformation process might be improved.
Introduction
The file analyzed in this document is tls2_StatedRelationshipsToOwlKRSS_INT_20160131.pl, as distributed in the January 2016 SNOMED CT International release. The Spackman Perl Script is capable of generating OWL in either XML or Functional syntax as well as KRSS. This document focuses specifically on the OWL and, when examples are needed, uses the OWL Turtle syntax in place of the default choices above. This choice was made because of readability and greater community familiarity with Turtle vs. OWLF.
Distinguished Concept Identifiers
The transform calls out a number of special concept identifiers that are used in the transformation itself. The comment associated with the identifier list states:
# CAUTION: The values for these parameters depend on the particular release of SNOMED CT.
# Do not assume they remain stable across different releases.
# **************************************************************
# These values are valid for: 20150131, release format 2 (RF2).
# **************************************************************
The following table describes the distinguished SNOMED CT concept codes that are used in the actual transformation process:
Identifier | Description |
410662002|Concept model attribute| | This concept and all of its descendants, with the exception of 116680003|is a|, are represented as owl:ObjectProperty. All other concepts are represented as owl:Class. 408739003|Unapproved attribute| and 410663007|Concept history attribute| and their descendants would be represented as owl:Class rather than as properties if the concepts defined in the 900000000000012004|SNOMED CT model component module| were being emitted (which they currently are not). |
116680003|is a| | This relationship is represented as rdfs:subClassOf or owl:equivalentClass predicates in the generated OWL. Because it is defined in the metadata (model component) module, it is not included in the output. |
123005000|Part of| 272741003|Laterality| 127489000|Has active ingredient| 411116001|Has dose form| | These attributes are not forced inside a role group when they appear in a 0 role group or as the only member of a non-zero role group. All other concept model attributes will always appear inside a role group in the generated OWL. |
363701004|Direct substance| --> 127489000|Has active ingredient| | This is the only "right identity" currently recognized in the OWL. Right identities are represented in OWL as property chains. |
900000000000207008|SNOMED CT core| | The concepts defined in this module are included in the OWL output. All other concepts are ignored. Note that moduleId is not checked in the description, definition or relationship files. |
900000000000073002|Defined| | Concepts with a definition status of "Defined" that have two or more active relationship entries are defined using owl:equivalentClass. All other concept definitions use rdfs:subClassOf. This attribute is ignored for object properties, meaning that it is not possible to define owl:equivalentProperty elements. |
900000000000003001|Fully specified name| 900000000000013009|Synonym| 900000000000548007|Preferred| 900000000000549004|Acceptable| | Used to recognize FSNs, synonyms, preferred and acceptable terms in the description and language refset files. |
609096000|Role group| | This concept identifier is not actually used in the RF2 relationships file, but is the relationship (predicate) that is used to represent role groups in OWL |
Namespaces used in the transformation output
Namespace | URI | Description |
rdf: | | |
rdfs: | | |
xsd: | http://www.w3.org/2001/XMLSchema# | (Included in header but not used in actual OWL) |
xml: | Used in XML format only | |
owl: | | |
sctp: | Used to identify preferred / acceptable terms | |
sctf: | Used exclusively as prefix for definitions. | |
: | All SNOMED CT concept identifiers |
The Transformation Header
The namespaces and ontology identification section is hard-coded. In particular, the module id, version URI and release date are static entries and have to be edited every release.
<http://snomed.info/sct/900000000000207008> a owl:Ontology ;
rdfs:label "SNOMED Clinical Terms, International Release, Stated Relationships in OWL RDF" ;
rdfs:comment """Generated as OWL RDF/XML from SNOMED CT release files by Perl transform script. Input concepts file [snip]
Copyright 2015 The International Health Terminology Standards Development Organisation (IHTSDO). All Rights Reserved. SNOMED CT was originally created by The College of American Pathologists. \\"SNOMED\\" and \\"SNOMED CT\\" are registered trademarks of the IHTSDO. [snip] """ ;
owl:versionIRI <http://snomed.info/sct/900000000000207008/version/20150131> ;
owl:versionInfo "International Release, Core Module, Release Date: 20150131" .
Issue: The module id, version URI and release date elements above are hard-coded. Changing the moduleId variable does not change any of the URIs above.
Issue: The output paths for OWL / OWLF / KRSS are separate, so any change in output has to be implemented three times, once per path. The OWL and OWLF output have already diverged: each path produces different results.
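One way to avoid the hard-coded values would be to derive the header from the module identifier and release date. The following is a minimal Python sketch of that idea, not code from the Perl script. The function name `ontology_header` and the shortened `owl:versionInfo` text are assumptions made for illustration.

```python
def ontology_header(module_id: str, release_date: str) -> str:
    """Render a Turtle ontology header from a module id and a release date.

    Hypothetical helper: the real script hard-codes these values.
    """
    ontology_iri = f"http://snomed.info/sct/{module_id}"
    version_iri = f"{ontology_iri}/version/{release_date}"
    return "\n".join([
        f"<{ontology_iri}> a owl:Ontology ;",
        f"    owl:versionIRI <{version_iri}> ;",
        # Shortened, illustrative versionInfo text (not the script's wording).
        f'    owl:versionInfo "Release Date: {release_date}" .',
    ])


header = ontology_header("900000000000207008", "20160131")
print(header)
```

With this approach the ontology IRI and version IRI always agree with the module being emitted, so a single change of parameters is enough for a new release.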
Object properties
Every active descendant of 410662002 | Concept model attribute | with the exception of 116680003|is a| is defined as an owl:ObjectProperty using the following template:
:{sctid} a owl:ObjectProperty ;
rdfs:label "{English FSN}"@en ;
sctf:Description.term.en-us.preferred "{selected language preferred term}"@en ;
sctf:Description.term.en-us.synonym "{acceptable synonym}"@en;
[rdfs:subPropertyOf {parent sctid}] ;
[owl:propertyChain ( {lhs sctid} {rhs sctid})] .
Example:
:609096000 a owl:ObjectProperty ;
rdfs:label "Role group (attribute)"@en ;
sctf:Description.term.en-us.preferred "Role group"@en .
[] rdfs:subPropertyOf :363701004 ;
owl:propertyChain ( :363701004 :127489000 ) .
Issue: The preferred term is always "term.en-us.preferred" no matter which language is selected in the header.
Issue: The language code on the actual text is always "en", no matter what language and dialect is included in the input.
Issue: The OWL XML syntax only prints the FSN and preferred name. The OWL functional syntax includes synonyms.
Issue: The script assumes (correctly at the moment) that properties: (a) have a single parent and (b) are primitive.
Issue: The FSN is assumed to be English. The script takes the stance that FSNs are not language specific (or are English specific), which is not necessarily true.
SNOMED CT Concepts and Descriptions
Each active SNOMED CT concept that belongs to the core module is defined as an owl:Class using the following template:
:{sctid} a owl:Class ;
rdfs:label "{English FSN}"@en ;
sctf:Description.term.en-us.preferred "{selected language preferred term}"@en ;
sctf:Description.term.en-us.synonym "{acceptable synonym}"@en;
Zero or more synonyms are emitted and (curiously) the code is designed to deal with concepts that have no FSNs.
Issue: Neither Preferred nor Synonym is explicitly declared as an annotation property in the OWL XML syntax. This appears to work, but is a bit sloppy.
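The annotation template and the language issues noted above can be illustrated with a short Python sketch that takes the language tag and dialect as parameters instead of hard-coding them. This is not the script's code: the function names, the `sctid:` prefix (borrowed from the examples in this document) and the idea of substituting the dialect into the property name are all assumptions.

```python
def turtle_literal(text: str, lang: str) -> str:
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def class_annotations(sctid, fsn, preferred, synonyms, lang="en", dialect="en-us"):
    """Render the class declaration and description annotations.

    The trailing ';' is kept because definition triples follow.
    """
    lines = [f"sctid:{sctid} a owl:Class ;"]
    if fsn is not None:  # the script tolerates concepts without an FSN
        lines.append(f"    rdfs:label {turtle_literal(fsn, lang)} ;")
    lines.append(
        f"    sctf:Description.term.{dialect}.preferred {turtle_literal(preferred, lang)} ;"
    )
    for synonym in synonyms:
        lines.append(
            f"    sctf:Description.term.{dialect}.synonym {turtle_literal(synonym, lang)} ;"
        )
    return "\n".join(lines)


annotations = class_annotations(
    "14120002", "Rodenticide (substance)", "Rodenticide", []
)
print(annotations)
```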
SNOMED CT Definitions
If the SNOMED CT concept is associated with an active entry in the RF2 definitions file, the following is emitted:
sctf:TextDefinition.term "{definition text}"@en ;
Issue: The script assumes that there is exactly one, English language, definition. There is no connection to the Language Refset, and "en" is hard coded.
SNOMED CT Relationships
An active RF2 relationship entry as used in the RF2 stated relationship snapshot file has the following components:
Component | Description | Usage in OWL |
id | Unique identifier of entry | (unused) |
effectiveTime | release of last change in entry | (unused) |
active | 1 means currently active, 0 means retired | Only active entries are represented in OWL |
moduleId | The module that 'owns' this assertion | Ignored -- all relationship entries are emitted as long as the RF2 concept file entry for sourceId is 'owned' by the core module |
sourceId | The SCTID of the source or 'subject' | |
destinationId | The SCTID of the destination or 'object' | |
relationshipGroup | An integer, where all entries for the same source and non-zero relationship group are considered to be a single definition. If zero, each row is considered to be a definition unto itself. | See: Processing Rules for 2 or more relationship entries below |
typeId | The SCTID of the type or 'predicate' | |
characteristicTypeId | Descendant of 900000000000449001|Characteristic type| currently one of: | Ignored -- |
modifierId | Descendant of 900000000000450001|Modifier| currently one of: 900000000000452009|All| 900000000000451002|Some| | Ignored -- all modifiers in the current release are "Some". |
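As an illustration of how the columns above are consumed, the following Python sketch reads a tab-delimited RF2 relationship snapshot and keeps only the active rows and the four columns that reach the OWL. It assumes the file has a header row with the column names listed in the table. The names `Relationship` and `active_relationships` are hypothetical, and the `id`, `effectiveTime` and `characteristicTypeId` values in the sample are illustrative placeholders rather than real release data.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Relationship(NamedTuple):
    source_id: str
    destination_id: str
    group: int
    type_id: str


def active_relationships(lines: Iterable[str]) -> Iterator[Relationship]:
    """Yield active rows; moduleId, characteristicTypeId and modifierId are ignored."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["active"] != "1":
            continue
        yield Relationship(
            row["sourceId"],
            row["destinationId"],
            int(row["relationshipGroup"]),
            row["typeId"],
        )


COLUMNS = ["id", "effectiveTime", "active", "moduleId", "sourceId", "destinationId",
           "relationshipGroup", "typeId", "characteristicTypeId", "modifierId"]
CORE, SOME = "900000000000207008", "900000000000451002"
PLACEHOLDER = "0"  # illustrative characteristicTypeId, ignored by the reader
SAMPLE = "\n".join("\t".join(fields) for fields in [
    COLUMNS,
    ["1", "20160131", "1", CORE, "425630003", "39937001", "1", "363698007", PLACEHOLDER, SOME],
    ["2", "20160131", "0", CORE, "425630003", "4532008", "1", "116676008", PLACEHOLDER, SOME],
    ["3", "20160131", "1", CORE, "425630003", "110979008", "0", "116680003", PLACEHOLDER, SOME],
])

relationships = list(active_relationships(io.StringIO(SAMPLE)))
print(relationships)
```

Note that this sketch, like the script, does not filter relationship rows by moduleId; the check that the source concept belongs to the core module would have to be made against the concept file.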
The script assumes that every sourceId fits one of three situations:
There are no active relationship entries for sourceId -- the entry is treated as a root node and the definition is considered to be complete.
There is exactly one active relationship entry for sourceId -- the typeId for the entry is assumed to be 116680003|is a| (unchecked!) and the following OWL is emitted:
rdfs:subClassOf sctid:{destinationId} .
There are two or more active relationship entries for sourceId. The processing rules for this situation are defined in the next section.
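The three situations can be sketched in Python as follows. Unlike the script, this sketch verifies that a lone relationship really is an 116680003|is a| entry; the function name `classify_source` and the returned labels are assumptions chosen for illustration.

```python
IS_A = "116680003"


def classify_source(rows):
    """Classify one sourceId by its active relationship rows.

    rows: list of (type_id, destination_id) tuples for a single sourceId.
    """
    if not rows:
        return "root"
    if len(rows) == 1:
        type_id, _destination_id = rows[0]
        # The Perl script assumes this without checking; the sketch checks it.
        if type_id != IS_A:
            raise ValueError(
                f"single relationship has typeId {type_id}, expected {IS_A}"
            )
        return "single-parent"
    return "multiple"


print(classify_source([]))                      # no active relationships
print(classify_source([(IS_A, "59545008")]))    # one parent
print(classify_source([(IS_A, "59191008"), (IS_A, "154631009")]))
```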
Processing Rules for 2 or more relationship entries
If there are two or more active relationship entries for a given subject SCTID, the definition status of the concept determines whether the definition will be in the form:
owl:equivalentClass [ a owl:Class ;
owl:intersectionOf ( ... ) ] .
if the definition status is 900000000000073002|Defined| and:
rdfs:subClassOf [ a owl:Class ;
owl:intersectionOf ( ... ) ] .
otherwise.
The intersection is filled out as follows:
For every relationship entry with typeId = 116680003|is a|, output the URI of the corresponding destinationId.
For every unique non-zero relationshipGroup entry, emit:
[ a owl:Restriction ; owl:onProperty sctid:609096000 ; owl:someValuesFrom [ a owl:Class ; owl:intersectionOf ( ... ) ] ]
where the inner intersection contains:
[ a owl:Restriction ; owl:onProperty sctid:{typeId} ; owl:someValuesFrom sctid:{destinationId} ]
for every row in the group. Note: owl:intersectionOf is not emitted if there is only one row in a given group.
For each zero relationshipGroup entry with a typeId other than 116680003|is a|:
if typeId is one of the "never grouped" types emit:
[ a owl:Restriction ; owl:onProperty sctid:{typeId} ; owl:someValuesFrom sctid:{destinationId} ]
otherwise:
[ a owl:Restriction ; owl:onProperty sctid:609096000 ; owl:someValuesFrom [ a owl:Restriction ; owl:onProperty sctid:{typeId} ; owl:someValuesFrom sctid:{destinationId} ] ]
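The rules above can be summarized in a short Python sketch. It is a reading of the rules as described in this document, not a port of the Perl code. All function names are hypothetical, the order of members inside the intersection is an arbitrary choice, and the treatment of a "never grouped" attribute that is the only member of a non-zero group follows the description in the distinguished identifiers table.

```python
IS_A = "116680003"
ROLE_GROUP = "609096000"
NEVER_GROUPED = {"123005000", "272741003", "127489000", "411116001"}


def restriction(type_id, dest_id):
    return (f"[ a owl:Restriction ; owl:onProperty sctid:{type_id} ; "
            f"owl:someValuesFrom sctid:{dest_id} ]")


def role_group(filler):
    return (f"[ a owl:Restriction ; owl:onProperty sctid:{ROLE_GROUP} ; "
            f"owl:someValuesFrom {filler} ]")


def intersection(members):
    return f"[ a owl:Class ; owl:intersectionOf ( {' '.join(members)} ) ]"


def definition(rows, defined):
    """rows: (type_id, destination_id, group) tuples for one sourceId (2 or more)."""
    # Parents first.
    members = [f"sctid:{dest}" for type_id, dest, _ in rows if type_id == IS_A]
    # Group 0: each row stands alone, wrapped in a role group unless never grouped.
    for type_id, dest, group in rows:
        if type_id != IS_A and group == 0:
            single = restriction(type_id, dest)
            members.append(single if type_id in NEVER_GROUPED else role_group(single))
    # Non-zero groups: one role group per distinct group number.
    groups = {}
    for type_id, dest, group in rows:
        if type_id != IS_A and group != 0:
            groups.setdefault(group, []).append((type_id, dest))
    for group in sorted(groups):
        pairs = groups[group]
        parts = [restriction(t, d) for t, d in pairs]
        if len(pairs) == 1 and pairs[0][0] in NEVER_GROUPED:
            members.append(parts[0])
        elif len(parts) == 1:
            members.append(role_group(parts[0]))  # no inner intersectionOf
        else:
            members.append(role_group(intersection(parts)))
    predicate = "owl:equivalentClass" if defined else "rdfs:subClassOf"
    return f"{predicate} {intersection(members)} ."


appendicitis = definition(
    [(IS_A, "18526009", 0), ("116676008", "23583003", 1), ("363698007", "66754008", 1)],
    defined=True,
)
print(appendicitis)
```

Applied to the rows for 74400008|Appendicitis|, the sketch produces one parent and a single role group containing an intersection of two restrictions, which matches the shape of the corresponding example in this document apart from whitespace.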
Examples:
A Root concept
sctid:138875005 a owl:Class ;
rdfs:label "SNOMED CT Concept (SNOMED RT+CTV3)"@en ;
sctf:Description.term.en-us.preferred "SNOMED CT Concept"@en ;
sctf:Description.term.en-us.synonym "SNOMED CT has been created by combining SNOMED RT and a computer-based nomenclature and classification known as Read Codes Version 3, which was created on behalf of the U.K. Department of Health."@en,
"SNOMED Clinical Terms version: 20160131 [R] (January 2016 Release)"@en .
A concept with a single parent
sctid:14120002 a owl:Class ;
rdfs:label "Rodenticide (substance)"@en ;
sctf:Description.term.en-us.preferred "Rodenticide"@en ;
rdfs:subClassOf sctid:59545008 .
A primitive concept with two parents
sctid:108003 a owl:Class ;
rdfs:label "Entire condylar emissary vein (body structure)"@en ;
sctf:Description.term.en-us.preferred "Entire condylar emissary vein"@en ;
sctf:Description.term.en-us.synonym "Condylar emissary vein"@en ;
rdfs:subClassOf [ a owl:Class ;
owl:intersectionOf ( sctid:59191008 sctid:154631009 ) ] .
A fully defined concept with one parent and one non-zero role group
sctid:74400008 a owl:Class ;
rdfs:label "Appendicitis (disorder)"@en ;
sctf:Description.term.en-us.preferred "Appendicitis"@en ;
owl:equivalentClass [ a owl:Class ;
owl:intersectionOf ( sctid:18526009 [ a owl:Restriction ;
owl:onProperty sctid:609096000 ;
owl:someValuesFrom [ a owl:Class ;
owl:intersectionOf ( [ a owl:Restriction ;
owl:onProperty sctid:116676008 ;
owl:someValuesFrom sctid:23583003 ] [ a owl:Restriction ;
owl:onProperty sctid:363698007 ;
owl:someValuesFrom sctid:66754008 ] ) ] ] ) ] .
A concept with both zero and non-zero role groups
Relationship file entries:
sourceId destinationId relationshipGroup typeId
425630003 110979008 0 116680003
425630003 111189002 0 116680003
425630003 105590001 0 246075003
425630003 400195000 0 42752001
425630003 424124008 0 263502005
425630003 472963003 0 370135005
425630003 39937001 1 363698007
425630003 4532008 1 116676008
Compositional grammar definition:
425630003| Acute irritant contact dermatitis | =
111189002| Acute contact dermatitis| +
110979008| Irritant contact dermatitis|:
42752001| Due to |=400195000| Contact hypersensitivity reaction|,
246075003| Causative agent|=105590001| Substance|,
263502005| Clinical course|=424124008| Sudden onset AND/OR short duration|,
370135005| Pathological process|=472963003| Hypersensitivity process|,
{ 363698007| Finding site |=39937001| Skin structure|,
116676008| Associated morphology |=4532008| Acute inflammation | }
OWL:
sctid:425630003 a owl:Class ;
rdfs:label "Acute irritant contact dermatitis (disorder)"@en ;
sctf:Description.term.en-us.preferred "Acute irritant contact dermatitis"@en ;
owl:equivalentClass [ a owl:Class ;
owl:intersectionOf ( sctid:110979008 sctid:111189002 [ a owl:Restriction ;
owl:onProperty sctid:609096000 ;
owl:someValuesFrom [ a owl:Restriction ;
owl:onProperty sctid:246075003 ;
owl:someValuesFrom sctid:105590001 ] ] [ a owl:Restriction ;
owl:onProperty sctid:609096000 ;
owl:someValuesFrom [ a owl:Restriction ;
owl:onProperty sctid:42752001 ;
owl:someValuesFrom sctid:400195000 ] ] [ a owl:Restriction ;
owl:onProperty sctid:609096000 ;
owl:someValuesFrom [ a owl:Class ;
owl:intersectionOf ( [ a owl:Restriction ;
owl:onProperty sctid:363698007;
owl:someValuesFrom sctid:39937001 ] [ a owl:Restriction ;
owl:onProperty sctid:116676008 ;
owl:someValuesFrom sctid:4532008 ] ) ] ] ) ] .
A concept with mixed relationshipGroup zeros
sourceId destinationId relationshipGroup typeId
10243007 385101003 0 411116001 never grouped
10243007 387253001 0 127489000 never grouped
10243007 387350000 0 127489000 never grouped
10243007 420081005 0 116680003 One parent
sctid:10243007 a owl:Class ;
rdfs:label "Benzoic and salicylic acid ointment (product)"@en ;
sctf:Description.term.en-us.preferred "Benzoic and salicylic acid ointment"@en ;
sctf:Description.term.en-us.synonym "Whitfield's ointment"@en ;
rdfs:subClassOf [ a owl:Class ;
owl:intersectionOf ( sctid:420081005
[ a owl:Restriction ;
owl:onProperty sctid:411116001 ;
owl:someValuesFrom sctid:385101003 ]
[ a owl:Restriction ;
owl:onProperty sctid:127489000 ;
owl:someValuesFrom sctid:387253001 ]
[ a owl:Restriction ;
owl:onProperty sctid:127489000 ;
owl:someValuesFrom sctid:387350000 ] ) ] .
Notes and Observations:
There are a number of elements that are hard-coded into the script, including dates, the ontology identifier itself, etc. This makes the script less than useful for anything but the SNOMED CT Core module.
Knowledge about "never grouped" concepts is in the script but not in the RF2 release. This tacit knowledge should be made explicit in the RF2 release so that this sort of editing is not required and the script can be applied to other modules/extensions.
Knowledge about "right identity" (property chain) concepts is in the script but not in the RF2 release. This tacit knowledge should be made explicit in the RF2 release so that this sort of editing is not required and the script can be applied to other modules/extensions.
The script makes a number of assumptions about the consistency and integrity of the RF2 release. It also makes assumptions about which aspects of the RF2 release are actually in use (e.g. that the "All" modifier is unused, that there is one active FSN per concept, and that there is at most one definition per concept, which is assumed to be in English).
The script is not designed to address modules other than the core module. The core module is hard-coded and the module dependency file is ignored, meaning that no import statements are included in the output.
The script is currently US English centric and requires modification to emit other languages or dialects. It is also not equipped to emit all language variants.
The script includes unnecessary owl:intersectionOf assertions on rdfs:subClassOf entries. This makes the output less readable and non-standard.
The script should use skos:definition, skos:prefLabel and skos:altLabel instead of defining special properties in the sctp and sctf namespaces.
The script needs to address some of the currently unused features in the RF2, including the "qualifying" and "additional" relationship types as well as the "All" modifier. If these features are not going to be used, they should be removed so that extension authors will not be tempted.
The script needs to address the elements in the "Unapproved attribute" branch.
Copyright © 2026, SNOMED International