Representation of SNOMED in OWL.v0.9

Representing SNOMED CT RF2 Distribution in OWL


This document describes how a SNOMED CT RF2 distribution should be represented in OWL. It is a generalization of an earlier specification written in Perl, called the "Spackman Perl Script", and will work with any reasonable RF2 content.

Introduction

For some time now, the SNOMED CT International RF2 distribution has included a Perl script named "tls2_StatedRelationshipsToOwlKRSS_INT_<date>.pl", which can be used to generate an OWL representation of the distribution in RDF/XML, OWL Functional or KRSS syntax. This script is the closest that the SNOMED International organization has come to defining an official OWL representation for SNOMED CT. This approach has a number of shortcomings, however, including:

  • The transformation only works with the SNOMED CT International release. It cannot be applied unchanged to other distributions.

  • The transformation can only emit descriptions in one language.

  • There are subtle but significant differences between the various output formats.


In addition, while this Perl script could be viewed as a formal specification, extracting the actual transformation rules from its code requires considerable programming expertise.
The intent of this specification is to "reverse engineer" the Perl transformation and document the transformation rules in such a way that:

  1. They can be consistently applied to any SNOMED CT release or extension

  2. They apply to all languages

  3. Multiple language descriptions can appear in one OWL rendering

  4. They are described in such a way that they can be implemented in any target language

  5. The semantics of the output is independent of the format


To the best of the authors' knowledge, this specification describes the intent of the Perl transformation, as it exists today. One of the goals of this specification is that it can serve as a baseline where subsequent changes and enhancements to the OWL representation of SNOMED CT can be clearly identified and where tooling developers can clearly understand the ramifications of the changes to both the output and the transformation tools themselves.

Differences Between this Specification and the Spackman Transform

The following list summarizes the intentional differences between this specification and the Spackman transformation:

  1. The Spackman transformation wraps all of the rdfs:subClassOf assertions in an owl:intersectionOf wrapper.  This specification does not.

  2. The Spackman transformation does not take the language refset into account when emitting text definitions.  This frequently results in the wrong definition(s) being generated.  This specification treats text definitions in the same way as descriptions with the exception that it does not distinguish between preferred and acceptable language refset entries.

  3. The Spackman transformation generates ObjectProperty definitions for all active descendants of 410662002 | Concept model attribute |, with the exception of 116680003|is a|.  This specification emits an additional ObjectProperty definition for 410662002 | Concept model attribute | itself and defines all descendants as its direct or indirect subproperty.

  4. The Spackman transformation uses the following predicates for descriptions and definitions:

    1. sctf:Description.term.{specific language}.preferred "{text}"@{general language} (e.g. sctf:Description.term.en-us.preferred "Due to"@en)

    2. sctf:Description.term.{specific language}.synonym "{text}"@{general language}

    3. sctf:Description.TextDefinition.term "{text}"@{general language}


This specification uses the SKOS predicates instead:

Approach

SQL Notation

This specification describes how to transform the SNOMED CT RF2 distribution content into an OWL equivalent. As part of this process, we need a way to specify which RF2 files are used, which columns are transformed and how they are selected. We have chosen to use the SQL query syntax for this purpose, as its syntax and semantics are well understood and it makes it possible for us to test and verify the correctness of the specification. It should be noted that this specification does not require that SQL be used in an actual implementation. Any implementation that produces the same results as described by the SQL in this document would be considered "conformant".

Turtle Notation

A second part of the transformation process requires a way to specify the OWL statements that are generated and their relationship to the RF2 tables. We have chosen the Turtle RDF syntax to represent the results, where substitutions are represented with a "$" prefix and italics. As an example, the assertion that the variable named "subject" is declared to be an instance of an OWL Class would be asserted as:

                              sct:$subject rdf:type owl:Class .


This specification does not require that the output of an actual implementation be in the Turtle RDF format, but whatever output format is used, it must be semantically equivalent to the Turtle as specified in this document.
A short synopsis of the subset of the Turtle notation used in this document appears in Appendix B: RDF Turtle Notation.
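As an illustration of the substitution convention, the following minimal Python sketch fills a Turtle template from named values. The helper name render and the choice of the standard library's string.Template (which happens to use the same "$" prefix) are assumptions of this sketch and are not part of the specification.

```python
from string import Template


def render(template: str, **variables) -> str:
    """Replace each $name in a Turtle template with the supplied value."""
    # substitute() raises KeyError when a variable is missing, so an
    # incomplete row fails loudly instead of emitting a literal "$subject".
    return Template(template).substitute(**variables)


# 74400008 | Appendicitis | is the example concept used in this document.
print(render("sct:$subject rdf:type owl:Class .", subject=74400008))
# prints: sct:74400008 rdf:type owl:Class .
```

Any templating mechanism would serve equally well, provided the emitted statements are semantically equivalent to the Turtle shown in this document.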

Notation

RF2 files are represented in bold. Examples: Concept, StatedRelationship


RF2 fields are represented in a monospace font. Examples: id, acceptabilityId


Context variables are represented in bold monospace. Examples: LANGUAGE_MAP, VERSION, RIGHT_IDS


Turtle output is represented in monospace, with substitution variables as monospace italics. Example:
sct:$subject rdfs:subClassOf sct:$destinationId .

SNOMED CT concepts are represented using the conceptReference production as defined in the SNOMED CT Compositional Grammar Specification v2.3.1.  For the sake of brevity, this specification uses the (US) English preferred name rather than the complete FSN.  Example:  74400008 | Appendicitis |


Lists are represented as comma-separated values inside square brackets. Example: LANGUAGES:["en-us", "en-gb"]


Maps are represented as comma-separated entries within curly braces, with a colon between the key and value. Example:
LANGUAGE_MAP: {"en-us": 900000000000509007,
                             "en-gb": 900000000000508004}

Map lookup is indicated via "$MAP(key)". Example:
$LANGUAGE_MAP("en-gb") = 900000000000508004

SNOMED CT RF2 Files

The transformations in this document apply to the Snapshot representation of a SNOMED CT RF2 distribution. Transformations of the Full or Delta representations are not defined in this document.


The table below shows the Release Format 2 (RF2) distribution files and corresponding fields that are used in the RF2 to OWL transformation.  Fields identified as "FILTER" are used to determine whether a given entry (row) is used to generate output, but are not represented in the output itself.

 

File                      Field                  Purpose
------------------------  ---------------------  -------
Concept                   id                     Used to generate the subject IRI
                          active                 FILTER - only active concepts are included in the OWL output
                          moduleId               FILTER - a separate ontology is generated for each module
                          definitionStatusId     Determines whether the OWL definition uses owl:equivalentClass or rdfs:subClassOf
Description               active                 FILTER - only active descriptions are included in the OWL output
                          moduleId               FILTER - a separate ontology is generated for each module
                          conceptId              Link to Concept.id
                          languageCode           Language facet of RDF literal string
                          typeId                 Determines the specific predicate for description text (One of: 900000000000013009 | Synonym | or 900000000000003001 | Fully specified name |)
                          term                   Literal text of label or description
TextDefinition            active                 FILTER - only active definitions are included in the OWL output
                          moduleId               FILTER - a separate ontology is generated for each module
                          conceptId              Link to Concept.id
                          languageCode           Language facet of RDF literal string
                          typeId                 Determines the specific predicate for definition text (900000000000550004 | Definition |)
                          term                   Definition text
StatedRelationship        active                 FILTER - only active relationships are included
                          sourceId               Used to generate IRI of the subject
                          destinationId          Used to generate IRI of the object
                          relationshipGroup      Definition nesting
                          typeId                 Used to generate IRI of the predicate
                          characteristicTypeId   FILTER - only descendants of 900000000000006009 | Defining relationship | (i.e. stated, inferred) are included in the transformation output
                          modifierId             FILTER - only rows with the existential modifier, 900000000000451002 | Some |, are included in the transformation output
Language                  active                 FILTER - only active language entries are included
                          moduleId               FILTER - only language entries for the target module are included in the output
                          acceptabilityId        Used to determine whether a description is preferred or acceptable (One of: 900000000000548007 | Preferred | or 900000000000549004 | Acceptable |)
                          referencedComponentId  FILTER - id of associated description row
Transitive                concept                Concept identifier
                          ancestor               Concept identifier of parent or parent's parent, etc.
DescriptionAndDefinition  active                 FILTER - only active descriptions and definitions are included in the OWL output
                          moduleId               FILTER - a separate ontology is generated for each module
                          conceptId              Link to Concept.id
                          typeId                 Determines the specific predicate for description text (One of: 900000000000013009 | Synonym | or 900000000000003001 | Fully specified name | or 900000000000550004 | Definition |)
                          languageCode           Language facet of RDF literal string
                          term                   Description or TextDefinition text
LanguageNames             languageText           ISO 639-1 language name from Languages
                          refsetId               Corresponding SNOMED CT refsetId

Note on StatedRelationship: The Perl transformation states that it only works with the StatedRelationship file. There is nothing in the transformation rules themselves that prevents them from being applied to the Relationship file as well, but caution should be used, as multiple modules and their inferences could get mixed in the latter file.


Note that moduleId is not used as a filter on the Relationship file. While it is theoretically possible for multiple modules to collectively define the meaning of a concept, a meaning that shifts depending on which module is referenced is problematic. For this reason, all active qualifying relationship rows are used in the definition of a concept.
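The relationship filters described above can be sketched in Python as follows. This is a minimal illustration, not the normative rule: the function name, the dictionary-per-row representation, and the hard-coded set of characteristic types (assumed here to be the stated and inferred descendants of 900000000000006009 | Defining relationship |) are assumptions of the sketch. A real implementation would derive that set from the release. Note that moduleId is deliberately not tested.

```python
SOME = 900000000000451002  # 900000000000451002 | Some |

# Assumed descendants of 900000000000006009 | Defining relationship |
# (stated and inferred); hard-coded here only for illustration.
DEFINING_TYPES = {900000000000010007, 900000000000011006}


def qualifying_relationships(rows, defining_types=DEFINING_TYPES):
    """Keep rows that pass the active, characteristicTypeId and modifierId filters."""
    return [
        r for r in rows
        if r["active"] == 1
        and r["characteristicTypeId"] in defining_types
        and r["modifierId"] == SOME
    ]


# Toy rows with placeholder sourceId values; only the filter columns matter.
rows = [
    {"sourceId": 1, "active": 1, "characteristicTypeId": 900000000000010007, "modifierId": SOME},
    {"sourceId": 2, "active": 0, "characteristicTypeId": 900000000000010007, "modifierId": SOME},
    {"sourceId": 3, "active": 1, "characteristicTypeId": 0, "modifierId": SOME},
    {"sourceId": 4, "active": 1, "characteristicTypeId": 900000000000011006, "modifierId": 0},
]
kept = [r["sourceId"] for r in qualifying_relationships(rows)]
print(kept)  # [1]
```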

Transitive File

The Transitive file represents the transitive closure of the "isA" relationship in the StatedRelationship table. In SQL, this table could be created via the following steps:

 

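-- Seed the table with the direct (active) is-a relationships.
CREATE TABLE Transitive AS
SELECT sourceId AS concept, destinationId AS ancestor
FROM StatedRelationship
WHERE active = 1 AND
      typeId = 116680003;

-- Guarantee that each (concept, ancestor) pair appears only once.
ALTER TABLE Transitive ADD UNIQUE k1 (concept, ancestor);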



Followed by repeated execution of the statement below until no new rows are inserted:

-- Add every (concept, ancestor-of-ancestor) pair that is not yet present.
INSERT INTO Transitive (concept, ancestor)
SELECT DISTINCT t1.concept, t2.ancestor
FROM Transitive t1
JOIN Transitive t2 ON t1.ancestor = t2.concept
LEFT JOIN Transitive t3 ON t3.concept = t1.concept AND
                           t3.ancestor = t2.ancestor
WHERE t3.concept IS NULL;
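The "repeat until no new rows are inserted" loop can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions: it uses SQLite rather than the SQL dialect above (hence INSERT OR IGNORE with a UNIQUE constraint in place of the anti-join), the function names are hypothetical, and the toy concept identifiers 1, 2, 3, 50 and 99 are placeholders rather than real SNOMED CT content.

```python
import sqlite3

IS_A = 116680003  # 116680003 | is a |


def build_transitive(conn):
    """Populate Transitive with the closure of the active is-a rows."""
    conn.execute("CREATE TABLE Transitive (concept INTEGER, ancestor INTEGER, "
                 "UNIQUE (concept, ancestor))")
    conn.execute("INSERT OR IGNORE INTO Transitive "
                 "SELECT sourceId, destinationId FROM StatedRelationship "
                 "WHERE active = 1 AND typeId = ?", (IS_A,))

    def row_count():
        return conn.execute("SELECT COUNT(*) FROM Transitive").fetchone()[0]

    before = -1
    while before != row_count():  # stop once a pass adds nothing new
        before = row_count()
        conn.execute("INSERT OR IGNORE INTO Transitive "
                     "SELECT t1.concept, t2.ancestor "
                     "FROM Transitive t1 "
                     "JOIN Transitive t2 ON t1.ancestor = t2.concept")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE StatedRelationship (sourceId INTEGER, "
                 "destinationId INTEGER, typeId INTEGER, active INTEGER)")
    conn.executemany("INSERT INTO StatedRelationship VALUES (?, ?, ?, ?)", [
        (3, 2, IS_A, 1),   # 3 is-a 2
        (2, 1, IS_A, 1),   # 2 is-a 1
        (3, 99, IS_A, 0),  # inactive row, must be ignored
        (3, 50, 0, 1),     # placeholder non-is-a type, must be ignored
    ])
    build_transitive(conn)
    return sorted(conn.execute("SELECT concept, ancestor FROM Transitive").fetchall())


print(demo())  # [(2, 1), (3, 1), (3, 2)]
```

Comparing the row count before and after each pass avoids relying on driver-specific "rows affected" reporting to detect termination.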

DescriptionAndDefinition File

The DescriptionAndDefinition file represents the union of the Description and TextDefinition files.  These files are distributed separately in the RF2 distribution because the maximum size of a Description term is considerably smaller than that of a TextDefinition term; apart from this difference, the two files are structurally identical.

CREATE TABLE DescriptionAndDefinition AS SELECT * FROM TextDefinition;

INSERT INTO DescriptionAndDefinition SELECT * FROM Description;

LanguageNames File

The LanguageNames file combines the information from the LANGUAGES and LANGUAGE_MAP metadata entries into a form that can be referenced within the SQL syntax used in this document.  It contains a row for each LANGUAGE_MAP entry that has an entry in the LANGUAGES list. In the following example we create a table that carries a single entry that corresponds to the example in the next section.

CREATE TABLE LanguageNames (languageText char(36) NOT NULL,
                            refsetId bigint NOT NULL,
                            PRIMARY KEY (refsetId));

INSERT INTO LanguageNames VALUES ('en-us', 900000000000509007) ;
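The rule for populating LanguageNames (one row per requested language that also has a LANGUAGE_MAP entry) might be sketched in Python as below. The function name is hypothetical, and the sketch simply skips requested languages that have no map entry; whether an implementation should instead report them as errors is not stated in this document.

```python
def language_names(languages, language_map):
    """Return (languageText, refsetId) rows for requested languages found in the map."""
    return [(lang, language_map[lang]) for lang in languages if lang in language_map]


LANGUAGES = ["en-us"]
LANGUAGE_MAP = {"en-us": 900000000000509007,
                "en-gb": 900000000000508004}

rows = language_names(LANGUAGES, LANGUAGE_MAP)
print(rows)  # [('en-us', 900000000000509007)]
```

The resulting rows correspond to the values that would be inserted into the LanguageNames table.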

OWL Transformation Context


The OWL transformation requires several contextual variables. Some of these (MODULE, VERSION, LANGUAGES) are inputs that must be provided by the user. Many of the others, however, should really be part of a distribution. Appendix B includes recommendations on where and how the variables below might be included in future RF2 distributions.

 

Identifier:  MODULE
Description: The SNOMED CT concept identifier of the module being transformed.
Type:        Input
Example:     900000000000207008

Identifier:  MODULE_LABEL
Description: The formal textual name of the module
Type:        Metadata
Example:     "SNOMED Clinical Terms, International Release, Stated Relationships in OWL RDF"

Identifier:  MODULE_DESCRIPTION
Description: A textual description of the module, including its purpose and derivation
Type:        Metadata
Example:     "Generated as OWL RDF/XML from SNOMED CT release files by Perl transform script
             Input concepts file was ..."

Identifier:  MODULE_COPYRIGHT
Description: Copyright information for the module
Type:        Metadata
Example:     "Copyright 2015 The International Health Terminology Standards Development Organisation (IHTSDO).  ... "

Identifier:  VERSION
Description: The version of the module being transformed.  Typical format yyyymmdd
Type:        Input
Example:     20160131

Identifier:  VERSION_DESCRIPTION
Description: A textual description of the specific version
Type:        Metadata
Example:     "International Release, Core Module, Release Date: 20160131"

Identifier:  LANGUAGES
Description: A list of ISO 639-1 language identifiers. Only descriptions and definitions in the specified language(s) will be emitted.
Type:        Input
Example:     ["en-us"]

Identifier:  LANGUAGE_MAP
Description: A map from ISO 639-1 identifier(s) in LANGUAGES to the corresponding SNOMED CT Language Refset Identifier.
Type:        Metadata
Example:     {"en-us": 900000000000509007,
              "en-gb": 900000000000508004,
              "zh":  722128001,
              "es":  448879004}

Identifier:  NEVER_GROUPED_LIST
Description: The set of concept identifiers that are guaranteed to never appear within a role group
Type:        Metadata
Example:     [123005000, 272741003, 127489000, 411116001]

Identifier:  RIGHT_IDS
Description: A map from a set of concept identifiers to a list of their "right identifiers" or property chains
Type:        Metadata
Example:     {363701004 : 127489000}
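One way an implementation might carry these context variables is a single record type, sketched below in Python. The class name, field names and the missing_languages helper are hypothetical and not defined by this specification; the example values are taken from the entries above, except for "fr", which is added only to show a requested language with no LANGUAGE_MAP entry.

```python
from dataclasses import dataclass, field


@dataclass
class TransformContext:
    """Hypothetical container for the context variables listed above."""
    # Inputs supplied by the user
    module: int
    version: str
    languages: list
    # Metadata that would ideally ship with the distribution
    module_label: str = ""
    module_description: str = ""
    module_copyright: str = ""
    version_description: str = ""
    language_map: dict = field(default_factory=dict)
    never_grouped: frozenset = frozenset()
    right_ids: dict = field(default_factory=dict)

    def missing_languages(self):
        """Requested languages that have no LANGUAGE_MAP entry."""
        return [lang for lang in self.languages if lang not in self.language_map]


ctx = TransformContext(
    module=900000000000207008,
    version="20160131",
    languages=["en-us", "fr"],  # "fr" is illustrative only
    language_map={"en-us": 900000000000509007, "en-gb": 900000000000508004},
    never_grouped=frozenset({123005000, 272741003, 127489000, 411116001}),
    right_ids={363701004: 127489000},
)
print(ctx.missing_languages())  # ['fr']
```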

SQL

The SQL language is used in this document to specify the particular rows, columns and linkages between the various RF2 files. It is used as a convenient shorthand for selection and linkage criteria. Implementations may use any technology they choose to realize this specification as long as the results are consistent with what is described in the SQL below.

Transformation

Transformation Namespaces

The following namespaces are used in the transformation. The namespace URIs are normative and must be used exactly as specified. The namespace names listed below are not strictly required, but are strongly recommended for readability.

Namespace Name  Namespace URI                                Description
--------------  -------------------------------------------  -----------
rdf:            http://www.w3.org/1999/02/22-rdf-syntax-ns#  RDF built-in vocabulary
rdfs:           http://www.w3.org/2000/01/rdf-schema#        RDF Schema vocabulary
owl:            http://www.w3.org/2002/07/owl#               Web Ontology Language (OWL)
sctf:           http://snomed.info/field/                    Description and Definition predicates
sct:            http://snomed.info/id/                       SNOMED CT Concept identifiers
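
As a small illustration of how these namespaces are used, the Python sketch below emits the corresponding Turtle @prefix declarations and expands a prefixed name into a full IRI. The function names are hypothetical; only the namespace names and URIs come from the table above.

```python
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "sctf": "http://snomed.info/field/",
    "sct": "http://snomed.info/id/",
}


def turtle_prefixes(namespaces=NAMESPACES):
    """Render one Turtle @prefix declaration per namespace."""
    return "\n".join(f"@prefix {name}: <{uri}> ." for name, uri in namespaces.items())


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'sct:74400008' into a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return namespaces[prefix] + local


print(turtle_prefixes())
print(expand("sct:74400008"))  # http://snomed.info/id/74400008
```

Because the URIs are normative and the names are only recommended, an implementation could change the dictionary keys without affecting the expanded IRIs.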

Copyright © 2026, SNOMED International