Potential New Approach to Refset Descriptors

Note: This page was copied from the internal Design Authority space. However, it is now posted and maintained here in the Terminology Release Advisory Group space to ensure that it is open to wider review and input.

Current Approach

Specification

The Refset Descriptor is specified as shown below, with one row per attribute, including a first row with attributeOrder=0 that defines the referencedComponentId type.

  • id, effectiveTime, active, moduleId

    • Data types: UUID, Time, Boolean and SCTID respectively.

    • Purpose: General versioning and identification.

  • refsetId

    • Data type: SCTID

    • Purpose: Refers to the Refset Descriptor Refset.

  • referencedComponentId

    • Data type: SCTID

    • Purpose: Identifies the reference set (or type of reference set) that is specified by this descriptor.

    • Value: Set to a descendant of 900000000000455006 |reference set (foundation metadata concept)| in the metadata hierarchy.

  • attributeDescription

    • Data type: SCTID

    • Purpose: Specifies the name of an attribute that is used in the reference set to which this descriptor applies.

    • Value: Set to a descendant of 900000000000457003 |Reference set attribute (foundation metadata concept)| in the metadata hierarchy, which describes the additional attribute extending the reference set.

  • attributeType

    • Data type: SCTID

    • Purpose: Specifies the data type of this attribute in the reference set to which this descriptor applies.

    • Value: Set to a descendant of 900000000000459000 |attribute type (foundation metadata concept)| in the metadata hierarchy, which describes the type of the additional attribute extending the reference set.

  • attributeOrder

    • Data type: Integer

    • Purpose: Specifies the position of this attribute in the reference set to which this descriptor applies. A zero value identifies the referencedComponentId within the reference set. Other values specify additional attributes by position relative to the referencedComponentId. Within a particular descriptor, attributeOrder values for a particular referencedComponentId must be contiguous.

    • Value: An unsigned integer, providing an ordering for the additional attributes extending the reference set.
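This is a minimal sketch of how the attributeOrder rule could be checked mechanically. The Python below groups descriptor rows by referencedComponentId. It reports any refset whose orders are not exactly 0, 1, 2, and so on, as required by the zero row plus the contiguity rule. The DescriptorRow class, the check_attribute_order function and identifiers such as REFSET_A are hypothetical placeholders. They are not real SCTIDs and not part of any SNOMED CT tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorRow:
    """One Refset Descriptor row (versioning fields omitted; names are hypothetical)."""
    referenced_component_id: str  # the refset (or refset type) being described
    attribute_description: str    # concept naming the attribute
    attribute_type: str           # concept giving the attribute's data type
    attribute_order: int          # 0 = referencedComponentId, 1.. = additional columns


def check_attribute_order(rows):
    """Report refsets whose attributeOrder values are not exactly 0..n-1."""
    orders = defaultdict(list)
    for row in rows:
        orders[row.referenced_component_id].append(row.attribute_order)

    problems = []
    for refset, values in orders.items():
        expected = list(range(len(values)))  # must start at 0, no gaps, no duplicates
        if sorted(values) != expected:
            problems.append(
                f"{refset}: attributeOrder values {sorted(values)} "
                f"are not 0..{len(values) - 1}"
            )
    return problems


# Hypothetical rows: REFSET_B skips order 1.
rows = [
    DescriptorRow("REFSET_A", "ATTR_REFERENCED_COMPONENT", "TYPE_CONCEPT", 0),
    DescriptorRow("REFSET_A", "ATTR_TARGET", "TYPE_STRING", 1),
    DescriptorRow("REFSET_B", "ATTR_REFERENCED_COMPONENT", "TYPE_CONCEPT", 0),
    DescriptorRow("REFSET_B", "ATTR_VALUE", "TYPE_CONCEPT", 2),
]
print(check_attribute_order(rows))
# prints: ['REFSET_B: attributeOrder values [0, 2] are not 0..1']
```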

Commentary

In practical terms, each row provides the following information about each attribute:

  1. Position

    1. The column position of the attribute (attributeOrder).

    2. To be precise, this represents the column offset to the right of referencedComponentId in the specified reference set type.

  2. Data Type

    • The data type of the attribute (attributeType) by reference to a concept in the SNOMED CT metadata hierarchy. In practice, all data types are represented as strings in the release file. However, the data types are subdivided into:

      1. Component references (e.g. SCTIDs referring to a component) with specific subtypes representing specific component types (concept, description, relationship).

      2. Integers with specific subtypes representing signed and unsigned integers.

      3. Strings with a variety of subtypes

  3. Description

    1. A human-readable description of the attribute (attributeDescription), represented as a SNOMED CT metadata concept. This allows the description of the attribute to be translated.

    2. A convention was adopted in which the subtypes of the concept referenced by attributeDescription are treated as the valid values of the attribute. However, this convention has several issues, which are noted below under Unmet Requirements.
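The position and data type information can be illustrated with a short Python sketch. It relies on the six columns at the start of every RF2 reference set file (id, effectiveTime, active, moduleId, refsetId, referencedComponentId). As a result, attributeOrder 0 maps to column index 5 and attributeOrder n maps to index 5 + n. The three broad categories passed to parse_value are hypothetical stand-ins for the attribute type concepts. The function names and sample values are also assumptions.

```python
# The six columns at the start of every RF2 reference set release file.
COMMON_COLUMNS = ["id", "effectiveTime", "active", "moduleId",
                  "refsetId", "referencedComponentId"]
REFERENCED_COMPONENT_INDEX = COMMON_COLUMNS.index("referencedComponentId")


def column_index(attribute_order: int) -> int:
    """0-based column for an attributeOrder: an offset right of referencedComponentId."""
    return REFERENCED_COMPONENT_INDEX + attribute_order


def parse_value(raw: str, category: str, signed: bool = True):
    """Interpret a release-file string using a hypothetical broad type category."""
    if category == "component":
        # Component references stay as text: an SCTID is an identifier, not a quantity.
        if not raw.isdigit():
            raise ValueError(f"not an SCTID: {raw!r}")
        return raw
    if category == "integer":
        value = int(raw)
        if not signed and value < 0:
            raise ValueError(f"unsigned integer expected: {raw!r}")
        return value
    # Strings: the current descriptor defines no further structure.
    return raw


# Hypothetical tab-separated row with one additional attribute (attributeOrder 1).
line = "uuid-placeholder\t20250101\t1\t1001\t2002\t3003\t7"
fields = line.split("\t")
referenced = parse_value(fields[column_index(0)], "component")
extra = parse_value(fields[column_index(1)], "integer", signed=False)
print(referenced, extra)  # prints: 3003 7
```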

Unmet Requirements

No Representation of: Information About the Refset as a Whole

Apart from the name of the refset (identified by refsetId) and the type of the refset (indicated by the supertypes of the identified concept), all information is recorded for individual attributes. Therefore, there is no formally defined facility for including information about the reference set as a whole.

Examples:

  • The fact that a language reference set relates to a particular language, dialect or context of use cannot be represented in a formal structured way.

  • The fact that a map reference set relates to a particular code system and mapping use case cannot be represented in a formal structured way.

Meeting the requirements:

  • This requirement could be met using a string-based syntax that allowed additional metadata to be specified.

  • The only way the existing refset descriptor refset could represent this would be by specifying an additional text column and also assigning an attributeOrder value to refer to the reference set as a whole.
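The string-based option can be illustrated with a short sketch. The Python below reads refset-level metadata for a language reference set from a JSON text value and checks that required keys are present. No such syntax is currently defined. The key names (language, dialect, contextOfUse), the rule that language is required, and the read_refset_metadata function are all assumptions made for illustration.

```python
import json

# Hypothetical text value describing a language refset as a whole.
metadata_text = '{"language": "en", "dialect": "en-GB", "contextOfUse": "clinical"}'

# Hypothetical rule: a language refset must at least state its language.
REQUIRED_KEYS = frozenset({"language"})


def read_refset_metadata(text: str, required=REQUIRED_KEYS) -> dict:
    """Parse refset-level metadata and check that the required keys are present."""
    metadata = json.loads(text)
    if not isinstance(metadata, dict):
        raise ValueError("refset metadata must be a JSON object")
    missing = set(required) - set(metadata)
    if missing:
        raise ValueError(f"missing metadata keys: {sorted(missing)}")
    return metadata


metadata = read_refset_metadata(metadata_text)
print(metadata["dialect"])  # prints: en-GB
```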

No Representation of: Relationships or Dependencies between Refsets

In theory, an association reference set can be used to associate or group reference sets. However, there is no specified, consistent way to represent the relationships and dependencies between reference sets that may need to be understood in order to use those sets.

Examples:

  • There is a relationship between the primary care reference sets and the ICPC map reference sets. However, there is no formal way to express this.

  • Similar cases, in which different members of a set of related reference sets apply to different use case contexts, cannot currently be formally represented in a standard way.

Meeting the requirements:

  • This requirement could be met using a string-based syntax or using a relational representation similar to that used for module dependencies.
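A relational representation could, for example, hold one row per dependency, similar to the way module dependencies are recorded. The Python sketch below assumes hypothetical rows of (source refset, target refset, kind). It works out every refset that must be available before a given refset can be used, by following only "requires" links transitively. The row layout, the kind values and the identifiers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical dependency rows: (source refset, target refset, kind).
DEPENDENCIES = [
    ("REFSET_A", "REFSET_B", "requires"),
    ("REFSET_B", "REFSET_C", "requires"),
    ("REFSET_A", "REFSET_D", "relatedTo"),  # informative link, not a dependency
]


def required_refsets(start: str, rows) -> set:
    """Return every refset reachable from start via 'requires' links."""
    graph = defaultdict(set)
    for source, target, kind in rows:
        if kind == "requires":
            graph[source].add(target)

    found = set()
    stack = [start]
    # Iterative depth-first search; `found` also stops it looping on cycles.
    while stack:
        for target in graph[stack.pop()]:
            if target not in found:
                found.add(target)
                stack.append(target)
    return found


print(sorted(required_refsets("REFSET_A", DEPENDENCIES)))
# prints: ['REFSET_B', 'REFSET_C']
```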

No Representation of: Attribute Uniqueness or Multi-Attribute Uniqueness

In some reference set types the referencedComponentId must logically be unique (e.g. simple reference sets); in others it is not unique (e.g. extended map reference sets).

  • From the perspective of someone validating or using a reference set it can be useful to know this.

  • However, this information is not provided in the refset descriptor

In some reference set types (or in some usages of some reference set types) combinations of attributes may logically be required to be unique.

  • Knowing this can be useful when validating and using a reference set - in particular this knowledge can be used to determine useful indexes and keys.

  • However, this information is not provided in the refset descriptor

Meeting the requirements:

  • Uniqueness and key values could be expressed fairly easily using a string based syntax as used in defining an SQL table schema or using a similar definition language.
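The SQL-style approach can be sketched in a few lines. The Python below uses the standard sqlite3 module to declare a uniqueness constraint over a combination of columns, and shows a duplicate row being rejected. The table, its columns and the chosen key (referencedComponentId, mapGroup, mapPriority) are assumptions for illustration only. They are not rules taken from the specification of any real map reference set. The same declaration also tells an implementer which index is useful, since SQLite automatically creates an index to enforce a UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical map-like refset: referencedComponentId alone is not unique,
# but (for this illustration) the combination of three columns is declared unique.
conn.execute("""
    CREATE TABLE map_refset (
        referencedComponentId TEXT NOT NULL,
        mapGroup INTEGER NOT NULL,
        mapPriority INTEGER NOT NULL,
        mapTarget TEXT,
        UNIQUE (referencedComponentId, mapGroup, mapPriority)
    )
""")


def load(rows):
    """Insert rows, returning those rejected by the uniqueness constraint."""
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO map_refset VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    return rejected


rows = [
    ("1001", 1, 1, "X01"),
    ("1001", 1, 2, "X02"),  # same component, different priority: allowed
    ("1001", 1, 1, "X03"),  # repeats the key of the first row: rejected
]
print(load(rows))  # prints: [('1001', 1, 1, 'X03')]
```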

Limited Representation of: Range of Values Permitted in an Attribute

Although the data types for attributes are specified in the refset descriptor, there is no formal way to represent the range of possible values within a given attribute. In some cases where the datatype is a concept reference, the attributeDescription refers to a concept which has subtypes and by convention those subtypes are deemed to be the potential values of that attribute. This approach is contrary to good terminology practice (e.g. the values of an attribute are NOT subtypes of the name of the attribute). It is also limited since it can only be used for concept references.

In the case of text strings, there is no formal representation of permitted string length, nor of string formats (e.g. markup and character set restrictions).

Even in the case of concept references, there are cases which cannot be supported by this approach. These include:

  • Relevant values are in different hierarchies or sub-hierarchies with no common ancestor.

  • Only some values in a hierarchy are relevant as values for the attribute

Meeting the requirements:

  • To meet the requirements for constraining concept values, the obvious SNOMED CT standard approach would be to use the Expression Constraint Language (ECL).

  • Strings could be restricted effectively using regular expressions

  • Strings could also be restricted in terms of length, format and character set.

  • Integers or other numeric values could be restricted by ranges
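The string and integer restrictions listed above are simple to check mechanically. The Python below is a minimal sketch. StringConstraint and IntegerRange are hypothetical names, and the code pattern in the example is invented for illustration. ECL constraints are deliberately left out, because evaluating them requires access to the terminology itself (for example, through a terminology server) rather than a purely local check.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StringConstraint:
    """Hypothetical string restriction: regular expression plus maximum length."""
    pattern: Optional[str] = None  # the whole value must match this expression
    max_length: Optional[int] = None

    def check(self, value: str) -> bool:
        if self.max_length is not None and len(value) > self.max_length:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        return True


@dataclass
class IntegerRange:
    """Hypothetical inclusive range restriction for integer attributes."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def check(self, value: int) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


code_format = StringConstraint(pattern=r"[A-Z][0-9]{2}(\.[0-9]{1,2})?", max_length=8)
priority = IntegerRange(minimum=1, maximum=10)
print(code_format.check("J45.9"), code_format.check("j45"), priority.check(0))
# prints: True False False
```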

Summary

Key Points

  • The current refset descriptor refset is unable to address the limitations identified in the previous section.

  • Major changes to this refset would be disruptive to any implementer currently using the limited capabilities of the current descriptor

Suggested Way Forward

  • Retain the current refset descriptor to minimize disruption but do not attempt to use it to address the limitations

  • Create a new type of refset specification reference set designed to meet the full set of requirements

  • Once the new type of reference set is established, plan the withdrawal of the existing reference set descriptor refset.

Outline Proposal for New Refset Specification Refset

Option 1

  1. Use a Refset with a single row per Refset or Refset Type

    1. referencedComponentId would refer to the Refset

    2. Specify one or more text attributes to contain representations of the specifications covering the items noted above.

  2. Use a String Based Syntax for the Specification

    1. The obvious candidate on which to base this would be the JSON syntax (XML would be another option)

  3. Within the JSON objects (or XML elements)

    1. Represent the metadata about the refset as a whole and its relationships with other refsets

    2. Include nested objects representing each attribute 

      1. Include name and datatype (potentially as expressions representing the relevant metadata concepts)

      2. Include constraints for each attribute represented in a syntax appropriate to the data type (e.g. ECL for concept references)

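      3. No need to specify order, as this can be implicit in the sequence of the attribute objects (held in a JSON array, since JSON arrays are ordered whereas the members of a JSON object are formally unordered).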

    3. Also include nested objects to represent keys and uniqueness of attributes or attribute combinations
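The Python below is a minimal sketch of what an Option 1 specification might look like. Every key name in the spec dictionary (metadata, dependsOn, attributes, keys) is a hypothetical choice, not a proposed standard. Attributes are held in a JSON array, so column order survives serialization. The ECL constraint refers to 404684003 |Clinical finding| purely as an example. The helper undeclared_key_attributes, also hypothetical, shows one cross-check that a single self-contained object makes easy.

```python
import json

# Hypothetical specification for a single refset, stored as one JSON text value.
spec = {
    "refset": "REFSET_A",
    "metadata": {"language": "en", "dialect": "en-GB"},
    "dependsOn": ["REFSET_B"],
    "attributes": [  # a JSON array, so attribute order is preserved
        {"name": "referencedComponentId", "type": "concept",
         "constraint": {"ecl": "<< 404684003 |Clinical finding|"}},
        {"name": "mapTarget", "type": "string",
         "constraint": {"pattern": "[A-Z][0-9]{2}", "maxLength": 8}},
    ],
    "keys": [{"unique": ["referencedComponentId", "mapTarget"]}],
}


def undeclared_key_attributes(spec: dict) -> list:
    """Return key members that do not name a declared attribute."""
    declared = {attribute["name"] for attribute in spec["attributes"]}
    return [name for key in spec.get("keys", [])
            for name in key["unique"] if name not in declared]


text = json.dumps(spec)    # the value held in the refset's text attribute
parsed = json.loads(text)  # what a consumer would read back
columns = [attribute["name"] for attribute in parsed["attributes"]]
print(columns, undeclared_key_attributes(parsed))
# prints: ['referencedComponentId', 'mapTarget'] []
```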

Option 2

  1. Use a Refset with a row per Refset or Refset Type PLUS a row per attribute

    1. referencedComponentId would refer to the Refset

    2. an order field would be needed with a value (e.g. 0 or perhaps -1) indicating the refset as a whole and other values representing attributes in the refset.

    3. Specify one or more text attributes to contain representations of the specifications covering the items noted above.

    4. Note that in this approach, information about keys and uniqueness would need to be recorded in the row for the overall refset information (or in additional rows specifically for this purpose).

  2. Use a String Based Syntax as for Option 1 above but subdivide the syntax between the rows

    1. Represent the metadata about the refset as a whole and its relationships with other refsets and attribute keys/uniqueness in the general row for the overall refset (or rows specified by other predefined order values)

    2. Include specification for each attribute in a row allocated to that attribute

      1. Include name and datatype (potentially as expressions representing the relevant metadata concepts)

      2. Include constraints for each attribute represented in a syntax appropriate to the data type (e.g. ECL for concept references)
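For comparison, the Python sketch below shows the extra steps that Option 2 implies. A consumer has to collect the rows for one refset and pick out the refset-level row. It then has to re-sort the attribute rows by their order value before it has a complete specification. The reserved value -1, the row layout and the assemble function are hypothetical.

```python
import json

REFSET_LEVEL_ORDER = -1  # hypothetical reserved value for the refset-as-a-whole row

# Hypothetical rows: (referencedComponentId, order, specification text).
rows = [
    ("REFSET_A", 1, '{"name": "mapTarget", "type": "string"}'),
    ("REFSET_A", -1, '{"metadata": {"language": "en"}, '
                     '"keys": [{"unique": ["referencedComponentId", "mapTarget"]}]}'),
    ("REFSET_A", 0, '{"name": "referencedComponentId", "type": "concept"}'),
    ("REFSET_Z", 0, '{"name": "referencedComponentId", "type": "concept"}'),
]


def assemble(refset_id: str, rows) -> dict:
    """Rebuild one refset's full specification from its per-row fragments."""
    overall = {}
    attributes = []
    for refset, order, text in rows:
        if refset != refset_id:
            continue  # rows for other refsets are ignored
        if order == REFSET_LEVEL_ORDER:
            overall = json.loads(text)
        else:
            attributes.append((order, json.loads(text)))
    attributes.sort(key=lambda pair: pair[0])  # restore column order
    return {**overall, "attributes": [fragment for _, fragment in attributes]}


assembled = assemble("REFSET_A", rows)
print([attribute["name"] for attribute in assembled["attributes"]])
# prints: ['referencedComponentId', 'mapTarget']
```

This reassembly step is the main practical difference from Option 1, where the complete specification is already a single object.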

Recommendation

Option 1 above appears to be the best approach, as it places all the data about the refset in a single JSON object (with nested sub-objects). As a result, it can support a very flexible approach to refset specification.

Copyright © 2025, SNOMED International