Potential New Approach to Refset Descriptors
Note: This page was copied from the internal Design Authority space. However it is now posted and maintained here in the Terminology Release Advisory Group space to ensure that it is open to wider review and input.
Current Approach
Specification
The Refset Descriptor is specified as shown in the following table, with one row per attribute. This includes a first row with attributeOrder=0, which defines the referencedComponentId type.
Field | Data type | Purpose
id, effectiveTime, active, moduleId | UUID, Time, Boolean, SCTID | General versioning and identification.
refsetId | SCTID | Refers to the Refset Descriptor Refset.
referencedComponentId | SCTID | Identifies the reference set (or type of reference set) that is specified by this descriptor. Set to a descendant of 900000000000455006 |reference set (foundation metadata concept)| in the metadata hierarchy.
attributeDescription | SCTID | Specifies the name of an attribute that is used in the reference set to which this descriptor applies. Set to a descendant of 900000000000457003 |Reference set attribute (foundation metadata concept)| in the metadata hierarchy, that describes the additional attribute extending the reference set.
attributeType | SCTID | Specifies the data type of this attribute in the reference set to which this descriptor applies. Set to a descendant of 900000000000459000 |attribute type (foundation metadata concept)| in the metadata hierarchy, that describes the type of the additional attribute extending the reference set.
attributeOrder | Integer | Specifies the position of this attribute in the reference set to which this descriptor applies. A zero value identifies the referencedComponentId within the reference set. Other values specify additional attributes by position relative to the referencedComponentId. Within a particular descriptor, attributeOrder values for a particular referencedComponentId must be contiguous. An unsigned Integer, providing an ordering for the additional attributes extending the reference set.
Commentary
In practical terms each row provides the following information about each attribute:
Position
The column position of the attribute (attributeOrder).
To be precise this represents the column offset to the right of referencedComponentId in the specified reference set type.
Data Type
The data type of the attribute (attributeType) by reference to a concept in the SNOMED CT metadata hierarchy. In practice, all data types are represented as strings in the release file. However, the data types are subdivided into:
Component references (e.g. SCTIDs referring to a component) with specific subtypes representing specific component types (concept, description, relationship).
Integers with specific subtypes representing signed and unsigned integers.
Strings with a variety of subtypes
Description
A human-readable description of the attribute (attributeDescription), represented as a SNOMED CT metadata concept. This allows translation of the description of the attribute.
A convention was adopted in which subtypes of the concept referenced by attributeDescription were considered to represent the valid values of the attribute. However, this convention has several issues, noted below under Unmet Requirements.
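To make the commentary concrete, the following is a minimal sketch of how a consumer might turn descriptor rows into a column layout. It assumes the rows have already been read into dictionaries; the function name `column_layouts`, the refset label `refsetA`, and the text labels used in place of metadata concept SCTIDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Descriptor rows reduced to the four fields discussed above.
# All values are placeholder labels, not real SCTIDs.
DESCRIPTOR_ROWS = [
    {"referencedComponentId": "refsetA", "attributeDescription": "map target",
     "attributeType": "string", "attributeOrder": 1},
    {"referencedComponentId": "refsetA", "attributeDescription": "referenced component",
     "attributeType": "concept reference", "attributeOrder": 0},
]


def column_layouts(rows):
    """Group descriptor rows by the refset they describe and order them by attributeOrder."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["referencedComponentId"]].append(row)

    layouts = {}
    for refset, attrs in grouped.items():
        attrs.sort(key=lambda r: r["attributeOrder"])
        orders = [r["attributeOrder"] for r in attrs]
        # attributeOrder must start at 0 (the referencedComponentId) and be contiguous.
        if orders != list(range(len(attrs))):
            raise ValueError(f"attributeOrder values for {refset} are not contiguous: {orders}")
        layouts[refset] = [(r["attributeDescription"], r["attributeType"]) for r in attrs]
    return layouts


LAYOUTS = column_layouts(DESCRIPTOR_ROWS)
```

Note that nothing in such a layout says which values are valid or which columns are unique. Those gaps are the subject of the Unmet Requirements section.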
Unmet Requirements
No Representation of: Information About the Refset as a Whole
Apart from the name of the refset (identified by refsetId) and the type of the refset (indicated by the supertypes of the identified concept), all information is recorded for individual attributes. Therefore there is no formally defined facility to include information about the reference set as a whole.
Examples:
The fact that a language reference set relates to a particular language, dialect or context of use cannot be represented in a formal structured way.
The fact that a map reference set relates to a particular code system and mapping use case cannot be represented in a formal structured way.
Meeting the requirements:
This requirement could be met using a string-based syntax that allows additional metadata to be specified.
The only way the existing refset descriptor refset could represent this would be by specifying an additional text column and also assigning an attributeOrder value to refer to the reference set as a whole.
No Representation of: Relationships or Dependencies between Refsets
In theory, an association reference set can be used to associate or group reference sets. However, there is no specified, consistent way to represent the relationships and dependencies between reference sets that may need to be understood in order to use those sets.
Examples:
There is a relationship between the primary care reference sets and the ICPC map reference sets. However, there is no formal way to express this.
Similar cases, where different members of a set of related reference sets apply to different use case contexts, cannot currently be formally represented in a standard way.
Meeting the requirements:
This requirement could be met using a string-based syntax or a relational representation similar to that used for module dependencies.
No Representation of: Attribute Uniqueness or Multi-Attribute Uniqueness
In some reference set types the referencedComponentId must logically be unique (e.g. simple reference sets); in others the referencedComponentId is not unique (e.g. extended map reference sets).
From the perspective of someone validating or using a reference set it can be useful to know this.
However, this information is not provided in the refset descriptor.
In some reference set types (or in some usages of some reference set types) combinations of attributes may logically be required to be unique.
Knowing this can be useful when validating and using a reference set - in particular this knowledge can be used to determine useful indexes and keys.
However, this information is not provided in the refset descriptor.
Meeting the requirements:
Uniqueness and key values could be expressed fairly easily using a string based syntax as used in defining an SQL table schema or using a similar definition language.
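As an illustration of the SQL-style option, the sketch below declares uniqueness for two refset types and lets an in-memory SQLite database enforce it. The table names, the helper functions, and in particular the choice of (referencedComponentId, mapGroup, mapPriority) as the unique combination for an extended map are assumptions made for the example, not part of any agreed specification. Versioning columns are reduced to a single id column for brevity.

```python
import sqlite3

# Hypothetical key declarations, written as ordinary SQL table definitions.
SIMPLE_REFSET_DDL = """
CREATE TABLE simple_refset (
    id TEXT PRIMARY KEY,
    referencedComponentId TEXT NOT NULL,
    UNIQUE (referencedComponentId)
)
"""

EXTENDED_MAP_DDL = """
CREATE TABLE extended_map_refset (
    id TEXT PRIMARY KEY,
    referencedComponentId TEXT NOT NULL,
    mapGroup INTEGER NOT NULL,
    mapPriority INTEGER NOT NULL,
    mapTarget TEXT,
    UNIQUE (referencedComponentId, mapGroup, mapPriority)
)
"""


def open_demo_db():
    """Create an in-memory database holding both example tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SIMPLE_REFSET_DDL)
    conn.execute(EXTENDED_MAP_DDL)
    return conn


def try_insert(conn, table, values):
    """Insert one member row; return False if a declared key is violated."""
    placeholders = ", ".join("?" for _ in values)
    try:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
        return True
    except sqlite3.IntegrityError:
        return False
```

With these declarations, a second simple refset member for the same component is rejected, while an extended map may hold several rows for one component as long as the group and priority differ.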
Limited Representation of: Range of Values Permitted in an Attribute
Although the data types for attributes are specified in the refset descriptor, there is no formal way to represent the range of possible values within a given attribute. In some cases where the datatype is a concept reference, the attributeDescription refers to a concept which has subtypes and by convention those subtypes are deemed to be the potential values of that attribute. This approach is contrary to good terminology practice (e.g. the values of an attribute are NOT subtypes of the name of the attribute). It is also limited since it can only be used for concept references.
In the case of text strings there is no formal representation of string length (and the same is true in terms of string formats - e.g. markup and character set restrictions).
Even in the case of concept references, there are cases which cannot be supported by this approach. These include:
Relevant values are in different hierarchies or sub-hierarchies with no common ancestor
Only some values in a hierarchy are relevant as values for the attribute
Meeting the requirements:
To meet the requirements for constraining concept values, the obvious SNOMED CT standard approach would be to use the Expression Constraint Language.
Strings could be restricted effectively using regular expressions.
Strings could also be restricted in terms of length, format and character set.
Integers or other numeric values could be restricted by ranges.
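The constraint styles listed above could be sketched as follows. The `CONSTRAINTS` structure, its key names, and the example pattern and range are hypothetical. The only concept identifier used is the attribute type concept already cited on this page. The expression constraint is carried as text only, because evaluating it would need a terminology server, which this sketch does not assume.

```python
import re

# Hypothetical per-attribute constraints; the attribute names are illustrative.
CONSTRAINTS = {
    # String limited by length and by a regular expression (an ICD-10-like shape).
    "mapTarget": {"type": "string", "maxLength": 6,
                  "pattern": r"[A-Z][0-9]{2}(\.[0-9]{1,2})?"},
    # Integer limited to a range.
    "mapPriority": {"type": "integer", "min": 1, "max": 99},
    # Concept reference limited by an expression constraint (descendants only).
    "attributeType": {"type": "concept",
                      "ecl": "< 900000000000459000 |attribute type (foundation metadata concept)|"},
}


def check_value(constraint, value):
    """Return True if the release-file string `value` satisfies the constraint."""
    kind = constraint["type"]
    if kind == "string":
        if len(value) > constraint.get("maxLength", len(value)):
            return False
        pattern = constraint.get("pattern")
        return pattern is None or re.fullmatch(pattern, value) is not None
    if kind == "integer":
        # Values arrive as strings in the release file, so check the format first.
        if re.fullmatch(r"-?[0-9]+", value) is None:
            return False
        return constraint["min"] <= int(value) <= constraint["max"]
    # Concept constraints cannot be checked locally in this sketch.
    raise NotImplementedError("ECL constraints need a terminology server to evaluate")
```

Unlike the subtype convention, a constraint of this kind can cover values drawn from several hierarchies, or only part of one, because an expression constraint is not tied to the descendants of a single concept.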
Summary
Key Points
The current refset descriptor refset is unable to address the limitations identified in the previous section.
Major changes to this refset would be disruptive to any implementer currently using the limited capabilities of the current descriptor.
Suggested Way Forward
Retain the current refset descriptor to minimize disruption but do not attempt to use it to address the limitations
Create a new type of refset specification reference set designed to meet the full set of requirements
Once the new type of reference set is established, plan the withdrawal of the existing reference set descriptor refset.
Outline Proposal for New Refset Specification Refset
Option 1
Use a Refset with a single row per Refset or Refset Type
referencedComponentId would refer to the Refset
Specify one or more text attributes to contain representations of the specifications covering the items noted above.
Use a String Based Syntax for the Specification
The obvious candidate on which to base this would be the JSON syntax (XML would be another option)
Within the JSON objects (or XML elements)
Represent the metadata about the refset as a whole and its relationships with other refsets
Include nested objects representing each attribute
Include name and datatype (potentially as expressions representing the relevant metadata concepts)
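Include constraints for each attribute represented in a syntax appropriate to the data type (e.g. ECL for concept references)
No need to specify order, as this is implicit in the order of the nested attribute objects. This holds if the attributes are listed in a JSON array; the order of members within a JSON object is not guaranteed to be preserved by all parsers.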
Also include nested objects to represent keys and uniqueness of attributes or attribute combinations
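A specification object of the kind outlined in Option 1 might look like the sketch below. Every key name (`metadata`, `relatedRefsets`, `attributes`, `keys`), every refset label, and the `specification` column name are hypothetical. The attributes are held in a list so that column order survives a round trip through JSON.

```python
import json

# Hypothetical single-object specification for one map refset.
REFSET_SPEC = {
    "refset": "EXAMPLE-MAP-REFSET",  # placeholder, not a real SCTID
    "metadata": {"targetCodeSystem": "EXAMPLE-CODE-SYSTEM", "useCase": "example use case"},
    "relatedRefsets": [{"refset": "EXAMPLE-SIMPLE-REFSET", "relationship": "requires"}],
    "attributes": [
        {"name": "referencedComponentId", "datatype": "concept",
         "constraint": "< 404684003 |Clinical finding|"},
        {"name": "mapGroup", "datatype": "integer", "constraint": {"min": 1}},
        {"name": "mapPriority", "datatype": "integer", "constraint": {"min": 1}},
        {"name": "mapTarget", "datatype": "string", "constraint": {"maxLength": 6}},
    ],
    "keys": [{"unique": ["referencedComponentId", "mapGroup", "mapPriority"]}],
}


def to_row(spec):
    """Serialise the whole specification into a single refset member row."""
    return {"referencedComponentId": spec["refset"],
            "specification": json.dumps(spec, sort_keys=True)}


def attribute_positions(row):
    """Recover each attribute's column position from the stored JSON text."""
    spec = json.loads(row["specification"])
    return {attr["name"]: pos for pos, attr in enumerate(spec["attributes"])}


ROW = to_row(REFSET_SPEC)
POSITIONS = attribute_positions(ROW)
```

Here position 0 is the referencedComponentId, matching the meaning of attributeOrder=0 in the current descriptor, and no explicit order field is stored.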
Option 2
Use a Refset with a row per Refset or Refset Type PLUS a row per attribute
referencedComponentId would refer to the Refset
An order field would be needed, with a value (e.g. 0 or perhaps -1) indicating the refset as a whole and other values representing attributes in the refset.
Specify one or more text attributes to contain representations of the specifications covering the items noted above.
Note that in this approach information about keys and uniqueness would need to be recorded in the row for the overall refset information (or in additional rows specifically for this purpose).
Use a String Based Syntax as for Option 1 above but subdivide the syntax between the rows
Represent the metadata about the refset as a whole and its relationships with other refsets and attribute keys/uniqueness in the general row for the overall refset (or rows specified by other predefined order values)
Include specification for each attribute in a row allocated to that attribute
Include name and datatype (potentially as expressions representing the relevant metadata concepts)
Include constraints for each attribute represented in a syntax appropriate to the data type (e.g. ECL for concept references)
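For comparison, Option 2 could be sketched as below: one row carries the refset-level information and each attribute gets its own row. The sentinel value -1 for the refset-level row, the `order` and `specification` column names, and both function names are assumptions for illustration only.

```python
import json

REFSET_LEVEL_ORDER = -1  # assumed sentinel for the row describing the refset as a whole


def split_spec(refset_id, refset_info, attributes):
    """Spread a specification over one refset-level row plus one row per attribute."""
    rows = [{"referencedComponentId": refset_id, "order": REFSET_LEVEL_ORDER,
             "specification": json.dumps(refset_info)}]
    for position, attr in enumerate(attributes):
        rows.append({"referencedComponentId": refset_id, "order": position,
                     "specification": json.dumps(attr)})
    return rows


def merge_rows(rows):
    """Rebuild a single specification object from the per-row fragments."""
    spec = {"attributes": []}
    for row in sorted(rows, key=lambda r: r["order"]):
        fragment = json.loads(row["specification"])
        if row["order"] == REFSET_LEVEL_ORDER:
            spec.update(fragment)  # refset-wide metadata and keys
        else:
            spec["attributes"].append(fragment)
    return spec


EXAMPLE_ROWS = split_spec(
    "EXAMPLE-MAP-REFSET",  # placeholder, not a real SCTID
    {"metadata": {"useCase": "example use case"},
     "keys": [{"unique": ["referencedComponentId", "mapGroup"]}]},
    [{"name": "referencedComponentId", "datatype": "concept"},
     {"name": "mapGroup", "datatype": "integer"}],
)
```

A consumer has to collect and merge all rows for a refset before it can check keys that span several attributes, which is the extra step that a single-row design avoids.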
Recommendation
Option 1 above appears to be the best approach, as it places all the data about the refset in a single JSON object (with nested sub-objects). As a result, it can support a very flexible approach to Refset specification.
Copyright © 2025, SNOMED International