Working Group: Refined Metadata
Description
Addressing the various Metadata topics raised in the TRAG and MAG
Objectives
Define all use cases
Define all detailed requirements
Identify and agree solutions
Put up a straw man for discussion in the various AGs
Include any real world examples of solutions in action
Team
Andrew Atkinson
@Dion McMurtrie
@Matt Cordell
@michael lawley
NEW VOLUNTEERS (from TRAG meeting):
@Mikael Nyström (Unlicensed)
@Alejandro Lopez Osornio
@Reuben Daniels
Child Pages
Relevant Documents
REQUIREMENTS
Categorization of the Type of Metadata
1 - Technical package metadata
Use case - How to validate the release package and know how to use it.
Full - which modules are included in the Full and what is the latest version
Snapshot - information needed to calculate the snapshot from the full
Need to know the Edition module and version (version URI)
Modules and version included in the snapshot (esp necessary if not in the MDRS)
Delta - defining the to/from
version edition URI (what edition the Delta is from and to)
additional modules outside of the MDRS dependencies (to/from URIs - also includes the versions to/from)
Extension
Version edition and URI
The needed language refset
Extension package (information on what is needed to be added to the extension to make it into an Edition)
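As a rough illustration of the package-level requirements above, the sketch below checks that a metadata record carries the fields each release type needs. The field names (`modules`, `editionModule`, `editionVersionUri`, `fromVersionUri`, `toVersionUri`) are hypothetical and are not taken from any agreed metadata schema; only the version URI shape follows the SNOMED CT URI standard.

```python
# Hypothetical field names, used only to illustrate the requirement list above.
REQUIRED_FIELDS = {
    "Full": {"modules"},
    "Snapshot": {"editionModule", "editionVersionUri", "modules"},
    "Delta": {"fromVersionUri", "toVersionUri"},
}


def missing_fields(release_type, metadata):
    """Return the required fields absent from `metadata`, sorted by name."""
    required = REQUIRED_FIELDS[release_type]
    return sorted(required - set(metadata))


# A Delta record that states where it goes "to" but not where it comes "from".
delta = {
    "toVersionUri": "http://snomed.info/sct/900000000000207008/version/20210131",
}
print(missing_fields("Delta", delta))  # ['fromVersionUri']
```

A loader could refuse to process a package for which `missing_fields` returns anything, before any content is read.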
2 - Component "gaps" metadata
The preferred language of the refset
Language/dialect code for language reference sets
Simplemap patterns that don't specify the nature of the t
Correlation id in the conceptmap
Field names for each refset pattern
Foreign (non-SNOMED CT) CodeSystem URI for map type reference set source or target
3 - IP and Release Notes metadata
4 - Component (refset) metadata
Requirement requests from the Working group:
associating dialect alias with a language reference set
field names of each of the refset patterns
A lot of metadata about a release package could equally be encoded as JSON, or using one (or more) refset file formats. It doesn't necessarily follow that you would have to load the WHOLE release first before you could interpret an RF2 encoding of the metadata, which might then tell you that you loaded the wrong thing. I would have thought you could cheerfully load and parse e.g. an srefset file in isolation. The advantage of a refset encoding by comparison with JSON is that it would support composition across multiple extensions, and snapshotting, in ways that are both technically familiar and not currently supported so easily by JSON itself. And you would avoid the need to duplicate the information both as JSON to be read before you load the data and then again also as refsets to be read in the event you decided to load the data.
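To make the "parse an srefset file in isolation" point concrete, here is a minimal sketch that reads a Full-format refset file on its own and derives a snapshot by keeping the latest row per member id. The first six columns are the standard RF2 refset columns; the `value` column name and every row value are made-up placeholders, not real release content.

```python
import csv
import io


def snapshot_from_full(rf2_text):
    """Keep, for each member id, the row with the latest effectiveTime."""
    latest = {}
    reader = csv.DictReader(io.StringIO(rf2_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        current = latest.get(row["id"])
        # effectiveTime is YYYYMMDD, so plain string comparison orders it.
        if current is None or row["effectiveTime"] > current["effectiveTime"]:
            latest[row["id"]] = row
    return latest


HEADER = ["id", "effectiveTime", "active", "moduleId",
          "refsetId", "referencedComponentId", "value"]
ROWS = [  # placeholder rows, not real identifiers
    ["m1", "20200731", "1", "mod", "rs", "c1", "old"],
    ["m1", "20210131", "1", "mod", "rs", "c1", "new"],
    ["m2", "20200731", "1", "mod", "rs", "c2", "only"],
]
full_text = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"

snapshot = snapshot_from_full(full_text)
print({k: v["value"] for k, v in snapshot.items()})  # {'m1': 'new', 'm2': 'only'}
```

Nothing here needs the rest of the release to be loaded, which is the property the comment above relies on.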
SNOMED CT canonical CodeSystem resource
In case they have become lost since June 2020 when I first suggested them, the following could be useful extensions to the more human readable elements of release bundle metadata expressivity offered by the existing JSON beastie:
documentationURL: link to wherever release documentation is posted
licenseURL: alternative to the existing licenseStatement element, to be used when the required license extends beyond core SNOMED content to include the licenses for any number of allied products of which some significant part is embedded within the release, typically as crossmaps
helpdesk: email contact for further information and support
updatesURL: URL of at least one canonical place where this release bundle and its future updates might be obtained, for the benefit of anybody who has no idea where the one in their hand actually came from
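For illustration only, the four suggested elements could look like this once added to the JSON metadata. The element names come from the suggestion above; all values are `example.org` placeholders, and the `bad_urls` helper is a hypothetical sanity check rather than part of any existing tooling.

```python
import json
from urllib.parse import urlparse

# Element names as suggested above; values are placeholders.
extra = {
    "documentationURL": "https://example.org/release-notes",
    "licenseURL": "https://example.org/license",
    "helpdesk": "support@example.org",
    "updatesURL": "https://example.org/releases",
}


def bad_urls(metadata):
    """Return the keys ending in 'URL' whose value is not an absolute http(s) URL."""
    bad = []
    for key, value in metadata.items():
        if key.endswith("URL"):
            parts = urlparse(value)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                bad.append(key)
    return bad


print(json.dumps(extra, indent=2))
print(bad_urls(extra))  # []
```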
Requirement requests from Australia:
modules are held in a version
edition and version
missing metadata from FHIR ConceptMap and ValueSet for implicit reference sets - implicit ConceptMap target URI and relationship type where the reference set doesn't state it explicitly
modules within a snapshot
the definition of the delta
https://www.healthterminologies.gov.au/access/snomed-ct-au/reference-sets-2/?ui:fhirVersion=R4
JSON file or no, we do need a way to have machine readable metadata for a release package indicating what it contains.
For the Full format, that is really just the set of contained modules and that can be cross checked against the content (but makes a good QA point)
For a Delta, if it is present, it is really what the Delta is relative to - the “from” version
For a Snapshot it is even more critical - what is the root point that Snapshot was calculated from? That comes down to at least an edition module ID and a version, but given the issues with dependency versus composition with the MDRS we also need to be able to express additional modules outside strict MDRS dependency that were “composed into” the Snapshot calculation
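The Full and Delta points above can be sketched as two small checks: cross-check the declared modules against the moduleId values actually present in the content, and confirm a Delta's "from" version matches what is installed. The metadata key `fromVersionUri` and both function names are assumptions made for this example.

```python
def module_mismatch(declared_modules, rows):
    """Compare the modules a Full package declares with those found in its rows."""
    declared = set(declared_modules)
    found = {row["moduleId"] for row in rows}
    return {
        "undeclared": sorted(found - declared),  # in content, not in metadata
        "unused": sorted(declared - found),      # in metadata, not in content
    }


def delta_applies(installed_version_uri, delta_metadata):
    """A Delta is only safe to apply on top of its stated "from" version."""
    return delta_metadata.get("fromVersionUri") == installed_version_uri


# Rows reduced to the one column the check needs.
rows = [
    {"moduleId": "900000000000207008"},  # SNOMED CT core module
    {"moduleId": "900000000000012004"},  # SNOMED CT model component module
]
print(module_mismatch(["900000000000207008"], rows))
# {'undeclared': ['900000000000012004'], 'unused': []}

base = "http://snomed.info/sct/900000000000207008/version/"
delta_metadata = {"fromVersionUri": base + "20200731",
                  "toVersionUri": base + "20210131"}
print(delta_applies(base + "20200731", delta_metadata))  # True
```

A Delta with no "from" value at all fails the second check, which matches the concern that an undeclared base makes a Delta unsafe to process.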
Requirement requests from the UK:
From Freshdesk ticket https://ihtsdo.freshdesk.com/a/tickets/32991 (Reply to Mark Wardle once we've made decisions):
Please could the canonical name of the release be included in the release metadata file? We will blend multiple releases together (ie International + UK clinical + UK dm+d) and a name would be useful in registering what is installed, the versions and licencing requirements. Otherwise, we have to derive from the filename of the downloaded distribution file.
On a separate note, the SI metadata is correctly formatted JSON but the current UK clinical and dm+d metadata is NOT correctly formatted. I have raised this with NHS digital but it might perhaps be useful to mandate that other organisations distributing SNOMED releases should format their metadata to an agreed standard.
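A distributor could catch the formatting problem described in the ticket with a very small well-formedness check before publishing. This is a sketch using only the Python standard library; the function name and the rule that the top-level value must be a JSON object are assumptions, not an agreed standard.

```python
import json


def check_metadata_json(text):
    """Return (ok, message) for a candidate release metadata file body."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "top-level value is not a JSON object"
    return True, "ok"


print(check_metadata_json('{"name": "Example Edition"}'))  # (True, 'ok')
print(check_metadata_json("{'name': 'Example Edition'}")[0])  # False (single quotes)
```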
Requirement requests from SNOMED International (internal):
name - e.g. Belgian Edition. This is already contentious - we list everything as an Edition although most are packaged as extensions.
countryCode - the two letter ISO country code (ISO 3166-1 alpha-2, as in the country code table on Wikipedia), e.g. "be" - this should probably be upper case to match the standard.
defaultLanguageCode - this cannot be detected from content because some extensions have a lot of translated content but still want to use English as default. Again the two letter ISO language code.
defaultLanguageReferenceSets - list of SCTIDs. This controls which terms are displayed in the concept details and their order. The set of language reference sets could be found using the content but many extensions do not want the GB language refset from the International Edition. The desired order can probably not be found from the content.
maintainerType - I'm not sure if this information should be included in the package metadata or how it should be named. The values we have are "International", "Managed Service" or "Community Content". We use this to list extensions in different categories.
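The five fields requested above could be modelled and sanity-checked roughly as follows. The field names and the three maintainerType values come from the list; the class name, the validation rules (upper case country code, lower case language code) and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

MAINTAINER_TYPES = {"International", "Managed Service", "Community Content"}


@dataclass
class EditionInfo:
    name: str
    countryCode: str
    defaultLanguageCode: str
    defaultLanguageReferenceSets: List[str]  # ordered list of SCTIDs
    maintainerType: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none found."""
        found = []
        cc = self.countryCode
        if not (len(cc) == 2 and cc.isascii() and cc.isalpha() and cc.isupper()):
            found.append("countryCode should be two upper case letters")
        lc = self.defaultLanguageCode
        if not (len(lc) == 2 and lc.isascii() and lc.isalpha() and lc.islower()):
            found.append("defaultLanguageCode should be two lower case letters")
        if not all(s.isdigit() for s in self.defaultLanguageReferenceSets):
            found.append("defaultLanguageReferenceSets should contain SCTIDs only")
        if self.maintainerType not in MAINTAINER_TYPES:
            found.append("unknown maintainerType")
        return found


# Illustrative values only; 900000000000509007 is the US English language refset.
belgian = EditionInfo("Belgian Edition", "be", "en",
                      ["900000000000509007"], "Managed Service")
print(belgian.problems())  # ['countryCode should be two upper case letters']
```

Keeping defaultLanguageReferenceSets as an ordered list preserves the display order, which the note above says cannot be recovered from content.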
Requirement requests from the TRAG (through other topics):
FINAL DECISIONS:
c) We will NOT change the RF2 spec to move to transitive dependencies in the MDRS.
5.2.4.2 Module Dependency Reference Set - currently states
"Dependencies are not transitive and this means that dependencies cannot be inferred from a chain of dependencies. If module-A depends on module-B and module-B depends on module-C, the dependency of module-A on module-C must still be stated explicitly."
New planned changes to .JSON metadata file: Update to the .JSON file metadata - addition of "Package Composition" data
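Because MDRS dependencies stay non-transitive, a module's stated dependencies can be checked against everything reachable through chains of dependencies. The sketch below reports any dependency that is only implied by a chain and therefore still needs stating explicitly. The input shape (a dict of module to stated dependencies) and the function name are assumptions for the example, with module-A/B/C borrowed from the quoted spec text.

```python
def missing_explicit_dependencies(stated):
    """stated: dict mapping each module to the set of modules it explicitly depends on.

    Returns a dict of module -> sorted list of dependencies that are reachable
    through a chain but not stated explicitly for that module.
    """
    missing = {}
    for module, direct in stated.items():
        # Walk everything reachable through chains of dependencies.
        reachable, stack = set(), list(direct)
        while stack:
            dep = stack.pop()
            if dep not in reachable:
                reachable.add(dep)
                stack.extend(stated.get(dep, ()))
        gaps = reachable - direct - {module}
        if gaps:
            missing[module] = sorted(gaps)
    return missing


stated = {
    "module-A": {"module-B"},
    "module-B": {"module-C"},
    "module-C": set(),
}
print(missing_explicit_dependencies(stated))  # {'module-A': ['module-C']}
```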
Examples of extending this metadata:
.json format 5 ?? (Please see Michael Lawley's comments on 16/04/2021 here: Re: Working Group: Refined Metadata)
Package Name? (Please see Michael Lawley's comments on 20/04/2021 here: Re: Working Group: Refined Metadata: Yes, regarding the "Name" entry, it would be ideal if it could be used to populate the "Product Name" field in a list of available packages (and other required and relevant fields for MLDS). Then the zip contents would be sufficient to automatically populate MLDS (or an ATOM-based Syndication feed))
Also create 2 new pages -
one to capture the requirements for all different use cases
one to discuss and agree on the proposed solutions....
We really need to tackle the Delta from and to release version in the Delta file naming, and possibly package file naming. At the moment it is impossible to know what a Delta is relative to, making it hard to safely process it. Perhaps beyond the scope of this document, but quite important.
THIS IS NOW ADDRESSED IN THE NEW Metadata file:
Standard July 2020 International Edition metadata file:
NON-Standard July 2020 International Edition Rollup Delta metadata file:
January 2021 International Edition metadata file:
March 2021 Belgium Extension metadata file:
Can we link this in to the .JSON file above? (Computer readable metadata) - yes, done!
Copyright © 2026, SNOMED International