The following draft specification documents have been created as a result of this discussion:

Design Discussion

Statement of Intent

A data format and application interface to support bidirectional exchange of Snomed content between terminology servers during an authoring cycle in order to support distributed authoring.

Data Format

Use cases (Representational Requirements or something like that...)

KC: The data must be able to represent that I added a component to a module

KC: The data must be able to represent that I retired a component in a module

KC: The data must support distributed identifier allocation using UUIDs

KC: The data must support alternative identifiers (SCT and others... LOINC, RxNorm, etc)

KC: The data must support "withdrawal" of unreleased and released content

KC: The data must support "dynamic" refsets, where another refset defines the structure (fields) of a refset.

KC: The data must support modularity, and movement of a component between one or more modules

KC/KK: The data must hold enough information to support version control of content

KC: The data must support validation/verification (i.e. secure hash of files using same standard used to sign jar/zip files for java as an example)

KC: The data must identify Status, Time, Author, Module, and Path for every change.

KC: The representational unit for the files is an identified component, specifically a stamped version of an identified component (rather than a concept and all its references as the representational unit)

KC: The data must represent versioned modular dependencies

KC: The data must be self-contained, other than modular dependencies. For example, the language code field should use standard methods of fully qualified, synonymy, and preferred naming, and not depend on resources external to the files themselves. (Out of scope)

KC: The data must be able to represent "Development" (current authoring cycle) content and released content.

KC: The data must be able to represent module precedence for specific purposes independent of module dependencies. (Out of scope)

KC: The data must be able to represent standardized version, language, and logic "coordinates" that support coordinate-based separation of concerns

KC: The data format must be available under a non-viral open license (specifically not affiliate license encumbered, something like Apache 2) so that it can be easily built upon by other developers.
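To make a few of the requirements above more concrete (UUID allocation, alternative identifiers, and a Status/Time/Author/Module/Path stamp on every change), here is a minimal Python sketch of a stamped component version. All class, field and value names here (Stamp, ComponentVersion, "ACTIVE", "drugs-module" and so on) are assumptions for illustration only and are not part of any agreed format.

```python
import uuid
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Stamp:
    """Status, Time, Author, Module and Path for one change (illustrative names)."""
    status: str
    time: datetime
    author: str
    module: str
    path: str


@dataclass(frozen=True)
class ComponentVersion:
    """One stamped version of an identified component: the unit of exchange."""
    component_id: uuid.UUID
    stamp: Stamp
    alternative_ids: tuple = ()  # hypothetical shape: (("SCTID", "..."), ("LOINC", "..."))


def add_component(author: str, module: str, path: str) -> ComponentVersion:
    # uuid4 needs no central registry, so each server can allocate identifiers itself.
    stamp = Stamp("ACTIVE", datetime.now(timezone.utc), author, module, path)
    return ComponentVersion(uuid.uuid4(), stamp)


def retire_component(version: ComponentVersion, author: str) -> ComponentVersion:
    # Retiring yields a new stamped version; the component keeps its identifier.
    stamp = replace(version.stamp, status="INACTIVE",
                    time=datetime.now(timezone.utc), author=author)
    return replace(version, stamp=stamp)


created = add_component("kc", "drugs-module", "MAIN/DRUGS")
retired = retire_component(created, "kk")
```

Because each version is immutable and carries its own stamp, adding and retiring a component are both just new rows for the same identifier, which is the property the version control requirement relies on.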

Data Format Candidates

Snomed Authoring Platform Concept JSON
Snomed RF2
A custom format

Concept JSON format

This format is already in use within SI tooling, both for authoring and for processing content requests. It is well suited to transfer to and from a web-based thin client where a concept is the unit of change.

This format does not suit our requirements because we would like to use any set of components as a unit of change, including reference set members, which are not currently included in this format.

Snomed RF2 format

This format is in wide use. It captures individual component versions and can represent any type of Snomed content. RF2 is an extensible format via the reference set specification allowing any additional information to be captured. RF2 is designed for the distribution of released Snomed content.

We would have to relax the RF2 specification slightly to enable the transfer of unreleased content. Components exchanged during an authoring cycle from a remote system would not need an effective time set and may use a temporary identifier in a different format.

This format has the benefit that most terminology servers can already consume the official version of RF2.
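As a rough sketch of what the relaxed RF2 might look like to a consumer, the Python below reads a concept delta in which unreleased rows have a blank effectiveTime and may carry a non-numeric temporary identifier. The column names are those of the standard RF2 concept file; the sample rows, the UUID used as a temporary identifier, and the "released" and "temporaryId" flags are assumptions for illustration.

```python
import csv
import io

RF2_CONCEPT_COLUMNS = ["id", "effectiveTime", "active", "moduleId", "definitionStatusId"]

# Illustrative data only: one released row and one unreleased row with a temporary id.
SAMPLE_DELTA = (
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    "404684003\t20020131\t1\t900000000000207008\t900000000000074008\n"
    "3f2b1c9e-5d7a-4e0b-9c1d-2a6f8b7e4d10\t\t1\t900000000000207008\t900000000000074008\n"
)


def parse_concept_delta(text: str) -> list:
    """Parse a tab-separated, RF2-like concept delta into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != RF2_CONCEPT_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        rows.append({
            "id": row["id"],
            # Blank effectiveTime marks unpublished content in the relaxed format.
            "effectiveTime": row["effectiveTime"] or None,
            "active": row["active"] == "1",
            "moduleId": row["moduleId"],
            "definitionStatusId": row["definitionStatusId"],
            "released": bool(row["effectiveTime"]),
            # Assumption: anything that is not all digits is a temporary identifier.
            "temporaryId": not row["id"].isdigit(),
        })
    return rows


rows = parse_concept_delta(SAMPLE_DELTA)
```

A server that already consumes official RF2 would only need to tolerate these two relaxations, rather than learn a new file layout.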

A Custom Format

A custom format could be designed to support exactly what we need, but we can already do this with RF2 using reference sets. The group agrees that existing standards should be used as far as possible to maximise reuse of existing tools and understanding and to minimise design and implementation effort.

Information Required to Support Exchange

There are two types of content exchange:

Fetching content from the origin
Contributing content to the origin

These content packages are slightly different in nature and require slightly different metadata.

The format for this metadata could be a reference set within the package or a JSON manifest file?

Content Fetch Package Metadata

Name	Description
Snomed Version Identifier	Version URI
Development Path Identifier	A string representing the development branch or substrate path.
Base Commit Identifier	The commit which this delta is relative to.
Latest Commit Identifier	The latest commit included within the scope of this content package.

The content fetch package would also contain an RF2-like delta of the changes between the base and latest commits in the selected substrate. The effectiveTime field should be blank for unpublished content within the package.

The use of base and latest commit identifiers could allow many upstream commits to be pulled in one package.
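If the JSON manifest option were chosen, the fetch package metadata above might be represented along these lines. This is only a sketch: the key names are assumptions derived from the table, and using null for the base commit on a first fetch is also an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FetchManifest:
    """Metadata for a content fetch package (key names are hypothetical)."""
    snomedVersionIdentifier: str
    developmentPathIdentifier: str
    baseCommitIdentifier: Optional[str]  # assumed None when there is no starting point
    latestCommitIdentifier: str


def to_json(manifest: FetchManifest) -> str:
    return json.dumps(asdict(manifest), indent=2)


def from_json(text: str) -> FetchManifest:
    return FetchManifest(**json.loads(text))


first_fetch = FetchManifest(
    snomedVersionIdentifier="http://snomed.info/sct/900000000000207008/version/20130731",
    developmentPathIdentifier="MAIN/CRSJAN19",
    baseCommitIdentifier=None,
    latestCommitIdentifier="c205e7eaec28c3046f6c21348f4dcac58c805e2c",
)
round_tripped = from_json(to_json(first_fetch))
```

The same four fields could equally be carried as rows in a reference set; the JSON form is shown only because it is quick to read and write from most tooling.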

Content Contribution Package Metadata

Name	Description
Intended Terminology Server Identifier	Some identifier of the terminology server this package is intended for. Like an address on an envelope. Is this needed?
Snomed Version Identifier	Same as above.
Development Path Identifier	Same as above.
Latest Origin Commit Identifier	The commit from the latest pulled fetch package which this delta is relative to.
Author Name	Name of the author who is responsible for this authoring contribution package.
Author Email Address	Could be used to give feedback or status update?
Purpose of Content Change	Description of the purpose of the content changes within this unit of work.

The content contribution package would also contain an RF2-like delta of the changes on top of the latest origin commit against the selected substrate.
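One way the Origin might sanity-check incoming contribution metadata is sketched below in Python. The key names, the choice of which fields are mandatory, and the idea of flagging a contribution whose base commit is no longer the Origin's latest are all assumptions for discussion, not agreed behaviour.

```python
# Hypothetical key names derived from the table above. The intended server
# identifier and the development path are treated as optional in this sketch.
REQUIRED_FIELDS = (
    "snomedVersionIdentifier",
    "latestOriginCommitIdentifier",
    "authorName",
    "authorEmailAddress",
    "purposeOfContentChange",
)


def check_contribution(manifest: dict, origin_latest_commit: str) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not manifest.get(name)]
    base = manifest.get("latestOriginCommitIdentifier")
    if base and base != origin_latest_commit:
        # The delta was built on an older commit than the Origin now holds.
        problems.append(
            f"based on commit {base} but origin is at {origin_latest_commit}")
    return problems


manifest = {
    "snomedVersionIdentifier": "http://snomed.info/sct/900000000000207008/version/20130731",
    "developmentPathIdentifier": "MAIN/CRSJAN19",
    "latestOriginCommitIdentifier": "c1",
    "authorName": "Example Author",
    "authorEmailAddress": "author@example.org",
    "purposeOfContentChange": "Add new drug concepts",
}
up_to_date = check_contribution(manifest, "c1")
stale = check_contribution(manifest, "c2")
```

Returning a list of problems rather than raising on the first one would let the Origin report everything to the author email address in a single message.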

Application Interface

Requirements

Functional

The interface must:

Be terminology server agnostic
Cover all types of Snomed components and reference sets
Allow exchange of content between any two terminology servers which implement the interface
Allow content exchange in either direction between two terminology servers

Non Functional

The interface must:

Include a security mechanism to prevent unauthorised access

Out of Scope

The following will be excluded in the initial version:

Automatic Terminology Server service discovery

Communication Model

We talked about the authoring contribution content package being transport mechanism independent. I propose we use a JSON based REST interface for the rest of the interface. Any comments?

Roles

Communication takes place between two terminology servers: the origin server and the remote server.
The origin server contains the latest content for that edition/extension. The remote server should be able to take the latest content, update the content locally and then contribute it back to the origin.
A terminology server could implement both parts of the interface, allowing it to act as either an origin or a remote server in this point-to-point exchange.

Example Exchange Timeline

An example of how the Content Fetch and Content Contribution packages could be used.

Workflow:

Author using the Contributing Terminology Server chooses the Drugs branch on the Origin Terminology Server as the substrate they want to contribute to.
- Contributing TS makes the first fetch request without a relative starting point.
- Fetch 1 contains:
  - Snapshot of released content.
  - Delta of the current authoring cycle content.
  - Latest Commit identifier from Origin.
Author(s) make changes on the Contributing Terminology Server.
Author prepares to make contribution by fetching upstream updates.
- Contributing TS makes the second fetch request with a relative starting point of the commit identifier from the previous fetch.
- Fetch 2 contains:
  - Delta of the content changes on the Drugs branch since the last fetch.
  - New latest commit identifier from Origin.
- Contributing TS merges changes from fetch 2 into its content branch.
- Contributing author resolves any content issues resulting from the latest content fetch.
Author chooses to make contribution to Origin.
- Contributing TS prepares package and sends to Origin TS.
- Contribution 1 contains:
  - Latest fetched origin commit identifier.
  - Delta of content changes between latest fetched content and changes on Contributing TS.
  - Also all other metadata as specified in "Content Contribution Package Metadata".
At this point processes on the Origin TS may perform content validation before notifying an origin author to review and merge the contribution into the Drugs branch.
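The timeline above can be walked through with a small in-memory model. The Python below is only a sketch to check that the base and latest commit identifiers are enough to drive the exchange; the OriginBranch class, the "c1"/"c2" commit identifiers and the package keys are all hypothetical, and a real Origin would return RF2-like files rather than dictionaries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OriginBranch:
    """Toy model of one substrate on the Origin TS (for illustration only)."""
    released_snapshot: dict                      # component id -> released content
    commits: list = field(default_factory=list)  # [(commit id, {component id: content})]

    def latest_commit(self) -> Optional[str]:
        return self.commits[-1][0] if self.commits else None

    def commit(self, changes: dict) -> str:
        commit_id = f"c{len(self.commits) + 1}"
        self.commits.append((commit_id, dict(changes)))
        return commit_id

    def fetch(self, since: Optional[str] = None) -> dict:
        ids = [commit_id for commit_id, _ in self.commits]
        start = ids.index(since) + 1 if since is not None else 0
        delta = {}
        for _, changes in self.commits[start:]:
            delta.update(changes)  # the latest version of each component wins
        package = {"baseCommit": since,
                   "latestCommit": self.latest_commit(),
                   "delta": delta}
        if since is None:
            # First fetch: include the snapshot of released content as well.
            package["snapshot"] = dict(self.released_snapshot)
        return package


origin = OriginBranch(released_snapshot={"A": "released A"})
origin.commit({"B": "new B"})                        # authoring-cycle content

fetch1 = origin.fetch()                              # no relative starting point
local = {**fetch1["snapshot"], **fetch1["delta"]}
local_changes = {"C": "new C"}                       # authoring on the Contributing TS

origin.commit({"A": "edited A"})                     # upstream change in the meantime
fetch2 = origin.fetch(since=fetch1["latestCommit"])
local.update(fetch2["delta"])                        # merge before contributing

contribution = {"latestOriginCommit": fetch2["latestCommit"],
                "delta": local_changes}
```

Note that the second fetch carries only the changes made since the first, which is what allows many upstream commits to be pulled in one package.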

Operations

Selecting a Substrate

The Origin server should list the available substrates, each with:

Name	Description	Example
Snomed Version Identifier	Ontology URI following the URI Standard and including the version. Format: URI.	http://snomed.info/sct/900000000000207008/version/20130731
Development Path Identifier	(Optional) Identification of the content path / development stream? Format: any string.	In SI tooling this might be a branch path like MAIN/CRSJAN19
Latest Commit Identifier	Identification of the latest commit in this substrate to be used when performing subsequent updates from Origin. Format: any string (to be chosen by the implementation).	Examples could be a commit hash (c205e7eaec28c3046f6c21348f4dcac58c805e2c), an epoch timestamp (1527153313418), or something else.
Latest Commit Timestamp	Date stamp indicating when this substrate was last updated. Useful information if the commit identifier is not a timestamp. Format: UTC ISO 8601 timestamp.	2018-05-24T12:35:47+00:00
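A Remote server reading this listing would probably want to validate the two structured fields. The following Python sketch parses the example values from the table; the function names and the regular expression are assumptions, and the pattern only covers the versioned edition form of the URI shown in the example.

```python
import re
from datetime import datetime, timedelta

# Matches the versioned edition URI form used in the example above.
VERSION_URI = re.compile(r"^http://snomed\.info/sct/(\d+)/version/(\d{8})$")


def parse_version_uri(uri: str) -> tuple:
    """Split a version URI into (module id, version date as YYYYMMDD)."""
    match = VERSION_URI.match(uri)
    if match is None:
        raise ValueError(f"not a versioned Snomed URI: {uri}")
    return match.group(1), match.group(2)


def parse_commit_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp and insist that it is expressed in UTC."""
    stamp = datetime.fromisoformat(text)
    if stamp.utcoffset() != timedelta(0):
        raise ValueError(f"timestamp is not UTC: {text}")
    return stamp


module_id, version = parse_version_uri(
    "http://snomed.info/sct/900000000000207008/version/20130731")
updated = parse_commit_timestamp("2018-05-24T12:35:47+00:00")
```

The commit identifier is deliberately left unparsed here, since its format is left to the implementation.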

Fetching the Latest Content

There are two possible phases here: fetching the released content and fetching the content for the current authoring cycle.

Do we want these as separate operations or do we want the function which provides the latest content to support delivering a snapshot of the released content in the first pull of the feed?

Should we make this interactive so a Remote server can request a custom delta from the Origin server using the last commit pulled?

Contributing Content

Once authoring changes have been made on the Remote terminology server the changes will be packaged up and made available to the Origin server.

Transport Mechanism

The suggested transport mechanism is a REST API using the JSON message format, although some have asked that this specification remain transport mechanism independent.

I am keen to include as much as we can agree on in this spec in order to achieve maximum compatibility between terminology servers while not excluding any which are not online 24/7 as mentioned by Keith. Can we design this in? Could we require that the Origin and Remote servers are both online while updating the content in the Remote server? The delivery of the Content Contribution could happen via a number of agreed store and forward transport mechanisms?
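To show that a contribution could stay transport independent, here is a minimal Python sketch that bundles the delta files and a manifest into a single zip archive with SHA-256 digests, which could then be delivered by any store-and-forward mechanism. The file names, the "sha256" manifest key and both function names are assumptions. A digest stored inside the same archive only detects accidental corruption; detecting tampering would need a signature, which is not shown.

```python
import hashlib
import io
import json
import zipfile


def build_package(manifest: dict, delta_files: dict) -> bytes:
    """Zip the delta files with a manifest.json that records a digest per file."""
    digests = {name: hashlib.sha256(content).hexdigest()
               for name, content in delta_files.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in delta_files.items():
            archive.writestr(name, content)
        archive.writestr("manifest.json",
                         json.dumps({**manifest, "sha256": digests}, indent=2))
    return buffer.getvalue()


def verify_package(package: bytes) -> dict:
    """Recompute each digest and return the manifest if everything matches."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        for name, expected in manifest["sha256"].items():
            if hashlib.sha256(archive.read(name)).hexdigest() != expected:
                raise ValueError(f"digest mismatch for {name}")
    return manifest


package = build_package(
    {"purposeOfContentChange": "Add new drug concepts"},
    {"sct2_Concept_Delta.txt": b"id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"},
)
verified = verify_package(package)
```

The resulting bytes could be posted to a REST endpoint or dropped on a shared file store, so the package definition itself would not need to change if the transport does.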

Authentication & Authorisation

This is somewhat dependent on the transport mechanism, but if we are able to agree on a REST API for the first part of the communication, here are some authentication options:

Snomed International hosted single sign-on
OAuth
other?

Using the SI SSO would have the advantage over OAuth that we could assign an authorisation role to the ID of that terminology server / author.

Terminology Server Content Exchange Interface