2020-05-20 - SLPG Meeting

Date & Time

20:00 to 21:00 UTC Wednesday 20th May 2020

Location

Zoom meeting: https://snomed.zoom.us/j/471420169

Goals

To discuss syntax and advice for collation and folding
To develop examples to illustrate new term searching functionality

Attendees

Chair: @Former user (Deleted)
Project Group: @Daniel Karlsson, @Ed Cheetham, @Peter Jordan (Unlicensed), @Anne Randorff Højen, @michael lawley, @Guillermo Reynoso, @Rob Hausam

Apologies

Agenda and Meeting Notes

Description	Owner	Notes

Welcome and agenda	@Former user (Deleted)
Concrete values	@Former user (Deleted)	ON HOLD: SCG, ECL, STS, ETL - Ready for publication, but on hold until after MAG meeting in April confirming requirement for Boolean datatype.
Expression Constraint Language	@Former user (Deleted)	WIP ECL Specification
  QUESTION FROM SNOMED ON FHIR - Can/should we register ECL as a MIME type?
  QUESTION FROM INTERNAL - Should the ^ operator return all concepts which are active members of the reference set, regardless of the active state of the concept?
  ADDED TO DRAFT SYNTAX - Child or self (<<!) and Parent or self (>>!)
  New examples to be added
  TERM SEARCH FILTERS - Syntax currently being drafted
  Examples
    < 404684003 |Clinical finding (finding)| {{ term = "heart att"}}
    < 404684003 |Clinical finding (finding)| {{ term != "heart att"}}
      - A concept for which there exists a description that does not match
      - E.g. Find all the descendants of |Fracture| that have a description that doesn't contain the word |Fracture|
    < 404684003 |Clinical finding (finding)| MINUS * {{ term = "heart att"}}
      - A concept which does not have any descriptions matching the term
    < 404684003 |Clinical finding (finding)| {{ term = match: "heart att" }}
      - match is word (separated by white space) prefix any order; Words in substrate are ....; Search term delimiters are any mws
    < 404684003 |Clinical finding (finding)| {{ term = wild: "heart* ack" }}
    < 404684003 |Clinical finding (finding)| {{ term = ("heart" "att") }}
    < 404684003 |Clinical finding (finding)| {{ term != ("heart" "att") }}
      - matches concepts with a description that doesn't match "heart" or "att"
    < 404684003 |Clinical finding (finding)| {{ TERM = (MATCH:"heart" WILD:"ack") }}
    < 404684003 |Clinical finding (finding)| {{ term = "myo", term = wild:"ack" }}
      - Exists one term that matches both "myo" and "ack"
    < 404684003 |Clinical finding (finding)| {{ term = "myo" }} {{ term = wild:"ack" }}
      - Exists one term that matches "myo", and exists a term that matches "ack" (filters may match on either same term, or different terms)
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = se }}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, typeId = 900000000000013009 |synonym| }}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, typeId = (900000000000013009 |synonym| 900000000000003001 |fully specified name|)}}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, typeId != 900000000000550004 |Definition|}}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, type = syn }}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, type != def }}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, type = (syn fsn) }}
    < 404684003 |Clinical finding (finding)| {{ term = "hjärta", language = SE, type != (syn fsn) }}
    < 404684003 |Clinical finding (finding)| {{ term = "cardio", dialectId = 900000000000508004 |GB English| }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialectId = ( 999001261000000100 |National Health Service realm language reference set (clinical part)| 999000691000001104 |National Health Service realm language reference set (pharmacy part)| ) }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = en-gb }}
    < 404684003 |Clinical finding (finding)| {{ dialect != en-gb }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = ( en-nhs-clinical en-nhs-pharmacy ) }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = en-nhs-clinical (900000000000548007 |Preferred|) }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = en-nhs-clinical (prefer) }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = en-nhs-clinical (accept) }}
    < 404684003 |Clinical finding (finding)| {{ term = "card", dialect = en-nhs-clinical (prefer accept), dialect = en-gb (prefer) }}
    < 404684003 |Clinical finding| MINUS * {{ dialect = en-nhs-clinical}}
    < 73211009 |diabetes| MINUS * {{ dialect = en-nz-patient }}
    < 73211009 |diabetes| MINUS < 73211009 |diabetes| {{ dialect = en-nz-patient }}
    < 73211009 |diabetes| {{ term = "type" }} MINUS < 73211009 |diabetes| {{ dialect = en-nz-patient }}
    (< 404684003 |Clinical finding|:363698007|Finding site| = 80891009 |Heart structure|) {{ term = "card" }} MINUS < (404684003 |Clinical finding|:363698007|Finding site| = 80891009 |Heart structure|) {{ dialect = en-nz-patient }}
    < 73211009 |Diabetes| {{ term = "type" }} OR < 49601007 |Disorder of cardiovascular system (disorder)| {{ dialect = en-nz-patient }}
  Previous Decisions
    Wild Term Filter - Everything inside the quotation marks is the search term (including leading and trailing spaces)
      - Note: Match term is tokenized, but wild search is not
    Acceptability will be an option directly attached to a dialect filter - for example:
      * {{ term = "card", dialect = en-nhs-clinical (accept prefer), dialect = en-gb (prefer) }}
      * {{ term = "card", dialect = en-nhs-clinical, dialect != en-nhs-clinical (accept), dialect = en-gb (900000000000548007 |Preferred| ) }}
  Questions for Discussion
    Case/accent folding + Unicode collation - What advice should we be giving in the specification?
      Daniel - "PRO" folding (see Unicode reference that database providers refer to in their search engines)
      Folding should happen before matching
      UCA - Unicode Collation Algorithm
      CLDR - Common Locale Data Repository http://cldr.unicode.org
      å → a
      Index using the Swedish/English index engine
      Refer to Ed's questions and references - 2020-02-26 - SLPG Meeting
      In particular - https://www.w3.org/TR/charmod-norm/#performNorm
      Question - * {{ term = match (noFold):"" }}
    Tokenizing the substrate - What advice should we be giving in the specification?
  Ed's Feedback:
    Links discussed
      http://www.unicode.org/reports/tr10/#Searching → 11.2 Asymmetric Search
      https://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options - Collation Setting
      https://docs.mongodb.com/manual/reference/collation/#collation-document-fields
    Following up on our homework: UCA/CLDR/Case/accent folding + Unicode collation - What advice should we be giving in the specification?
    I have personally found trying to answer this torture! Ideally we want to try to get predictable (per locale) search behaviour. This could then be neatly summed up in a sentence in the guidance something like this: “The search specification assumes that descriptions are indexed for search using the default UCA, or UCA tailored for a specific language or locale according to CLDR. The selected locale can be specified using the ‘language=[ISO 639-1 code]’ filter. Descriptions indexed this way are compared with unmodified search tokens.”
    However, it looks as though ‘default UCA’ doesn’t ignore case (but bafflingly how case is handled is predominantly specified using a parameter called ‘strength’!). The UCA specification states that “…Language-sensitive searching and matching are closely related to collation…”, but this also indicates that they are not the same. The required collation strength for case insensitive searching is ‘secondary’, whilst the default for collation is ‘tertiary’. This may be explained here and/or here, and is probably buried somewhere deep in here, but to me is actually most clearly described by the kind people who maintain the mongoDB documentation.
    If we therefore need to add something about case insensitivity to the assumption statement above (and possibly even make case sensitivity configurable in our filters), could we just say “The search specification assumes that descriptions are indexed for search using case insensitive default UCA…”?
    From a practical point of view this is tempting (commercial product configurations seem to use the “_CI” notation when setting collation, e.g. “>>mysqld --character-set-server=utf8 --collation-server=utf8_unicode_ci”). However if we are going to reference UCA then it’s worth noting that the Unicode materials don't seem to use the phrase ‘case insensitive’. Instead they talk in terms of secondary or tertiary ‘strength’ (as does the configuration page of mongoDB).
    On balance I suspect that if we make case sensitivity configurable then we should name the filter ‘case=’ with values of ‘case sensitive’ and ‘case insensitive’ (implicit default). The alternative is to name the filter ‘strength’ with values of ‘secondary’ and ‘tertiary’ and so on. Whilst the latter looks more principled I suspect it’s just confusing.
    I’ll stop there, but will just add for info that the W3C reference we looked at last time was coming at this from a different direction. Their concern relates to string matching as it applies to the syntactic content of web pages etc. Consequently their recommendation is for a normalization step that changes nothing, to avoid changes in element names/markup. Other content (what that paper calls natural language content) may well benefit from extensive normalisation, closer to case insensitive UCA transformation.
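  Illustration of the term filter notes above: a minimal Python sketch of how "match" (tokenized, word-prefix, any order) and "wild" (untokenized, spaces significant) could behave when folding happens before matching. The function names fold, match_filter and wild_filter are hypothetical and are not part of the ECL draft. The folding shown (NFKD decomposition, dropping combining marks, casefold) is only a rough, locale-independent stand-in for a case and accent insensitive comparison; it is not UCA or CLDR tailoring. Requiring a wild pattern to cover the whole term is also an assumption.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Rough case/accent folding (assumption: approximates a case- and
    accent-insensitive comparison; not locale-tailored like UCA/CLDR)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def match_filter(search: str, term: str) -> bool:
    """'match': every whitespace-separated search token must be a prefix
    of some word in the term; token order does not matter."""
    words = fold(term).split()
    tokens = fold(search).split()
    return all(any(word.startswith(token) for word in words) for token in tokens)


def wild_filter(pattern: str, term: str) -> bool:
    """'wild': the quoted string is not tokenized, so spaces are significant;
    '*' matches any run of characters.
    Assumption: the pattern must cover the whole folded term."""
    parts = [re.escape(part) for part in fold(pattern).split("*")]
    return re.fullmatch(".*".join(parts), fold(term), flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(match_filter("heart att", "Heart attack"))   # True
    print(match_filter("att hea", "Heart attack"))     # True (any order)
    print(wild_filter("*ack", "Heart attack"))         # True
    print(wild_filter("heart* ack", "Heart attack"))   # False (space before "ack")
    print(match_filter("hjarta", "Hjärta"))            # True (ä folded to a)
```

  With this folding, "å → a" style matches succeed for every language, which may not be what a Swedish index would do; that is the locale question raised in the notes.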
Querying Refset Attributes	@Former user (Deleted)	Proposed syntax to support querying and return of alternative refset attributes (To be included in the SNOMED Query Language)
Returning Attributes	@michael lawley	Proposal (by Michael) for discussion
  Currently ECL expressions can match (return) concepts that are either the source or the target of a relationship triple (the target is accessed via the 'reverse' notation or 'dot notation'), but not the relationship type (i.e. attribute name) itself.
  For example, I can write:
    << 404684003|Clinical finding| : 363698007|Finding site| = <<66019005|Limb structure|
    << 404684003|Clinical finding| . 363698007|Finding site|
  But I can't get all the attribute names that are used by << 404684003|Clinical finding|
Reverse Member Of	@michael lawley	Proposal for discussion
  What refsets is a given concept (e.g. 421235005 |Structure of femur|) a member of?
  Possible new notation for this:
    ^ . 421235005 |Structure of femur|
    ? X ? 421235005 |Structure of femur| = ^ X
Expression Templates	@Peter Williams	Examples:
    [[+id]]: [[1..] @my_group sameValue(morphology)] { |Finding site| = [[ +id (<<123037004 |Body structure (body structure)| MINUS << $site[! SELF ] ) @site ]] , |Associated morphology| = [[ +id @my_morphology ]]}
  Implementation feedback on draft updates to Expression Template Language syntax
  Use cases from the Quality Improvement Project:
    Multiple instances of the same role group, with some attributes the same and others different, e.g. same morphology, potentially different finding sites.
    Note that the QI Project is coming from a radically different use case. Instead of filling template slots, we're looking at existing content and asking "exactly how does this concept fail to comply with this template?"
  For discussion:
    Is it correct to say either one of the cardinality blocks is redundant? What are the implications of 1..1 on either side? This is less obvious for the self grouped case.
  Road Forward for SI
    Generate the parser from the ABNF and implement in the Template Service
    User Interface to a) allow users to specify template at runtime b) tabular (auto-completion) lookup → STL
    Template Service to allow multiple templates to be specified for alignment check (aligns to none-off)
    Output must clearly indicate exactly what feature of concept caused misalignment, and what condition was not met.
  Additional note: The QI project is no longer working in subhierarchies. Every 'set' of concepts is selected via ECL. In fact most reports should now move to this way of working, since a subhierarchy is the trivial case. For a given template, we additionally specify the "domain" to which it should be applied via ECL. This is much more specific than using the focus concept, which is usually the PPP, e.g. Disease.
  FYI @Michael Chu
Description Templates	@Kai Kewley
Query Language - Summary from previous meetings	@Former user (Deleted)	FUTURE WORK Examples: version and dialect Notes
Confirm next meeting date/time	@Former user (Deleted)	Next meeting is scheduled for Wednesday 22nd April 2020 at 20:00 UTC.
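
Illustration of the Returning Attributes and Reverse Member Of proposals in the agenda above: a small Python sketch of the two results being asked for, computed over toy in-memory data. It does not propose or implement any ECL syntax. The names Triple, attribute_names and refsets_containing, the concept labels and the refset labels are all hypothetical; only 421235005 |Structure of femur| is taken from the notes.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Triple(NamedTuple):
    """One relationship: source concept, attribute (relationship type), target."""
    source: str
    attribute: str
    target: str


def attribute_names(concepts: Set[str], triples: Iterable[Triple]) -> Set[str]:
    """Returning Attributes: the attribute names used by any of the given concepts."""
    return {t.attribute for t in triples if t.source in concepts}


def refsets_containing(concept: str, members: Dict[str, Set[str]]) -> Set[str]:
    """Reverse Member Of: the refsets that list the given concept as a member."""
    return {refset for refset, ids in members.items() if concept in ids}


# Hypothetical toy data; labels stand in for real concept and refset identifiers.
TRIPLES = [
    Triple("finding-A", "Finding site", "421235005 |Structure of femur|"),
    Triple("finding-A", "Associated morphology", "morphology-X"),
    Triple("finding-B", "Finding site", "site-Y"),
    Triple("procedure-C", "Method", "method-Z"),
]
MEMBERS = {
    "refset-1": {"421235005 |Structure of femur|", "site-Y"},
    "refset-2": {"morphology-X"},
    "refset-3": {"421235005 |Structure of femur|"},
}

if __name__ == "__main__":
    print(sorted(attribute_names({"finding-A", "finding-B"}, TRIPLES)))
    print(sorted(refsets_containing("421235005 |Structure of femur|", MEMBERS)))
```

In a real terminology server the concept set would come from evaluating an ECL expression such as << 404684003 |Clinical finding|, and only active relationships and active refset members would normally be considered.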
