Concrete Domain Decimal Places and Rounding

Summary of Recommendations

Feature	Recommendation
Maximum Decimal Places	6
Maximum Magnitude	999,999,999
Mitigation of limits	Normalization of units can be used to keep numbers in a reasonable range.
Editorial Policy	Editorial Guidance may make further restrictions for specific domains.
Rounding Method	Round Half Up (as per Excel and Google Sheets)

Primary Consideration

Since we cannot know all the use cases that Concrete Domains will be put to, the safest approach we can take to numeric representation is to store and display whatever a user originally entered, as they entered it.

In pharmacovigilance, for example, manufacturers will write a whole-number strength without decimal places, because a decimal point is easy to miss and a care worker could read 5.0mg as 50mg. Packaging will therefore be written as 5mg. In engineering disciplines, however, if a number has been measured to a certain level of specificity then it is important to capture that; 3.000 means "I have measured this value to within a tolerance of +/- 0.0005". These two opposing requirements show that the standard cannot make a single decision about representation across the board.

Tooling should not change the value entered. It is for this reason that we cannot add a decimal point in order to differentiate between Integer and Decimal data types. In addition, where a number entered is going to hit a limit or experience rounding as described here, this should be checked and confirmed with the user prior to saving. A large part of the purpose of this page is to elicit requirements for tooling, so that the user can be given clear feedback on what can and cannot be entered into a system. The aim is to avoid floating point errors, such as the one in the floating point example on this page, and unexpected changes in the actual values entered.
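As a rough sketch of the kind of pre-save check described above, the following tests an entered value against the limits recommended in the summary table. The class name ConcreteValueLimits and the method check are hypothetical, and treating the magnitude limit as a limit on the integer part is my assumption rather than anything this page has decided.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Optional;

// Hypothetical helper for illustration; not part of any SNOMED tooling API.
final class ConcreteValueLimits {
    static final int MAX_DECIMAL_PLACES = 6;
    static final BigInteger MAX_MAGNITUDE = BigInteger.valueOf(999_999_999L);

    /** Returns a message for the user if the text breaks a limit, otherwise empty. */
    static Optional<String> check(String entered) {
        final BigDecimal value;
        try {
            value = new BigDecimal(entered.trim());
        } catch (NumberFormatException e) {
            return Optional.of("Not a number: " + entered);
        }
        // scale() counts the digits after the decimal point as typed, so "3.000" has 3.
        if (value.scale() > MAX_DECIMAL_PLACES) {
            return Optional.of("More than " + MAX_DECIMAL_PLACES + " decimal places: " + entered);
        }
        // Assumption: the magnitude limit applies to the integer part of the value.
        if (value.abs().toBigInteger().compareTo(MAX_MAGNITUDE) > 0) {
            return Optional.of("Magnitude exceeds " + MAX_MAGNITUDE + ": " + entered);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5", "3.000", "0.1234567", "1000000000"}) {
            System.out.println(s + " -> " + check(s).orElse("ok"));
        }
    }
}
```

The check only reports a problem; it never rewrites the text, so a tool using something like this could still store and display the value exactly as the user typed it.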

Secondary Considerations

While we would ideally consider these questions starting from use cases (rather than a particular implementation), any restriction that is made will need to be achievable with known technology. Restrictions within any known limits are going to be arbitrary, so there is an argument for aligning with an existing standard (e.g. IEEE 754, as per Java's double type), which would be well documented, have clear precedent and offer the maximum support for all possible implementations.

There is an argument for reducing storage and not specifying some ridiculous number of significant figures if there is no practical use for such extravagance. That said, we already store SCTIDs as 64 bit Integers so a 64 bit decimal - say - would be no more costly in either storage or computational cost even before considering the saving of not having to create a concept with its associated descriptions.

While we could suggest that implementers store all Concrete Values as Strings to be entirely faithful in their storage and reproduction, this would diminish the real power of concrete values, which is being able to ask questions that do numeric comparison, like searching for medicinal products that contain less than 100 mg of aspirin. For this reason it seems preferable to set specific limits on the size of numbers that can be stored, in order to allow true numeric types to be used for storage and processing.
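To illustrate why String storage gets in the way of the aspirin query, here is a minimal sketch comparing the same two values as text and as numbers. The class and method names are hypothetical and the strengths are made-up examples.

```java
import java.math.BigDecimal;

// Hypothetical illustration; the values are invented for the example.
final class StringVersusNumeric {
    /** Compares the values character by character, as a String column would. */
    static boolean lessThanAsText(String a, String b) {
        return a.compareTo(b) < 0;
    }

    /** Compares the values numerically. */
    static boolean lessThanAsNumber(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) < 0;
    }

    public static void main(String[] args) {
        // Is 75 mg less than 100 mg?
        System.out.println(lessThanAsText("75", "100"));   // false, because '7' sorts after '1'
        System.out.println(lessThanAsNumber("75", "100")); // true
    }
}
```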

Significant Figures

Normally when discussing precision we would talk about significant figures rather than decimal places. Since there is a byte limit on underlying storage mechanisms, a large number to the left of the decimal point restricts the number of decimal places that can then be accurately represented to the right of it. For this reason, we cannot say we'll support 16 DPs and hope to use (say, in a Java implementation) a Double datatype, since 1234567890000.123456789 cannot be accurately recorded, even with only 9 decimal places.

Given a lack of clear use cases, the argument can be made to 'cut the pear in half' and choose a number of decimal places which also allows a decent sized number (left of the decimal point) to be stored. With 6 DPs, for example, 234,567,890.123456 would be safe but 11,234,567,890.123456 would not. @Matt Cordell I'd love to hear from you about how Australia approached this question of big left + big right numbers.
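The interaction between magnitude and decimal places can be checked directly. The sketch below (hypothetical class and method names) parses the entered text as a Java double, looks at the exact binary value that was stored, rounds it back to the number of decimal places entered, and reports whether the original value comes back. It assumes a double is the storage type, which is only one of the options discussed here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical check, assuming values are stored as IEEE 754 doubles.
final class DoubleRoundTrip {
    /** True if the entered decimal is recovered after a trip through a double. */
    static boolean survivesDouble(String entered) {
        BigDecimal exact = new BigDecimal(entered);
        // new BigDecimal(double) exposes the exact binary value the double holds.
        BigDecimal stored = new BigDecimal(Double.parseDouble(entered));
        // Round back to the number of decimal places the user typed.
        BigDecimal readBack = stored.setScale(exact.scale(), RoundingMode.HALF_UP);
        return readBack.compareTo(exact) == 0;
    }

    public static void main(String[] args) {
        System.out.println(survivesDouble("234567890.123456"));        // true
        System.out.println(survivesDouble("11234567890.123456"));      // false
        System.out.println(survivesDouble("1234567890000.123456789")); // false
    }
}
```

Any value within the recommended limits (magnitude up to 999,999,999 and 6 DPs) has at most 15 significant digits, which a double can always round-trip.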

In general, people find decimal places to be easier to understand than significant figures, so I suggest we specify that.

OR

Or we just specify 64 bit double precision and allow individual domains to decide if that should result in restrictions to either the magnitude of the number or the number of decimal places. I feel that that approach is likely to increase confusion around SNOMED in a way that specifying magnitude and decimal place limits would not.

Editorial Guidance

This is not to say that Editorial Guidance for a particular area of SNOMED must follow what is being discussed here. This page discusses the technical limits of precision for concrete values, which can be used by implementers when designing storage and data processing solutions. If the Medicinal Product Concept Model were to take the decision to work to a maximum of 4 or 5 decimal places then as long as the underlying system supports that, there's no reason to disallow further restrictions for a given logical domain. The primary consideration - of preserving whatever the user originally entered - would take priority.

Repeating Decimals

We have a precedent in the International Drug Model that repeating decimals (e.g. 1/3 as a decimal is 0.3333333... ad infinitum) be represented using 3 decimal places and rounded (see Section 5.5 in Medicinal Product Concept Model). We see this occur in normalized concentrations rather than manufacturer's specified strengths. For example:

The manufactured product here is actually 5mg per 24 hours. Multiplying back, 208.333 x 24 = 4,999.992, which was felt to be sufficiently close to 5000µg as to make no practical difference, since manufacturing methods do not allow for anything like this degree of precision. In electronic components, tolerances are specified along with values: 5% is common and a component manufactured to a 1% tolerance would be considered "high end". In the SNOMED International Drug Model the relatively low number of decimal places is mitigated by the decision to normalize units to force a number to be > 1 and < 1000, so for example 0.0001253 g would be normalized to 125.3 µg.
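The arithmetic above can be reproduced with a short sketch. The class DrugModelArithmetic, its methods and the list of units are hypothetical, and the handling of values that sit exactly on a boundary (1 or 1000) is my assumption; the Drug Model itself may define this differently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;

// Hypothetical illustration of the rounding and unit normalization described above.
final class DrugModelArithmetic {
    // Units in descending steps of 1000; the list itself is an illustrative assumption.
    static final String[] UNITS = {"g", "mg", "µg", "ng"};

    /** Divides a total by a count and rounds the result to 3 decimal places. */
    static BigDecimal perUnit(String total, String count) {
        return new BigDecimal(total).divide(new BigDecimal(count), 3, RoundingMode.HALF_UP);
    }

    /** Steps through the units until the value is at least 1 and below 1000. */
    static String normalize(String value, String unit) {
        BigDecimal v = new BigDecimal(value);
        int i = Arrays.asList(UNITS).indexOf(unit);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        BigDecimal thousand = new BigDecimal(1000);
        // Too small: move to the next smaller unit (value x 1000).
        while (v.signum() > 0 && v.compareTo(BigDecimal.ONE) < 0 && i < UNITS.length - 1) {
            v = v.movePointRight(3);
            i++;
        }
        // Too large: move to the next larger unit (value / 1000).
        while (v.compareTo(thousand) >= 0 && i > 0) {
            v = v.movePointLeft(3);
            i--;
        }
        return v.toPlainString() + " " + UNITS[i];
    }

    public static void main(String[] args) {
        BigDecimal hourly = perUnit("5000", "24");                  // 5000 µg over 24 hours
        System.out.println(hourly);                                 // 208.333
        System.out.println(hourly.multiply(new BigDecimal("24")));  // 4999.992
        System.out.println(normalize("0.0001253", "g"));            // 125.3 µg
    }
}
```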

But this question isn't about being "close enough for horse shoes", it's about identifying equivalences in classification which requires entirely identical (or subsumable) role groups. Therefore, for any given clinical area, the same representation must be used to allow correct subsumption. In the case of the Medicinal Product Concept Model, the Drug Project group are considering the use of additional axioms to allow subsumption to be achieved with multiple value representations of the same concentration.

Rounding "Mode"

Experience has shown that doing anything other than "what Excel would do" is going to cause questions and confusion, so I recommend ROUND_HALF_UP, which is the approach both Excel and Google Sheets take.
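For implementers, a minimal sketch of what this looks like in Java, using the standard RoundingMode enum. The class and method names are hypothetical. HALF_EVEN ("banker's rounding") is shown only for contrast, as a mode that would give the kind of surprising result we want to avoid.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration of the recommended rounding mode.
final class RoundingModes {
    /** Rounds a decimal, given as text, to the requested number of places. */
    static String round(String value, int places, RoundingMode mode) {
        // Built from a String so that the value being rounded is exactly what was typed.
        return new BigDecimal(value).setScale(places, mode).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(round("2.5", 0, RoundingMode.HALF_UP));      // 3
        System.out.println(round("2.5", 0, RoundingMode.HALF_EVEN));    // 2
        System.out.println(round("0.1235", 3, RoundingMode.HALF_UP));   // 0.124
        System.out.println(round("0.1225", 3, RoundingMode.HALF_EVEN)); // 0.122
        System.out.println(round("-2.5", 0, RoundingMode.HALF_UP));     // -3
    }
}
```

Note that HALF_UP rounds ties away from zero, so -2.5 becomes -3.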

Example of Floating Point problems even at 1DP

class FloatingPointExample {
    public static void main(String[] args) {
        // Neither 0.3 nor 0.1 has an exact binary representation as a double.
        double a = 0.3d - 0.1d;
        System.out.println(a);
    }
}

This program will output 0.19999999999999998 and, perhaps worse for our use case, the variable a will fail to evaluate as equal to 0.2d.

However, rounding this number (which fails at the 17th DP) to anything shorter will give the expected result.
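A minimal sketch of that rounding step, assuming we round to the 6 decimal places recommended on this page before comparing. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration: compare doubles only after rounding to a fixed scale.
final class RoundedComparison {
    /** True if the double equals the expected decimal once rounded to the given places. */
    static boolean equalAtScale(double value, String expected, int places) {
        BigDecimal rounded = BigDecimal.valueOf(value).setScale(places, RoundingMode.HALF_UP);
        return rounded.compareTo(new BigDecimal(expected)) == 0;
    }

    public static void main(String[] args) {
        double a = 0.3d - 0.1d;
        System.out.println(a == 0.2d);                  // false
        System.out.println(equalAtScale(a, "0.2", 6));  // true
        // Doing the subtraction in decimal arithmetic avoids the error altogether.
        System.out.println(new BigDecimal("0.3").subtract(new BigDecimal("0.1"))); // 0.2
    }
}
```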

Previous relevant discussion: Re: SNOMED International Proposal for Representing Concrete Domains in RF2