ASCII and UTF-8 characters in RF2 files

For several years there have been a few minor inconsistencies in the RF2 files, with special (hidden) characters used interchangeably across different files within the RF2 packages. As a result, some files are detected as ASCII and others as UTF-8.
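
To see which special (hidden) characters are involved, one option is to list every character outside the ASCII range together with its position and Unicode name. This is a minimal sketch, assuming the file has already been decoded as UTF-8; `find_non_ascii` is a hypothetical helper, not part of any SNOMED tooling.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, code point, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based. This is a hypothetical helper for illustration.
    """
    hits = []
    # Split on LF only: str.splitlines() would also break on non-ASCII separators
    # such as U+0085 and U+2028, hiding exactly the characters we want to find.
    # A trailing CR from CRLF stays on the line, but it is ASCII and is skipped.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits
```

For a real file, the input could come from something like `Path(...).read_text(encoding="utf-8", newline="")`. Invisible characters such as U+00A0 (NO-BREAK SPACE) are then reported by name instead of being easy to miss on screen.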

We therefore need to discuss the following at the next meeting:

a)  This isn't necessarily an issue, as UTF-8 is backward compatible with ASCII: every ASCII file is also a valid UTF-8 file (although a UTF-8 file containing non-ASCII characters is not valid ASCII). Combined with the fact that no-one using the releases has experienced any issues in the past few years, this suggests that it's not a critical failure of any kind. Therefore, is there any requirement for this to be changed, beyond our desire to standardise everything wherever possible?

b)  If so, will there be any impact on users? For example, the UKTC may need to update their import routines, as they have based everything on ASCII, whereas the TIG states that the SNOMED standard is UTF-8.
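
Points a) and b) can be illustrated together. An import routine that decodes RF2 files strictly as ASCII will reject any file containing a non-ASCII character, whereas decoding as UTF-8 accepts both kinds of file. This is a minimal sketch, assuming rows end in CRLF and fields are tab-separated. `read_rf2_rows` and the two-column samples are hypothetical and are not the UKTC's actual routine.

```python
def read_rf2_rows(raw: bytes) -> list[list[str]]:
    """Decode raw RF2 bytes as UTF-8 and split them into tab-separated fields."""
    # UTF-8 decoding accepts pure-ASCII input unchanged, because every byte in
    # the range 0x00-0x7F means the same character in both encodings.
    text = raw.decode("utf-8")
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty remainder after the final CRLF
    return [line.split("\t") for line in lines]


# Hypothetical two-column samples; real RF2 files have more columns.
ascii_sample = "id\tterm\r\n1\tMyocardial infarction\r\n".encode("ascii")
utf8_sample = "id\tterm\r\n2\tMénière's disease\r\n".encode("utf-8")

# For pure-ASCII text, the ASCII and UTF-8 encodings produce identical bytes.
assert ascii_sample == ascii_sample.decode("ascii").encode("utf-8")

# A UTF-8 reader handles both samples...
print(read_rf2_rows(ascii_sample))
print(read_rf2_rows(utf8_sample))

# ...whereas an ASCII-only reader rejects the second one.
try:
    utf8_sample.decode("ascii")
except UnicodeDecodeError as err:
    print(f"ASCII decode fails at byte {err.start}: {err.reason}")
```

Because pure-ASCII bytes are already valid UTF-8, relabelling an ASCII file as UTF-8 changes none of its bytes (assuming no byte order mark is added). Switching an import routine from ASCII to UTF-8 decoding should therefore not affect the files it already reads correctly.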

Examples:

  • SNOMED International documentation states that UTF-8 is the standard (see Appendix C. Unicode UTF-8 encoding).

  • This certainly applies to any file that contains a string datatype.

  • All RF2 data files have the correct CRLF line terminators (the Readme file mixes CRLF and LF).

  • The following 6 files are currently the only ones in the RF2 packages using UTF-8 instead of ASCII:

    sct2_TextDefinition_Delta-en_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

    der2_iisssccRefset_ExtendedMapFull_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

    sct2_TextDefinition_Full-en_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

    Readme_en_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators

    der2_iisssccRefset_ExtendedMapSnapshot_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

    sct2_TextDefinition_Snapshot-en_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

  • There's also an inconsistency between the mapping reference set Full, Delta and Snapshot files:

    der2_iisssccRefset_ExtendedMapDelta_INT_20180731.txt: ASCII text, with very long lines, with CRLF line terminators

    der2_iisssccRefset_ExtendedMapFull_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators

    der2_iisssccRefset_ExtendedMapSnapshot_INT_20180731.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
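
The listings above look like output from the Unix `file` utility. A comparable check can be sketched in Python, assuming the release has been extracted to a local directory. `classify_encoding`, `line_terminators` and `report` are hypothetical helpers. The classification is a deliberate simplification (pure ASCII, valid UTF-8, or neither), not a reimplementation of the heuristics `file` uses.

```python
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Simplified ASCII / UTF-8 check; not the `file` utility's full heuristics."""
    if data.isascii():
        return "ASCII"  # every byte is in the range 0x00-0x7F
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    return "UTF-8"


def line_terminators(data: bytes) -> str:
    """Report which line terminators occur: CRLF, bare LF, or both."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    kinds = [name for name, n in (("CRLF", crlf), ("LF", bare_lf)) if n]
    return ", ".join(kinds) or "no"


def report(release_dir: str) -> None:
    """Print one line per .txt file under release_dir, loosely mirroring `file` output."""
    for path in sorted(Path(release_dir).rglob("*.txt")):
        data = path.read_bytes()
        print(f"{path.name}: {classify_encoding(data)} text, "
              f"{line_terminators(data)} line terminators")
```

Calling `report("path/to/extracted/release")` (a placeholder path) would print one line per file. Each file is read fully into memory, which keeps the sketch simple but may be slow for the larger Full files.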

Copyright © 2025, SNOMED International