Database missing/erroneous information

J Decker via Unicode unicode at
Wed Jul 12 08:35:02 CDT 2017

I started looking more deeply at the JavaScript specification. Identifiers
are defined as starting with a character that has the ID_Start property,
followed by characters that have the ID_Continue property.
I grabbed the XML database (ucd.all.grouped.xml), in which I was able to
find the IDS and IDC flags (also OIDS, OIDC, XIDS and XIDC, whose meaning
I'm not entirely sure of).
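For reference, the identifier rule itself can be sketched with the Unicode property escapes that JavaScript regular expressions support under the `u` flag. This is only an approximation of the ECMAScript IdentifierName production, assuming ES2018 or later: the real grammar also allows `$` and `_` at the start and `$`, ZWNJ (U+200C) and ZWJ (U+200D) afterwards, which the sketch includes, but it also allows `\uXXXX` escapes, which the sketch does not handle. The name `isIdentifierName` is just illustrative.

```javascript
// Sketch of an ECMAScript-style IdentifierName check using property escapes.
// Start: ID_Start, "$" or "_". Continue: ID_Continue, "$", ZWNJ or ZWJ.
// Not handled: \uXXXX escape sequences. Reserved words are not rejected.
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

function isIdentifierName(text) {
  return IDENTIFIER_NAME.test(text);
}

console.log(isIdentifierName("a1"));       // true: "1" is ID_Continue
console.log(isIdentifierName("1a"));       // false: "1" is not ID_Start
console.log(isIdentifierName("a\u203Fb")); // true: UNDERTIE is ID_Continue
```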

But then I started filtering for characters that are NOT IDS|IDC...

Something as simple as the digits 0x30-0x39 is marked with IDS='N' but has no
[OX]IDC flags specified. Is a missing flag assumed to be N or Y? The
documentation on the XML file format doesn't specify. I see 'ID_Continue
characters include ID_Start characters, plus characters ...'
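My reading of UAX #42 (the UCD in XML) is that in the grouped files a `<char>` element inherits every attribute it does not set from its enclosing `<group>` element, so a missing IDC on a digit would mean "whatever the group says" rather than a fixed N or Y. A minimal sketch of that lookup, assuming the attributes have already been parsed into plain objects; the function name and the group values below are hypothetical, not copied from the real file.

```javascript
// Sketch, assuming the UAX #42 grouping rule: a <char> element's effective
// properties are its own attributes, falling back to those of its <group>.
function effectiveProperties(groupAttrs, charAttrs) {
  // Spread order matters: attributes on the char override the group's.
  return { ...groupAttrs, ...charAttrs };
}

// Hypothetical group defaults; a digit that overrides only IDS and XIDS.
const group = { IDS: "Y", IDC: "Y", XIDS: "Y", XIDC: "Y" };
const digitFour = { cp: "0034", na: "DIGIT FOUR", gc: "Nd", IDS: "N", XIDS: "N" };

const props = effectiveProperties(group, digitFour);
console.log(props.IDS, props.IDC); // N Y
```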

Most languages do support identifiers like a1, a2, etc., so digits should
certainly have IDC even though they're not IDS.
Are there characters that are IDS without being IDC? There are certainly
characters that are IDC without being IDS.
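The JavaScript engine's own Unicode data can answer the first question directly. UAX #31 derives ID_Continue from ID_Start plus further categories (Mn, Mc, Nd, Pc and Other_ID_Continue), so the count of IDS-but-not-IDC characters should come out as zero. A sketch that scans every code point, assuming an engine with ES2018 property escapes; the exact counts depend on the Unicode version the engine ships, so none are quoted here.

```javascript
// Sketch: compare ID_Start and ID_Continue over every code point, using the
// engine's built-in Unicode data (counts vary with its Unicode version).
const ID_START = /^\p{ID_Start}$/u;
const ID_CONTINUE = /^\p{ID_Continue}$/u;

function compareIdSets() {
  let startOnly = 0;    // IDS but not IDC
  let continueOnly = 0; // IDC but not IDS (digits, combining marks, "_", ...)
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    const ch = String.fromCodePoint(cp);
    const isStart = ID_START.test(ch);
    const isContinue = ID_CONTINUE.test(ch);
    if (isStart && !isContinue) startOnly++;
    if (isContinue && !isStart) continueOnly++;
  }
  return { startOnly, continueOnly };
}

console.log(compareIdSets()); // startOnly is expected to be 0
```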

Some examples:

found  char { cp: '0034',  na: 'DIGIT FOUR',  gc: 'Nd',  nt: 'De',
  nv: '4',  bc: 'EN',  lb: 'NU',  sc: 'Zyyy',  scx: 'Zyyy',  Alpha: 'N',
  Hex: 'Y',  AHex: 'Y',  IDS: 'N',  XIDS: 'N',  WB: 'NU',  SB: 'NU',
  Cased: 'N',  CWCM: 'N',  InSC: 'Number' }

(This one has an IDC attribute but no IDS attribute; since its name says
'digit', I assume it is a number type and should not be IDS.)
found  char { cp: '0F32',  na: 'TIBETAN DIGIT HALF NINE',  gc: 'No',
  nt: 'Nu',  nv: '17/2',  Alpha: 'N',  IDC: 'N',  XIDC: 'N',  SB: 'XX',
  InSC: 'Number' }

Is this one IDC without being IDS?
found  char { cp: '203F',
  na: 'UNDERTIE',
  gc: 'Pc',
  IDC: 'Y',
  XIDC: 'Y',
  Pat_Syn: 'N',
  WB: 'EX' }

This one is sort of IDS (it has OIDS='Y') but not IDC?
found  char { cp: '309B',  na: 'KATAKANA-HIRAGANA VOICED SOUND MARK',
  gc: 'Sk',  dt: 'com',  dm: '0020 3099',  bc: 'ON',  lb: 'NS',  sc: 'Zyyy',
  scx: 'Hira Kana',  Alpha: 'N',  Dia: 'Y',  OIDS: 'Y',  XIDS: 'N',
  XIDC: 'N',  WB: 'KA',  SB: 'XX',  NFKC_QC: 'N',  NFKD_QC: 'N',
  XO_NFKC: 'Y',  XO_NFKD: 'Y',  CI: 'Y',  CWKCF: 'Y',  NFKC_CF: '0020 3099',
  vo: 'Tu' }
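The four examples can be cross-checked against the property escapes of a modern JavaScript engine, which exposes ID_Start, ID_Continue, XID_Start and XID_Continue (but not contributory properties such as Other_ID_Start). As I understand UAX #31, the X* variants are adjusted so identifiers stay identifiers under NFKC; U+309B decomposes to U+0020 U+3099, and a leading space can't be part of an identifier, which would explain XIDS='N' and XIDC='N' above. The helper name `identifierProps` is illustrative.

```javascript
// Sketch: look up the identifier properties of the example code points using
// the engine's Unicode property escapes (requires ES2018 regex support).
const PROPS = ["ID_Start", "ID_Continue", "XID_Start", "XID_Continue"];

function identifierProps(cp) {
  const ch = String.fromCodePoint(cp);
  const result = {};
  for (const name of PROPS) {
    // Builds /^\p{Name}$/u for each property; the u flag enables \p{...}.
    result[name] = new RegExp(`^\\p{${name}}$`, "u").test(ch);
  }
  return result;
}

for (const cp of [0x0034, 0x0f32, 0x203f, 0x309b]) {
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex}`, identifierProps(cp));
}

// NFKC of U+309B starts with a space (U+0020), followed by U+3099.
console.log(
  Array.from("\u309B".normalize("NFKC"), (c) => c.codePointAt(0).toString(16))
); // [ '20', '3099' ]
```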