<div dir="ltr">Hello,<div><br><div>In UAX #44, White_Space is described as "Spaces, separator characters and other control characters which should be treated by programming languages as "white space" for the purpose of parsing elements."</div></div><div><br></div><div>From what I can tell, ECMAScript/JS uses White_Space (or rather Space_Separator, which is slightly different), Rust uses Pattern_White_Space, which is a more restricted set, and most other languages seem to support only the ASCII spaces.</div><div><br></div><div>I wanted to confirm whether the intent is that White_Space is the recommended property for programming languages.</div><div>I assumed that Pattern_White_Space would be more suitable for that purpose,</div><div>but that isn't actually clear from a reading of UAX #31,</div><div><br></div><div>which first states in its introduction:</div><div>> A common task facing an implementer of the Unicode Standard is the provision of a parsing and/or lexing engine for identifiers, such as programming language variables or domain names.</div><div><br></div><div>But later, under Pattern Syntax:</div><div><br></div><div>> There are many circumstances where software interprets patterns that are a mixture of literal characters, whitespace, and syntax characters. Examples include regular expressions, Java collation rules, Excel or ICU number formats, and many others.</div><div><br></div><div>(Programming languages are not mentioned there.)</div><div><br></div><div>Any clarification as to whether White_Space should be preferred over Pattern_White_Space for programming languages would be appreciated :)</div><div><br></div><div>I think that clarification would be useful to many users, as different programming languages have made different choices!</div><div><br></div><div>Thanks,</div><div><br></div><div>Corentin</div><div><br></div></div>