• NeatNit
    3 hours ago

    Sanity is subjective here. There are reasons to disallow non-ASCII characters, for example to keep identical-looking characters from causing sneaky bugs in the code, much like an IDN homograph attack but unintentional: https://en.wikipedia.org/wiki/IDN_homograph_attack (and yes, don’t you worry, this absolutely can happen unintentionally).
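To make the homograph point concrete, here is a small Python sketch. The identifier `scope` and the helper `non_ascii_chars` are made-up examples, not from the thread. Python permits non-ASCII identifiers, so a name spelled with a Latin `o` and one spelled with a Cyrillic `о` (U+043E) become two distinct variables even though they usually render identically. A plain non-ASCII scan is one simple way a linter might flag such names.

```python
import unicodedata

# "scope" spelled with a Latin "o" and with a Cyrillic "о" (U+043E);
# most fonts draw the two names identically.
latin_name = "scope"
lookalike_name = "sc\u043epe"

# Python accepts both spellings as identifiers, so they silently
# become two different variables in the same namespace.
namespace: dict = {}
exec(f"{latin_name} = 1\n{lookalike_name} = 2", namespace)


def non_ascii_chars(identifier: str) -> list:
    """Return the Unicode names of the non-ASCII characters in an identifier."""
    return [unicodedata.name(ch) for ch in identifier if not ch.isascii()]


if __name__ == "__main__":
    print(latin_name == lookalike_name)                      # False
    print(namespace[latin_name], namespace[lookalike_name])  # 1 2
    print(non_ascii_chars(lookalike_name))                   # ['CYRILLIC SMALL LETTER O']
```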

    • toastal@lemmy.ml
      2 hours ago

      OCaml’s old m17n compiler plugin solved this by requiring each ‘word’ to use characters from a single Unicode block; you could only switch to another block after an underscore. So print_แมว was allowed, but pℝint_c∀t was not. This is a totally reasonable solution.
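A minimal sketch of the one-block-per-word rule described above. It assumes that words are split on underscores and uses a tiny hand-written block table. The table, `block_of`, and `follows_one_block_per_word` are hypothetical; this is not the m17n plugin’s actual code, and a real checker would load Unicode’s full Blocks.txt instead.

```python
from typing import Optional

# Hypothetical, deliberately tiny block table: (first code point, last code point, name).
# Ranges are taken from Unicode's Blocks.txt; a real checker would load the whole file.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2200, 0x22FF, "Mathematical Operators"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the name of the block containing ch, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def follows_one_block_per_word(identifier: str) -> bool:
    """True if every underscore-separated word uses characters from a single block.

    Characters missing from the table are rejected, to stay conservative.
    """
    for word in identifier.split("_"):
        blocks = {block_of(ch) for ch in word}
        if None in blocks or len(blocks) > 1:
            return False
    return True


if __name__ == "__main__":
    print(follows_one_block_per_word("print_แมว"))  # True: a Latin word, then a Thai word
    print(follows_one_block_per_word("pℝint_c∀t"))  # False: ℝ and ∀ switch blocks mid-word
```

A word that mixes Latin and Cyrillic lookalikes, such as `"sc\u043epe"`, fails the same check, which is what makes the rule useful against accidental homographs.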

    • thevoidzero@lemmy.world
      3 hours ago

      Sorry, I forgot about this. I meant that any sane modern language that allows Unicode should use Unicode’s own character classifications (e.g. which characters are letters, digits, symbols, alphanumeric, etc.) to apply rules similar to the ASCII ones, so that it doesn’t have to support each human language individually.
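One way to read this suggestion: classify characters by their Unicode general category instead of by language. The Python sketch below uses `unicodedata.category` with a simplified, assumed set of categories loosely modeled on Unicode’s identifier recommendations (UAX #31). Real languages typically use the derived XID_Start/XID_Continue properties, which add a few exceptions and normalization rules. `is_identifier` is a hypothetical helper.

```python
import unicodedata

# Assumed, simplified category sets (loosely after UAX #31's ID_Start/ID_Continue):
# letters and letter-numbers may start a name; marks, digits and connectors may follow.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier(name: str) -> bool:
    """Check an identifier in any script using only Unicode general categories."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # "_" is connector punctuation (Pc); allow it to start a name, as ASCII languages do.
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)


if __name__ == "__main__":
    for candidate in ["print_cat", "print_แมว", "2fast", "a-b", "pℝint"]:
        print(candidate, is_identifier(candidate))
```

The same function accepts `print_แมว` without any Thai-specific code. It also accepts `pℝint`, because `ℝ` is classified as an uppercase letter (`Lu`), so category rules alone do not address the lookalike problem raised earlier in the thread.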

      • NeatNit
        2 hours ago

        Oh, that I agree with. But then there’s the mess of Unicode updates: if you’re using an old compiler that was built against an old version of Unicode, it might not recognize every character you use…
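To illustrate the version problem, CPython happens to ship a frozen Unicode 3.2.0 database (`unicodedata.ucd_3_2_0`, kept for protocols such as IDNA) alongside its current one, so the two can stand in for an old and a new compiler. This is only an analogy: `is_letter` is a hypothetical helper, and U+1E9E (ẞ, LATIN CAPITAL LETTER SHARP S, added in Unicode 5.1) is just one example of a character the old data does not know about.

```python
import unicodedata

# CPython bundles its current Unicode database plus a frozen Unicode 3.2.0 copy;
# here they stand in for a new compiler and an old one.
OLD_DB = unicodedata.ucd_3_2_0
NEW_DB = unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

# U+1E9E LATIN CAPITAL LETTER SHARP S, first assigned in Unicode 5.1.
sharp_s = "\u1e9e"


def is_letter(ch: str, db) -> bool:
    """Classify ch as a letter according to the given Unicode database."""
    return db.category(ch) in LETTER_CATEGORIES


if __name__ == "__main__":
    # Unassigned code points report the category "Cn", so the old data rejects ẞ.
    print(OLD_DB.unidata_version, is_letter(sharp_s, OLD_DB))  # 3.2.0 False
    print(NEW_DB.unidata_version, is_letter(sharp_s, NEW_DB))  # current version, then True
```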