• NeatNit
      arrow-up
      47
      ·
      4 months ago

      Unicode in filenames? Are you crazy?!

      Okay, that was /s to some extent, but I gotta rant: I’m totally convinced that there’s still new software today that completely trips over itself when files or paths have non-ASCII characters, or sometimes even a space. Incompetence didn’t go anywhere.

      • Rubanski@lemm.ee
        arrow-up
        30
        ·
        4 months ago

        I still use underscores for filenames, basically muscle memory at this point

        • aulin@lemmy.world
          arrow-up
          11
          ·
          4 months ago

          Spaces in file names will always be fiddly though. It’ll work, but it’ll still be wrong, because arguments are space separated, and having spaced file names totally messes with that.

          • Fennek@feddit.org
            arrow-up
            5
            ·
            edit-2
            4 months ago

            I try to just always put file names or paths in quotes on the CLI, or assign them to a variable when programming. That way spaces are accepted and the name stays separate from the other arguments.
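To make the quoting point concrete, here is a minimal Python sketch. It assumes POSIX-style shell splitting as modelled by the standard `shlex` module; `build_command` and the file name `my file.txt` are made up for illustration.

```python
import shlex


def build_command(program: str, path: str) -> str:
    """Return a shell-style command line with the path quoted."""
    return f"{program} {shlex.quote(path)}"


# Without quotes, the space splits the file name into two arguments.
unquoted = shlex.split("cat my file.txt")

# With quoting, the file name survives as a single argument.
quoted = shlex.split(build_command("cat", "my file.txt"))

print(unquoted)
print(quoted)
```

When calling another program from code, passing the arguments as a list (for example to `subprocess.run`) avoids the splitting step altogether, which is roughly what "tie it to a variable" buys you.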

            • aulin@lemmy.world
              arrow-up
              4
              ·
              4 months ago

              Yeah. It’s a good idea to guard against it, but I would still never put spaces in file names that I myself choose.

      • pmk@lemmy.sdf.org
        arrow-up
        16
        ·
        4 months ago

        Unicode in filenames can be a bad idea, since there is more than one way to encode what looks like the same character. So pattern matching could fail if you assume one representation but the name actually uses another.
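A small Python sketch of that failure mode, assuming the two names differ only in how "é" is encoded. The file names and the helper `same_name` are hypothetical; normalizing both sides to NFC before comparing is one possible fix, not the only one.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "é" as one precomposed code point (U+00E9).
composed = "caf\u00e9.txt"
# "e" followed by a combining acute accent (U+0301).
decomposed = "cafe\u0301.txt"

print(composed == decomposed)           # naive comparison
print(same_name(composed, decomposed))  # comparison after normalization
```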

        • NeatNit
          arrow-up
          5
          ·
          4 months ago

          Good point. Do filesystems use a normal form to at least prevent having two files with effectively the same name?

          I should point out the flip side though, that there’s no avoiding Unicode in filenames. Users in languages that don’t use the Latin alphabet (such as Japanese, Chinese, Korean, Hebrew, Arabic, Greek and Russian, and the list could go on) can reasonably expect to be able to give a file a name they can read and understand with no extra effort. All the software woes that come with it - too bad, software needs to deal with it.

          • pmk@lemmy.sdf.org
            arrow-up
            2
            ·
            4 months ago

            I’m not sure. A few years ago I remember that OpenBSD expected ASCII for files, but I think Linux expects UTF-8. I could be wrong though.

            • NeatNit
              arrow-up
              3
              ·
              edit-2
              4 months ago

              I’m assuming Unicode anyway, and UTF-8 is by far the most natural because most files will be in ASCII. A “normal form” (see link above), you might think of it as a canonical form, is a way to check if two strings are equivalent, even if they encoded the text differently. Like the example mentioned on Wikipedia:

              For example, the distinct Unicode strings “U+212B” (the angstrom sign “Å”) and “U+00C5” (the Swedish letter “Å”) are both expanded by NFD (or NFKD) into the sequence “U+0041 U+030A” (Latin letter “A” and combining ring above “°”) which is then reduced by NFC (or NFKC) to “U+00C5” (the Swedish letter “Å”).
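The quoted example can be reproduced with Python's standard `unicodedata` module. This is only a sketch of that one case; the variable names are mine.

```python
import unicodedata

angstrom_sign = "\u212b"   # U+212B ANGSTROM SIGN
swedish_letter = "\u00c5"  # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE

# NFD expands both to "A" (U+0041) plus combining ring above (U+030A).
nfd_sign = unicodedata.normalize("NFD", angstrom_sign)
nfd_letter = unicodedata.normalize("NFD", swedish_letter)

# NFC then recomposes the sequence to U+00C5.
nfc_sign = unicodedata.normalize("NFC", nfd_sign)

print([f"U+{ord(c):04X}" for c in nfd_sign])
print(f"U+{ord(nfc_sign):04X}")
```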

      • darklamer@lemmy.dbzer0.com
        arrow-up
        10
        ·
        4 months ago

        Incompetence didn’t go anywhere.

        Now that’s certainly true, but the beauty of open source software is that we can fix bugs when we encounter them.

    • Psythik@lemmy.world
      arrow-up
      4
      arrow-down
      2
      ·
      edit-2
      4 months ago

      Speaking of which, it blew my mind when I discovered that .EXEs are just zip files (edit: compressed archives). Same goes for .DLLs, and a lot of other common Windows file extensions as well (.DOC too, for example, IIRC). They all open in your favorite archiver software (I like NanaZip, which is a fork of 7-Zip with a modern UI).

      • NeatNit
        arrow-up
        5
        arrow-down
        1
        ·
        4 months ago

        I don’t think that’s true for .exe or .dll files, but it’s definitely true for .docx files and other Office files ending with x. Some .exe’s are self-extracting archives or have other files embedded in them, so maybe that’s what you’ve been seeing.

        • areyouevenreal@lemm.ee
          arrow-up
          4
          ·
          4 months ago

          You are actually correct. They can contain archived files or resources that can be unpacked with an archive program (including on Linux btw), but they aren’t just a zip file. That’s why my Linux archive manager (ark I think) offers to open one, but won’t execute it. It can see the extra content even if it can’t execute the file as intended.

            • areyouevenreal@lemm.ee
              arrow-up
              2
              arrow-down
              1
              ·
              4 months ago

              Mate I saw the blind leading the blind and had to step in. You could have actually opened some exes on Linux as the other guy suggests. In fact I am surprised you never noticed your system presenting that option. It just isn’t actual proof of what they said, even if it appears like it. In fact I am a bit lost how neither of you realized something weird was going on. On what planet would an executable format being a zip file make any sense? Exes actually can include several executable formats.

              There are things like self-extracting archives that make this all more confusing. They are basically an archive with an extraction program in the same file. Installer exes work in a similar way too. Not all exes can be extracted, since not all of them contain secret hidden archives or extra resources.

              There actually are tools to show you the contents of an executable file, and you could probably learn a lot by using one. They contain more than just a blob of machine code like one might assume. Often they contain data as well, and instructions and information on how to load the executable like what memory layout to use.
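As a rough idea of what such a tool reads, here is a minimal Python sketch that lists the section names of a Portable Executable. It only follows the header fields needed for that (the `e_lfanew` offset at 0x3C, the `PE\0\0` signature, the section count and the optional-header size). `make_fake_pe` builds a tiny synthetic header for demonstration; it is not a runnable program, and both function names are made up here.

```python
import struct


def list_pe_sections(data: bytes) -> list[str]:
    """Return the section names found in a PE image's section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    # Offset of the PE signature is stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table = pe_offset + 24 + opt_header_size
    names = []
    for i in range(num_sections):
        # Each section header is 40 bytes; the first 8 hold the name.
        raw = data[table + 40 * i: table + 40 * i + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def make_fake_pe(section_names: list[str]) -> bytes:
    """Build a synthetic, non-executable PE header for demonstration."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE signature right after DOS stub
    coff = b"PE\x00\x00" + struct.pack(
        "<HHIIIHH", 0x14C, len(section_names), 0, 0, 0, 0, 0
    )
    sections = b"".join(
        name.encode("ascii").ljust(8, b"\x00") + bytes(32)
        for name in section_names
    )
    return bytes(dos) + coff + sections


print(list_pe_sections(make_fake_pe([".text", ".reloc", ".rsrc"])))
```

On a real exe you would pass the file's bytes instead of the synthetic header; a real tool would also decode the rest of each section header (sizes, addresses, flags).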

              I am annoyed that people upvoted the other guy without double checking as well. Now we have more people walking around spreading misinformation just because of some guy on Lemmy. This is why things like climate change become contentious issues. People come to their own conclusions based on partial information, and since it appears to make sense without proper investigation it gets spread around like wildfire. It’s only when you actually know what’s going on at a deeper level that it becomes possible to spot the flaws in the reasoning.

          • NeatNit
            arrow-up
            2
            ·
            4 months ago

            It’s a zip file that includes a bunch of things, including embedded images and a bunch of other junk, but yes - the most important and central files in the zip are XML-based.

        • Psythik@lemmy.world
          arrow-up
          1
          arrow-down
          1
          ·
          4 months ago

          Why don’t you just try it and see for yourself?

          Remind me in about 5 hours and I’ll upload a screenshot as proof when I get home.

          • NeatNit
            arrow-up
            2
            ·
            4 months ago

            I’m not on Windows.

            Let me know when you have the screenshot!

            • Psythik@lemmy.world
              arrow-up
              2
              arrow-down
              1
              ·
              edit-2
              4 months ago

              You could always download a random exe even in Linux, you know. But I’ll handle it. Commuting home now.

              • NeatNit
                arrow-up
                3
                ·
                edit-2
                4 months ago

                Well, I did get my hands on an exe file (some game on Steam) and opened it with Archive Manager. It does show some files, but the file properties say Type: application/x-ms-dos-executable (as opposed to application/zip). So it’s not an actual archive file, the archive manager is just displaying it as such to be helpful.

                The “files” I can see are:

                /.text
                /.reloc
                /.rsrc/version.txt
                /.rsrc/ICON/2.ico
                /.rsrc/ICON/3.ico
                /.rsrc/ICON/4.ico
                /.rsrc/GROUP_ICON/32512.ico

                I tried to create a zip file and rename it to .exe, but Archive Manager failed to open it at all, which I found strange. You’d think it would look at the actual file contents to figure out what type of archive it is, and not rely on the extension.
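Looking at the contents instead of the extension could be sketched like this in Python. It assumes only two well-known signatures: `PK\x03\x04` at the start of a non-empty zip, and `MZ` at the start of a DOS/Windows executable. The function name `sniff_container` is made up, and this says nothing about how Archive Manager actually decides.

```python
import io
import zipfile


def sniff_container(header: bytes) -> str:
    """Guess the file type from its first bytes, ignoring the extension."""
    if header.startswith(b"PK\x03\x04"):
        return "zip"
    if header.startswith(b"MZ"):
        return "dos-or-pe-executable"
    return "unknown"


# Build a small zip in memory; what it would be named on disk is irrelevant.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
zip_bytes = buf.getvalue()

print(sniff_container(zip_bytes[:4]))
print(sniff_container(b"MZ\x90\x00"))
```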

                • areyouevenreal@lemm.ee
                  arrow-up
                  3
                  ·
                  4 months ago

                  Okay that’s actually slightly different from what I was expecting. Does the .text file contain machine code or assembly language by any chance? It seems the archive program can pull out the executable code as well, similar to the binary analysis tools I have worked with.

                  .reloc is probably the relocation table used by the OS to load the program into an address space.

                • Psythik@lemmy.world
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  4 months ago

                  Well fair enough. You clearly have more knowledge on the subject than I do.

                  FWIW, by “zip file”, I meant that the file is a compressed archive. Apologies for implying a specific file format. That wasn’t my intention.

      • areyouevenreal@lemm.ee
        arrow-up
        5
        arrow-down
        2
        ·
        edit-2
        4 months ago

        Just because they open in 7-Zip or whatever doesn’t mean they are just a zip file. There are several kinds of archives. EXEs are a special case as well. They aren’t archives at all. Rather, they can contain archives or extra content along with being an executable. One reason is self-extracting archives, where an archive is packaged with an extraction program as a single exe. The other case is exes that have extra resources like images, videos, graphics textures, etc. Either way it’s an executable plus some extra stuff, not a zip archive. DLLs I am not sure about, but I suspect something similar is happening here.

        Next time you should research stuff before posting it on Lemmy. Things are sometimes more complicated than they appear.

        docx you are correct about though. Specifically it’s a zip file that contains XML files and resources.
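The docx case is easy to check with Python's standard `zipfile` module. The sketch below builds a docx-like zip in memory rather than reading a real document, so the XML bodies are placeholders; only the member names `[Content_Types].xml` and `word/document.xml` mirror what a real .docx contains, and both function names are made up.

```python
import io
import zipfile


def make_docx_like() -> bytes:
    """Build a tiny zip laid out like a .docx (placeholder XML only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def list_members(blob: bytes) -> list[str]:
    """Return the member names of a zip held in memory."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


blob = make_docx_like()
print(zipfile.is_zipfile(io.BytesIO(blob)))
print(list_members(blob))
```

Running `list_members` on the bytes of a real .docx should show the same kind of layout, plus media and relationship files.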

        Edit: I actually found an article on self extracting archives, it’s quite an interesting technology to be fair even if it causes confusion: https://en.m.wikipedia.org/wiki/Executable_compression

        • Psythik@lemmy.world
          arrow-up
          3
          arrow-down
          2
          ·
          4 months ago

          By “zip file”, I meant a compressed archive. I’m not as nerdy as you guys are so I see now that there is a difference. I appreciate the correction.

          That said, you have to admit that it’s still cool that these different file formats are nothing more than archives. Maybe not to you but it blew my mind when I first learned this.

          • areyouevenreal@lemm.ee
            arrow-up
            2
            arrow-down
            2
            ·
            4 months ago

            Bruh, an exe is not an archive. Some just happen to contain an archive, not all. As the other guy and I discovered, some archive utilities can read them, but what they are doing is closer to a binary analysis tool than unpacking an actual archive. It’s not about being nerdy, it’s about getting your facts right.

            • Psythik@lemmy.world
              arrow-up
              4
              arrow-down
              2
              ·
              4 months ago

              Man, even when I try to be diplomatic I still get berated.

              Should have just said fuck you and called it a day. (kidding)

              • areyouevenreal@lemm.ee
                arrow-up
                2
                arrow-down
                2
                ·
                edit-2
                4 months ago

                You’re still trying to weasel out of being wrong. It’s not an archive nor is it compressed. Go read what a Portable Executable is. It’s not about being diplomatic or whatever. Just admit you’re wrong and go and read about how it actually works. You might learn something.