• schnurrito
    2 months ago

    No, this is one of the worst answers on Stack Overflow.

    OP had a specific question about capturing opening tags. What OP asked about can be done with regular expressions. It is true that arbitrarily nested languages like HTML cannot generally be parsed with regular expressions, but that is not what OP asked about.
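To make the claim above concrete, here is a minimal Python sketch of capturing opening tags with a regular expression. It is only an illustration under a loud assumption: the input is a limited subset of HTML in which no attribute value contains a `>` character. The pattern and the helper name `opening_tag_names` are hypothetical, not taken from the original question or its answers.

```python
import re

# Assumption: attribute values never contain '>'.
# '<' followed by a letter rules out closing tags such as </a>.
OPENING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def opening_tag_names(html: str) -> list[str]:
    """Return the names of all opening tags found in the given fragment."""
    return OPENING_TAG.findall(html)


sample = '<p>Hi <a href="foo">link</a><br></p>'
print(opening_tag_names(sample))  # ['p', 'a', 'br']
```

This does no parsing of nesting at all; it only recognizes individual tokens, which is the narrower task being discussed.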

    • fartsparkles@sh.itjust.works
      2 months ago

      This is Stack Overflow, after all. Your question is wrong. Your problem is wrong. You are wrong. I am right. Thread locked. Go read this other post that is totally unrelated to your problem, which I’ve decided isn’t the problem you’re facing, because. I. Am. Right.

    • moriquende@lemmy.world
      2 months ago

      It can’t be done, as an opening tag in HTML can contain anything in its attributes, even JavaScript (e.g. an onclick handler).

        • moriquende@lemmy.world
          2 months ago

          You can’t parse every HTML opening tag with regex, because an HTML opening tag doesn’t have a set structure. How would you match this opening tag with regex? <mytag myattribute="<value of \"myattribute\">" >

          • schnurrito
            2 months ago

            Is this valid HTML? My understanding is that that attribute value needs to be escaped, i.e. &lt;value of \&quot;myattribute\&quot;&gt;.

            • moriquende@lemmy.world
              2 months ago

              The double quote doesn’t need to be escaped when the attribute value is delimited with single quotes, and neither does the rest. This is valid and tested: <img alt='my "<img>"'>
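The disagreement in this sub-thread can be checked on the `<img>` example above. The following Python sketch compares two hypothetical patterns (neither comes from the thread): a naive one that stops at the first `>`, and a quote-aware one that treats single- and double-quoted attribute values as units. It assumes the input starts with a single opening tag and says nothing about comments, `<script>` contents, or malformed markup.

```python
import re

# Naive: everything up to the first '>' counts as the tag.
NAIVE = re.compile(r"<[A-Za-z][^>]*>")

# Quote-aware: after the tag name, consume quoted strings whole,
# so a '>' inside an attribute value does not end the tag.
QUOTE_AWARE = re.compile(r"""<[A-Za-z][^\s/>]*(?:"[^"]*"|'[^']*'|[^'">])*>""")

tag = """<img alt='my "<img>"'>"""

naive = NAIVE.match(tag).group(0)
aware = QUOTE_AWARE.match(tag).group(0)

print(naive)  # <img alt='my "<img>
print(aware)  # <img alt='my "<img>"'>
```

The naive pattern cuts the tag short at the `>` inside the attribute value, while the quote-aware one matches the whole tag. Whether that is enough depends on what other input the pattern has to cope with.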

    • kbal@fedia.io
      2 months ago

      It can be done with a simple regex of the kind proposed in various answers there iff the HTML is known to be limited to a subset where that sort of thing can easily be made to work. The question does not tell us whether or not that is the case, so everyone is free to make their own assumptions and argue as if they know what’s going on.