I’ve got a weird problem and am looking for pointers.

On at least one of our servers, kernel 5.10.0-0.deb10.16-amd64 boots without a problem. But as we don’t want to rely on an “ancient” kernel build for Debian Buster, we also tried various later ones, and they all fail to start in the same way. Take 6.1.0-11-amd64 from Debian Bookworm as an example: it boots fine from local disk, but the very same kernel loaded via DHCP/PXE/TFTP seemingly loads the kernel and initrd fine and then only prints

early console in setup code
Probing EDD (edd=off to disable)... ok

and then hangs, i.e. the newly loaded kernel does not even start. The kernel command line already includes

debug loglevel=7 ro console=ttyS1,115200n8 earlyprintk=serial,ttyS1,115200n8 console=tty0

and I don’t get any more info from the system, neither via serial port nor at the console.
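
To double-check what a command line like the one above asks for, here is a minimal Python sketch that splits it into parameters and lists the `console=` entries in order. The helper name `parse_cmdline` is made up for illustration, and it splits on plain whitespace only, so quoted values are not handled. As far as I understand the kernel's serial console documentation, messages go to every listed console and the last `console=` entry is the one used for `/dev/console`; this only checks the options and says nothing about why the boot hangs.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into (key, value) pairs, keeping order.

    Hypothetical helper: plain whitespace splitting, no quoting support.
    Flags without '=' (such as 'ro') get a value of None.
    """
    params = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.append((key, value if sep else None))
    return params


cmdline = (
    "debug loglevel=7 ro console=ttyS1,115200n8 "
    "earlyprintk=serial,ttyS1,115200n8 console=tty0"
)
params = parse_cmdline(cmdline)

# All console= entries, in the order they appear on the command line.
consoles = [value for key, value in params if key == "console"]
print(consoles)      # ['ttyS1,115200n8', 'tty0']
print(consoles[-1])  # tty0
```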

Anyone with pointers?

Edit: edd=off gives the very same result, except that the corresponding line is missing from the output.

  • Kwozyman@lemmy.world
    1 year ago

    In the TFTPD logs, do you see the initrd file being pulled? Could it be a mismatch between the kernel and initrd you’re serving?

    • JohnJeffreyJonesOP
      1 year ago

      Thanks for the hint, but no, there is no mismatch, and yes, the files are being pulled (I even checked with tshark that everything comes over properly).

      The solution turned out to be much more benign. The stock kernel in the NFSroot where the initrd was produced was much smaller than the one on the bootable system. Looking into why that was the case led to the cause: it was missing about 200 non-free drivers, which somehow made the kernel stop right before really starting off.

      Adding those to the NFSroot and then into the initrd solved the problem. Sigh.

      Wasted way too much time there and I still have no idea why 5.10 booted ok the whole time.
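
For anyone chasing a similar size difference, a rough Python sketch of the comparison that would have found it sooner: list the files under a directory on the working system and report which ones are absent from the same directory in the NFSroot. The function names and the paths in the usage comment are assumptions for illustration, not the actual layout from this thread.

```python
import os


def list_files(root):
    """Return {relative_path: size_in_bytes} for regular files under root."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip broken symlinks and anything that is not a regular file.
            if os.path.isfile(full):
                files[os.path.relpath(full, root)] = os.path.getsize(full)
    return files


def missing_files(reference_root, candidate_root):
    """Relative paths present under reference_root but absent under candidate_root."""
    reference = list_files(reference_root)
    candidate = list_files(candidate_root)
    return sorted(set(reference) - set(candidate))


# Hypothetical usage; both paths are placeholders:
#   for path in missing_files("/lib/modules/6.1.0-11-amd64",
#                             "/srv/nfsroot/lib/modules/6.1.0-11-amd64"):
#       print(path)
```

Running it once on the module directories and once on whatever directory holds the non-free files should show quickly whether the tree used to build the initrd is incomplete.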