Lately my PC has started crashing while playing videos. It freezes completely: the screen is frozen and it doesn’t respond to any input (keyboard, mouse). I cannot switch TTY (Ctrl + Alt + F1, F2, …), and it doesn’t even respond to Alt + PrtSc + REISUB. I have to force a power-off by holding down the power button.

After rebooting I checked all the available logs, but I cannot find anything logged right before the incident. The last entries are always different and don’t indicate anything.

I suspect it has to do with the graphics card, but I’m looking for ways to dig deeper and either confirm that or rule it out.

What else should I check? How can I find more info?

OS: Lubuntu 22.04.3 LTS (latest updates). I’m using the nvidia proprietary drivers (nvidia-driver-390).

UPDATE:

First of all thank you all for your input and fresh ideas. Now I’ve already tried some of them and I will continue with the other ones until I get some results.

So far I have tried:

  • memtest, and it didn’t show any errors.
  • booting from a live distro to see if the problem also occurs. It didn’t, but on the live distro you cannot change the graphics driver, so it was using the open source nouveau driver. I also only let it play for 1 hour, and the freeze was never predictable even before: it could happen during the first hour, the third, or sometime later.

Next steps are to

  • open the case and clean it up to remove the possibility of high temp because of that,
  • change my drivers to be the nouveau and try again,
  • try with only the onboard GPU on,
  • remove extra disks to reduce the load of the PSU

thank you all again.

  • blotz@lemmy.world
    1 year ago

    Check your storage connection! If storage disconnects, your OS will freeze and stop responding to the keyboard. Also, the OS won’t be able to write any logs because the storage isn’t attached. Even power off won’t work because the OS can’t read any files. This feels very similar to your problem.

    For me, my motherboard had a faulty drive controller which would randomly stop working and drives would no longer appear connected.

    I’m not sure whether you have the same issue as me but it has the same characteristics as mine. Hope this helps!

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      thanks for your input but it looks different. I mean when I power it off with the button, then it is possible to boot without issues. Also it doesn’t freeze randomly. It freezes only if it plays videos. It has now been up for 10 days without interruption since I stopped playing videos.

    • vettnerk@lemmy.ml
      1 year ago

      Had issue with my storage recently, and the symptom was similar to what OP described. Syslog didn’t reveal anything, as the root filesystem was read only, so troubleshooting it was hard. Coincidentally I needed a newer kernel, and after the upgrade the problem disappeared.

      • blotz@lemmy.world
        1 year ago

        When storage disconnects while the OS is still running, it causes the OS to freeze and stop responding to all keyboard inputs. I thought this was similar to OP’s issue, which is why I suggested it.

  • CapnElvis@kbin.social
    1 year ago

    Test your RAM. I had a machine doing this a few years ago - turns out I had a stick of RAM with a 128k block somewhere in the middle that was dead.

    That machine worked fine as long as I didn’t get it doing anything too intensive, then it would crash. A new stick of RAM solved the issue.

    • Avid Amoeba@lemmy.ca
      1 year ago

      This is the most likely issue. To add - test 3-4 passes of Memtest86+. The first pass is shorter and meant for finding egregious RAM problems. It can fail on subsequent full passes. I had my RAM fail on 3rd of 4th pass which passed the 1st. It could even be caused by incompatibility of the size of RAM with the platform. For example in my case AMD supported 2x 8GB sticks of this RAM with no issues. Insert 4x 8GB and it starts producing errors even if each individual stick passes with flying colors.

    • lemmyvore@feddit.nl
      1 year ago

      Seconded. I’d been having issues (random freezes, crashes) for a while but I had attributed them to a lack of RAM. So I bought some more RAM at some point and ran memtest on all RAM together and saw errors. Those bastards, they sold me dodgy RAM, right? Tested the new sticks individually, they were clean. Turns out I had a bad 64 KB area on one of my old sticks.

      You can tell the kernel to not use the bad area btw if it’s all in one place, so don’t necessarily rush out to replace the bad stick.
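As a rough sketch of the "tell the kernel to skip the bad area" idea: the Linux kernel accepts a memmap=nn[KMG]$ss boot parameter that marks a memory region as reserved. The helper below is only an illustration (memmap_param is a made-up name, and the addresses in the usage comment are invented, not taken from any real memtest run). It rounds a failing address range out to 4 KiB pages and prints the matching parameter.

```shell
# Print a memmap= kernel parameter that reserves the pages covering
# the failing range FIRST..LAST (both inclusive, e.g. hex from memtest).
memmap_param() {
    first=$(( $1 ))
    last=$(( $2 ))
    page=4096
    start=$(( first / page * page ))          # round down to a page boundary
    end=$(( (last / page + 1) * page ))       # exclusive end, rounded up
    size_k=$(( (end - start) / 1024 ))
    printf 'memmap=%dK$0x%x\n' "$size_k" "$start"
}

# Hypothetical usage (addresses are placeholders):
#   memmap_param 0x12340800 0x1234ffff
```

Take the first and last failing addresses from your own memtest output. The `$` usually needs escaping when the parameter goes into a bootloader config file such as /etc/default/grub, so check the bootloader documentation before rebooting. GRUB also has its own GRUB_BADRAM setting as an alternative.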

  • mnmalst@lemmy.zip
    1 year ago

    I was in a similar situation not too long ago and couldn’t find anything to fix it either at first. One thing that was high on my list was changing my PSU, since a defective or weak one often seems to be a problem in such cases. Besides a general hardware failure of course. If it’s the hardware that could be anything really. Motherboard, RAM, GPU, PSU. PSU is the easiest to switch tho, so if you go that route I would try that first.

    Anyways, I never had to do this cause in my case, believe it or not, a BIOS update fixed my problem. I am still not 100% sure what happened but I think the update fixed the GPU voltage distribution or something similar.

    Hope that helps at least a little bit.

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      good idea about the PSU. I hadn’t thought of that. The PSU is not any high-performance/high-quality and is already 5 years old. Being unable to provide the required voltage may be a possibility if we accept that the performance degrades in time. (Was working without issues for 5 years in the same PC configuration).

      I think I’ll try by first removing the extra HDDs so reducing the load and check again. Thanks for your input

      • dylanmorgan@slrpnk.net
        1 year ago

        If your processor/MB has onboard video, it would probably be easier to pull the gpu and test. If you still suspect power management, pulling other components like additional HDDs after adding the gpu back would confirm it.

    • reddit_sux@lemmy.world
      1 year ago

      PSU is the last thing we check but is usually the first to fail under load if it is old or cheap. Try reducing the load on it like not using ur GPU, HDDs or any other peripheral that is unnecessary.

      Next check ur RAM, that too can give random errors under load.

  • ourob@discuss.tchncs.de
    1 year ago

    If you have another pc, ssh from it to the problem machine and run sudo dmesg -w. That should show kernel messages as they are generated and won’t rely on them being written to disk.

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      I will try it, but I’m quite confident the machine will be unresponsive/unreachable: if the kernel were still listening, it would respond to Alt + PrtSc + REISUB by unmounting the drives, and I would see that when I examine the logs afterwards.
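One thing worth ruling out before reading too much into REISUB: the kernel only honours the SysRq functions enabled in /proc/sys/kernel/sysrq, which is a bitmask (1 means everything, 0 means nothing). The sketch below decodes that mask for the REISUB keys, using the bit values from the kernel's SysRq documentation. The function names are made up for this example, and the claim that Ubuntu-based systems commonly default to 176 is an assumption to verify on your own machine.

```shell
# Return success if the sysrq bitmask MASK permits the function group BIT.
sysrq_allows() {
    mask=$1
    bit=$2
    [ "$mask" -eq 1 ] && return 0          # 1 enables every SysRq function
    [ $(( mask & bit )) -ne 0 ]
}

# Print which REISUB keys the current (or given) mask file allows.
sysrq_report() {
    mask=$(cat "${1:-/proc/sys/kernel/sysrq}")
    for entry in "4:R (keyboard control, unraw)" \
                 "64:E, I (signal processes)" \
                 "16:S (sync)" \
                 "32:U (remount read-only)" \
                 "128:B (reboot)"; do
        bit=${entry%%:*}
        label=${entry#*:}
        if sysrq_allows "$mask" "$bit"; then
            echo "allowed:  $label"
        else
            echo "disabled: $label"
        fi
    done
}

# Usage: sysrq_report
```

With a mask of 176 (128 + 32 + 16), S, U and B are allowed while R, E and I are not. If even S, U and B do nothing during the freeze, that does point at the kernel itself being hung. Writing 1 to the same file as root enables everything until the next reboot.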

      • ourob@discuss.tchncs.de

        To be clear, dmesg -w should be run before you do anything to cause the crash. It will continuously print kernel output until you press ctrl+c or the kernel crashes.

        In my experience, a crashing kernel will usually print something before going unresponsive, but not in time to flush the log to disk.
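A minimal sketch of that remote capture, run from the second PC. The function name, the host argument and the output file are placeholders, and it assumes an SSH server is already running on the problem machine. The point of tee is that the last kernel lines end up in a file on the healthy machine, so they survive the freeze.

```shell
# Follow the kernel log of a remote host and keep a local copy.
# $1 = user@host of the problem machine, $2 = local file to write.
capture_kernel_log() {
    host=$1
    outfile=$2
    # -t allocates a terminal so sudo can ask for a password;
    # --follow keeps printing new messages, --ctime adds readable timestamps.
    ssh -t "$host" 'sudo dmesg --follow --ctime' | tee "$outfile"
}

# Hypothetical usage, started before playing any video:
#   capture_kernel_log user@problem-pc dmesg-capture.log
```

When the machine freezes, the SSH session will stall or drop, and whatever was printed last is at the end of the local file.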

  • mortrek@lemmy.ml
    1 year ago

    Can you be more specific when you say “plays videos”?

    Like in vlc, or YouTube, or something else? What videos? Like, 4k hevc videos, or literally anything?

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      mostly on youtube, usually at 720p30fps. I think if I go to 60fps it crashes even faster. Also I’ve tried watching on freetube and on firefox + mpv, but it can crash in all combinations

      • Possibly linux@lemmy.zip
        1 year ago

        It should say “PASS” if your RAM’s good. It can take a while depending on your RAM speed and amount.

        • gohixo9650@discuss.tchncs.deOP
          1 year ago

          it reached the point that says “Pass complete, no errors, press Esc to exit” but the test still runs. So yes, I will re-do it for the additional passes, but from a first look, it looked fine.

  • ∟⊔⊤∦∣≶@lemmy.nz
    1 year ago

    Why do you suspect gfx card?

    What happens if you use a different video player, or play different videos ie different codecs?

    I used to have a similar issue but updates at some point fixed it. I think it was codecs.

    Do you have an onboard gfx card you can use instead to test?

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      Why do you suspect gfx card?

      because it happens only on video. Also if it is a 60fps video I start hearing the fans spinning like mad

      What happens if you use a different video player, or play different videos ie different codecs?

      haven’t tried specific codecs. Usually it is youtube videos but makes no difference if I play them on firefox, on chromium, or even on opening them on MPV

      Do you have an onboard gfx card you can use instead to test?

      yes, there is one. Good idea.

      • mnmalst@lemmy.zip
        1 year ago

        Also if it is a 60fps video I start hearing the fans spinning like mad

        ok this is definitely not normal. Check the temperature of your GPU. Is the GPU physically clean? Is the air flow ok? Do you have the chance to test with a different GPU? An overheating GPU can definitely lead to the described symptoms.
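One way to check the GPU temperature over time with the proprietary driver is nvidia-smi's query mode. The sketch below is only an illustration (the function name and arguments are made up): it appends one CSV sample per interval to a log file and calls sync after each sample so the last readings have a chance of surviving a hard freeze. On older cards some fields may come back as not supported.

```shell
# Append COUNT samples of GPU temperature, fan speed and load to LOGFILE,
# waiting INTERVAL seconds between samples.
log_gpu_stats() {
    logfile=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        nvidia-smi \
            --query-gpu=timestamp,temperature.gpu,fan.speed,utilization.gpu \
            --format=csv,noheader >> "$logfile"
        sync                      # push the sample to disk right away
        sleep "$interval"
        i=$(( i + 1 ))
    done
}

# Hypothetical usage, started before playing a video:
#   log_gpu_stats "$HOME/gpu-stats.csv" 2 3600
```

After a freeze and reboot, the tail of the file shows how hot the card was in the last seconds before it hung. A temperature that climbs steadily right up to the freeze would support the overheating theory; a flat, moderate one would argue against it.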

      • taladar@sh.itjust.works
        1 year ago

        If the fans are spinning maybe check if it still happens when you point a big standalone fan at the graphics card.

        • gohixo9650@discuss.tchncs.deOP
          1 year ago

          sorry but what exactly do you mean by checking resource utilization? CPU? Yes, load increases but it is not indicative. It doesn’t get laggy and eventually freeze; it goes instantly from responsive to unresponsive. RAM? It is 8GB and it always shows as full including the cached items (which is normal), but it doesn’t start to swap heavily. Swap is ~1GB used. SSD? Not full.

          What else (and how) should I check?

          • rem26_art@kbin.social
            1 year ago

            I’m not sure how helpful it’ll be, but there’s a program called nvtop that will show you what’s using your GPU and all that. Maybe it’ll show something right as it crashes, or at least give you a hint as to what to look into next?

          • ober@lemmy.dbzer0.com
            1 year ago

            I mean to use something like htop, btop, or psensor to check how much of your RAM, CPU, GPU, etc is being used along with temperature. Also, what do you mean your RAM always shows as full? I get that Linux “uses” it all but most resource monitors should be able to tell how much is actually being used for programs.
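On the "RAM always shows as full" point, the number to look at is the kernel's own estimate of available memory, which excludes reclaimable cache. A small sketch that reads it from /proc/meminfo (the function name is made up, and the optional file argument exists only so it can be tried on a saved copy):

```shell
# Print total and available memory in MiB from a meminfo-style file.
mem_summary() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%d MiB available=%d MiB\n", total / 1024, avail / 1024 }' \
        "${1:-/proc/meminfo}"
}

# Usage: mem_summary
```

The "available" column of free -h reports the same figure. If it stays comfortably above zero while a video plays, running out of memory is unlikely to be the cause of the freeze.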

  • MiddledAgedGuy@beehaw.org
    1 year ago

    I think a live boot cd or trying to use an integrated gpu, if available, (both I saw already suggested) are better steps but you could also try blacklisting nvidia and use nouveau. Could point to those drivers if it works ok.
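Whichever way the driver gets switched, it helps to confirm which kernel driver is actually bound to the card before testing, since a leftover blacklist or package can leave the old one in place. A hedged sketch that pulls that out of lspci -k (the function name is invented for this example):

```shell
# Print the kernel driver in use for each VGA/3D controller.
gpu_kernel_driver() {
    lspci -k | awk '
        /VGA compatible controller|3D controller/ { gpu = 1; next }
        /^[0-9a-f]/                               { gpu = 0 }   # any other device line
        gpu && /Kernel driver in use:/            { print $NF }'
}

# Usage: gpu_kernel_driver
```

It should print nvidia with the proprietary driver loaded and nouveau after switching. On a machine with both onboard and discrete graphics it prints one line per device.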

  • MonkCanatella@sh.itjust.works
    1 year ago

    Would it be too much to remove the GPU and run videos? How often does it happen? If it’s easy to reproduce, and it’s not too much work, you can try removing the GPU and using onboard gfx to see if the problem persists.

    Another suggestion, maybe try different drivers.

    Alternatively, you could do a new install to a USB drive, boot from it, and install the same or different drivers (or both in turn) to see if the problem persists.

    This does seem like a stability issue, either on the hardware or firmware side. It could even be as simple as reseating the GPU.

    btw, do you happen to remember whenabout the first crash happened? Did it start out sporadic and grow more frequent?

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      btw, do you happen to remember whenabout the first crash happened? Did it start out sporadic and grow more frequent?

      it started happening about a month ago. It could follow that pattern, yes. In the beginning it happened only some of the times I was watching a video. Towards the end it was happening almost every time. Today I’m still trying to reproduce it, but no luck yet.

      • MonkCanatella@sh.itjust.works
        1 year ago

        Sorry to hear that. These types of issues are very frustrating. Something tells me it’s your GPU, but CPU and RAM issues can look totally weird like this as well.

  • plasticcheese@lemmy.one
    1 year ago

    I had a similar problem a while back and it turned out to be my Asus motherboard’s “AI” frequency control hard locking the system. Took me days of troubleshooting and headaches to figure this out. Ended up switching it off in BIOS and everything is stable now. Just my 2c.

    • gohixo9650@discuss.tchncs.deOP
      1 year ago

      but will I see anything more than what I see in /var/log ? I have checked all possible logs… Or was your suggestion just for a better way to check the logs in general?

      • snekerpimp@lemmy.world
        1 year ago

        Whenever I have no clue what’s happening I turn to journalctl and google. It’s just how I learned to parse log files and investigate.
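For the specific case of a freeze followed by a forced power-off, the journal of the previous boot is the interesting part. A small sketch of two journalctl invocations wrapped in helper functions (the function names are made up, and this assumes the journal is stored persistently so earlier boots are still available):

```shell
# Show the last N lines (default 200) logged during the previous boot.
prev_boot_tail() {
    journalctl -b -1 -n "${1:-200}" --no-pager
}

# Show only kernel messages of priority "err" or worse from the previous boot.
prev_boot_kernel_errors() {
    journalctl -k -b -1 -p err --no-pager
}

# Usage, right after rebooting from a freeze:
#   prev_boot_tail 100
#   prev_boot_kernel_errors
```

journalctl --list-boots shows which boots are available. If nothing relevant shows up here either, that is consistent with the kernel hanging before it could write anything to disk.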

  • Pankkake@lemmy.world
    1 year ago

    I used to have some similar issues when playing games, and the cause of it was my motherboard’s firmware. Maybe check and see if it is up to date?

      • MonkCanatella@sh.itjust.works
        1 year ago

        It’s really easy. The manufacturer’s website will have a page for your motherboard, where you can download new versions of bios. They’ll have instructions how to flash it. Should be as easy as downloading the bios update to a usb drive, restarting to bios and selecting the update option and pointing it to the usb drive.
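To know which motherboard page to look for and whether the installed firmware is already current, the board model and BIOS version can be read from sysfs without root. A minimal sketch (the function name is made up, and the optional directory argument is only there so it can be pointed at a copy):

```shell
# Print board and BIOS identification strings exposed by the kernel.
bios_info() {
    dir=${1:-/sys/class/dmi/id}
    for f in board_vendor board_name bios_version bios_date; do
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}

# Usage: bios_info
```

Compare bios_version and bios_date against the newest release listed on the manufacturer's page for the board shown in board_name.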