Display crash

Hello,

Since the latest kernel update, after a while my screens sometimes turn off and nothing responds anymore.
I also get the problem with the previous kernel version.
Here are the errors found in the logs:

Oct 30 06:31:30 Thorn kernel: [  715.626138] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Oct 30 06:31:35 Thorn kernel: [  715.626138] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Oct 30 06:31:35 Thorn kernel: [  720.500043] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=72067, emitted seq=72069
Oct 30 06:31:35 Thorn kernel: [  720.500154] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 3848 thread Xorg:cs0 pid 4309
Oct 30 06:31:35 Thorn kernel: [  720.500226] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 30 06:31:39 Thorn kernel: [  724.500156] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
Oct 30 06:31:39 Thorn kernel: [  724.500219] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Oct 30 06:31:39 Thorn kernel: [  724.500221] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Oct 30 06:31:39 Thorn kernel: [  724.693467] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:438
Oct 30 06:31:39 Thorn kernel: [  724.695393] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:512
Oct 30 06:31:39 Thorn kernel: [  724.795935] [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Oct 30 06:31:39 Thorn kernel: [  724.798033] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:446
Oct 30 06:31:39 Thorn kernel: [  724.799948] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:520
Oct 30 06:31:39 Thorn kernel: [  724.801867] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:470
Oct 30 06:31:39 Thorn kernel: [  724.803787] [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:544
Oct 30 06:31:39 Thorn kernel: [  724.904307] [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Oct 30 06:31:39 Thorn kernel: [  724.904428] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000054ce9db7; ring_buffer_end = 000000004f05c210; write_frame = 000000006b780f6a
Oct 30 06:31:39 Thorn kernel: [  724.904515] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Oct 30 06:31:39 Thorn kernel: [  724.925981] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Oct 30 06:31:39 Thorn kernel: [  725.117496] [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:528
Oct 30 06:31:39 Thorn kernel: [  725.117506] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Oct 30 06:31:39 Thorn kernel: [  725.117614] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000054ce9db7; ring_buffer_end = 000000004f05c210; write_frame = 000000006b780f6a
Oct 30 06:31:39 Thorn kernel: [  725.117682] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Oct 30 06:31:40 Thorn kernel: [  725.227498] [drm] REG_WAIT timeout 10us * 10020 tries - enc1_stream_encoder_dp_blank line:944
Oct 30 06:31:40 Thorn kernel: [  725.312942] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=5
Oct 30 06:31:40 Thorn kernel: [  725.504523] [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:528
Oct 30 06:31:40 Thorn kernel: [  725.702408] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Oct 30 06:31:40 Thorn kernel: [  725.702469] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Oct 30 06:31:40 Thorn kernel: [  725.893847] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Oct 30 06:31:40 Thorn kernel: [  725.893906] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Oct 30 06:31:40 Thorn kernel: [  726.085281] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Oct 30 06:31:40 Thorn kernel: [  726.085403] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:7 param:0x00000000 message:DisableAllSmuFeatures?
Oct 30 06:31:40 Thorn kernel: [  726.085405] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
Oct 30 06:31:40 Thorn kernel: [  726.085407] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
Oct 30 06:31:40 Thorn kernel: [  726.085408] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -121
Oct 30 06:31:40 Thorn kernel: [  726.085466] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 0000000054ce9db7; ring_buffer_end = 000000004f05c210; write_frame = 000000006b780f6a
Oct 30 06:31:40 Thorn kernel: [  726.085530] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Oct 30 06:31:40 Thorn kernel: [  726.085590] [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate hdcp ta
Oct 30 06:31:40 Thorn kernel: [  726.085650] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
Oct 30 06:31:40 Thorn kernel: [  726.086727] CPU: 1 PID: 55519 Comm: kworker/u64:1 Tainted: G           OE     5.18.0-0.deb11.4-amd64 #1  Debian 5.18.16-1~bpo11+1
Oct 30 06:31:40 Thorn kernel: [  726.086730] Hardware name: To Be Filled By O.E.M. B450 Steel Legend/B450 Steel Legend, BIOS P4.30 02/25/2022
Oct 30 06:31:40 Thorn kernel: [  726.086731] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Oct 30 06:31:40 Thorn kernel: [  726.086735] Call Trace:
Oct 30 06:31:40 Thorn kernel: [  726.086737]  <TASK>
Oct 30 06:31:40 Thorn kernel: [  726.086739]  dump_stack_lvl+0x45/0x5e
Oct 30 06:31:40 Thorn kernel: [  726.086743]  amdgpu_do_asic_reset+0x28/0x44a [amdgpu]
Oct 30 06:31:40 Thorn kernel: [  726.086842]  amdgpu_device_gpu_recover_imp.cold+0x613/0x8ea [amdgpu]
Oct 30 06:31:40 Thorn kernel: [  726.086927]  amdgpu_job_timedout+0x149/0x180 [amdgpu]
Oct 30 06:31:40 Thorn kernel: [  726.087009]  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Oct 30 06:31:40 Thorn kernel: [  726.087011]  process_one_work+0x1e5/0x3b0
Oct 30 06:31:40 Thorn kernel: [  726.087014]  ? rescuer_thread+0x390/0x390
Oct 30 06:31:40 Thorn kernel: [  726.087016]  worker_thread+0x50/0x3a0
Oct 30 06:31:40 Thorn kernel: [  726.087017]  ? rescuer_thread+0x390/0x390
Oct 30 06:31:40 Thorn kernel: [  726.087018]  kthread+0xe7/0x110
Oct 30 06:31:40 Thorn kernel: [  726.087019]  ? kthread_complete_and_exit+0x20/0x20
Oct 30 06:31:40 Thorn kernel: [  726.087021]  ret_from_fork+0x22/0x30
Oct 30 06:31:40 Thorn kernel: [  726.087024]  </TASK>
Oct 30 06:31:40 Thorn kernel: [  726.087025] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
Oct 30 06:31:40 Thorn kernel: [  726.087026] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Oct 30 06:31:40 Thorn kernel: [  726.087801] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Oct 30 06:31:40 Thorn kernel: [  726.087802] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:48 param:0x00000000 message:Mode1Reset?
Oct 30 06:31:40 Thorn kernel: [  726.087804] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
Oct 30 06:31:41 Thorn kernel: [  726.280191] amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -121 for drm dev, 0000:03:00.0
Oct 30 06:31:51 Thorn kernel: [  736.879566] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 30 06:31:51 Thorn kernel: [  736.879718] [drm] PCIE GART of 512M enabled (table at 0x000000800012C000).
Oct 30 06:31:51 Thorn kernel: [  736.879731] [drm] VRAM is lost due to GPU reset!
Oct 30 06:31:51 Thorn kernel: [  736.879733] [drm] PSP is resuming...
Oct 30 06:31:51 Thorn kernel: [  737.091003] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
Oct 30 06:31:51 Thorn kernel: [  737.091104] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Oct 30 06:31:51 Thorn kernel: [  737.091200] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Oct 30 06:31:51 Thorn kernel: [  737.091257] [drm] Skip scheduling IBs!

The last line repeats several times, with these two lines in the middle:

Oct 30 06:31:51 Thorn kernel: [  737.091289] amdgpu 0000:03:00.0: amdgpu: GPU reset(2) failed
[…]
Oct 30 06:31:51 Thorn kernel: [  737.091410] amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62

After that, I get this:

Oct 30 06:32:02 Thorn kernel: [  747.117660] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2162, emitted seq=2163
Oct 30 06:32:02 Thorn kernel: [  747.117808] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Oct 30 06:32:02 Thorn kernel: [  747.117899] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 30 06:32:06 Thorn kernel: [  751.117835] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio

and what follows is the machine rebooting.
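
To make a long excerpt like the ones above easier to read, the `*ERROR*` lines can be counted per reporting function. This is only a minimal sketch: `summarize_drm_errors` is a hypothetical helper name, and it assumes the `[drm:<function> [amdgpu]] *ERROR*` layout seen in the log lines quoted in this thread.

```shell
# Hypothetical helper: count amdgpu *ERROR* lines per reporting function.
# Reads kernel log text on stdin and prints "<count> <function>" lines,
# most frequent first.
summarize_drm_errors() {
    awk '
        # Only lines flagged *ERROR* that carry a "[drm:<function>" prefix.
        /\*ERROR\*/ && match($0, /\[drm:[A-Za-z0-9_]+/) {
            # Skip the 5 characters of "[drm:" to keep the function name.
            fn = substr($0, RSTART + 5, RLENGTH - 5)
            count[fn]++
        }
        END { for (fn in count) printf "%d %s\n", count[fn], fn }
    ' | sort -k1,1nr -k2,2
}
```

Assuming the journal is persistent, something like `journalctl -k -b -1 | summarize_drm_errors` would summarize the kernel messages of the previous boot; with classic syslog files, the log file can be piped in instead.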

Here is the output of a few commands:

  • uname -r
5.19.0-0.deb11.2-amd64
  • grep -vr "^#" /etc/apt/sources.list{,.d}
/etc/apt/sources.list:deb http://deb.debian.org/debian bullseye main contrib non-free
/etc/apt/sources.list:deb http://deb.debian.org/debian bullseye-updates main contrib non-free
/etc/apt/sources.list:deb http://deb.debian.org/debian-security bullseye-security main contrib non-free
/etc/apt/sources.list:
/etc/apt/sources.list:deb http://deb.debian.org/debian bullseye-backports main contrib non-free
  • lspci -nnkd ::300
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [1002:73df] (rev df)
        Subsystem: Sapphire Technology Limited Navi 22 [1da2:e445]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

Does anyone have an idea what this could be?

Hello,

Which version of firmware-amd-graphics do you have? I run the same kernel with no problems at all, with this card:

    05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c4)
            Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:0123]
            Kernel driver in use: amdgpu
            Kernel modules: amdgpu

It's version 20210818-1~bpo11+1.

Well, it depends. Today I had a crash after twelve minutes; I rebooted, and the machine has now been running for ten hours without any problem.

Ah, mine runs 24/7.
However, my firmware is the one from non-free, not the one from backports.

The firmware I use is also the one from non-free.

This repository.

non-free, not backports, sorry :)
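
One way to check which side of the non-free/backports question a package falls on is to look at its version string. In Debian, `non-free` is an archive component while backports is a suite, so a package can belong to both, and backported packages normally carry a `~bpo<release>+<n>` suffix in their version. The sketch below relies on that convention only; `is_backport_version` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeed if a Debian package version string carries
# the usual backports suffix "~bpo<release>+<n>".
# Assumption: only the version string is inspected, not the APT origin.
is_backport_version() {
    case "$1" in
        *"~bpo"[0-9]*"+"[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_backport_version "$(dpkg-query -W -f='${Version}' firmware-amd-graphics)" && echo backport` would test the installed firmware package, and `apt policy firmware-amd-graphics` lists the repositories that offer each version.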

That's not the standard kernel, is it?

What do you mean, "not the standard kernel"?
It's Linux kernel version 5.19.0, available from the Debian 11 backports repositories.

That said, what you quote looks like a log entry; I don't see how it relates to whether the kernel is standard or not.
What you quote indicates that the log entry concerns the machine named "Thorn" and that it comes from the kernel.
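
To illustrate how such a log line is laid out, here is a small sketch that pulls the host name and the message source out of a classic syslog line. It assumes the `month day time host source: message` layout of the excerpts in this thread; `syslog_host_and_source` is a hypothetical helper name.

```shell
# Hypothetical helper: print the host and source fields of syslog lines
# read on stdin.
# Assumed layout: month, day, time, host, source (ending in ':'), message.
syslog_host_and_source() {
    awk '{
        src = $5
        sub(/:$/, "", src)   # drop the trailing colon after the source
        print "host=" $4 " source=" src
    }'
}
```

Fed the first log line of this thread, it prints `host=Thorn source=kernel`: "Thorn" is the machine name and "kernel" is the origin of the message, which says nothing about which kernel package is installed.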