Linux 6.13 will report the number of hung tasks since boot

My dmesg is already constantly full of

  INFO: task btrfs:103945 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Until eventually

  Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

So I'm looking forward to getting an actual count of how often this happens without needing to babysit the warning suppressions and count the incidents myself.
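
In the meantime, a rough count can be pulled out of the log text itself. This is only a sketch: `count_hung_task_reports` and `read_detect_count` are hypothetical helpers, the regex is modelled on the message quoted above, and the path `/proc/sys/kernel/hung_task_detect_count` is my assumption about where 6.13 exposes the new counter. The log-based count also stops growing once reports are suppressed, which is exactly the gap the kernel counter should close.

```python
import re
from collections import Counter
from pathlib import Path

# Pattern modelled on the dmesg line quoted above; other kernel versions
# may word the message differently.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def count_hung_task_reports(log_text):
    """Count hung-task reports per task name in a chunk of dmesg output."""
    counts = Counter()
    for match in HUNG_RE.finditer(log_text):
        counts[match["comm"]] += 1
    return counts


def read_detect_count(path="/proc/sys/kernel/hung_task_detect_count"):
    """Read the since-boot counter, or return None if it is not available.

    The default path is an assumption, not something confirmed in this thread.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    import subprocess

    # Reading the kernel log may require root on some systems.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(dict(count_hung_task_reports(log)), read_detect_count())
```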
20 hours ago bhaney

In my limited experience this often comes from an overloaded virtualization platform hosting the VM. It can be VMware or Proxmox; on Proxmox it sometimes happens while the VM is being live-migrated to another virtualization host. It can also happen when the backend storage holding the VM is busy serving other hosts.

EDIT: none of the VMs where I have seen this used btrfs; the follow-up error message always mentioned ext4, so it's pretty much a filesystem-agnostic issue. If this is on physical hardware, however, then I don't know what's going on there.

6 hours ago merpkz

NFS client code occasionally triggered this in the (distant) past, but that's just how it is when the 'hard' mount option is chosen and the server becomes inaccessible for a prolonged time. The solution there is to get a reliable, sufficiently capable server and network (NOT to use 'soft' mounts, as that can lead to silent data corruption), and to avoid cross-ocean mounts ;-}

But yes, running a VM on a grossly overloaded (over-committed memory?) host might trip timeout warnings like this as well.

4 hours ago guenthert

You could leave this problem behind by switching to a filesystem that isn't full of deadlock bugs.

19 hours ago jeffbee

I am curious - is this message indicative of a problem in the fs? I would have assumed anything marked "INFO" is, tautologically, not an error, but surely a filesystem shouldn't be locking up? Or is it just suggestive of high system load or poor hardware performance?

18 hours ago yjftsjthsd-h

In my experience, "hung task" is almost always due to running out of RAM and the scheduler constantly thrashing instead of doing useful work. I rarely actually reach the point of seeing the message since I'll sysrq-kill if early enough, or else hard-reboot.

Note also that modern filesystems do a lot of background work that doesn't strictly need to be done immediately for correctness.

(of course, it also seems common for people to completely disregard the well-documented "this feature is unreliable, don't use it" warnings that btrfs has, then complain that they have problems and not mention that they ignored the warnings until everyone is halfway through complaining)

The only problems I've encountered in all my years of using btrfs are:

* when (all copies of) a file bitrots on disk, you can't read it at all, rather than being able to copy the mostly-correct file and see if you can hand-correct it into something usable

* if you enable new compression algorithms on your btrfs volume, you can't read your data from old kernels (often on liveusb recovery disks)

* fsync is slow. Like, really really slow. And package managers designed for shitty CoW-less filesystems use fsync a lot.

15 hours ago o11c

> In my experience, "hung task" is almost always due to running out of RAM

In my case, I don't think this machine ever commits more than around 5GB of its 32GB available memory, so I doubt it's that.

> it also seems common for people to completely disregard the well-documented "this feature is unreliable, don't use it" warnings that btrfs has

Now that I am definitely doing. I won't give up raid6 until it eats all my data for a fourth time.

15 hours ago bhaney

It could be any of the above. I'd say it's INFO because the kernel itself is not in an error state; it's information about a process doing something unusual.

16 hours ago shric

In-kernel btrfs code locking up should never happen at all. There is a rumor going around that btrfs never reached maturity and suffers from design issues.

17 hours ago blueflow

That's why I use ext4 exclusively on Linux. Never once had a filesystem issue.

16 hours ago SoftTalker

ext4 works fine on my Linux laptop and I agree, it's proven itself over many years to be supremely reliable, though it doesn't compare in features to the more complex filesystems.

On my home media server, however, I'm using ZFS in a RAID array, with regular scrubs and snapshots. ZFS has many features like RAID, scrubs, COW, snapshots, etc. that you just don't get on ext4. However, unlike btrfs, ZFS seems to have a great reputation for reliability with all its features.

12 hours ago shiroiushi

I use ext4 on my home media server (24TB). I'm using LVM and MD, and it's been rock solid for a couple decades now, surviving all sorts of hardware failures.

I haven't missed out on any zfs or btrfs features. Yes, I know about their benefits, and no, I don't care if a few bits flip here or there over time.

10 hours ago kelnos

Granted, it was at least a decade ago, but the team I was on had a terrible experience with ZFS and that bad taste still lingers. And I don't need any of its features.

12 hours ago SoftTalker

Could I ask you to expand on your problems with ZFS? Code bugs, data loss, operational problems, ...? (Asking because I use it and would like to learn from your problems rather than having to experience the pain myself.)

9 hours ago yjftsjthsd-h

Given the mailing list history with Linus, I wouldn't be surprised.

16 hours ago ramon156

A background thread performing blocking I/O is an implementation detail, not a bug. Other filesystems don't have or need that sort of bookkeeping, so if a block device stalls badly enough to trigger these warnings, the stall gets attributed to application threads (if at all) rather than to btrfs worker threads. Regardless, the stall very much still happens.

13 hours ago kiririn

> if a block device stalls badly

That's really the issue at heart, because I've seen these on zfs as well... but you'd think the filesystem would report some progress to keep bumping the timer so it doesn't start spamming dmesg. /shrug

11 hours ago nubinetwork

I was planning on it but the filesystem I wanted to switch to keeps getting set back by the author's CoC drama

16 hours ago bhaney

What did you want to switch to?

I suppose the author at least isn't a murderer :)

4 hours ago przemub

What counts as a hung task? Blocking on unsatisfiable I/O for more than X seconds? Scheduler hasn’t gotten to it in X seconds?

If a server process is blocking on accept(), wouldn't it count as hung until a remote client connects? Or do only certain operations count?

21 hours ago gcr

torvalds/linux/kernel/hung_task.c:

static void check_hung_task(struct task_struct *t, unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...

static void check_hung_uninterruptible_tasks(unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...

21 hours ago westurner

Just to double check my understanding (because being wrong on the internet is perhaps the fastest way to get people to check your work):

Is this saying that regular tasks that haven't been scheduled for two minutes and tasks that are uninterruptible (truly so, not idle or also killable despite being marked as uninterruptible) that haven't been woken up for two minutes are counted?

20 hours ago striking

The comment in the code says two minutes but the time would actually seem to depend on a timeout given as a parameter.
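
For what it's worth, here is how I read the check, written as a small Python model rather than kernel code. Everything in it is an assumption on my part: the `Task` fields and `is_hung` are hypothetical names, and the logic (only uninterruptible, non-killable, non-idle tasks are examined, and one is reported when its context-switch count has not changed for `timeout` seconds) is my simplified reading of `check_hung_task()` and `check_hung_uninterruptible_tasks()`, not a verified description.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the few task fields the check seems to use."""
    name: str
    uninterruptible: bool          # roughly "D state"
    killable: bool = False         # uninterruptible but still killable
    idle: bool = False             # uninterruptible but idle-marked
    switch_count: int = 0          # total context switches so far
    last_switch_count: int = 0     # value seen at the previous check
    last_switch_time: float = 0.0  # when that value last changed (seconds)


def is_hung(task, now, timeout):
    """Return True if this model would report the task as hung at time `now`."""
    # Only truly uninterruptible sleepers are examined at all (assumption).
    if not task.uninterruptible or task.killable or task.idle:
        return False
    # A task that has never been switched out is skipped (assumption).
    if task.switch_count == 0:
        return False
    # Any context switch since the last check means progress: restart the clock.
    if task.switch_count != task.last_switch_count:
        task.last_switch_count = task.switch_count
        task.last_switch_time = now
        return False
    # No progress: hung once the timeout parameter has elapsed.
    return now - task.last_switch_time >= timeout


if __name__ == "__main__":
    worker = Task("btrfs-worker", uninterruptible=True, switch_count=5)
    for t in (0, 60, 120):
        print(t, is_hung(worker, now=t, timeout=120))
```

Under this model a server blocked in accept() would never be reported, because it sleeps interruptibly and is filtered out before the timer is even consulted.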

2 hours ago Delk

Your and the Llama's explanations would make good comments for the source and/or the docs if true.

19 hours ago westurner

[deleted]

21 hours ago

And there's https://en.wikipedia.org/wiki/Zombie_process too

20 hours ago ape4

Not the same thing by any means: zombies don't indicate that something is wrong with the kernel or hardware.

The zombie process state is a normal transient state for all exiting processes where the only remaining function of the process is as a container for the exiting process's id and exit status; they go away once the parent process calls some flavor of the "wait" system call to collect the exit status. A pileup of zombies indicates a userspace bug: a negligent parent process that isn't collecting the exit status in a timely manner.
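
A minimal sketch of that lifecycle, assuming a Linux system with /proc mounted (`proc_state` and `zombie_demo` are hypothetical helper names): the parent uses `os.waitid` with `WNOWAIT` to learn that the child has exited without collecting it, so the child is still a zombie when its state is read, and only the later `os.waitpid` reaps it.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # Format is "pid (comm) state ..."; comm may itself contain ")" or spaces,
    # so split on the last closing parenthesis.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7):
    """Fork a child, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately, skipping cleanup handlers

    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)   # "Z" is expected here on Linux

    _, status = os.waitpid(pid, 0)   # collect the exit status: zombie goes away
    state_after = proc_state(pid)    # the /proc entry should be gone now
    return state_before, os.WEXITSTATUS(status), state_after


if __name__ == "__main__":
    print(zombie_demo())
```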

20 hours ago Polizeiposaune

Additionally, zombie processes hold a few more process-accounting items (rusage) until they are reaped. See wait3(2), wait4(2) and getrusage(2).
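
As a small illustration (a sketch only; `child_rusage` is a hypothetical helper and the busy loop is just there to give the child something to account for), Python's `os.wait4` wraps wait4(2) and hands back the child's resource usage at the moment it is reaped:

```python
import os


def child_rusage(iterations=200_000):
    """Fork a child that burns a little CPU, then reap it with wait4()."""
    pid = os.fork()
    if pid == 0:
        total = 0
        for i in range(iterations):
            total += i * i
        os._exit(0)  # child: exit without running the parent's cleanup handlers

    # wait4() returns the pid, the raw exit status and a struct_rusage
    # describing what the child consumed.
    reaped_pid, status, usage = os.wait4(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status), usage


if __name__ == "__main__":
    ok, code, usage = child_rusage()
    print(ok, code, usage.ru_utime, usage.ru_stime, usage.ru_maxrss)
```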