this post was submitted on 01 Sep 2023
15 points (100.0% liked)


On Windows, we've had the defrag tool and others that happily work on a drive even while it is in use, even the OS disk.

On Linux, I know of the fsck command but that requires the drive in question to be unmounted. Not great when you want to check a running server. I do not want to stop my server and boot it from USB, just to run a disk check. I can't imagine that's what the data centers are doing, either!

Surely some Linux tool exists that can do some basic checks on a running system?

[–] [email protected] 1 points 1 year ago (3 children)

Then what are they doing? It seems very cumbersome to have to take a drive offline for routine maintenance.

[–] [email protected] 5 points 1 year ago

They don’t do anything.

They have lots and lots of redundancy, and when enough drives fail, they decommission the entire server and/or rack.

The big players operate at a very different scale than the rest of us.

[–] [email protected] 3 points 1 year ago

Hardware-backed RAID, with error monitoring and patrol read. iSCSI or similar to present that to a virtualization layer. VMFS or similar atop that. Files atop that to represent virtual drives. Virtual machines atop that.

Patrol read starts catching errors long before SMART will. Those drives get replicated to (and replaced by) hot spares, online. Failing drives then get replaced with new hot spares.
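To make the patrol read and hot spare idea concrete, here is a rough toy model in Python. It is only a sketch: the `Drive` class, the `threshold` value, and the `mirror` argument (standing in for whatever parity or mirror copy a real controller rebuilds from) are all made up for illustration and do not correspond to any real RAID controller's interface.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Hypothetical drive: a list of blocks plus the indices that fail on read."""
    name: str
    blocks: list
    bad: set = field(default_factory=set)

    def read(self, i):
        if i in self.bad:
            raise OSError(f"{self.name}: read error at block {i}")
        return self.blocks[i]


def patrol_read(drive):
    """Read every block in the background and count the ones that fail."""
    errors = 0
    for i in range(len(drive.blocks)):
        try:
            drive.read(i)
        except OSError:
            errors += 1
    return errors


def patrol_array(active, spares, mirror, threshold=2):
    """Swap out any drive whose patrol-read error count reaches the threshold.

    `mirror` is an assumed clean copy of the data, standing in for the
    array's redundancy. Returns the names of the drives that were replaced.
    """
    replaced = []
    for slot, drive in enumerate(active):
        if patrol_read(drive) >= threshold and spares:
            spare = spares.pop(0)
            spare.blocks = list(mirror)   # rebuild the data onto the hot spare
            active[slot] = spare          # spare takes over; the array stays online
            replaced.append(drive.name)
    return replaced


if __name__ == "__main__":
    data = ["a", "b", "c", "d"]
    active = [Drive("sda", list(data)), Drive("sdb", list(data), bad={1, 3})]
    spares = [Drive("spare0", [None] * 4)]
    print(patrol_array(active, spares, data))
    print([d.name for d in active])
```

The point is that nothing goes offline: the failing drive is spotted by a background read pass, its slot is handed to a spare, and the pulled drive can be physically replaced later.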

But all of that is irrelevant, because at the enterprise level, they are scaling their applications horizontally, with distributed containers. So even if they needed to do fsck at the guest filesystem level (or even if they weren't using virtualization) they would just redeploy the containers to a different node and then direct traffic away from the one that needs the maintenance.
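The "redeploy and direct traffic away" step can be sketched as a toy drain routine. This is an assumption-heavy illustration, not how any particular orchestrator does it: the `cluster` dictionary layout, the `drain` function, and the "fewest containers wins" placement rule are all hypothetical.

```python
def drain(cluster, node):
    """Move every container off `node` and take it out of rotation.

    `cluster` maps a node name to {"containers": [...], "in_rotation": bool}.
    Each container goes to whichever remaining in-rotation node currently
    runs the fewest containers (a deliberately naive placement rule).
    """
    targets = [n for n, state in cluster.items()
               if n != node and state["in_rotation"]]
    if not targets:
        raise RuntimeError("no healthy node left to take the workload")

    cluster[node]["in_rotation"] = False      # direct traffic away first
    for container in list(cluster[node]["containers"]):
        dest = min(targets, key=lambda n: len(cluster[n]["containers"]))
        cluster[dest]["containers"].append(container)
        cluster[node]["containers"].remove(container)
    return cluster


if __name__ == "__main__":
    cluster = {
        "node-a": {"containers": ["web-1", "web-2"], "in_rotation": True},
        "node-b": {"containers": ["web-3"], "in_rotation": True},
        "node-c": {"containers": [], "in_rotation": True},
    }
    drain(cluster, "node-a")
    for name, state in cluster.items():
        print(name, state)
```

Once the node is empty and out of rotation, it can be unmounted, checked, reimaged, or simply powered off without anyone noticing.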

[–] [email protected] 3 points 1 year ago

We don't do maintenance. We just have redundancy and backups, and replace components when they fail.