this post was submitted on 21 Jul 2023
Ask Burggit!
Linux already supports this (CPU hotplug), but it still has several limitations, such as requiring multiple CPUs or imposing a hard limit on how many CPUs can be installed.
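For anyone who wants to poke at what the kernel exposes, here is a minimal sketch in Python. It assumes a Linux box with the usual sysfs layout under `/sys/devices/system/cpu`; the helper names are mine, not part of any library. Note this is logical hotplug (taking a CPU out of scheduling), not physically pulling the chip.

```python
from pathlib import Path

# Standard sysfs location for CPU topology on Linux (assumed to be mounted).
CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,5" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. no CPUs offline
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name):
    """Read one of the sysfs CPU lists: "online", "offline", "possible", "present"."""
    return parse_cpu_list((CPU_SYSFS / name).read_text())


def set_cpu_online(cpu, online):
    """Logically online/offline one CPU. Needs root, and some CPUs
    (often cpu0) may not have an 'online' file at all."""
    (CPU_SYSFS / f"cpu{cpu}" / "online").write_text("1" if online else "0")
```

Calling `read_cpu_list("possible")` shows the ceiling the kernel settled on at boot, which is one place the "hard limit" shows up, and `set_cpu_online(3, False)` would take CPU 3 offline if the platform allows it.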
Assuming a consumer use case (single CPU socket, no real-time requirements), the easiest approach would be to add three things: an additional sleep state (like S2/S3, but with the CPU itself completely unpowered, as it would be in G3, and things such as RAM isolated), a way to prevent wake-up while the CPU is not connected, and a restart vector that lets the OS tell applications the CPU has been changed, so they can safely exit code that depends on feature flags that may no longer be present. CPU-integrated (firmware) TPMs stay with the removed CPU, so anything relying on their PCRs gets hosed.
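To make the feature-flag part concrete, here is a rough sketch of what an application might do when told the CPU changed. Everything here is hypothetical: no such notification exists today, and `on_cpu_changed` and the implementation table are made up for illustration. The only real-world piece is the `flags` line format of `/proc/cpuinfo` (`Features` on ARM).

```python
def cpu_flags(cpuinfo_text):
    """Extract the feature-flag set of the first CPU from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


# Hypothetical code paths, ordered from most to least demanding.
IMPLEMENTATIONS = [
    ("avx2", {"avx2"}),
    ("sse2", {"sse2"}),
    ("generic", set()),
]


def pick_implementation(flags):
    """Return the first code path whose required flags are all present."""
    for name, required in IMPLEMENTATIONS:
        if required <= flags:
            return name
    return "generic"


def on_cpu_changed(old_flags, new_flags):
    """Hypothetical handler for an OS 'CPU was swapped' notification:
    report which features disappeared and re-select a safe code path."""
    return {
        "lost": old_flags - new_flags,
        "implementation": pick_implementation(new_flags),
    }
```

The point is that the app has to stop trusting whatever it detected at startup: anything picked because of a flag in `lost` has to be abandoned before the process is allowed to run again.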
The Linux kernel once again amazes me. It seems to have absolutely everything. I tried looking for videos of this in action to see how it works hands-on, but I get nothing but LTT clickbait garbage.
Seems like one of those neat features that in reality would see little to no use. Without a rework of CPU cooling systems and installation structures, a "hot swap" of the CPU would take minutes to complete at the fastest, and realistically, there are few circumstances that would benefit from one. The only realistic scenario would be prosumer boards with two or more CPUs that can shift the load, run by someone who is trying to maintain 100% uptime but cannot afford to just shift the work to a second server temporarily.
Too stiff and unlikely to be used by the entry-level user, and not worth the risk for corporate entities that can afford to just have more servers, with enough buffer to take one offline for maintenance.
As for my thoughts on how it'd work: perhaps freezing the entire system somehow, dumping all buffers to RAM, then, like RA2 said, slowly feeling out what you've got and waking things up one at a time as the RAM buffer is loaded back in. I can only guess at the landmines you'd run into trying this in a live environment, with any slight deviation from what a process expected immediately hanging that process, if not the whole system. I'd guess the new CPU would need as much or more cache space, although I'm already reaching the limits of my computer infrastructure knowledge on the subject.
Yeah, I definitely figured this would have very little use. I'd imagine in such a hypothetical scenario, the CPU cooler would have some sort of mechanism that holds onto the CPU as you pull it out, slotted so that you'd have to pull straight up with no deviation in movement, with some sort of handle on top of the cooler. Fun thought exercise, regardless.