Servers, like home furnaces, are critical pieces of infrastructure that should work without issue for many years. But eventually they will start struggling, maybe rattling a bit or not working as efficiently as they once did. They might limp along for a while, but one sad day, they go dark. Maybe it’s a quick component fix, or maybe you’ll have to replace the whole dang thing.
If you’re unable to get things back up and running quickly, you could be looking at expensive downtime while you try to find a replacement part or new system… all the while suffering the consequences. Don’t let this happen to you! Let’s take a look at signs a server is about to fail as well as some server lifecycle basics.
So, how long can you reasonably expect a server to hold up? It varies, but there are several rules of thumb:
So what are the red flags that might pop up before a server crash wrecks your plans for the weekend and drains your company’s bank account? Here’s what to look out for:
Like a human being, a server might be in trouble when it starts running a fever. According to one vendor, every increase of 18°F above 68°F reduces reliability by around 50 percent.
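To get a feel for what that rule of thumb implies, here is a rough sketch. It assumes the 50 percent reduction compounds with each additional 18°F step, which is one reading of the vendor's figure and not something the vendor spells out. The function name and defaults are illustrative only.

```python
def reliability_factor(temp_f, baseline_f=68.0, step_f=18.0):
    """Relative reliability compared with the baseline temperature.

    Assumption: reliability halves for every `step_f` degrees above
    `baseline_f`, and stays at 1.0 at or below the baseline.
    """
    excess = max(0.0, temp_f - baseline_f)
    return 0.5 ** (excess / step_f)


for temp in (68, 86, 104):
    print(f"{temp}°F -> {reliability_factor(temp):.2f}")
# 68°F -> 1.00
# 86°F -> 0.50
# 104°F -> 0.25
```

Under that assumption, a server room sitting at 104°F would be running at roughly a quarter of its baseline reliability.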
However, like a fever, the high temperature itself might not be the real problem, but instead an underlying symptom of what’s actually wrong (e.g., issues with power supply, memory, etc.). Therefore, you should check the CPU, chipset, and HDD temperatures, and check whether or not your fans are running properly.
If you can’t immediately determine the cause of excess heat, keep looking. Other possible causes of high temperature could include a clogged front intake, blockage of the exhaust or airflow, recent repositioning of the machine, or a dirty heat sink.
Note: to figure out if your server is running too hot, check with your vendor for baselines; many models come with published operating temperature specifications.
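Once you have those baselines, comparing them against current readings is simple to automate. The sketch below assumes you already collect temperatures in °C from your monitoring tool of choice. The limits shown are hypothetical placeholders, not real vendor specifications.

```python
# Hypothetical upper limits in °C; substitute your vendor's published figures.
LIMITS_C = {"cpu": 85.0, "chipset": 80.0, "hdd": 55.0}


def overheating(readings_c, limits_c=LIMITS_C):
    """Return the sorted names of components whose reading exceeds its limit."""
    return sorted(
        name
        for name, temp in readings_c.items()
        if name in limits_c and temp > limits_c[name]
    )


# Sample readings, made up for illustration.
sample = {"cpu": 91.0, "chipset": 62.5, "hdd": 57.0}
print(overheating(sample))  # ['cpu', 'hdd']
```

Components without a known limit are skipped instead of flagged, so a missing baseline never triggers a false alarm.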
Even a “healthy” server can give out if put under unusually excessive load (same goes for IT pros). Such failures in isolation are usually nothing to worry about. But a mysterious crash for no clear reason, on a server with no intensive process running on it? Cause for concern. Don’t just reboot and pray for the best. It’s time for a little CSI: server action.
“My computer’s running slow!” is undoubtedly one of the most popular help desk ticket subject lines of all time, and the cause could be almost anything. With a server though, sudden slowness is often the result of deep-seated problems that could put it at risk for failure.
For example, a process may have a memory leak that eats up all of your system resources, which could result in the system grinding to a halt. A simple software update might fix things in these instances, but your system may slow down or crash for other reasons. For example, your Linux server might remount a filesystem read-only if your hard drive is acting up. Or data corruption might be causing applications to randomly fail. Over time, tiny problems will start to add up, and if regular maintenance isn’t enough to consistently keep your server in working order, it may be time for a replacement.
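A memory leak usually shows up as usage that climbs steadily and never comes back down. One rough way to spot that pattern is to fit a straight line through periodic memory samples and look at the slope. The sketch below assumes samples in MB taken at regular intervals. The function names and the 5 MB threshold are illustrative assumptions, not a standard.

```python
def growth_per_sample(samples):
    """Least-squares slope of the samples plotted against their index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples, min_slope_mb=5.0):
    """Flag a process whose memory grows by at least `min_slope_mb` per sample."""
    return growth_per_sample(samples) >= min_slope_mb


print(looks_like_leak([100, 110, 120, 130]))  # True: steady climb
print(looks_like_leak([200, 201, 199, 200]))  # False: normal jitter
```

A steady slope is only a hint. A cache that is still warming up looks the same for a while, so treat a positive result as a reason to investigate, not as proof.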
Really slow data transfer rates are a huge bottleneck and a big red flag for hard drive problems, as is a rising number of bad sectors that don’t respond to read/write operations. Strange noises (for HDDs) are also a warning sign, much like the hypothetical noisy furnace we mentioned earlier.
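What matters with bad sectors is the trend, not a single number. As a minimal sketch, suppose you record a drive's bad-sector count at each routine check (for instance, from its SMART data). The helper below is hypothetical and flags a drive once the count has gone up between checks at least twice.

```python
def bad_sectors_rising(counts, min_increases=2):
    """True if the count rose between adjacent checks at least `min_increases` times."""
    increases = sum(1 for prev, cur in zip(counts, counts[1:]) if cur > prev)
    return increases >= min_increases


# Counts from five successive checks, made up for illustration.
print(bad_sectors_rising([0, 0, 8, 8, 24]))  # True: two separate jumps
print(bad_sectors_rising([3, 3, 3, 3]))      # False: stable
```

Requiring more than one increase keeps a single old batch of remapped sectors from raising an alarm, while a drive that keeps losing sectors gets flagged quickly.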
How to keep a watchful eye on your servers
If all of the things you need to keep track of to ensure proper server operation make your head spin, don’t worry… there’s help. A Managed Services Provider like ETS gives you the power to keep tabs on servers within minutes of setup. You can closely monitor any critical systems or devices and also get alerts that let you know if something might be wrong with your server’s CPU, memory, disks, and more.