M.2 NVMe Drive Overheating and Failure Issues

Without a doubt, the release of ultra-fast M.2 NVMe PCIe SSD drives has played a huge role not only in the IT world as a whole, but also in the photography community, where more and more photographers are choosing to build their own machines in order to speed up their photography workflow. Using M.2 NVMe drives for storing Lightroom catalogs, RAW files and cached data can speed up performance considerably, which is why many photographers, including myself, have been choosing these drives for our needs. However, after using M.2 NVMe SSD drives in my PC builds, I realized that they come with overheating problems, which can potentially lead to more frequent failures than hard drives or standard SSD drives. Having seen a couple of M.2 drive failures in the past few years, and having recently experienced a complete drive failure myself in a build that is less than two years old, I wanted to warn our readers about the use of these drives in their environments.

Let’s take a look at what causes these drives to fail and how you can keep your data safe.

M.2 NVMe Overheating Issues

Having spent most of my life in the IT world, I know one thing for sure: excessive heat in electronics is never a good thing, especially when it comes to storage. To put it simply, cooling electronics is essential in order to keep them running longer. With modern NVMe drives in the small M.2 form factor, manufacturers are not making it clear enough that their drives must be properly cooled. With modern motherboards providing M.2 slots for these blazing fast drives, many PC builders are using their motherboards to host M.2 NVMe SSD drives, often putting them in areas that receive little to no cooling. And if the computer case is full of equipment, there is a good chance that components that produce a lot of heat (such as video cards) will raise the ambient temperature even higher, leading to more M.2 storage hardware failures.

In fact, considering how hot these things get, manufacturers should be providing proper heat sinks, with specific instructions to keep them nice and cool. The most that I have seen done is by Samsung, which puts an aluminum sticker on its NVMe M.2 drives to spread some of the heat. What’s crazy is that you are not supposed to remove the sticker, or it will void the warranty. This means that removing the sticker and installing a heat sink for cooling the device is not really an option for most people who want to be able to maintain that 3 to 5 year warranty. To keep the heat under control, Samsung and other manufacturers came up with ways to throttle M.2 NVMe SSD drives when they get too hot, which can reduce their performance enormously, stripping away their primary appeal as ultra-fast storage devices.

I found out about the heating issues before buying my first M.2 NVMe PCIe SSD drive (which was the 512 GB Samsung 950 Pro). I purchased it with a heat sink, but after some consideration, I decided against removing the label and voiding the warranty. Instead, I went ahead and placed my unit in an open area of my large tower case with good airflow, making sure that air from my case fan directly reaches the unit and cools it off. The device worked well, but under heavy loads it would still heat up and throttle. To reduce the chance of any potential failure and resulting data loss, I decided to only store my operating system on the drive and use it for caching and my Lightroom catalogs, which get frequently backed up anyway. I did the same with two other M.2 NVMe PCIe SSD drives, although they were the newer Samsung 960 Pro versions.
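If you want to see how often your own drive runs hot under load, you can log its temperature every few seconds during a heavy export and then summarize the readings. Below is a minimal Python sketch of that idea. Note that `summarize_temps`, `ThermalSummary` and the 70 °C default are my own illustrative assumptions, not values from any manufacturer's documentation, and how you collect the readings (a monitoring utility, for example) is up to you.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThermalSummary:
    peak_c: float          # hottest reading seen, in degrees Celsius
    share_at_limit: float  # fraction of samples at or above the limit


def summarize_temps(samples_c: Sequence[float], limit_c: float = 70.0) -> ThermalSummary:
    """Summarize drive temperature readings taken during a heavy workload.

    limit_c is an assumed throttling point for illustration only;
    check your drive's specifications for the real figure.
    """
    if not samples_c:
        raise ValueError("need at least one temperature sample")
    hot = sum(1 for t in samples_c if t >= limit_c)
    return ThermalSummary(peak_c=max(samples_c), share_at_limit=hot / len(samples_c))


if __name__ == "__main__":
    # Hypothetical readings logged during a large Lightroom export.
    print(summarize_temps([45, 62, 71, 75, 68]))
```

In this made-up example, two of the five readings sit at or above the assumed limit, so the drive would have spent roughly 40% of the sampled time in the danger zone. If your numbers look like that, better airflow is worth the effort.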

One of the drives failed after about 20 months, resulting in total data loss. Aside from baking my drive in an oven (believe it or not, that method does work in some cases), I tried a number of different tricks to revive it, but to no avail. The hard drive light lit up each time anything tried to access the drive and it would go on and off for hours. Since I had another 3 years of warranty on the device, I decided to send it in and just get it replaced, as I didn’t care for the data I lost on it. However, it did leave a bad taste in my mouth, and I thought about how horrible it would have been to lose that data if it had not been backed up. There are plenty of data recovery options for spinning drives: labs that specialize in hard drive data recovery can often do it at a reasonable cost (as little as $300 per drive). But recovering data from these new SSD drives is going to cost you more, since not many labs even have the right equipment to be able to do it. This is why backing up your data is critical!

Back Up More Frequently

If you decide to use any SSD drives in your computer build (and you should, since they are extremely fast), no matter what form factor, you should plan for a thorough backup solution that you know actually works. You must make it an essential part of your workflow, that’s how critical it is. I often come across photographers who only use their laptops for their post-processing needs, and it is shocking to see how many photographers out there never back anything up and only store their catalogs and photos on the same drive (which is often an SSD drive). SSD drives are more reliable than spinning drives in the long run and they might not get damaged when they are dropped, but that does not mean they cannot fail. Another good lesson I learned from my IT career: every type of storage fails, it is just a matter of time! Don’t be fooled into thinking that your SSD drive will never fail you. Always back up your data, and make sure to back up often.
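A backup only counts if it can be read back. One simple way to check is to compare checksums of the files on your working drive against the copies in your backup folder. Here is a rough Python sketch of that check. The function names `sha256_of` and `find_backup_problems` are my own, and the folder layout (a backup that mirrors the source folder structure) is an assumption, so treat it as a starting point rather than a finished tool.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large RAW files do not fill up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source: Path, backup: Path) -> List[str]:
    """Return relative paths that are missing from the backup or differ from the source."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(rel.as_posix())
    return problems
```

Calling `find_backup_problems(Path("Photos"), Path("Backup/Photos"))` with your own folders (those two paths are placeholders) returns an empty list when every file has a matching copy. Anything it does list is a file you would lose if the working drive died today.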

Avoid Storing Critical Data on M.2 Drives

If you cannot commit to a proper backup routine for any reason, simply avoid storing any critical data on M.2 NVMe SSD drives. Keep your RAW files on a dedicated external storage array, such as the Synology DS1817+ or a QNAP TVS-882T, and automate your backup process by dumping data from those storage arrays directly onto another external drive (very easy and quick to set up). If those are too expensive for your needs, you can buy smaller and cheaper storage solutions that will not only address your immediate storage needs, but also protect you from losing data in the future. That’s what I do, and after experiencing the overheating issues pointed out above, I decided to follow the same methodology on all computers I have in my environment.
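If you don't have a storage array with built-in backup tools, the same idea can be approximated with a small script that copies only new or changed files to a second drive. The Python sketch below is one way to do it. To be clear, this is not how Synology or QNAP units work internally; `mirror_new_files` is a name I made up, and the "same size and not newer" test is a simplifying assumption that a real backup tool would improve on.

```python
import shutil
from pathlib import Path
from typing import List


def mirror_new_files(source: Path, backup: Path) -> List[str]:
    """Copy files that are missing from the backup or look changed in the source.

    Returns the relative paths that were copied. Never deletes anything
    from the backup, so files removed from the source stay recoverable.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if dst_file.is_file():
            s, d = src_file.stat(), dst_file.stat()
            # Whole-second comparison tolerates drives with coarse timestamps.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue  # backup copy looks current, skip it
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies timestamps
        copied.append(rel.as_posix())
    return copied
```

Run on a schedule (your operating system's task scheduler will do), a script like this keeps a second copy reasonably fresh. Because it never deletes from the backup, an accidental deletion on the working drive does not propagate.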

Lastly, please don’t assume that a RAID array, such as RAID 5 or RAID 6, is a backup of your data! Losing a RAID volume due to power failures, multiple drive failures and other hardware-related failures can be very painful and can cost thousands of dollars to recover. Always make sure to follow the 3-2-1 backup rule, as highlighted in my Photography Backup Workflow article.
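For those who have not come across it, the 3-2-1 rule means keeping at least 3 copies of your data, on at least 2 different kinds of storage, with at least 1 copy offsite. The short Python sketch below turns that into a checklist you can run against your own setup. The `DataCopy` class and `check_321` function are hypothetical names of my own, and the media labels are whatever you choose to call your storage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCopy:
    name: str      # label for this copy, e.g. "NAS"
    media: str     # kind of storage, e.g. "nvme", "nas", "cloud"
    offsite: bool  # True if stored away from the primary location


def check_321(copies: List[DataCopy]) -> List[str]:
    """Return the 3-2-1 requirements that the given set of copies fails to meet."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than 3 copies")
    if len({c.media for c in copies}) < 2:
        gaps.append("fewer than 2 kinds of media")
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy")
    return gaps


if __name__ == "__main__":
    # A single RAID volume is one copy, no matter how many disks are in it.
    print(check_321([DataCopy("RAID 5 volume", "nas", False)]))
```

A lone RAID volume fails all three checks, which is exactly why it should not be treated as a backup. Add a second copy on a different kind of drive and a third one kept offsite, and the list of gaps comes back empty.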

Do you use M.2 NVMe PCIe SSD drives in your environment? If you do, please let me know in the comments below if you have experienced any overheating problems or failures. Also, if you are guilty of not backing up your data, feel free to confess and repent – we are not here to make you feel bad, but rather to help you not lose any data :)
