Without a doubt, the release of ultra-fast M.2 NVMe PCIe SSD drives has played a huge role in not only the IT world as a whole, but also in the photography community, where more and more photographers are choosing to build their own machines in order to speed up their photography workflow. Using M.2 NVMe drives for storing Lightroom catalogs, RAW files and cached data can speed up performance considerably, which is why many photographers, including myself, have been choosing these drives for our needs. However, after using M.2 NVMe SSD drives in my PC builds, I realized that they come with overheating problems, which can potentially lead to more frequent failures than hard drives or standard SSD drives. Having seen a couple of M.2 drive failures in the past few years, and having recently experienced a complete drive failure myself in a build that is less than two years old, I wanted to warn our readers about use of these drives in their environments.
Let’s take a look at what causes these drives to fail and how you can keep your data safe.
M.2 NVMe Overheating Issues
Having spent most of my life in the IT world, I know one thing for sure – excessive heat in electronics is never a good thing, especially when it comes to storage. To put it simply, cooling electronics is essential in order to keep them running longer. With the modern NVMe drives in small M.2 form factor, manufacturers are not making it clear enough that their drives must be properly cooled. With modern motherboards providing M.2 slots for these blazing fast drives, many PC builders are using their motherboards to host M.2 NVMe SSD drives, often putting them in areas that receive little to no cooling. And if the computer case is full of equipment, there is a likelihood that peripherals that produce a lot of heat (such as video cards) will raise the ambient temperature even higher, leading to more M.2 storage hardware failures.
In fact, considering how hot these things get, manufacturers should be providing proper heat sinks, with specific instructions to keep them nice and cool. The most that I have seen done is by Samsung, which puts an aluminum sticker on its NVMe M.2 drives to spread some of the heat. What’s crazy is that you are not supposed to remove the sticker, or it will void the warranty. This means that removing the sticker and installing a heat sink to cool the device is not really an option for most people who want to keep that 3 to 5 year warranty. To keep the heat under control, Samsung and other manufacturers came up with ways to throttle M.2 NVMe SSD drives when they get too hot, which can reduce their performance enormously, stripping away their primary appeal as ultra-fast storage devices.
I found out about the heating issues before buying my first M.2 NVMe PCIe SSD drive (which was the 512 GB Samsung 950 Pro). I purchased it with a heat sink, but after some consideration, I decided against removing the label and voiding the warranty. Instead, I went ahead and placed my unit in an open area of my large tower case with good airflow, making sure that air from my case fan directly reaches the unit and cools it off. The device worked well, but under heavy loads it would still heat up and throttle. To reduce the chance of any potential failure and resulting data loss, I decided to only store my operating system on the drive and use it for caching and my Lightroom catalogs, which get frequently backed up anyway. I did the same with two other M.2 NVMe PCIe SSD drives, although they were the newer Samsung 960 Pro versions.
One of the drives failed after about 20 months, resulting in total data loss. Aside from baking my drive in an oven (believe it or not, that method does work in some cases), I tried a number of different tricks to revive it, but to no avail. The hard drive light lit up each time anything tried to access the drive, and it would go on and off for hours. Since I had another 3 years of warranty on the device, I decided to send it in and just get it replaced, as I didn’t care about the data I lost on it. However, it did leave a bad taste in my mouth, and I thought about how horrible it would have been to lose that data had it not been backed up. There are plenty of data recovery options for spinning drives: labs that specialize in hard drive data recovery can often do it at a reasonable cost (as little as $300 per drive). But recovering data from these new SSD drives is going to cost you, since not many labs even have the right equipment to do it. This is why backing up your data is critical!
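Since these drives can heat up and fail with little warning, it can help to check the drive's own health log from time to time. The sketch below is one possible approach, not a definitive tool: it assumes you have saved a JSON report from smartmontools (for example with `smartctl -a -j /dev/nvme0 > report.json`, where the device path is an assumption for your system) and that the report contains the `nvme_smart_health_information_log` section. The function name and the 70 °C warning level are hypothetical choices for illustration; check your drive's datasheet for its real limits.

```python
from __future__ import annotations

import json

# Hypothetical warning level for illustration only; not a manufacturer figure.
WARN_TEMP_C = 70


def assess_nvme_health(report: dict, warn_temp_c: int = WARN_TEMP_C) -> list[str]:
    """Return human-readable warnings found in a parsed smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log")
    if log is None:
        return ["no NVMe health log found in report"]

    warnings = []
    # A non-zero value means the controller itself has raised a warning flag.
    if log.get("critical_warning", 0) != 0:
        warnings.append("controller reports a critical warning")

    temp = log.get("temperature")
    if temp is not None and temp >= warn_temp_c:
        warnings.append(f"temperature {temp} C is at or above {warn_temp_c} C")

    # Spare blocks at or below the drive's own threshold suggest wear-out.
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare <= threshold:
        warnings.append(f"available spare {spare}% is at or below threshold {threshold}%")

    if log.get("percentage_used", 0) >= 100:
        warnings.append("drive has reached its rated endurance")

    if log.get("media_errors", 0) > 0:
        warnings.append(f"{log['media_errors']} media errors recorded")

    return warnings


def assess_report_file(path: str) -> list[str]:
    """Load a saved smartctl JSON report from disk and assess it."""
    with open(path, "r", encoding="utf-8") as handle:
        return assess_nvme_health(json.load(handle))
```

Calling `assess_report_file("report.json")` on a schedule and alerting on a non-empty result would give at least some notice of a drive that is running hot or wearing out, although no health check replaces a backup.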
Back Up More Frequently
If you decide to use any SSD drives in your computer build (and you should, since they are extremely fast), no matter what form factor, you should plan for a very thorough and working backup solution. You must make it an essential part of your workflow – that’s how critical it is. I often come across photographers who only use their laptops for their post-processing needs, and it is shocking to see how many photographers out there never back anything up and only store their catalogs and photos on the same drive (which is often an SSD drive). SSD drives are more reliable than spinning drives in the long run and they might not get damaged when they are dropped, but it does not mean that they are fail-safe. Another good lesson I learned from my IT career: every type of storage fails, it is just a matter of time! Don’t be fooled into thinking that your SSD drive will never fail you – always back up your data, and make sure to back up often.
Avoid Storing Critical Data on M.2 Drives
If you cannot commit to a proper backup routine for any reason, simply avoid storing any critical data on M.2 NVMe SSD drives. Keep your RAW files in a dedicated external storage array, such as the Synology DS1817+ or a QNAP TVS-882T, and automate your backup process by copying data from those storage arrays directly to another external drive (very easy and quick to set up). If those are too expensive for your needs, you can buy smaller and cheaper storage solutions that will not only address your immediate storage needs, but also protect you from losing data in the future. That’s what I do, and after experiencing the overheating issues pointed out above, I decided to follow the same methodology on all computers I have in my environment.
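For readers who do not have a storage array that handles this for them, the idea of an automated, verified copy to another drive can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling I use: the function names `sha256_of` and `backup_tree` are hypothetical, and the script only copies new or changed files and confirms each copy by comparing SHA-256 checksums. It does not handle deletions, versioning or scheduling.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RAW files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source: Path, destination: Path) -> list[Path]:
    """Copy new or changed files to destination and return the files copied."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        src_hash = sha256_of(src_file)
        # Skip files whose backup copy already matches the original.
        if dst_file.exists() and sha256_of(dst_file) == src_hash:
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        # Re-read the copy to confirm it was written correctly.
        if sha256_of(dst_file) != src_hash:
            raise OSError(f"verification failed for {dst_file}")
        copied.append(dst_file)
    return copied
```

Something like `backup_tree(Path("D:/Photos"), Path("E:/PhotoBackup"))` (both paths are placeholders) could be run from a scheduled task. The checksum step matters because a copy that cannot be read back correctly is not a backup.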
Lastly, please don’t assume that a RAID array, such as RAID 5 or RAID 6, is a backup of your data! Losing a RAID volume due to power failures, multiple drive failures and other hardware-related failures can be very painful and can cost thousands of dollars to recover. Always make sure to follow the 3-2-1 backup rule, as highlighted in my Photography Backup Workflow article.
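To make the 3-2-1 rule concrete (at least three copies of your data, on at least two different types of media, with at least one copy kept off-site), here is a small sketch that checks a backup plan against it. The `DataCopy` class, the `satisfies_3_2_1` function and the example labels are all hypothetical names made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    label: str     # free-form description of where the copy lives
    media: str     # type of media, e.g. "nvme", "nas", "cloud"
    offsite: bool  # True if the copy is kept at a different location


def satisfies_3_2_1(copies: list[DataCopy]) -> bool:
    """Check for 3+ copies, 2+ media types, and 1+ off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


# A RAID array still counts as a single copy: it is one volume in one place.
raid_only = [DataCopy("RAID 6 volume", "nas", offsite=False)]

full_plan = [
    DataCopy("working drive", "nvme", offsite=False),
    DataCopy("storage array", "nas", offsite=False),
    DataCopy("drive kept at the office", "external-hdd", offsite=True),
]
```

Here `satisfies_3_2_1(raid_only)` is false and `satisfies_3_2_1(full_plan)` is true, which is the point of the warning above: redundancy inside one array does not add copies.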
Do you use M.2 NVMe PCIe SSD drives in your environment? If you do, please let me know in the comments below if you have experienced any overheating problems or failures. Also, if you are guilty of not backing up your data, feel free to confess and repent – we are not here to make you feel bad, but rather to help you not lose any data :)
I built up a beefy box with an Intel 9900x specifically for Lightroom and Photoshop processing a couple of years ago. I installed a Samsung Pro NVMe drive as my primary drive and have had no issues at all. Having built PCs for over 30 years, I never skimp on basic components like cases and power supplies. Because many components have been running hotter over the years, my last two builds have been in Silverstone cases, which are bulky because they have A LOT of cooling fans and a lot of space around components for air circulation. With that and beefy power conditioning, the last component failure I had was more than 15 years ago. I also use monitoring software, including CrystalDiskInfo for physical storage devices, and I could see right out of the box that the NVMe drive was running hot. So I simply installed an NVMe cooling fan, and it always runs well within its recommended temperature range. If you spend a lot of money on building up a powerful box for photo editing and don’t have proper cooling, then besides the failure risk, it likely won’t achieve peak performance under heavy loads. This is because current high-performance CPUs and NVMe drives automatically slow down once temps get too high, to avoid over-temperature failures.
Yeah, backups are really important but trying to mitigate overheating failures with a backup strategy is like changing the spark plugs to avoid getting a flat tire.
Love this article. I had an NVMe go after 2 years too. I feel board makers putting M.2 slots right near the CPU socket is just plain daft; it’s potentially one of the hottest parts of the motherboard!
Nasim & authors/readers,
A 13-month-old desktop computer of mine died unexpectedly 3 weeks ago. It contains a fast NVMe M.2 SSD, configured as the C: boot drive with Windows 10. The box had been an excellent choice until the failure. With a two-user configuration and moderate daily use, it usually woke up in 30 seconds, and it was always a pleasure to use.
The failure was a sudden black screen (while I was watching a YouTube video). Windows became unavailable. A few subsequent restarts ended quickly in a light blue screen without any visible sign of Windows starting up. After that, every further power-on displayed the BIOS screen. Checking the boot information there, we found a “No NVMe device found” message.
The NVMe device held drive C:, where Windows and some applications were installed. All user data was stored on other drives.
The box had been in service for a year and survived one hot summer without any problems. At the time of the failure, the room temperature was 20 °C at most. I only recently learned that voltage drops can cause NVMe drives to break down. On rare occasions we do get such voltage drops in our environment, and I have not yet invested in a power stabilizer. Given this background, I suspect the failure was triggered by a drop in electric power. I am almost sure that overheating did not cause this breakdown.
Is anyone aware of a more sophisticated approach for checking or diagnosing the current health status of an NVMe drive? Thank you in advance.
Kind regards,
Peter Füzi (Hungary, Europe)
I have a GIGABYTE PCIe 256GB SSD in my LENOVO 330S laptop; after 18 months of use, the PC no longer recognizes it. After reading this post, I now believe the drive crashed after 2 days of continuous use because of the heat.
Unfortunately, I have lost the warranty (it was just six months). Avoiding heat inside a laptop is difficult. What can I do if I purchase another SSD?
I recommend putting your laptop on top of a laptop cooler.
You should know that for data sitting in storage you want the NAND to be cold, but for writes the NAND needs to be warm, and cutting the temperature will massively reduce its life.
Watch this video from 6:15 onward.
www.youtube.com/watch…zSIfxHppPY
I have an INTEL 1TB SSD that just became “invisible” to my PC–after an update from GIGABYTE. I tried re-seating the drive–but to no avail.
It would’ve been nice to have some warning before the drive decided to go AWOL on me!
I had an SSD drive in a laptop fail on me this year and a lab said it was unrecoverable.
Now, in a VESA desktop, the NVME drive is having overheating issues and not being recognized.
My new M.2 looks like it died after a month of working perfectly. I had Windows on it. It’s still under warranty.
I’ve been running NVMe SSDs of several brands, beginning with the 512GB Samsung 950 PRO, then the MyDigitalBPX (Phison7 Controller) and MyDigitalBPX PRO (Phison12 Controller) & lastly, I upgraded the 950 PRO to a 970 PRO of the same size in my Z97 Extreme6 build. The others were in either other Z97 or AMD 990 builds (FX-8350/8370).
Here’s a little secret I learned before installing the first one…..heat has been a long term issue with these NVMe drives & many were used in Linux builds long before Windows users (on the consumer level) knew of them.
The trick is simple: if there’s a 2nd PCIe 3.0 x4 (or 2.0 with older AMD) GPU slot, all that’s needed is a Sintech PCIe 3.0 x4 adapter with fan, sold on Amazon for less than $20. Sure, the fan is cheap, has a rifle bearing, and will begin to squeal for a minute at boot after a month. However, there are plenty of double ball bearing fans of the same size in a 4 pack for less than $15 shipped on eBay. So for less than $35 (after eventual fan replacement), one has the option to keep the NVMe drive nice & cool, and there’s an adjustable knob for speed, from off to wide open. I run mine wide open & have never had to replace a fan after the first time.
More important, have never lost a drive due to heat. The MyDigitalBPX (Phison7 based) runs near twice as hot as a Samsung 950 PRO & this adapter with fan still keeps these under 55C or lower, often under 40C with Samsung NVMe drives.
As far as SSDs ‘sitting’ for long periods between boots, I’ve had no issues with booting a couple stored for over a year & one over two years. They ran fine, although I did run the Intel SSD Optimizer or the Samsung equivalent (which is why I saved the older Samsung Magician installers).
BTW, I also use Noctua fans, even in my Dell machines; there’s an adapter to convert the proprietary connector to fit 4 pin PWM fan headers. What I do, if the option is in the BIOS, is place a check in the override of automatic fan control, which makes these run at 100% (I just have to strike ‘F1’ at boot). I have an XPS 8700 where I had to use SpeedFan instead.
While the above does notebook users little to no good, especially photographers who need these on the job, I’d hope that they would back up any work performed daily. I have 6 2TB RE4s that have served me well as backup drives & many of the 1TB size for data in PCs. I just had to use the little wdidle3 /d trick to stop the head parking every 8 seconds, which causes more wear & tear on the drives than anything. Granted, some leave the timer on; it can be set to every 300 seconds with the small FreeDOS boot disk. Again, I have never had a single drive fail, and I own over 20 RE4s.
Thought I’d give my two cents on the matter. The good thing with the newer PCIe 4.0 NVMe drives is that many ship with heatsinks. Hopefully Samsung will follow suit when they release the 980 PRO/EVO series, which will surely be PCIe 4.0.
Cat
nvme slots are often on the BACK side of the motherboard. PCI fans are not gonna help there. Many cases don’t even have grills back there for cooling.
I’ve had 2 new NVMe drives go literally back to back in external aluminium enclosures with thermal strips for contact with the case. Really disappointed: I got myself heavily invested in them for speed, but now I see I must retreat to spinning drives and good ole USB sticks.
Mind you, I then inserted an el cheapo NVMe drive by Pioneer that took the data, generated some heat, but didn’t crap out. Somebody needs lynching…
The best way to handle the situation is by not using the product. No one “needs” lynching!
Wise after the event, though!