How To Solve SSD Longevity Challenges

There are lots of reasons to use SSD storage devices. SSD devices are lightning fast. You don’t have to wait for a drive to spin up its platters. You don’t have to wait for a read/write head to settle over the right physical location on the drive. [Note: This was a real problem with legacy disk storage – until EMC proved that cache is the best friend that any spinning media could have.] Today, solid state storage is demonstrably smaller than any mechanical storage media. But in the past few years, SSD longevity has become a serious concern.

Background

Many of us carry a few of these devices with us wherever we go. They are a very durable form of storage. You can drop a 250GB thumb drive (or SSD) into your pocket and be confident that your storage will be unaffected by the motion. If you did the same with a small hard disk, then you might find that data had been lost due to platter and/or R/W head damage.

Similarly, the speed, power, and thermal properties of these devices make them a fantastic inclusion in any mobile platform – whether it be a mobile phone, a tablet, or even a laptop. In fact, we just added SSD devices to a number of our systems. With these devices, we have exceptionally good multi-boot options at our disposal. For my personal system, I can boot to the laptop’s main drive (to run an Ubuntu 19.04 system) or I can boot to an external, USB-attached SSD drive where I have Qubes 4.0 installed.

Whether you want fast data transfer speeds, reduced power needs, or a reduced physical footprint, SSD storage is an excellent solution. But it is not without its own drawbacks.

Disadvantages

No good solution comes without a few drawbacks. SSD is no exception. And its two real drawbacks are cost and longevity. The cost problems are real. But they are diminishing over time. As more phones ship with additional storage (e.g., 128GB – 256GB of solid state storage on recent flagship phones), the chip manufacturers have responded with new fabrication facilities. Just as importantly, there is now substantial supply competition. And increased supply puts steady downward pressure on prices.

Even more importantly, device construction is becoming less complex. Manufacturers can stuff an enclosure with power, thermal flow controls, media, rotational controls (e.g., stepper motors, servos), and an assortment of programmable circuits. Or they can just put power and circuits onto a chip (or chip array). For things like laptops, this design streamlining is allowing vendors to swap spinning platters for additional antenna arrays. The result is inevitable: manufacturing is less complex, integration (and testing) costs are lower, and the unit costs of manufacturing are declining.

Taken together, increased supply and decreased unit costs have bent the cost curve. So SSD is an evolutionary technology that is rapidly displacing spinning media. But there is still one key disadvantage: SSD longevity.

SSD Longevity

In the late eighties and early nineties, the floppy disk was displaced by optical media: the floppy (and its rigid-shelled successor) gave way to the CD-ROM. In the late nineties, the CD-ROM gave way to the DVD-ROM. But in both of these transitions, the successor technology had superior durability and longevity. That is not the case for SSD storage. If you were to treat an EEPROM like a CD-ROM or DVD-ROM – writing it once and only reading it thereafter – it would probably last for 10+ years. But the cost per write would be immense.

Due to its current costs, no one is using SSD devices for WORM (write once, read many) storage. These devices are just too costly to be written as an analog to tape storage. Instead, SSDs are being used for re-writable storage. And this is where the real issue arises. As you re-write data (via electrical erasure and a new write), the specific physical location in the chip becomes somewhat unstable. After numerous cycles, this location can become unusable. So manufacturers now publish the number of program/erase cycles (i.e., P/E cycles) that their devices are rated to deliver.
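To make that rating concrete, here is a back-of-the-envelope endurance estimate in Python. It is only a sketch: the capacity, P/E rating, write amplification factor, and daily write volume are all illustrative assumptions, so substitute the figures from your own drive’s data sheet.

```python
# Back-of-the-envelope SSD endurance estimate.
# Every number below is an illustrative assumption -- replace them
# with your drive's rated values before drawing any conclusions.

capacity_gb = 500            # usable capacity of the drive
pe_cycles = 3_000            # rated program/erase cycles (typical for TLC NAND)
write_amplification = 2.0    # physical writes per logical write (controller overhead)
daily_writes_gb = 50         # how much data you actually write per day

# Total logical data the drive can absorb before reaching its rated wear-out.
total_writes_gb = capacity_gb * pe_cycles / write_amplification

years = total_writes_gb / daily_writes_gb / 365
print(f"Rated endurance: ~{total_writes_gb / 1024:.0f} TB written")
print(f"At {daily_writes_gb} GB/day, that is roughly {years:.0f} years of service")
```

Under these assumptions the drive lasts for decades. But as the next section argues, the averages are not the point – the unlucky case is.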

But is there a real risk of exhausting the re-write potential of your SSD device? Yes, there is. And with every new generation of chips, the probability of failure is declining. Nevertheless, probabilities are not your biggest concern. Most CIOs should be concerned with risk. If your data is critical, then the risk is real – regardless of the probability of failure.

Technology Is Not The Answer

Most technologists focus on technology. Most CIOs focus on cost / benefit or risk / reward. While scientific and engineering advances will decrease the probability of SSD failure, these advances won’t really affect the costs (and risks) associated with an inevitable failure. So the only real solutions are ones that mitigate a failure and minimize the cost of recovery. When a failure occurs (and it will occur), how will you recover your data?

Bypass The Problem

One of the simplest things that you can do is to limit the use of your SSD devices. That may sound strange. But consider this. When a cell fails, your drive’s controller (or your OS and device drivers) will mark that “sector” as bad and write the data to an alternate location. If such a location exists, then you continue ahead without incurring any real impact.

The practical upshot is that you should limit how much data is written to the device, ensuring that there is always ample spare space for remapping data to a known “good” sector. Personally, I’m risk averse. So I usually recommend that you limit SSD usage to ~50% of total space. Some people will recommend leaving only ~30% unused. But I would only recommend that smaller reserve if your SSD device is rated for higher P/E cycles.
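If you would rather enforce that rule than remember it, a few lines of Python will do. This is a minimal sketch: the mount point and the 50% threshold are assumptions that you should adapt to your own system.

```python
import shutil

# Warn when an SSD-backed volume passes a chosen utilization threshold.
# Both values below are assumptions: point MOUNT_POINT at your SSD's
# filesystem and pick the threshold that matches your risk appetite.
MOUNT_POINT = "/"
THRESHOLD = 0.50   # keep ~half the drive free for remapping headroom

usage = shutil.disk_usage(MOUNT_POINT)
used_fraction = usage.used / usage.total

if used_fraction > THRESHOLD:
    print(f"WARNING: {MOUNT_POINT} is {used_fraction:.0%} full "
          f"(target is under {THRESHOLD:.0%})")
else:
    print(f"OK: {MOUNT_POINT} is {used_fraction:.0%} full")
```

Run it from cron (or a systemd timer) and you will hear about the problem before the drive does.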

Data Backup and Recovery Processes

For most people and most organizations, it takes a lot to recover from a failure. And this is true because most organizations do not have a comprehensive backup and recovery program in place. In case of an SSD failure, you need to have good backups. And you should continue to perform these backups until the cost of making backups exceeds the cost of recovering from a failure.
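That cost comparison is simple expected-value arithmetic. The figures below are purely illustrative assumptions – plug in your own backup costs, failure probability, and recovery estimate.

```python
# Toy expected-cost comparison: keep backing up while the annual cost
# of backups stays below the expected annual cost of recovering
# without them. All three figures are illustrative assumptions.

annual_backup_cost = 600          # media, storage, and staff time per year
failure_probability = 0.05        # chance of losing the drive in a given year
recovery_cost_no_backup = 25_000  # reconstruction, downtime, lost business

expected_loss = failure_probability * recovery_cost_no_backup

if annual_backup_cost < expected_loss:
    print(f"Keep backing up: ${annual_backup_cost:,} < expected loss ${expected_loss:,.0f}")
else:
    print(f"Reassess: ${annual_backup_cost:,} exceeds expected loss ${expected_loss:,.0f}")
```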

For a homeowner who has a bunch of Raspberry Pis running control systems, the cost of doing backups is minimal. You should have good backups for every control system that you operate. For our customers, we recommend that routine backups be conducted for every instance of Home Assistant, openHAB, and any other control system that the customer operates.
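For a Pi-based control system, such a backup can be as simple as a timestamped archive of the configuration directory. Here is a minimal sketch; the source and destination paths are hypothetical, and the destination should sit on a different physical device (an external drive or NAS mount).

```python
import tarfile
import time
from pathlib import Path

# Minimal nightly backup sketch for a Pi-based control system.
# Both paths are hypothetical -- point SOURCE at your own config
# directory (e.g., Home Assistant's) and DEST at another device.
SOURCE = Path("/home/pi/homeassistant")
DEST = Path("/mnt/backup")

stamp = time.strftime("%Y%m%d-%H%M%S")
archive = DEST / f"homeassistant-{stamp}.tar.gz"

with tarfile.open(archive, "w:gz") as tar:
    tar.add(SOURCE, arcname=SOURCE.name)

print(f"Wrote {archive}")
```

Schedule it nightly, prune old archives, and (most importantly) test a restore once in a while.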

For small businesses, we recommend that backup and recovery services be negotiated into any management contract that you have with technology providers. If you have no such contracts, then you must make sure that your “in-house” IT professionals take the job of backup and recovery very seriously.

Of course, we also recommend that there be appropriate asset management, change management, and configuration management protocols in place. While not necessary in a home, these are essential for any and all businesses.

Bottom Line

SSD devices will be part of your IT arsenal. In fact, they probably already are a part of your portfolio – whether you know it or not. And while SSD devices are becoming less costly and more ubiquitous, they are not the same as HDD technology. Their advantages come at a cost: SSD longevity. SSD devices have a higher probability of wear-related failure than do already-established storage technologies. Therefore, make sure that you have processes in place to minimize the impact of failures and to minimize the cost of conducting a recovery.

Disintegration and Compartmentalization: Necessary Best Practices

[Image: Safety deposit boxes in a bank safe.]

Several months ago, I wrote about my never-ending privacy story. Since then, I’ve given numerous presentations about security and personal privacy. In one of those presentations, I talked about how using personal clouds (e.g., Nextcloud) could limit your exposure to those who offer you their “free” services in exchange for your personal data. But there has always been an elephant in the room. Specifically, most of us – myself included – want a simple and easy desktop experience. And most people will trade almost anything for that experience. But those carefree times where everything is “free” and everything is “safe” are now disappearing. So to kick my privacy efforts up another notch, I’ve begun the process of online compartmentalization.

As you read that word, many of you might be thinking about the psychological consequences of compartmentalizing your life. And almost every psychologist will tell you that breaking your life down into smaller fragments separated by impenetrable walls can be unhealthy. These self-imposed walls separate your family life from your work life and your faith life. Some people keep all sorts of separate personalities locked up in secure closets. And this can be a terrible burden.

But when it comes to privacy and security, you can no longer afford to keep all of your eggs in one basket. In fact, compartmentalization is now becoming an altogether mandatory part of a “connected” life. You should not let data from your home life be accessible to actors in your work life. And it would be wise to dis-integrate your work life from your home life.

The Technologies of Disintegration

In order to protect the integrity of the various roles in your life, you need to isolate data. But that is increasingly difficult. For example, most businesses ask you to be “on call” twenty-four hours a day, seven days a week. But they don’t want to pay for a separate phone. And they want to ensure that any personal equipment does not exfiltrate company data and/or intellectual property. So most companies reserve the right to access all of your phone’s capabilities (and data) in order to protect any of their data which might be on the phone.

You can easily see the problems with this example. If you are considering alternate employment, it might be unwise to let your current employer have unfettered access to your email and instant messages with potential future employers. Fortunately, there are technologies that can help you build the walls that you might want (or need). These include: virtualization, containers, and secure cloud services.

Step One: Use Application Virtualization

We are victims of a culture that shares way too much information. For many of us, we willingly share data with companies that we shouldn’t trust. We do this so that we can share even more personal data with friends who really aren’t our friends.

And we count upon our applications to enable this kind of sharing. We unconsciously (and indiscriminately) copy and paste data between apps. Of course, this allows bad actors to exploit data sharing as a channel for data exfiltration or data corruption.

But if we want to protect ourselves, we need to erect barriers between apps. And the latest means of erecting such barriers is to exploit containers. Whether we use snap or flatpak, we are adding an execution layer that seeks to impose barriers. And the same thing is true for the other darling of micro-services: Docker. Like the app management tools provided by Linux distro teams, the folks at Docker are trying to standardize application execution and enable application isolation.
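As a concrete illustration, here is a minimal sketch using Docker’s Python SDK that starts an app on its own private bridge network so that it cannot see your other containers. The image and the names are illustrative assumptions, not a prescription.

```python
import docker  # pip install docker; assumes a running Docker daemon

# Minimal isolation sketch: a dedicated bridge network acts as the
# "wall" between compartments. Image and names are placeholders.
client = docker.from_env()

client.networks.create("webapp-net", driver="bridge")

container = client.containers.run(
    "nginx:alpine",          # stand-in for whatever app you are fencing off
    name="webapp-frontend",
    network="webapp-net",    # only members of this network can reach it
    detach=True,
)
print(f"{container.name} is running on its own network")
```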

Among other activities this summer, I’ve invested quite a bit of personal time into Docker, docker-compose, and a variety of support apps. And I now use Docker for Plex, Let’s Encrypt, most web servers (and proxies), the TICK stack (i.e., Telegraf, InfluxDB, Chronograf, and Kapacitor), and a variety of home automation applications.

Step Two: Use A Secure OS

Nevertheless, sometimes you need more than just a good application manager. In order to effectively use compartmentalization as a defense, you need to get onto a more secure OS. Most security experts will tell you that there are many platforms that are intrinsically more secure than Windows. Yes, you can harden Windows. I know. I’ve done it for myself and for others. At the same time, you need to use a platform that is not built by someone who makes money off of your identity (e.g., Apple).

Earlier this summer, I finally switched to a Linux-only infrastructure. All of my Windows servers are gone. And all of my Windows desktops are now Linux desktops. I have rooted all of the phones that I can and replaced their OS with one that is no longer dependent upon Google services.

Step Three: Use System Virtualization

While you may run your apps in virtual environments and/or containers, you probably need more compartmentalization. Yes, you should isolate your apps. But you also need to isolate systems from one another. Indeed, there are times when you need more than just a secure app. You need a secure stack.

Over the past few months, I’ve started using virtual machines to isolate applications that are accessible from the Internet. I do this so that I can minimize the damage that can be done from any single app to the OS that it runs upon. By adding system isolation in addition to app isolation, I have increased the security and availability of my customer applications.
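If those virtual machines run under libvirt/KVM, a short audit script can confirm that each internet-facing service is still confined to its own running VM. This is just a sketch under that assumption; the VM names are hypothetical placeholders.

```python
import libvirt  # pip install libvirt-python; assumes a libvirt/KVM host

# Quick audit sketch: verify that each internet-facing service still
# lives in its own running VM. The VM names are hypothetical.
EXPECTED_VMS = ["web-proxy", "mail-gateway", "nextcloud"]

conn = libvirt.open("qemu:///system")
running = {dom.name() for dom in conn.listAllDomains() if dom.isActive()}

for vm in EXPECTED_VMS:
    state = "running" if vm in running else "MISSING or stopped"
    print(f"{vm}: {state}")

conn.close()
```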

Step Four: Use The Most Secure Platform That You Can Afford

All of us can be more secure. But for some of us, the cost of maximum security must be paid – either in coin of the realm or in tokens of inconvenience. For me, my most important resource is my time. So I carefully choose each and every experiment that I undertake. And this past weekend, I finally chose to take the leap – and I finally added Qubes OS 4.0 to my core laptop.

The process of moving to Qubes was frustrating. I had just reclaimed a 500GB external SSD. And it took about four (4) hours to get Qubes installed. It’s really not that hard. But special partitioning and formatting were required in order to write to the drive. In the end, I had to write the boot image onto a raw partition on a thumb drive. I then had to update grub on my internal drive so that I could multi-boot. Finally, I re-partitioned the SSD and wrote Qubes to the external drive. After completing the installation, I can now boot to either my internal Ubuntu 19.04 system or to my Qubes OS 4.0 system.

Step Five: Consciously Choose Your Threshold of Inconvenience

I must now learn how to use my “reasonably secure OS” to perform my day-to-day activities. Last night, I spent a few hours setting up my entertainment / streaming apps. [Note: Yes, they are important. I really do like to listen to music as I write.] And for what it’s worth, I am now writing this post from my Qubes OS system. It took some time to set up NoScript properly. But once I did that, I’ve had few problems writing this blog post.

Alright, that’s not altogether true. The simple act of sharing files between domains is a tad more complex. For example, when taking a screenshot of the entire desktop, the file is stored in the dom0 (i.e., the master domain) file system. So I had to learn how to copy files to/from dom0. But once I figured that out, I realized that the process isn’t nearly as hard as it had originally seemed.
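For the curious, here is a tiny dom0 helper sketch built around Qubes’ own qvm-copy-to-vm tool, which drops files into the target VM’s ~/QubesIncoming/dom0/ directory. The VM name and file path are assumptions; run it from a dom0 terminal.

```python
import subprocess
import sys

# dom0 helper sketch: push a file (e.g., a screenshot) from dom0 into
# an AppVM via Qubes' qvm-copy-to-vm. Arguments are supplied by you.
def push_to_vm(vm_name: str, path: str) -> None:
    """Copy `path` from dom0 into vm_name's ~/QubesIncoming/dom0/."""
    subprocess.run(["qvm-copy-to-vm", vm_name, path], check=True)

if __name__ == "__main__":
    # e.g.: python3 push.py personal ~/Pictures/screenshot.png
    push_to_vm(sys.argv[1], sys.argv[2])
```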

Takeaways

I’ve finally addressed some structural insecurities in how I use my computers – both at work and at home.

  • We moved to a Linux-based system.
  • The team migrated to containers both for casual (desktop) apps and for more service-oriented applications.
  • Our IT team moved key services onto virtual machines that could be isolated from less disciplined processes.
  • Finally, I converted my primary laptop to an even more secure OS (i.e., Qubes OS) – one that features compartmentalization and maximum isolation.

Do you need to do all of these things? I won’t answer that for you. But as for myself, I needed to become more secure. So I took those steps that I needed to take in order to become safer and to secure my private life from public scrutiny.