Software RAID on Ubuntu 20.04

Introduction

If you work with data that outgrows a typical laptop, building a software RAID can be an easy and inexpensive solution. With the rise of data science and machine learning, processing large data sets with Pandas, NumPy, Spark, or even SQLite benefits greatly from fast and resilient storage.

RAID (Redundant Array of Inexpensive Disks) is useful for achieving high performance and for recovering from disk failures. There are many types of configurations, including hardware RAID, "fake" hardware RAID (where the work is actually done in software by the firmware driver), and pure software RAID. RAID10 (mirrored and striped) provides high performance and redundancy against a single disk failure. RAID10 writes each block to a pair of mirrored disks and, in the 4-disk setup described below, stripes the volume across a second mirrored pair. This allows reading and writing across all four disks and can survive up to two disk failures without losing data, provided the two failures land in different mirrored pairs. In the worst case, RAID10 can always suffer a single disk failure without losing any data.
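
For a quick check of this geometry once an array like the one assembled below exists, mdadm reports the mirrored-and-striped layout directly; expect a raid10 level, a near=2 layout (two copies of every block), and 4 raid devices (the /dev/md0 name comes from the steps later in this guide):

sudo mdadm --detail /dev/md0 | grep -E 'Raid Level|Layout|Raid Devices'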

Another popular approach is RAID5, which stripes data across the devices and stores a parity block computed with an exclusive OR (XOR). RAID5 setups are common but have some pitfalls, including compounding failures during rebuilds and parity-computation overhead. A RAID5 array loses data if two disks fail.
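
As a toy illustration of the parity idea (not part of the setup below, and using made-up byte values), XOR lets a missing data block be rebuilt from the remaining block plus the parity:

d1=0xA7; d2=0x3C                                     # two data "blocks"
parity=$(( d1 ^ d2 ))                                # parity stored on a third disk
printf 'recovered d1: 0x%02X\n' $(( parity ^ d2 ))   # prints 0xA7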

Rationale

Development with medium-sized data can get expensive quickly on cloud providers. Using Amazon Web Services Elastic Block Store (AWS EBS), a comparable setup would run about $160/mo, not including the additional charges for EC2 instances, network transit, and so on. AWS does price in additional enterprise and data-center capabilities; however, for many developer workflows a local setup is often more efficient and performant.

There are many choices and constraints to take into consideration with both architectures. AWS includes only 125MB/s of throughput by default in the gp3 tier for durable storage, which is considerably slower than a local commodity RAID setup and incurs additional latency on every I/O operation. On the other hand, Samsung rates the 860 EVO 1TB for a 600 TBW lifecycle (5-year warranty); in practice most workloads will never come close to using that, but it is an engineering trade-off to take into account.

Saving $1,420 in the first year alone ($1,920/yr each additional year) leaves room in the budget for more powerful compute and graphics components, and that is before any additional cloud pricing premiums are factored in.
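
The savings figure follows directly from the numbers above, using the ~$500 hardware total below:

echo $(( 160 * 12 ))           # 1920 -> yearly EBS cost
echo $(( 160 * 12 - 500 ))     # 1420 -> first-year savings after buying the hardware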

Hardware

4 x Samsung 860 EVO 1TB SATA SSDs
1 x Marvell 9215 4-port PCIe SATA controller

TOTAL: ~$500

Configuration

Identify the target disks to be used for the RAID using blkid or fdisk -l:

sudo fdisk -l
Disk /dev/sdb: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sde: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdc: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdd: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Create partitions for each disk:

for disk in /dev/sd{b,c,d,e}; do
  sudo parted -a optimal "$disk" --script mklabel gpt
  sudo parted -a optimal "$disk" --script mkpart primary ext4 0% 100%
done
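
As an optional check before building the array, lsblk should now show a single ~931.5G partition on each disk (sdb1 through sde1):

lsblk /dev/sd{b,c,d,e}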

Create a RAID10 software raid:

sudo apt install mdadm
sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/sd{b,c,d,e}1
sudo watch -n 10 mdadm --detail /dev/md0
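
The initial sync can also be followed through /proc/mdstat, which shows a progress bar and an estimated finish time:

watch -n 10 cat /proc/mdstat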

Once md0 has finished syncing, set up a filesystem:

sudo mkfs.ext4 /dev/md0
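
mkfs.ext4 normally detects the RAID geometry on its own, but as an alternative the stripe hints can be passed explicitly; the values below assume mdadm's default 512K chunk, a 4K ext4 block size, and the two data-bearing disks of this 4-disk RAID10 (stride = 512K / 4K = 128, stripe-width = 128 * 2 = 256):

sudo mkfs.ext4 -E stride=128,stripe-width=256 /dev/md0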

Add entries to /etc/mdadm/mdadm.conf and /etc/fstab, which tell the system how to assemble the RAID and where to mount the filesystem:

sudo mkdir /mnt/md0
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
echo "/dev/md0     /mnt/md0     ext4 defaults 0 0" | sudo tee -a /etc/fstab

To make the array available early in boot, before the root filesystem is mounted, update the initial ramdisk to include the md module and the new configuration:

sudo update-initramfs -u
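
To verify that the new configuration actually made it into the initramfs (the image path assumes the currently running kernel):

lsinitramfs /boot/initrd.img-$(uname -r) | grep mdadm.conf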

SSD TRIM Support

By default, Ubuntu 20.04 ships with a periodic fstrim.timer, which discards blocks that are no longer in use and keeps SSD performance optimal:

# systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Wed 2021-03-17 11:30:39 MDT; 4h 1min ago
    Trigger: Mon 2021-03-22 00:00:00 MDT; 4 days left
   Triggers: ● fstrim.service
       Docs: man:fstrim

Mar 17 11:30:39 dxdt systemd[1]: Started Discard unused blocks once a week.
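
A trim pass can also be run by hand to confirm the array passes discards through to the SSDs:

sudo fstrim -v /mnt/md0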

Performance

In this particular configuration we now have 1.8T of capacity, and even with budget disks the performance is decent (388.88MB/sec using hdparm -tT):

# sudo hdparm -tT /dev/md0
/dev/md0:
 Timing cached reads:   38396 MB in  1.99 seconds = 19316.93 MB/sec
 Timing buffered disk reads: 1168 MB in  3.00 seconds = 388.88 MB/sec

# df -h /mnt/md0
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        1.8T   77M  1.7T   1% /mnt/md0

hdparm is not a precise benchmark. After running tests using bonnie++, md0 delivered 179MB/s writes and 392MB/s reads for the sequential tests.
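
For reference, a sequential bonnie++ test can be reproduced along these lines (the exact flags behind the numbers above are an assumption; bonnie++ needs a writable directory on the array):

sudo apt install bonnie++
sudo mkdir -p /mnt/md0/bench && sudo chown $USER /mnt/md0/bench
bonnie++ -d /mnt/md0/bench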

The limiting factor is the Marvell 9215 PCIe 2.0 card, which tops out at roughly 380-450MB/s. Using a higher-end SATA controller will likely yield improved results.

Conclusion

After creating the partitions, assembling the RAID, letting it sync across the disks, creating a filesystem, and updating /etc to reflect the changes, the array will now persist across restarts. Ubuntu takes care of periodically discarding unused blocks, and mdadm can be used to check the health of the RAID device. This setup will likely exceed most commodity cloud storage offerings, with lower latency and higher throughput and IOPS.
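
Checking the health of the array later on comes down to:

sudo mdadm --detail /dev/md0
cat /proc/mdstat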