So this last week I spun up ceph on my 3 hosts, and I'm pretty impressed by it. I've moved all my container workloads to store their data on it, so fingers crossed I don't have any serious problems (because I'm still too green to know how to fix it).
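For anyone curious, the moving parts are roughly this - a sketch only, assuming CephFS and made-up paths (RBD would work just as well):

  ceph -s                        # overall health, want HEALTH_OK before trusting it with data
  ceph osd tree                  # confirm the OSDs on all three hosts are up
  ceph-fuse /mnt/cephfs          # mount CephFS (assumes the default ceph.conf and keyring)
  docker run -v /mnt/cephfs/appdata/myapp:/data ...   # then point container volumes at it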
Really interested to hear how you go with this. It's one of those things I knew was there but have never tried (partly because I didn't have the time to sort it out if it went sideways).
I was using glusterfs for a while, but for reasons I don't recall I ended up converting to plain old NFS for my persistent storage.
BTW: I came across your YouTube video on the "red LED" issue with QNAPs. I have a TS451+ that died from it, and a TS251+ living in my van that hasn't hit it yet.
While my QNAP actually runs TrueNAS, I didn't need to fix it to get the data off; I just moved the drives to a new QNAP 664, booted off the USB disk (which has TrueNAS on it) and away I went.
I did "fix" the 451+ (well it appears to fix) with the resister - I came across your youtube when I was researching what I needed to do - so most helpeful :) I dont use that 451 anymore, its in a box gathering dust :(
I still have that NAS - sometime I'll find an SBC I can put in the bottom of it to replace the factory board. I was looking at a Pi, but I'd only get PCIe x1 and not the x4 I need... I just can't bring myself to throw out what looks like a good candidate to be reincarnated as a new low-power NAS.
deon wrote to tassiebob <=-
Me too, I don't like throwing away stuff that could still have a useful life (to somebody, anyway).
Really interested to hear how you go with this. It's one of those things I knew was there but have never tried (partly because I didn't have the time to sort it out if it went sideways).
So this weekend I did some updates to the hosts running ceph (updating packages, etc.), and rebooted each host after its updates (one at a time).
While I didn't do much testing of whether things stayed accessible while a host was down, it all appeared to be OK - even though there was a delay, I guess while ceph figured out a node was down and shuffled around who was the next "master" to handle the I/O.
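For what it's worth, the usual drill to keep that pause short and avoid pointless rebalancing is to flag the cluster around each reboot - a rough sketch, from memory:

  ceph osd set noout       # don't mark the host's OSDs out while it's rebooting
  # ...update and reboot the host...
  ceph -s                  # wait for HEALTH_OK / all PGs active+clean
  ceph osd unset noout     # then move on to the next host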
Pretty happy with this setup - I was previously using a proprietary file system, which I had to nurse if I rebooted nodes - and occasionally drives would go offline, especially if there was busy I/O going on (all three nodes are VMs on the same host).
So the only thing I need to figure out (learn) is, if a single node dies, how to rebuild that third node and hopefully not lose data along the way. I'll tackle that when I get to it... ;)
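The rough shape of that recovery is fairly well trodden, if it helps - sketch only, osd N, NEWHOST and /dev/sdX are placeholders, and the last step assumes a cephadm-managed cluster:

  ceph osd out N                              # if the dead OSD isn't already out
  ceph osd purge N --yes-i-really-mean-it     # remove it from the CRUSH map and auth
  ceph orch daemon add osd NEWHOST:/dev/sdX   # recreate the OSD on the rebuilt node
  ceph -s                                     # watch backfill until PGs are active+clean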
I had that kind of experience with a docker swarm not that long ago. I had 3 manager nodes and 2 workers. I upgraded one of the manager nodes and the swarm fell apart. Probably something particular to the from/to versions, as it had always worked before that, and did afterwards too (until I retired the swarm maybe a month ago).
I'm about to embark on Talos/Kubernetes with Rook (Ceph) now that I'm comfortable with ceph. The goal is to retire the swarm and use kube...
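For what it's worth, the Rook quickstart is only a handful of manifests - the file names below are from Rook's example deploy directory, so treat this as a sketch rather than gospel:

  kubectl apply -f crds.yaml -f common.yaml -f operator.yaml   # install the Rook operator
  kubectl apply -f cluster.yaml                                # declare the CephCluster
  kubectl -n rook-ceph get pods                                # watch mons/mgr/OSDs come up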
It's going to get bumpy...