• CEPH

    From deon@1337:2/101 to All on Sunday, October 06, 2024 23:25:59
    Howdy,

    One of the things I've wanted to try (for a while now) is ceph. I remember looking at it a year or two ago and deciding at the time that it was complicated to set up and install, and that I would need to find time to figure it out. So it always got put on the back burner and I never got around to it.

    I wanted to try it out, because I use docker (pretty much for everything) in swarm mode - with containers floating between 3 hosts. Persistent cluster storage in this config is pretty much a must - and things are so much easier if you have it.
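    To give an idea of the shape of it: a service can only float between hosts if its data path exists on every host, e.g. a CephFS mount. Roughly, with the docker python SDK and an active swarm (the image, service name and /mnt/cephfs path are just made-up examples):

      import docker
      from docker.types import Mount

      client = docker.from_env()

      # a single-replica service whose data lives on shared storage,
      # so swarm can (re)schedule it on any of the 3 hosts
      client.services.create(
          image="postgres:16",                      # placeholder image
          name="db",                                # placeholder service name
          mounts=[Mount(target="/var/lib/postgresql/data",
                        source="/mnt/cephfs/db",    # CephFS mounted on every host
                        type="bind")],
      )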

    I recently watched a NetworkChuck video (never watched him before, but somehow I ended up on one of his videos about ceph) and my image of ceph changed - it didn't seem complicated to install at all.

    So this last week, I spun up ceph on my 3 hosts and I'm pretty impressed by it. I've moved all my container workloads to store their data on it, so fingers crossed I don't have any serious problems (because I'm still too new to it to know how to fix things).

    Anyway, I recall that ceph is also available on proxmox, and I think a few of you run it? Just curious how well it's running and any gotchas to be aware of.


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (1337:2/101)
  • From tassiebob@1337:2/106 to deon on Monday, October 07, 2024 19:37:30
    So this last week, I spun up ceph on my 3 hosts and I'm pretty impressed by it. I've moved all my container workloads to store their data on it, so fingers crossed I don't have any serious problems (because I'm still too new to it to know how to fix things).

    Really interested to hear how you go with this. It's one of those things I knew was there but have never tried (partly because I didn't have the time to sort it out if it went sideways).

    I was using glusterfs for a while, but for reasons I don't recall I ended up converting to plain old NFS for my persistent storage.

    I did have a docker swarm too, although I've only just scaled that back to a single standalone docker host and moved some of the containers to a linode VM. Trying to get the power bill back under control, lol.

    Cheers,
    Bob.

    --- Mystic BBS v1.12 A47 2021/12/24 (Linux/64)
    * Origin: TassieBob BBS, Hobart, Tasmania (1337:2/106)
  • From deon@1337:2/101 to tassiebob on Monday, October 07, 2024 19:55:45
    Re: Re: CEPH
    By: tassiebob to deon on Mon Oct 07 2024 07:37 pm

    Howdy,

    Really interested to hear how you go with this. It's one of those things I knew was there but have never tried (partly because I didn't have the time to sort it out if it went sideways).

    So I might be naive, but I'm thinking there shouldn't be too much that goes wrong (me praying) - because the deployment is pretty much orchestrated in docker. So the only issues should be configuration related (how do I do ...) and procedural (this broke, what are the steps to fix ...).
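    Because the daemons are just containers, my "is ceph OK" check so far has mostly been "are the containers up" - something like this rough sketch with the docker python SDK (the "ceph" name filter is my guess at how the daemon containers are named on my hosts):

      import docker

      client = docker.from_env()

      # list anything that looks like a ceph daemon container on this host
      for c in client.containers.list(filters={"name": "ceph"}):
          print(c.name, c.status)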

    Ceph is pretty popular going by google searches, so I'm thinking I'll find the answers to my problems as they crop up (so far that's been the case...).

    I have my fingers crossed... ;)

    I was using glusterfs for a while, but for reasons I don't recall I ended up converting to plain old NFS for my persistent storage.

    Yeah, I'm retiring my gluster in favour of ceph. While gluster wasn't being used for docker containers before (I tried, and immediately hit issues with buffer reads - if I recall it was easy to hit with postgres, and there were configuration fixes that got around it), I was using gluster as a shared home directory amongst my hosts. (Most of my "hosts" are VMs across a couple of machines and Pis.)

    BTW: I came across your youtube video on the "red LED" issue with QNAPs. I have a TS451+ that died from it, and a TS251+ which lives in my van and has yet to hit it.

    While my QNAP actually runs TrueNAS, I didn't need to fix it to get the data off - I just moved the drives to a new QNAP 664, booted off the USB disk (that has TrueNAS) and away I went.

    I did "fix" the 451+ (well it appears to fix) with the resister - I came across your youtube when I was researching what I needed to do - so most helpeful :) I dont use that 451 anymore, its in a box gathering dust :(


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (1337:2/101)
  • From tassiebob@1337:2/106 to deon on Tuesday, October 08, 2024 21:06:10
    BTW: I came across your youtube video on the "red LED" issue with QNAPs. I have a TS451+ that died from it, and a TS251+ which lives in my van and has yet to hit it.

    I still have that NAS - sometime I'll find an SBC I can put in the bottom of it to replace the factory board. I was looking at a Pi, but I only get PCIe x1 and not the x4 I need... I just can't bring myself to throw out what looks like a good candidate to be reincarnated as a new low-power NAS.

    While my QNAP actually runs TrueNAS, I didn't need to fix it to get the data off - I just moved the drives to a new QNAP 664, booted off the USB disk (that has TrueNAS) and away I went.

    I'm sure I could have done that - the QNAP OS is just Linux at heart - it was just easier to fudge the resistor in and copy across the network :-)

    I did "fix" the 451+ (well it appears to fix) with the resister - I came across your youtube when I was researching what I needed to do - so most helpeful :) I dont use that 451 anymore, its in a box gathering dust :(

    Glad it was helpful. It's just a temporary fix, but enough to get it running and get the data off :-)

    --- Mystic BBS v1.12 A48 (Linux/64)
    * Origin: TassieBob BBS, Hobart, Tasmania (1337:2/106)
  • From deon@1337:2/101 to tassiebob on Wednesday, October 09, 2024 10:21:20
    Re: Re: CEPH
    By: tassiebob to deon on Tue Oct 08 2024 09:06 pm

    Howdy,

    I still have that NAS - sometime I'll find an SBC I can put in the bottom of it to replace the factory board. I was looking at a Pi, but I only get PCIe x1 and not the x4 I need... I just can't bring myself to throw out what looks like a good candidate to be reincarnated as a new low-power NAS.

    Me too - I don't like throwing away stuff that could still have a useful life (to somebody, anyway).

    Let me know if you find a suitable board. I'm toying with re-using mine as just a "normal" linux/docker machine to run my BBS and other small projects. It could replace my apu1d machines :)



    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (1337:2/101)
  • From poindexter FORTRAN@1337:3/178 to deon on Thursday, October 10, 2024 07:53:00
    deon wrote to tassiebob <=-

    Me too - I don't like throwing away stuff that could still have a useful life (to somebody, anyway).

    I have a desktop with an i7-4790 that runs fine, sitting in my closet.
    I upgraded in order to get more RAM, NVMe support, support for Windows
    11 and a newer CPU, but having that old box sitting idle is killing me.



    --- MultiMail/Win v0.52
    * Origin: realitycheckBBS.org -- information is power. (1337:3/178)
  • From deon@1337:2/101 to tassiebob on Saturday, October 12, 2024 17:38:15
    Re: Re: CEPH
    By: tassiebob to deon on Mon Oct 07 2024 07:37 pm

    Howdy,

    Really interested to hear how you go with this. It's one of those things I knew was there but have never tried (partly because I didn't have the time to sort it out if it went sideways).

    So this weekend, I did some updates to the hosts running ceph (updating packages, etc.), and rebooted each host after its updates (one at a time).

    While I didn't do much testing of stuff being accessible while a host was down, it all appeared to be OK - even though there was a delay, I guess, while ceph figured out a node was down and had to shuffle around who was the next "master" to handle the I/O.
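    For the next round of updates I'll probably script a small wait between hosts instead of eyeballing it - roughly this (untested sketch that parses the JSON from 'ceph -s --format json'):

      import json, subprocess, time

      def health():
          # ask ceph for its status as JSON and pull out the health string
          out = subprocess.run(["ceph", "-s", "--format", "json"],
                               capture_output=True, text=True, check=True).stdout
          return json.loads(out)["health"]["status"]

      # block until the cluster reports HEALTH_OK before moving to the next host
      while (status := health()) != "HEALTH_OK":
          print("waiting, cluster is", status)
          time.sleep(30)
      print("cluster healthy - safe to move on")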

    Pretty happy with this setup - I was previously using a proprietary file system, which I had to nurse if I rebooted nodes - and occasionally drives would go offline, especially if there was busy I/O going on (all three nodes are VMs on the same host). With Ceph, I did nothing; it sorted itself out and made the cluster healthy again on its own.

    Gluster was equally problematic for different reasons. But both of those filesystems are now disabled and no longer used.

    Even the NFS client recovered on its own. (I normally hate NFS, because my experience has always been that if the NFS server goes down, the clients are pretty much useless until they reboot - and sometimes it requires a hard reboot.)

    So the only thing I still need to figure out (learn) is what to do if a single node dies - rebuilding that third node and hopefully not losing data along the way. I'll tackle that when I get to it... ;)
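    From the docs I've skimmed, I think the rebuild boils down to roughly this (completely untested - the OSD id, hostname and IP are made up, and the exact commands/flags may well differ):

      import subprocess

      def ceph(*args):
          # run a ceph CLI command and show what it says
          print(subprocess.run(["ceph", *args], capture_output=True, text=True).stdout)

      # 1. write off the dead node's OSD (id made up)
      ceph("osd", "out", "3")
      ceph("osd", "purge", "3", "--yes-i-really-mean-it")

      # 2. drop the dead host from the orchestrator
      ceph("orch", "host", "rm", "node3")

      # 3. once the rebuilt host is back, re-add it and let ceph
      #    create OSDs on its empty disks and backfill
      ceph("orch", "host", "add", "node3", "10.0.0.13")
      ceph("orch", "apply", "osd", "--all-available-devices")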


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (1337:2/101)
  • From tassiebob@1337:2/106 to deon on Wednesday, October 16, 2024 18:44:30
    So this weekend, I did some updates to the hosts running ceph (updating packages, etc.), and rebooted each host after its updates (one at a time).

    While I didn't do much testing of stuff being accessible while a host was down, it all appeared to be OK - even though there was a delay, I guess, while ceph figured out a node was down and had to shuffle around who was the next "master" to handle the I/O.

    Nice. Maybe I should add it to the list of things to look at sometime (Christmas maybe)...

    Pretty happy with this setup - I was previously using a proprietary file system, which I had to nurse if I rebooted nodes - and occasionally drives would go offline, especially if there was busy I/O going on (all three nodes are VMs on the same host).

    I had that kind of experience with a docker swarm not that long ago. I had 3 manager nodes and 2 workers. Upgraded one of the manager nodes and the swarm fell apart. Probably something specific to that particular from/to version combination, as it had always worked before, and since (until I retired the swarm maybe a month ago).

    So the only thing I still need to figure out (learn) is what to do if a single node dies - rebuilding that third node and hopefully not losing data along the way. I'll tackle that when I get to it... ;)

    :-)

    --- Mystic BBS v1.12 A48 (Linux/64)
    * Origin: TassieBob BBS, Hobart, Tasmania (1337:2/106)
  • From deon@1337:2/101 to tassiebob on Wednesday, October 16, 2024 19:10:49
    Re: Re: CEPH
    By: tassiebob to deon on Wed Oct 16 2024 06:44 pm

    Howdy,

    I had that kind of experience with a docker swarm not that long ago. I had 3 manager nodes and 2 workers. Upgraded one of the manager nodes and the swarm fell apart. Probably something specific to that particular from/to version combination, as it had always worked before, and since (until I retired the swarm maybe a month ago).

    Hmm, I've been using swarm for years (and still do - my ceph runs on the swarm nodes).

    I'm about to embark on talos/kubernetes with rook (ceph), now that I'm comfortable with ceph. The goal is to retire the swarm and use kube...
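    From what I can tell, with rook the ceph cluster itself becomes just another kubernetes object, which is part of the appeal. My rough guess at the minimal CephCluster manifest (field names as I understand them from the rook docs - the image tag and paths are placeholders), generated here with python so I can tweak it later:

      import json

      # rough guess at a minimal rook CephCluster for 3 nodes
      cluster = {
          "apiVersion": "ceph.rook.io/v1",
          "kind": "CephCluster",
          "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
          "spec": {
              "cephVersion": {"image": "quay.io/ceph/ceph:v18"},   # placeholder tag
              "dataDirHostPath": "/var/lib/rook",
              "mon": {"count": 3, "allowMultiplePerNode": False},
              "storage": {"useAllNodes": True, "useAllDevices": True},
          },
      }

      # kubectl accepts JSON as well as YAML: kubectl apply -f cluster.json
      with open("cluster.json", "w") as f:
          json.dump(cluster, f, indent=2)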

    It's going to get bumpy...


    ...лоеп
    --- SBBSecho 3.20-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (1337:2/101)
  • From tassiebob@1337:2/106 to deon on Wednesday, October 16, 2024 20:13:30
    I'm about to embark on talos/kubernetes with rook (ceph), now that I'm comfortable with ceph. The goal is to retire the swarm and use kube...

    It's going to get bumpy...

    I wish you well with your decision to prod the bear, lol.

    To be fair, my kubernetes experience has been with what I expect were sub-optimal deployments. One definitely was - it was a packaged NMS from a major networking vendor, and the vendor had it set up with no persistent storage, so if the box ever rebooted the whole network needed to be rediscovered. Asshats.

    The other was an IT environment at a previous employer, but it was set up by someone who liked to twiddle every knob. IIRC it was distributed over 3 physical sites for redundancy. If there was a knob then he'd tweak it - even if it had a big label saying "Don't touch unless you know what you're doing". Needless to say it had constant problems, and eventually got replaced with a docker swarm (also distributed) that pretty much worked as expected.

    Fingers crossed you have a smoother ride with it :-)

    --- Mystic BBS v1.12 A49 2024/05/29 (Linux/64)
    * Origin: TassieBob BBS, Hobart, Tasmania (1337:2/106)