Data Hoarders Thread - The Scriptatorium of the Modern Age

That's actually a very incisive way to put it; however, it's really only true when you are dealing with very large (enterprise) amounts of data. Amazon Glacier and other large tape backup centers use complex robotic systems to file and retrieve vast numbers of tapes. The complex storage and slow access are acceptable tradeoffs in some scenarios for the density, survivability, and low power usage of magnetic tape.

Also consider that LTO-7 drives, while quite expensive, can put 6 TB (native, uncompressed) on a single $67 tape, so three tapes get you close to the largest readily available hard drives, at a cost of roughly one cent per GB. That's about the same cost as large economy HDDs and a fraction of the cost of even budget SSDs. LTO-6, with drives available for under $500 and 2.5 TB tapes for $4 each, needs nine tapes to match a 22 TB HDD. How much data would you need before storing and filing the tapes becomes a real problem?
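The tape arithmetic above can be checked with a quick sketch. It assumes the prices quoted in the post, the native (uncompressed) capacities of 2.5 TB for LTO-6 and 6 TB for LTO-7, and decimal units (1 TB = 1000 GB, as tape vendors quote). It counts media cost only; the drive is not amortized.

```python
import math

def tapes_needed(data_tb: float, tape_native_tb: float) -> int:
    """Whole tapes required to hold data_tb, using native (uncompressed) capacity."""
    return math.ceil(data_tb / tape_native_tb)

def cents_per_gb(tape_price_usd: float, tape_native_tb: float) -> float:
    """Media cost per GB in cents; the tape drive itself is not included."""
    return 100 * tape_price_usd / (tape_native_tb * 1000)

# Prices from the post; capacities are native LTO figures
print(tapes_needed(22, 2.5))            # 9   (LTO-6 tapes to match a 22 TB HDD)
print(round(cents_per_gb(67, 6.0), 2))  # 1.12 (LTO-7)
print(round(cents_per_gb(4, 2.5), 2))   # 0.16 (LTO-6)
```

Compression can raise effective capacity, but already-compressed media (video, archives) will land close to the native figures, which is why the sketch uses them.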

If you really do have that much data, a refurbished tape library with 36-72 TB capacity can be had for $1,000 at the low end to a couple grand for newer models, comparable to the cost of setting up a state-of-the-art NAS with 2-4x the storage. Throw in a standalone SAS tape drive or two as spares in case your tape library fails, and you have a pretty good offsite backup for a pretty reasonable price.
 
I need to acquire more storage soon. Does anyone know a good source of cheap high-capacity HDDs? Used is great as long as I can get the specs.
 
I've had good luck with Facebook Marketplace and eBay, buying from sellers who were listing a hundred or more drives. But that was for 4 TB to 14 TB drives; I'm not sure about smaller drives.

I'll need to get a set of 16+ TB drives soon, but they get pricey.
 
WD sells refurbished drives in good shape on eBay. I'm wary of FB Marketplace for something like hard drives unless the seller can give you a CrystalDiskInfo readout showing the drive is still in good shape.
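On Linux (or anywhere smartmontools is installed), the same SMART health data CrystalDiskInfo displays can be read with `smartctl`. The sketch below assumes smartmontools 7.0 or newer for JSON output (`-j`) and covers ATA/SATA drives only (NVMe reports a different structure). The choice of three sector-count attributes as a "used drive" red flag is a common rule of thumb, not an official pass/fail criterion.

```python
import json
import subprocess

# SMART attributes (ATA/SATA) that most often reveal a worn or failing used drive
WATCHED_ATTRS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def read_smart(device: str) -> dict:
    """Run smartctl with JSON output; usually needs root."""
    # smartctl's exit code is a bit mask of warnings, so a nonzero code is not treated as fatal
    result = subprocess.run(["smartctl", "-a", "-j", device],
                            capture_output=True, text=True)
    return json.loads(result.stdout)

def summarize_smart(report: dict) -> dict:
    """Pull the overall verdict, power-on hours, and bad-sector counts out of a smartctl report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    raw = {row["id"]: row["raw"]["value"] for row in table}
    return {
        "passed": report.get("smart_status", {}).get("passed", False),
        "power_on_hours": report.get("power_on_time", {}).get("hours"),
        "bad_sectors": {name: raw.get(attr_id, 0)
                        for attr_id, name in WATCHED_ATTRS.items()},
    }

def looks_healthy(summary: dict) -> bool:
    """Rule of thumb: overall SMART pass and zero reallocated/pending/uncorrectable sectors."""
    return summary["passed"] and not any(summary["bad_sectors"].values())
```

Usage would be something like `looks_healthy(summarize_smart(read_smart("/dev/sdb")))`, where `/dev/sdb` is a placeholder for the drive under test. Asking a seller for this output (or a screenshot of the equivalent CrystalDiskInfo fields) gives you the same information.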
 
The ones I got were good, but I made a judgment call: I started by buying 4 TB drives for $20 from the guy and slowly upgraded to bigger drives over time.
 
Is there a good program for mass downloading YouTube comments?
This should work.

I need to acquire more storage soon. Does anyone know a good source of cheap high-capacity HDDs? Used is great as long as I can get the specs.
Until a few months ago, goharddrive on eBay had extremely affordable refurbished drives with a 5-year warranty. Linus Tech Tips made a video about them, and as a result prices have gone up quite a bit and stock has gotten scarce, but they're still quite affordable.
 
What's the best way to store and access files on my hard drives? I have four 14 TB drives formatted as ext4 and pooled using mergerfs, but when I'm importing a massive amount of data, the drive mergerfs is copying to is pretty much unavailable, and getting data off it at the same time is really slow.
Should I reformat the drives as XFS? I'd need to wait until I buy some more drives to do so, though.
 
I have used XFS for bulk storage drives and didn't have any big complaints, though I didn't subject them to anything highly intensive. XFS excels on very large single drives holding very large files, but there are reports of it struggling with snapshots, journal recovery, and RAID.

Assuming you don't have a massive array that needs a hardware RAID controller, your best bet is probably ZFS, which has improved a lot in recent years. ZFS is a lot more than a traditional file system and incorporates many features that would normally be handled by RAID or LVM. It has on-disk compression, encryption, snapshots, and block-level checksums, and because it is copy-on-write, it is more resilient to interrupted writes that would cause data loss in a traditional file system.
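To illustrate why block-level checksums matter: ZFS records a checksum for every block (fletcher4 by default) and verifies it on every read, so silent corruption is reported instead of being handed back as good data. ZFS keeps each checksum in the parent block pointer rather than next to the data, which the toy below mimics with a separate dictionary. This is a simplified model, not ZFS's on-disk format; the `ChecksummedStore` class is purely illustrative.

```python
import struct

MASK64 = (1 << 64) - 1

def fletcher4(block: bytes) -> tuple:
    """Fletcher-4 style checksum: four running 64-bit sums over little-endian 32-bit words."""
    if len(block) % 4:
        block += b"\0" * (4 - len(block) % 4)  # pad to whole words (real ZFS blocks need no padding)
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", block):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d

class ChecksummedStore:
    """Toy block store: each write records a checksum kept apart from the data; each read verifies it."""

    def __init__(self):
        self.blocks = {}  # block number -> data ("the disk")
        self.sums = {}    # block number -> checksum (stands in for the parent block pointer)

    def write(self, n: int, data: bytes) -> None:
        self.blocks[n] = data
        self.sums[n] = fletcher4(data)

    def read(self, n: int) -> bytes:
        data = self.blocks[n]
        if fletcher4(data) != self.sums[n]:
            raise IOError(f"checksum mismatch on block {n}")
        return data
```

With redundancy (a mirror or RAIDZ), real ZFS goes one step further: on a mismatch it reads another copy, returns that, and repairs the bad one.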
 
Is it possible to migrate my setup to ZFS without having to delete everything off the drives and start again? I have four 14 TB drives that are about 90% full. I plan to get two more within a month and format them as XFS, copy two drives over and then format those as XFS, repeat for the last pair, and then use mergerfs's rebalance feature to distribute the files evenly.
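Whether that shuffle works comes down to simple capacity arithmetic: each batch of old drives must fit into the free space available at that point. The sketch below simulates the plan described above (four 14 TB drives at 90% full, two new 14 TB drives, two old drives per round). It assumes decimal TB, ignores filesystem overhead, and assumes each reformatted drive rejoins the pool empty; `plan_migration` is a hypothetical helper, not part of mergerfs.

```python
def plan_migration(old_drives_tb, fill_ratio, new_drives_tb, batch=2):
    """Simulate a drive-by-drive migration; return free space (TB) after each round.

    Raises ValueError if a round's data would not fit in the free space available.
    """
    free = float(sum(new_drives_tb))  # new drives start empty
    pending = list(old_drives_tb)
    free_after_round = []
    while pending:
        current, pending = pending[:batch], pending[batch:]
        data = sum(size * fill_ratio for size in current)
        if data > free:
            raise ValueError(f"round needs {data:.1f} TB but only {free:.1f} TB is free")
        free -= data          # copy this batch onto the new filesystem
        free += sum(current)  # wipe and reformat the emptied drives
        free_after_round.append(round(free, 2))
    return free_after_round

print(plan_migration([14, 14, 14, 14], 0.9, [14, 14]))  # [30.8, 33.6]
```

The first round is the tight one: 25.2 TB of data has to fit into 28 TB of fresh space, so buying only one new drive would not work for this plan.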
 
In theory, if you had a filesystem that supported shrinking volumes, you could format the free space as ZFS, move as many files as you could to the new volume, shrink the old one, expand the new one, and rinse and repeat. It would be very tedious but should work. The catch is that ext4 can't be shrunk while it is mounted; it has to be unmounted and shrunk offline with resize2fs.

There is no way to convert in place, because the filesystem defines the low-level layout of blocks on the disk. The only practical way to do it is to buy a new 4 TB (minimum) drive and then migrate one drive at a time using the same procedure I described above. Using ZFS would probably simplify your setup and improve performance (I have never used mergerfs, so I can't say for sure). XFS was never really intended for multi-drive arrays; it was and still is designed to store large files, like video media, that are accessed infrequently and in their entirety.
 
The drives are 14 TB, not 4 TB. mergerfs takes multiple drives and makes them appear as one, so file 1 may be on drive 1, file 2 on drive 2, and so on. You can still access each drive independently, since they are mounted as normal drives, but each drive only holds some of the files.
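The pooling behavior described above can be modeled in a few lines. This is a simplified illustration, not mergerfs's implementation: `resolve` mimics a first-found lookup (what mergerfs calls the `ff` policy), `merged_listing` shows how the pooled mount presents the union of every drive's directory entries, and `pick_branch_for_new_file` mimics a most-free-space create policy (mergerfs calls one `mfs`). The branch paths are whatever mount points you pass in.

```python
import shutil
from pathlib import Path

def resolve(branches, relpath):
    """First-found lookup: return the path on the first branch that has the file, else None."""
    for branch in branches:
        candidate = Path(branch) / relpath
        if candidate.exists():
            return candidate
    return None

def merged_listing(branches, reldir="."):
    """Union of directory entries across all branches, as the pooled mount would show them."""
    names = set()
    for branch in branches:
        directory = Path(branch) / reldir
        if directory.is_dir():
            names.update(entry.name for entry in directory.iterdir())
    return sorted(names)

def pick_branch_for_new_file(branches):
    """Most-free-space placement: a new file lands on whichever branch has the most room."""
    return max(branches, key=lambda branch: shutil.disk_usage(branch).free)
```

With a placement rule like this, every new file in a bulk import may keep landing on the same (emptiest) drive, which would fit the earlier observation that one drive becomes nearly unavailable during big copies; which policy your pool actually uses depends on its mergerfs configuration.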
 
Yeah, that was a typo, but I think you get my point. So mergerfs is like a userspace JBOD, with somewhat better prospects in the event of a single drive failure? Sounds interesting, but for the home data hoarder who is constantly tinkering and adding disks, I think ZFS is what you really want, unless you plan on getting enough disks to make a hardware RAID controller worthwhile. ZFS makes it easy to grow a pool by adding disks, even of different sizes (as new vdevs), so you could move your archives over from the old ext4 array bit by bit. Adding drives to RAID arrays ranges from difficult to impossible depending on your controller and configuration.
 
I have a Toshiba MG10 drive that I bought last summer, and it's very nice, very solid. I bought it because I had two SMR drives for backup/storage, and with the way my system is set up they were driving me insane! SMR sucks fucking donkey dick; avoid it at all costs!
 
Does anyone know good tools for archiving (vanilla) XenForo forums? I have a forum with a vast amount of niche knowledge that I want to catalogue and save. I've been trying HTTrack and Wget, but the trouble is that while they do link conversion, XenForo seems to set all its CSS using JavaScript baked into the page with the onLoad() tag, and neither HTTrack nor Wget rewrites links inside JS... I'm currently seeing if I can simply `sed` all the HTML pages to fix them, but if anyone knows a better way, please do share.
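A slightly more robust alternative to a one-line `sed` is a small script that rewrites absolute URLs anywhere in each saved page, including inside inline `<script>` strings, into paths relative to that page's folder depth. This is a sketch under assumptions: `SITE` is a hypothetical placeholder for the forum's origin, the mirror is assumed to store files under paths matching the URL path, and JSON-escaped URLs (`https:\/\/...`) are not handled. Query-string URLs such as XenForo's stylesheet requests are saved under names that depend on the mirroring tool's settings, so the rewritten paths may need adjusting to match.

```python
import re
from pathlib import Path

# Hypothetical origin of the forum being mirrored; replace with the real one
SITE = "https://forum.example.com"

# Absolute URLs on SITE wherever they appear: href/src attributes or strings inside inline JS
URL_RE = re.compile(re.escape(SITE) + r"(/[^\"'\s)<>]*)")

def localize(html: str, depth: int) -> str:
    """Rewrite absolute SITE URLs relative to a page `depth` folders below the mirror root."""
    prefix = "../" * depth or "./"
    return URL_RE.sub(lambda m: prefix + m.group(1).lstrip("/"), html)

def localize_tree(root: str) -> int:
    """Rewrite every saved .html page under root in place; return how many pages changed."""
    root_path = Path(root)
    changed = 0
    for page in root_path.rglob("*.html"):
        depth = len(page.relative_to(root_path).parts) - 1
        text = page.read_text(encoding="utf-8", errors="replace")
        new_text = localize(text, depth)
        if new_text != text:
            page.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Unlike a flat `sed` substitution, computing the prefix per page keeps links working for pages saved several directories deep.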
 
Have you tried ArchiveBox? https://archivebox.io/

I've been meaning to try it but haven't gotten around to it yet. I'm not sure whether it can crawl a whole site or is more like archive.today, which is just set up for single pages.
 
The latest episode of ExplainingComputers contained a few interesting nuggets about long-term storage:


He makes suggestions on best practices for home users maintaining their data, as well as on which storage media to consider.


HDDs are a good long-term storage option, but data should be rewritten every 3 years.
SSDs should be read once a year.
Optical media should be migrated to new discs before it reaches its life expectancy.
Cloud storage doesn't need to be refreshed, but don't rely on one provider.
USB/SD cards (pro or endurance class) should be read once a year.
USB/SD cards (standard quality) should be rewritten once a year.
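The intervals in that list are easy to forget, so here is a sketch of turning them into a reminder check. The intervals come from the list; the inventory format, labels, and media-type keys are hypothetical. Optical and cloud storage are skipped because the list gives them no fixed interval.

```python
from datetime import date

# media type -> (action, interval in years), taken from the list above
SCHEDULE = {
    "hdd": ("rewrite", 3),
    "ssd": ("read", 1),
    "usb_sd_pro": ("read", 1),
    "usb_sd_standard": ("rewrite", 1),
}

def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later, falling back to Feb 28 for a Feb 29 start."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def due_tasks(inventory, today: date):
    """inventory: (label, media_type, last_refreshed) tuples; return overdue (label, action) pairs."""
    due = []
    for label, media, last in inventory:
        if media not in SCHEDULE:
            continue  # optical and cloud have no fixed interval in the list
        action, years = SCHEDULE[media]
        if add_years(last, years) <= today:
            due.append((label, action))
    return due
```

"Read" here means reading every file back in full (ideally verifying it against a checksum); "rewrite" means copying the data off and writing it back so the cells or magnetic domains are refreshed.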


Good-quality, archival-grade optical media stored in ideal conditions (i.e., at a relatively stable temperature) should last at least 50 years; in my experience this just means keeping the discs in a closet, either on spindles or in folders. For what it's worth, I have data written to high-quality DVD-Rs (Taiyo Yuden) dating back to the mid-'00s, and every one I've read over the past 12 months (probably around 30-40 discs) has been perfect.
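One way to know that a disc (or any archive) reads back "perfect" rather than just mounting fine is to store a checksum manifest alongside the data when it is written and verify against it on each periodic read. A minimal sketch using SHA-256; the function names are illustrative and not from any particular tool.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root, manifest):
    """Return (path, problem) pairs for files that are missing or no longer match."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append((rel, "missing"))
        elif sha256_file(p) != expected:
            problems.append((rel, "mismatch"))
    return problems
```

The manifest can be burned onto the disc itself (as a text or JSON file) and also kept on another medium, so a bad disc cannot take its own reference checksums down with it.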

He recommends having at least one optical drive, which I think is wise considering that optical drives are becoming more difficult to find, and it's always a good idea to have a spare or two.

The whole video is worth watching, especially as he explains how and why data stored on each type of media decays in different ways.
 
Have you tried ArchiveBox? https://archivebox.io/

I've been meaning to try it but haven't gotten around to it yet. I'm not sure whether it can crawl a whole site or is more like archive.today, which is just set up for single pages.
I'm trying to figure this out now. The latest image is missing half a year's worth of updates, and building from source is broken at the moment. The specific update I'm looking for is the one that allows the crawler to be set to a depth greater than 1.
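For anyone unsure what a crawl depth setting actually controls, here is a toy breadth-first crawler using only the standard library. It is not ArchiveBox's code; it just illustrates the idea: depth 0 archives only the starting page, depth 1 adds the pages it links to, depth 2 adds their links, and so on. It stays on the starting host, and `fetch_page` is injectable so the logic can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def fetch(url: str) -> str:
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def crawl(start: str, max_depth: int, fetch_page=fetch):
    """Breadth-first crawl of same-host links up to max_depth; return URLs in visit order."""
    host = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # don't follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(fetch_page(url))
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

On a forum, thread pages are usually one hop from an index and later pages of a thread one hop further, which is why a depth limit of 1 is rarely enough to capture a whole board.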
 
Does anyone have a recommendation for cases that can hold 8 or more drives? I had been buying rack-mount cases, but I don't really need dual 1600 W power supplies to run 8 hard drives.

I was looking at this one.

Jonsbro?

The vendor name is sketchy, and I've tried to stick to reputable vendors only. It was either this or a Fractal XL (but those are too big). I'm trying to find something in the 8 to 12 drive range.
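A rough power budget backs up the point that 8 drives don't need dual 1600 W supplies. The per-drive figures below are assumptions typical of 3.5" HDDs (about 2 A on the 12 V rail during spin-up, about 0.7 A on 5 V), and the 150 W for the rest of the system and 25% headroom are illustrative guesses; check your drives' datasheets before sizing a PSU.

```python
# Assumed per-drive draw for a typical 3.5" HDD; check your drives' datasheets
SPINUP_AMPS_12V = 2.0
AMPS_5V = 0.7

def peak_drive_watts(n_drives: int) -> float:
    """Worst case: every drive spinning up at once (no staggered spin-up)."""
    return n_drives * (12 * SPINUP_AMPS_12V + 5 * AMPS_5V)

def suggested_psu_watts(n_drives: int, rest_of_system_w: float = 150,
                        headroom: float = 1.25) -> float:
    # rest_of_system_w and headroom are illustrative guesses, not measurements
    return (peak_drive_watts(n_drives) + rest_of_system_w) * headroom

print(peak_drive_watts(8))     # about 220 W
print(suggested_psu_watts(8))  # about 462 W
```

Spin-up is the peak; once spinning, drives typically draw far less, so a decent 500-650 W single PSU is plenty for a box like this under these assumptions.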
 
Depends on what level of retarded you are. If full retarded and rich, get a Supermicro CSE-847 off eBay. If the same level of retarded but poor, go to your DIY store, buy some aluminum, drill some holes, and build one yourself. The latter is the best value per HDD, but you'll end up needing PSUs and SATA adapters. You can solve that either by buying Chinese cases off Amazon or by looking for bankruptcy auctions and buying tower office PCs for next to nothing. They'll be 7th-gen i5 shit, but you don't care about compute: you only need a PCIe slot for your Adaptec HBA (eBay) and a PSU to power the HDDs. Then for 50 bucks you can add dual 10 Gbps Ethernet links if needed. This is peak hoarder shit. If you're tech savvy, you can polish it off with Ceph on top for redundancy, or even RAID 6 arrays with mixed HDDs (so they don't all fail at the same time), and you have a reasonably redundant setup. It takes a lot of space and power, but costs almost nothing compared to the full retard + rich options.
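If you go the RAID 6 route with mixed drives, the usable capacity works out simply: two drives' worth goes to parity, and every member is treated as if it were the size of the smallest drive. A quick sketch of that arithmetic (for a single array, ignoring filesystem overhead); the drive sizes in the example are made up.

```python
def raid6_usable_tb(drive_sizes_tb):
    """Usable capacity of one RAID 6 array: (n - 2) members, each truncated to the smallest drive."""
    if len(drive_sizes_tb) < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (len(drive_sizes_tb) - 2) * min(drive_sizes_tb)

def wasted_tb(drive_sizes_tb):
    """Capacity left unused on larger members because they are truncated to the smallest drive."""
    return sum(drive_sizes_tb) - len(drive_sizes_tb) * min(drive_sizes_tb)

drives = [14, 14, 16, 16, 14, 18]
print(raid6_usable_tb(drives))  # 56
print(wasted_tb(drives))        # 8
```

So "mixed HDDs" is best taken as mixed batches or models of the same capacity: you get the reduced chance of simultaneous failures without stranding space on the larger drives.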
 