bedes

Yes! First of all, thank you for making this post, as I am not super familiar with social media/image archiving. And thanks for the AO3 downloader too; that'll save me from having to script something myself.

A word of caution: any tool or website that requires your credentials is best used with an alternate account, and of course you should follow good password hygiene (e.g. don't reuse the same password in multiple places).

I have specialized in a few different kinds of archiving:

EBooks
EBooks are best stored as .epub files; the format is compact and supported by most e-readers (Kindles are the notable exception, though Calibre can convert to Kindle formats). It's also one of the download options for fics on AO3. I recommend Calibre for organizing all of those .epubs. Several Calibre plugins will also strip DRM from books purchased from Amazon or Kobo; as this veers a bit into piracy, I won't provide any links here, but DuckDuckGo can point you in the right direction if you're so inclined. Amazon's DRM is harder to strip, but Kobo's is easy.
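For a sense of what Calibre reads when it organizes your library: an .epub is a ZIP archive whose `META-INF/container.xml` points to a package (OPF) file holding Dublin Core metadata such as title and author. Below is a minimal sketch using only Python's standard library. `epub_metadata` is a hypothetical helper name (it is not part of Calibre), and it assumes a well-formed EPUB.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file, the OPF package, and Dublin Core.
NS = {
    "container": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def epub_metadata(path):
    """Return {'title': ..., 'creator': ...} from an .epub (None if a field is missing)."""
    with zipfile.ZipFile(path) as zf:
        # container.xml says where the OPF package document lives inside the ZIP.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//container:rootfile", NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))

    metadata = opf.find("opf:metadata", NS)

    def first(tag):
        # Take the first <dc:tag> element's text, trimmed of surrounding whitespace.
        el = metadata.find(f"dc:{tag}", NS)
        return el.text.strip() if el is not None and el.text else None

    return {"title": first("title"), "creator": first("creator")}
```

Usage: `epub_metadata("my_fic.epub")` returns a dict with the title and author, which you could use to spot duplicates or mislabeled downloads before importing them.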

Podcasts
My ride-or-die tool here is podcast-dl. It works with any podcast RSS feed, so if you have one from Patreon or a free public feed, you can paste it in. Most podcasts you find in an app or on Spotify have an RSS feed *somewhere* (Spotify exclusives are the main exception), so hunt around for one. It's a command-line tool, but it has great documentation. It even has options to fill in episode metadata automatically, so you don't have to tag anything after the fact!
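podcast-dl handles all of this for you, but if you're curious what it's reading: in a podcast RSS feed, each episode is an `<item>` whose `<enclosure>` tag holds the URL of the audio file. The sketch below uses standard-library Python to pull out episode titles and audio URLs. `list_episodes` and `fetch_feed` are hypothetical names, it assumes a plain RSS 2.0 feed, and it is not how podcast-dl itself is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(feed_url):
    """Download the raw RSS XML (feed_url is whatever feed you found)."""
    with urlopen(feed_url) as resp:
        return resp.read()


def list_episodes(rss_xml):
    """Return (title, audio_url) pairs for every item that has an enclosure."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")  # the actual audio file lives here
        if enclosure is None:
            continue  # e.g. text-only announcement posts
        title = item.findtext("title", default="(untitled)").strip()
        episodes.append((title, enclosure.get("url")))
    return episodes
```

Usage: `list_episodes(fetch_feed(your_feed_url))` gives you a list you can eyeball to confirm a feed actually contains the episodes you want before pointing podcast-dl at it.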

Audiobooks
Amazon can pull audiobooks from your library even after you've purchased them; this has happened with copies of 1984 and histories of the Taliban in my library (what can I say, I love collecting history that people want to bury). To strip DRM from these, I recommend Libation, an Audible client that also works as a management tool for updating tags and sorting everything on your machine.

YouTube
YouTube downloaders are a dime a dozen, and keeping them working is a bit of a cat-and-mouse game with Google. youtube-dl is the classic one, though its fork yt-dlp is currently the more actively maintained option. Unfortunately, most YouTube downloaders are command-line tools and assume some comfort with the terminal (and sometimes Python or Docker).
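If you end up scripting downloads, a small wrapper keeps your options consistent between runs. This sketch assumes yt-dlp (an actively maintained fork of youtube-dl) is installed and on your PATH. `-o` (output filename template) and `--download-archive` (a file recording already-downloaded video IDs so they're skipped next time) are real options in both tools; `build_command` and `download` are hypothetical helper names.

```python
import shutil
import subprocess


def build_command(url, archive_file="downloaded.txt", program="yt-dlp"):
    """Build the argument list; youtube-dl accepts the same two options used here."""
    return [
        program,
        # Save as "<uploader>/<title> [<video id>].<ext>" so files are easy to sort.
        "-o", "%(uploader)s/%(title)s [%(id)s].%(ext)s",
        # Record downloaded video IDs so later runs skip them.
        "--download-archive", archive_file,
        url,
    ]


def download(url):
    """Run the downloader, failing loudly if it isn't installed or exits with an error."""
    cmd = build_command(url)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)
```

Call `download()` with a video, playlist, or channel URL; rerunning it later should only fetch new videos, since anything listed in `downloaded.txt` is skipped.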

Specific Topics
Maybe you want to preserve data for a special interest that doesn't fit into any of the categories above. For example, maybe you're interested in climate data gathered by universities and/or governments. Anything more specialized will most likely require scripting knowledge and familiarity with how APIs work, but the great news is that other people most likely want to archive that sort of data too! Poke around forums, Reddit, and GitHub to see if anyone has already built a tool for whatever you're after.
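As a taste of what that scripting looks like, here's a minimal sketch that fetches JSON from an API and saves a timestamped copy, so repeated runs build up a history. `snapshot` is a hypothetical helper, and real data portals each have their own URLs, authentication, and rate limits, so check their documentation first.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen


def snapshot(api_url, out_dir):
    """Fetch JSON from api_url and write it to out_dir/snapshot-<UTC timestamp>.json."""
    with urlopen(api_url) as resp:
        data = json.load(resp)
    # Second-level timestamps: two runs within the same second would overwrite each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"snapshot-{stamp}.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path
```

Usage (the endpoint here is made up for illustration): `snapshot("https://example.org/api/climate/stations", "climate-archive")`. Scheduling it daily with cron or Task Scheduler turns it into a slow-but-steady archive.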

Data Management
You've got all this data, now where are you going to store it? You have a few options, depending on how much data you have, what it is, and how comfortable you are with the possibility of others having access to it.
* The Internet Archive is currently dealing with copyright-infringement lawsuits and has suffered DDoS attacks. Depending on what you want to archive, especially if you need consistent access to it, this may not be the best place for it. Totally throw webpages on here, though, via the Wayback Machine!
* General cloud storage (e.g. Google Drive, OneDrive) is good for ease of access and for quickly transferring files between devices. It will be tied to your account, and there's a decent chance it will be scanned for AI training/data-gathering purposes, so I wouldn't store anything sensitive or containing personally identifiable information (PII). But if you want to move podcasts/fanfics/fanart over quickly, this is a great way to do it. Remember: There Is No Cloud, it's just someone else's computer.
* The next question is HDD or SSD (hard disk drive or solid-state drive). HDDs are cheaper per terabyte, but SSDs are much faster and keep getting cheaper. If you go the hardware route, I'd get on it sooner rather than later; due to Political Reasons, especially in the USA, I fully expect computer hardware prices to rise considerably in the near future. If you do this, keep a sorting system per drive and label each drive accordingly; otherwise you'll spend so much time wondering whether X image was on Y drive and plugging/unplugging stuff to find it. Not that I've... done.. that.... before.........
* You could, of course, set up a personal server. This is beyond me, though, and requires a pretty hefty financial investment. If you do this, set up Unraid, learn how to install Docker containers for whatever applications you like, and go the full self-hosted route. A word of advice: avoid the subreddits dedicated to self-hosting, as a lot of people there get pretty weird/cagey about the topic or assume you already have oodles of tech knowledge.
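For the per-drive sorting system, one low-effort approach is a catalog: scan each drive once, record which files it holds, and search the catalog instead of plugging drives in and out. This is a hedged sketch; `write_catalog`, `find`, the CSV layout (label, relative path, size in bytes), and the drive labels are all my own assumptions, not any particular tool's format.

```python
import csv
from pathlib import Path


def write_catalog(drive_root, drive_label, catalog_csv):
    """Append one row per file on the drive: label, relative path, size in bytes."""
    root = Path(drive_root)
    with open(catalog_csv, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for p in sorted(root.rglob("*")):
            if p.is_file():  # skip directories, record only files
                writer.writerow([drive_label, p.relative_to(root).as_posix(), p.stat().st_size])


def find(catalog_csv, name_fragment):
    """Return (drive_label, path) pairs whose path contains name_fragment, ignoring case."""
    frag = name_fragment.lower()
    with open(catalog_csv, newline="", encoding="utf-8") as f:
        return [(row[0], row[1]) for row in csv.reader(f) if frag in row[1].lower()]
```

Keep the CSV on your main computer rather than on the drives themselves. Because `write_catalog` appends, regenerate the file (or delete that drive's rows) when you re-scan a drive, or you'll get duplicate entries.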