So I run a video production company. We have 300TB of archived projects (and growing daily).
Many years ago, our old solution for archiving was simply to dump old projects off onto an external drive, duplicate that, and have one drive at the office, one offsite elsewhere. This was ok, but not ideal. Relatively expensive per TB, and just a shit ton of physical drives.
A few years ago, we had an unlimited Google Drive and 1000/1000 fibre internet. So we moved to a system where we would drop a project onto an external drive, keep that offsite, and have a duplicate of it uploaded to Google Drive. This worked ok until we reached a hidden file number limit on Google Drive. Then they removed the unlimited sizing of Google Drive accounts completely. So that was a dead end.
So then we moved that system to Dropbox a couple of years ago, as they were offering an unlimited account. This was the perfect situation. Dropbox was feature rich, fast, integrated beautifully into Finder/Explorer and just a great solution all round. It meant it was easy to give clients access to old data directly if they needed, etc. Anyway, as you all know, that gravy train has come to an end recently, and we now have 12 months grace with our storage on there before we have to have this sorted and moved to another system.
Our options seem to be:
- Go back to our old system of duplicated external drives, with one living offsite. We’d need ~$7500AUD worth of new drives to duplicate what we currently have.
- Buy a couple of LTO-9 tape drives (2 offices in different cities) and keep one copy on an external drive and one copy on a tape archive. This would be ~$20000AUD of hardware upfront + media costs of ~$2000AUD (assuming we’d get maybe 30TB per tape on the 18TB raw LTO-9 tapes). So more expensive upfront but would maybe pay off eventually?
- Build a Linus Tech Tips-style beast of a NAS. Raw drive cost would be similar to the external drives, but would have the advantage of being accessible remotely. Would then need to spend $5000-10000AUD on the actual hardware on top of the drives. Also have the problem of ever growing storage needs. With this solution we could potentially skip duplicating the data to external drives and live with RAID as the only form of redundancy…
- Another cloud storage service? Anything fast and decent enough that comes at a reasonable cost?
Any advice here would be appreciated!
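For a rough sense of the media count in the tape option, here is a minimal Python sketch. It uses the 300TB archive size and the 18TB raw / 30TB assumed per-tape figures from the post; `tapes_needed` and its `copies` parameter are hypothetical names for illustration. Already-compressed video usually gains little from tape drive compression, so the raw figure may be the safer planning number.

```python
import math

ARCHIVE_TB = 300          # archive size from the post
LTO9_RAW_TB = 18          # raw capacity per LTO-9 tape
ASSUMED_TB_PER_TAPE = 30  # optimistic per-tape figure assumed in the post


def tapes_needed(archive_tb: float, tb_per_tape: float, copies: int = 1) -> int:
    """Tapes required to hold `copies` full copies of the archive."""
    return math.ceil(archive_tb / tb_per_tape) * copies


print("at 30TB per tape:", tapes_needed(ARCHIVE_TB, ASSUMED_TB_PER_TAPE))
print("at 18TB per tape:", tapes_needed(ARCHIVE_TB, LTO9_RAW_TB))
```

That works out to 10 tapes at the assumed 30TB and 17 at raw capacity, per copy, before any growth.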
AWS S3 Glacier Deep Archive is designed for this. Something you access a couple of times per year if that, and it’s $0.99/TB/mo. Price that out compared to a $10k NAS or tape backup that will still need consumables like drives and tapes, and it might be your best option. There are costs on retrieval, but since, as you’ve said, this is archive footage that customers might request, you could pass that cost on to them.
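To price that out, here is a quick Python sketch using only the $0.99/TB/mo and $10k figures from this comment and the 300TB from the original post. The function names are made up for illustration, and it ignores retrieval fees, request fees, archive growth and currency differences (the original post quotes AUD), so treat it as a ballpark only.

```python
def deep_archive_monthly(archive_tb: float, usd_per_tb_month: float = 0.99) -> float:
    """Monthly storage bill at a flat per-TB rate."""
    return archive_tb * usd_per_tb_month


def months_to_match(upfront_usd: float, monthly_usd: float) -> float:
    """Months of cloud storage fees that add up to a one-off hardware spend."""
    return upfront_usd / monthly_usd


monthly = deep_archive_monthly(300)
months = months_to_match(10_000, monthly)
print(f"${monthly:.0f}/mo, equals a $10k purchase after about {months:.0f} months")
```

At these rates storage alone is roughly $297 a month, and it takes a bit under three years of fees to add up to the $10k hardware figure.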
Tip: AWS Snowcone & AWS Snowball are less expensive for data-out when you need to move many TiB. There is no time-limit on how long they can be rented.
NAS.
Over the last 24 months I’ve built 300TB (a mix of 10 and 14TB disks) for $2500 in disks. I could do that right now for $2100. An 18TB LTO-9 tape is more expensive than what I’m paying per TB for 14TB disks.
$700 in hardware to build the NAS with 25 bays.
Glacier would cost you $1080/mo in storage fees alone (300,000GB @ $0.0036) not including the $0.09/GB to get any data back out. Deep Glacier is less (by half, for storage), but comes with strings attached.
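The same arithmetic as a small Python sketch, in case anyone wants to plug in their own numbers. The $0.0036/GB storage rate and $0.09/GB data-out rate are the figures quoted in this comment; the 2TB single-project pull is a hypothetical example, and the function names are illustrative.

```python
ARCHIVE_GB = 300_000  # 300TB, as in the comment


def monthly_storage_usd(size_gb: float, usd_per_gb_month: float) -> float:
    """Storage fee for one month at a flat per-GB rate."""
    return size_gb * usd_per_gb_month


def egress_usd(size_gb: float, usd_per_gb: float = 0.09) -> float:
    """Data-out fee for pulling `size_gb` back down."""
    return size_gb * usd_per_gb


print("storage per month:", round(monthly_storage_usd(ARCHIVE_GB, 0.0036)))
print("full restore egress:", round(egress_usd(ARCHIVE_GB)))
print("one 2TB project (hypothetical):", round(egress_usd(2_000)))
```

So about $1,080 a month to store it, roughly $27,000 in data-out fees to pull the whole archive back, and about $180 for a single 2TB project.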
Don’t forget to factor in labor hours of what it’s going to cost you to maintain a tape library or a local server in general.
Are you charging clients for long term storage after a project is complete? If not, you should be.
You have 3 issues: online archive of past projects, long term (offline) storage & client access.
LTO is your long term solution for offline archive of projects. Depending on the average / largest project you might want to do 1 project per tape so LTO7/8 sizes. Scales really well, multiple copies, etc.
For the online storage, a NAS is really the only option. How it’s sized & configured comes into play. You can go cheaper with used enterprise gear, but then you’re dealing with more disks & higher power bills. Fewer larger disks can help with the power bill & noise levels.
Splitting things between a read-only share (of things that have been archived to tape), and a normal working share would help with the workflow.
The catch is what you do for client data exchanges. Giving them access via Dropbox is nice, but you need better housekeeping around data. Once the 1 year grace is over, what’s the size they have committed to? While self-hosting a client accessible share is possible, there are ongoing costs & I would be cautious around exposing the NAS to the internet directly.
Have you considered Amazon S3? It’s made for enterprises with unlimited storage, a lot of pricing options and could save you a lot of headaches long term.
Backblaze B2 is cheaper by a long way.
Yeah we looked into it. But as subven1 pointed out, it’s a brutal monthly cost.
S3 is designed with high availability and high throughput in mind. OP needs a cold storage solution like AWS Glacier or Azure cold storage, but even that is not cheap.
I don’t know that I’d take on tape with your use case. There’s a good bit of tech debt involved there.
NAS (either bought or built) + Amazon glacier or Backblaze for cloud archival backup.
The NAS (including drives) will probably cost you $7000-8000 USD for 400ish TB of storage with room to grow.
"It was easy to give clients access to old data directly if they needed, etc."
I hope you charge for this. It would help to offset your storage costs.
What drives and NAS would give you 400TB for 7-8k?
Yes I too am interested and would purchase this solution
300 TB in Backblaze B2 using their online calculator is $21,600 USD a year. I’m sure you can build / expand a new NAS every year for a similar price. But then you have to deal with the overhead of managing it and replacing disks.
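Working backwards from that calculator figure, a rough Python sketch of how the yearly bill moves as the archive grows. The $21,600 and 300TB come from this comment; the 50TB/year growth is purely a placeholder since the actual growth rate isn't stated, and `b2_yearly_costs` is an illustrative helper that bills each year at its start-of-year size.

```python
def b2_yearly_costs(start_tb: float, usd_per_tb_month: float,
                    growth_tb_per_year: float, years: int) -> list:
    """Yearly storage cost, billing each year at its start-of-year size."""
    costs = []
    size_tb = start_tb
    for _ in range(years):
        costs.append(size_tb * usd_per_tb_month * 12)
        size_tb += growth_tb_per_year
    return costs


# Per-TB monthly rate implied by $21,600/year for 300TB.
rate = 21_600 / 12 / 300
print("implied $/TB/month:", rate)

# Hypothetical 50TB/year growth over three years.
print(b2_yearly_costs(300, rate, 50, 3))
```

The implied rate is $6/TB/month, and with the placeholder growth the yearly bill climbs from $21,600 to $28,800 by year three.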
Wasabi has their Reserved Capacity Storage where you can get discounts if you commit to a minimum amount of storage. According to their site the absolute minimum to qualify is 25TB.
I suspect 300+ TB will get a decent deal.
I’m in the same biz. I use tape. Specifically a Mac mini + Canister from the guys that make Hedge. I then index each tape with NeoFinder, which makes it easy to find and pull projects. The idea was to make a system simple enough that it wasn’t one person’s full time job.
Maybe something like this can be a part of your solution? https://www.hetzner.com/dedicated-rootserver/matrix-sx
I’ve worked for several production companies that have similar or larger archives (one was well into the Petabyte range). LTO is the way to go. It is the cheapest option for very large archives, and if the tapes are properly stored, they last a lot longer than hard drives sitting on a shelf.
The real way to do it is a tiered archive, where everything goes to LTO, you have more recent media (1-2 years old, depending on project length) on hard drives, and current media (still in use + past year or so) on a NAS. LTO is still your primary archive; everything else is for easy access to media you’re more likely to need now or in the near future.
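One way to picture that tiering as a policy is the Python sketch below. The one-year and two-year cutoffs are my reading of "past year or so" and "1-2 years old", not hard rules from the comment, and `tiers_for` is a hypothetical helper name.

```python
from datetime import date


def tiers_for(last_used: date, today: date,
              nas_days: int = 365, hdd_days: int = 730) -> list:
    """Return where a project should live based on when it was last used."""
    age_days = (today - last_used).days
    tiers = ["LTO"]  # everything goes to tape as the primary archive
    if age_days <= nas_days:
        tiers.append("NAS")  # current or used within the past year
    elif age_days <= hdd_days:
        tiers.append("HDD")  # roughly 1-2 years old
    return tiers


today = date(2024, 6, 1)
for last_used in (date(2024, 1, 1), date(2022, 12, 1), date(2020, 1, 1)):
    print(last_used, tiers_for(last_used, today))
```

Projects only ever drop off the faster tiers as they age; the LTO copy stays put regardless.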
look at IDrive e2, no egress, $4,500 first year…screaming deal
https://youtu.be/lO-SAzFaN18?si=Rp0mvidHMxBFNedC
https://youtu.be/JHVSoJDZ06U?si=_7kmEUZNKc3UDfRK
I’m by no means qualified to give you an actual answer, but these videos may be of some use to you.
Yeah sweet. I haven’t checked in on the Slow Mo Guys storage setups in a while. I’ll have a watch.
RAID/NAS, as many others have said, isn’t a backup.
However, you could have a single NAS and back up to AWS Glacier, where storing large files is cheap going in and getting them out in a DR scenario is expensive, but maybe covered by your insurance depending on the DR event.
Listen to me. Here is the pro solution. Get yourself something like a Fujitsu ETERNUS CS800 plus a Fujitsu LTO tape library. Contact the sales team and tell them how much data you are going to put there. The result is that all data is available quickly if it resides in the disk cache, or a little later if it needs to be pulled from tape. From your point of view the data is available from a mounted network share, and the technical magic behind it is transparent. Basically, imagine an infinite folder where an algorithm moves data to and from tapes, keeps them healthy, and refreshes and consolidates them when needed. 20 tapes at 12TB each plus dedup is like 0.5 PB of data. And you can always duplicate tapes and move them to an external location. Even if somebody stole everything from the 1st location, including hardware, you could get the data back.
Backblaze is probably the cheapest cloud option. But it might still be too expensive.
A lot of people have suggested charging clients for long-term storage. I agree with that sentiment. If you go this route, you may be able to use cloud storage a la Dropbox/Google Drive, which seems most convenient for you. Costs for consumer-facing cloud storage run roughly $10USD for 2TB. Expensive for hundreds of terabytes indefinitely, but if a single client needs access to (idk) 0.5TB you could easily charge $30-50 a year to provide them a shared folder in Google Drive. Maybe more if you want redundancy against the cloud provider losing data.
For anything you need to actively use for work, a giant NAS is probably your best bet. Those YouTubers you’ve seen also use it as part of their team workflow, and maybe that’d also apply to you anyways. You should probably run a regular backup job of these to the other office or to AWS/Backblaze. Should be manageable cost if you only need 10-20TB of data for active work.
For everything else… maybe tape if you really want to keep everything. A lot of big organizations seem to be moving away from tape towards networked spinning disk as the price drops. Seems mostly driven by tape being seen as a massive pain to use (not that I have personal experience with it) and expensive equipment. It’s really an organizational decision to directly quantify long term archival needs and value. Once you have a $/TB value to the business, see what fits your budget (could be nothing!). You could try Backblaze or AWS Glacier but those get expensive and the cost is ongoing forever.
There are a whole bunch of niche and small-scale companies doing cloud data storage, but I don’t know how they’d get lower cost per byte stored over some big companies (lower margins? Slower speeds? Lower guarantees?). I’d be suspicious of them for mission-critical storage. It’s one thing for a home-user to use them to store their torrented movies, but it’s very different for a business. It could be worth it to just search around. Look at what’s supported as a target by whatever NAS software you use if that’s your route.
Tape. You’ll thank yourself in the long run.