I know it’s not ideal and space is certainly an issue, but yt-dlp can download playlists… so the easiest approach is to dump everything you want to keep into a playlist, and you can even automate it so yt-dlp runs every 6 hours or so to grab anything you’ve added since the last run. At least for stuff you deem important, it’s something.
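A minimal sketch of what that automated run could look like. The playlist URL and folder name here are placeholders; `--download-archive` is the real yt-dlp flag that records already-downloaded video IDs, so a scheduled job (cron or similar) only fetches new additions each time.

```python
import shlex

def build_archive_cmd(playlist_url, dest_dir="archive"):
    """Build a yt-dlp command that only fetches videos not seen before.

    --download-archive keeps a text file of downloaded IDs, so re-running
    this (e.g. from a cron job every 6 hours) grabs only new playlist items.
    """
    args = [
        "yt-dlp",
        "--download-archive", f"{dest_dir}/archive.txt",  # skip already-saved videos
        "-o", f"{dest_dir}/%(title)s [%(id)s].%(ext)s",   # stable output filenames
        playlist_url,
    ]
    return shlex.join(args)

# Hypothetical playlist URL, for illustration only:
print(build_archive_cmd("https://www.youtube.com/playlist?list=PLxxxx"))
```

Dropping that command into a crontab entry like `0 */6 * * *` gives the 6-hourly sync described above.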
I’m actually surprised there isn’t an extension to “cache” every video you watch… stage6 (rip) used to just inherently save everything you watched. It’s been over 15 years and I still miss that damn site ;_; we’ve certainly gone in the very wrong direction since then
Meanwhile I’m about to buy ~50 TB of local storage because Google is finally pulling the plug on the grandfathered G Suite “as much storage as you need” plan and I’ve got 30 TB of shit to move, so I certainly feel your pain. At least my old mining-rig motherboards will finally have something to do again… 16 x1 PCIe slots can support 64 hard drives, so I should be good for a bit.
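Back-of-envelope check of the numbers above. The 4-drives-per-slot figure is my assumption (e.g. a 4-port SATA controller card in each x1 slot); the rest follows from the stated 50 TB plan and 30 TB of data.

```python
# Assumption: each x1 PCIe slot holds a hypothetical 4-port SATA card.
slots = 16
drives_per_slot = 4
max_drives = slots * drives_per_slot
print(max_drives)  # 64 drives, matching the estimate above

data_tb = 30        # data to migrate off G Suite
new_storage_tb = 50  # planned local storage
print(new_storage_tb - data_tb)  # 20 TB of headroom before the next crunch
```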
Can confirm. It seems counterintuitive, but more data needs more resources and more indexing, and leaves more room for errors.
In my experimentation with RVC, I’ve tried all sorts of dataset sizes, and my 2-hour datasets take forever to train and produce subpar results. 5-15 minutes’ worth of speech data is the sweet spot. No amount of training seems to fix it; overtraining is counterproductive, and the model just can’t seem to figure out what to do with all of that data.
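A quick way to sanity-check a dataset against that sweet spot before training. This is a sketch using only the standard library, assuming the clips are uncompressed WAV files in a single folder (the folder name is a placeholder):

```python
import wave
from pathlib import Path

def dataset_minutes(folder):
    """Total duration, in minutes, of all .wav files in a dataset folder."""
    total_seconds = 0.0
    for path in Path(folder).glob("*.wav"):
        with wave.open(str(path), "rb") as w:
            # frames / sample-rate = clip length in seconds
            total_seconds += w.getnframes() / w.getframerate()
    return total_seconds / 60.0

minutes = dataset_minutes("dataset")  # hypothetical folder name
if not 5 <= minutes <= 15:
    print(f"{minutes:.1f} min is outside the 5-15 min sweet spot")
```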
Granted, different models have different strengths and will certainly give different results, but how many times have you been researching something and found conflicting information? If it’s 1 conflicting piece out of 10, that’s easy enough to resolve, but a larger dataset might have 10 out of 100… It’s still 10%, but now there are 10 pieces the model has to figure out how to interpret, even if the other 90 agree with each other. Just like us, it can reach a point where there’s simply too much information to deal with.
Definitely a point of diminishing returns.