  • Can confirm. It seems counterintuitive, but more data needs more resources, more indexing, and leaves more room for errors.

    In my experimentation with RVC, I’ve tried all sorts of dataset sizes, and my 2-hour datasets take forever and produce subpar results; 5–15 minutes’ worth of speech data is the sweet spot (quick sizing sketch at the end of this comment). No amount of training seems to fix it, and overtraining is counterproductive; the model just can’t seem to figure out what to do with all of that data.

    Granted, different models have different strengths and will certainly produce different results, but how many times have you been researching something and found conflicting pieces of information? If it’s 1 conflicting piece out of 10, that’s easy enough to reconcile, but in a larger dataset it becomes 10 out of 100… It’s still 10%, but now it’s 10 pieces of data the model has to figure out how to interpret, even if the other 90 agree with each other. Just like us, it can reach a point where there’s simply too much information to deal with.

    Definitely a point of diminishing returns.
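
    If it helps anyone size their own set, here’s a rough Python sketch (the folder path and the .wav-only layout are just my assumptions, nothing RVC-specific) that totals up a folder of clips so you can see how far you are from that 5–15 minute range:

```python
# Total up the duration of a folder of WAV clips before training.
# DATASET_DIR is a placeholder; point it at your own clip folder.
import wave
from pathlib import Path

DATASET_DIR = Path("dataset/my_voice")  # hypothetical layout

count = 0
total_seconds = 0.0
for clip in sorted(DATASET_DIR.glob("*.wav")):
    with wave.open(str(clip), "rb") as w:
        total_seconds += w.getnframes() / w.getframerate()
    count += 1

print(f"{count} clips, {total_seconds / 60:.1f} minutes total")
```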


    I know it’s not ideal and space is certainly an issue, but yt-dlp can download playlists… so the easy approach is to dump everything you want to keep into a playlist, and you can even automate it so yt-dlp runs every 6 hours or whatever to grab anything you’ve added to your list (rough sketch at the end of this comment). For the stuff you deem important, it’s at least something.

    I’m actually surprised there isn’t an extension to “cache” any videos watched… stage6 (rip) used to just inherently save everything you watched. It’s been over 15 years and I still miss that damn site ;_; we’ve certainly gone in the very wrong direction since then

    Meanwhile I’m about to buy ~50TB of local storage because Google is finally pulling the plug on the grandfathered G Suite “as much storage as you need” plan and I’ve got 30TB of shit to move, so I certainly feel your pain. At least my old mining rig motherboards will finally have something to do again… 16 1x PCIe slots at four drives apiece can support 64 hard drives, so I should be good for a bit.
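
    Something along these lines is all it really takes (the playlist URL, output paths, and archive file name are placeholders, not anything from the post); point cron or a systemd timer at it every 6 hours:

```python
# Rough sketch of the "dump it in a playlist, sync on a schedule" idea,
# using yt-dlp's Python API.
import yt_dlp

PLAYLIST_URL = "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"  # placeholder

opts = {
    "download_archive": "archive.txt",  # skip anything already downloaded
    "outtmpl": "youtube/%(title)s [%(id)s].%(ext)s",
    "ignoreerrors": True,  # don't die on deleted/private videos
}

with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download([PLAYLIST_URL])
```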



  • The model sounds a bit gritty; I’m on my 4th revision of the dataset trying to clean it up. I find it weird that I was able to make an AI of “hat kid” with two and a half minutes of audio, yet with 55 minutes of Alex Jones there are still issues. Granted, I really push Alex to the limit, but hat kid was able to sing “I Want You Back” by the Jackson 5 and sounded damn good, hella cleaner than Alex, with a fraction of the data.

    I’ve pruned a ton of extraneous data and got rid of almost everything that wasn’t absolutely pristine audio quality (with the exception of the moustache rant, I refuse to remove that). I got it down to a little over 28 minutes, then to 17 minutes of pristine data, and I have it retraining yet again. If it’s dramatically better, I’ll repost it.

    Edit: I’ll add the hat kid one to the main post, along with Herbert singing Linkin Park and Butters singing Christina Aguilera… Actually, I’ll just post a bunch of random shit I made, lol