I use clipr::write_clip and clipr::read_clip - you can paste to Excel, but also read in something you've copied from Excel.
Helpful when you have a client with Excel files so poorly formatted that readxl won't do the job 😭
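A minimal sketch of how those two fit together (plus clipr::read_clip_tbl(), which parses tab-separated clipboard contents into a data frame); the object names are just for illustration:

```r
library(clipr)

# Copy a data frame to the clipboard as tab-delimited text,
# ready to paste straight into an Excel sheet
write_clip(mtcars)

# After copying a range of cells in Excel, pull it back in as a data frame
df <- read_clip_tbl()

# read_clip() returns the raw clipboard lines if the table parsing guesses wrong
raw_lines <- read_clip()
```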
+1 for parquet and arrow. If you're pushing up against memory limits, it's better to treat it as a completely out-of-memory problem. If you can split the data into multiple parquet files with hive-style or directory partitioning, it will be more efficient. You don't want the parquet files too small, though (I've heard people say 1 GB per file is ideal; colleagues at work like 512 MB per file, but that's on an AWS setup).
Bonus: once you've learned the packages, the same workflow covers any out-of-memory big dataset.
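A rough sketch of what that looks like with arrow's hive-style partitioning; the `sales` data frame, the paths, and the `year`/`month`/`amount` columns are made up for illustration:

```r
library(arrow)
library(dplyr)

# Write one directory of parquet files per year/month combination,
# i.e. hive-style paths like year=2023/month=1/part-0.parquet
write_dataset(sales, "data/sales_parquet", partitioning = c("year", "month"))

# Later, open the whole directory lazily; nothing is read into memory yet
ds <- open_dataset("data/sales_parquet")

# dplyr verbs are pushed down to arrow, so only the matching partitions
# and columns are actually read when collect() is called
ds |>
  filter(year == 2023) |>
  group_by(month) |>
  summarise(total = sum(amount)) |>
  collect()
```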
Wow! So impressive!
Well, there is !dataisbeautiful@lemmy.ml, but it doesn't seem to be very active (made approx 2 years ago) - hope that can change or a new community pops up.
Edit: idk the correct way to write a community lol
Re: c) I will be a dirty shill for VSCode and R lol, example here. I find it much better for R Shiny development, projects with multiple people, and projects with multiple languages. Notebook support isn't as good out of the box; you'll have to set up a Jupyter kernel, but I use scripts more than notebooks anyway.
Anyway, onto the question! Base R. Yeah, I said it! Whenever I hit a situation weird enough that tidyverse functions won't work because of poor-quality data, I shed a single solemn tear, quietly wish I had done the project in Python, and start writing a for loop in what will no doubt be the most hacky solution ever.