Apologies if this exists; I looked in https://miller.readthedocs.io/en/latest/reference-verbs/#split and searched the Issues for similar suggestions.
I'm working with large CSV files that often need to be split into chunks no larger than a given size (e.g. a max file size of 1024 MB) for file transfer. The amount of data in each column varies wildly, so taking a 10 GB file and splitting it into 10 chunks doesn't usually work: one chunk may end up significantly over 1 GB and others significantly under. The same goes for splitting by number of lines. I could keep splitting into more chunks until the largest is below 1 GB, but that requires trial and error at best, and I'm trying to optimize this process.
The following one-liner does what I want, more or less, but it's not the fastest process. Replace both instances of InFile.csv with your file.
tail -n +2 InFile.csv | split -C 1000MB -d - --filter='sh -c "{ head -n1 InFile.csv ; cat; } > $FILE.csv"'
Edit: I should note that, in my case at least, the order of the lines does not have to remain the same.
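For reference, here is a rough sketch of the same idea done in a single streaming pass outside of Miller (plain Python rather than the shell pipeline above). The 1000 MB budget, the InFile.csv input, and the output file naming are just placeholders for this example, and a single line longer than the budget still gets its own chunk, mirroring how split -C behaves.

```python
#!/usr/bin/env python3
# Illustrative sketch only (not Miller's API): split a large CSV into
# size-bounded chunks, repeating the header line in every chunk.
# File names and the 1000 MB budget are placeholders.

MAX_BYTES = 1000 * 1000 * 1000  # ~1000 MB per chunk, counted in raw bytes


def split_csv(in_path, out_prefix, max_bytes=MAX_BYTES):
    with open(in_path, "rb") as src:
        header = src.readline()  # first line, copied into every chunk
        chunk_index = 0
        out = None
        written = 0
        for line in src:
            # Start a new chunk when adding this line would exceed the budget.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                out = open(f"{out_prefix}_{chunk_index:04d}.csv", "wb")
                out.write(header)
                written = len(header)
                chunk_index += 1
            out.write(line)
            written += len(line)
        if out is not None:
            out.close()


if __name__ == "__main__":
    split_csv("InFile.csv", "InFile_chunk")
```

Reading and writing in binary avoids re-encoding newlines, and memory use stays at one line at a time regardless of the input size.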