Sorry, I know I gave a very simplistic answer above. I have done a lot of data processing, but that was in the old days. So many things have changed since then (processor power, bandwidth, etc.) that it's really worth checking where the biggest bottleneck in the process actually is. The outcome can be surprising, then again maybe not. E.g., I can't believe the speed of SQLite sometimes. Tweak a few pragmas and wrap the work in transactions, and it can be very quick. Some of the tweaks are not safe against power failure, give up rollback, etc., but with bulk processing you can normally live with that. Normally it's simple: it's either a complete pass or a total failure. A rollback, for example, can be catastrophic with huge datasets. Better just to start the process again.
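To make the SQLite point concrete, here is a rough sketch of the kind of thing I mean, using Python's built-in `sqlite3` module. The function name `bulk_load` and the `items` table are just made up for illustration, and which pragmas are worth using depends on your data, so treat this as a starting point and measure it yourself rather than taking it as the right settings.

```python
import sqlite3


def bulk_load(db_path, rows, fast=True):
    """Insert (id, value) rows into a hypothetical 'items' table.

    Returns the number of rows in the table afterwards.
    """
    conn = sqlite3.connect(db_path)
    try:
        if fast:
            # Trade safety for speed. With the rollback journal off, a crash
            # or power failure mid-load can corrupt the file, and ROLLBACK is
            # no longer reliable. The plan on failure is to delete the file
            # and start the load again.
            conn.execute("PRAGMA journal_mode = OFF")
            # Do not wait for the OS to flush writes to disk.
            conn.execute("PRAGMA synchronous = OFF")

        conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value TEXT)"
        )

        # One transaction for the whole batch instead of one per row.
        # 'with conn' commits on success.
        with conn:
            conn.executemany(
                "INSERT INTO items (id, value) VALUES (?, ?)", rows
            )

        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(i, f"row {i}") for i in range(100_000)]
    print(bulk_load(":memory:", sample))
```

Running it with `fast=False` against a real file and timing both versions is a quick way to see how much the pragmas actually buy you on your own hardware.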
I don't know if my comments are useful or not. I also love data processing problems. Liking them doesn't make me good at them, though.