Posts
Embarrassingly Parallel Reduction in CUDA
·10 mins
A step-by-step guide on turning simple math into a flex.
PyTorch: I’m Fast, JAX: You Call That Fast?
·7 mins
A recipe to train Object Detection Transformers (really) fast.
Data Parallelism using standard Ethernet
·8 mins
Bag-of-tricks for multi-node training for the GPU Poor.