You May Also Enjoy
Use Nsight Systems to Profile Model Training with DeepSpeed on a Multi-Node Cluster
11 minute read
This post logs how I profiled a model training job running on multiple nodes of a cluster with DeepSpeed and Nsight Systems. Click here to jump to ...
Profiling of different implementations of attention modules
less than 1 minute read
Placeholder for the blog logging the results of profiling vLLM.
Profiling of different implementations of attention modules
less than 1 minute read
Placeholder for the blog logging the results of profiling different implementations of attention modules.
Training Custom Mixtral Model with DeepSpeed
less than 1 minute read
Placeholder for the blog logging how I trained a custom Mixtral model with DeepSpeed.