This is my first project, developed for the final evaluation of Coursera's course "CUDA at Scale for the Enterprise." My contributions include the implementation of four distinct versions of this ...
This repository demonstrates a fully standalone C++/CUDA implementation of a multi-layer perceptron (MLP) using cuBLASLt and a few lightweight custom kernels. It performs forward inference directly on ...
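To make the forward pass concrete, here is a minimal CPU reference sketch in plain C++. It is not the repository's code: the `DenseLayer` struct, the `dense_forward` and `mlp_forward` names, the row-major weight layout, and the choice of ReLU on hidden layers are all assumptions made for illustration. In a GPU version, the inner matrix-vector product is the part a cuBLASLt matmul would typically handle, while the bias add and activation are the kind of work a small custom kernel (or a fused epilogue) could cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical layer description (not taken from the repository).
// Weights are assumed row-major with shape [out_dim x in_dim].
struct DenseLayer {
    std::size_t in_dim;
    std::size_t out_dim;
    std::vector<float> weights;  // out_dim * in_dim values
    std::vector<float> bias;     // out_dim values
};

// Computes y = W * x + b for one layer, optionally followed by ReLU.
std::vector<float> dense_forward(const DenseLayer& layer,
                                 const std::vector<float>& x,
                                 bool apply_relu) {
    std::vector<float> y(layer.out_dim);
    for (std::size_t o = 0; o < layer.out_dim; ++o) {
        float acc = layer.bias[o];
        for (std::size_t i = 0; i < layer.in_dim; ++i) {
            acc += layer.weights[o * layer.in_dim + i] * x[i];
        }
        y[o] = apply_relu ? std::max(acc, 0.0f) : acc;
    }
    return y;
}

// Chains the layers; ReLU is applied to every layer except the last
// (an assumption, since the source does not name the activation).
std::vector<float> mlp_forward(const std::vector<DenseLayer>& layers,
                               std::vector<float> x) {
    for (std::size_t l = 0; l < layers.size(); ++l) {
        const bool is_last = (l + 1 == layers.size());
        x = dense_forward(layers[l], x, !is_last);
    }
    return x;
}
```

A reference like this is mainly useful for checking GPU output on small inputs, since every value can be worked out by hand.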