BareNet

GPU-Accelerated Deep Learning Framework

A minimal deep learning framework built from scratch with CUDA acceleration, automatic differentiation, and a PyTorch-like API.

System Architecture

Explore the layered architecture of BareNet, from the CUDA kernels up to the Python API, including memory management and GPU acceleration.

Understand how reverse-mode automatic differentiation works with computational graph tracking and backpropagation.

See how a 2-layer MLP is trained on the MNIST dataset using the BareNet framework with GPU acceleration.
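As a rough illustration of what a 2-layer MLP computes for one MNIST image, here is a pure-Python sketch of the forward pass. It does not use BareNet and runs on the CPU. The helper names (`linear`, `relu`, `softmax`, `init_layer`), the hidden width of 32, the ReLU activation, and the random stand-in image are all assumptions made for the example; only the 784 inputs (28x28 pixels) and 10 output classes come from MNIST itself.

```python
import math
import random


def linear(x, weights, bias):
    """Compute y = W x + b for one input vector; weights is a list of rows."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]


def relu(v):
    """Element-wise max(0, a). The activation choice is an assumption."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small uniform random weights and zero biases (hypothetical init scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
IN_DIM, HIDDEN_DIM, OUT_DIM = 784, 32, 10  # HIDDEN_DIM is an assumed value

w1, b1 = init_layer(HIDDEN_DIM, IN_DIM, rng)   # layer 1: 784 -> 32
w2, b2 = init_layer(OUT_DIM, HIDDEN_DIM, rng)  # layer 2: 32 -> 10

# Random stand-in for a flattened 28x28 digit image with pixels in [0, 1).
image = [rng.random() for _ in range(IN_DIM)]

hidden = relu(linear(image, w1, b1))
probs = softmax(linear(hidden, w2, b2))
predicted_digit = probs.index(max(probs))
```

With untrained weights the prediction is meaningless; training would adjust `w1`, `b1`, `w2`, and `b2` by gradient descent on a loss computed from `probs`.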

All tensor operations run on the GPU using custom CUDA kernels, achieving a 5x speedup over CPU implementations.

Built-in autograd engine that tracks the computational graph automatically and computes gradients by reverse-mode backpropagation.

PyTorch-like Python interface exposed through pybind11, making it easy to build and train neural networks.

Successfully trained a 2-layer MLP, achieving 97.48% test accuracy on MNIST handwritten digits.
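To make the idea of reverse-mode automatic differentiation with graph tracking concrete, here is a minimal scalar sketch in plain Python. It is not BareNet's implementation, which works on GPU tensors. The `Value` class and its attributes are hypothetical names chosen for illustration. Each operation records its inputs, and `backward()` walks the recorded graph in reverse topological order, applying the chain rule.

```python
class Value:
    """Scalar node in a computational graph (hypothetical, not BareNet's API)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Run backpropagation from this node through the recorded graph."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)  # parents are appended before children

        visit(self)
        self.grad = 1.0  # seed: derivative of the output with respect to itself
        for node in reversed(order):
            node._backward()


x = Value(2.0)
w = Value(3.0)
b = Value(1.0)
y = x * w + b   # forward pass builds the graph
y.backward()    # reverse pass fills in x.grad, w.grad, b.grad
```

Gradients are accumulated with `+=` so that a value used more than once in the graph receives the sum of the contributions from every use.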

NYU Machine Learning Systems Course • Built with CUDA, C++, and Python