BareNet
GPU-Accelerated Deep Learning Framework
A minimal deep learning framework built from scratch with CUDA acceleration, automatic differentiation, and a PyTorch-like API.
System Architecture
Explore the layered architecture of BareNet, from CUDA kernels up to the Python API, including memory management and GPU acceleration.
Automatic Differentiation
Understand how reverse-mode automatic differentiation works with computational graph tracking and backpropagation.
Training Pipeline
See how a 2-layer MLP is trained on the MNIST dataset using the BareNet framework with GPU acceleration.
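As a rough illustration of the reverse-mode idea described above, here is a minimal scalar sketch in plain Python. It is not BareNet's implementation: the `Value` class and its attribute names are hypothetical, it runs on the CPU, and it supports only addition and multiplication. It shows the two ingredients named above, recording a computational graph during the forward pass and then walking it in reverse to accumulate gradients.

```python
class Value:
    """Scalar graph node (hypothetical; stands in for a framework tensor)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents      # nodes this value was computed from
        self._backward_fn = None     # pushes self.grad into the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph so each node is processed
        # only after every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output)
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


x = Value(2.0)
y = Value(3.0)
z = x * y + x   # dz/dx = y + 1, dz/dy = x
z.backward()
```

After `z.backward()`, `x.grad` holds `y + 1` and `y.grad` holds `x`. Gradients are accumulated with `+=` because `x` feeds into the result along two paths.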
Key Features
🚀 GPU Acceleration
All tensor operations run on the GPU using custom CUDA kernels, achieving a 5x speedup over CPU implementations.
⚡ Automatic Differentiation
Built-in autograd engine with reverse-mode backpropagation, tracking computational graphs automatically.
🐍 Python API
PyTorch-like interface exposed to Python through pybind11, making it easy to build and train neural networks.
📊 MNIST Training
Successfully trained a 2-layer MLP, achieving 97.48% test accuracy on MNIST handwritten digits.
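For readers unfamiliar with the model, the forward pass of a 2-layer MLP on a single MNIST-sized input can be sketched in plain Python as below. This is an illustration only and does not use BareNet's API. The hidden width of 16, the ReLU activation, the softmax cross-entropy loss, and every function name are assumptions not stated on this page; only the 784-pixel input and the 10 digit classes come from MNIST itself. The input here is random noise, not a real digit.

```python
import math
import random


def linear(x, weights, bias):
    """Compute W x + b, where weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Elementwise max(0, v); the activation is an assumption."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small uniform weights and zero biases (hypothetical init scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
IN_DIM, HIDDEN_DIM, NUM_CLASSES = 784, 16, 10  # HIDDEN_DIM is assumed

w1, b1 = init_layer(rng, IN_DIM, HIDDEN_DIM)
w2, b2 = init_layer(rng, HIDDEN_DIM, NUM_CLASSES)

image = [rng.random() for _ in range(IN_DIM)]  # stand-in for a 28x28 digit
label = 3                                      # stand-in class index

hidden = relu(linear(image, w1, b1))           # layer 1
probs = softmax(linear(hidden, w2, b2))        # layer 2 + class probabilities
loss = -math.log(probs[label])                 # cross-entropy for one sample
```

In a full training loop, the gradient of `loss` with respect to `w1`, `b1`, `w2`, and `b2` would be obtained by automatic differentiation and used for a gradient-descent update; in a GPU framework the `linear` calls would map to matrix-multiply kernels over whole batches.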
NYU Machine Learning Systems Course • Built with CUDA, C++, and Python
