
MNIST neural network implemented in pure x86 assembly from scratch

I implemented a neural network from scratch in x86 assembly (no frameworks, no Python) to recognize handwritten digits from MNIST. Feedback on performance optimizations or next steps is welcome. It uses AVX-512 SIMD for parallel float32 ops (~7× faster than NumPy) and runs in a lightweight Debian Slim Docker container. The goal was to understand neural networks at the CPU level.
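For readers new to the SIMD idea, here is a rough sketch in portable C rather than assembly. It is not taken from the project: `dot_f32` and `LANES` are hypothetical names chosen for illustration. It assumes only that one 512-bit register holds 16 float32 values, so a dot product can keep 16 partial sums in flight and reduce them at the end. A flattened 28×28 MNIST image has 784 values, which is exactly 49 such blocks.

```c
#include <stddef.h>

/* One 512-bit AVX-512 register holds 16 float32 values. */
#define LANES 16

/* Hypothetical helper: dot product with 16 independent partial sums,
   mirroring how one 16-lane accumulator register would be updated
   block by block. Written as plain C; no intrinsics are used here. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc[LANES] = {0.0f};
    size_t i = 0;

    /* Main loop: each iteration consumes one full 16-float block. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            acc[lane] += a[i + lane] * b[i + lane];
        }
    }

    /* Horizontal reduction of the 16 partial sums. */
    float sum = 0.0f;
    for (size_t lane = 0; lane < LANES; ++lane) {
        sum += acc[lane];
    }

    /* Scalar tail for elements that do not fill a whole block. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

In a hand-written AVX-512 version, the inner lane loop would presumably collapse into a single fused multiply-add on one register. Whether a compiler vectorizes this C version that way depends on the compiler and its flags.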

19 hours ago mghaderi

> ~7× faster than NumPy

Is that on the CPU? (I'm not sure whether NumPy has a GPU backend.)

18 hours ago checker659

Yes, on the CPU, with the same resources and the same implementation.