This is a neat idea. When I'm looking up models I usually want to see something about the architecture, but also some of the hyperparameters for the specific model: residual dimension, total number of layers, tokenizer configs. There's some of that in the visualization, but it's spotty.
The results for Nemotron 3 Nano are hard to parse, and I think actually incorrect: https://hfviewer.com/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-B... I'm guessing this is because the implementation uses layers that are all instances of the same class, with forward passes that branch on the layer type specified at construction time.
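To illustrate the guess above (this is a hypothetical sketch, not Nemotron's actual implementation): if every block is an instance of one class whose forward pass branches on a type chosen at construction, a tool that infers structure from class names sees a stack of identical-looking layers.

```python
# Hypothetical sketch of the pattern described above: one class, with the
# real layer behavior selected at construction time. A visualizer that
# walks the module tree by class name can't tell the blocks apart.
class Block:
    def __init__(self, layer_type):
        self.layer_type = layer_type  # e.g. "attention", "mamba", "mlp"

    def forward(self, x):
        # Stand-in arithmetic; the point is the branch, not the math.
        if self.layer_type == "attention":
            return x + 1
        elif self.layer_type == "mamba":
            return x * 2
        return x

blocks = [Block(t) for t in ["attention", "mamba", "mamba"]]
# Every block reports the same class name, hiding the heterogeneity:
print({type(b).__name__ for b in blocks})  # {'Block'}
```

Recovering the real structure here would require reading the `layer_type` attribute (or the config that set it), not just the class hierarchy.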
Where is it capturing the model "structure" from?
Most Hugging Face models are implemented in PyTorch, with an architecture specified as a series of layers. This looks like a nice visualization of that.
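As a minimal sketch of what that means in practice (a toy stand-in, not any specific Hugging Face model): a PyTorch model is a tree of `nn.Module` instances, and that tree is recoverable without ever running the model, e.g. via `named_modules()`, which is presumably the kind of structure such a visualizer walks.

```python
import torch.nn as nn

# Toy stand-in for a language model: embeddings, one transformer
# encoder layer, and a linear head, composed as a series of layers.
model = nn.Sequential(
    nn.Embedding(100, 16),
    nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True),
    nn.Linear(16, 100),
)

# The module tree is available statically, without a forward pass:
for name, module in model.named_modules():
    print(name or "(root)", type(module).__name__)
```

`print(model)` gives a similar nested view; either way the structure comes from the module hierarchy declared at construction, which is why the branch-on-type pattern above defeats it.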