That's awesome. The original discussion of BitNet made it seem like you needed to train a model from scratch, but it's neat they were able to adapt an existing model. This is quite exciting.
The performance is still a bit degraded, though.
Very exciting, although it was a bit disappointing to see that they're hitting just Llama 1 7B performance by quantizing Llama 3. But I'm sure the performance gap will close over time!