A framework for implementing Llama models that enables better compatibility and performance on V100 GPUs.
Discovered on Reddit: r/LocalLLaMA