A high-performance implementation for running models in local environments.
Discovered on Reddit:r/LocalLLaMA