On-Device AI: Running Llama on a Phone
We have quantised Llama 3 down to 4-bit precision and it runs at 30 tokens per second on a flagship Android device. The quality loss is surprisingly s...
https://blogs.nvidia.com/ai-podcast/