Edge AI & On-Device Inference

Category: Technology · Stage: Emerging · Status: Active
Momentum: 5.5
Total Mentions: 15
First Seen: 23 Feb 2026
Last Seen: 26 Mar 2026

Weekly Change

Mentions: +1 · Momentum: +1.30

Why It Matters

Edge AI reduces inference costs by 80-90% for high-volume use cases and eliminates the round-trip latency of calling a cloud API. As models shrink and hardware improves, this becomes viable for an expanding set of enterprise applications.
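As a rough illustration of where savings in that range can come from, the sketch below amortises an edge device over its lifetime token throughput and compares the per-token cost with a cloud API. Every price and throughput figure is an assumption chosen for illustration, not a number from the source.

```python
# Back-of-envelope arithmetic for the 80-90% cost claim.
# Every figure here is an illustrative assumption, not a number from the source.
cloud_usd_per_1m_tokens = 0.50      # assumed cloud API rate for a small model
device_cost_usd = 400.0             # assumed price of an edge inference device
device_life_days = 3 * 365          # assumed useful life
tokens_per_day = 5_000_000          # assumed sustained throughput (~58 tok/s)
energy_usd_per_1m_tokens = 0.01     # assumed electricity cost

# Amortise the hardware over its lifetime token budget, then add energy.
lifetime_m_tokens = device_life_days * tokens_per_day / 1_000_000
edge_usd_per_1m_tokens = device_cost_usd / lifetime_m_tokens + energy_usd_per_1m_tokens

saving = 1 - edge_usd_per_1m_tokens / cloud_usd_per_1m_tokens
print(f"edge ${edge_usd_per_1m_tokens:.2f} vs cloud ${cloud_usd_per_1m_tokens:.2f} "
      f"per 1M tokens -> {saving:.0%} cheaper")
```

Under these assumptions the saving lands at roughly 83%; the real figure depends heavily on utilisation, since an idle device still costs money.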

Summary

The push to run AI models directly on phones, edge devices, and other local hardware rather than in the cloud, driven by latency requirements, privacy concerns, and cost optimisation for high-volume inference.

Momentum Over Time

(Momentum chart not reproduced in this export.)

Source Breakdown

Source | Type | Items
The AI Podcast (NVIDIA) | Podcast | 1
Exponential View (Azeem Azhar) | - | 1

Notable Excerpts

We have quantised Llama 3 down to 4-bit precision and it runs at 30 tokens per second on a flagship Android device. The quality loss is surprisingly small -- maybe 2-3% on standard benchmarks. For most enterprise use cases like classification, summarisation, and extraction, this is more than adequate.

The AI Podcast (NVIDIA) 71% relevant
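For context on what the quoted 4-bit quantisation involves, here is a minimal sketch using the Hugging Face transformers and bitsandbytes libraries. The episode does not name the Android runtime used (mobile deployments typically go through something like llama.cpp or ExecuTorch rather than this CUDA-oriented stack), so treat this as an illustration of the quantisation step, not the speaker's setup; the checkpoint name is an assumption.

```python
# Minimal 4-bit quantised load of a Llama 3 model via transformers + bitsandbytes.
# Illustrative only: this stack targets CUDA GPUs, not Android phones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint

# NF4 4-bit weights cut memory roughly 4x versus fp16, with modest quality loss.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# The kind of enterprise task the excerpt mentions: summarisation/extraction.
prompt = "Summarise in one sentence: Edge inference moves models onto local hardware."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```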

Related Items

On-Device AI: Running Llama on a Phone

We have quantised Llama 3 down to 4-bit precision and it runs at 30 tokens per second on a flagship Android device. The quality loss is surprisingly small -- maybe 2-3% on standard...

The AI Podcast (NVIDIA) 71% Medium

Edge AI Will Eat the Cloud

My prediction: by 2028, more AI inference will run on edge devices than in the cloud. The economics are compelling -- once you amortise the device cost, edge inference is essential...

69% Medium
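The amortisation argument in the excerpt above can be made concrete with a rough break-even calculation: how many tokens must run locally before the device pays for itself in avoided cloud fees. All figures are illustrative assumptions, not numbers from the source.

```python
# Break-even sketch for the edge-vs-cloud amortisation argument.
# All figures are illustrative assumptions.
device_cost_usd = 400.0               # assumed edge device price
cloud_usd_per_1m_tokens = 0.50        # assumed cloud API rate
edge_energy_usd_per_1m_tokens = 0.01  # assumed marginal cost once hardware is owned

# The device pays for itself once avoided cloud spend exceeds its price.
saved_per_1m = cloud_usd_per_1m_tokens - edge_energy_usd_per_1m_tokens
breakeven_m_tokens = device_cost_usd / saved_per_1m

tokens_per_day = 5_000_000            # assumed sustained workload
days = breakeven_m_tokens / (tokens_per_day / 1_000_000)
print(f"break-even after {breakeven_m_tokens:,.0f}M tokens (~{days:.0f} days)")
```

Under these assumptions the device earns back its cost in under six months of sustained use, which is the shape of the economics the prediction rests on; the outcome swings with utilisation and with cloud pricing.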