Artificial intelligence (AI) image generators are becoming more powerful, but they typically rely on heavyweight diffusion models running in the cloud. Now researchers say they’ve built a new system that can generate high-quality images using about 10 times fewer processing steps.
The result is AI that is fast and efficient enough to run locally on phones and laptops, while being more private and more environmentally friendly than AI that runs in power-hungry data centers.
They outlined how the new model, called SD3.5-Flash, works in a study uploaded to the preprint arXiv database on September 25, 2025. In a statement released March 4, they announced that Lenovo has licensed the model for integration into its upcoming on-device AI platform, meaning the system will soon appear in smartphones, tablets and laptops.
The goal is simple but ambitious: to bring powerful generative artificial intelligence out of remote data centers and into the devices people actually use. This not only has implications for environmental impact and privacy, but could also make AI-based image generation faster than ever before.
Why most AI image generators are slow
Most modern text-to-image systems rely on a technique called diffusion. These AI models start with random noise—essentially a grid of pixels filled with random values—and gradually refine it into an image through a long sequence of steps.
Typically, that process takes 30 to 50 iterations to produce a finished image, with each step requiring significant computing power. That’s why many popular AI image generation tools run on large clusters of graphics processing units (GPUs) on remote servers via the cloud, rather than locally on a phone or laptop.
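The iterative refinement described above can be sketched as a toy loop. This is only an illustration, not the actual SD3.5 sampler: the `denoise_step` function here is a stand-in for what is, in a real diffusion model, a full pass through a large neural network.

```python
import random

def denoise_step(pixel, step, total_steps):
    """Stand-in for one denoising pass. A real diffusion model runs a
    large neural network here to predict and subtract noise; this toy
    version just blends each pixel a fraction of the way toward a
    fixed target value."""
    target = 0.5
    blend = 1.0 / (total_steps - step)  # fraction of the remaining distance
    return pixel + blend * (target - pixel)

def generate(total_steps=50, num_pixels=16, seed=0):
    """Start from pure random noise and refine it over many steps."""
    random.seed(seed)
    image = [random.random() for _ in range(num_pixels)]  # random noise
    for step in range(total_steps):
        image = [denoise_step(p, step, total_steps) for p in image]
    return image

img = generate(total_steps=50)  # 50 "network passes" to reach a clean image
```

In a real system, each of those 30 to 50 passes is the expensive part, which is why the total step count dominates generation time and energy use.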
That architecture works well for producing high-quality images, but it also creates practical limitations. The models are slower and more energy-demanding, and they must send prompts or images to remote servers and wait for a response.
In the new study, the researchers set out to tackle that bottleneck. SD3.5-Flash dramatically shortens the production pipeline. Instead of dozens of iterations, the model can produce an image in just four processing steps, the researchers said.
This is achieved by compressing the diffusion process into a more efficient form while preserving image quality. Essentially, the system learns how to “jump” through the denoising process in larger leaps instead of progressing step by step. According to the study, maintaining visual quality while reducing the number of steps is the core technical challenge.
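One way to picture those “larger leaps” is a sampler that visits only a handful of the original timesteps. This is a simplified illustration of step reduction in general, not Stability AI’s actual distillation method:

```python
# Toy illustration of step reduction: instead of visiting all 50
# timesteps, a distilled sampler visits only 4 of them, taking a
# larger leap at each one. (Not the SD3.5-Flash algorithm itself.)

def reduced_schedule(total_steps=50, kept_steps=4):
    """Pick `kept_steps` timesteps spread evenly across the full schedule."""
    stride = total_steps / kept_steps
    return [round(i * stride) for i in range(kept_steps)]

full = list(range(50))     # baseline: 50 network passes
fast = reduced_schedule()  # distilled: 4 network passes

print(fast)                   # the 4 timesteps the fast sampler visits
print(len(full) / len(fast))  # 12.5x fewer network passes
```

The hard part, as the study notes, is not picking which steps to skip but training the model so that each big leap lands close to where the many small steps would have ended up.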
“Our SD3.5 Flash model allows users to create images from text descriptions solely on their device, with no data leaving their hardware,” said Hmrishav Bandyopadhyay, a doctoral researcher at the University of Surrey who developed the model during an internship at Stability AI, in the statement. “Achieving this level of efficiency is technically challenging, as it requires compressing a diffusion model to run in just a few steps while maintaining quality.”
Reducing the number of inference steps means the model requires far fewer computational resources, making it possible to run on consumer-grade hardware.
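The compute saving follows directly from the step count. As a rough back-of-envelope (the per-step cost below is a made-up illustrative number, not a figure from the study), if each denoising pass costs about the same, cutting 50 steps to 4 cuts compute, and roughly energy, by the same factor:

```python
# Back-of-envelope compute comparison. The per-step cost is a
# hypothetical placeholder; only the step counts come from the article.

steps_cloud = 50              # typical diffusion pipeline (30-50 steps)
steps_flash = 4               # SD3.5-Flash, per the study
cost_per_step_gflops = 500.0  # hypothetical cost of one network pass

print(steps_cloud * cost_per_step_gflops)  # total cost, baseline
print(steps_flash * cost_per_step_gflops)  # total cost, 4-step model
print(steps_cloud / steps_flash)           # 12.5x reduction
```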
Greater privacy, speed and AI sustainability
Running generative AI locally rather than in the cloud can have several advantages. The first is privacy: if an AI model runs exclusively on a device, messages and generated images do not need to be sent to remote servers, reducing the risk of data exposure, eavesdropping or misuse.
The second is speed: With fewer processing steps and no network latency, image generation can be almost instantaneous.
Finally, there is an environmental angle. Large cloud AI models use significant energy and water through data center operations, but lightweight models running locally can dramatically reduce these requirements.

Yi-Zhe Song, director of the SketchX Lab at the University of Surrey, said the broader goal is to make AI more accessible and practical: “SD3.5-Flash puts a powerful creative tool directly into users’ hands while keeping their data private and reducing the energy demands associated with cloud processing.”
In the study, the team tested SD3.5-Flash against traditional diffusion pipelines to measure whether the drastic reduction in processing steps affected the quality of the images. They evaluated the system using standard benchmarks for generative models, including image quality and the degree to which output matches the text prompt. These metrics are widely used in machine learning research to compare different approaches to image generation.
Tests on standard image generation benchmarks found that the model could deliver results similar to traditional diffusion systems, despite cutting the number of processing steps from around 30–50 down to just four.
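Prompt-alignment metrics of this kind typically work by embedding both the text prompt and the generated image into a shared vector space and measuring how close the two vectors are, most often via cosine similarity. The sketch below uses tiny hand-made vectors to show the idea; in practice the embeddings come from a large model such as CLIP, and the study does not specify its exact metric:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical
    direction, 0.0 = unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model outputs: in practice an
# encoder maps the prompt and the image into the same vector space.
prompt_embedding = [0.9, 0.1, 0.3]
image_embedding = [0.8, 0.2, 0.4]

score = cosine_similarity(prompt_embedding, image_embedding)
print(round(score, 3))  # high score = image closely matches the prompt
```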
Most notably, the technology is already on its way to real products. Lenovo has licensed the model for integration into the upcoming Personal Ambient Intelligence platform, called Qira, which aims to bring AI capabilities directly to consumer devices.
The platform could enable features such as AI image generation on laptops, tablets and smartphones without the need for an internet connection. In March, the company introduced its first set of Qira-compatible devices, including new concept units, suggesting that it won’t be much longer before we see this new AI system in consumer hardware.
If successful, it will represent a broader shift in how generative AI is delivered. Instead of relying on centralized infrastructure, future AI tools may increasingly run locally at the edge – embedded directly in everyday devices. It’s something the researchers see as part of a larger push to make generative AI more efficient and practical.
Compressing large models without sacrificing quality is still an active area of research, but SD3.5-Flash suggests that the gap between powerful AI systems and consumer hardware may be shrinking rapidly. If companies like Lenovo follow through with device integrations, the next wave of AI creativity tools may live not in the cloud, but in your pocket.