DeepSeek’s new Engram technique could slash AI memory costs while boosting reasoning power and easing global DRAM pressure

  • DeepSeek’s Engram separates static memory from computation, increasing efficiency in large AI models
  • The method reduces high-speed memory needs by letting models retrieve stored knowledge through fast lookups rather than recomputing it
  • Engram supports asynchronous prefetching across multiple GPUs with minimal performance overhead (a rough code sketch of this idea follows the list)
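Neither the announcement nor this summary includes reference code, so the snippet below is only a minimal PyTorch sketch of the idea the bullets describe: a static table read by lookup and prefetched asynchronously. The class name NgramMemory, the n-gram hashing scheme, and every size constant here are illustrative assumptions, not details of Engram itself.

```python
# Minimal sketch, not DeepSeek's code: a static embedding table pinned in
# host RAM, addressed by hashed n-gram lookup, with rows copied to the GPU
# on a side CUDA stream so the transfer overlaps with ongoing compute.
import torch


class NgramMemory:
    def __init__(self, table_size: int = 1_000_000, dim: int = 256, n: int = 2):
        # The "static memory": a large lookup table kept in pinned host RAM
        # instead of scarce GPU high-bandwidth memory.
        self.table = torch.randn(table_size, dim).pin_memory()
        self.table_size = table_size
        self.n = n

    def ngram_ids(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Hash each length-n window of token ids to a table row. The prime
        # multiplier is arbitrary; any mixing hash would do.
        ids = torch.zeros_like(token_ids)
        for k in range(self.n):
            ids = ids * 1_000_003 + torch.roll(token_ids, shifts=k, dims=-1)
        return ids % self.table_size  # torch's % is non-negative here

    def prefetch(self, token_ids: torch.Tensor, stream: "torch.cuda.Stream") -> torch.Tensor:
        # The indices depend only on the raw input tokens, which are known
        # before the forward pass starts, so this copy can be issued early
        # and run asynchronously while the model computes.
        rows = self.table[self.ngram_ids(token_ids)]
        rows = rows.pin_memory()  # gather output must be re-pinned for async copy
        with torch.cuda.stream(stream):
            return rows.to("cuda", non_blocking=True)


if __name__ == "__main__" and torch.cuda.is_available():
    mem = NgramMemory()
    tokens = torch.randint(0, 32_000, (4, 128))    # a batch of token ids
    side = torch.cuda.Stream()
    fetched = mem.prefetch(tokens, side)           # runs on the side stream
    torch.cuda.current_stream().wait_stream(side)  # sync before using rows
    print(fetched.shape)                           # torch.Size([4, 128, 256])
```

Because nothing in prefetch depends on intermediate activations, a table like this could in principle be sharded across the host memory of several GPU nodes and fetched in parallel, which is one plausible reading of the multi-GPU prefetching claim above.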

DeepSeek, in collaboration with Peking University, introduced a new training method called Engram, designed to decouple memory storage from computational processes.

Traditional large language models require high-bandwidth memory for knowledge retrieval and basic computation, creating a bottleneck in both performance and cost.
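To see why moving static knowledge out of high-bandwidth memory matters, here is a back-of-envelope sketch; the table size, precision, and 80 GB HBM figure are assumed for illustration and do not come from DeepSeek.

```python
# Assumed numbers: a large static lookup table versus one GPU's HBM.
rows, dim, bytes_per_value = 500_000_000, 256, 2   # hypothetical bf16 table
table_gb = rows * dim * bytes_per_value / 1e9      # = 256 GB
hbm_gb = 80                                        # typical high-end GPU HBM
print(f"lookup table: {table_gb:.0f} GB vs HBM: {hbm_gb} GB per GPU")
# A table this size cannot fit in HBM, but fits comfortably in host DRAM,
# which is the split a lookup-based design makes possible.
```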

