Peking University, July 16, 2025: A research team led by Prof. Yang Yuchao from the School of Electronic and Computer Engineering at Peking University Shenzhen Graduate School has achieved a global breakthrough by developing the first sort-in-memory hardware system tailored for complex, nonlinear sorting tasks. Published in Nature Electronics, the study titled “A fast and reconfigurable sort-in-memory system based on memristors” proposes a comparator-free architecture, overcoming one of the toughest challenges in the field of processing-in-memory (PIM) technology.
Background
Sorting is a fundamental computing task, but its nonlinear nature makes it difficult to accelerate using traditional hardware. While memristor-based PIM architectures have shown promise for linear operations, they have long struggled with sorting. Prof. Yang’s team addressed this by eliminating the need for comparators, introducing a novel Digit Read mechanism along with a new algorithm and hardware design that together reimagine how sorting can be performed within memory.
Why It Matters
This work represents a significant step forward in the evolution of PIM technology, from linear matrix operations to nonlinear, high-complexity tasks like sorting. By proposing a scalable and reconfigurable sorting framework, the team provides a high-throughput, energy-efficient solution that meets the performance demands of modern big data and AI applications.
Key Findings
The study presents a comparator-free sorting system built on a one-transistor–one-resistor (1T1R) memristor array, using a Digit Read mechanism that replaces traditional compare-select logic and significantly enhances computational efficiency. The team also developed the Tree Node Skipping (TNS) algorithm, which speeds up sorting by reusing traversal paths and reducing unnecessary operations. To scale performance across diverse datasets and configurations, three Cross-Array TNS (CA-TNS) strategies were introduced. The Multi-Bank strategy partitions large datasets across arrays for parallel processing; Bit-Slice distributes bit widths to enable pipelined sorting; and Multi-Level leverages memristors’ multi-conductance states to enhance intra-cell parallelism. Together, these innovations form a flexible and adaptable sorting accelerator capable of handling varying data widths and complexities.
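To make the comparator-free idea concrete, the sketch below shows a minimal software analogue of digit-wise sorting: elements are partitioned by reading one bit column at a time, most significant bit first, so no pairwise comparator is ever invoked. The function names and the recursive partitioning scheme are illustrative assumptions made for this article, not the paper’s TNS algorithm or the memristor array’s actual read-out flow.

```python
# Illustrative software analogue of comparator-free, digit-read sorting.
# A minimal sketch only: digit_read_sort() and its recursive partitioning
# are assumptions for illustration, not the hardware's Digit Read circuit.

def digit_read_sort(values, bit_width=8):
    """Sort non-negative integers by scanning bit columns MSB-first,
    partitioning on each read digit instead of comparing pairs."""
    def partition(bucket, bit):
        # Buckets with at most one element need no further digit reads,
        # loosely echoing the idea of skipping redundant tree nodes.
        if bit < 0 or len(bucket) <= 1:
            return bucket
        zeros = [v for v in bucket if not (v >> bit) & 1]  # digit read: bit == 0
        ones  = [v for v in bucket if (v >> bit) & 1]      # digit read: bit == 1
        # Recurse on the next less-significant bit within each sub-bucket.
        return partition(zeros, bit - 1) + partition(ones, bit - 1)
    return partition(list(values), bit_width - 1)

print(digit_read_sort([13, 2, 250, 7, 99]))  # [2, 7, 13, 99, 250]
```

Conceptually, each recursive level corresponds to one digit read across all stored values, which is what removes the comparator from the data path.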
Figure 1. Overview of sorting systems.
Application Demonstrations
To validate real-world performance, the team fabricated a memristor chip and integrated it with FPGA and PCB hardware to build a complete, end-to-end demonstration system. In benchmark tests, it delivered up to 7.70× higher speed, 160.4× higher energy efficiency, and 32.46× greater area efficiency than leading ASIC-based sorting systems. The system also proved effective in practical applications: in Dijkstra path planning, it computed the shortest paths between 16 Beijing Metro stations with low latency and power consumption. In neural network inference, it enabled run-time tunable sparsity by integrating TNS with memristor-based matrix-vector multiplication in the PointNet++ model, achieving a 15× speedup and a 67.1× improvement in energy efficiency. These results highlight the system’s broad applicability to both conventional and AI-driven workloads.
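As a rough illustration of where sorting acceleration enters Dijkstra’s algorithm, the sketch below marks the per-iteration minimum-distance selection, which is the step a sort-in-memory engine could take over. The toy graph, station labels, and the in_memory_min_select() stand-in are assumptions made for illustration; they do not reproduce the 16-station Beijing Metro demonstration.

```python
# Minimal Dijkstra sketch showing where a sort/selection accelerator would plug in.
import math

def in_memory_min_select(frontier):
    """Stand-in for the hardware sorter: returns the node with the smallest
    tentative distance. In the demo system this selection step is the part
    that a memristor-based sort engine would accelerate."""
    return min(frontier, key=frontier.get)

def dijkstra(adj, source):
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    frontier = dict(dist)
    while frontier:
        u = in_memory_min_select(frontier)   # sorting/selection bottleneck
        du = frontier.pop(u)
        for v, w in adj[u].items():
            if v in frontier and du + w < dist[v]:
                dist[v] = frontier[v] = du + w
    return dist

# Toy 4-station graph (edge weights are purely illustrative).
metro = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(dijkstra(metro, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```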
Figure 2. In-memory sorting is compatible with existing matrix-based in-memory computing, enabling real-time adaptive sparse AI computation.
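Similarly, run-time tunable sparsity can be pictured as a top-k selection applied to the outputs of a matrix-vector multiplication, with the selection delegated to the sorter. The NumPy stand-in below is a minimal sketch under that assumption; the top_k_sparsify() helper and the random weights are hypothetical and not the paper’s PointNet++ integration.

```python
# Sketch of run-time tunable sparsity: keep only the k largest MVM outputs.
import numpy as np

def top_k_sparsify(activations, k):
    """Keep the k largest activations, selected here by software sorting;
    in the demo system this selection would run on the in-memory sorter."""
    idx = np.argsort(activations)[-k:]   # sorting/selection step
    sparse = np.zeros_like(activations)
    sparse[idx] = activations[idx]
    return sparse

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # weight matrix (software stand-in for the crossbar)
x = rng.standard_normal(16)        # input vector
y = W @ x                          # matrix-vector multiplication
print(top_k_sparsify(y, k=3))      # sparsity ratio tunable at run time via k
```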
Future Implications
This work redefines what’s possible in processing-in-memory systems. By demonstrating a flexible, efficient, and scalable sorting system, Prof. Yang’s team has opened the door for next-generation intelligent hardware capable of powering AI, real-time analytics, and edge computing. It lays the foundation for future nonlinear computation acceleration, pushing the boundaries of what memristor-based systems can achieve.
*This article is featured in the PKU News "Why It Matters" series.
Read more: https://doi.org/10.1038/s41928-025-01405-2
Written by: Akaash Babar
Edited by: Chen Shizhuo
Source: PKU News (WeChat)