NPU - Neural Processing Unit
The Brain behind AI
ARTIFICIAL INTELLIGENCE
Jeugene John V
10/3/2025 · 2 min read
What is an NPU
A Neural Processing Unit, commonly known as an NPU, is the chip that powers the AI revolution. Its architecture is loosely inspired by the human nervous system. Our body contains a mesh of nerve clusters that pass stimuli such as touch or pain to the cerebrum, where they are processed and analyzed and a response is generated.
An NPU works in a similar way: inputs from sources such as infrared sensors, LIDAR, and user input are collected and analyzed, and the required action is performed.
Why not a CPU
The main processor in our computer, be it an Intel i7 or an AMD Ryzen chip, does perform complex tasks. However, for AI operations like matrix multiplication and addition, an NPU has an advantage because of parallel processing: the chip can perform many operations simultaneously, with lower power consumption. Within the NPU chip, there are hardware modules that make this possible.
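To illustrate why matrix multiplication suits parallel hardware, here is a minimal Python sketch. The function names `mac_dot` and `matmul` are made up for this illustration and are not part of any NPU toolkit. The code itself runs sequentially; the point is only that each output cell is computed independently of the others.

```python
def mac_dot(row, col):
    """Multiply-accumulate over two equal-length vectors."""
    acc = 0
    for a, b in zip(row, col):
        acc += a * b  # one multiply-accumulate step
    return acc


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    cols = list(zip(*b))  # columns of b
    # Each output cell needs only one row of a and one column of b,
    # so in principle all n * m cells could be computed at the same time.
    return [[mac_dot(row, col) for col in cols] for row in a]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A CPU core would typically work through these cells a few at a time, while parallel hardware can hand many of them to separate units at once.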
Modules
Each module performs a specific task, and together they produce the end result.
Multiply-Accumulate Unit : Performs matrix multiplication and addition. A large dataset is split into smaller chunks and distributed to multiple cores.
Activation Function Module : Applies non-linear activation functions such as ReLU and Sigmoid to the output of the linear transformation above, again across many values in parallel.
Tensor Accelerator : Certain machine learning models and their weights use tensors, which are multi-dimensional arrays. Operations on them are computed by this module.
On-Chip Memory : The NPU chip has built-in SRAM for faster access. This lowers latency and reduces dependence on main memory (DRAM).
Direct Memory Access Engine : Allows efficient data transfer and queuing between main memory and the NPU. This reduces the workload of the CPU (Central Processing Unit).
Compression/Decompression Module : An AI model with its weights can take up a large amount of memory. This module compresses the model, allowing it to fit on memory-constrained devices such as Internet of Things (IoT) devices and tablets.
Memory Management Unit : Parallel processing allows multiple instances of an operation to run concurrently on different cores. This module isolates each process from the others, providing a sandbox-like configuration and preventing cross talk.
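As a rough software analogy for the first two modules in the list above, the sketch below splits one neuron's inputs into chunks, runs a multiply-accumulate on each chunk, then applies ReLU to the combined result. The names `partial_mac`, `relu`, and `neuron`, and the chunk size `tile`, are assumptions made for this example. Real NPUs do this in hardware, and here the chunks run one after another rather than on separate cores.

```python
def partial_mac(xs, ws):
    """Multiply-accumulate over one chunk of inputs and weights."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc += x * w
    return acc


def relu(x):
    """ReLU activation: negative values become zero."""
    return x if x > 0 else 0.0


def neuron(inputs, weights, bias, tile=2):
    """Split the work into chunks of size `tile`, then activate the total."""
    # Each chunk stands in for the share of work given to one core.
    partials = [
        partial_mac(inputs[s:s + tile], weights[s:s + tile])
        for s in range(0, len(inputs), tile)
    ]
    return relu(sum(partials) + bias)


inputs = [1.0, 2.0, 3.0]
print(neuron(inputs, [1.0, 0.5, 0.0], bias=0.5))     # 2.5
print(neuron(inputs, [-1.0, -0.5, -1.0], bias=1.0))  # 0.0
```

The second call shows the activation step at work: the weighted sum is negative, so ReLU clamps the output to zero.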
GPU
A GPU, or Graphics Processing Unit, also uses parallel processing to execute graphics-intensive tasks like video editing or gaming. GPUs also have a number of cores that can run AI operations. However, they consume a large amount of power, which can be a drawback for laptops and other portable devices. Their cores are also tuned more for graphics rendering than for AI operations.