Tag: Inference Acceleration

All articles with the tag "Inference Acceleration".

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
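Updated: September 24, 2025 at 15:07 | Published: September 23, 2025 at 13:50
VLDB 2024, from Alibaba; the engineering looks very solid. On LLM workloads, loading the weights in sparse form alone yields a 3-4x speedup in the decode phase.
Sparse Acceleration of SNNs on GPUs
Updated: July 14, 2025 at 11:09 | Published: July 13, 2025 at 14:11
A summary of my recent work on sparse acceleration of SNNs on GPUs, although it was not very successful.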
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Published: July 7, 2025 at 16:23
T-MAC accelerates BitNet-family models with lookup tables (LUTs) and runs on CPUs. A follow-up work, T-MAN, runs LUT acceleration on the NPU of Qualcomm mobile SoCs.
HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches
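Published: June 25, 2025 at 16:27
ISCA 2025, on tiling for sparse dataflows. I did not have the energy to read the second half closely; my current work has not gotten to sparse encoding yet.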
SNN on GPU
Published: June 24, 2025 at 11:48
I am about to start working on inference acceleration for SNNs on GPUs, so here are some notes to organize my thoughts.
Prosperity: Accelerating Spiking Neural Networks via Product Sparsity
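Published: June 11, 2025 at 16:52
An SNN accelerator paper under submission to HPCA. Its "Product Sparsity" essentially eliminates repeated computation on identical content, which is a different concept from sparsity as it is usually discussed.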
Recurrent Residual Module for Fast Inference in Videos
Published: June 9, 2025 at 15:25
CVPR 2018. DiffEncode plus sparse acceleration, but it feels too dated.
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
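Published: June 9, 2025 at 14:18
An influential NeurIPS 2022 paper on inference acceleration for GANs and diffusion models. It proposes Spatially Sparse Inference, which applies convolutional filters sparsely, only on the edited regions, while reusing cached features for the unedited regions.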
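A First Look at AI Infra
Updated: March 11, 2025 at 18:30 | Published: March 4, 2025 at 16:04
Taking the opportunity of my recent internship search to study and summarize the model inference and training acceleration knowledge I had previously picked up in bits and pieces, along with CUDA programming, GPU architecture, and related topics.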
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
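Updated: March 8, 2025 at 15:06 | Published: October 17, 2024 at 14:18
Generates matrix-multiplication (MM) operators for GPUs, uses load balancing and sparsity for acceleration, and emits PTX code based on the model.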
