Articles
A list of all articles.
-
WWW: What, When, Where to Compute-in-Memory
Updated: Some validation of, and reflections on, compute-in-memory.
-
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Updated: From Google; the first work to run a complete integer-only quantized inference pipeline end to end.
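The core arithmetic of integer-only inference can be sketched as below. This is a minimal NumPy illustration, assuming uint8 tensors with a per-tensor scale S and zero-point Z (the affine mapping r ≈ S(q − Z)), and a real requantization multiplier M = S1·S2/S3 that is normalized into [0.5, 1) and stored as an integer times a power of two. The function names quantize, quantize_multiplier and int_matmul_requant are hypothetical; this is not the paper's reference implementation or any library's API.

```python
import numpy as np

def quantize(r, scale, zero_point, qmin=0, qmax=255):
    # Affine mapping r ~= scale * (q - zero_point), clamped to the uint8 range
    q = np.round(r / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int64)

def quantize_multiplier(m):
    # Write a real multiplier m in (0, 1) as m0 * 2**-(31 + shift),
    # with m normalized into [0.5, 1) before conversion to fixed point
    shift = 0
    while m < 0.5:
        m *= 2.0
        shift += 1
    m0 = int(round(m * (1 << 31)))
    return m0, shift

def int_matmul_requant(q1, z1, q2, z2, m0, shift, z3, qmin=0, qmax=255):
    # Integer accumulation of (q1 - z1) @ (q2 - z2); hardware would use an int32 accumulator
    acc = (q1 - z1) @ (q2 - z2)
    # Fixed-point rescale: acc * m0 / 2**(31 + shift), rounded to nearest
    total_shift = 31 + shift
    scaled = (acc * m0 + (1 << (total_shift - 1))) >> total_shift
    # Add the output zero-point and saturate back to uint8
    return np.clip(scaled + z3, qmin, qmax).astype(np.int64)
```

Only integer operations touch the data in int_matmul_requant; floating point appears solely when choosing scales and zero-points offline.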
-
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Updated: Hardware design and evaluation benchmarking for deploying SNNs.
-
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
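Updated: From IPADS; uses predictors to forecast which neurons in an LLM will be activated, so that only those need to be computed, reducing resource consumption.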
-
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
Updated: An introduction to GEMM data mapping, focusing mainly on systolic-array-style accelerators.
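A tiled GEMM loop nest, the kind of computation whose tiles such accelerators map onto their processing-element arrays, might look like this. It is a plain NumPy sketch with arbitrary tile sizes (tm, tn, tk are hypothetical parameters); it models only the loop tiling, not any particular systolic-array dataflow or the mappings evaluated in the paper.

```python
import numpy as np

def tiled_gemm(a, b, tm=2, tn=2, tk=2):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    # Outer loops walk over output tiles; the k0 loop accumulates partial sums
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            for k0 in range(0, k, tk):
                # One tile-level product: the unit of work a mapping would assign
                # to the PE array (slices past the edge are simply truncated)
                c[i0:i0 + tm, j0:j0 + tn] += (
                    a[i0:i0 + tm, k0:k0 + tk] @ b[k0:k0 + tk, j0:j0 + tn]
                )
    return c
```

Changing the loop order or the tile sizes changes which operand tiles are reused across iterations, which is the design space such mapping studies explore.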
-
HAWQ: Hessian Aware Quantization of Neural Networks with Mixed-Precision
Updated: A classic model quantization method that uses second-order (Hessian) information to guide mixed-precision quantization.
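The second-order signal HAWQ relies on is the top eigenvalue of each block's Hessian, which can be estimated by power iteration using only Hessian-vector products; a larger top eigenvalue marks a more sensitive block that should keep more bits. The sketch below shows only that estimation step on a toy problem where the Hessian is an explicit matrix. In practice the hvp function (a hypothetical name here) would come from automatic differentiation of the training loss, and the bit-allocation logic built on these eigenvalues is not shown.

```python
import numpy as np

def top_hessian_eigenvalue(hvp, dim, iters=100, seed=0):
    # Power iteration needs only Hessian-vector products, never the full Hessian.
    # It converges to the eigenvalue of largest magnitude, which is the top
    # eigenvalue when the Hessian is positive semidefinite.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(v)
        eig = float(v @ hv)          # Rayleigh quotient estimate
        v = hv / np.linalg.norm(hv)  # renormalize for the next step
    return eig
```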
-
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
Updated: Optimizing BISMO, a bit-serial matrix multiplication overlay for FPGAs.
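Bit-serial matrix multiplication, which BISMO builds on, splits each integer operand into bit planes so that A·B = Σ_i Σ_j 2^(i+j) (A_i · B_j), where each binary product A_i · B_j reduces to AND plus popcount in hardware. Below is a minimal NumPy sketch for unsigned operands only (signed two's-complement inputs would need the most significant plane weighted negatively); the function names are hypothetical and this is not BISMO's hardware or software interface.

```python
import numpy as np

def bit_planes(x, bits):
    # Split a non-negative integer matrix into 0/1 matrices, least significant bit first
    return [(x >> b) & 1 for b in range(bits)]

def bitserial_matmul(a, b, a_bits, b_bits):
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.int64)
    for i, ap in enumerate(bit_planes(a, a_bits)):
        for j, bp in enumerate(bit_planes(b, b_bits)):
            # Binary matrix product: in hardware, AND + popcount per row/column pair
            acc += (ap @ bp).astype(np.int64) << (i + j)
    return acc
```

The cost scales with a_bits × b_bits binary products, which is why bit-serial designs can trade precision for throughput at run time.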
-
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Updated: The TVM deep learning compiler.
-
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Updated: The Roofline model, which characterizes whether a system's performance is bound by memory bandwidth or by compute.
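The Roofline bound itself is a one-line formula: attainable performance = min(peak compute, memory bandwidth × arithmetic intensity), and the ridge point (peak / bandwidth) separates the memory-bound region from the compute-bound one. A small Python sketch; the function names and the machine numbers in the usage line are made up for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    # intensity: arithmetic intensity in FLOPs per byte of memory traffic
    return min(peak_gflops, bandwidth_gbs * intensity)

def classify(peak_gflops, bandwidth_gbs, intensity):
    # Ridge point: the intensity at which the bandwidth slope meets the compute roof
    ridge = peak_gflops / bandwidth_gbs
    return "memory-bound" if intensity < ridge else "compute-bound"

# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s bandwidth, so the ridge is 10 FLOPs/byte
print(attainable_gflops(1000, 100, 2), classify(1000, 100, 2))  # 200 memory-bound
```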
-
A Comprehensive Survey on Electronic Design Automation and Graph Neural Networks: Theory and Applications
Updated: A survey of graph neural network applications in electronic design automation (EDA).