Gavin Niendorf
Posts
FFMA Speedup with CUDA const
Using __device__ const lets NVCC embed network weights directly into FFMA instructions, reducing memory accesses and speeding up DNN inference in Line Segment Tracking (LST).
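To make the idea concrete, here is a minimal sketch of a tiny dense layer whose weights are declared `__device__ const` with compile-time initializers. Everything in it is an assumption for illustration: the layer shape, the weight values, and the names `kWeights`, `kBias`, `dense_layer`, and `run_dense_layer` are hypothetical and not taken from the LST code. Because the loops are fully unrolled and every weight index is known at compile time, the compiler is free to fold the values into the FFMA instructions as immediates instead of loading them. Whether it actually does so depends on the NVCC version and optimization flags.

```cuda
#include <cuda_runtime.h>

// Hypothetical 3-input, 2-output layer; sizes and values are made up.
constexpr int kIn = 3;
constexpr int kOut = 2;

// __device__ const with constant initializers: the values are visible to
// the compiler, so it may embed them in instructions rather than load them.
__device__ const float kWeights[kOut][kIn] = {
    {0.5f, -1.0f, 2.0f},
    {1.5f, 0.25f, -0.5f},
};
__device__ const float kBias[kOut] = {0.1f, -0.2f};

// One thread evaluates the layer for one input sample.
__global__ void dense_layer(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x[kIn];
#pragma unroll
    for (int k = 0; k < kIn; ++k) x[k] = in[i * kIn + k];

    // Full unrolling makes every kWeights[j][k] index a compile-time constant.
#pragma unroll
    for (int j = 0; j < kOut; ++j) {
        float acc = kBias[j];
#pragma unroll
        for (int k = 0; k < kIn; ++k) {
            acc = fmaf(kWeights[j][k], x[k], acc);  // fused multiply-add
        }
        out[i * kOut + j] = acc;
    }
}

// Host helper: copies n samples to the device, runs the kernel, copies back.
cudaError_t run_dense_layer(const float* h_in, float* h_out, int n) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t in_bytes = static_cast<size_t>(n) * kIn * sizeof(float);
    size_t out_bytes = static_cast<size_t>(n) * kOut * sizeof(float);

    cudaError_t err = cudaMalloc(&d_in, in_bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_out, out_bytes);
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    err = cudaMemcpy(d_in, h_in, in_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        dense_layer<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(h_out, d_out, out_bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

To check what the compiler did with the weights, compile with optimizations and inspect the generated SASS, for example with `cuobjdump -sass` on the resulting binary. Look at whether the FFMA instructions carry literal operands or are preceded by separate loads.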