FFMA Speedup with CUDA const

Declaring the network weights as __device__ const lets NVCC embed them directly into FFMA (single-precision fused multiply-add) instructions, which reduces memory accesses and speeds up DNN inference in Line Segment Tracking (LST).
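
As a rough illustration of the idea, here is a minimal sketch of a small dense layer whose weights are declared __device__ const with compile-time initializers. The layer sizes, weight values, and the names kWeights, kBias, denseLayer, and runDenseLayer are hypothetical and are not taken from the LST code. The assumption is that, with the loops fully unrolled and every weight index known at compile time, the compiler is free to fold the weights into the multiply-add instructions instead of loading them from global memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical layer shape, chosen only for illustration (not the LST network).
constexpr int kInputs = 4;
constexpr int kOutputs = 2;

// Weights and biases are __device__ const with compile-time initializers,
// so their values are visible to the compiler when the kernel is built.
__device__ const float kWeights[kOutputs][kInputs] = {
    {0.5f, -1.0f, 2.0f, 0.25f},
    {1.5f, 0.75f, -0.5f, 1.0f},
};
__device__ const float kBias[kOutputs] = {0.1f, -0.2f};

// One thread per input row: out[row] = W * in[row] + b.
__global__ void denseLayer(const float* in, float* out, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;

    // Full unrolling makes every weight index a compile-time constant.
#pragma unroll
    for (int o = 0; o < kOutputs; ++o) {
        float acc = kBias[o];
#pragma unroll
        for (int i = 0; i < kInputs; ++i) {
            // Each step is one fused multiply-add: acc = in * w + acc.
            acc = fmaf(in[row * kInputs + i], kWeights[o][i], acc);
        }
        out[row * kOutputs + o] = acc;
    }
}

// Host helper (hypothetical): copies n rows to the device, runs the layer,
// and copies the results back. Returns true if every CUDA call succeeded.
bool runDenseLayer(const float* hostIn, float* hostOut, int n) {
    float* dIn = nullptr;
    float* dOut = nullptr;
    size_t inBytes = static_cast<size_t>(n) * kInputs * sizeof(float);
    size_t outBytes = static_cast<size_t>(n) * kOutputs * sizeof(float);

    if (cudaMalloc(&dIn, inBytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, outBytes) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    bool ok = cudaMemcpy(dIn, hostIn, inBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        denseLayer<<<blocks, threads>>>(dIn, dOut, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(hostOut, dOut, outBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

Whether the weights actually end up inside the FFMA instructions depends on the compiler version, the optimization level, and the target architecture. One way to check is to dump the generated SASS with cuobjdump -sass and look at the operands of the FFMA instructions in the kernel.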