NTP: A Neural Net Topology Profiler for Inference

NTP: A Neural Net Topology Profiler for Inference
Raghavendra Bhat, Principal Engineer, Intel Corporation
Co-authors: Pravin Chandran (Intel), Juby Jose (Intel), Viswanath Dibbur (ex-Intel), and Ajith Sirra (ex-Intel)

Problems with existing benchmark frameworks required a new approach: NTP

  • Surveyed Bm. FW are either too granular or very implementation-specific.
  • Trained models are a necessary and time-consuming prerequisite for end-to-end profiling with the surveyed Bm. FW.
  • Surveyed Bm. FW do not provide detailed performance metrics for compute, memory, bandwidth, and hotspots.
  • Surveyed Bm. FW do not allow a quick comparison of workload performance across combinations of hardware and popular frameworks.
  • Surveyed Bm. FW do not allow a detailed study of the performance impact of changed workload hyperparameters such as precision or number of hidden units.

Workload = a neural network solution such as GNMT or Transformer.
Bm. FW = benchmark frameworks. The surveyed benchmark frameworks include DeepBench, MLPerf, DLInfBench, etc.

NTP speeds up benchmarking with its unique capabilities

  • Simple markup-language-based interface to quickly define the workload for a benchmark. The workload can be a kernel or an end topology. ONNX compatible.
  • Uses randomly initialized weights for the defined workload. No pretrained model is required.
  • Tight integration with the respective HW profiler gives detailed insight into performance metrics for compute, memory, bandwidth, and hotspots.
  • Allows a choice of framework and hardware(s) for each benchmark run. As a result, detailed comparisons can be made across runs.
  • Hyperparameter impact can be studied easily by changing the workload definition (using the markup language).
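As a rough illustration of the workflow these points describe, the sketch below defines a small workload in a markup-style (JSON) document, builds it with randomly initialized weights, and times inference for two values of one hyperparameter. The JSON schema, its field names (`input_size`, `layers`, `hidden_units`), and the helpers `build_workload`, `run_inference`, and `time_workload` are hypothetical; the slides do not show NTP's actual markup format or API. Only the Python standard library is used.

```python
import json
import random
import time

# Hypothetical workload definition; NTP's real markup schema is not shown in the slides.
WORKLOAD_JSON = """
{
  "name": "tiny_mlp",
  "input_size": 8,
  "layers": [
    {"type": "dense", "hidden_units": 16},
    {"type": "dense", "hidden_units": 4}
  ]
}
"""

def build_workload(definition, seed=0):
    """Create a randomly initialized weight matrix for each dense layer."""
    rng = random.Random(seed)
    weights = []
    in_size = definition["input_size"]
    for layer in definition["layers"]:
        out_size = layer["hidden_units"]
        # One row per output unit, one column per input feature.
        weights.append([[rng.uniform(-1.0, 1.0) for _ in range(in_size)]
                        for _ in range(out_size)])
        in_size = out_size
    return weights

def run_inference(weights, inputs):
    """Forward pass: matrix-vector product followed by ReLU at each layer."""
    activations = inputs
    for matrix in weights:
        activations = [max(0.0, sum(w * a for w, a in zip(row, activations)))
                       for row in matrix]
    return activations

def time_workload(definition, runs=100):
    """Return mean wall-clock seconds per inference over `runs` iterations."""
    weights = build_workload(definition)
    inputs = [1.0] * definition["input_size"]
    start = time.perf_counter()
    for _ in range(runs):
        run_inference(weights, inputs)
    return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    base = json.loads(WORKLOAD_JSON)
    for hidden in (16, 64):
        # Change one hyperparameter in the definition and re-run.
        base["layers"][0]["hidden_units"] = hidden
        print(hidden, "hidden units:", time_workload(base), "s per inference")
```

Because the weights are random, no training step or model download is needed before timing; only the definition changes between runs. A real profiler would also collect hardware counters, which this sketch does not attempt.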

Nuts and bolts that make up NTP

  • Load a pretrained model.
  • TensorFlow and PyTorch* models are supported.
  • The framework to use is selected at runtime.
  • A single hardware target or a combination of HW can be selected for the benchmark run.
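One common way to select a framework at runtime is a registry that maps a framework name to a runner function. The sketch below shows that pattern under stated assumptions: the registry `BACKENDS`, the decorator `register_backend`, the function `run_benchmark`, and the device names are all hypothetical, and the runners are placeholders that only return a description string. It is not taken from NTP's implementation.

```python
# Hypothetical registry: maps a framework name to a function that runs a workload.
BACKENDS = {}

def register_backend(name):
    """Decorator that records a runner function under a framework name."""
    def decorator(func):
        BACKENDS[name] = func
        return func
    return decorator

@register_backend("tensorflow")
def run_with_tensorflow(workload, device):
    # Placeholder: a real runner would build and execute the workload here.
    return f"{workload} on {device} via tensorflow"

@register_backend("pytorch")
def run_with_pytorch(workload, device):
    # Placeholder, same shape as the TensorFlow runner.
    return f"{workload} on {device} via pytorch"

def run_benchmark(workload, framework, devices):
    """Dispatch one workload to the chosen framework on each selected device."""
    if framework not in BACKENDS:
        raise ValueError(f"unsupported framework: {framework}")
    runner = BACKENDS[framework]
    return [runner(workload, device) for device in devices]

if __name__ == "__main__":
    for line in run_benchmark("tiny_mlp", "pytorch", ["cpu", "gpu"]):
        print(line)
```

With this layout, adding another framework means registering one more runner; the dispatch code and the list of selected devices stay unchanged.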

Unleashing the tool on an example benchmark…