FTL Model Compiler Framework

Mar 6, 2024 · Ali Boubezari, Muyang Yu, Nikolay Korovaiko, Hongze Zhao

Fig. 1: Supporting multiple training frameworks with ONNX.
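
The post does not show Nuro's export code; as a minimal sketch of this step, a PyTorch model can be exported to ONNX with the standard torch.onnx.export call (the toy model, shapes, and file name below are placeholders, not the actual perception pipeline):

    import torch
    import torch.nn as nn

    # Placeholder network standing in for a real perception model.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Export the PyTorch graph to ONNX so the same artifact can feed every
    # downstream sub-compiler, regardless of the training framework used.
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["input"],
        output_names=["output"],
        opset_version=17,
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )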

Fig. 2a: Example of how the Orchestrator Segmenter compiles relevant parts of the graph to TensorRT.
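
The Orchestrator Segmenter itself is internal to FTL, but the per-segment TensorRT sub-compile it drives is conceptually the standard ONNX-to-engine build. A rough sketch using the public TensorRT Python API (file names are placeholders):

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    def compile_segment(onnx_path: str, plan_path: str) -> None:
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, TRT_LOGGER)

        # Parse the ONNX segment handed to this sub-compiler.
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError(f"ONNX parse failed: {parser.get_error(0)}")

        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.FP16)  # allow reduced precision where safe

        # Serialize the compiled engine for this segment of the graph.
        serialized = builder.build_serialized_network(network, config)
        with open(plan_path, "wb") as f:
            f.write(serialized)

    compile_segment("segment_0.onnx", "segment_0.plan")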

Fig. 2b: Example of incorporating multiple sub-compiler passes to produce a final stitched binary.

Fig. 3: Relative improvements in onboard resource utilization after general adoption of FTL.

Fig. 4: Injecting a PyTorch GPU kernel into the final compiled graph.
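
FTL's kernel-injection interface is not shown in this post; the sketch below only illustrates the idea of keeping one node of the graph as user-authored PyTorch GPU code, and the ftl.register_torch_kernel call in the comment is an invented, hypothetical name:

    import torch

    def gpu_top_k(scores: torch.Tensor, k: int = 100) -> torch.Tensor:
        # User-authored PyTorch GPU code that the sub-compilers should not lower.
        return torch.topk(scores, k).indices

    # Hypothetical registration call (ftl and register_torch_kernel are invented
    # names for illustration only): ask the framework to execute the PyTorch
    # function above for the graph node "detector/top_k" instead of compiling it.
    # ftl.register_torch_kernel(node_name="detector/top_k", fn=gpu_top_k)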

Fig. 5: Example of issues brought about by the model export / conversion process.

Fig. 6: Using the FTL Segment Breaker in the exported graph to isolate and configure a subgraph to compile in FP32.
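
The Segment Breaker's configuration format is internal to FTL; for intuition, a comparable effect with the raw TensorRT API is to pin the layers of a numerically sensitive subgraph to FP32 while the rest of the engine is allowed to build in FP16 (the layer-name prefix below is a placeholder):

    import tensorrt as trt

    def pin_subgraph_to_fp32(network: trt.INetworkDefinition,
                             config: trt.IBuilderConfig,
                             name_prefix: str) -> None:
        # Build mostly in FP16, but force TensorRT to honor per-layer precisions.
        config.set_flag(trt.BuilderFlag.FP16)
        config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
        for i in range(network.num_layers):
            layer = network.get_layer(i)
            if layer.name.startswith(name_prefix):
                # Keep the numerically sensitive layers in full precision.
                layer.precision = trt.float32
                for j in range(layer.num_outputs):
                    layer.set_output_type(j, trt.float32)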

Fig. 7: How the user can configure multi-GPU inference in their model.
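
FTL's multi-GPU configuration surface is likewise not shown here; as a generic illustration of the underlying idea, placing different subgraphs of a model on different devices, a plain PyTorch version might look like this (two GPUs assumed, toy modules as placeholders):

    import torch
    import torch.nn as nn

    # Minimal sketch: place two halves of a model on different GPUs so the heavy
    # backbone and the detection head run on separate devices.
    if torch.cuda.device_count() >= 2:
        backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()).to("cuda:0")
        head = nn.Conv2d(32, 8, 1).to("cuda:1")

        def run(frame: torch.Tensor) -> torch.Tensor:
            features = backbone(frame.to("cuda:0", non_blocking=True))
            # Hand the intermediate activations to the second GPU and finish there.
            return head(features.to("cuda:1", non_blocking=True))

        out = run(torch.randn(1, 3, 224, 224))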

Fig. 8: Nuro perception detector latency over time. Note the ~27% drop in latency after multi-GPU inference is applied.