RJ Corneille

An inventory of transformers inference optimisation methods in the HuggingFace ecosystem

Latency is one of the main challenges to making machine learning impactful for an organisation. Depending on the latency requirements and the inference method, the emphasis may be on cost efficiency, on scalability, or on both. Machine Learning Engineering teams need to be able to provide value and