FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
Production inference for encoder models (ColBERT, GLiNER, ColPali, embeddings, etc.) as vLLM plugins for online and in-process deployment