Optimised BAAI/bge-m3 serving with dense + sparse + ColBERT embeddings, async dynamic batching and pipeline GPU inference