Extract phoneme-level timestamps from speech audio.
A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.
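
To illustrate what phoneme-level timestamps look like, here is a minimal sketch that collapses per-frame phoneme labels into timed segments. It does not use this recognizer's actual API or output format. The function name `frames_to_segments`, the 20 ms frame duration, and the `<blank>` label are all assumptions made for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_ms=20, blank="<blank>"):
    """Collapse per-frame labels into (phoneme, start_ms, end_ms) segments.

    Hypothetical helper: frame_ms and blank are assumed values, not
    documented properties of any particular recognizer.
    """
    segments = []
    index = 0  # position of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != blank:
            # A run of identical labels becomes one timed segment.
            segments.append((label, index * frame_ms, (index + length) * frame_ms))
        index += length
    return segments


# Made-up frame labels for demonstration only.
frames = ["<blank>", "h", "h", "ə", "ə", "ə", "<blank>", "l", "oʊ", "oʊ"]
segments = frames_to_segments(frames)
for phoneme, start_ms, end_ms in segments:
    print(f"{phoneme}\t{start_ms} ms\t{end_ms} ms")
```

Times are kept as integer milliseconds to avoid floating-point rounding. Consecutive identical labels are merged, so a phoneme repeated without an intervening blank frame would be reported as a single segment.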