AudioMeta: Text Extraction from Speech
AudioMeta® is igolgi’s real-time speech-to-text engine. It accepts HTTP media streams as input and generates text (metadata) as output. The generated text is currently accurate enough to support indexing and search of the input media content.
Each AudioMeta® service instance converts one HTTP audio stream to text in real time. The number of AudioMeta® service instances that can run in real time on a single server is limited by the server’s peak RAM bandwidth and RAM clock rate; each input stream needs 10-12 GB/s of RAM bandwidth.
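As a rough sizing aid, the bandwidth figure above can be turned into an upper bound on instances per server. The sketch below is only back-of-the-envelope arithmetic: the function name and the 60 GB/s example server are hypothetical, and it ignores RAM clock rate, CPU load, and other processes competing for memory bandwidth.

```python
import math


def max_realtime_instances(peak_ram_bandwidth_gbs: float,
                           per_stream_gbs: float = 12.0) -> int:
    """Estimate how many real-time instances a server's RAM bandwidth allows.

    per_stream_gbs defaults to 12 GB/s, the conservative end of the
    10-12 GB/s range quoted above. Both arguments are in GB/s.
    """
    if peak_ram_bandwidth_gbs < 0 or per_stream_gbs <= 0:
        raise ValueError("bandwidth values must be positive")
    # Each instance handles exactly one stream, so round down.
    return math.floor(peak_ram_bandwidth_gbs / per_stream_gbs)


# Hypothetical server with 60 GB/s of peak RAM bandwidth.
conservative = max_realtime_instances(60.0)        # assumes 12 GB/s per stream
optimistic = max_realtime_instances(60.0, 10.0)    # assumes 10 GB/s per stream
print(conservative, optimistic)
```

Sizing with the 12 GB/s figure leaves some headroom; the 10 GB/s figure gives the best case.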
- Inputs: MP3, FLV, AAC, MPEG2-TS, MP4
- Outputs: Distribution Format Exchange Profile (DFXP)
- Fast processing of audio information to generate metadata for video
- Easy-to-integrate REST API
- Runs in virtual or cloud environments