AudioMeta: Text Extraction from Speech

AudioMeta® is igolgi’s real-time speech-to-text engine. It accepts HTTP media streams as input and generates text (metadata) as output. At present, AudioMeta®’s accuracy is sufficient to support indexing and search of the input media content using the generated text.

Each AudioMeta® service instance performs speech-to-text conversion on one HTTP audio stream in real time. On a single server, the number of AudioMeta® service instances that can run in real time is dictated by the peak RAM bandwidth and RAM clock rate. Each input stream needs 10-12 GB/s of RAM bandwidth.
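
As a rough sizing aid, the per-stream bandwidth figure can be turned into an upper bound on instances per server. The sketch below is only an illustration: the function name and the 100 GB/s server figure are hypothetical, and the estimate considers RAM bandwidth alone, ignoring CPU, RAM clock rate, and any other limits.

```python
import math


def max_realtime_instances(peak_bandwidth_gbs: float,
                           per_stream_gbs: float = 12.0) -> int:
    """Estimate how many real-time streams fit within a RAM bandwidth budget.

    peak_bandwidth_gbs: the server's peak RAM bandwidth in GB/s (assumed known).
    per_stream_gbs: bandwidth needed per stream; defaults to the conservative
                    end of the 10-12 GB/s range quoted above.
    """
    if peak_bandwidth_gbs < 0 or per_stream_gbs <= 0:
        raise ValueError("bandwidth values must be positive")
    # Only whole instances can run, so round down.
    return math.floor(peak_bandwidth_gbs / per_stream_gbs)


if __name__ == "__main__":
    # Hypothetical server with 100 GB/s of peak RAM bandwidth.
    print(max_realtime_instances(100.0))        # conservative: 12 GB/s per stream
    print(max_realtime_instances(100.0, 10.0))  # optimistic: 10 GB/s per stream
```

Planning with the 12 GB/s figure leaves some headroom; the 10 GB/s figure gives the best-case count.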

Product Highlights

  • Inputs: MP3, FLV, AAC, MPEG2-TS, MP4
  • Outputs: Distribution Format Exchange Profile (DFXP)
  • Fast processing of audio information to generate metadata for video
  • Easy-to-integrate REST API
  • Runs in virtual or cloud environments
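
To illustrate how DFXP output could feed indexing and search, here is a minimal sketch that reads a DFXP (timed text XML) document and builds a word-to-timestamp index. The sample document, the function names, and the assumption that each cue is a `p` element with `begin` and `end` attributes are illustrative only; actual AudioMeta® output may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical DFXP sample; not actual AudioMeta output.
SAMPLE_DFXP = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">welcome to the evening news</p>
      <p begin="00:00:03.500" end="00:00:06.000">tonight we look at the weather</p>
    </div>
  </body>
</tt>"""


def extract_cues(dfxp_text):
    """Return a list of (begin, end, text) tuples, one per <p> element."""
    root = ET.fromstring(dfxp_text)
    cues = []
    for elem in root.iter():
        # Compare local names so any DFXP/TTML namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] == "p":
            text = " ".join("".join(elem.itertext()).split())
            cues.append((elem.get("begin"), elem.get("end"), text))
    return cues


def build_index(cues):
    """Map each lowercased word to the begin times of the cues containing it."""
    index = {}
    for begin, _end, text in cues:
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(begin)
    return index


if __name__ == "__main__":
    index = build_index(extract_cues(SAMPLE_DFXP))
    print(index.get("weather", []))
```

A lookup in the resulting index returns the start times at which a word was spoken, which a player or search front end could use to jump to that point in the media.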