Zip Beans ☕️ | The Approach to Conversational AI for Early-Stage Teams
At Zipteams, the problem we are tackling is conversational AI and sales enablement. Getting started in conversational intelligence poses some crucial roadblocks. The primary one, in our experience, was the sheer variety of data that flows into our systems.
Initially, we approached every use case from a perfectionist’s point of view. We started applying state-of-the-art algorithms and models to the entire bulk of calls taking place in our digital meeting rooms. We imagined that this would ensure that we let nothing slip through the cracks.
This posed 3 major issues:
1. Risk of very high spend when using off-the-shelf models for various downstream inference tasks
2. Where we had to build models from scratch, handling noise and edge cases was difficult because it required a significantly larger training dataset (with negative sampling)
3. Long inference times in certain use cases
We realized that most of the insights or information worth indexing came from a tiny subset of a conversation when compared to the entire duration of the call. Therefore, we adopted a different processing style to combat the issues we faced above.
1. We built a lightweight monitoring engine that focuses on quickly (read: VERY quickly) identifying “interesting” moments from a call in real time. We do this with a combination of age-old techniques (e.g. regex, NER, tokenization, POS tagging) and some low-latency inference models deployed at the nodes through which the call session is managed, i.e. the audio-video bridges.
2. Once an “interesting” moment is captured, we asynchronously push it to our secondary inference layer that hosts some larger use-case-specific high-latency models, which weed out the false positives.
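To make the two stages concrete, here is a minimal Python sketch of the pattern. It is an illustration under assumptions, not our production code: the trigger patterns, the names `fast_filter`, `slow_verify` and `process_call`, and the toy word-count rule standing in for a large model are all hypothetical. Stage 1 is a cheap regex pass over each utterance, and stage 2 is an asynchronous worker that confirms or rejects the flagged moments.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import List

# Hypothetical trigger patterns; real ones would be use-case specific.
TRIGGERS = {
    "pricing": re.compile(r"\b(price|pricing|cost|discount|budget)\b", re.IGNORECASE),
    "follow_up": re.compile(r"\b(follow up|next week|schedule|call back)\b", re.IGNORECASE),
}


@dataclass
class Moment:
    label: str
    text: str


def fast_filter(utterance: str) -> List[Moment]:
    """Stage 1: cheap, synchronous check that flags candidate moments."""
    return [
        Moment(label, utterance)
        for label, pattern in TRIGGERS.items()
        if pattern.search(utterance)
    ]


async def slow_verify(moment: Moment) -> bool:
    """Stage 2 stand-in: a larger model would be called here.

    The sleep simulates inference latency, and the word-count rule is a
    toy placeholder for a real false-positive classifier.
    """
    await asyncio.sleep(0.01)
    return len(moment.text.split()) >= 4


async def worker(queue: asyncio.Queue, confirmed: List[Moment]) -> None:
    """Drain flagged moments and keep only the ones stage 2 confirms."""
    while True:
        moment = await queue.get()
        try:
            if await slow_verify(moment):
                confirmed.append(moment)
        finally:
            queue.task_done()


async def process_call(utterances: List[str]) -> List[Moment]:
    """Run stage 1 inline and hand candidates to stage 2 via a queue."""
    queue: asyncio.Queue = asyncio.Queue()
    confirmed: List[Moment] = []
    task = asyncio.create_task(worker(queue, confirmed))
    for utterance in utterances:
        for moment in fast_filter(utterance):
            queue.put_nowait(moment)  # non-blocking hand-off to stage 2
    await queue.join()  # wait until every flagged moment is verified
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return confirmed
```

Calling `asyncio.run(process_call(transcript_lines))` on a list of utterances returns only the moments that pass both stages. The point of the split is that the expensive check runs on the small flagged subset rather than on every utterance.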
This approach has now put us in a position to tackle some complex use cases in real time and have the analysis ready without any post-processing requirements.
Although this AI engineering paradigm has existed for a while, we’re seeing the vast potential for its use in the conversational AI space, especially in some high-cost processing stages, such as speech-to-text engines.