Natural language processing (NLP) is not a particularly novel field, but it has gained significant attention recently, in large part due to the hype around ChatGPT and generative AI. Other NLP models, like Hugging Face's Transformers and Google's LaMDA, which is slated to power its ChatGPT rival Bard, feed a tangible sense that AI's entry into the mainstream is just around the corner. But for those who simply type a few keywords into ChatGPT to make it generate lyrics in the vein of Nick Cave, it's easy to overlook all the effort that goes into developing the underlying AI models and getting them ready for mass consumption.
Beyond algorithms, developers need a ton of high-quality training data that is accurately "labeled," a way of categorizing raw data so that machines can understand and learn from it. A number of companies exist substantively to power this labeling process. German startup Kern AI has built a platform that lets NLP developers and data scientists not only control the labeling process but also automate and orchestrate tangential tasks, helping them address low-quality data that comes their way.
With NLP currently one of the hottest trends in AI, Kern AI today announced that it has raised €2.7 million ($2.9 million) in seed funding to accelerate its recent growth. Its commercial clients include insurance companies Barmenia and VHV Versicherungen, logistics companies such as Metro Supply Chain Group subsidiary Evolution Time Critical, and venture-backed startups like Crowd.dev. The company also says that data scientists at organizations such as Samsung and DocuSign have used the basic open source incarnation of its product.
Kern AI was founded in Bonn in 2020 by CEO Johannes Hötter, who said he started the company "with the belief that NLP will turn into a core digitization technology," while acknowledging that developers need more control and flexibility over the NLP development process.
The company's flagship product, Refinery, is open source and gives developers the ability to take a data-centric approach to developing NLP models by semi-automating labeling, identifying low-quality data in their training sets, and monitoring all of their data in a single interface.
Additionally, Bricks, which is also open source, is a set of modular, standardized “code snippets” that developers can incorporate into Refinery; according to the company, this is the “application logic driving your NLP automations.”
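Kern AI hasn't published the snippet API in this article, but labeling heuristics of this kind are commonly written as plain functions in the weak-supervision style: each one maps a raw record to a label or abstains. A minimal, hypothetical sketch (function name, labels, and keyword list are illustrative assumptions, not Kern AI's actual API):

```python
# Hypothetical labeling heuristic in the weak-supervision style:
# a plain Python function maps a raw record to a label, or returns
# None to abstain. Names and labels here are illustrative only.

URGENT_MARKERS = ("urgent", "asap", "by tomorrow", "time critical")

def label_urgency(text: str):
    """Label a request 'urgent' if it contains an urgency marker; abstain otherwise."""
    lowered = text.lower()
    if any(marker in lowered for marker in URGENT_MARKERS):
        return "urgent"
    return None  # abstain: let other heuristics or manual review decide

records = [
    "Please ship 20 palettes to our plant in Gothenburg by tomorrow 4pm",
    "Invoice attached for last month's deliveries",
]
print([label_urgency(r) for r in records])  # ['urgent', None]
```

Several such heuristics can run over the same dataset, with their (possibly conflicting) votes aggregated into training labels, which is what makes the snippets modular and composable.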
Hötter said that internal tooling for businesses is a typical real-world use case for the Kern AI platform. For instance, a logistics provider might have to respond to a client's request to "please ship 20 palettes to our plant in Gothenburg by tomorrow 4pm"; such urgent requests must be resolved right away. The logistics company could use Kern AI to automatically identify the request's requirements and intent in order to synchronize incoming requests with its transport management system (TMS).
"This is accomplished by synchronizing the service inbox with our commercial product workflow, which then pushes the data to Refinery," Hötter told TechCrunch. There, developers can examine the request using NLP techniques before pushing the structured, extracted data directly to their TMS.
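The article doesn't specify the schema of the extracted data, but the Gothenburg example suggests something like an intent plus a few typed fields. A toy sketch of that input-to-structured-output shape (field names are assumptions, and the real system uses NLP models rather than the naive regexes used here for illustration):

```python
import re
from dataclasses import dataclass

# Hypothetical schema for the structured record a TMS sync might receive.
# Kern AI's platform uses NLP models, not rules; this regex extractor only
# illustrates the input -> structured-output shape, not the actual technique.

@dataclass
class ShippingRequest:
    intent: str
    quantity: int
    unit: str
    destination: str
    deadline: str

def extract_request(text: str) -> ShippingRequest:
    qty = re.search(r"(\d+)\s+(palettes?|pallets?)", text, re.I)
    dest = re.search(r"plant in (\w+)", text, re.I)
    deadline = re.search(r"by (tomorrow \d+\s?(?:am|pm))", text, re.I)
    return ShippingRequest(
        intent="ship_goods",
        quantity=int(qty.group(1)) if qty else 0,
        unit=qty.group(2).lower() if qty else "",
        destination=dest.group(1) if dest else "",
        deadline=deadline.group(1) if deadline else "",
    )

req = extract_request(
    "Please ship 20 palettes to our plant in Gothenburg by tomorrow 4pm"
)
print(req)
# ShippingRequest(intent='ship_goods', quantity=20, unit='palettes',
#                 destination='Gothenburg', deadline='tomorrow 4pm')
```

Once the request is in this structured form, handing it to a TMS becomes an ordinary API call rather than a manual triage step.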
So in some ways it functions similarly to Zapier, but rather than relying on a rules-based methodology, it is designed for more sophisticated natural language understanding.