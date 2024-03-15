New investors in Unstructured, which develops technology for ingesting and pre-processing unstructured data for use in developing large language models, or LLMs, include the funding arms of several AI-focused tech companies, among them Databricks Ventures, IBM Ventures, and Nvidia’s NVentures.





Unstructured, a startup developing technology for ingesting and pre-processing a wide range of unstructured data for use in developing large language models, or LLMs, for generative AI, said Thursday it has raised a $40 million B round of funding.

The new financing round brings total funding for San Francisco-based Unstructured to $65 million.

The B round was led by Menlo Ventures, and supported by the funding arms of several AI-focused tech companies including Databricks Ventures, IBM Ventures, and Nvidia’s NVentures. Other investors in the round include Sacramento Kings Chairman Vivek Ranadivé, Datastax CEO Chet Kapoor, Allison Pickens of the New Normal Fund, Madrona, Bain Capital Ventures (BCV), and Mango Capital.

Madrona, BCV, and Mango previously invested in Unstructured.

Generative AI, or GenAI, has in just the last 12 months become one of the most important tech innovations.

Global IT consultant EY in February released its EY Reimagining Industry Futures Study, which found that 43 percent of the 1,405 enterprises surveyed are investing in GenAI.

EY also found that GenAI ranks third among the nine emerging technologies tracked in the study, with “Automation and AI” ranking first. Among the companies already investing in GenAI, 80 percent are working on proofs of concept for applications, while 20 percent have pilot projects underway, EY said.

Unstructured, founded in 2022, develops technology that makes unstructured data ready for use by LLMs for GenAI. Unstructured data, which includes emails, documents, images, video, and other content that is difficult to manage with traditional tools, needs to be pre-processed into formats that machine learning can use to build the LLMs on which GenAI depends.

Unstructured’s technology automates the transformation of unstructured data into formats needed for retrieval augmented generation (RAG) and LLM fine-tuning. The company claims it can drive performance improvements of over 20 percent for LLMs without the need for any customization. Its open source library has also been downloaded over 6 million times.

Unstructured in January released its commercial SaaS API, which it said already has over 1,000 paying customers. In February, it unveiled its enterprise platform, which continuously extracts raw unstructured data to significantly cut the time developers and data scientists need to prepare data, the company said.

Unstructured was unable to respond to a CRN request for information by press time.

However, Unstructured CEO and Founder Brian Raymond, in a statement, said his company gives developers the ability to interact with all their data through large foundation models via LLMs, orchestration frameworks, new cloud storage technologies, and ingestion and preprocessing tools.

“A critical bottleneck to realizing the emerging value of LLMs is the ability to ingest and preprocess any human-generated data into an LLM-ready format. 2024 will be the year of moving LLM prototypes into production and organizations of all types and sizes are hungry to build out these architectures efficiently and at scale. Automating the process of structuring data and seamlessly delivering it into storage is critical for enterprises that want to build solutions on this new tech stack and go to market quickly,” said Raymond, who previously served in the CIA and worked at the White House’s National Security Council before moving into investment banking and the startup world.