Intentionally Leaking Data To AI

Can companies steer AI responses?

Many companies hesitate to embrace artificial intelligence because they fear AI engines will expose their proprietary data to other companies, including competitors. At the same time, some companies desire to intentionally insert their data into AI engines as part of brand building. Is this a billion-dollar opportunity or a(nother) Fatal Flaw in the evolution of AI?

This counterintuitive idea arose during a panel discussion hosted by NextAccess, a consulting firm that advises clients on how to best use AI to improve their strategies for taking products to market and generating revenue.

Let me start at the beginning. Simply put, an AI engine has two components. The first is a large language model (LLM), which is built from an extensive body of content: all of the information that the AI company can find. This includes all of Wikipedia, the New York Times, and other publicly available content. (There is serious and growing controversy over copyright violations, but that’s a topic for another time.)

The second component of an AI engine is an algorithm that uses the LLM data to compose responses to queries. If I ask an AI engine to complete the sentence, “The dog ran up the…”, the algorithm checks the LLM to see how often this fragment already exists and what words typically complete the sentence. It then gives the user the statistically most likely next word. In this instance, “hill” is a typical response, whereas “casserole” is not.
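To make the "statistically most likely next word" idea concrete, here is a minimal sketch in Python. It is a deliberately tiny illustration, not how any commercial AI engine is actually built: real LLMs rely on neural networks trained on vastly more text rather than a lookup of literal word counts. The five-sentence corpus and the function names (build_counts, most_likely_next) are assumptions invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "the dog ran up the hill",
    "the dog ran up the hill again",
    "the dog ran up the stairs",
    "the cat ran up the tree",
    "she put the casserole in the oven",
]


def build_counts(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    return counts


def most_likely_next(counts, fragment, context_size=2):
    """Return the most frequent follower of the fragment's last words."""
    words = fragment.lower().split()
    context = tuple(words[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None  # this context never appeared in the corpus
    return followers.most_common(1)[0][0]


counts = build_counts(CORPUS)
print(most_likely_next(counts, "The dog ran up the"))
```

In this toy corpus, "up the" is followed by "hill" twice and by "stairs" and "tree" once each, so the sketch picks "hill". "Casserole" never follows "up the", so it is never a candidate.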

A company trying to leverage AI can start by asking questions. For example, a clothing company might ask, “What is the latest trend in men’s footwear?” However, just by asking this question, the clothing company reveals to the AI engine that it is contemplating a new product in the category, which is information the company would like to hide from its competitors.

A much more impactful use of AI would have the company upload some of its data – customer reactions or sales history – and then ask the AI engine to find patterns and compare them to any other information in its LLM. However, many AI engines add the uploaded corporate data to their LLMs so that a person from another company with exactly the right question could generate a response that reveals this data. Most AI companies have policies and other protections to guard against this data leakage. Even so, in several recent studies, 60-75% of companies reported banning the use of AI because they worry that these protections are insufficient. (There are many other reasons that companies are hesitating, but data privacy consistently ranks at the top.)

Despite these corporate bans, I suspect that every company in the world has at least one employee who has used an AI engine – perhaps on a personal computer with no corporate affiliation – to solve a business problem.

One participant in the NextAccess panel discussion runs a consulting company. In direct opposition to most other companies, she actively wants to insert her company’s data into LLMs, especially if it can somehow be attached to her company’s brand name. If someone poses a query to an AI engine where her company’s data would improve the response, she wants the inquirer to see her company as the source of the wisdom, in the hope that this drives new client engagements.

Putting a company’s wisdom and brand in front of information seekers is not a new concept. Search Engine Optimization (SEO) is the practice of making a company’s website more available to search engines like Google so that the company’s web link appears in more Google queries. This practice has spawned an entire industry of consulting and technology companies that can assist brands in designing their websites for maximum visibility to Google’s scanning tools. Companies can even pay Google to have their weblink appear at the top of the page for related queries. Importantly, these “sponsored” results are clearly marked so the internet traveler knows which Google responses are based on organic content and which are based on corporate payments.

Google has trained us all to know that the results from its search engine do not necessarily deliver the right—or even best—answer to the question. Clicking on multiple links to scour the source sites has become a normal, expected routine for web searchers.

Users of AI engines currently have a different expectation. They assume that the AI engine is providing the best answer possible. Even known AI flaws like bias and hallucinations are becoming less frequent in new, more powerful AI engines. User trust in AI accuracy is growing.

Will the pull of additional revenue convince AI companies to reveal some of their algorithmic secrets and create an AI Engine Optimization (AEO) industry? In such an industry, companies could rearrange their data so that it is particularly easy for AI companies to hoover up into their LLMs, increasing the likelihood that AI responses to users’ queries reference the company’s data and brand. Will AI engines offer paid placement (ideally with a notation of sponsored content) to brands that seek to appear in AI responses?

And how will AI users react? Will they appreciate more relevant, specific responses? Or will they question the AI company’s objectivity and neutrality? These open questions demonstrate that AI is both unlike prior technological tools and, therefore, as yet unsettled in the path(s) that it will take. Stay tuned.

Eugen Boglaru

Eugen Boglaru is an AI aficionado covering the fascinating and rapidly advancing field of Artificial Intelligence. From machine learning breakthroughs to ethical considerations, Eugen provides readers with a deep dive into the world of AI, demystifying complex concepts and exploring the transformative impact of intelligent technologies.
