Salesforce is betting that its own content can bring more trust to generative AI

It has become apparent in recent weeks that generative AI has the potential to transform how we interact with software, allowing us to describe what we want instead of clicking or tapping. That shift could have a profound impact on enterprise software. At the Salesforce World Tour NYC event last week, that vision was on full display.

Consider that during the 67-minute main keynote, it took less than five minutes for Salesforce CMO Sarah Franklin to introduce the subject of ChatGPT. The company then spent the next 40 minutes, across several speakers, talking about generative AI and the impact it would have across the entire platform. The final speaker covered Data Cloud, an adjacent technology. It’s fair to say that other than a few minutes of introduction, generative AI was all the company talked about.

That included discussions of Einstein GPT, a tool for asking questions about Salesforce content, and Slack GPT, a similar tool for asking questions about content inside Slack. In addition, the company talked about the ability to create landing pages on the fly, write sales emails (if that’s what you want) and write Apex code (Salesforce’s programming language) to programmatically trigger certain actions in a workflow, among other things.

When you think about the fact that generative AI wasn’t even really a thing people were talking about until OpenAI released ChatGPT at the end of last year, and events like this take months of planning, the company probably had to switch gears recently to focus its presentation so completely on this single subject.

Salesforce isn’t alone in its new focus on applying generative AI to its existing products and services. Over the past several months, we’ve seen many enterprise software companies announce plans to incorporate this technology into their stacks, even if most of these new tools are still works in progress.

Just last week we had announcements from Zoho, Box and ServiceNow, while other companies too numerous to mention individually have made similar announcements in recent months.

A year after we saw the crypto and metaverse hype machines come crashing down, it’s fair to ask if these companies are moving too fast, chasing the next big shiny thing without considering some of the technology’s limitations, especially its well-documented hallucination problem. For this post, we are going to concentrate on Salesforce’s view of things and how it hopes to overcome some of those known issues as it incorporates generative AI into its platform.

Got 99 problems, but data ain’t one

Perhaps it’s unfair to put generative AI in the same category as other hyped technologies because we are only now seeing the direct impact of this approach. It took decades of research, development and technological shifts to get us to this point, said Juan Perez, Salesforce’s CIO, who is in charge of the company’s technology strategies.

“This is different, actually. First of all, it’s more real, and AI is not new. We’ve had decades and decades of advancement in AI,” Perez said. And he pointed out that it’s not new for Salesforce, either. It introduced its AI layer, Einstein, back in 2016, and has been refining it ever since.

Perez told TechCrunch+ that he actually uses Einstein AI to help generate the reports he needs for his work, and the developments we are seeing with generative AI will only make that process easier. “With the advances of generative AI, with the compute power, the large-scale systems that can support these large language models, the game is entirely different,” he said.

One theme Salesforce kept coming back to at the event was trust: the idea that building AI solutions on top of Salesforce data could help produce more trustworthy AI. A higher-quality underlying dataset could in turn help limit hallucinations, where the AI doesn’t actually know what the response should be and essentially makes one up.

But the company is working hard to make sure the AI gives the best answers possible, with the understanding that nobody can guarantee generative AI won’t hallucinate answers at this point, according to Silvio Savarese, the company’s EVP and chief scientist.

“Good quality data is key for generating good quality outputs. Training or fine-tuning models using curated high-quality CRM data allows you to build trusted generative capabilities. However, even with high-quality data, LLMs can still generate hallucinations,” he said. It’s important to understand that as you implement the technology at your company.

Salesforce is working to mitigate the problem on several fronts, he said. By building its own models, the company can control for some factors that can cause the model to hallucinate. “We have full control of the learning procedure … can inject additional labeling/instruction capabilities and embed constitutional AI methods to mitigate hallucinations,” he said.

In addition, training can be ongoing rather than training once and deploying, as is sometimes the case with LLMs today, he said. “This is especially vital in the world of CRM, where data is constantly changing and freshness is mission critical. By keeping LLMs trained on the most up-to-date information, a common source of mistakes can be minimized.” It’s worth noting, however, that as customers build or bring their own LLMs, Salesforce will still supply the data but have less control over how it gets incorporated, managed and used in external models.

A matter of trust

The company is operating on the theory that using a more constrained set of data for its LLMs, data that comes from a source like Salesforce, will limit the hallucination problem. Vishal Sikka, founder and CEO at Vianai Systems, an MLOps startup, told TechCrunch+ in a recent interview that it’s imperative to solve the hallucination issue before generative AI can be used in mission-critical applications in enterprise settings.

“The first part is the safety issue because in the current state of the art, the scientists who have built this transformer technology don’t know how to make it produce good answers and not produce bad ones. They don’t know if it is even possible that it can be done,” he said.

That means that if you have a problem that requires a precise answer, you need total certainty, and we don’t have that yet.

But Ray Wang, founder and principal analyst at Constellation Research, told TechCrunch+ that there are business cases where you don’t need total accuracy to be useful.

“Generative AI ultimately requires massive amounts of data for high precision,” he said. “This requires removing false positives and false negatives with training and human augmentation. Areas where we need 100% accuracy will be hard to achieve, but if we can live with 70% or 80% accuracy, many tasks such as self-service customer care, or sales lead scoring, or campaign automation will become easier.”

Brent Hayward, CEO at Salesforce subsidiary MuleSoft, thinks that putting humans who understand the data into the process could help tell the model when it’s right and when it’s not, an approach he calls “tuning for true.” That could help correct the AI when it’s wrong and improve the models along the way.

“If the generative AI is helping create a workflow and generating code to help, the source of that code really matters,” Hayward said. “If the dataset we’ve trained the model on is all of our APIs, you can say the trust is quite high.”

He sees the possibility of developing a trust score based on where the data comes from and how much we can rely on answers drawn from a given dataset, an approach he thinks will become increasingly important.

People in fact remain a key part of Salesforce’s AI vision, Savarese said. “By enabling human-in-the-loop capabilities, users can verify the quality of the output of generative AI and intervene to fix hallucinations or other factual errors. This is both a powerful safety feature and an example of our core value at Salesforce AI, which is augmenting human talent rather than attempting to replace it,” he said.

Perez anticipates that part of his job, and that of all CIOs moving forward, will be ensuring that the company’s LLMs are using trusted data. “Remember the evolution of the CIO in the areas of security and privacy. We have had to really take a much stronger stance as CIOs to ensure that security is a priority, that privacy is a priority. Well, now with generative AI, I think CIOs are going to have to also be like the guards of the castle and will have to ensure that there’s trusted data in support of AI,” he said.

It’s more than hallucinations

The hallucination issue is just one of the problems associated with generative AI. Another issue will be making sure that the generative AI doesn’t supply confidential company information or other sensitive data to people who aren’t supposed to see it.

Patrick Stokes, EVP and GM of platform at Salesforce, thinks that there will be limits put on what types of data can be put in the models to prevent this from happening. “Businesses and organizations like Salesforce are going to have to start to figure out what some of those swim lanes look like,” he said.

In practice, that would mean hiding certain fields from the model when they contain data you don’t want unauthorized people to see, but exactly how that will work is still something that companies like Salesforce need to figure out.
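To make that idea concrete, here is a minimal, hypothetical sketch in Python of what field-level filtering might look like: records are stripped of any fields a given role isn’t permitted to see before they are folded into an LLM prompt. The field names, roles, permission map and `build_prompt` helper are all invented for illustration; this is not Salesforce’s actual mechanism.

```python
# Hypothetical sketch (not Salesforce's actual mechanism): redact CRM
# fields a user's role isn't allowed to see before the records are
# folded into an LLM prompt. Field names and roles are invented.

# Per-role allowlist of fields the model may use on that user's behalf.
FIELD_PERMISSIONS = {
    "sales_rep": {"Name", "Stage", "Amount", "CloseDate"},
    "finance": {"Name", "Stage", "Amount", "CloseDate", "Deal_Margin__c"},
}


def redact_record(record: dict, role: str) -> dict:
    """Return a copy of the record containing only the fields the role may see."""
    allowed = FIELD_PERMISSIONS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


def build_prompt(question: str, records: list, role: str) -> str:
    """Assemble an LLM prompt from redacted records plus the user's question."""
    context = "\n".join(str(redact_record(r, role)) for r in records)
    return f"Answer using only this CRM data:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    opportunity = {
        "Name": "Acme renewal",
        "Stage": "Negotiation",
        "Amount": 120000,
        "Deal_Margin__c": 0.42,  # hidden from a sales_rep, visible to finance
    }
    print(build_prompt("Which deals are in negotiation?", [opportunity], "sales_rep"))
```

The design choice in a sketch like this is simply that redaction happens before the prompt is assembled, so sensitive values never reach the model at all rather than being filtered out of its answers afterward.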

There’s also the issue of data ownership. For example, if you are creating a landing page on the fly, do you have permission to use the photos on that landing page (or the source images behind any generated ones)? These kinds of legal issues could slow enterprise enthusiasm for generative AI until there are clearer answers.

It’s going to be imperative to solve all of these problems, and others that are sure to arise, as we insert generative AI into more of our software. But of all the issues, limiting hallucinations is going to be paramount because everyone using the generative AI capabilities in Salesforce (and all enterprise software) is going to need to trust that the answers they are getting from the system are true and accurate and not putting the company at risk.

Salesforce is making a big bet that using its own data in LLMs will be the key to doing this. Time will tell if this is right, or at least, if it can help limit the problem.