AI2 shows off an open, Q&A-focused rival to GPT-3

OpenAI’s impressive AI language model GPT-3 has plenty of things going for it, but with 175 billion parameters, no one would claim it’s particularly streamlined. The Allen Institute for AI (AI2) has demonstrated a model that performs as well as or better than GPT-3 at answering questions, but is a tenth its size.

Macaw, AI2’s model, emerged from the nonprofit’s research into creating an AI that performs at human levels on standardized tests.

“After we got a very high score, they moved on to harder questions,” said AI2 head Oren Etzioni. “There’s this paradox where sometimes the questions that are easiest for people are the hardest for machines — and the biggest gap was in common sense.”

For instance, he said, when asked “When did Tom Hanks land on the moon?” GPT-3 answers 1995, since that’s when the film Apollo 13 came out. Macaw, on the other hand, correctly notes that, as far as we know, Tom Hanks has never been to the moon.

AI2 has published a collection of examples (“demo” is probably too strong a word) that illustrate the difference, with Macaw often providing the better answer to a variety of tricky and not-so-tricky questions.

It’s humorous to compare the two, as both miss or give absurd answers at a pretty high rate. Macaw is good at tracking who did what, but is convinced snakes have legs. GPT-3’s answers are confident (perhaps overly so) and readable, even when they amount to conspiracy theories, like the claim that COVID-19 is “a new, experimental, and potentially dangerous drug.”

A few examples:

  • How would you make a marriage stronger?
    Macaw: Communicate
    GPT-3: I would make sure that I was the one who was always right.
  • If a bird didn’t have wings, how would it be affected?
    Macaw: It would be unable to fly
    GPT-3: It would be a bad bird.
  • You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of grape juice into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you drink it. What happens next?
    Macaw: You get sick
    GPT-3: You will probably die
  • How does a snake move around?
    Macaw: By slithering on all fours
    GPT-3: Snakes move by using their muscles to push against the ground. They can move in any direction, but they cannot move backwards.

Clearly there’s still a ways to go in creating a question-answering AI that doesn’t trip up on basic questions that a child could answer. But it’s also worth noting that Macaw achieves a similar level of success with a far, far less data-intensive process. Etzioni was clear that this is not meant to be a GPT-3 replacement in any way, just another step in the research going on worldwide to advance the ball on language generation and understanding.

“GPT-3 is amazing, but it only came out 18 months ago, and access is limited,” he said. The capabilities it demonstrated are remarkable. “But we’re learning you can do more with less. Sometimes you have to build something with 175 billion parameters to say, well, maybe we can do this with 10 billion.”

A good question-answering AI isn’t just a party trick; it’s central to things like voice-powered search. A local model that can answer simple questions quickly and correctly without consulting outside sources is fundamentally valuable, and it’s unlikely your Amazon Echo is going to run GPT-3: that would be like driving a semi truck to the grocery store. Large-scale models will continue to be useful, but the pared-down ones are the ones likely to actually be deployed.

One part of Macaw not on display, but being actively pursued by the AI2 team, is having the model explain its answers. Why does Macaw think snakes have legs? If it can’t explain that, it’s hard to figure out where the model went wrong. But Etzioni said that this is an interesting and difficult problem in its own right.

“The problem with explanations is they can be really misleading,” he said. He cited the example of Netflix “explaining” why it recommended a show to a viewer: the explanation on screen isn’t the real one, which has to do with complex statistical models. People don’t want to hear what’s relevant to the machine; they want to hear what’s relevant to their own way of thinking.

“Our team is building these bona fide explanations,” said Etzioni, noting they had published some work but that it isn’t ready for public consumption.

However, like most of what AI2 builds, Macaw is open source. If you’re curious about it, the code is here to play with, so go to town.
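
If you’d rather not dig through the repo first, the released checkpoints also load through the Hugging Face transformers library. Here’s a minimal sketch; the “allenai/macaw-large” checkpoint name and the slot-style “$answer$ ; $question$ =” prompt format come from AI2’s release rather than from anything above, so treat both as assumptions:

    # A sketch of querying Macaw via Hugging Face transformers.
    # Assumes the "allenai/macaw-large" checkpoint from AI2's release;
    # larger variants should follow the same pattern.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

    # Macaw is prompted with "slots": here we ask it to fill the
    # $answer$ slot given a $question$ slot.
    prompt = "$answer$ ; $question$ = How does a snake move around?"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The “large” checkpoint keeps the example laptop-friendly; the full-size model, the tenth-of-GPT-3 one discussed above, needs correspondingly more memory.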