How China is building a parallel generative AI universe

Chinese tech companies rush to match Stable Diffusion and DALL-E 2, but roadblocks lie ahead

The gigantic technological leap that machine learning models have shown in the last few months is getting everyone excited about the future of AI — but also nervous about its uncomfortable consequences. After text-to-image tools from Stability AI and OpenAI became the talk of the town, ChatGPT’s ability to hold intelligent conversations is the new obsession in sectors across the board.

In China, where the tech community has always watched progress in the West closely, entrepreneurs, researchers, and investors are looking for ways to make a dent in the generative AI space. Tech firms are devising tools built on open source models to attract consumer and enterprise customers. Individuals are cashing in on AI-generated content. Regulators have responded quickly to define how text, image, and video synthesis should be used. Meanwhile, U.S. tech sanctions are raising concerns about China’s ability to keep up with AI advancement.

As generative AI takes the world by storm toward the end of 2022, let’s look at how this explosive technology is shaking out in China.

Chinese flavors

Thanks to viral art creation platforms like Stable Diffusion and DALL-E 2, generative AI is suddenly on everyone’s lips. Halfway across the world, Chinese tech giants have also captivated the public with their equivalent products, adding a twist to suit the country’s tastes and political climate.

Baidu, which made its name in search engines and has in recent years been stepping up its game in autonomous driving, operates ERNIE-ViLG, a 10-billion parameter model trained on a data set of 145 million Chinese image-text pairs. How does it fare against its Western counterpart? Below are the results from the prompt “kids eating shumai in New York Chinatown” given to Stable Diffusion, versus the same prompt in Chinese (纽约唐人街小孩吃烧卖) for ERNIE-ViLG.

Image Credits: Stable Diffusion

Image Credits: ERNIE-ViLG

As someone who grew up eating dim sum in China and Chinatowns, I’d say the results are a tie. Neither got the right shumai, which, in the dim sum context, is a succulent shrimp and pork dumpling in a half-open yellow wrapper. While Stable Diffusion nails the atmosphere of a Chinatown dim sum eatery, its shumai is off (but I see where the machine is going). And while ERNIE-ViLG does generate a type of shumai, it’s a variety more commonly seen in eastern China, not the Cantonese version.

The quick test reflects how hard it is to capture cultural nuances when the underlying data sets are inherently biased: Stable Diffusion presumably has more data on the Chinese diaspora, while ERNIE-ViLG was probably trained on a greater variety of shumai images that are rarer outside China.

Another Chinese tool that has made noise is Tencent’s Different Dimension Me, which can turn photos of people into anime characters. The AI generator exhibits its own bias. Intended for Chinese users, it took off unexpectedly in other anime-loving regions like South America. But users soon realized the platform failed to identify Black and plus-size individuals, groups that are noticeably missing in Japanese anime, leading to offensive AI-generated results.

Aside from ERNIE-ViLG, another large-scale Chinese text-to-image model is Taiyi, a brainchild of IDEA, a research lab led by renowned computer scientist Harry Shum, who co-founded Microsoft’s largest research branch outside the U.S., Microsoft Research Asia. The open source AI model is trained on 20 million filtered Chinese image-text pairs and has one billion parameters.

Unlike Baidu and other profit-driven tech firms, IDEA is one of a handful of institutions backed by local governments in recent years to work on cutting-edge technologies. That means the center probably enjoys more research freedom without the pressure to drive commercial success. Based in the tech hub of Shenzhen and supported by one of China’s wealthiest cities, it’s an up-and-coming outfit worth watching.

Rules of AI

China’s generative AI tools aren’t just characterized by the domestic data they learn from; they are also shaped by local laws. As MIT Technology Review pointed out, Baidu’s text-to-image model filters out politically sensitive keywords. That’s expected, given censorship has long been a universal practice on the Chinese internet.

What’s more significant to the future of the fledgling field is the new set of regulatory measures targeting what the government dubs “deep synthesis tech,” which denotes “technology that uses deep learning, virtual reality, and other synthesis algorithms to generate text, images, audio, video, and virtual scenes.” As with other types of internet services in China, from games to social media, users must verify their real names before using generative AI apps. The fact that prompts can be traced to one’s real identity inevitably has a restrictive impact on user behavior.

But on the bright side, these rules could lead to more responsible use of generative AI, which is already being abused elsewhere to churn out NSFW and sexist content. The Chinese regulation, for example, explicitly bans people from generating and spreading AI-created fake news. How that ban will be implemented, though, is left to the service providers.

“It’s interesting that China is at the forefront of trying to regulate [generative AI] as a country,” said Yoav Shoham, co-founder of AI21 Labs, an Israel-based OpenAI rival, in an interview. “There are various companies that are putting limits to AI…Every country I know of has efforts to regulate AI or to somehow make sure that the legal system, or the social system, is keeping up with the technology, specifically about regulating the automatic generation of content.”

But there is no consensus yet on how the fast-changing field should be governed. “I think it’s an area we’re all learning together,” Shoham admitted. “It has to be a collaborative effort. It has to involve technologists who actually understand the technology and what it does and what it doesn’t do, the public sector, social scientists, and people who are impacted by the technology as well as the government, including the sort of commercial and legal aspect of the regulation.”

Monetizing AI

As artists fret over being replaced by powerful AI, many in China are leveraging machine learning algorithms to make money in a plethora of ways. They aren’t from the most tech-savvy crowd. Rather, they are opportunists or stay-at-home mums looking for an extra source of income. They realize that by improving their prompts, they can trick AI into making creative emojis or stunning wallpapers, which they can post on social media to drive ad revenue or charge for directly as downloads. The really skilled ones are also selling their prompts to others who want to join the money-making game, or even training them for a fee.

Others in China are using AI in their formal jobs, like the rest of the world. Writers of light fiction, a genre that is shorter than novels and often illustrated, can cheaply churn out illustrations for their work. An intriguing use case that could disrupt parts of manufacturing is using AI to design T-shirts, press-on nails, and prints for other consumer goods. By generating large batches of prototypes quickly, manufacturers save on design costs and shorten their production cycle.

It’s too early to know how differently generative AI is developing in China and in the West. But entrepreneurs have made decisions based on their early observations. A few founders told me that businesses and professionals are generally happy to pay for AI because they see a direct return on investment, so startups are eager to carve out industry use cases. One clever application came from Sequoia China–backed Surreal (later renamed Movio) and Hillhouse-backed ZMO.ai, which discovered during the pandemic that e-commerce sellers were struggling to find foreign models as China kept its borders shut. The solution? The two companies worked on algorithms that generated fashion models of all shapes, colors, and races.

But some entrepreneurs don’t believe their AI-powered SaaS will see the type of skyrocketing valuation and meteoric growth their Western counterparts, like Jasper and Stability AI, are enjoying. Over the years, numerous Chinese startups have told me they have the same concern: China’s enterprise customers are generally less willing to pay for SaaS than those in developed economies, which is why many of them start expanding overseas.

Competition in China’s SaaS space is also dog-eat-dog. “In the U.S., you can do fairly well by building product-led software, which doesn’t rely on human services to acquire or retain users. But in China, even if you have a great product, your rival could steal your source code overnight and hire dozens of customer support staff, which don’t cost that much, to outrace you,” said a founder of a Chinese generative AI startup, requesting anonymity.

Shi Yi, founder and CEO of sales intelligence startup FlashCloud, agreed that Chinese companies often prioritize short-term returns over long-term innovation. “In regard to talent development, Chinese tech firms tend to be more focused on getting skilled at applications and generating quick money,” he said. One Shanghai-based investor, who declined to be named, said he was “a bit disappointed that major breakthroughs in generative AI this year are all happening outside China.”

Roadblocks ahead

Even when Chinese tech firms want to invest in training large neural networks, they might lack the best tools. In September, the U.S. government slapped China with export controls on high-end AI chips. Many Chinese AI startups focus on applications and don’t need high-performance semiconductors that handle seas of data. But for those doing basic research, using less powerful chips means computing will take longer and cost more, said an enterprise software investor at a top Chinese VC firm, requesting anonymity. The good news, he argued, is that such sanctions are pushing China to invest in advanced technologies over the long run.

Baidu, which bills itself as a leader in China’s AI field, believes the impact of U.S. chip sanctions on its AI business is “limited” in both the short and longer term, said the firm’s executive vice president and head of AI Cloud Group, Dou Shen, on its Q3 earnings call. That’s because “a large portion” of Baidu’s AI cloud business “does not rely too much on the highly advanced chips.” And in cases where it does need high-end chips, it has “already stocked enough in hand, actually, to support our business in the near term.”

What about the future? “When we look at it at a mid- to a longer-term, we actually have our own developed AI chip, so named Kunlun,” the executive said confidently. “By using our Kunlun chips [Inaudible] in large language models, the efficiency to perform text and image recognition tasks on our AI platform has been improved by 40% and the total cost has been reduced by 20% to 30%.”

Time will tell if Kunlun and other indigenous AI chips will give China an edge in the generative AI race.

The story was updated to clarify that Yoav Shoham is a co-founder of AI21 Labs.