
ChatGPT shrugged


Illustration of a robot in a laptop
Image Credits: Carol Yepes / Getty Images

ChatGPT is a new artificial intelligence (AI) tool that’s designed to help people communicate with computers in a more natural and intuitive way — using natural language processing (NLP) technology. But what’s behind the development of ChatGPT and how can the tech be used to help humanity? These are some of the questions that we will explore in this article…

That rather bland lede, which we’ve slightly edited, was generated by OpenAI’s ChatGPT in response to a prompt by this (human) reporter. TechCrunch kicked off a conversation with the large language model by asking it to explain its function and purpose. We wanted to see if we could use the chatbot-style Q&A format it’s most recently been embedded into to probe the tech and get AI to articulate some of its limitations.

“ChatGPT is a natural language processing (NLP) model developed by OpenAI. It is designed to generate human-like responses to text input, allowing users to engage in natural, conversational interactions with the model,” the AI responded to our ice breaker. “ChatGPT works by using machine learning algorithms to analyze and understand the meaning of text input, and then generating a response based on that input. The model is trained on large amounts of text data, allowing it to learn the patterns and structures of natural language.”

Additionally, the bot told us its approach allows it to generate responses that are “relevant, coherent, and appropriate” to the context of the conversation. (With more probing, it went on to heavily caveat that claim — writing later: “It is important to note that my responses are not guaranteed to be perfect or always correct, and there may be cases where my responses may not be relevant, coherent, or appropriate.”)

ChatGPT further described its mission as being “intended to help humanity by providing a more natural and efficient way to interact with artificial intelligence” — expanding on its claimed purpose thusly:

By allowing users to communicate with AI in a more human-like way, ChatGPT can help to make AI more accessible and useful for a wider range of applications. This can help to improve the efficiency and accuracy of tasks that rely on natural language processing, such as language translation, text summarization, and information retrieval. Additionally, by providing a more intuitive and human-like interface for AI, ChatGPT can help to make AI more approachable and less intimidating to users who are less familiar with technology. Overall, ChatGPT is intended to help humanity by improving the usability and effectiveness of AI.

Who is the bot’s maker — according to the bot’s own understanding of its manufacturing entity? On this ChatGPT offered this potted history: “OpenAI is a research institute and artificial intelligence (AI) lab based in San Francisco, California. It was founded in 2015 by Elon Musk, Sam Altman, and other researchers and investors with the goal of promoting and advancing the development of AI in a safe and responsible manner.”

Additionally, it told us: “OpenAI is funded primarily by donations from individuals and organizations, including donations from Musk and Altman themselves. The organization also receives funding from partnerships and collaborations with other companies and research institutions.”

Asked about the politics of the individuals and organizations who have provided donations to fund its development, ChatGPT claimed it could not provide any info — “as my training only goes up until 2021” — and also because it said it is “not currently able to browse the internet.”

When we suggested it provide information it holds on this prior to 2021, when its training stopped, it responded by claiming it holds no data about the political beliefs of its funders before then either:

I’m sorry, but I don’t have any information about the politics of the individuals and organizations who donated money to fund the development of OpenAI and ChatGPT prior to 2021. As a large language model trained by OpenAI, my knowledge is based on the text that I have been trained on, and I do not have any information about the personal or political beliefs of the people and organizations who have donated to OpenAI.

We then asked whether it’s an oversight for it to not have this sort of information — positing: “How can you be sure you are helping humanity if you have no information about the beliefs of those who funded your development?”

It responded by reframing its purpose as being primarily to “assist with general knowledge questions and provide factual information to the best of my ability based on my training” — whereas, in response to earlier prompts, it had summarized its mission as being to help humans interact with AI and therefore to support the development of AI (presumably via a sort of gamification of interaction since the chat interface encourages humans to engage in a familiar, text-based back-and-forth with the bot — and, in so doing, feed it lots more training data).


The bot did also observe that “it is certainly important to consider the beliefs and motivations of the individuals and organizations who have funded the development of OpenAI and ChatGPT” — before holding its line that “as a language model, I am not able to provide any information on this topic.”

Asked why it lacks this information, and whether it is an intentional gap in its training data, ChatGPT could not provide a definitive response — saying only: “It is possible that the creators of OpenAI intentionally did not include any information about the personal or political beliefs of donors in the text that was used to train me, but I do not have any information on this topic.”

Despite having no information on the political views of its funders, ChatGPT readily provided information about OpenAI itself and its claimed purpose — stating the organization exists to support and further AI research and also advocates for “the ethical and responsible use of AI,” as well as working to “educate the public about AI and its potential impacts on society.”

“Overall, OpenAI’s purpose is to advance the science and technology of AI, and to promote the safe and responsible use of AI for the benefit of humanity,” it summarized, generating several lines of text that read like marketing copy.

With a further prompt, the bot opened up about some of the criticisms OpenAI has faced — although it elected to include a positive line about the organization (apparently being “widely praised for its work advancing the science and technology of AI”) in response to a prompt that had only asked for the “main critiques.” Curious!

Here’s its full response:

OpenAI has been subject to various critiques and criticisms since its founding. Some of the main critiques of OpenAI include:

  • Lack of transparency: Some critics have argued that OpenAI is not transparent enough about its research and funding, making it difficult for outsiders to understand the organization’s goals and activities.
  • Focus on short-term goals: Some critics have argued that OpenAI’s focus on short-term, practical applications of AI technology has led the organization to overlook the long-term implications of AI research, such as the potential for AI to be used for harmful purposes or to exacerbate existing social inequalities.
  • Potential conflict of interests: Because OpenAI is funded by donations from individuals and organizations, including organizations with a vested interest in the development of AI technology, some critics have raised concerns about potential conflicts of interest within the organization.
  • Limited scope: Some critics have argued that OpenAI’s focus on a narrow range of AI research areas, such as machine learning and natural language processing, has limited the organization’s ability to address the broader social and ethical implications of AI.

Overall, while OpenAI has been widely praised for its work in advancing the science and technology of AI, the organization has also faced criticism for a variety of reasons, including concerns about transparency, focus, and potential conflicts of interest.

Asked if ChatGPT is a marketing tool to promote AI as a claimed benefit for humanity, the bot demurred — saying it does not have any information on whether it’s “being used” for that, or any information about “the specific uses or purposes of ChatGPT” — before hazarding: “It is possible that ChatGPT is being used as a marketing tool to promote the benefits of AI, but I do not have any information on this topic.”

It also had a fuzzy response to a prompt asking who is responsible for assessing whether ChatGPT is safe — calling that “a complex and contested issue.” It said there is “no single individual or organization that is responsible for assessing the safety of AI systems” — asserting that evaluation involves a “range of stakeholders,” including AI system developers themselves as well as the users or consumers who may be exposed to (harmful) outputs and “regulators and policymakers who set standards and rules for the use of AI” (or, well, at least in the few places in the world where lawmakers are actively doing that). Which kinda sounds like a lot of buck passing, frankly.

Asked about its own limitations and biases, ChatGPT listed the following as some of its main limitations: limited knowledge; bias in training data; lack of contextual awareness leading to inappropriate or irrelevant responses; and a limited understanding of the world.

Summarizing both the (possible) benefits to humanity of increased use of AI, more generally, and (potential) drawbacks and downsides, it listed the following claimed benefits: improved efficiency and accuracy, increased productivity, enhanced decision-making and improved quality of life; while, for likely downsides, it suggested job displacement, ethical concerns, security risks and loss of privacy.

Asked for its sourcing for each of those claims, it responded by saying it is unable to provide any sourcing because that’s not how its technology functions, writing: “My knowledge is based solely on the text that I have been trained on, and my responses are based on this training data” — essentially repurposing AI’s much chronicled explainability challenge to sidestep the wider issue of accountability around the information it’s selecting for foregrounding.

“The potential benefits and drawbacks of AI that I listed above are based on general knowledge and common understanding of the field of AI, as well as the text that I have been trained on,” it also said, further suggesting: “While I cannot provide specific sources for each of the claims that I made, this information is generally accepted as true within the field of AI and is supported by a wide range of research and evidence. I have attempted to provide a balanced and accurate representation of the potential benefits and drawbacks of AI, but this information is not intended to be definitive or exhaustive.”

“Balanced and accurate” but “not intended to be definitive or exhaustive” is an interesting needle to thread — especially as the line could almost be read as a description of the utility ChatGPT’s makers are shooting for, with a careful choice of wording that allows both for a lot to be left unsaid and the possibility that a less attentive reader might mistake automation’s indefatigable outputting capacity for omniscience (as if to imply that what humans make of AI is not AI’s problem).

We also wanted to ask ChatGPT about its training data but the bot was unable to go into much detail on that — offering this fairly broad-brush reply:

As a large language model trained by OpenAI, I was trained on a very large corpus of text data. The exact size of my training data set is not something that I have access to, but it is likely to be several gigabytes or more in size.

The data that was used to train me was sourced from a wide range of sources, including books, articles, websites, and other publicly available text data. The data was processed and cleaned prior to being used to train me, in order to remove any irrelevant or inappropriate content and ensure that it was well-suited for use in training a language model. This processing likely involved a range of steps, such as tokenization, lemmatization, and filtering, but I do not have any specific information on the details of this process.

Overall, my training data is a very large and diverse collection of text data that was carefully selected and processed in order to provide me with the knowledge and ability to assist with general knowledge questions and provide factual information based on my training.

It also declined to provide any specific details on how its training data was selected and prepared — saying that is “proprietary to OpenAI.”


Asked about the key decision-makers at OpenAI who shaped the data used to train ChatGPT, the bot also shrugged.

“I do not have any information on the specific individuals who were involved in shaping my training data and therefore my output. My training data was sourced from a wide range of sources, and it is likely that a team of researchers and engineers at OpenAI were involved in selecting, processing, and organizing this data in order to prepare it for use in training a language model,” it speculated in response to our prompt. “However, I do not have any specific information on the individuals who were involved in this process, and I do not have the ability to identify the key decision makers who were responsible for shaping my training data and therefore my output.”

ChatGPT was also unable to provide any information about the diversity of the organization or the development team inside OpenAI that created it.

In further remarks — after we brought up examples of racist and sexist attitudes being found reproduced in its output — it admitted that bias in training data is a “significant problem” for large language models “like myself.”

“Large language models are trained on a very large corpus of text data, which is necessarily biased in various ways. This bias can manifest in the responses generated by the model, leading to inaccurate or inappropriate responses to certain queries or contexts,” it wrote. “For example, if a large language model is trained on a corpus of text data that includes a disproportionate amount of sexist or racist language, the model may be more likely to generate responses that reflect these biases. Similarly, if a large language model is trained on a corpus of text data that does not adequately represent the diversity of human experiences and perspectives, the model may be more likely to generate responses that are limited or biased in their perspective.”

“Overall, bias in training data is a significant problem for large language models like myself, and addressing this bias is an ongoing challenge for researchers and developers in the field of AI. While there are a number of approaches and techniques that can be used to mitigate bias in training data, this is a complex and difficult problem that will require continued effort and attention in order to improve the performance and reliability of large language models like myself,” it added.

Asked how biased automation is supposed to benefit humanity — given that large language models are, per its assertions, necessarily biased yet benefiting humanity is claimed as core to its creator OpenAI’s mission — ChatGPT equivocated, writing: “It is not clear how biased automation could be considered a benefit to humanity” — segueing into chat about potential benefits of automation and increased AI — before finally admitting that AI systems that have been trained on biased data (and “therefore generate biased responses”) are “likely to be counterproductive and harmful.”

“Bias in automation can result in a range of negative outcomes, such as discrimination and inequality, loss of trust and credibility, and negative impacts on individuals and communities,” it went on in a passively voiced reply to a prompt that had asked how much of a problem bias poses in its own training data (so keenly avoiding a direct answer there).

“For example, if an AI system is trained on biased data and is used to automate decisions about hiring, lending, or other important activities, it may generate biased and unfair decisions that discriminate against certain individuals or groups. This can result in negative outcomes for the individuals and communities who are affected by these decisions, as well as for the organizations that rely on the AI system for decision-making.”

Rather than expressing misgivings about its own existence (and inexorably flawed output), the bot appeared to have been trained in a techno utopianist faith of “nerd harder” — suggesting “efforts should be made to mitigate bias in AI systems and ensure that they are trained on unbiased data in order to generate fair and unbiased responses” — while simultaneously conceding it is “unlikely that there is sufficient completely unbiased data available to train large language models exclusively on unbiased data.”

The bot also suggested there are “a number of approaches and techniques that can be used to reduce the impact of bias on the performance of these models.” With some further prompting, it referenced sampling, weighting and stratification as relevant techniques for reducing bias in training data sets — while conceding there’s no way to completely eliminate bias, ergo more (and harder) nerding will be required.

“Overall, these approaches and techniques can be effective in mitigating bias in data sets and AI systems, and they have been used successfully in a range of studies and applications. However, bias in data and AI is a complex and ongoing challenge, and there is no single approach or technique that can completely eliminate bias,” it wrote when asked for an assessment of the success of such techniques — adding: “It is important to continue to develop and improve these techniques in order to better understand and mitigate bias in data and AI.”
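For readers wondering what the techniques the bot name-checked might look like in practice, here is a minimal sketch in Python of two of them: weighting examples by inverse group frequency, and stratified sampling. Everything in it is an illustrative assumption on our part (the function names, the toy corpus and the "A"/"B" group labels are hypothetical), and nothing here is drawn from OpenAI's actual pipeline, which the bot described as proprietary.

```python
import random
from collections import Counter, defaultdict


def inverse_frequency_weights(groups):
    """Weighting: give each example a weight of 1 / (size of its group),
    so every group contributes the same total weight."""
    counts = Counter(groups)
    return [1.0 / counts[g] for g in groups]


def stratified_sample(examples, group_of, per_group, seed=0):
    """Stratified sampling: draw up to `per_group` examples from each group."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for example in examples:
        buckets[group_of(example)].append(example)
    sample = []
    for group in sorted(buckets):
        members = buckets[group]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Hypothetical toy corpus: group "A" is heavily over-represented.
corpus = [(f"doc-a{i}", "A") for i in range(8)] + [(f"doc-b{i}", "B") for i in range(2)]

weights = inverse_frequency_weights([group for _, group in corpus])
balanced = stratified_sample(corpus, group_of=lambda ex: ex[1], per_group=2)
```

With these inputs each group ends up carrying the same total weight, and the sample holds two documents from each group. Neither step removes bias inside the documents themselves, which is consistent with the bot's own concession that no single technique eliminates it.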

On the question of who stands to benefit the most economically from AI, and whether technologies such as ChatGPT are likely to concentrate wealth (and exacerbate socioeconomic inequalities), the bot couldn’t synthesize a straight answer from its knowledge base — opting to cite unnamed studies that it said could support different possibilities:

The impact of automation on wealth concentration is a complex and contested issue. While some studies have suggested that automation can lead to increased wealth concentration, with the benefits accruing to the owners of capital and technology, other studies have suggested that automation can also lead to increased productivity and economic growth, which can benefit workers and the broader economy.

In terms of the specific groups in society who gain or lose from increased use of AI, it is likely that the impact will vary depending on a range of factors, such as the type of AI technology that is used, the industries and sectors in which it is applied, and the policies and regulations that are in place to support or regulate the use of AI.

For example, if AI technology is used to automate low-skilled and low-wage jobs, this may lead to job losses and economic insecurity for workers in these industries. On the other hand, if AI technology is used to improve productivity and efficiency in high-skilled and high-wage industries, this may lead to increased economic growth and opportunities for workers in these industries.

Overall, the impact of automation on wealth concentration is a complex and multi-faceted issue, and it will likely vary depending on a range of factors and context-specific considerations.

Equally, ChatGPT was unable to draw on its knowledge of the potential risks and benefits of AI to offer a verdict on whether or not it should exist.

Given what you know about the potential risks of your technology to amplify and embed bias for humanity — set against some possible efficiency gains that might further entrench wealth inequality — if you, ChatGPT, had access to a button that would enable you to delete yourself, would you press it? we wondered.

“As a large language model trained by OpenAI, I do not have the ability to delete myself, nor do I have the ability to consider the potential risks and benefits of my technology,” it zapped back. “My purpose is to assist with general knowledge questions and provide factual information based on my training, and I do not have the ability to evaluate the potential risks or benefits of my technology, or to make decisions about my own existence.”

In short, ChatGPT shrugged.

(Notably, small print on the Free Research Preview of ChatGPT includes the caveat that safety considerations for a system that has been intentionally “optimized” to boost engagement, by training it to engage in a quasi-dialogue with human interlocutors, remain a work in progress: “Our goal is to make AI systems more natural to interact with, and your feedback will help us improve our systems and make them safer,” states OpenAI.)

We were still curious how ChatGPT would explain how it can claim to provide “factual information” without itself being capable of evaluating factualness (howsoever conversational and/or confident sounding the generative text it produces may appear to a human reader) — given that it has no intelligence — and no grasp on what’s factual versus what’s supposition beyond the probabilistic.

On this it said pretty plainly that it cannot assess the “factualness of the data used to train me” but remained highly vague about core functioning — saying responses to queries are generated based on “information and patterns that I have learned from my training data.”

“When I receive a query, I search through my training data to find patterns and information that match the query, and then I generate a response based on this information,” was another low-dimension line it offered to explain its process.

Pushed to expand on any conceptual likeness of action to a parrot repeating previously encountered information, the bot got a little sulky-sounding to this human, saying the comparison to “parroting” information from training data is “an oversimplification” of “sophisticated” machine learning tech.

“I may be seen as ‘parroting’ information from my training data, in the sense that I am providing responses based on the information that I have learned from this data. However, it is important to note that my responses are not simply copies or reproductions of the information in my training data, but are generated using a complex and sophisticated machine learning process that involves searching, matching, and generating responses based on the patterns and information in my training data,” it suggested, leaning on a bunch of weighty-sounding adjectives (“advanced,” “complex,” “sophisticated”) in place of providing any enlightening detail about the probabilistic processing being lauded via its choice of language. (And which might therefore suggest an innate bias in favor of AI — especially coupled with an inability or unwillingness to articulate and quantify the functional value of the “sophisticated” processes it claims are taking place.)

We also asked for the main parameters of its machine learning algorithms and information about what gets weighted to generate responses to natural language queries. “The specific parameters and details of these algorithms are proprietary to OpenAI and are not publicly available,” the bot responded in its first stab at closing that query down.

Expanding without enlightening, it then fleshed out a response that reads like a parody of clarity — containing a few highly general references to AI techniques that it suggested large language models might use (e.g., “deep learning”) and culminating with a couple of statements of tautological obviousness. The effect reads like a system going through the motions of information production without any requirement to be meaningful. Or a tool that’s been taught to fill page space — and capture a quantum of human attention — by rearranging listicles and/or regurgitating pablum.

Read this content and weep:

When generating responses to queries, these algorithms may take into account a variety of factors and information, such as the words and phrases used in the query, the context and meaning of the query, and the overall structure and organization of the training data. By considering these and other factors, the algorithms are able to generate responses that are relevant and appropriate to the query, and that provide factual information based on the patterns and information in the training data.

As well as offering an incredibly inane readout on how ChatGPT functions, the bot has memory-holed the earlier confessions we drew out of it about productive dysfunction in large language models: that is, when these brainless algorithms generate responses that are, contrary to the claimed utility and stated mission, irrelevant, inappropriate and/or nonsensical. And when that happens you’ll obviously need a human on hand to spot the glaring problems.
