TikTok Parent Company Violates OpenAI Policies in Rush to Launch Chinese ChatGPT Rival
AI advances like ChatGPT have made big waves recently. Now leaked internal documents suggest TikTok’s parent company ByteDance has been quietly tapping the same technology to hastily develop a competitor called Project Seed, breaking the rules along the way. The revelation and its implications warrant a deeper look.
- ByteDance’s secret reliance on OpenAI to build Project Seed violates OpenAI’s terms, which bar using its services to develop competing products
- Internal efforts to “whitewash” the evidence suggest employees knew they were on questionable ground
- Shortcutting training with another company’s AI outputs risks embedding that model’s biases and errors into your own
- Careless large language model development opens the door to harmful generated content
- Opaque, unaccountable projects run counter to AI safety best practices
TikTok’s parent company ByteDance has apparently been secretly relying on OpenAI’s API to develop its Chinese ChatGPT competitor called “Project Seed,” according to leaked documents. This is a direct violation of OpenAI’s terms of service and has resulted in ByteDance’s account being suspended.
ByteDance is the Chinese parent company behind the wildly popular short-video app TikTok. It has grown into one of the most valuable startups in the world, driven by the sophisticated recommendation algorithms that keep users hooked.
Seeing the momentum gathering behind new generative AI models like ChatGPT in recent months, ByteDance apparently decided to fast-track development of its own proprietary system, known as Project Seed. The massive 200-billion-parameter model would power conversational AI products aimed at surpassing the capabilities of tools like ChatGPT.
The details made public via The Verge reveal that almost every phase of Project Seed, which kicked off roughly a year ago as a high-priority and highly secretive initiative, has tapped OpenAI’s technology.
A source familiar with the matter was quoted as saying ByteDance wants everything to appear legal while avoiding getting caught violating policies. Employees have discussed on Lark, the company’s internal messaging platform, how to “whitewash” the evidence by scrubbing data.
The misuse is apparently so widespread that Project Seed employees regularly hit the maximum API allowance. Internal conversations acknowledge that they are building models that compete directly with OpenAI’s products, which its terms prohibit.
Uh oh…
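For anyone wondering what “reaching the maximum API allowance” actually looks like, it shows up in code as rate-limit errors from the API. Here is a minimal sketch using OpenAI’s current Python client; the model name and retry policy are illustrative assumptions, not details from the leaked documents:

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def complete_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Call the chat API, retrying with exponential backoff when rate-limited."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            # HTTP 429: the account has exhausted its request or token quota.
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ... then retry
    raise RuntimeError("still rate-limited after retries")
```

Hitting this ceiling day after day, as the documents reportedly describe, implies sustained automated harvesting rather than occasional experimentation.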
This practice of training AI models on another model’s outputs risks amplified hallucination and error propagation; relying on someone else’s AI to develop your own is widely considered unacceptable in the industry.
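To make the risk concrete, here is a minimal sketch of the general pattern, often called distillation. The prompts, file name, and model choice are illustrative assumptions, not details from the leaked documents: a “teacher” model’s API answers are harvested as training labels for a “student” model, so every teacher mistake gets baked in as ground truth.

```python
import json

from openai import OpenAI

client = OpenAI()

# Hypothetical seed prompts; a real effort would use millions of them.
seed_prompts = [
    "Explain photosynthesis in simple terms.",
    "Summarize the causes of the French Revolution.",
]

pairs = []
for prompt in seed_prompts:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    # Whatever the teacher gets wrong is recorded here as if it were ground
    # truth, so the student model later learns to reproduce those errors.
    pairs.append({"prompt": prompt, "completion": answer})

with open("distilled_train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

Because the student optimizes toward the teacher’s outputs rather than verified data, each round of this process tends to compound the original model’s blind spots.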
Microsoft, through which ByteDance accesses OpenAI’s API, enforces the same policy forbidding use of the API to create competing products. Yet ByteDance’s documents are said to show OpenAI’s APIs assisted nearly every stage of Project Seed’s development, from training to evaluating the large language model.
The revelation underscores the frenzied race to deploy AI systems, a race that has pushed even a massive company to cut corners. It also raises concerns about how AI models are audited and whether data practices remain responsible as generative AI advances. ByteDance’s secret, policy-violating reliance on OpenAI to rush Project Seed out the door is an eye-opening example of overeager product development in this space.
The hasty and clandestine approach ByteDance has taken with Project Seed skips crucial steps that make AI systems safe and reliable. Models built through such shortcuts run a high risk of propagating biases, generating offensive content, and proving unreliable overall.
Without meticulously curated datasets and careful model training, generative AI can easily parrot the hurtful stereotypes and toxic viewpoints found online. Text and speech outputs may directly promote discrimination through unfiltered replies, and allowing misinformation into datasets clearly undermines truthfulness.
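As a toy illustration of what meticulous dataset curation involves, a pre-training pipeline typically runs filtering passes like the sketch below. The keyword blocklist here is deliberately simplistic and purely illustrative; production pipelines lean on trained toxicity classifiers, deduplication, and human review.

```python
# Toy blocklist filter; real pipelines use trained classifiers, not keywords.
BLOCKLIST = {"offensive_term_1", "offensive_term_2"}  # placeholders only


def looks_toxic(text: str) -> bool:
    """Flag a document if it contains any blocklisted term."""
    return bool(set(text.lower().split()) & BLOCKLIST)


def filter_corpus(documents: list[str]) -> list[str]:
    """Drop flagged documents before they enter the training set."""
    return [doc for doc in documents if not looks_toxic(doc)]


clean = filter_corpus(["a benign sentence", "offensive_term_1 appears here"])
print(clean)  # ['a benign sentence']
```

Skipping passes like this is exactly how web-scraped toxicity ends up encoded in a model’s weights.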
Flaws embedded this deep in an AI system’s foundations surface uncontrollably once it is put in front of users. Researchers have found that even state-of-the-art models still produce falsehoods nearly 15% of the time; at consumer scale, a service handling one million queries a day would be serving roughly 150,000 false answers daily.
And without rigorous stability testing, apparently capable systems can spiral out of control. Microsoft’s Tay chatbot, launched in 2016 and designed to converse like a teenage girl, began spewing inflammatory racist and sexist language and was shut down within a day.
The gravity of carelessly unleashed AI underscores why groups like OpenAI enforce strict controls and transparency around commercial deployments. Companies that sidestep accountability through secret projects risk losing control of their generative models, with profound societal impacts. ByteDance’s responsibility for AI ethics should match its outsized technological prowess and user reach.
The Project Seed case reinforces the need for responsible practices as more generative AI emerges. Excitement about innovation abounds, but companies must match their achievements with ethical standards to ensure these technologies benefit society. ByteDance’s standing makes these lessons in AI development diligence all the more pressing.