A new initiative launching today aims to give ordinary people control over their interactions with OpenAI’s ChatGPT and, in doing so, to shift the power dynamics behind artificial intelligence training data.
The project, known as the ChatGPT Data Collective, lets users upload their individual chat histories to gain insights from that data and earn rewards for it. The collective was established in collaboration with Vana, a service from the Vana Foundation that bills itself as a “home for people who believe the future of AI starts with human-owned data.”
Users can join the ChatGPT Data Collective by submitting a .zip file containing their exported ChatGPT history. The dataset includes complete ChatGPT chats, AI responses, and metadata such as language settings and subscription level. From the questions people ask to the way their tone and curiosity change over time, these pieces together offer a view into how people interact with machines.
In exchange for uploading their datasets, users receive $GPT tokens, with the reward depending on the volume and quality of their data.
$GPT tokens serve as a governance tool as well as a reward. Token holders can vote on who has access to the shared dataset, how it is used, and whether new tools should be supported. Possible future uses include journaling features, memory analysis, and reflective AI experiences based on personal data.
Because the initiative is set up as a decentralized autonomous organization, or DAO, governance rests with community members rather than a single business. It also promises complete transparency: data remains secured until users decide to participate, and they can change or remove it at any time.
The ChatGPT Data Collective arrives as major AI firms face criticism for using user interactions to train models without permission or payment. The collective says it offers an alternative: a user-driven, privacy-first approach that seeks to rebalance who benefits from the enormous value produced by AI training data.