AI Training and Category Confusion
I keep hearing people say OpenAI has no room to complain if DeepSeek distilled its data, since OpenAI itself crawled the web for data. The article on The Verge about this seems to think that way. We need to quit being lazy and start making careful distinctions in life again; that is a basic requirement if we are ever going to think seriously about anything.
From early on, the web has depended on spiders grabbing publicly exposed data to make things like search engines work. AI is a direct extension of that, as we’re seeing now that AI chatbots are starting to supplant search engines. If a human can read something and learn from it (because it is publicly released), an artificial intelligence that is going to show any genuine “intelligence” needs the same opportunity.
This is different from signing up for a license to use an API (something that is “paywalled,” so to speak) and then breaking the license agreement, which is what DeepSeek did if it used OpenAI’s programming interface to teach its own model. That’s akin to music or software piracy. There’s a difference between learning from a book and writing about it, on the one hand, and taking a chapter out of the book and reprinting it, on the other: proper AI training is like the former, distilling is like the latter.