I already had to use the cached version of a Reddit thread today to solve a technical issue I had with the Rust compiler. There is so much valuable content there that is well indexed by search engines; let’s hope they don’t lock down the site even further to prevent AIs from training on their data.
Although, in that case, Lemmy can take over as the searchable internet forum.
If they actually want to restrict AI training, they also have to restrict search engines. I may be behind the times, but I would have thought those kinds of questions usually went to a Stack Overflow sort of site.
If they wanted to restrict AI training, they’d need to prevent AIs from viewing the website at all. Removing the API just removes the low-bandwidth, low-impact way of gathering the data. Scripts can HTTP-scrape just as easily as they can use an API, but that’s a lot more resource-intensive on Reddit’s side. Heck, this is the whole reason free public APIs became a thing in the first place.
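A minimal sketch of the point above, using an invented comment payload and invented page markup (neither is Reddit's real API shape or HTML): pulling the same text out of an API-style JSON response and out of scraped HTML takes only a few lines either way. The difference is server cost, not client effort.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: structured, cheap for the server to produce.
api_response = '{"comments": [{"author": "alice", "body": "Use cargo clean."}]}'
comments_from_api = [c["body"] for c in json.loads(api_response)["comments"]]

# The same data as rendered HTML: the server must build the whole page,
# but client-side the scraping is barely harder than the JSON parse.
page = '<div class="comment"><p>Use cargo clean.</p></div>'

class CommentScraper(HTMLParser):
    """Collects text inside <div class="comment"> elements."""

    def __init__(self):
        super().__init__()
        self.in_comment = False
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "comment") in attrs:
            self.in_comment = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_comment = False

    def handle_data(self, data):
        if self.in_comment and data.strip():
            self.comments.append(data.strip())

scraper = CommentScraper()
scraper.feed(page)

print(comments_from_api)  # API path
print(scraper.comments)   # scraping path -- same result
```

Either path recovers the same text; the API just lets the server skip rendering (and the scraper skip parsing) an entire page per request.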
I know there were at least a few projects not affiliated with the Internet Archive that were basically mirror copies of Reddit. No idea what has happened to them at this point; I haven’t checked in a long time.
I’m pretty sure it’s only a matter of time until an LLM can solve any sort of obscure compiler issue. If organic data growth happens outside of Reddit, Reddit isn’t going to be of much use once search engines catch on to those other sources.
I wonder if the Internet Archive has preserved much of Reddit’s old posts and comments? No one seems to have mentioned it.