Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and is now exploring new vistas in social media.

  • 0 Posts
  • 119 Comments
Joined 1 year ago
Cake day: June 9th, 2023




  • Another, more general property that might be worth looking for is substantially similar posts that get cross-posted to a wide variety of communities in a short period of time. That pattern can have legitimate causes, but it’s probably worth flagging for extra scrutiny.

    One idea for making it computationally lightweight but also robust against bots “tweaking” the wording of each post might be to fingerprint each post based on rare word usage. Spam is likely to mention the brand name of whatever product it’s hawking, which is probably not going to be a commonly used word. So if a bunch of posts come along that all use the same rare words all at once, that’s suspicious. I could also easily see situations where this gives false positives, of course - if some product suddenly does something newsworthy you could see a spew of legitimate posts about it in a variety of communities. But no automated spam checker is perfect.
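
To make the rare-word idea concrete, here's a rough sketch of how it could work. Everything in it is an assumption on my part rather than anything an existing spam checker does: the function names, the 1% rarity cutoff, the one-hour window and the three-community threshold are all made up for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_doc_freq(background_texts):
    """Count how many background posts each word appears in."""
    doc_freq = Counter()
    total = 0
    for text in background_texts:
        doc_freq.update(tokenize(text))
        total += 1
    return doc_freq, total


def rare_words(text, doc_freq, total_docs, max_fraction=0.01):
    """Fingerprint a post as the words seen in at most max_fraction of background posts."""
    limit = max_fraction * total_docs
    return frozenset(w for w in tokenize(text) if doc_freq.get(w, 0) <= limit)


def flag_bursts(posts, doc_freq, total_docs,
                window_seconds=3600, min_communities=3, max_fraction=0.01):
    """posts is an iterable of (timestamp_seconds, community, text).

    Returns {rare_word: set_of_communities} for every rare word that shows up
    in at least min_communities distinct communities within one time window.
    """
    sightings = defaultdict(list)  # rare word -> [(timestamp, community), ...]
    for ts, community, text in posts:
        for word in rare_words(text, doc_freq, total_docs, max_fraction):
            sightings[word].append((ts, community))

    flagged = {}
    for word, seen in sightings.items():
        seen.sort()
        start = 0
        for end in range(len(seen)):
            # Slide the window forward so it spans at most window_seconds.
            while seen[end][0] - seen[start][0] > window_seconds:
                start += 1
            communities = {c for _, c in seen[start:end + 1]}
            if len(communities) >= min_communities:
                flagged[word] = communities
                break
    return flagged


background = ["the cat sat on the mat", "the dog ate the food", "a cat and a dog"]
doc_freq, total_docs = build_doc_freq(background)
posts = [
    (0, "a", "Buy ZorbaxPro now the best"),
    (60, "b", "ZorbaxPro is the greatest deal"),
    (120, "c", "Get zorbaxpro today the cat"),
]
flagged = flag_bursts(posts, doc_freq, total_docs)
```

Only set membership is compared, so rewording the rest of the post doesn't help a bot as long as the brand name survives. A flag here should only mean "have a human look at this", since a newsworthy product would trip it too.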


  • And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one were to try to prevent that, one would be attacking the open nature of the fediverse.

    There have been many people raging about their content being used to train AIs without permission or compensation. I’m speaking to those people, not the “fediverse collectively”. As you suggest, the fediverse can’t say anything collectively.





  • We’re sick of closed walled-garden monoliths like Reddit! Let’s move to an open federated protocol where anyone can participate and the APIs can’t be locked down!

    …wait, not like that!

    Yeah. This is what you signed up for when you joined the Fediverse: the ActivityPub protocol broadcasts your content to any other servers that ask for it. And just generally, that’s how the Internet works. You’re putting up a public billboard and expecting to be able to control who gets to look at it. That’s not going to work. Even robots.txt is just a gentleman’s agreement; it’s not enforceable.

    If you really want to prevent AI from training on your content with any degree of certainty, you’re probably looking for a private forum of some kind that’s run by someone you trust.
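
To illustrate the robots.txt point, here's a minimal sketch using Python's standard `urllib.robotparser`. The bot name `HypotheticalAIBot`, the example URL and the `may_fetch` helper are all invented for illustration. The thing to notice is that the check runs on the crawler's side, so it only happens if the crawler's author decides to call it.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks one (hypothetical) AI crawler to stay away.
ROBOTS_TXT = """\
User-agent: HypotheticalAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return what robots.txt *asks* of this user agent for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/post/1"
polite_ai_bot = may_fetch(ROBOTS_TXT, "HypotheticalAIBot", url)
other_bot = may_fetch(ROBOTS_TXT, "SomeOtherBot", url)
```

A well-behaved crawler would skip the page when `may_fetch` returns `False`. A crawler that never calls it, or that lies about its user agent, gets the page anyway, and nothing on the server side of this exchange stops it.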