Implementing NLP search on the fediverse

I’d like to explore a project for NLP-based search on the fediverse. But I’m a fediverse beginner and am not sure whether it’s possible to index fediverse content.

My general idea is:

1. Set up my own read-only instance, let’s say of kbin. I’m not sure if the concept of a read-only instance makes sense. It’s read-only because the instance only needs to read the content already on the fediverse and doesn’t need the ability to post content.
2. At some regular interval, let’s say once a day, check for any changes in the content since the previous run. I’m not sure if there is a single “fediverse” where all the content can be read from. If not, then I can start with tracking the same content as on kbin.social. Is it possible to monitor changes to content on a kbin instance?
3. Convert the content into vector embeddings by using an NLP ML model like CLIP. The embeddings will be stored in a vector store, which will also hold the url of each piece of content as metadata.
4. When a user requests a search, convert the search term to its vector embedding using the same ML model and identify the most similar stored vectors.
5. Return the search results to the user as urls of the most relevant content, perhaps with a preview of the content. The user can then access the full content where it was originally posted using its url.
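
To make steps 3 and 4 concrete, here is a rough sketch of what I have in mind. It is only an illustration under loud assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, and the `VectorStore` class and the example urls are made up for this post, not taken from any library.

```python
import hashlib
import math
from collections import Counter

DIM = 256  # arbitrary size for the toy embedding (assumption)


def embed(text):
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        # Map each word to a fixed bucket so the same word always lands in the same place.
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class VectorStore:
    """Hypothetical in-memory store: keeps (url, vector) pairs."""

    def __init__(self):
        self.rows = []

    def add(self, url, text):
        # Step 3: embed the content and keep its url as metadata.
        self.rows.append((url, embed(text)))

    def search(self, query, k=3):
        # Step 4: embed the query with the same function and rank by cosine similarity.
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), url) for url, v in self.rows]
        scored.sort(reverse=True)
        return [(url, score) for score, url in scored[:k]]


store = VectorStore()
store.add("https://example.invalid/post/1", "semantic search with vector embeddings")
store.add("https://example.invalid/post/2", "recipe for sourdough bread")
for url, score in store.search("vector embeddings search", k=2):
    print(url, round(score, 3))
```

In a real setup `embed` would call the actual model and the list scan would be replaced by a proper vector index, but the shape of the data (vector plus url) would stay the same.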

I’m comfortable with setting up steps 3 and 4. But I don’t know the fediverse well enough to say whether steps 1, 2, and 5 would work, or even make sense the way I’m envisioning them.
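
For step 2, the part I can sketch without knowing the fediverse is the bookkeeping between runs. Assuming some fetcher (not shown, and the part I don’t know how to build) hands me a dict of url to text on each run, change tracking could look roughly like this. The function names are my own and purely illustrative.

```python
import hashlib


def fingerprint(text):
    """Hash the content so runs can be compared without storing full text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshot(previous, current_posts):
    """Compare this run against the last one.

    previous: dict of url -> fingerprint saved from the previous run.
    current_posts: dict of url -> text fetched on this run.
    Returns (new_state, added, changed, removed).
    """
    new_state = {url: fingerprint(text) for url, text in current_posts.items()}
    added = sorted(u for u in new_state if u not in previous)
    changed = sorted(
        u for u in new_state if u in previous and previous[u] != new_state[u]
    )
    removed = sorted(u for u in previous if u not in new_state)
    return new_state, added, changed, removed
```

Only the urls in `added` and `changed` would need to be re-embedded, and the ones in `removed` dropped from the vector store.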

Can some of the fediverse veterans help me understand if this is a feasible approach or if I’ve got it all wrong?

Chat

noodlejetski@kbin.social
2 years ago
prepare for a ton of instances defederating from yours on day 1.
