• 0 Posts
  • 8 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • It would make it much easier to keep content online, as everybody could mirror content with close to zero effort. That’s quite the opposite of today, where content mirroring is essentially impossible: all the links still refer to the original source and turn into 404s when that source goes down. The fact that the file might still exist on another server is largely meaningless when you have no easy way to discover it and no way to tell whether it is even the right file.

    The problem we have today is not storage, but locating the data.


  • How does authentication factor into this?

    That’s where it gets complicated. Git sidesteps the problem by simply being a file format; the downloading still happens over regular old HTTP, so you can apply all the same restrictions as on a regular website. IPFS, on the other hand, ignores the problem and assumes all data is redistributable and accessible to everybody. I find that approach rather problematic and short-sighted, as that’s just not how copyright and licensing work. Even data that is freely redistributable needs to declare so, as otherwise the default fallback is copyright, which doesn’t allow redistribution unless explicitly permitted. IPFS so far has no way to tag data with license, author, etc. LBRY (the thing behind Odysee.com) should handle that a bit better, though I am not sure on the details.


  • but the reality is that most documents are generated on the spot from many sources of data.

    That’s only true due to the way the current Web (d)evolved into a bunch of apps rendered in HTML. But there is fundamentally no reason why it should be that way. The actual data that drives the Web is mostly completely static. The videos YouTube has on its servers don’t change. Posts on Reddit very rarely change. Twitter posts don’t change either. The dynamic parts of the Web are the UI and the ads; they might change on each and every access, or be different for different users, but they aren’t the parts you want to link to anyway. You want to link to a specific user’s comment, not a specific user’s comment rendered in a specific version of the Reddit UI with whatever ads were on display that day.

    Usenet got that (almost) right 40 years ago: each message got a message-id, and each message replying to it would contain that id in a header. This is why large chunks of Usenet could be restored from tape archives and put back together. The way content linked to each other didn’t depend on a storage location. It wasn’t perfect, of course; it had no cryptography going on and depended completely on users behaving nicely.
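
    To make that concrete, threading by message-id needs nothing but the headers themselves. A rough Python sketch (the ids and bodies below are invented for illustration):

    # Rebuild a discussion tree from (message-id, references) pairs, the way
    # Usenet readers thread articles; ids and bodies are made up for this example.
    messages = {
        "<a1@example.net>": {"references": [], "body": "original post"},
        "<b2@example.net>": {"references": ["<a1@example.net>"], "body": "first reply"},
        "<c3@example.net>": {"references": ["<a1@example.net>", "<b2@example.net>"], "body": "reply to the reply"},
    }

    children = {}
    for mid, msg in messages.items():
        parent = msg["references"][-1] if msg["references"] else None  # last reference = direct parent
        children.setdefault(parent, []).append(mid)

    def print_thread(parent=None, depth=0):
        for mid in children.get(parent, []):
            print("  " * depth + mid, "->", messages[mid]["body"])
            print_thread(mid, depth + 1)

    print_thread()  # the tree rebuilds no matter which server the articles came from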

    Doing so is definitely possible, particularly if they decide to cooperate with archival efforts.

    No, that’s the problem with URLs. This is not possible. The domain reddit.com belongs to a company and they control what gets shown when you access it. You can make your own reddit-archive.org, but that’s not going to fix the millions of links that point to reddit.com and are now all 404.

    All that said, if we limit ourselves to static documents, you still need to convince everyone to take part.

    The software world operates in large part on Git, which already does most of this. What’s missing there is some kind of DHT to automatically look up content. It’s also not all-or-nothing; take the Fediverse: the idea of distributing content is already there, but the URLs are garbage, like:

    https://beehaw.org/comment/291402

    What’s 291402? Why is the id 854874 when accessing the same post through feddit.de? Those are storage-location implementation details leaking out into the public. That really shouldn’t happen; that should be a globally unique content hash or a UUID.
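
    For illustration, a content-derived id needs nothing more than hashing the bytes of the post; a minimal Python sketch (IPFS wraps this in a multihash/CID encoding, but the idea is the same):

    # Name a comment by a hash of its bytes instead of a database row id.
    import hashlib

    comment = "Example comment text".encode("utf-8")
    content_id = hashlib.sha256(comment).hexdigest()

    print(content_id)  # same bytes -> same id on every instance that stores them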

    When you have a real content hash you can do fun stuff; in IPFS URLs, for example:

    https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf

    The /ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf part is server-independent; you can access the same document via:

    https://dweb.link/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf

    or even just view it on your local machine directly via the filesystem, without manually downloading:

    $ acrobat /ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf

    There are a whole lot of possibilities that open up when you have better names for content; having links on the Web that don’t go 404 is just the start.


  • That sounds insanely more resource heavy than just hosting the document itself on one instance somewhere.

    It really isn’t. Most content out there is already immutable: you don’t see people uploading the same YouTube video five times with minor changes, or editing their images after the upload. Most services don’t even allow that for users; at best you can delete and upload a new video.

    Furthermore, the blockchain would only contain metadata, not the actual data, so it’s automatically thousands of times easier to store than the data itself.

    Mirroring that content is a completely separate and optional part of the problem; the important part is having content named in such a way that I can go to a mirror, ask “do you have XYZ”, and get an answer I can trust. With URLs that’s impossible, as they can show different content whenever they want.

    Also, this isn’t exactly a new idea; it’s how most software development already works these days. A Git repository stores a copy of every little change, and every download retrieves that complete history. What’s missing is some infrastructure on top that links all the different repositories together into one namespace (GitHub kind of does that internally, but that’s of no help for repositories hosted elsewhere).
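
    Git’s content addressing is simple enough to reproduce by hand, which is part of what makes the model so portable; a rough Python sketch of how a blob gets its name:

    # Reproduce how Git names a file: SHA-1 over a tiny header plus the content,
    # i.e. what `git hash-object` prints for the same bytes.
    import hashlib

    data = b"hello world\n"
    blob = b"blob " + str(len(data)).encode() + b"\0" + data
    print(hashlib.sha1(blob).hexdigest())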


  • Ultimately this is a problem that’s never going away until we replace URLs. The HTTP approach of finding documents by URL, i.e. server/path, is fundamentally brittle. It doesn’t matter how careful you are or how much best practice you follow, that URL is going to be dead in a few years. The problem is made worse by DNS, which in turn makes URLs expensive and prone to expiring.

    There are approaches like IPFS, which uses content-based addressing (i.e. fancy file hashes), but that’s not enough either, as it provides no good way to update a resource.

    The best™ solution would be some kind of global blockchain thing that keeps a record of what people publish, giving each document a unique id, a hash, and some way to update that resource non-destructively (i.e. the version history is preserved). Hosting itself would still need to be done by other parties, but a global log file that lists all the stuff humans have published would make it much easier and more reliable to mirror it.
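
    Concretely, such a log would only need a handful of metadata fields per published version; a minimal Python sketch, with invented field names, of the kind of record it could hold (only metadata is stored, never the content itself):

    import hashlib, json, time

    def publish(log, doc_id, content, prev=None):
        entry = {
            "doc_id": doc_id,                                     # stable id across versions
            "version": prev["version"] + 1 if prev else 1,
            "content_hash": hashlib.sha256(content).hexdigest(),  # location-independent name of this version
            "prev_hash": prev["content_hash"] if prev else None,  # old versions stay reachable
            "published_at": int(time.time()),
        }
        log.append(entry)
        return entry

    log = []
    v1 = publish(log, "doc-42", b"first draft")
    publish(log, "doc-42", b"second draft", prev=v1)
    print(json.dumps(log, indent=2))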

    The end result should be “Internet as globally distributed immutable data structure”.

    Bit frustrating that this whole problem isn’t getting the attention it deserves.


  • Wouldn’t count on that. Those techniques will help indie developers a lot, but AAA gaming is a constant race of trying to deliver more and more. AAA games are always hopelessly over-engineered, and once you throw AI into the mix, it just raises the bar that AAA games have to hit. Expect ChatGPT flavor text on every empty beer can you can find in the world, auto-generated quest lines, and a whole lot more.

    Indie developers, in contrast, can focus much more on actually delivering a game, with story, characters, and gameplay. But AAA games are just ginormous piles of meaningless content, and AI will help them get even bigger.