require all participating communities to store ALL of the data.
Wait, what? No, not at all. There is no reason for them to redundantly store all the data.
Imagine the same concept, but with the data just being aggregated. The point is for content to get more exposure and engagement, not to create an archive.
Is that so different from how the fediverse currently works? Subscribed content is already being federated across instances; I’m just asking for it to be organized together. When your instance federates with a community on another instance, it doesn’t get the entire “5-year” backlog sent to it; only new posts and old content that someone interacts with are sent.
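To make that concrete, here is a rough sketch of what I mean by "organized together". It is not how Lemmy or any other fediverse software actually implements anything; the `Post` type, the `aggregate` function, and the instance names are all made up for illustration. The idea is only that the instance merges posts it has already received through normal federation into one view, without fetching or storing anything extra.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Post:
    # All fields are hypothetical; real federated posts carry far more metadata.
    community: str   # e.g. "butterflies@a.example"
    title: str
    published: int   # Unix timestamp


def aggregate(*feeds):
    """Combine posts the instance already holds into one newest-first list.

    No backlog is requested and nothing is duplicated: the result only
    references Post objects that normal federation has already delivered.
    """
    return sorted(chain.from_iterable(feeds), key=lambda p: p.published, reverse=True)


# Posts already received from two same-named communities on different instances.
feed_a = [
    Post("butterflies@a.example", "Monarch sighting", 300),
    Post("butterflies@a.example", "Old post someone just commented on", 100),
]
feed_b = [Post("butterflies@b.example", "Swallowtail photo", 200)]

combined = aggregate(feed_a, feed_b)
for post in combined:
    print(post.published, post.community, post.title)
```

The combined view is just a different ordering of data the instance would be holding anyway if a user were subscribed to both communities.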
I think there are limits to the scalability of the fediverse in general; I just don’t see how organizing the data differently breaks anything. Only the most limited servers are going to be impacted by receiving content from three /c/butterflies instead of one. Most people are probably subscribed to the duplicate communities already; I certainly am.