The meat is this, using bsky-sdk + atrium-api from crates:
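    use atrium_api::app::bsky::feed::post::RecordData;
    use atrium_api::types::string::Datetime;
    use bsky_sdk::{rich_text::RichText, BskyAgent};

    // Inside an async fn returning Result; "X" is a placeholder password.
    let agent = BskyAgent::builder().build().await?;
    agent.login("username.bsky.social", "X").await?;
    // `msg` is the post text; links and mentions in it become facets.
    let rt = RichText::new_with_detect_facets(msg).await?;
    let record = RecordData {
        text: rt.text,
        created_at: Datetime::now(),
        facets: rt.facets,
        // Unused optional fields; the exact set depends on the atrium-api version.
        embed: None, entities: None, labels: None,
        langs: None, reply: None, tags: None,
    };
    agent.create_record(record).await?;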
I have a lot of hope for AT. I'm sure there's lots of smart people on HN that have done great things with the Fediverse, but this whole paradigm just seems more sustainable + realistic. Basically it gives us centralization by default, but with real decentralized support when you need it / for power users.
As far as sustainability goes I'm hoping for a better business model than "accept funds from Blockchain Capital" [0], some return on investment in mirroring the firehose. I can muse: a Discord alternative where some users pay to host longer videos (current limit is 60sec [1]), or a Patreon where a relay takes a cut in exchange for managing access/decryption keys, or a Bandcamp or some other kind of social marketplace. As it is there's no reason I couldn't do this; it is an open platform after all.
I'm hoping that most of the infra costs get amortized by people bringing on their PDSes, while some of the core stuff (app layer, iOS app etc.) is maintained by a small team funded by donations/subscriptions.
Yeah I’m also worried about profitability, tho not particularly concerned about that particular investor, personally; all VCs are inherently amoral profit generators. They are a “benefit corporation” like anthropic, which gives them some leeway to deny shareholder requests in the name of public good. Which is nice!
In general I feel like social media is in the perfect spot for a huge shakeup as display ads breathe their last breath. Even if Google wins/draws out its Display Ads antitrust case and successfully implements some new interest-tagging system, I think anyone with a calculator and a newspaper subscription can read the leaves at this point; people are concerned about their data, and the money it generates is peanuts compared to more traditional advertising schemes. All of this is of course not even mentioning what I think intuitive algorithms will do (cynical or no, there’s lots of credentialed scientists saying that AGI (!!) is within reach in the coming decade, if not the coming few years).
All that to say: I feel like they can find a way to make it work. Revenue doesn’t need to be as high anyway if you a) don’t have 1000 devs optimizing Display Ad A/B tests all day, and b) have the support of the open source community.
If they can get ~100k subs to a $10/mo premium service similar to discord nitro, they are probably close to breaking even at the current scale and ops methodology. Which seems feasible.
is there any hosting site that isn't? feels like a computing law at this point; if you build a hosting site, someone will try to use it for malicious purposes.
Lack of moderation combined with an official-sounding domain name.
This would have to get the user to follow a link or call a phone number or something though. These are plausible. It's too bad the content-security-policy can't prevent following links.
Bluesky seems to use a lot of totally different domain names for each part of their infrastructure, maybe for this reason. e.g. this one is bsky.network
While they're nowhere close on volume, they're certainly beating microsoft in terms of the rate they're adding similar looking official URLs.
I guess bsky.net and bluesky.net were taken. What’s weird is why ICANN allowed .network TLD at all when .net already existed, was shorter, and meant for that.
I mean, the way AT Proto is designed, moderation primarily happens on the app layer, not the protocol layer. So on an app like Bluesky, you can have a lot of moderation. But the protocol itself allows hosting arbitrary content in a distributed/decentralized way.
As long as content is authored by the administrator of the server, I don't see where there is a security issue.
It's like if you point to your own Apache server in your own domain where you host a scam page and say there's a security issue with Apache because you could do that.
Or are you saying that you can make this person's server serve third-party content?
The CSP headers didn't used to be there, which I used to pop an alert(), way back. (at the time there was also a MIME whitelist, but that whitelist included image/svg+xml, which allows script execution)
Ah this is super cool! I’ve been thinking about doing this with my website, but was going to leverage the whtwind lexicon, since my site is mostly a blog. But for the front page, and anything else, I may have wanted something else.
This is more of an unstructured approach, which is cool because it needs less specialized tooling. It has the disadvantage of being… well, just a blob. No semantic information there.
I think the AT protocol is versatile in that users can access each other's data once authenticated without any centralized service (granted the aggregators and some other things may still be centralized).
Is there any auth necessary to pull data from a PDS? I know the main relay is a public firehose so I would be surprised, but maybe the PDS can put relay servers on an allowlist?
So the recent push to artists to move there to protect their rights against AI training is not only false but a trap since anyone can point their cannons to train data on Bluesky.
I'm wondering whether a third-party PDS implementation should support other protocols as well. Would a combined git/PDS repo make any sense at all? (That is, it's a PDS, but it also implements enough of git to do read-only access via git commands.)
Based on https://bsky.social/about/support/tos#user-content , I would answer yes. While it's not expressly called out (permitted or forbidden), my reading of the above would indicate that it's not forbidden per se, and probably permitted ("Modify or otherwise utilize User Content in any media. This includes reproducing, preparing derivative works, distributing, performing, and displaying your User Content."). I believe training an LLM falls under "utilize" and "preparing derivative works".
What I remember about that whole affair is that I'd really respected Jack for starting Bluesky, allowing it to be independent of Twitter (and Jay deserves a heaping of credit for pushing that!), and then losing that respect when he seemed to totally misunderstand what Bluesky had gone on to achieve.
Jack was pushing Nostr at the time which... seems ok if you're into that. But his arguments in his interview with Mike Solana really didn't make sense to me.
Bluesky’s attitude seems logical and their reasoning aligns with my thoughts exactly.
If techdirt’s article is to be believed, Dorsey’s departure has to do with going from an extreme to an extreme—from a traditional social monolith to a pure protocol—whereas Bluesky chose to pursue not only the protocol, but also “the app” as the face of that protocol for the ordinary user, and let’s face it: the ordinary user does not really care about protocols.
My speculation about him suggesting people “stay on Twitter” is that Nostr (which he apparently is invested in now) and Twitter are orthogonal, so there is no conflict there, but Bluesky competes with both.
Not a Bluesky user (the invite-only period has put me off for a while), but if they do not compromise on the protocol part (and there are no shenanigans unfolding, who knows, maybe Dorsey found something) their attitude seems to me to be the most reasonable for a mainstream social platform.
> Bluesky is lefty twitter now and I want no business with that platform.
I love hearing people say this, because in reality Bluesky covers most of the political dimensions one wants to subdivide a population by except the most toxic of participants. Also, most of the academics have moved to Bluesky because Twitter became toxic / suppressed speech dramatically and at the whims of one Mr. Musk. As per usual, where the "lefties" are the "righties" follow (to use the parlance of the prior comment), be it social media, good policy, you name it.
Plenty of conservatives are there, such as Lincoln project folks, right libertarians, and even National Review & Reason IIRC. But I guess these folks don't count these days as conservative (despite definitionally being so, just not aligned with modern US Republican policy planks)? Not sure.
Anyhow, I'm enjoying Bluesky for what it is -- a new social media platform that isn't fully encumbered by bots and nonsense for a bit.
Meanwhile Twitter is now openly suppressing links off-site. For financial reasons rather than ideological ones (although the latter may also be occurring).
I build my own with Jinja2 templates, my custom python script + the mistune library to parse markdown to html, and a YAML file in a similar format to Hugo (the previous generator I used to use).
I found building my own custom one with python3, much more freeing in all sorts of interesting ways, I also exposed the static site generator with a FastAPI based API to auto build my website from my notes, my cooking recipes, database records, financials, git commits, etc to build me a private protected website (via nginx auth) from anywhere, whether via sending a text message to my telegram bot, or running a Shortcuts command on my iPad, or just directly running a command from my terminal.
It took barely a day to set up, and allows me to run interesting custom extensions in all sorts of interesting ways, and builds me a personal website curated to my interests, where the primary viewer is supposed to be me. It also exposes a public barebones website with barely any content for everyone else.
One of these days I think i’ll expose more of it to the world.
I maintain a blog on Hugo but also host a couple of Astro ones. I think Hugo is great but to my eyes at least Astro has more active development behind it, and I also enjoy it more (probably because I know Typescript more than golang)
Have you found a decent bare bones starter theme? I've been using MkDocs Material, and I find the theme too complicated (HTML etc) - hoping to find a super simple one that looks decent - plain - and is a good base for theming / styling. Thanks & take care.
Why was it decided not to build on any existing content-addressable networking system (IPFS or whatever)?
November 1, 2024 at 12:39 PM
Leo R. Comerford @leocomerford.bsky.social · 23d
(Not implying that this was the wrong decision, it’s a genuine question.)
dan @danabra.mov · 23d
actually not sure i can answer this well. paging @bnewbold.net or maybe @why.bsky.team (who worked on IPFS btw)
dan @danabra.mov · 23d
my guess is that we’d want data hosting to be under direct control of the user (same as web hosting) rather than peer-to-peer, want instant deletion/edits at the source, need ability to move to a different host or take content down, need grouping into collections. not sure how much IPFS could adapt
dan @danabra.mov · 23d
we do use some pieces from IPFS though (aside from the actual peer to peer mechanism)
bryan newbold @bnewbold.net · 4mo
you can basically ignore it, we don't use "IPFS" proper anywhere.
there are strong social connections, and we borrow some tech components like CIDs (flexible hash/digest syntax) and DAG-CBOR (more-deterministic subset of CBOR, good for signing+hashing)
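For readers unfamiliar with content addressing, here is a rough sketch of the idea behind CIDs: the key for a blob is derived from its bytes, so identical content always gets the same address. This is not how atproto or IPFS compute CIDs. `toy_digest` and `BlobStore` are hypothetical names, and the standard library's `DefaultHasher` stands in for the cryptographic hash a real CID wraps.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a CID: a 64-bit digest of the bytes.
/// Assumption for illustration only: DefaultHasher is NOT cryptographic,
/// whereas a real CID wraps a cryptographic digest of the content.
fn toy_digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// A blob store keyed by the digest of each blob's content.
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Store `bytes` and return their address. Storing the same bytes
    /// twice yields the same address and keeps a single copy.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let key = toy_digest(bytes);
        self.blobs.entry(key).or_insert_with(|| bytes.to_vec());
        key
    }

    /// Look a blob up by its address.
    fn get(&self, key: u64) -> Option<&[u8]> {
        self.blobs.get(&key).map(Vec::as_slice)
    }
}
```

Because the address depends only on the bytes, any host holding the same data can serve it under the same key, which is what makes the "import your repo into another node" idea workable.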
Bumblefudge @bumblefudge.com · 1d
yeah this is all accurate. bluesky remixed a lot of IPFS components and patterns in interesting ways, but the monolithic global IPFS network (with chatty DHT distribution) wouldn't make sense here, BS made an infinitely more efficient/performant distribution of bytes tailored to its use case.
Bumblefudge @bumblefudge.com · 1d
FWIW the IPFS foundation is working on making IPFS more modular and easily remixed for future BlueSkies, but it's a big task decomposing the monolith and reorienting the documentation and ergonomics...
[a second reply to the first skeet:]
Uai @why.bsky.team · 23d
As far as im concerned (and i led ipfs development for a number of years) we are using ipfs, just a specific streamlined implementation of it.
All your repo data can be imported into an ipfs node and addressed via cid
Uai @why.bsky.team · 23d
We dont use libp2p because for a consumer mobile app we didnt want to futz with nat traversal and connectivity and the like, but its definitely possible to build a p2p version of bluesky
"skeet" is such a terrible term for this. It's like mastodon "toot"s.
Using bodily functions as core infra terminology is off-putting and feels a bit like a juvenile boy's club. I get that some people find it funny, but it alienates people. We should just call these "posts".
Sure, whatever: I had certainly given it approximately no thought in this case, and my personal investment in 'sk**t' is zero. I'd edit my post but I seem to have hit the timeout. I will also say that I don't think this is the most interesting or on-topic thread to pull on from my comment.
Hard agree -- this one is especially bad because it's gendered. We'll see what happens, but I'd put my money on "post" winning out. There's some people on Bluesky who feel absurdly strong about this because of the history (the CEO asked them not to use it so they used it more often as a joke), but they're simply outnumbered already. Such is exponential growth...
Appreciated Daniel reaching out to the team about this! Hosting blobs is one of those things that will inevitably go through iterations as we understand the abuse vectors more and more, but for now it's really fun to see this kind of usage in action. The PDS is meant to be a database host in the same sense that a webserver is a website host.
Are you ever going to bring back Beaker Browser? Used to love playing around with that! Didn't realize you'd gone on to Bluesky, very neat.
Thanks! Probably won’t revive it, but it was a great experience. Wrote some notes on it here: https://github.com/beakerbrowser/beaker/blob/master/archive-...
Doesn't the potential for abuse reduce when content is linked through user's own domain rather than a particular appview like bsky? Bsky already supports a user's domain ALIASed to redirect.bsky.com: https://bsky.app/profile/jacob.gold/post/3kh6rnpdzmp2v
If people use BlueSky as a magnet link for illegal content it will quickly become a problem.
Congrats on finding a role at Bluesky. Beaker was such an amazing project to follow, that experience must be so useful.
You're walking headfirst into the copyright, CSAM, pornography hole of content moderation here.
How is this any different than the regular hole of content moderation they're already in?
I don’t have a well-considered answer, but a) I imagine being able to host a phishing site on an official domain from them using their SSL cert is problematic, and b) my gut says that as soon as you start hosting arbitrary files— e.g. zip files— and browser executable JavaScript with your domain in there, that’s a different level of possible content. I guess the question is whether or not the disposition of a social media network makes that more problematic than it does with, say, Google drive.
It’s not possible for me, a non Google employee to create a file that’s hosted on Google.com, or any Google domain and have it read in the browser as text/html, bypassing many a firewall, for example
Yes it is. Via sites.google.com or Google Docs.
These are abused all the time for phishing and malicious threat actors.
Agreed. I assume this will open up Bluesky to a lot of potential legal problems. But will it be any different from accessing the content through the app, since the content is hosted anyway?
That said, just the other day I was thinking, is the reverse possible. I have a web site/blog. Use RSS and then the RSS updates are posted to a handle on Bluesky. I would assume that's a lot more useful?
> That said, just the other day I was thinking, is the reverse possible. I have a web site/blog. Use RSS and then the RSS updates are posted to a handle on Bluesky. I would assume that's a lot more useful?
This is trivial, I'm currently doing this for https://bsky.app/profile/aemet-bot.bsky.social which reads a bunch of RSS feeds from AEMET (Spain's national weather service basically) and posts warnings to the feed if there is any warning above Yellow.
The code for managing this is about ~200 lines of Rust code.
Do you have that code posted somewhere by chance? I would be interested in browsing through it!
Not right now, no. It's fairly simple (login to Bluesky, read RSS, save item IDs to a text file, post if there are any new feed items, close program then systemd timers run this every N minutes) + pretty specific to AEMET and their formats.
If there is interest I guess I could spend some hours to make it a bit more generic and publish the source.
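The "save item IDs to a text file, post if there are any new feed items" step described above could look roughly like this. It is a sketch using only the standard library, not the bot's actual code: `load_seen` and `take_new_items` are hypothetical names, and fetching the RSS feed and posting to Bluesky are left out.

```rust
use std::collections::HashSet;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Load the IDs of feed items that were already handled (one per line).
/// A missing file is treated as "nothing handled yet".
fn load_seen(path: &Path) -> io::Result<HashSet<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text.lines().map(str::to_owned).collect()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(HashSet::new()),
        Err(e) => Err(e),
    }
}

/// Return the IDs from `feed_ids` that are not yet in the seen file, in feed
/// order, and append them to the file so the next run skips them.
fn take_new_items(path: &Path, feed_ids: &[&str]) -> io::Result<Vec<String>> {
    let mut seen = load_seen(path)?;
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut fresh = Vec::new();
    for id in feed_ids {
        // `insert` returns false for IDs that were already recorded.
        if seen.insert((*id).to_owned()) {
            writeln!(file, "{}", id)?;
            fresh.push((*id).to_owned());
        }
    }
    Ok(fresh)
}
```

Each timer run would call `take_new_items` with the IDs from the freshly fetched feed and post whatever comes back. One design caveat: this records an ID before the post is made, so a failed post is never retried; recording after a successful post trades that for possible duplicates.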
Cool yeah. I think the thing I was most interested in was interacting with Bluesky through rust. Is there a decent sdk for that?
The meat is this, using bsky-sdk + atrium-api from crates:
Works well enough. It's a bit on the lower end of the "abstraction ladder", there might be more user-friendly libraries for doing this even easier now.
This is cool. Actually if bluesky can do this automatically where it can fetch RSS and show as handle updates it will be really useful. Will help a bunch of people who wouldn't need to maintain the services.
Or maybe there is a potential for a SaaS service?
As opposed to running a social network? What else is new
I was curious as to the security context this runs in:
Here are the headers I got back:
Presumably that ratelimit is against your IP?
"access-control-allow-origin: *" is interesting - it means you can access content hosted in this way using fetch() from JavaScript on any web page on any other domain.
"content-security-policy: default-src 'none'; sandbox" is very restrictive (which is good) - content hosted here won't be able to load additional scripts or images, and the sandbox tag means it can't run JavaScript either: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Co...
Blocking/allowlisting all JavaScript is the only way [1] to have a CSP fully contain an app (no exfiltration) [2] and with prefetch that might not be enough. The author is correct at the end to suggest using WebAssembly. (Also, it still has the issue of clicking links, which can be limited to certain domains or even data: by wrapping the untrusted code in an iframe and using child-src on the parent of the iframe)
1: https://github.com/w3c/webappsec/issues/656#issuecomment-246...
2: https://www.w3.org/TR/CSP3/#exfiltration
I didn't realize you could use CSP for preventing exfiltration now! How did they close the WebRTC loopholes?
They haven't. That in the spec stops short of actually saying that it will stop all exfiltration. What it will do is make it harder because you'd have to put the data in a subdomain or in a username/password. It also could make it hard to deny that an attempt to exfiltrate was deliberate.
Why would WebAssembly provide more protection against exfiltration than JavaScript in this case?
By default WebAssembly doesn't have access to the DOM or JavaScript globals. You have full control of how it can access these things.
is the default-src necessary if you're using sandbox or is it redundant?
`sandbox` doesn’t affect making requests via HTML (images, stylesheets, etc.).
Right, but what would be the security impact of that compared to just plain HTML? I guess it allows for some form of view counting or IP exfiltration, but other than that anything you can do with an external request you could do with an embedded data URI.
As far as I understand CSP, since it’s set to `none`, no URIs are allowed, not even `data`. Inline scripts and stylesheets are not allowed either, since `unsafe-inline` (or nonces/hashes) is missing.
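To make the two directives discussed in this sub-thread concrete, here is a small sketch that splits a Content-Security-Policy value into directives and checks the two properties people are describing: `default-src` being exactly `'none'`, and `sandbox` being present without `allow-scripts`. The function names are made up for illustration, and this is nowhere near a full CSP evaluator.

```rust
use std::collections::BTreeMap;

/// Split a Content-Security-Policy value into directive name -> values.
/// Directive names are case-insensitive, so they are lowercased here.
fn parse_csp(policy: &str) -> BTreeMap<String, Vec<String>> {
    let mut directives = BTreeMap::new();
    for part in policy.split(';') {
        let mut tokens = part.split_whitespace();
        if let Some(name) = tokens.next() {
            // A repeated directive is ignored; only the first one counts.
            directives
                .entry(name.to_ascii_lowercase())
                .or_insert_with(|| tokens.map(str::to_owned).collect());
        }
    }
    directives
}

/// True when `default-src` is exactly `'none'`.
fn default_src_is_none(csp: &BTreeMap<String, Vec<String>>) -> bool {
    csp.get("default-src")
        .map_or(false, |v| v.len() == 1 && v[0].eq_ignore_ascii_case("'none'"))
}

/// True when a `sandbox` directive is present without `allow-scripts`.
fn sandbox_blocks_scripts(csp: &BTreeMap<String, Vec<String>>) -> bool {
    csp.get("sandbox").map_or(false, |flags| {
        !flags.iter().any(|f| f.eq_ignore_ascii_case("allow-scripts"))
    })
}
```

For the header quoted above, `parse_csp("default-src 'none'; sandbox")` yields two directives and both checks return true, which matches the reading that neither fetches nor scripts are allowed.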
I'm very hopeful for the possibility of using bluesky for blob data.
A friend and I had considered looking into storing DOOM WADs on bluesky so that "map packs" could be shared in the same way posts are. Follow an account, a list, or a starter pack, and you could theoretically modify GZDoom or some other client to know how to search and view any WADs posted by those accounts. Like how the Steam Workshop works, except it's via bluesky. :D
This is a cool idea, are you thinking of self-hosting this or on their servers? Have they mentioned anything about any guidelines for what their blob storage can be used for? I know doom is small and that's a great idea, a workshop is a perfect example of how this can be used. I'm just wondering if this can be abused to outsource large server space.
Look into Lexicon on https://atproto.com
You can define custom records for basically anything
A PDS would be a point in the network that decides if abuse is happening, also a place where competition can occur
So, basically using Bluesky as an RSS feed for arbitrary data? Kind of?
RSS is already an RSS feed for arbitrary data. :D
I joke but RSS does work for his use case just as well minus the distributed/federated points.
I wasn't around for this specific era, but the way users of BlueSky are able to dive deep into technological waters reminds me of how people talk about learning HTML for the first time while using MySpace. Social media is a more saturated market now than before, but I wonder if we'll see a new generation of programmers sprout from BlueSky.
MySpace and old forums walked you up a ladder of abstraction from, I'm adding some text into a box and it shows up on the webpage -> I'm adding some images as well -> I'm adding some BBCode/Markup and now things look really custom -> I'm writing HTML and CSS -> I'm writing complete scripts.
Bluesky does the first step and then it's a great big leap from there imo.
> 14-minute read
I see a dozen links I suspect are required reading to try out new ideas.
With HTTP/HTML you can show somebody who knows only the most basic Python, or any other PL, how to build a server from scratch in those 14 minutes.
I'm convinced we need hash-addressable communication protocols to redefine the relationship we have with the tech-giants, and stop them exploiting our communities.
I'm not convinced the ATProtocol has hit the mark well enough to kick start a revolution like HTTP did.
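As a rough illustration of the "server from scratch" point, this is about all it takes to answer a browser over plain HTTP. The comment mentions Python; the sketch is in Rust to match the snippet elsewhere in the thread, uses only the standard library, and the function names and the fixed page are made up.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};

/// Build a complete HTTP/1.1 response carrying `body` as HTML.
fn html_response(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(), // byte length, which is what Content-Length expects
        body
    )
}

/// Skip the request line and headers, then answer with a fixed page.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    {
        let mut reader = BufReader::new(&stream);
        let mut line = String::new();
        loop {
            line.clear();
            // Stop at end of input or at the blank line that ends the headers.
            if reader.read_line(&mut line)? == 0 || line == "\r\n" || line == "\n" {
                break;
            }
        }
    }
    stream.write_all(html_response("<h1>hello</h1>").as_bytes())
}

/// Serve connections one at a time on `addr`, e.g. "127.0.0.1:8080".
fn serve(addr: &str) -> std::io::Result<()> {
    for stream in TcpListener::bind(addr)?.incoming() {
        handle(stream?)?;
    }
    Ok(())
}
```

Calling `serve("127.0.0.1:8080")` from `main` and opening that address in a browser should show the page. It ignores the request path and handles one connection at a time, which is fine for a demo and nothing more.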
One of the points that is made is that since the PDS that's being interacted with here is part of a 'Personal Data Server' rather than the Bluesky product, it ends up able to offer infinite free data storage.
This seems like one of the things that might be part of the references the bluesky team has made at time to introducing a subscription service - providing more space / bandwidth / higher quality video on your PDS seems like the type of hosting that could be offered at a premium tier.
> 'Personal Data Server' rather than the Bluesky product
If I understood correctly, the PDS was hosted on Bluesky. I assume it could be hosted somewhere else, so yeah it could be interacted with more than Bluesky.
There should really be a name for this phenomenon; put basically anything on the internet, and sooner or later people will try to host arbitrary files on it.
>and sooner or later people will try to host arbitrary files on it.
I'm pretty sure that's a key reason that google accounts are limited to 15gb now until you pay for more storage. When it was unlimited there were all these opensource projects coming up with ideas to backup your filesystem to gmail and such which got even worse when Drive came about. These free services need to foresee that that will be an issue and put in some basic limits.
There's already "parasitic computation" so we could probably go for "parasitic data storage"
Johnson's Law: The more attention something receives, the bigger its area of impact becomes.
I thought that was the Streisand effect?
The Streisand is "the more you try to hide something, the bigger its area of impact becomes".
Streisand may also be "the more you try to hide something, the more attention it receives", and then by Johnson's Law, the bigger its area of impact becomes.
Is this comment intended to have a bawdy subtext or am I just reading too much into it?
I think you're reading into it just the right amount.
Inner Platform effect
https://en.wikipedia.org/wiki/Inner-platform_effect
If this sort of thing interests you, check out atfile: https://github.com/electricduck/atfile
The recent API changes in Strava reminded me of how limited our access is to the data stored on their platform. As a dominant player in the fitness space, they could gradually lock features behind a subscription wall.
While this might raise privacy or safety concerns, could the AT Protocol be a suitable platform for storing GPX or FIT files?
I’d love a federated Strava replacement. Unfortunately I don’t believe that the AT Protocol supports private or limited visibility posts yet, which I think is a pretty key feature for Strava’s use case.
Once atproto has first party support for private records I'm definitely expecting a massive increase in interest. It would open so many doors and is probably the main thing holding back many potential use cases as of now.
Pretty awesome! Convenience link to the fascinating github issue linked at the bottom, featuring Bluesky celebrity pfrazee: https://github.com/bluesky-social/atproto/issues/523
I have a lot of hope for AT. I'm sure there's lots of smart people on HN that have done great things with the Fediverse, but this whole paradigm just seems more sustainable + realistic. Basically it gives us centralization by default, but with real decentralized support when you need it / for power users.
As far as sustainability goes I'm hoping for a better business model than "accept funds from Blockchain Capital" [0], some return on investment in mirroring the firehose. I can muse, a Discord alternative where some users pay to host longer videos (current limit is 60sec [1]) or Patreon where a relay takes a cut in exchange for managing access/decryption keys, or Bandcamp or some other kind of social marketplace - as it is there's no reason I couldn't do this, it is an open platform after all.
[0] https://www.blockchaincapital.com/blog/bluesky-13m-users-and...
[1] https://bsky.social/about/blog/09-11-2024-video
I'm hoping that most of the infra costs get amortized by people bringing on their PDSes, while some of the core stuff (app layer, iOS app etc.) is maintained by a small team funded by donations/subscriptions.
Yeah I’m also worried about profitability, tho not particularly concerned about that particular investor, personally; all VCs are inherently amoral profit generators. They are a “benefit corporation” like anthropic, which gives them some leeway to deny shareholder requests in the name of public good. Which is nice!
In general I feel like social media is in the perfect spot for a huge shakeup as display ads breathe their last breath. Even if Google wins/draws out its Display Ads antitrust case and successfully implements some new interest-tagging system, I think anyone with a calculator and a newspaper subscription can read the leaves at this point; people are concerned about their data, and the money it generates is peanuts compared to more traditional advertising schemes. All of this is of course not even mentioning what I think intuitive algorithms will do (cynical or no, there’s lots of credentialed scientists saying that AGI (!!) is within reach in the coming decade, if not the coming few years).
All that to say: I feel like they can find a way to make it work. Revenue doesn’t need to be as high anyway if you a) don’t have 1000 devs optimizing Display Ad A/B tests all day, and b) have the support of the open source community.
If they can get ~100k subs to a $10/mo premium service similar to discord nitro, they are probably close to breaking even at the current scale and ops methodology. Which seems feasible.
Anyone else feel like this will be abused for phishing and/or malware distribution?
is there any hosting site that isn't? feels like a computing law at this point; if you build a hosting site, someone will try to use it for malicious purposes.
Can’t you just make the hosting site features only be for real purposes?
Like a link shortener which only forwards to a domain that matches the subdomain? Or only for watching videos and collecting metrics etc.
Any file upload can be used for unintended purposes, eg encoding files into static to upload to youtube and all other sorts of tomfoolery: https://github.com/boehs/awesome-cloud-storage-abuse
It will be. We had the same issue with Matrix attachments.
got fixed by https://github.com/matrix-org/matrix-spec-proposals/blob/mai... fwiw
I noticed^^
Tbh, I still haven't figured out how my IRC client is supposed to fetch avatars of bridged matrix users now.
Previously I was able to special case bridged matrix users and access their avatars through
I believe the bridges should host a proxy (per-bridge) to expose content: https://github.com/matrix-org/matrix-appservice-irc/pull/180...
But does that proxy actually expose avatars/profile pictures? From what I can tell they only proxy attachments.
avatars pictures /are/ just attachments tho?
The bridge only transforms images attached to events to new media proxy links.
If a bridged matrix user joins a channel, as IRC client I see the following information:
With the mxid I can call /_matrix/client/r0/profile/{name}/avatar_url and get the mxc url. In the past that was enough, I could just call /_matrix/media/r0/download/. With authenticated media, I would need to get a URL with a signed JWT from the bridge's media proxy such as
But what endpoint would I call to get that? From what I can tell there's no way to get the bridge to give me a user's avatar. I'd expect to have a special endpoint such as /snoonet/avatar/{mxid} that'd redirect me to the /snoonet/media/v1/media/download URL.
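For anyone following along, the pre-authenticated-media flow described above can be sketched as two URL builders. This only covers the legacy, unauthenticated paths named in the comment; the homeserver URL and the function names are hypothetical, and nothing here solves the bridge/JWT question being asked.

```python
from urllib.parse import quote, urlsplit


def profile_avatar_endpoint(homeserver: str, mxid: str) -> str:
    """URL of the profile endpoint that returns a user's avatar_url (an mxc:// URI)."""
    return f"{homeserver}/_matrix/client/r0/profile/{quote(mxid, safe='')}/avatar_url"


def legacy_download_url(homeserver: str, mxc_uri: str) -> str:
    """Map mxc://<server>/<media id> onto the old unauthenticated download path."""
    parts = urlsplit(mxc_uri)
    media_id = parts.path.lstrip("/")
    if parts.scheme != "mxc" or not parts.netloc or not media_id:
        raise ValueError(f"not a valid mxc URI: {mxc_uri!r}")
    return f"{homeserver}/_matrix/media/r0/download/{parts.netloc}/{media_id}"
```

With authenticated media the second step stops working for an unauthenticated client, which is exactly the gap the comment describes.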
It'll take about 5 mins for that to happen and then for *.bsky.network to start getting blocked by Google Safe Browsing, Palo Alto, Bluecoat etc.
I don't see how. This is a direct link to the author's bluesky server (PDS) so of course it is controlled by them.
The link in question (linked from the submitted link) is `porcini.us-east.host.bsky.network`. That's hosted by bsky, isn't it?
Lack of moderation combined with an official-sounding domain name.
This would have to get the user to follow a link or call a phone number or something though. These are plausible. It's too bad the content-security-policy can't prevent following links.
Bluesky seems to use a lot of totally different domain names for each part of their infrastructure, maybe for this reason. e.g. this one is bsky.network
While they're nowhere close on volume, they're certainly beating microsoft in terms of the rate they're adding similar looking official URLs.
> bsky.network
Shortening your brand to 4 letters when your chosen TLD is the same length as your full brand name is such a weird choice.
I guess bsky.net and bluesky.net were taken. What’s weird is why ICANN allowed .network TLD at all when .net already existed, was shorter, and meant for that.
I can't be the only person who visited bluesky.com, assuming that was the thing everyone was talking about.
This is why you and I aren't in charge of marketing I reckon.
I mean, the way AT Proto is designed, moderation primarily happens on the app layer, not the protocol layer. So on an app like Bluesky, you can have a lot of moderation. But the protocol itself allows hosting arbitrary content in a distributed/decentralized way.
Phish could be this:
$inane_marketing_trope
...
Click here to Unsubscribe from Bluesky
https://porcini.us-east.host.bsky.network/xrpc/com.atproto.s...
...
Redirects to bad site.
As long as content is authored by the administrator of the server, I don't see where there is a security issue.
It's like if you point to your own Apache server in your own domain where you host a scam page and say there's a security issue with Apache because you could do that.
Or are you saying that you can make this person's server serve third-party content?
> Or are you saying that you can make this person's server serve third-party content?
Http: yes see OP
Email: not sure. Hopefully not. But spoofing happens.
hehehe. I pinned it to the top research ideas. I'll get back to you on this
Could some awesome person possibly summarise any limitations or use cases where this might not work well?
The example provided is quite basic static text, so I'm wondering if there's a reason for that?
The CSP headers didn't use to be there, which I used to pop an alert(), way back. (at the time there was also a MIME whitelist, but that whitelist included image/svg+xml, which allows script execution)
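A rough sketch of the two defenses mentioned above: auditing a MIME allowlist for types that can carry script, and attaching restrictive headers to served blobs. The set of script-capable types, the header values, and the function names are my own assumptions for illustration, not Bluesky's actual configuration.

```python
# Types a browser may execute script from when opened as a top-level document.
# Illustrative, not exhaustive.
SCRIPT_CAPABLE = {"text/html", "application/xhtml+xml", "image/svg+xml"}


def script_capable(mime_type: str) -> bool:
    """True if the bare MIME type (parameters stripped) can carry script."""
    return mime_type.split(";", 1)[0].strip().lower() in SCRIPT_CAPABLE


def audit_allowlist(allowlist: list[str]) -> list[str]:
    """Return the allowlisted types that still permit script execution."""
    return sorted(m for m in allowlist if script_capable(m))


def hardened_headers(mime_type: str) -> dict[str, str]:
    """Response headers that keep a served blob from running script."""
    return {
        "Content-Type": mime_type,
        # Stop the browser from guessing a more dangerous type.
        "X-Content-Type-Options": "nosniff",
        # No subresources, and a sandbox without allow-scripts.
        "Content-Security-Policy": "default-src 'none'; sandbox",
    }
```

An allowlist that looks image-only can still let script through, which is the SVG hole described in the comment: `audit_allowlist(["image/png", "image/svg+xml"])` flags the SVG entry.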
Ah this is super cool! I’ve been thinking about doing this with my website, but was going to leverage the whtwind lexicon, since my site is mostly a blog. But for the front page, and anything else, I may have wanted something else.
This is more of an unstructured approach, which is cool because it needs less specialized tooling. It has the disadvantage of being… well, just a blob. No semantic information there.
I think the AT protocol is versatile in that users can access each other's data once authenticated without any centralized service (granted the aggregators and some other things may still be centralized).
Is there any auth necessary to pull data from a PDS? I know the main relay is a public firehose so I would be surprised, but maybe the PDS can put relay servers on an allowlist?
As far as I can tell, all content on ATProto is fully public without auth
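To illustrate what "public without auth" looks like in practice, here is a small sketch that builds plain GET URLs for two XRPC methods, `com.atproto.repo.listRecords` and `com.atproto.sync.getBlob`. The host is the one mentioned elsewhere in this thread; the DID and CID values are placeholders, and `xrpc_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode


def xrpc_url(pds_host: str, method: str, **params) -> str:
    """Build an XRPC query URL; no token or session is attached."""
    return f"https://{pds_host}/xrpc/{method}?{urlencode(params)}"


# Placeholder identifiers, purely for illustration.
HOST = "porcini.us-east.host.bsky.network"
DID = "did:plc:example"

list_posts = xrpc_url(
    HOST, "com.atproto.repo.listRecords",
    repo=DID, collection="app.bsky.feed.post", limit=5,
)
get_blob = xrpc_url(HOST, "com.atproto.sync.getBlob", did=DID, cid="bafkreiexample")
```

Fetching either URL with any HTTP client (for example `urllib.request.urlopen`) needs no credentials, assuming the repo and blob exist on that host.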
Does it federate or anything? Wonder what up to date summaries exist of its capabilities
If by federate you mean "is stored on content addressed, signed merkle trees that can be mirrored and served from more than one domain" then yes
Also it's uh, atproto.com
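For readers unfamiliar with the "content addressed merkle tree" idea, here is a toy sketch. It is a plain binary Merkle tree over SHA-256, not the Merkle Search Tree structure atproto actually uses, and it leaves out the signing step entirely; the names are made up.

```python
import hashlib


def merkle_root(records: list[bytes]) -> bytes:
    """Hash records pairwise up to a single 32-byte root."""
    if not records:
        return hashlib.sha256(b"").digest()
    # Prefix leaves and inner nodes differently so they cannot be confused.
    level = [hashlib.sha256(b"\x00" + r).digest() for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because the root depends on every record, a mirror on another domain can serve the same data and anyone can check it against a signed root without trusting the mirror.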
So the recent push to artists to move there to protect their rights against AI training is not only false but a trap since anyone can point their cannons to train data on Bluesky.
https://atproto.com/guides/glossary
How exactly is the personal data server used? Examples and such?
The link gives a nice high level explanation but I still am not sure of its purpose.
My first reaction was like -- wow, a site that runs on a reverb pedal.
this website has the toan
I'm wondering whether a third-party PDS implementation should support other protocols as well. Would a combined git/PDS repo make any sense at all? (That is, it's a PDS, but it also implements enough of git to do read-only access via git commands.)
What other protocols would make sense?
https://github.com/anacrolix/btlink
I guess pkdns is a newer, actively maintained version of the same thing? https://github.com/pubky/pkdns
“Hosting websites” has been possible on nostr for some time already with npub.pro …
What's the license for the Bluesky data btw? Is it something free to mirror and train LLMs on?
So the ToS explicitly says Bluesky does NOT own your data.
However, data on AT Proto is fully public and it’d be trivial for someone to extract the data for AI to train.
For example, this app shows you entries hosted on the protocol: https://atproto-browser.vercel.app/at/nytimes.com
Based on https://bsky.social/about/support/tos#user-content , I would answer yes. While it's not expressly called out (permitted or forbidden), my reading of the above would indicate that it's not forbidden per se, and probably permitted ("Modify or otherwise utilize User Content in any media. This includes reproducing, preparing derivative works, distributing, performing, and displaying your User Content."). I believe training an LLM falls under "utilize" and "preparing derivative works".
That's about your user content, not others'.
Whenever I hear about Bluesky I think about Jack Dorsey quitting their board and asking people to stay on Twitter/X.
https://amp.theguardian.com/technology/article/2024/may/07/j...
What do you think about it?
What I remember about that whole affair is that I'd really respected Jack for starting Bluesky, allowing it to be independent of Twitter (and Jay deserves a heaping of credit for pushing that!), and then losing that respect when he seemed to totally misunderstand what Bluesky had gone on to achieve.
https://www.techdirt.com/2024/05/13/bluesky-is-building-the-...
Jack was pushing Nostr at the time which... seems ok if you're into that. But his arguments in his interview with Mike Solana really didn't make sense to me.
Bluesky’s attitude seems logical and their reasoning aligns with my thoughts exactly.
If techdirt’s article is to be believed, Dorsey’s departure has to do with going from an extreme to an extreme—from a traditional social monolith to a pure protocol—whereas Bluesky chose to pursue not only the protocol, but also “the app” as the face of that protocol for the ordinary user, and let’s face it: the ordinary user does not really care about protocols.
My speculation about him suggesting people “stay on Twitter” is that Nostr (which he apparently is invested in now) and Twitter are orthogonal, so there is no conflict there, but Bluesky competes with both.
Not a Bluesky user (the invite-only period has put me off for a while), but if they do not compromise on the protocol part (and there are no shenanigans unfolding, who knows, maybe Dorsey found something) their attitude seems to me to be the most reasonable for a mainstream social platform.
What's your issue with invite-only periods? Is there a better way to throttle signups while you scale a system early on?
But if you use web scale tech you can scale to infinity on day one right? :eye-roll:
The invite-only system established the main Bluesky instance as a big circlejerk.
It worked with Orkut back in the day where the internet was new and untainted by culture wars.
Bluesky is lefty twitter now and I want no business with that platform.
> Bluesky is lefty twitter now and I want no business with that platform.
I love hearing people say this, because in reality Bluesky covers most of the political dimensions one wants to subdivide a population by except the most toxic of participants. Also, most of the academics have moved to Bluesky because Twitter became toxic / suppressed speech dramatically and at the whims of one Mr. Musk. As per usual, where the "lefties" are the "righties" follow (to use the parlance of the prior comment) be it social media, good policy, you name it.
Plenty of conservatives are there, such as Lincoln project folks, right libertarians, and even National Review & Reason IIRC. But I guess these folks don't count these days as conservative (despite definitionally being so, just not aligned with modern US Republican policy planks)? Not sure.
Anyhow, I'm enjoying Bluesky for what it is -- a new social media platform that isn't fully encumbered by bots and nonsense for a bit.
> Twitter became toxic / suppressed speech dramatically
But what kind of speech is suppressed nowadays on X? What about Bluesky? Does Bluesky not suppress any speech?
Sure. CSAM.
Meanwhile Twitter is now openly suppressing links off-site. For financial reasons rather than ideological ones (although the latter may also be occurring).
I mean, honestly, losing Dorsey was probably a big part of its success.
Right now it's the only page under site:bsky.network if you search for that. Hilarious and awesome! https://www.google.com/search?q=site%3Absky.network Daniel is a great hacker.
Just a (very unserious) reminder that you can host +7kb of data in a single tweet using data URIs + gzip.
Here's Pong (HTML + JS) and the Epic of Gilgamesh: https://x.com/rafalpast/status/1316836397903474688
(brought to you by the ad tracking pixel parameters ignoring the tweet length limit)
More links + the "Twitter CDN" editor™: https://sonnet.io/projects#:~:text=Laconic!%20(a%20Twitter%2...
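The data URI half of this trick is easy to sketch. The snippet below packs bytes into a base64 `data:` URI and unpacks them again, with a gzip size check alongside. How the linked tweet actually combines gzip with the URI is not stated here, so treat the pairing as an assumption; the function names are made up.

```python
import base64
import gzip


def to_data_uri(payload: bytes, mime: str = "text/html") -> str:
    """Pack bytes into a base64 data: URI."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Unpack a base64 data: URI into (mime type, bytes)."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data: URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(encoded)


# Repetitive text compresses well, which is what stretches a small budget.
text = b"He who saw the Deep, the foundation of the land. " * 100
packed = gzip.compress(text)
```

Base64 costs roughly a third in size, so compressing first is what makes a few kilobytes of URI go a long way.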
unrelated probably, but it made me realize how I don't really see Hugo/Jekyll type websites anymore.
How do you even know? Don't those both just generate static html?
Footer. also Jekyll/Hugo sites use generator so you can mostly find it in the meta generator tag.
Next.js sites are also a super easy find like this.
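The meta generator check mentioned above can be sketched with the standard library's `html.parser`. The class and function names are hypothetical, and the check only works on sites that left the tag in place.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; values keep their original case.
        if (attr.get("name") or "").lower() == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


def find_generators(html: str) -> list[str]:
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generators
```

Feeding it a page's HTML returns whatever the site advertises, such as a Hugo or Jekyll version string, or an empty list if the tag was removed.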
You can trivially remove it e.g. `disableHugoGeneratorInject = true` in `config.toml`.
It says "Powered by Hugo" at the bottom of the page.
Depending on the theme.
I build my own themes and don’t include that either
Same here
I build my own with Jinja2 templates, my custom Python script + the mistune library to parse markdown to HTML, and a YAML file in a similar format to Hugo (the generator I used to use)
I found building my own custom one with python3, much more freeing in all sorts of interesting ways, I also exposed the static site generator with a FastAPI based API to auto build my website from my notes, my cooking recipes, database records, financials, git commits, etc to build me a private protected website (via nginx auth) from anywhere, whether via sending a text message to my telegram bot, or running a Shortcuts command on my iPad, or just directly running a command from my terminal.
It took barely a day to setup, and allows me to run interesting custom extensions in all sorts of interesting ways, and builds me a personal website curated to my interest, where the primary viewer is supposed to be me. and it exposes a public barebones website with barely any content for everyone else.
One of these days I think i’ll expose more of it to the world.
I see plenty of blogs generated from Markdown with tools like that.
Has something overtaken Hugo and Jekyll in that space?
If you like JS/TS, then Astro.
I maintain a blog on Hugo but also host a couple of Astro ones. I think Hugo is great but to my eyes at least Astro has more active development behind it, and I also enjoy it more (probably because I know Typescript more than golang)
We use Github to Jekyll to host a few websites. Works awesome.
I just use mkdocs for everything.
Have you found a decent bare bones starter theme? I've been using MkDocs Material, and I find the theme too complicated (HTML etc) - hoping to find a super simple one that looks decent - plain - and is a good base for theming / styling. Thanks & take care.
I use the readthedocs theme: https://www.mkdocs.org/user-guide/choosing-your-theme/#readt...
Not sure if that fits the bill for you, but I like it.
https://bsky.app/profile/leocomerford.bsky.social/post/3l7v6... To help the hard of clicking, this time I have pasted it all for you:
Leo R. Comerford @leocomerford.bsky.social
Why was it decided not to build on any existing content-addressable networking system (IPFS or whatever)?
November 1, 2024 at 12:39 PM
Leo R. Comerford @leocomerford.bsky.social · 23d
(Not implying that this was the wrong decision, it’s a genuine question.)
dan @danabra.mov · 23d
actually not sure i can answer this well. paging @bnewbold.net or maybe @why.bsky.team (who worked on IPFS btw)
dan @danabra.mov · 23d
my guess is that we’d want data hosting to be under direct control of the user (same as web hosting) rather than peer-to-peer, want instant deletion/edits at the source, need ability to move to a different host or take content down, need grouping into collections. not sure how much IPFS could adapt
dan @danabra.mov · 23d
we do use some pieces from IPFS though (aside from the actual peer to peer mechanism)
bryan newbold @bnewbold.net · 4mo
you can basically ignore it, we don't use "IPFS" proper anywhere.
there are strong social connections, and we borrow some tech components like CIDs (flexible hash/digest syntax) and DAG-CBOR (more-deterministic subset of CBOR, good for signing+hashing)
Bumblefudge @bumblefudge.com · 1d
yeah this is all accurate. bluesky remixed a lot of IPFS components and patterns in interesting ways, but the monolithic global IPFS network (with chatty DHT distribution) wouldn't make sense here, BS made an infinitely more efficient/performant distribution of bytes tailored to its use case.
Bumblefudge @bumblefudge.com · 1d
FWIW the IPFS foundation is working on making IPFS more modular and easily remixed for future BlueSkies, but it's a big task decomposing the monolith and reorienting the documentation and ergonomics...
[a second reply to the first skeet:]
Uai @why.bsky.team · 23d
As far as im concerned (and i led ipfs development for a number of years) we are using ipfs, just a specific streamlined implementation of it. All your repo data can be imported into an ipfs node and addressed via cid
Uai @why.bsky.team · 23d
We dont use libp2p because for a consumer mobile app we didnt want to futz with nat traversal and connectivity and the like, but its definitely possible to build a p2p version of bluesky
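Since CIDs come up several times in the pasted thread, here is a sketch of how a CIDv1 string is assembled for a SHA-256 hash: a version byte, a content codec byte, a multihash (hash code, digest length, digest), then lowercase unpadded base32 with a `b` multibase prefix. The function name is made up, and single-byte codes are assumed, which holds for the `raw` (0x55) and `dag-cbor` (0x71) codecs used here.

```python
import base64
import hashlib

RAW = 0x55       # multicodec code for raw bytes
DAG_CBOR = 0x71  # multicodec code for DAG-CBOR
SHA2_256 = 0x12  # multihash code for SHA-256


def cid_v1(data: bytes, codec: int = RAW) -> str:
    """Build a base32 CIDv1 string for data hashed with SHA-256."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    cid_bytes = bytes([0x01, codec]) + multihash  # 0x01 = CID version 1
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for base32
```

The same bytes always produce the same identifier, which is why repo data can be addressed by CID regardless of which host serves it.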
"skeet" is such a terrible term for this. It's like mastodon "toot"s.
Using bodily functions as core infra terminology is off-putting and feels a bit like a juvenile boy's club. I get that some people find it funny, but it alienates people. We should just call these "posts".
Same thing with names like CockroachDB and GIMP.
The official Bluesky FAQ says this:
>What is a post on Bluesky called?
>The official term is “post.”
https://bsky.social/about/blog/5-19-2023-user-faq
Even better: call them tweets. That's what they are.
Sure, whatever: I had certainly given it approximately no thought in this case, and my personal investment in 'sk**t' is zero. I'd edit my post but I seem to have hit the timeout. I will also say that I don't think this is the most interesting or on-topic thread to pull on from my comment.
Hard agree -- this one is especially bad because it's gendered. We'll see what happens, but I'd put my money on "post" winning out. There's some people on Bluesky who feel absurdly strongly about this because of the history (the CEO asked them not to use it so they used it more often as a joke), but they're simply outnumbered already. Such is exponential growth...
Huh, I thought it was a reference to shooting: fling your hot take into the sky in front of an audience ready to blow it to smithereens.
Someone implementing a file hosting service on top of Bluesky would explain a steep growth in user accounts. ;)
This video has nothing to do with Bluesky, but I think it provides a more likely explanation. Don't let the title fool you, it covers more than bots.
https://www.youtube.com/watch?v=GZ5XN_mJE8Y