wake.st is one of the many independent Mastodon servers you can use to participate in the fediverse.
the personal instance of Liaizon Wakest

Discovered this morning that Maven heymaven.com (a social media startup whose CEO is ex-OpenAI "Ken Stanley: leading the Open-Endedness Team at OpenAI") is mass importing public posts from the Fediverse with no links back to the original and no way to delete them. It seems there is no opt-out or opt-in mechanism at all. It also has posts pulled in via @bsky.brid.gy that are also not linked back to the original.

Here's an example: app.heymaven.com/profile/66927

1.12 million fediverse posts scraped by AI startup Maven founded by ex OpenAI lead...

confirmation by Maven CTO Jimmy Secretan app.heymaven.com/discover/1190

UPDATE: Looks like it's a bit more complex (isn't it always)
So the CTO is here at @jsecretan and has clarified that they are in the process of implementing bidirectional , but in the meantime ingested the "federated timeline" of Mastodon.social
You can look at their AP response here: staging.maven.ly/mastodon/acto though it doesn't seem to be live on their main domain.

UPDATE 2: so it looks like @jsecretan is deleting the entire 1.12 million scraped posts off of Maven after this thread blew up. So cool I guess? But also sorta totally comes off as "whoopsies" we had no idea what would happen if you scrape millions of posts with no link back to the original. I hope to see an official postmortem on this incident from Maven.

UPDATE 3: CTO Jimmy (@jsecretan) says "We have paused everything related to our Fediverse ingestion for now and we are removing everything ingested. To be honest, the extreme negative reaction was a surprise to me, as I thought interaction between disparate systems was the entire point, but clearly we didn't navigate the culture correctly." - app.heymaven.com/discover/1190

And @deadsuperhero wrote an article mostly from this thread for @wedistribute.org now live at wedistribute.org/2024/06/maven

@liaizon@social.wake.st @jsecretan@mastodon.social @deadsuperhero@social.wedistribute.org @wedistribute.org@bsky.brid.gy It's amazing that someone can be like "let's scrape 1 million posts from this service with no consent or community engagement" and then be like "oh no, turns out these people are actually big meanies who don't want us in their supposedly open network".

I feel like we need a little intro video for anyone thinking about scraping the FediVerse for their techbro project showing all the other geniuses that had that idea and how it worked out for them.

@spots1000 @onepict @liaizon @jsecretan @deadsuperhero @wedistribute.org What makes you think the one you caught was the only one that did it?

@dogzilla @spots1000 @onepict @liaizon @jsecretan @deadsuperhero @wedistribute.org I'm curious if anyone checked the legal side of this. Before reading the overwhelmingly negative reactions in this🧵, I'd have assumed: public domain, non-issue.

To protect copyright, you think before you publish, and not after someone does something with the publication. Scraping is perfectly legal in itself, no?

I'd say a video is not nearly as useful as a class action, but I would be surprised if that flies.

@iwein @dogzilla @spots1000 @onepict @liaizon @jsecretan The thing is…yeah, you can do lots of things with public content under that assumption. But, if you don’t talk to anybody about it, and then try to make a big gesture about being part of a community…don’t expect any of the community members to like you.

@deadsuperhero @iwein @spots1000 @onepict @liaizon @jsecretan Who exactly would you talk to on a platform that’s decentralized by design? Who did Google talk to before creating web crawlers?

These are not rhetorical questions.

@dogzilla@masto.deluma.biz @deadsuperhero@social.wedistribute.org @iwein@mas.to @onepict@chaos.social @liaizon@social.wake.st @jsecretan@mastodon.social Well I think the obvious approach, as many others have said, would be to make a proper ActivityPub integration and share that integration and your company with the community.

By scraping everything you can find (and not giving credit) you're denying instances the right to engage with your integration on their terms. Even Meta has the decency to publish their endpoints and accept that some people just don't want to share their instance with them.

The FediVerse being decentralized doesn't mean there are no rules, it means that you need to engage with each and every instance you want to engage with individually and on their terms. If you choose not to do that you end up in the same position as this guy, where everyone only cares about blocking you as fast as possible.

In terms of the legal side of things, many of these instances are hosted under EU law and the data posted on them needs to remain in compliance with EU data protection laws. If a company chooses to scrape that data without credit and then refuses to remove it when requested, they can certainly be on the EU's bad side in a hurry.

@iwein @spots1000 @onepict @liaizon @jsecretan @deadsuperhero @wedistribute.org I suspect the people opposing this haven’t thought very far. I agree with you: this is perfectly legal.

I thought putting things out there so that others can freely reuse them was the core principle of open source, the web, *and* the Fediverse. If you don’t agree, just don’t publish it here, and stick to walled gardens

@liaizon @deadsuperhero What even is this? It seems like an LLM startup trying to be a social networking platform. How is that supposed to work? Like is it a "social" network for LLMs to "communicate" with each other?

@liaizon @jsecretan @deadsuperhero What an utter ass. "We have paused ... ingestion for now [for his pathetic AI]"

" I thought interaction between disparate systems was the entire point"

What part of "interaction" does this turkey not understand? Get a dictionary of synonyms. "Ripoff" is not one of them.

@liaizon @jsecretan @deadsuperhero @wedistribute.org

I'm pretty sure they feel very bad about the whole development*

* (getting caught)

@liaizon thanks for covering this topic 👍

@liaizon@social.wake.st @jsecretan@mastodon.social @deadsuperhero@social.wedistribute.org @wedistribute.org@bsky.brid.gy

"I thought interaction between disparate systems was the entire point"
Yeah, but that involves people talking to other people, not you turning what I said into a news article.

@zink @liaizon @deadsuperhero @jsecretan it's really partially playing the blame game. Interaction doesn't imply ingestion. IANAL, but here in Germany we have copyright by default, which would be violated by this.

@liaizon @jsecretan @deadsuperhero @wedistribute.org

Sorry, but... "To be honest, the extreme negative reaction was a surprise to me, as I thought interaction between disparate systems was the entire point, but clearly we didn't navigate the culture correctly."

YOU THINK?!?! 🤣

@liaizon @jsecretan @deadsuperhero @wedistribute.org Re: the DM thing yes the way Mastodon chose to implement DMs is garbage and worse than useless since it gives the appearance but not reality of some sort of privacy.