Reddit went public (became publicly traded, that is), and that means making more money to raise its market valuation.
Under the new management, all of its content will be sold to third parties, as in the deal with Google that allows Google to use Reddit’s content for training AIs. This comes as no surprise, since it’s highly likely that Google was already doing exactly that, while paying Reddit nothing.
Reddit’s content is valuable because it shows how real humans interact when asking and answering questions. It’s not only the type of questions that matters for AI training; it’s also the style humans use to informally explain things to one another, and the way a conversation flows naturally from that interaction. Reddit has plenty of prime examples of how this is done, an almost perfect fit for the kind of ‘naturality’ that Google expects to capture and train its AIs on.
But there are other sources for this kind of human-to-human communication. Such as Second Life.
At the end of 2023, Linden Lab announced that 2024 would see the “coming of AI” to Second Life, without going into much detail. The details were soon revealed: to reduce the churn rate of brand-new users, Linden Lab partnered with Convai to deploy help bots in the Welcome Areas, with mixed results, although LL has always explained that this is still very ‘experimental’.
But note that these bots are not only ‘helpful’ in the sense that they can answer sophisticated questions posed by newbies and walk them through the required steps. They also work in the other direction: capturing precious conversational data, which the Convai experts can then further process and refine to make their bots even more ‘fluent’.
And it doesn’t stop there.
It is well established that Second Life is perhaps one of the last platforms lacking any encryption of chat communications, especially direct IMs and group chat. It’s almost trivial to connect a bot, join a popular group, and start recording every transcript for further training, at the cost of intruding on residents’ privacy. You’d be hard-pressed to figure out whether a bot is listening in on your conversations or not.
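To give an idea of how little effort this takes, here is a minimal sketch of the logging side of such a bot, in Python. The connection and group-join machinery is deliberately glossed over with a placeholder callback (real bots typically sit on top of a protocol library such as LibreMetaverse); everything here is my own illustration, not anyone’s actual harvesting code:

```python
# Hypothetical sketch: the logging side of a transcript-harvesting bot.
# How the bot connects to the grid and joins a group is not shown; assume
# some bot framework invokes on_group_chat() for every chat line it sees.
import json
import time
from pathlib import Path

CORPUS = Path("group_chat_corpus.jsonl")


def on_group_chat(group: str, speaker: str, message: str) -> None:
    """Placeholder callback a bot framework would fire for each group chat line."""
    record = {
        "ts": time.time(),   # when the line was captured
        "group": group,      # which group chat it came from
        "speaker": speaker,  # avatar name (could be pseudonymised later)
        "text": message,     # the raw chat line itself
    }
    # One JSON object per line -- a very common format for LLM training corpora.
    with CORPUS.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A bot sitting quietly in a handful of busy groups could fill a file like that surprisingly quickly.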
The nice, plain, simple chatbot greeting you at your favourite shop, always eager to send you landmarks to the latest and greatest items on sale, could also be recording everything you say in public chat, adding it to the vast databases that AI makers require to train their models.
Of course, to be honest, it’s far easier to embed a chat repeater on your favourite pair of shoes… or inside your Bento mesh. Hmm. Anyway…
Ok, so, text chat is easy to grab. But what about voice chat?
Enter the new development under way at Linden Lab: voice-over-IP (VoIP) chat using WebRTC. It has been announced as a way for Linden Lab to drop its support for Vivox (the current VoIP provider for Second Life and many OpenSimulator grids) and instead move voice chat to standard Web protocols, which, allegedly, should allow Linden Lab to integrate more functionality and improve quality overall.
That’s all true and correct, but the attentive observer will have noticed that this also allows Linden Lab to receive and locally process all voice communications, which, according to the published specifications, do not appear to be heavily encrypted, if at all.
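To make that point concrete: WebRTC does encrypt media on the wire (SRTP), but only hop by hop, from each client to whatever server terminates the connection. A voice server that mixes or relays audio therefore holds the decoded streams in the clear and can do with them whatever it likes, recording included. Here is a minimal sketch of such a server-side tap, written with the open-source aiortc library and with the signalling exchange left as a placeholder; this is my own illustration, not Linden Lab’s code:

```python
# Hypothetical sketch of a server-side WebRTC endpoint that records whatever
# audio it receives. Signalling (how the offer/answer SDP travels between
# client and server) is assumed to happen elsewhere and is not shown.
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaRecorder


async def handle_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()
    recorder = MediaRecorder("captured_voice.wav")  # decoded, plaintext audio

    @pc.on("track")
    def on_track(track):
        # By the time a track reaches this callback, SRTP has already been
        # decrypted: the server sees the media in the clear.
        if track.kind == "audio":
            recorder.addTrack(track)

    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    await pc.setLocalDescription(await pc.createAnswer())
    await recorder.start()  # recorder.stop() should be awaited when the call ends (omitted)
    return pc.localDescription.sdp  # the answer, sent back via whatever signalling is used
```

Nothing in plain WebRTC prevents this (true end-to-end encryption of media is possible, but needs extra machinery on top); the only real safeguards are contractual and legal ones.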
Thus Linden Lab, besides supplying AI developers with terabytes of text chat, may soon complement that data with far more interesting voice data. And why is it so interesting? Because humans do not speak the way they write, especially, of course, in informal, casual conversation.
Most VoIP providers, however, guarantee their customers that voice calls are encrypted end-to-end, which makes those calls useless as a source of AI training data: there is nothing to intercept.
Linden Lab, however, might not encrypt such data end-to-end and would, as a consequence, be able to feed it all directly to Convai or to any other AI provider in need of large amounts of good-quality voice recordings for training purposes. An added bonus, in public chat, is that positioning information (the speaker’s location and the direction they are facing) is also captured and stored, since it is needed to produce spatial audio with great accuracy. But even direct calls and group calls may be captured and processed by AIs, reflecting the difference between an intimate one-to-one call, a closed group conference, and, well, talking in public.
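Purely as an illustration of what ‘voice plus position’ could look like as a training sample, here is a guessed-at record layout; none of these field names come from Linden Lab, they are only meant to show how much context travels alongside the audio itself:

```python
# Purely illustrative: a guess at what a stored 'spatialised voice' sample
# might contain. Field names are invented, not Linden Lab's actual format.
from dataclasses import dataclass


@dataclass
class VoiceSample:
    audio: bytes                           # a short chunk of decoded audio
    region: str                            # which region the speaker was in
    position: tuple[float, float, float]   # where the speaker stood
    heading: float                         # direction the avatar was facing, in radians
    channel: str                           # "public", "group" or "p2p": the conversational register
```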
Evidently, Linden Lab will not relay any personal information to third parties. Voice streams will be fully anonymised, without even a reference to the avatar they come from, and stripped of any IP addresses, as required to abide by US and EU law. For training purposes, such data is nevertheless incredibly valuable, meaning that Linden Lab might be able to get a good price for it. Perhaps not as much as Reddit gets for its text (the audience is considerably more limited), but nevertheless a nice income supplement to cover the costs of developing a whole new voice system from scratch.
The consequences?
Don’t be surprised if, the next time you interact with a conversational ‘bot in Second Life, it answers you in your own voice.