Calla – spatial video conferencing software based on Jitsi Meet (github.com/capnmidnight)
200 points by thekyle on Oct 7, 2020 | 85 comments


Does anyone remember Metaplace,[1] circa 2009? It was an isometric 2.5D-ish "virtual world" that, if it shipped today, would probably be pitched as a sort of Roblox competitor for building games and interactive installations within a virtual space. (At the time it compared itself to Second Life, which either baffled people who didn't know what it was or turned off people who did.)

I used it briefly in 2009 and one of the things we tried to do with it was exactly this — spatial voicechat in a virtual environment with contrived "physical" modeling of spaces to adjust how sound travels. (This also reminds me of TeamSpeak's spatial integrations from around the same era.)

I know it's the pandemic that's inspired a ton of these projects, but they've been around in some form since live voice-over-IP came about and haven't really taken off in non-game applications.

What keeps blocking them from mainstreaming? I suspect maybe it's adding contrived physical interactions to the already high relative overhead of meeting virtually, but I'm not sure if that's just a problem for me. I could see myself enjoying something like this but not many others on my team seeing enough value to take the efficiency hit of having either zero or multiple open voice channels that do away with the abstraction.

[1] https://www.engadget.com/2008-10-22-massively-interviews-rap...


I never tried Metaplace but it reminded me of The Palace, which I did spend many hours of my youth on. It would be interesting to see if these 2D chats have workplace benefits over the current Slack/IRC experiences.

https://en.wikipedia.org/wiki/The_Palace_(computer_program)


Yeah, I remember that one as well. Real fun. It's also why I never bothered with Second Life ("been there, done that"). I just checked and saw someone even made a Linux client for The Palace. Neat!


Notable game designer Raph Koster was involved in Metaplace and his blog included a lot of making of/behind the scenes articles and discussions from around that time, including some post-mortem thoughts.

> What keeps blocking them from mainstreaming?

Most likely it is the business model. Getting the spatial model correct is a lot of work, and making it generic enough to be reusable is even harder. (Both Second Life and Metaplace flirted with the idea of being platforms that other services could build on.) It's easier to get some financing for it as a "game" (it's also useful to note a bunch of Metaplace ideas came out of experiments in the failed MMO Star Wars Galaxies), but it's harder to sell it as a general chat application (much less a platform) or especially as an "Enterprise tool" if it looks too much like a toy or a game. On the flip side, it's harder to build it as an Enterprise tool first, because the impulse to keep things sparse/spartan/"professional" also leaves an air of "lifelessness": not enough things to do outside of or between meetings, or during downtime inside of meetings, and not enough of a feeling that it is a space to "inhabit" rather than visit briefly because a meeting required it.

These business model bootstrap hurdles are a criticism I often bring to a lot of the conversations about why the Cyberpunk "ideal" of a single shared VR space (including the one most recently popularized by Ready Player One) seems so extremely unlikely in the real world.

(Aside: Ready Player One especially suffers from a lot of basic game economy mistakes, and its game likely shouldn't have survived for very long in its world, much less have taken over nearly everything, including education and enterprise, as much as it did. My reading of the book was that the game economy directly caused a lot of the collapse and dystopian economy outside of the game, which made the book better than the author seemed to have intended: further reading and other evidence suggest the author was not aware of how terrible the economics of the game were; they were just fun ideas thrown out for verisimilitude.)


It had not rung a bell until I clicked your link, and as soon as I saw the picture I remembered immediately - thanks for the reminisce.

I was thinking of the old 2D / move-around-3D kind of space chat from waaay back - worldsChat - where you had an avatar and moved around a space station chatting. I always wondered what held that back from being more popular and used by more people.

Which brings me to the time of having all sorts of friends and family pinging on yahoo messenger for a while - then that was killed off.

Perhaps if worldsChat had this spatial talking kind of option it would have gotten more traction. I'm sure moderating and other factors also apply.


Can't wait to try this with some friends on the weekend, thanks!

I think it's beautiful how it's "just" a wrapper on top of Jitsi Meet as it demonstrates how Free Software can (also) be used to experiment with and adapt user interfaces to specific and/or special needs instead of locking you into ever-more-bloated proprietary apps <3


This is cool and actually something I have been missing during, e.g., academic conferences. The "hallway track" is often the most interesting one! Being able to wander around and overhear various conversations until you find one that interests you or you can add to is great.

A Zoom breakout room is nothing like that experience. This, or something like this, could very well be what is needed.


Funnily enough, "calla" means "shut up" in Spanish.

Can't wait to try this with my co-workers. One of the big barriers for remote casual conversation is the spatiality, since all video conference systems are focused on a single speaker.


Yeah, someone told me that a few weeks ago.

Calla is a type of lily. I name all of my projects after plants. Originally, I named this project "Lozya", which is the Bulgarian word for "vines", because Jitsi is the Bulgarian word for "wires". But nobody could pronounce it correctly, so I renamed it.

Incidentally, I work for a foreign language instruction company. I brought it up with one of our teachers and she didn't think it was an issue. She said that, in context, it wouldn't be read as "shut up", that it's kind of like "lead" (to guide something) and "lead" (a metal) in English.


"Calla" is probably not an issue, but it's impossible for me to read it otherwise as a Spanish speaker. Not offensive or anything, just funny that a conference app tells me to shut up!


speak now or calla forever


Love seeing all the activity in this space. For https://js.la we've been using https://rambly.app and it's been amazing for transitioning our meetup to online.


Would love to use this for post-conference hangouts. How many users have you had logged in at one time before issues kick in?

Edit: And is it possible to self host?


~50 users, beyond that and you'd want to use separate instances.

No self-hosting planned, but private rooms with moderation controls are on the roadmap.


Very exciting, thanks!


We've seen a few spatial chat apps pop up in the past few months. It's part of the reason we set out to build https://align.link/, an extensible video chat platform.

Products like these (and others like https://www.macro.io/) could exist as meeting apps, allowing people to tailor rooms to specific scenarios like one-on-ones, retrospectives, happy hours, etc.


Seems similar to https://gather.town but self-hostable.


I've used Gather for a couple virtual conferences this year and I was surprised how much I liked it. It's obviously still no substitute for in-person gatherings, but it does recreate some of the atmosphere of being in a room full of people while still being able to have meaningful small group conversations.


I recently open-sourced another Jitsi-based platform that also implements a custom interface.

This is more oriented toward professional meetings, but is part of the trend of doing more than just plain video conferencing. Code: https://github.com/retrolution/open-fishbowl and a live version at https://retrolution.github.io/open-fishbowl/?tableViewEnable...


Kind of crazy to see this on the front page. This is my project.

I started this project as an experiment, half to see if it would work for my Saturday morning tech meetup, half to see if I could make spatialized audio conferencing work in the browser at all, as at the time I was considering switching my VR app at work from Unity to WebXR (which, ultimately, I did).

The repository has been a little neglected in the last few weeks, but that's because I've been working on redesigning a few parts to make it work better in both 3D and 2D and haven't settled enough yet to commit the work.


Jitsi team member here. This looks awesome, kudos and thanks for sharing!


I know the webcam isn't working for some folks. I kinda don't care. I've got a lot more important things to focus on.

I think the video stream in teleconferencing has absolutely zero redeeming qualities. People think it's there to be able to convey facial expressions and facilitate non-verbal communication, but I think it's a complete failure at that task.

For one thing, few listeners are actually looking at the speaker as the speaker is talking. They're most likely looking at themselves or some other thing going on with their computer, so their facial reactions are not really based on what the speaker is saying.

On the flip side, the speaker never gets to see people looking at them. Almost nobody looks at their camera instead of something on the screen, so the speaker never gets that "eye contact" feeling. Best-case scenario, you get a group of people trained to move the speaker video feed directly under the camera lens and they are diligent about making sure they are looking at the speaker. Even then, there is still a "20 yard stare" look to everyone. It also causes exhaustion as it puts you into a feeling that you're in an interrogation of some kind.

Additionally, it's such a narrow field of view for the camera. Non-verbal communication is more than just facial expressions, it's also body posture and standing distance. There are facial tics that are also lost in the low quality of the webcam feed, and the non-uniformity of every user's personal lighting settings creates an unnatural scenario where every person is lit differently than you'd expect, or from each other.

And finally, while teleconferencing has a lot of trouble with latency between when a person speaks and when the other people hear them, there is also a lot of latency between when you hear a person speak and when you see them. The audio and video feeds are not synced correctly.

By completely eliminating the video feed, conversations actually work a lot better. I get so many people who demand to have that video feed for the reasons they've been indoctrinated on, with little to no effort to even try audio-only conferencing.

And frankly, as a listener, I don't want the speaker to see my reflex reactions. I don't want them to see into my room. I really only want them to see what I choose to let them see.

Thus, the avatars and the emoji reactions.


I've been wanting this kind of thing, but for Minecraft (Education Edition in my case; I think there is/was a mod for Java edition to do this, but MinecraftEdu is not based on Java edition). I can easily get a list of users and their in-game coordinates, so I need a spatial conference system that has an API rather than its own game. Any suggestions?



Thanks; will have a look. Most of my students are using Chromebooks so will have to investigate if the Android client has support for this.

Would be nicer to just be able to point them to a webpage, but gotta take what you can get :)


After a long search, we found two workable 'office neutral' solutions: Virbela and its successor FrameVR. I'll be online on FrameVR in my own personal frame at: https://framevr.io/testerdetest Please use headphones, because the echo cancellation is a bit iffy.


I like https://rambly.app/ which https://js.la/ has been using after its monthly meetups.


Very cool idea!

I tried gather.town the other day for something similar but couldn't get it to work on Linux; it couldn't detect my mic and camera correctly.

I am able to hear some background noise on the Calla room as I approach some others but I do not hear any talking and I do not know if others can hear me. I also can't get my webcam working, just a black screen and my webcam doesn't activate. It activated the first time I joined but now seems to have chosen a different camera.

I think camera/mic selection options need to be added for systems with multiple devices.

Really hope this gets refined!


There is a camera/mic selection option. Lower-left corner, next to the mute-audio/video buttons, there is a button labeled "change".


I wonder if your map of the room has to match the other people's map of the room?

sort of like ... can you have a map that blocks someone without letting them know you just distanced yourself? :)


This is cool! I've been working on something similar: https://cyberparty.io/


Please couple this with VR + avatars + realistic settings and the future will finally be here. Business meetings will never again require flying


That's actually what I'm doing for my day job and part of why I originally built this. I built the 2D map for Calla because I was still in the middle of converting all my VR work from Unity to WebXR. But pretty soon, VR will be possible.

Technically, Calla is just the library for driving Jitsi and adding spatialization. You can do whatever graphics you want with it. The graphical elements and the interactions are all up to you to build, separately from the teleconferencing.
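
To make "adding spatialization" a bit more concrete, here is a minimal sketch of distance-based volume, assuming 2D avatar positions and the "inverse" distance formula that the Web Audio API defines for its panner node. Nothing here is taken from Calla's source; the function names and the default `ref_distance` and `rolloff` values are made up for illustration.

```python
import math


def inverse_distance_gain(distance, ref_distance=1.0, rolloff=1.0):
    """Gain in (0, 1] using the Web Audio 'inverse' distance formula.

    Sources closer than ref_distance play at full volume.
    """
    d = max(distance, ref_distance)
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


def gains_for_listener(listener, speakers, ref_distance=1.0, rolloff=1.0):
    """Map each speaker name to the gain the listener should apply.

    listener is an (x, y) tuple; speakers maps name -> (x, y).
    """
    lx, ly = listener
    return {
        name: inverse_distance_gain(
            math.hypot(x - lx, y - ly), ref_distance, rolloff
        )
        for name, (x, y) in speakers.items()
    }


if __name__ == "__main__":
    # Hypothetical room: the listener stands at the origin.
    speakers = {"alice": (1.0, 0.0), "bob": (3.0, 4.0)}
    for name, gain in gains_for_listener((0.0, 0.0), speakers).items():
        print(f"{name}: {gain:.2f}")  # alice: 1.00, bob: 0.20
```

The graphics layer only has to supply positions; whatever draws the room (2D map, VR scene, draggable icons) can feed the same function.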


Very interesting! I've had this idea in my mind for ages but never saw how it could catch on with enterprise-y type customers, but I suppose a pandemic can really be a motivating factor



Those are really cool, but not exactly what I was imagining. I was thinking more of a temporary virtual space like a zoom meeting / boardroom type setting. Gotta be packaged like an app / solution to take off


Have we not been living through the same pandemic? Nobody in the US has a reliable internet connection, services are controlled by third parties and randomly black out, infrastructure is overloaded, meetings are insecure, people step away from their keyboards and are unreachable, there are screaming kids in the background, etc.? I would happily board a plane for work travel right now, this is like living in a parody of my previous job.


I suppose my experience has been different! Everyone at work is expected to respond to any e-mail in 15 minutes at most (1 hour on weekends), I have no kids and honestly haven't seen a single one in zoom calls (I hear this one person's dogs barking a lot if they're in the meeting, though), calls are airtight and password protected, and most of my contacts have pretty good internet


Wow, do you get paid for being on call during the weekends?


I get paid enough to not complain about it... part of the job description, I suppose


I’m sorry to say so, but I find these kinds of conditions discriminatory toward those who have a family.


Lots of us have families. It's a trade-off, not discrimination. There are other jobs out there that don't require the same degree of availability. We get paid like doctors, so we're on call like doctors


Another similar project (16-bit map, Jitsi):

https://workadventu.re/


This is so cool... thank you very much :) I will use it during my next workshop sessions to help my students in case they run into problems following the instructions of the lab: looks like it is far more natural than the "raise hand" feature. And they just need to group themselves to enjoy the equivalent of a private room for group exercises!


shameless plug: we saw a similar problem, but our solution is spatial video calls in 3D instead of 2D: https://laptopsinspace.com it's also built on top of the jitsi meet api but uses an extra backend for managing the game data.


I had originally thought of 3D. This isn't my first time making a spatialized audio chat system. I built one for WebVR some 4 or 5 years ago. And I'm actually building one at work right now (with Calla as the underpinning). But for when I started this project (at the beginning of the pandemic) and for what purpose I built it (testing the current temperature of the waters in WebRTC land, as I hadn't done a lot with it since that last app, plus trying to support a weekend tech meetup that was finding Zoom to be a bit annoying, plus seeing if FOSS WebRTC libraries were at a level that I could use for my day job), I wanted to make the graphics portion of the app as simple as possible, support as many people as possible, and not get in the way.

One thing I've noticed with a lot of the "competitor"[0] apps out there is that they're more focused on the graphics than the audio. They're more focused on making a 90's-era RPG than on audio conferencing. I'm the opposite. I'm more focused on conferencing than the game. I get a lot of requests in the github repo for functionality in the game. The "game" is beside the point. It's just an exercise of the audio conferencing.

[0] I don't really see myself as in competition with gather.town or High Fidelity or any of the other apps because I'm not building Calla to be a startup. I love my day job and I won't be leaving it. Calla is just a component of a much larger thing I'm building.


Cool concept. I think my non-gamer colleagues would be put off by the game-like interface though. Maybe just drag-and-droppable icons would work just as well.

That server cost seems very high, the same offering (2 vcpu, 4GB RAM) is €5.58/month at Hetzner for example.


You can build whatever interaction system you want on top of the thing. Calla is a wrapper around Jitsi to add the spatialization and communicate user state between users.

https://github.com/capnmidnight/Calla/blob/master/Calla/doc/...

As for the server costs, I'm not going to use anything other than Azure. Most of the cost is bandwidth, but it's worth more to me to keep everything in one place and not have to worry about learning another PaaS system. From the other offerings I've seen, Azure is usually only more expensive at the lowest of tiers. Once you start scaling up, the other vendors get just as expensive. So I'm fine with shelling out a few more bucks here and there to avoid having to use something else.


He'd better hope no one starts using it. The biggest cost factor is the bandwidth. 30 people can easily generate a steady 100 Mbps of usage. A conference for 15 people is about $3/h on any cloud provider
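
For anyone who wants to check the order of magnitude, here is a rough sketch that converts a sustained bitrate into data transferred per hour and an hourly egress bill. The 100 Mbps figure is the one quoted above; the price per GB is purely an assumed placeholder, not any provider's actual rate, so plug in your own.

```python
def gigabytes_per_hour(mbps):
    """Convert a sustained rate in megabits/s to gigabytes moved in one hour."""
    megabits = mbps * 3600      # megabits sent in an hour
    megabytes = megabits / 8    # 8 bits per byte
    return megabytes / 1000     # decimal gigabytes


def hourly_egress_cost(mbps, usd_per_gb):
    """Hourly bandwidth bill for a sustained rate at a flat per-GB price."""
    return gigabytes_per_hour(mbps) * usd_per_gb


if __name__ == "__main__":
    ASSUMED_USD_PER_GB = 0.08  # hypothetical price, for illustration only
    rate_mbps = 100            # the "30 people" figure from the comment
    print(f"{gigabytes_per_hour(rate_mbps):.1f} GB per hour")
    print(f"${hourly_egress_cost(rate_mbps, ASSUMED_USD_PER_GB):.2f} per hour")
```

At a steady 100 Mbps that is 45 GB every hour, so the bill is dominated by whatever the provider charges per GB of egress rather than by the VM itself.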


I know the Jitsi project rejected using p2p webrtc for more than 2 participants a while back due to exploding upload bandwidth requirements at the clients, but I wonder if this video-game style interface would work OK with a single low bandwidth video stream of each participant (instead of switchable low/med/hi res ones), and maybe support 5 or 6 users over P2P instead of needing that server-side bandwidth?


Well, I built exactly that thing at cyberparty.io!

The video is 320x240 and one stream is about 600 kbit/s and audio is another 30 kbit/s.

So yeah, that works fine for a few users that have maybe 5 mbit/s upstream.
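
A back-of-the-envelope sketch of why that holds: in a full mesh every client uploads one copy of its stream to each other participant, so upstream grows linearly with the size of the call. The per-stream numbers are the ones quoted above; the function names are made up, and the calculation ignores protocol overhead and any headroom you would want in practice.

```python
def mesh_upload_kbps(participants, video_kbps=600, audio_kbps=30):
    """Upstream each client needs in a full mesh: one copy per other peer."""
    return max(participants - 1, 0) * (video_kbps + audio_kbps)


def max_mesh_participants(upstream_kbps, video_kbps=600, audio_kbps=30):
    """Largest call size whose per-client upload still fits in upstream_kbps."""
    return upstream_kbps // (video_kbps + audio_kbps) + 1


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(f"{n} participants -> {mesh_upload_kbps(n)} kbit/s up per client")
    print("fits in 5 Mbit/s:", max_mesh_participants(5000), "participants")
```

With 630 kbit/s per outgoing stream, a 5 Mbit/s uplink tops out at 8 participants on paper, and the 5 or 6 suggested upthread leaves some margin.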


True, video chat is very expensive especially with the extreme bandwidth costs of the big cloud providers.

I built something similar as OP and chose a mesh solution for this reason even though it's inferior.

It appears that only big video chat companies like Zoom or Skype can afford to have a generous free tier, subsidized by their business offerings.


They build their own DCs, and use cloud as a peak-smoother.

No one doing serious amounts of bandwidth pays outrageous big cloud prices. See: Netflix.


My home internet is 200Mbps symmetric, and this is not uncommon in my area. I could upgrade to 1Gbps, for about $20/mo more. Bandwidth in the cloud is highway robbery. Bandwidth to home is flat rate.


It is until you read the terms and conditions and see that you have a 1TB data cap, or a more generic: we'll cut your access if you have "excessive" data usage.


I don't have a data cap though. In fact I specifically picked a provider without a data cap (out of the 2 available at my location). My understanding is data caps are more popular in e.g. Canada and Australia and such, and are hardly a thing anywhere else. At least not to the point where they'd be enforced.


Once everyone using your provider starts filling up their connection, it won't be long until they implement one.

What you got is a 200Mbps connection to your provider (and still that's probably a lie, it's probably shared before reaching their endpoint), afterward it's fully shared with every other customer... that's just how the internet is made, you can't have a dedicated 1 gbps to every single server, that just doesn't make sense.

Thing is, the higher the requirements, the more expensive it is to support; that's simple math. If you've got 10,000 clients that each download at 1 Gbps, you need 10 Tbps. It's even worse if they are all on the same service; that connection won't support this, believe me.


ISPs in the USA have been sneakily adding them for years. It's almost standard now.


My contract contains a few terabytes of data transfer. I sincerely doubt you’d get through that any time soon.


So... not a UI inspired by Seele's VC system? (https://wiki.evageeks.org/Seele)


1996 is many years too late, maybe 118 years[1], maybe 14 years[2], but late anyway. Hard to say [3], but to answer your question: "no, probably not"

1: The Telephonoscope was already imagined in 1878: https://upload.wikimedia.org/wikipedia/commons/8/8e/Telephon...

2: William Gibson is often credited with the idea of "cyberspace" as in a virtual environment where people choose their own avatar and move around interacting with each other, with systems or data. https://en.wikipedia.org/wiki/Burning_Chrome#Reception

3: Gibson might have played MUD in the 70s: https://en.wikipedia.org/wiki/MUD1


I have no idea what you're talking about


Is this a statement disguised as a question?


Aren't they all?


No, probably not.


I've used a couple of these kinds of apps now. From my perspective, they are kind of worse versions of Zoom because everyone clumps together into one group anyway.


That's mostly not been my experience, so I guess it depends on the group of people you're with. But when it does happen, I still think it's better than Zoom because you still get the directionality of voices and you don't have the useless video stream distracting the view.


I agree that video conferencing as it is is not as nice as being at a party or meetup or just at the office in terms of your proximity to others and the ways it affords talking to those close around you BUT! ........

I can only imagine that the charismatic or attractive person is going to get bothered/hit on/harassed 10x more in a virtual space where there's effectively unlimited space to be "near" to them and zero effort to do so.


A question a bit apart from the key features of the product: how does a self-hosted open-source product usually make money?


It doesn't. I'm not looking to make a living off of Calla. I have a day job that I really love. Calla is a component of it.


Often they don't, or they offer support like WordPress, Drupal, etc.



Really cool!

Slightly off topic, is there any video conferencing software that doesn't use 100% of the CPU?


Yeah, especially on the browser it’s like setting off a bomb.


This is your browser vendor's fault for blacklisting your GPU to prevent hardware rendering.


This is awesome! Hopefully when VR is more widespread they can add that in as well :)


This reminds me of VRChat.


This is one of those genius ideas that are "obvious in retrospect" and you think "how come nobody thought of this before". I bet spatialization of audio could massively improve intelligibility of video conferences with more than 2 participants. In retrospect it seems nuts that we're not really using stereo imaging. All the tech for spatial audio has been available for about three decades now, and much of it is out of patent already. Newer stuff even takes the position of your head into account for added realism.

I hope Google/Microsoft/Zoom pay attention to this as well. And also Logitech, Bose, etc, so that we see headphones with accelerometers and gyros built in.
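
As a rough illustration of the stereo imaging idea, here is a sketch of equal-power panning, a standard way to place a mono voice between the left and right channels without changing its perceived loudness. It assumes a 2D layout with the listener facing the +y direction and uses made-up function names; real spatializers with head tracking or HRTFs do considerably more than this.

```python
import math


def equal_power_pan(pan):
    """Left/right gains for pan in [-1 (hard left), +1 (hard right)].

    The gains satisfy left**2 + right**2 == 1, so loudness stays constant.
    """
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_from_positions(listener, source):
    """Pan value from 2D positions, assuming the listener faces +y.

    Uses the sideways share of the offset: dead ahead gives 0.0,
    directly to the right gives 1.0.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0  # source on top of the listener: keep it centered
    return dx / dist


if __name__ == "__main__":
    me = (0.0, 0.0)
    for name, pos in {"ahead": (0.0, 2.0), "right": (2.0, 0.0)}.items():
        left, right = equal_power_pan(pan_from_positions(me, pos))
        print(f"{name}: L={left:.2f} R={right:.2f}")
```

Giving each participant a distinct pan value is enough to separate overlapping voices in headphones, even before any distance attenuation is applied.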


This is one of those ideas that just found its time. I was by no means the first to make a spatialized audio chat system. I might have been very close to the first to release something after the start of the pandemic, but it's also not even the first time I've built spatialized audio chat into a system.

My first experience with spatialized audio in a chat system was AltspaceVR, when I first got into working in the VR space, when they interviewed me for a job (sadly, they didn't understand their own product enough and insisted that I'd have to move coasts to work with them). That was easily 6 years ago.

A couple of years later, I built my own VR-based spatialized chat system. It was way too early. WebVR was way too much friction for users[0].

Incidentally, Calla has all the necessary parts to be usable in a VR system. And I'm using it for that in my day job (VR environments for roleplay scenarios in learning foreign languages).

I don't remember why exactly I open sourced this project. It just seemed like the thing to do. I guess I hoped that other developers would find it useful and contribute bug fixes. That largely hasn't happened, so I'm mostly just focused on building the day job project for now. Once I get the next milestone done, I'll consider working on more features for Calla, but for now I need to pay the bills.

[0] WebXR today still kind of is, but it's a lot better than it used to be and I think some clever design in UX with something akin to 2-factor auth can help get around it.


The concept is great, but I don't really find it worth pursuing because it's based on Jitsi.

I really did want to love Jitsi, but I've had frequent bad experiences. Every meeting I've attempted has had at least one person whose audio and video were bad to the extent that the rest of us couldn't really communicate with them.

I really do want to use it, but it always ends up with "Screw this, let's use Zoom"


And I have only had good experiences: once we enforced _not_ using Firefox, there were no problems.

'Use zoom and get screwed.'


I believe Firefox issues in Jitsi were a problem before but are sorted out now.

https://github.com/jitsi/jitsi-meet/issues/4758#issuecomment...


I believe Firefox's market share is now less than 4%, so I'm not really keen on wasting time supporting it.


Ah, so it's not a web app, it's a Chrome app.


Giving up on Firefox's bugs is not the same thing as not supporting Web standards. The problem is that Firefox is the one that doesn't correctly implement Web standards.

It's not my fault that Mozilla is a shitty company that makes a shitty browser and that every other company that used to make a browser is now skinning Chromium. I just live in this world, I didn't make it.

You want better Firefox support? Put up or shut up. It's open source.



