Ejabberd has supported STUN/TURN for quite some time now; in conjunction with client support, this can be used to implement one-on-one audio and video calls.
Since version 2.8.0, the Conversations Android client has offered audio and video calls by leveraging STUN/TURN and XEP-0215 (External Service Discovery).
The rest of the XMPP world is following the route they opened, so I expect iOS and desktop XMPP clients to finally implement these features too in the coming months.
Enabling audio and video calls in Ejabberd is actually pretty simple.
Provided you have installed the latest release (version 20.04), edit ejabberd.yml:

listen:
  ...
  -
    port: 3478
    transport: udp
    module: ejabberd_stun
    use_turn: true
    turn_ip: <stun_turn_server_ip_address>
    auth_type: user
  ...

modules:
  ...
  mod_stun_disco: {}
  ...

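The other step is to open the required port, UDP 3478, on the firewall.
If calls are relayed through TURN, the UDP relay port range (49152-65535 by default) has to be reachable as well.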
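
Once the port is open, a quick reachability check can save some guesswork. What follows is only a minimal sketch, assuming the server answers plain STUN Binding Requests without authentication; the function names are made up for this example and are not part of ejabberd. It sends an attribute-less Binding Request (RFC 5389 header layout) over UDP and checks that a matching success response comes back.

```python
import os
import socket
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, body length (0 here), magic cookie, then the 12-byte ID.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def is_binding_success(packet, transaction_id):
    """True if packet is a Binding Success Response for our transaction."""
    if len(packet) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", packet[:8])
    return (
        msg_type == BINDING_SUCCESS
        and cookie == STUN_MAGIC_COOKIE
        and packet[8:20] == transaction_id
    )


def probe_stun(host, port=3478, timeout=3.0):
    """Send one Binding Request over UDP; False on timeout or network error."""
    request, transaction_id = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(request, (host, port))
            reply, _addr = sock.recvfrom(2048)
        except OSError:  # covers timeouts and name resolution failures
            return False
    return is_binding_success(reply, transaction_id)
```

Calling probe_stun("stun.example.com") from a machine outside the firewall should return True when the port is reachable. This only exercises the STUN side; it does not test TURN allocations, which need credentials.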

.:. DNS records

For the service to actually work, clients must be able to discover that a STUN/TURN server exists for the domain.
This is done by adding the proper DNS records:

| TYPE     | HOST   | IP           |
|----------|:------:|:------------:|
| A RECORD | stun   | <server_ip>  |
| A RECORD | turn   | <server_ip>  |

Two SRV records are required as well:

| _service._proto.name | class | type | TTL | priority | weight | port | target            |
|:--------------------:|:-----:|:----:|:---:|:--------:|:------:|:----:|:-----------------:|
| _stun._udp           | IN    | SRV  | 30  | 0        | 0      | 3478 | stun.example.com. |
| _turn._udp           | IN    | SRV  | 30  | 0        | 0      | 3478 | turn.example.com. |

Clients tend to use very little bandwidth; should more than one TURN server be required, setting up DNS round robin is also pretty trivial.
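
To make the records above more concrete, here is a small sketch that renders them as BIND-style zone file lines (name, TTL, class, type, data). It is only an illustration: the helper names are invented for this example, the 203.0.113.x addresses are documentation placeholders, and your DNS provider may expect a different input format. Passing more than one address yields several A records for the same host, which is all that DNS round robin amounts to.

```python
def a_record(host, ip, ttl=30):
    """One A record, with the host name relative to the zone origin."""
    return f"{host} {ttl} IN A {ip}"


def srv_record(service, proto, target, port=3478, ttl=30, priority=0, weight=0):
    """One SRV record; target should be a fully qualified name ending in a dot."""
    return f"_{service}._{proto} {ttl} IN SRV {priority} {weight} {port} {target}"


def render_records(domain, server_ips):
    """A and SRV records for stun/turn; several IPs give round-robin A records."""
    lines = []
    for host in ("stun", "turn"):
        for ip in server_ips:
            lines.append(a_record(host, ip))
    for service in ("stun", "turn"):
        lines.append(srv_record(service, "udp", f"{service}.{domain}."))
    return lines


# Hypothetical values: example.com and the 203.0.113.x addresses are placeholders.
for line in render_records("example.com", ["203.0.113.10", "203.0.113.11"]):
    print(line)
```

With the two placeholder addresses this prints four A records (two each for stun and turn) followed by the two SRV records.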

.:. Security considerations

  • The tech behind this feature is WebRTC, with XMPP used for the out-of-band negotiation of the encryption keys.
    If the chat with the other user is end-to-end encrypted, the key negotiation also happens securely.
  • Another aspect to keep in mind is that calls are essentially peer-to-peer. This means that both peers learn each other's public IP address, and possibly some other information obtained by exchanging the output of the STUN query (LAN IP address, NAT type).
    For this reason I would not pick up calls coming from strangers.