You’re not alone. It looks like there might be an issue with internal transfers from Auto Attendants to users when using Direct Routing, which is causing a bit of frustration for some Australian and New Zealand customers.
On the face of it, it appears to be a Microsoft issue with sip.pstnhub.microsoft.com. Let’s take a look at how Lightwire’s Voice Engineering team have narrowed down the issue.
They start by looking at our own Session Border Controllers (SBCs), the devices that connect your Office 365 tenancies to our Voice Core. These hold entries for three Microsoft servers:
- sip.pstnhub.microsoft.com: the global FQDN. When the SBC sends a request to resolve this name, the Microsoft Azure DNS servers return an IP address pointing to the primary Azure datacenter assigned to the SBC.
- sip2.pstnhub.microsoft.com: this geographically maps to the second priority region.
- sip3.pstnhub.microsoft.com: this geographically maps to the third priority region.
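If you want to see what those three names currently resolve to from your own network, a minimal Python sketch like the one below will do it. It uses only the standard library. The `resolve_all` helper and its `resolver` parameter are our own illustrative names, not part of any Microsoft or SBC tooling, and the addresses you get back will depend on where and when you run it.

```python
import socket

# The three Direct Routing FQDNs named in this post.
TEAMS_FQDNS = [
    "sip.pstnhub.microsoft.com",
    "sip2.pstnhub.microsoft.com",
    "sip3.pstnhub.microsoft.com",
]


def resolve_all(fqdns, resolver=socket.gethostbyname_ex):
    """Map each FQDN to the sorted list of IPv4 addresses it resolves to.

    `resolver` is injectable (an assumption made for this sketch) so the
    function can be exercised without a live DNS lookup.
    """
    results = {}
    for name in fqdns:
        try:
            # gethostbyname_ex returns (canonical_name, aliases, ip_addresses).
            _canonical, _aliases, addresses = resolver(name)
            results[name] = sorted(addresses)
        except OSError:
            # socket.gaierror is a subclass of OSError: the lookup failed.
            results[name] = []
    return results


if __name__ == "__main__":
    for fqdn, addresses in resolve_all(TEAMS_FQDNS).items():
        print(fqdn, addresses)
```

An empty list against a name means the lookup itself failed, which is a different problem from the proxy being down.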
On the SBC we can see a proxy alarm saying that 220.127.116.11 is down, as shown below.
Proxy Set Alarm Proxy Set 1 (Teams): Server 18.104.22.168:5061 is down
If we ping this, we can see that it is meant to resolve to Australia East, the current primary Azure data centre. We can also see the IP addresses for sip2 and sip3, which we will use to diagnose the issue.
So what happens when we get a timeout or a down alarm? Naturally, the system routes calls to the second of the three proxies in the list. As we can see below, the call is trying sip2.pstnhub.microsoft.com, or IP 22.214.171.124.
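The failover behaviour described above can be pictured with a small Python sketch. It illustrates the idea only and is not how any particular SBC implements it. The `Proxy` class and `pick_proxy` function are hypothetical names, and the up/down flags are set by hand to mirror the alarm we saw.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    fqdn: str
    is_up: bool  # set by hand here; a real SBC tracks this with keep-alives


def pick_proxy(proxies):
    """Return the first proxy, in priority order, that is marked up (or None)."""
    for proxy in proxies:
        if proxy.is_up:
            return proxy
    return None


# Priority order as listed in this post, with the primary marked down.
proxies = [
    Proxy("sip.pstnhub.microsoft.com", is_up=False),
    Proxy("sip2.pstnhub.microsoft.com", is_up=True),
    Proxy("sip3.pstnhub.microsoft.com", is_up=True),
]

active = pick_proxy(proxies)
print(active.fqdn)  # prints "sip2.pstnhub.microsoft.com"
```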
But here’s where the problem arises. When Microsoft does the transfer and sends us the REFER message back, it looks a bit like this:
So what’s wrong with the above? Well, remember we’re sending traffic to sip2 now, but you’ll notice in the line we bolded above that the REFER comes back with a Refer-To header that contains sip.pstnhub.microsoft.com, which we have already worked out is down.
We can’t force our system to send it somewhere else, say sip2, because we follow the SIP protocol standards that govern all SIP usage, and we must comply with them.
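To make the mismatch concrete, here is a rough Python sketch of the check we are effectively doing by eye: pull the host out of the Refer-To header and see whether it is one we already know is down. The sample header value is hypothetical and simplified, not copied from our logs, and `refer_to_host` and `refers_to_down_host` are illustrative names only.

```python
import re

# Hosts we currently have a "down" alarm for (taken from the alarm above).
DOWN_HOSTS = {"sip.pstnhub.microsoft.com"}


def refer_to_host(header_value):
    """Extract the host from a Refer-To value containing a sip: or sips: URI."""
    match = re.search(
        r"sips?:(?:[^@>;\s]*@)?([^:;>?\s]+)", header_value, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def refers_to_down_host(header_value, down_hosts=DOWN_HOSTS):
    """True when the transfer target is a host we know to be down."""
    host = refer_to_host(header_value)
    return host is not None and host in down_hosts


# Hypothetical, simplified Refer-To value for illustration only.
sample = "<sip:sip.pstnhub.microsoft.com:5061;transport=tls>"
print(refer_to_host(sample))        # prints "sip.pstnhub.microsoft.com"
print(refers_to_down_host(sample))  # prints "True"
```

A check like this only tells you why the transfer fails. It does not give the SBC licence to rewrite the target.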
So it looks like Microsoft have an issue with their main server sip.pstnhub.microsoft.com, but how does this issue get resolved/fixed?
As this is beyond our services’ demarcation and within Microsoft’s ecosystem, you will need to raise a support ticket with Microsoft from within your Office 365 tenancy and follow their standard support processes to get the issue fixed.
They will likely ask for SBC logs and other technical information that you’ll need to provide. We’re happy to help our customers gather the relevant information.
You can contact our team on 0800 534 567, or 1300 016 678 and select option 2 for support, or email email@example.com if you need assistance with that.
What can you do as a workaround until this issue is resolved? You could allocate the number on the Auto Attendant to a user temporarily, or divert the entire trunk to a mobile phone.
Hopefully, this information provides some insight into what’s going on with your voice system. And if you’re wondering why we can’t log these requests with Microsoft on your behalf, that’s because we treat Teams like any other third-party PBX and have no access to it.
But if you’re looking for a Voice Solution that’s backed by SLA Restoration Targets, let’s have a chat about Lightwire’s Hosted and Managed 3CX Unified Communications Solution.
Update 12 Jan: It appears the issue has been resolved, at least temporarily.
If you’re interested in why we say at least temporarily, here are some insights into what Microsoft’s engineering team have done.
Remember how the primary DNS for sip.pstnhub.microsoft.com resolved to sip-du-a-auea.australiaeast.cloudapp.azure.com (126.96.36.199) or Australia East (as shown below) as the closest geographic region?
Now it appears that Southeast Asia is the new primary location for Australian and New Zealand customers.
So, what does this mean and what’s the impact? Moving the primary SIP services from Australia to Southeast Asia has temporarily resolved the issue with the Australia East server. In doing so, it has also introduced a significantly longer path for the voice traffic, which means more latency.
So what’s wrong with latency? In the voice space, latency can be closely tied to quality and customer satisfaction. The further the distance, the longer the delay between you saying something and the person on the other end of the call hearing it.
For now, great that the issue is resolved but let’s hope Microsoft fix the AU East server and can transition the service back, closer to home sometime soon.
Update 19 Jan: and... broken again.
I did say temporarily fixed, didn’t I?
Looks like the primary DNS for sip.pstnhub.microsoft.com was switched back to sip-du-a-auea.australiaeast.cloudapp.azure.com (188.8.131.52) (or Australia East for short) today, after a week of pointing to Southeast Asia as a workaround.
Surely this means the original issue has been fixed, one might think… Sadly not, and back to failing call transfers we go.
So what are we doing? Proactively contacting our customers to let them know the issue’s back, and giving them the same options we put in place to work around it last time.
As a huge Teams fan, I’m pretty disappointed to see this issue reoccur.
Update 20 Jan: A new day and a new region, temporarily fixed again.
Now for some insights.
East Asia is located in Hong Kong and, for full disclosure, Southeast Asia is in Singapore and Australia East is in New South Wales, in case you were wondering.
With East Asia even further away, latency is naturally going to increase again.
In fact, the round-trip times between the Azure regions are:
- Australia East to Southeast Asia: 94 ms
- Australia East to East Asia: 118 ms
You can add more time onto those for the trip between the end customer’s location and those regions.
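As a back-of-the-envelope illustration of that addition, here is a short Python sketch. It assumes, purely for simplicity, that the customer’s traffic transits Australia East on its way to the other region, so the two legs simply add. The 25 ms customer leg is a made-up figure, and `estimated_rtt_ms` is an illustrative name.

```python
# Round-trip times from Australia East quoted in this post, in milliseconds.
INTER_REGION_RTT_MS = {
    "Southeast Asia": 94,
    "East Asia": 118,
}


def estimated_rtt_ms(customer_leg_ms, region):
    """Rough additive estimate: customer leg plus the inter-region round trip.

    Assumes traffic transits Australia East, which may not match real routing.
    """
    return customer_leg_ms + INTER_REGION_RTT_MS[region]


# Hypothetical customer with a 25 ms round trip to Australia East.
customer_leg_ms = 25
for region in INTER_REGION_RTT_MS:
    print(region, estimated_rtt_ms(customer_leg_ms, region), "ms")
```

With that hypothetical 25 ms leg, the estimate comes out at 119 ms via Southeast Asia and 143 ms via East Asia.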
As usual, we’re glad it’s working again and looking forward to seeing it back in Australia (and working properly).
Will we ever know the real cause of these issues? Likely not, but it does highlight the importance of SLAs for voice and Unified Communications solutions.
Want some help?
If you have a Microsoft ticket open and it’s not going anywhere, or they don’t accept our explanation, please ask them to contact our Head of Voice, Juan van Rooyen (firstname.lastname@example.org) directly.
He’s more than happy to provide detailed logs and show our work.