I’ve been pretty pleased with the results of my experimental entry into
the world of VoIP, because it had been working without a hitch. Up
until tonight, anyway.
I noticed the problem when I went to call the new home VoIP number
from my cellphone, and got a “Not available” message from Callcentric.
I knew immediately something was not right, because that shouldn’t
ever happen (unless the power was out or Internet service was
interrupted). When I got home I logged into the router’s
configuration page, and discovered that the line was no longer
registered with Callcentric’s servers.
I started off by fixing the obvious things, including network
connections and a power cycle. I made sure I could ping Callcentric,
so no problems there. The configuration on the ATA matched their
website (plus, it had been working fine for a week), so hopefully no
problems there. To rule out NAT issues, I put the ATA temporarily in
the LAN DMZ. Still no dice.
Getting a little more desperate, I turned on the SPA-2102’s syslog
feature, turned the debug verbosity up, and started tailing the output
on my PC. The result was mildly enlightening:
Jun 12 00:33:25 192.168.1.150 system request reboot
Jun 12 00:33:25 192.168.1.150 fu:0:45af, 0038 043c 0445 0001
Jun 12 00:33:25 192.168.1.150 fu:0:4605, 03e4 05b0 0001
Jun 12 00:33:30 192.168.1.150 System started: ip@192.168.1.150, reboot reason:C4
Jun 12 00:33:30 192.168.1.150 System started: ip@192.168.1.150, reboot reason:C4
Jun 12 00:33:30 192.168.1.150 subnet mask: 255.255.255.0
Jun 12 00:33:30 192.168.1.150 gateway ip: 192.168.1.1
Jun 12 00:33:30 192.168.1.150 dns servers(2):
Jun 12 00:33:30 192.168.1.150 192.168.1.1
Jun 12 00:33:30 192.168.1.150 71.170.11.156
Jun 12 00:33:30 192.168.1.150
Jun 12 00:33:30 192.168.1.150 fu:0:4648, 03f6 0001
Jun 12 00:33:30 192.168.1.150 RSE_DEBUG: reference domain:_sip._udp.callcentric.com
Jun 12 00:33:30 192.168.1.150 [0]Reg Addr Change(0) 0:0->cc0bc017:5080
Jun 12 00:33:30 192.168.1.150 [0]Reg Addr Change(0) 0:0->cc0bc017:5080
Jun 12 00:33:38 192.168.1.150 IDBG: st-0
Jun 12 00:33:38 192.168.1.150 fs:10648:10720:65536
Jun 12 00:33:38 192.168.1.150 fls:af:1:0:0
Jun 12 00:33:38 192.168.1.150 fbr:0:3000:3000:04605:0002:0001:3.3.6
Jun 12 00:33:38 192.168.1.150 fhs:01:0:0001:upg:app:0:3.3.6
Jun 12 00:33:38 192.168.1.150 fhs:02:0:0002:upg:app:1:3.3.6
Jun 12 00:33:38 192.168.1.150 fhs:03:0:0003:upg:app:2:3.3.6
Jun 12 00:33:39 192.168.1.150 fu:0:465a, 0003 0001
Jun 12 00:34:02 192.168.1.150 RSE_DEBUG: getting alternate from domain:_sip._udp.callcentric.com
Jun 12 00:34:02 192.168.1.150 [0]Reg Addr Change(0) cc0bc017:5080->cc0bc022:5080
Jun 12 00:34:02 192.168.1.150 [0]Reg Addr Change(0) cc0bc017:5080->cc0bc022:5080
Jun 12 00:34:34 192.168.1.150 RSE_DEBUG: getting alternate from domain:_sip._udp.callcentric.com
Jun 12 00:34:34 192.168.1.150 [0]RegFail. Retry in 30
After that, there are just a lot of “unref domain” errors, repeated
over and over every 30 seconds, as the 2102 tries to register and
can’t. (Can we hear it for the guy at Linksys who got them to keep
the remote logging feature?)
From this we can tell a few things. It looks like the 2102 is booting
up, and then it’s looking for Callcentric’s SIP server, by querying
the DNS SRV record. This is as it should be. However, for some
reason it’s apparently not getting back the right server to use.
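One detail in the log that's worth decoding: the "Reg Addr Change" lines print the registration target as eight hex digits plus a port. I haven't seen this format documented, so treat the following as a sketch that assumes those digits are simply a packed, big-endian IPv4 address. The function names are my own.

```python
import ipaddress


def hex_to_ipv4(hex_addr: str) -> str:
    """Read eight hex digits as a big-endian IPv4 address.

    Assumption: this is how the SPA-2102 prints addresses in its
    debug log; the format is not documented here.
    """
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def decode_target(token: str):
    """Split a 'hexaddr:port' token such as 'cc0bc017:5080' into (ip, port)."""
    hex_addr, port = token.split(":")
    return hex_to_ipv4(hex_addr), int(port)


for token in ("cc0bc017:5080", "cc0bc022:5080"):
    print(token, "->", decode_target(token))
```

Under that assumption, cc0bc017 reads as 204.11.192.23 and cc0bc022 as 204.11.192.34, which gives concrete addresses to compare against whatever DNS returns.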
Just as a first shot to eliminate DNS issues, I changed out the DNS
server values in the 2102 configuration (normally, I use my
gateway/router, which lives at 192.168.1.1) with my ISP’s DNS
servers. No improvement. Then, I decided to try pulling the SRV
records manually, to see if there was an obvious misconfiguration on
Callcentric’s part, or if they weren’t returning DNS SRVs.
Without getting into a whole sidetrack on how DNS SRV records work,
the way to pull them is via dig. To get the server and port for SIP
traffic carried on UDP for the Callcentric.com domain, you would run:
$ dig _sip._udp.callcentric.com SRV
; <<>> DiG 9.3.2 <<>> _sip._udp.callcentric.com SRV
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11397
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 5
;; QUESTION SECTION:
;_sip._udp.callcentric.com. IN SRV
;; ANSWER SECTION:
_sip._udp.callcentric.com. 1800 IN SRV 5 5 5080 alpha4.callcentric.com.
_sip._udp.callcentric.com. 1800 IN SRV 5 5 5080 alpha2.callcentric.com.
This tells us that UDP SIP traffic should be directed to either
alpha2.callcentric.com or alpha4.callcentric.com, both on port 5080.
The servers have equal priority so either one can be used. Running a
quick host alpha2.callcentric.com gives the A record for that
server, which turns out to be 204.11.192.23.
What we’ve accomplished at this point is what the SPA-2102 is supposed
to do every time it tries to register with Callcentric: query the
domain-level SRV record to get the particular server for SIP traffic,
then query that server’s A record for its IP address, and then
connect to it. We just did that, and now have an IP and port.
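For anyone who wants to see the selection step spelled out, here is a rough sketch that parses the ANSWER lines from the dig output above and picks a server. It is a simplified take on the SRV rules (lowest priority value first, then a weighted random choice), not a description of what the SPA-2102 firmware actually does. The names SrvRecord, parse_srv_line and pick_server are made up for illustration.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one ANSWER-section line of dig output for an SRV query."""
    fields = line.split()
    # Expected layout: name TTL class type priority weight port target
    if len(fields) != 8 or fields[3] != "SRV":
        raise ValueError(f"not an SRV answer line: {line!r}")
    priority, weight, port = (int(f) for f in fields[4:7])
    return SrvRecord(priority, weight, port, fields[7].rstrip("."))


def pick_server(records, rng=random):
    """Pick among the records with the lowest priority value, using weight.

    Simplified: a real client would also fall back to the remaining
    records if the chosen server does not answer.
    """
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


ANSWER = """\
_sip._udp.callcentric.com. 1800 IN SRV 5 5 5080 alpha4.callcentric.com.
_sip._udp.callcentric.com. 1800 IN SRV 5 5 5080 alpha2.callcentric.com.
"""

records = [parse_srv_line(line) for line in ANSWER.splitlines()]
chosen = pick_server(records)
print(chosen.target, chosen.port)
```

With the two records shown, priority and weight are identical, so either host can come back; the next step would be an ordinary A-record lookup on chosen.target.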
To see if that server worked, I put it into the SPA’s incoming and
outgoing proxy fields, and turned “Use DNS SRV” off. Lo and behold,
after I rebooted it, I was back online.
For the moment, anyway, things are working again. However, they’re
not working the way they’re supposed to. If Callcentric decides to
change its server’s IP address, I’ll no longer be able to connect.
Ditto if that particular server gets overloaded. All the benefits of
DNS are lost when you go this route. Therefore, it’s not really a
satisfactory long-term solution.
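If you do end up pinning the IP like this, one way to take some of the sting out is to check every so often that the pinned address still matches what DNS hands out. The sketch below is only that: the function name is hypothetical, the hostnames would come from the SRV answer, and socket.gethostbyname returns a single IPv4 address, so a host with several A records could produce a false alarm.

```python
import socket


def pinned_ip_still_listed(pinned_ip, hostnames, resolve=socket.gethostbyname):
    """Return True if any of the hostnames currently resolves to pinned_ip.

    `resolve` can be swapped out (for testing, say); by default it does
    a plain A-record lookup and returns one address per name.
    """
    for name in hostnames:
        try:
            if resolve(name) == pinned_ip:
                return True
        except OSError:
            # Lookup failed for this name; try the next one.
            continue
    return False


# Example call (performs real DNS lookups, so it is left commented out):
# pinned_ip_still_listed("204.11.192.23",
#                        ["alpha2.callcentric.com", "alpha4.callcentric.com"])
```

A False result would be the cue to re-run the dig and host commands and update the proxy fields by hand.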
I’ve opened a trouble ticket with Callcentric and will see what they
say. Googling terms like “RSE_DEBUG” and “unref domain” produces some
results — I’m apparently not the only person to have experienced this
problem! — but no good solutions. It’s obviously a DNS problem, but
exactly who’s to blame isn’t clear. I suspect Callcentric is going to
blame either the ATA configuration or my LAN setup, and in their
defense, their DNS records seem to be correct. However, I can’t see
how the problem can be misconfiguration when it worked well for more
than a week. I suspect I’ll probably end up on the phone with Linksys
eventually.
If I do figure out some sort of solution, or even a satisfactory
explanation, I’ll be sure to post it. In the meantime, if anyone
happens to come across this page because they’re experiencing the same
problem, the only workaround I’ve found is to manually query the SIP
server IP and put that into the 2102’s configuration. (And pray your
VoIP provider’s IP address assignments are relatively stable.)
Any thoughts or suggestions are, as always, appreciated.
FOLLOWUP: I got a form response back from Callcentric noting
that my device was registered again, and blaming the problem on my
Internet connection. (Of course, it was back up because I put the IP
address in directly.) However, when I went back to using DNS SRV, it
seemed to work fine … which really annoys me, because if there’s one
thing I hate more than stuff that doesn’t work, it’s stuff that
breaks unpredictably and for no reason.