I’ve been pretty pleased with the results of my experimental entry into the world of VoIP, because it has been working without a hitch. Up until tonight, anyway.

I noticed the problem when I went to call the new home VoIP number from my cellphone, and got a “Not available” message from Callcentric. I knew immediately that something was not right, because that shouldn’t ever happen (unless the power is out or Internet service is interrupted). When I got home I logged into the router’s configuration page, and discovered that the line was no longer registered with Callcentric’s servers.

I started off by fixing the obvious things, including network connections and a power cycle. I made sure I could ping Callcentric, so no problems there. The configuration on the ATA matched their website (plus, it had been working fine for a week), so hopefully no problems there. To rule out NAT issues, I put the ATA temporarily in the LAN DMZ. Still no dice.

Getting a little more desperate, I turned on the SPA-2102’s syslog feature, turned the debug verbosity up, and started tailing the output on my PC. The result was mildly enlightening:

 Jun 12 00:33:25 192.168.1.150 system request reboot
 Jun 12 00:33:25 192.168.1.150 fu:0:45af, 0038 043c 0445 0001
 Jun 12 00:33:25 192.168.1.150 fu:0:4605, 03e4 05b0 0001
 Jun 12 00:33:30 192.168.1.150 System started: ip@192.168.1.150, reboot reason:C4
 Jun 12 00:33:30 192.168.1.150 System started: ip@192.168.1.150, reboot reason:C4
 Jun 12 00:33:30 192.168.1.150   subnet mask:    255.255.255.0
 Jun 12 00:33:30 192.168.1.150   gateway ip:     192.168.1.1
 Jun 12 00:33:30 192.168.1.150   dns servers(2): 
 Jun 12 00:33:30 192.168.1.150 192.168.1.1 
 Jun 12 00:33:30 192.168.1.150 71.170.11.156 
 Jun 12 00:33:30 192.168.1.150 
 Jun 12 00:33:30 192.168.1.150 fu:0:4648, 03f6 0001
 Jun 12 00:33:30 192.168.1.150 RSE_DEBUG: reference domain:_sip._udp.callcentric.com
 Jun 12 00:33:30 192.168.1.150 [0]Reg Addr Change(0) 0:0->cc0bc017:5080
 Jun 12 00:33:30 192.168.1.150 [0]Reg Addr Change(0) 0:0->cc0bc017:5080
 Jun 12 00:33:38 192.168.1.150 IDBG: st-0
 Jun 12 00:33:38 192.168.1.150 fs:10648:10720:65536
 Jun 12 00:33:38 192.168.1.150 fls:af:1:0:0
 Jun 12 00:33:38 192.168.1.150 fbr:0:3000:3000:04605:0002:0001:3.3.6
 Jun 12 00:33:38 192.168.1.150 fhs:01:0:0001:upg:app:0:3.3.6
 Jun 12 00:33:38 192.168.1.150 fhs:02:0:0002:upg:app:1:3.3.6
 Jun 12 00:33:38 192.168.1.150 fhs:03:0:0003:upg:app:2:3.3.6
 Jun 12 00:33:39 192.168.1.150 fu:0:465a, 0003 0001
 Jun 12 00:34:02 192.168.1.150 RSE_DEBUG: getting alternate from domain:_sip._udp.callcentric.com
 Jun 12 00:34:02 192.168.1.150 [0]Reg Addr Change(0) cc0bc017:5080->cc0bc022:5080
 Jun 12 00:34:02 192.168.1.150 [0]Reg Addr Change(0) cc0bc017:5080->cc0bc022:5080
 Jun 12 00:34:34 192.168.1.150 RSE_DEBUG: getting alternate from domain:_sip._udp.callcentric.com
 Jun 12 00:34:34 192.168.1.150 [0]RegFail. Retry in 30

After that, there are just a lot of “unref domain” errors, repeated over and over every 30 seconds, as the 2102 tries to register and can’t. (Can we hear it for the guy at Linksys who got them to keep the remote logging feature?)

From this we can tell a few things. It looks like the 2102 is booting up, and then it’s looking for Callcentric’s SIP server, by querying the DNS SRV record. This is as it should be. However, for some reason it’s apparently not getting back the right server to use.
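
As an aside, the values in the “Reg Addr Change” lines look like IPv4 addresses written in hexadecimal, followed by a decimal port. I haven’t seen that format documented anywhere, so treat it as an assumption. Here is a minimal Python sketch that decodes them on that basis (the function names and the regular expression are my own, not anything from Linksys):

```python
import ipaddress
import re

# Assumed shape of the right-hand side of a "Reg Addr Change" line:
# "->" followed by up to 8 hex digits, a colon, and a decimal port.
REG_CHANGE = re.compile(r"->([0-9a-fA-F]{1,8}):(\d+)\s*$")


def hex_to_ipv4(hex_addr: str) -> str:
    """Interpret a hex string as a 32-bit big-endian IPv4 address."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def decode_reg_change(line: str):
    """Return (ip, port) for the new address in a log line, or None."""
    match = REG_CHANGE.search(line)
    if match is None:
        return None
    return hex_to_ipv4(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for line in (
        "[0]Reg Addr Change(0) 0:0->cc0bc017:5080",
        "[0]Reg Addr Change(0) cc0bc017:5080->cc0bc022:5080",
    ):
        print(decode_reg_change(line))
```

Read this way, cc0bc017 is 204.11.192.23 and cc0bc022 is 204.11.192.34, both on port 5080.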

Just as a first shot to eliminate DNS issues, I replaced the DNS server values in the 2102 configuration (normally, I use my gateway/router, which lives at 192.168.1.1) with my ISP’s DNS servers. No improvement. Then I decided to try pulling the SRV records manually, to see if there was an obvious misconfiguration on Callcentric’s part, or if they weren’t returning SRV records at all.

Without getting into a whole sidetrack on how DNS SRV records work, the way to pull them is via dig. To get the server and port for SIP traffic carried on UDP for the callcentric.com domain, you would run:

 $ dig _sip._udp.callcentric.com SRV

 ; <<>> DiG 9.3.2 <<>> _sip._udp.callcentric.com SRV
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11397
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 5

 ;; QUESTION SECTION:
 ;_sip._udp.callcentric.com.     IN      SRV

 ;; ANSWER SECTION:
 _sip._udp.callcentric.com. 1800 IN      SRV     5 5 5080 alpha4.callcentric.com.
 _sip._udp.callcentric.com. 1800 IN      SRV     5 5 5080 alpha2.callcentric.com.

This tells us that UDP SIP traffic should be directed to either alpha2.callcentric.com or alpha4.callcentric.com, both on port 5080. The two records have the same priority (5) and weight (5), so either one can be used. Running a quick host alpha2.callcentric.com gives the A record for that server, which turns out to be 204.11.192.23.
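
If you want to pick those fields apart programmatically, here is a rough Python sketch that parses the answer lines of dig output into records. It assumes the standard column order for SRV answers (name, TTL, class, type, priority, weight, port, target); the SrvRecord type and parse_srv_answers function are names I made up for illustration, and the sample text is simply the answer section shown above.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_answers(dig_output: str) -> List[SrvRecord]:
    """Extract SRV answer lines from dig's text output."""
    records = []
    for line in dig_output.splitlines():
        line = line.strip()
        # Skip blanks and dig's comment lines (which start with ';').
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Expected: name ttl class type priority weight port target
        if len(fields) == 8 and fields[2] == "IN" and fields[3] == "SRV":
            name, ttl, _cls, _type, prio, weight, port, target = fields
            records.append(
                SrvRecord(name, int(ttl), int(prio), int(weight), int(port), target)
            )
    return records


SAMPLE = """\
;; ANSWER SECTION:
_sip._udp.callcentric.com. 1800 IN      SRV     5 5 5080 alpha4.callcentric.com.
_sip._udp.callcentric.com. 1800 IN      SRV     5 5 5080 alpha2.callcentric.com.
"""

if __name__ == "__main__":
    for rec in parse_srv_answers(SAMPLE):
        print(rec.priority, rec.weight, rec.port, rec.target)
```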

What we’ve accomplished at this point is what the SPA-2102 is supposed to do every time it tries to register with Callcentric: query the domain-level SRV record to get the particular server for SIP traffic, then query that server’s A record for its IP address, and then connect to it. We just did that, and now have an IP and port.
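
For the curious, the “which server do I try first” step can be sketched in Python along the lines of RFC 2782: lowest priority number first, with a weighted random pick among records that share a priority. This is only my illustration of the general rule, not a claim about what the SPA-2102 firmware actually does. The Srv type, the order_targets function, and the backup.example. host in the second example are all hypothetical.

```python
import random
from typing import List, NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_targets(records: List[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Return records in the order a client would try them."""
    rng = rng or random.Random()
    ordered = []
    # Lower priority values are tried first.
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        # Within one priority, pick repeatedly in proportion to weight.
        while group:
            total = sum(r.weight for r in group)
            threshold = rng.randint(0, total)
            running = 0
            for record in group:
                running += record.weight
                if running >= threshold:
                    ordered.append(record)
                    group.remove(record)
                    break
    return ordered


if __name__ == "__main__":
    callcentric = [
        Srv(5, 5, 5080, "alpha4.callcentric.com."),
        Srv(5, 5, 5080, "alpha2.callcentric.com."),
    ]
    # Equal priority and weight: either server may come out first.
    print(order_targets(callcentric))
    # A hypothetical lower-preference backup is always tried last.
    print(order_targets(callcentric + [Srv(10, 0, 5060, "backup.example.")]))
```

With Callcentric’s two records the order is effectively a coin flip, which fits the log above, where the ATA moves from one address to the other when the first attempt gets nowhere.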

To see if that server worked, I put it into the SPA’s incoming and outgoing proxy fields, and turned “Use DNS SRV” off. Lo and behold, after I rebooted it, I was back online.

For the moment, anyway, things are working again. However, they’re not working the way they’re supposed to. If Callcentric decides to change its server’s IP address, I’ll no longer be able to connect. Ditto if that particular server gets overloaded. All the benefits of DNS are lost when you go this route. Therefore, it’s not really a satisfactory long-term solution.

I’ve opened a trouble ticket with Callcentric and will see what they say. Googling terms like “RSE_DEBUG” and “unref domain” produces some results (I’m apparently not the only person to have experienced this problem!) but no good solutions. It’s obviously a DNS problem, but exactly who is to blame isn’t clear. I suspect Callcentric is going to blame either the ATA configuration or my LAN setup, and in their defense, their DNS records seem to be correct. However, I can’t see how the problem can be misconfiguration when it worked well for more than a week. I suspect I’ll probably end up on the phone with Linksys eventually.

If I do figure out some sort of solution, or even a satisfactory explanation, I’ll be sure to post it. In the meantime, if anyone happens to come across this page because they’re experiencing the same problem, the only workaround I’ve found is to manually look up the SIP server’s IP address and put that into the 2102’s configuration. (And pray your VoIP provider’s IP address assignments are relatively stable.)

Any thoughts or suggestions are, as always, appreciated.

FOLLOWUP: I got a form response back from Callcentric noting that my device was registered again, and blaming the problem on my Internet connection. (Of course, it was back up because I put the IP address in directly.) However, when I went back to using DNS SRV, it seemed to work fine … which really annoys me, because if there’s one thing I hate more than stuff that doesn’t work, it’s stuff that breaks unpredictably and for no reason.