I’ve always had a fascination with phone systems: having a phone gives you this feeling of being immediately connected to the world. Telephones were the first communication network to have such a sense of immediacy – of course we all take this completely for granted now, and with the internet, the old-fashioned POTS (plain old telephone service) is looking rather outdated. In order to understand where we’re headed, we must first understand where we have been – the purpose of this entry is to discuss telephony technologies past, present and future.
Here’s the episode on the telephone from my favourite TV series; Tim Hunkin’s Secret Life of Machines:
There will always be a need for people to have voice conversations with each other. Even now that video-calling has become cheap and fairly reliable, most people would still rather not be seen – you simply don’t want to have to worry about your appearance every time you answer the phone. Text-based chat (SMS/E-mail etc) may be gaining popularity at the moment, but I seriously doubt they will ever completely replace the need for voice conversation.
So we’ll always need voice calls, so we’ll always be stuck with the current phone network, right? Well, the need to have a voice conversation is still there, but the communication network that carries it is changing significantly.
The original analog phone system was built on the concept of having a real analog circuit open between two lines – the audio signal is carried as a variation in voltage on the line, much in the same way audio is carried from your iPod to your headphones, or from your hi-fi to its speakers.
Back in the early days of the phone network, the entire system was built on analog circuits. Originally they were ‘switched’ manually by operators, who sat in front of a plug board. When you wanted to make a call you would pick up your phone and ask the operator to connect you. Connecting you in this case really was a physical connection of one plug on the board to another via a piece of cable, not unlike a headphone jack -> headphone jack cable. If you wanted to place a long-distance call, your local operator would contact the operator at the remote exchange and request that she connect her plug board to the destination party. The ringing of the phone was initiated manually by the operator at the destination end of the call.
Eventually complex relays became available that made the operators obsolete. However, these mechanical devices still operated on much the same principle as the operators – a physical analog connection was made between two lines by selecting out of a number of possible contacts. Instead of the operators making the connections by hand, the relays simply made them using electromagnets to move a selector arm around a 2-dimensional array of metal contacts. As well as switching, there’s another important concept at work here – the exchanges need to be told how to switch – information about the destination of each call needs to be passed from the original caller all the way through to the destination exchange – this is called signalling. In the early days of analog exchanges, this signalling was done by one person literally asking another to switch their call – first by the original caller asking their local operator to connect them to their destination, and in the case of a long-distance call, their local operator would then ask one or more intermediary operators to pass the call along. When relays replaced operators, this signalling was done by electrical pulses sent over the line rather than actual humans talking to each other.
When transistors replaced relays, the signalling was done with tones rather than electrical pulses – there was a transition period where phones were available with a switch to use either tone or pulse dialling. Pulse dialling was originally achieved using a rotary dial that sprang back to its rest position when you let go of it, sending pulses as it returned. In the UK and most other countries, each digit results in that number of electrical pulses: the number 9 results in 9 pulses, and the number zero results in ten.
The pulses that are sent down the line are actually exactly the same as hanging up the phone – a phone line consists of two wires, and when the phone is off the hook, those wires are connected together in a loop; hanging up breaks the loop. On old rotary dial phones, the dial is basically hanging up the phone and then picking it up again in a series of pulses. New phone exchanges still support the old dialling process (so old phones still work), and because of this even on the most recent normal fixed-line telephones it’s possible to dial the operator by tapping the hangup button once, pausing, then tapping it ten times in quick succession, pausing, and tapping it ten more times – the pulse code for the number 100. You could theoretically dial any number like this, but it would be rather laborious and error-prone.
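If you want to see the pulse scheme written down, here is a rough Python sketch. It assumes the convention used in the UK and most other countries (digit n is sent as n loop breaks, and 0 as ten); the function names are made up for this example.

```python
def pulses_for_digit(digit: str) -> int:
    """Number of loop-break pulses for one dialled digit (0 is sent as ten)."""
    if len(digit) != 1 or digit not in "0123456789":
        raise ValueError(f"not a single digit: {digit!r}")
    value = int(digit)
    return 10 if value == 0 else value


def pulse_train(number: str) -> list[int]:
    """Pulse counts for each digit of a number, in dialling order."""
    return [pulses_for_digit(d) for d in number]


if __name__ == "__main__":
    # How many times you would tap the hook switch for each digit of 100.
    print(pulse_train("100"))
```

Each entry in the list is one burst of taps, with a pause between bursts so the exchange can tell the digits apart.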
When the signalling changed from one person asking another to connect the call to a series of automatic pulses, one pretty major mistake was made: the signalling was still done over the same connection as the actual voice call. This is fine when it’s my telephone signalling the exchange which number it wishes to dial, but it’s a little bit more of a problem when it’s the signalling between exchanges.
When we moved from electromechanical relays to transistor-based switching the same mistake was made: the signalling was ‘in-band’, even between exchanges. This means that when one exchange is asking another to make a connection, it does it by sending a series of tones down the line, much in the same way your normal landline telephone signals the exchange by sending DTMF tones. DTMF stands for dual-tone multi-frequency – these are the tones that normal phones make when you dial a number, and each tone basically consists of two different frequencies. The exchange interprets the combinations of frequencies as numbers and routes the call through a series of transistors accordingly. It’s basically the same process as with the relays, but transistors are used for switching instead.
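To make the "two frequencies per key" idea concrete, here is a small Python lookup using the standard DTMF keypad frequencies. The frequencies are not given in this post, so treat them as background I'm supplying, and the function names are just for illustration.

```python
# Standard DTMF frequency groups in Hz (the 1633 Hz column used by the
# rarely seen A-D keys is left out of this sketch).
LOW_GROUP = (697, 770, 852, 941)     # one per keypad row
HIGH_GROUP = (1209, 1336, 1477)      # one per keypad column
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair a phone sends for one key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            column = keys.find(key)
            if column != -1:
                return LOW_GROUP[row], HIGH_GROUP[column]
    raise ValueError(f"not a DTMF key: {key!r}")


def key_for_pair(low: int, high: int) -> str:
    """What the exchange does in reverse: map a detected pair back to a key."""
    return KEYPAD[LOW_GROUP.index(low)][HIGH_GROUP.index(high)]


if __name__ == "__main__":
    for key in "100":
        print(key, dtmf_pair(key))
```

Every key sits at the crossing of one row frequency and one column frequency, which is why two tones are enough to identify it.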
One such inter-exchange signalling system is CCITT5 – this was used commonly on early transistor exchanges. The exchanges basically signal the calls to each other in much the same way as your normal land line phone signals the exchange now: through a series of tones with specific frequencies. The two key frequencies used for much of the signalling are 2400Hz (seize) and 2600Hz (clear to send). The number 2600 has become so ingrained in hacking culture that you’ll see it pop up all over the place (including Windows build numbers!).
The CCITT5 (or C5) handshaking process works like this: imagine a line between two exchanges that is not in use – in this state the line is completely silent. Now a call comes in on one exchange (we’ll call this the originating exchange) that needs to be routed via this line to the other exchange (we’ll call this the destination exchange). The originating exchange first sends 2400Hz (seize) to indicate that it wishes to open the line. The destination exchange acknowledges the seize request by sending 2600Hz (clear-to-send) – when the originating exchange hears 2600Hz it knows the line is ready and stops sending the seize tone. The destination exchange acknowledges that the seize is complete by ceasing the clear-to-send tone. The line is then silent again, but the destination exchange is now in a state where it’s ready to receive a destination number – this number could be for the telephone of a person who is connected to the destination exchange, or it could be the number of another exchange – in this way a call can hop from one exchange to another, and indeed this is how calls are routed long-distance. The originating exchange dials the destination number using a combination of two different tones for each digit, in much the same way as a modern phone does. When the call is over, a clear tone is sent: both 2400Hz and 2600Hz are sent together (this is called clear-forward) to indicate that the line is now idle and ready for another call.
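The handshake above is really a tiny state machine, and it may be easier to follow as code. This is only a sketch of the destination exchange's side as I've described it, not an implementation of the real C5 standard; the class, state and constant names are all invented for the example.

```python
from enum import Enum, auto

SILENCE = frozenset()
SEIZE = frozenset({2400})
CLEAR_TO_SEND = frozenset({2600})
CLEAR_FORWARD = frozenset({2400, 2600})


class State(Enum):
    IDLE = auto()     # line silent, nothing in progress
    SEIZED = auto()   # answering a seize with 2600 Hz
    READY = auto()    # handshake done, waiting for digits


class DestinationExchange:
    """Destination end of one inter-exchange line (simplified)."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def hear(self, tones: frozenset) -> frozenset:
        """Feed in the tones currently on the line; get back the reply tones."""
        if tones == CLEAR_FORWARD:
            # Accepted in any state, from whoever happens to be on the line.
            self.state = State.IDLE
            return CLEAR_FORWARD          # acknowledge with both tones
        if self.state is State.IDLE and tones == SEIZE:
            self.state = State.SEIZED
            return CLEAR_TO_SEND
        if self.state is State.SEIZED:
            if tones == SILENCE:          # originator has stopped its seize
                self.state = State.READY
                return SILENCE
            return CLEAR_TO_SEND          # otherwise keep answering
        return SILENCE


if __name__ == "__main__":
    line = DestinationExchange()
    for tones in (SEIZE, SILENCE, CLEAR_FORWARD):
        print(sorted(tones), "->", sorted(line.hear(tones)), line.state.name)
```

Notice that nothing in `hear` checks who is sending the tones. The exchange only has the audio on the line to go on.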
Phone hackers learned to use this in-band signalling to their advantage – if you send clear-forward to a C5 exchange whilst you are in the middle of a call (even from your home phone or mobile), it thinks you’re another exchange signalling that the call is over and the line is dead. It will then accept a seize followed by any dialling command from you, as if you were the originating exchange. This basically gives you the same control over the network that an operator would have had in the days of plug-boards. You can route your call anywhere, do anything – you’re a trusted user of the system. And the way you control it is with a blue-box – this is the name for the DIY dialling devices that phone hackers made in order to take advantage of this flaw in the signalling. It’s very much like the keypad on a normal phone (with a few extra keys): each key plays a combination of tones that can be used to perform any signalling operation on the remote exchange. The process of using a blue-box to seize control of a phone exchange (possibly for the purposes of committing toll fraud) is called blue boxing.
As a phone hacker, the tone that you’d be looking out for is a sort of high pitched ‘cheep’ or ‘chirp’ noise – if you heard this tone immediately after the number you called had picked up, you knew you had a C5 exchange on the other end of the line. The tone you’re listening for is actually a 2400Hz tone (in this context just called ‘answer’) – in other words, the destination exchange also uses 2400Hz to signal that a call has been successfully connected. Upon hearing 2400Hz from the destination exchange, you’d then play your own clear-forward tone (2400Hz+2600Hz), followed by a seize (2400Hz), and hopefully hear a 2600Hz clear-to-send response. After this you’re free to use the C5 tones to dial any destination number on the network (including another exchange).
For the sake of completeness, here’s a table of the different combinations of 2400 and 2600 tones:
| Signal | Frequency 1 | Frequency 2 |
|---|---|---|
| Seize | 2400Hz | |
| Clear-to-send | | 2600Hz |
| Answer | 2400Hz | |
| Acknowledge | 2400Hz | |
| Busy-flash | | 2600Hz |
| Acknowledge | 2400Hz | |
| Clear-back | | 2600Hz |
| Acknowledge | 2400Hz | |
| Clear-forward | 2400Hz | 2600Hz |
| Acknowledge | 2400Hz | 2600Hz |
These are the CCITT5 signalling tones that are used for actually dialling numbers:
| Digit | Frequency 1 | Frequency 2 | Length |
|---|---|---|---|
| 1 | 700Hz | 900Hz | 55ms |
| 2 | 700Hz | 1100Hz | 55ms |
| 3 | 900Hz | 1100Hz | 55ms |
| 4 | 700Hz | 1300Hz | 55ms |
| 5 | 900Hz | 1300Hz | 55ms |
| 6 | 1100Hz | 1300Hz | 55ms |
| 7 | 700Hz | 1500Hz | 55ms |
| 8 | 900Hz | 1500Hz | 55ms |
| 9 | 1100Hz | 1500Hz | 55ms |
| 0 | 1300Hz | 1500Hz | 55ms |
| Code 11 | 700Hz | 1700Hz | 55ms |
| Code 12 | 900Hz | 1700Hz | 55ms |
| KP1 | 1100Hz | 1700Hz | 100ms |
| KP2 | 1300Hz | 1700Hz | 100ms |
| ST | 1500Hz | 1700Hz | 55ms |
A blue box is simply a home-made tone generator device with keys for each of the tone combinations listed above. Tone generator integrated circuits have been available for many years, which put blue boxing within the capabilities of any determined amateur with a soldering iron and a lot of patience. Using a blue box it is possible to place long distance calls for free, and indeed many people did use it for this purpose. However, blue boxing is more than that – blue boxing is phone hacking in its original and purest form.
You’ll notice there are a few extra keys you wouldn’t have on a normal phone handset – KP1 and KP2 are used for selecting national or international (transit) routing respectively (KP stands for key prefix). Code 11 and 12 are used for calling the operator, and the ST tone is used to signal the end of dialling. Dialling starts with an optional key prefix, followed by a call options digit (used for example to select operator language), followed by the actual number to dial and finally ST to indicate the end of dialling. If you wanted to dial the national number 715-2550255, a C5 dialling sequence might look like this:
KP1-0-715-2550255-ST
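Here is a rough Python sketch that turns a sequence like the one above into the list of tone pairs from the dialling table. The frequencies and lengths are copied from that table; the names `MF_TONES`, `parse_sequence` and `tone_plan` are my own, as are the short keys "C11" and "C12" for Code 11 and Code 12. The table says nothing about gaps between tones, so the sketch ignores them.

```python
# (frequency 1 in Hz, frequency 2 in Hz, length in ms), copied from the table.
# "C11" and "C12" are this sketch's own short names for Code 11 and Code 12.
MF_TONES = {
    "1": (700, 900, 55),   "2": (700, 1100, 55),  "3": (900, 1100, 55),
    "4": (700, 1300, 55),  "5": (900, 1300, 55),  "6": (1100, 1300, 55),
    "7": (700, 1500, 55),  "8": (900, 1500, 55),  "9": (1100, 1500, 55),
    "0": (1300, 1500, 55), "C11": (700, 1700, 55), "C12": (900, 1700, 55),
    "KP1": (1100, 1700, 100), "KP2": (1300, 1700, 100), "ST": (1500, 1700, 55),
}


def parse_sequence(text: str) -> list[str]:
    """Split 'KP1-0-715-2550255-ST' into individual signalling symbols."""
    symbols: list[str] = []
    for group in text.split("-"):
        if group in MF_TONES:
            symbols.append(group)     # a named key such as KP1 or ST, or one digit
        else:
            symbols.extend(group)     # a run of digits: one symbol per character
    unknown = [s for s in symbols if s not in MF_TONES]
    if unknown:
        raise ValueError(f"unknown symbols: {unknown}")
    return symbols


def tone_plan(text: str) -> list[tuple[int, int, int]]:
    """Tone pair and length for every symbol in the sequence, in order."""
    return [MF_TONES[symbol] for symbol in parse_sequence(text)]


if __name__ == "__main__":
    for symbol, tone in zip(parse_sequence("KP1-0-715-2550255-ST"),
                            tone_plan("KP1-0-715-2550255-ST")):
        print(symbol, tone)
```

Going by the table's lengths, the example sequence is 13 tones: one 100ms key prefix and twelve 55ms tones.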
Last I checked (~2001), blueboxing was still alive and well. In 2001, exchanges that used C5 signalling were still in existence in various parts of the world (mainly developing countries). There will always be ways to place calls to these parts of the world for free, in other words – while CCITT5 signalling still exists anywhere on the phone network, blueboxing will still be possible, although increasingly difficult. So difficult now that I doubt any but the most oldsk00l of phone hackers even attempt it. However, there are open-source implementations of the C5 standard available, so these days you can install your own phone server and bluebox legally to your heart’s content. Project MF is a C5 implementation for the Asterisk phone server, and you can dial into their test system on (US) +1-630-485-2995. There is also an implementation for Linux-Call-Router, and a test system is available from Blueboxing.org – you can dial into this one on (Germany) +49-4644-9737083 – try it, listen for the 2400Hz ‘answer’ tone.
As of 2001, finding exchanges in the wild that used C5 wasn’t too much of a problem – the problem was in getting the C5 signalling tones past intermediate exchanges. All the tones necessary for C5 signalling are blocked with increasing sophistication as soon as the call leaves the country – when you’ve found a C5 exchange, getting your signalling tones through to it in a form that will still trigger its switching behaviour is pretty much impossible without serious black magic. I don’t doubt that some are capable, but not me. It won’t be long before C5 is completely gone from the network and blueboxing made impossible anyway.
Now pretty much all signalling is done out-of-band; this means the information about how the call should be routed is sent over a different channel to the actual call itself. The way that this normally works is that you have a separate line dedicated to doing the signalling for multiple actual phone lines – each signalling instruction also contains a reference to which line it refers. This is called common channel signalling – using a dedicated channel for signalling like this prevents you (as a telco customer) from being able to inject your own signalling information into the network and change the way your call flows through it. You can see why telcos would be reluctant to implement this – it requires that they have an additional line that cannot be used for actual voice calls. In the early days, that separate signalling channel must have seemed like an unnecessary hit to the telco’s bottom line. It was only once toll fraud as a result of blue boxing became a serious problem that common channel signalling saw wide deployment.
Common channel signalling is still in widespread use today – the example you’re most likely to have heard of is ISDN. No, I’m not talking about the type of ISDN that you used to be fobbed off with if you asked for a decent internet connection – that died with the dawn of broadband. ISDN-PRI is actually a signalling protocol still in common use by companies with many phone lines. In Europe, ISDN-PRI is typically carried over an E-carrier – commonly an E1 line. An E1 line consists of two physical pairs of wires (ie two normal phone lines, or four actual wires) to your local phone exchange. An E1 is a digital data link to your local phone exchange which gives a total of 2.048 Mbit/s of bandwidth in both directions (full-duplex), higher E numbers are also available which give more bandwidth.
In the US, a similar system is used called a T-carrier – this works on the same basic principles but gives a total bandwidth of 1.544 Mbit/s. You can read Wikipedia if you want to know the differences between the two standards.
In the days prior to common availability of broadband internet services, E1 and T1 lines were commonly used to carry internet data for companies who required faster than 56 kbit/s internet access. They can also be used to carry point-to-point data – for example when a company needs to connect two offices together with a “high speed” link – rather than having two 56k modems connected to phone lines and having one modem dial the other, you can have an E1 line at both locations and transfer data between them at 2.048 Mbit/s rather than 56 kbit/s. However, now that much faster internet speeds are available cheaply via DSL and cable, these uses for T and E carriers are all but obsolete.
However, T and E carriers continue to survive for their original purpose – to carry voice calls. When they’re used for this purpose, the protocol that runs over the digital link is called ISDN-PRI. ISDN-PRI divides the link bandwidth up into a number of separate channels – in ISDN terminology these are either D-channels (signalling) or B-channels (bearer). When ISDN-PRI is run over an E1, you get 30 B-channels and 1 D-channel – in other words the 2.048 Mbit/s of bandwidth is split into 30 separate voice channels of 64kbit/s and one common signalling channel also at 64kbit/s (the remaining 64kbit/s timeslot is used for framing and synchronisation). Over T1 you get 23 B-channels and 1 D-channel. The bearer channels can be used to carry data or voice, but most commonly they carry voice using G.711 encoding. A single 64kbit/s channel in telecoms terminology is called a DS0.
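The channel arithmetic is easy to check with a few lines of Python. The B-channel and D-channel counts come from the paragraph above; the framing figures (one 64kbit/s timeslot on an E1, 8kbit/s of framing bits on a T1) are standard background rather than something this post goes into, and the function names are just for the example.

```python
DS0_KBITS = 64  # one 64 kbit/s channel


def e1_kbits() -> dict[str, int]:
    """Break an E1 down into bearer, signalling and framing bandwidth."""
    return {
        "bearer": 30 * DS0_KBITS,      # 30 B-channels
        "signalling": 1 * DS0_KBITS,   # 1 D-channel
        "framing": 1 * DS0_KBITS,      # timeslot 0, framing and synchronisation
    }


def t1_kbits() -> dict[str, int]:
    """The same breakdown for a T1."""
    return {
        "bearer": 23 * DS0_KBITS,      # 23 B-channels
        "signalling": 1 * DS0_KBITS,   # 1 D-channel
        "framing": 8,                  # 8 kbit/s of framing bits
    }


if __name__ == "__main__":
    print("E1 total kbit/s:", sum(e1_kbits().values()))
    print("T1 total kbit/s:", sum(t1_kbits().values()))
```

The totals come out at 2048 and 1544 kbit/s, matching the 2.048 Mbit/s and 1.544 Mbit/s line rates quoted for the two carriers.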
The signalling channel (the D-channel in ISDN) runs a signalling protocol called Q.931 – although this is a digital protocol with perhaps more in common with modern internet protocols, there are definitely some similarities to the old C5 way. Here’s a list of some of the Q.931 messages copied from Wikipedia:
- SETUP (indicating the establishment of a connection)
- CALL PROCEEDING (indicating that the call is being processed by the destination terminal)
- ALERTING (tells the calling party that the destination terminal is ringing)
- CONNECT (sent back to the calling party indicating that the intended destination has answered the call)
- DISCONNECT (sent to indicate a request to terminate the connection, by the end that seeks to terminate)
- RELEASE (sent in response to the disconnect request indicating that the call is to be terminated).
- RELEASE COMPLETE (sent by the receiver of the release to complete the handshake).
Although these messages are sent digitally over the D-channel rather than as audible tones, you can see that they’re remarkably similar to the functions provided by the 2400Hz and 2600Hz tones in CCITT5.
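To show how one signalling channel can look after many calls at once, here is a toy Python model of a D-channel. It is heavily simplified and is not the real Q.931 state machine: the ordering table is my own reading of the message list above, and `Message`, `DChannel` and the field names are invented for the example.

```python
from dataclasses import dataclass

# Which message may follow which, heavily simplified from the list above.
# None means "no message seen yet for this call".
ALLOWED_NEXT = {
    None: {"SETUP"},
    "SETUP": {"CALL PROCEEDING", "ALERTING", "CONNECT", "DISCONNECT"},
    "CALL PROCEEDING": {"ALERTING", "CONNECT", "DISCONNECT"},
    "ALERTING": {"CONNECT", "DISCONNECT"},
    "CONNECT": {"DISCONNECT"},
    "DISCONNECT": {"RELEASE"},
    "RELEASE": {"RELEASE COMPLETE"},
}


@dataclass(frozen=True)
class Message:
    call_ref: int    # which call this message is about
    b_channel: int   # which bearer channel carries that call's audio
    kind: str        # one of the message names above


class DChannel:
    """Tracks the last message seen for every call sharing one signalling channel."""

    def __init__(self) -> None:
        self.last_seen: dict[int, str] = {}

    def receive(self, message: Message) -> None:
        previous = self.last_seen.get(message.call_ref)
        if message.kind not in ALLOWED_NEXT[previous]:
            raise ValueError(f"{message.kind} not expected after {previous}")
        if message.kind == "RELEASE COMPLETE":
            del self.last_seen[message.call_ref]   # call is finished
        else:
            self.last_seen[message.call_ref] = message.kind


if __name__ == "__main__":
    d = DChannel()
    d.receive(Message(1, 5, "SETUP"))
    d.receive(Message(2, 6, "SETUP"))      # a second call on the same D-channel
    d.receive(Message(1, 5, "ALERTING"))
    print(d.last_seen)
```

The important bit is that every message names the call it refers to, which is what lets a single channel carry the signalling for all the bearer channels.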
In England at least, E1 lines running ISDN-PRI are the primary way most small/medium sized companies connect to the phone network – the same applies in America with T1 and Japan with J1. Most companies will normally have one or more E1 lines – in England, BT brand these as ‘ISDN30’, because you get 30 voice channels – it’s the equivalent of having 30 standard analog phone lines (except that they’re carried digitally over four wires instead of the 60 wires that would be required to carry 30 analog lines). However, that doesn’t stop BT charging you basically the same amount they would charge if you did have 30 individual analog phone lines.
In order to make use of an ISDN30, a business needs to attach it to a PBX (Private Branch eXchange). This is normally a special piece of hardware which basically works like a miniature telephone exchange – it’s the PBX that gives you local telephone extensions so that members of staff can call each other using a 3 or 4 digit extension number. PBXs often also provide voicemail, call forwarding, conferencing and various other features.
It’s the PBX that’s responsible for providing the physical sockets that each person’s desk phone would plug into. However, these are not actual phone lines in the normal sense of the term, in fact, often, although they may resemble phone sockets, they may be running a communication protocol which doesn’t even closely resemble the one that a normal analog telephone would use. Because a PBX’s internal extensions and the phones that go with them are normally made by the same company as the PBX itself, there’s no requirement that they be compatible with normal phones, some are, but many are not. In fact, many modern PBXs don’t even provide phone sockets at all – the phones simply plug into the same network sockets as the computers, and all of the company’s phone calls pass over the same local area network as their internet traffic (but more on that later).
Having a PBX has many advantages; a company may only have 30 channels available on its ISDN line, but this does not limit them to only having 30 extensions on their PBX, you may have as many internal extensions and phones as you require. In fact, it’s conceivable that you could run a PBX without ever connecting it to the proper phone network – however, on such a system you would only ever be able to place internal calls. ISDN channels are only used when a call to (or from) the outside world is actually in progress – in other words, you may have 1000 internal extensions, and theoretically if they were all used for internal calls you could use all of them at once (although most PBXs would probably die if you tried that), however, if you only had one E1, a maximum of 30 of your staff could be on calls to the outside world at once – any more than that and you get a busy tone.
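The relationship between extensions and outside lines can be sketched in a few lines of Python. This is a toy model rather than how any real PBX allocates channels; the `Trunk` class and its method names are made up for the example.

```python
class Trunk:
    """A pool of outside-line channels shared by any number of extensions."""

    def __init__(self, channels: int = 30) -> None:
        self.free = set(range(1, channels + 1))
        self.in_use: dict[int, int] = {}          # extension -> channel

    def place_external_call(self, extension: int):
        """Return the channel allocated, or None if every channel is busy."""
        if extension in self.in_use:
            raise ValueError(f"extension {extension} is already on an outside call")
        if not self.free:
            return None                           # caller hears a busy tone
        channel = min(self.free)                  # lowest free channel first
        self.free.remove(channel)
        self.in_use[extension] = channel
        return channel

    def hang_up(self, extension: int) -> None:
        """Release the channel held by an extension, if it holds one."""
        channel = self.in_use.pop(extension, None)
        if channel is not None:
            self.free.add(channel)


if __name__ == "__main__":
    trunk = Trunk(30)
    results = [trunk.place_external_call(ext) for ext in range(1000, 1031)]
    print(results[-2:])   # last two of 31 attempts
```

Internal calls never touch the trunk at all, which is why the number of extensions isn't tied to the number of channels.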
The other thing to note is that ISDN lines are completely decoupled from actual phone numbers – you could have 30 ISDN channels but only one actual incoming number (common in call centres), or indeed only 30 channels but many hundreds of incoming numbers (common in large companies where each member of staff has their own direct dial number).
So that’s the state of play currently – most companies have one or more E1 lines and a dedicated, specialist piece of hardware called a PBX that handles all of their internal phone lines. PBXs are great things and the business benefits of having one are numerous. However, there’s a problem – PBXs are sophisticated bits of hardware, and they’re expensive, very expensive. The least you can reasonably expect to spend on one is about £10,000 and that would be for a very basic system and a company of approx 15 staff with phones. As the size of the system increases, the cost rapidly escalates; a PBX is a major infrastructure cost for most businesses.
This brings me nicely onto the future of telephony – VoIP. However, at the heart of VoIP is perhaps something even more important. The move to VoIP is a product of the fact that it’s now possible to implement an entire PBX system in software and have it run on commodity hardware. The benefits of using commodity hardware are significant for the PBX vendors – rather than having to spend lots of money on R&D for hardware designs, a big section of the hardware development team can be replaced with software development. It’s a lot cheaper to do software development than hardware development, and implementing the majority of the system in software allows the PBX vendor to rapidly respond to changing requirements.
One side-effect of this move to software-based PBXs is that it suddenly makes a lot more sense to carry the actual internal calls over the company’s LAN rather than have the PBX provide its own array of phone sockets. Normally on an old style PBX, the PBX itself would have a row of sockets on the front of it which would be connected via patch panels to the actual wall sockets by people’s desks. These wall sockets are almost always wired with Cat5 cable and terminated with RJ45 sockets – this setup is good for both phone lines and network sockets (a US style RJ11 phone plug will still fit into the larger RJ45 network sockets), the difference being that phone sockets were patched in to the company PBX and the network sockets would be patched into a normal ethernet network hub or switch. When your PBX is running software, it suddenly becomes much more sensible to just have all of your sockets wired in to an ethernet network and have the PBX communicate with the phones over that. This simplifies the actual physical wiring of the network and removes the need to have any bespoke switching hardware in the PBX itself – the PBX effectively becomes just a normal server in a rackmount enclosure.
At one of the companies I’ve previously worked at, when I arrived the system they were running was a normal server in a tower case with an E1 interface card – this was just a PCI card that allowed the server to connect directly to the incoming ISDN30 line. In a lot of cases this is all you need to effectively provide everything that an old PBX would do. Many companies will opt to purchase separate power-over-ethernet (PoE) hubs to use for the phone sockets – using a PoE hub will allow you to power each desk phone via the ethernet cable rather than having to plug in separate power adaptors. The ethernet sockets that a PoE hub provides are exactly the same as normal network sockets except for the fact that they also carry a significant amount of power. If an entirely separate ethernet switch isn’t used for the phones, VLAN tagging often will be – good security design dictates that the phones should be separated in some way from the normal PC network.
The phones themselves on a VoIP network are actually proper network devices with their own IP address and in many cases a web server and control panel application on each phone to allow you to configure them. Both the signalling and the actual voice call are carried over the internet protocol (IP) network. This is an important step away from the original architecture of the public switched telephone network (the PSTN). Until now, the PSTN has always been built on the concept of a circuit-switched network – in other words, when a call is placed from one point to another, an actual open circuit exists between those two places allowing voice communication in either direction. Setting up a call consists of a process of actually switching circuits like in the days of plug boards, it’s just done digitally these days. However, the move towards VoIP means that our voice calls are going to be carried over ethernet, and ethernet is a packet-switched network.
If there is one feature of the core technology of the internet that played the biggest part in its success, it’s the packet switching. The idea with packet switching is that everything is sent over the network as a small packet of data. Each packet starts with a header containing the source and destination of the packet and various other meta-information – this is then followed by the packet payload; the data itself. If the amount of data that needs to be sent exceeds the size of a packet, it is split into multiple packets such that the size of an individual packet never exceeds a fixed amount. If a stream of data needs to be sent (as in the case of voice telephone call) the stream of data is still divided into a set of small packets which are sent out one after another.
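Here is a minimal Python sketch of that idea: a stream of bytes is chopped into packets, each with a small header, and put back together at the other end. The header fields and the 160-byte payload size are illustrative choices of mine (160 bytes happens to be 20ms of audio at 64kbit/s), and are not taken from any particular protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str        # header: where the packet came from
    destination: str   # header: where it is going
    sequence: int      # header: position of this packet in the stream
    payload: bytes     # the data itself


def packetise(data: bytes, source: str, destination: str,
              max_payload: int = 160) -> list[Packet]:
    """Split data into packets whose payload never exceeds max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        Packet(source, destination, index, data[offset:offset + max_payload])
        for index, offset in enumerate(range(0, len(data), max_payload))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the original data, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda packet: packet.sequence)
    return b"".join(packet.payload for packet in ordered)


if __name__ == "__main__":
    audio = bytes(range(256)) * 2                  # 512 bytes of stand-in data
    packets = packetise(audio, "10.0.0.5", "10.0.0.9")
    print([len(p.payload) for p in packets])
    print(reassemble(list(reversed(packets))) == audio)
```

The sequence number in the header is what lets the receiver rebuild the stream even if packets turn up out of order.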
The benefits of sending your data encoded in packets like this aren’t immediately obvious until you consider the impact this has on the way that switching can be done. With a circuit-switched network like the PSTN, switching only happens at the beginning and end of each call, for the duration of the call itself the circuit remains open and no switching happens. However, with packet switching, because each packet of data can be treated independently; switching can happen on a per-packet basis millions of times per second. This has the spectacular side-effect of allowing you to use the same line to be connected to many different destinations at once – it’s packet switching that’s allowing you to have an instant messenger and a web browser running at the same time and sharing the same internet connection despite the fact that they’re connecting to different destinations – doing that on the phone network alone is not easy. The really great thing about packet switching is that you can still send normal streams of data over it – it’s like being able to be on the phone to 100 different places at once using only one phone.
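Per-packet switching can be sketched like this in Python: packets for different destinations arrive interleaved on one line, and each one is looked up and forwarded on its own. The forwarding table, port numbers and destination names are all made up for the example.

```python
from collections import defaultdict


def switch(packets, forwarding_table):
    """Deliver each (destination, payload) packet to the port its destination maps to."""
    delivered = defaultdict(list)
    for destination, payload in packets:
        port = forwarding_table[destination]   # decided afresh for every packet
        delivered[port].append(payload)
    return dict(delivered)


if __name__ == "__main__":
    # Web and chat traffic sharing one connection, packet by packet.
    line = [("web", "GET /"), ("chat", "hi"), ("web", "GET /img"), ("chat", "bye")]
    print(switch(line, {"web": 1, "chat": 2}))
```

Because the decision is made per packet, neither conversation ever has the line to itself, yet both arrive intact and in order.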
Packet switching also allows you to make better use of your bandwidth because an entire line is rarely being fully tied up by a single connection in the way that it is with a circuit switched network. It’s these properties that made packet switching one of the key enabling technologies for the evolution of the internet, and led to ethernet becoming ubiquitous on internal company networks. Because ethernet is now so ubiquitous, using it for normal office desk phones instead of dedicated phone sockets is a very sensible and real possibility – the hardware inside the phones has advanced to the point where it’s quite reasonable to expect them to run a web server and other layer 7 protocols – VoIP phones are now comparable in price terms to their equivalent non-VoIP counterparts. For example at the time of writing it is possible to get a decent VoIP desk phone for approx £75 and a decent basic handset for £40 – this is about what you can expect to pay for non-VoIP phones with equivalent features.
Once internal calls are being carried over ethernet the next logical step is to rid yourself of the E1 line and have telephone calls routed over the same line as the internet connection. This effectively allows a business to replace its expensive E1 or T1 leased-line with a normal business class DSL line. DSL will typically handle more concurrent calls than an E1 as well because it’s not limited to 2.048 Mbit/s. Because of the internet, we’re no longer limited to routing our calls to whichever telco happens to provide E1 lines in our area – as long as we have a decent connection to the internet, we can route our telephone calls over that to one of many VoIP providers. Routing the call over the internet allows it to be sent back onto the PSTN at the closest possible point to the destination number, meaning you can often benefit from local-rate calls in the vast majority of countries in the world (this is achieved with least-cost-routing).
A VoIP provider will take care of least-cost-routing for you (although you can use multiple VoIP providers and set up your own least-cost-routing to use whichever’s cheapest for each call). A VoIP provider will also offer incoming numbers from a variety of countries and various different incoming local rate, toll free, premium rate numbers as well. The way this works is that when a call is made to one of your incoming numbers, the VoIP provider makes a connection over the internet to your phone server and streams the call over that. When you want to dial out to the phone network, your phone server makes a connection to the VoIP provider, says “can I dial this number please?”, sends the number and then waits to be connected, both the signalling and the voice call itself are carried as streams over the internet.
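If you did want to roll your own least-cost-routing across several providers, the core of it is a longest-prefix lookup followed by a price comparison. Here is a rough Python sketch; the provider names, dialling prefixes and per-minute rates are entirely invented for illustration.

```python
# Hypothetical per-minute rates (in pence), keyed by dialling prefix.
RATES = {
    "provider_a": {"44": 1.0, "4420": 0.8, "1": 1.5},
    "provider_b": {"44": 0.9, "1": 1.2},
}


def rate_for(provider_rates: dict[str, float], number: str):
    """Rate for the longest prefix matching the number, or None if none match."""
    best_prefix = None
    for prefix in provider_rates:
        if number.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return None if best_prefix is None else provider_rates[best_prefix]


def cheapest_provider(number: str, rates=RATES) -> tuple[str, float]:
    """Pick the provider with the lowest rate for this number."""
    options = []
    for name, provider_rates in rates.items():
        rate = rate_for(provider_rates, number)
        if rate is not None:
            options.append((rate, name))
    if not options:
        raise LookupError(f"no provider can route {number}")
    rate, name = min(options)          # lowest rate wins; ties go to the first name
    return name, rate


if __name__ == "__main__":
    for number in ("442071234567", "12125550100"):
        print(number, cheapest_provider(number))
```

A real setup would load the rate tables from each provider's price list and re-run the comparison whenever the prices change.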
Notice how I’ve switched from saying PBX to “phone server” – that’s because modern PBXs aren’t really specialist hardware anymore, they’re just normal servers like any other – if we’ve done away with the E1 interface because calls are being sent over the internet, we’ve done away with the need to have an array of sockets on the front because the phones are connecting to the server over the LAN instead of being connected directly to it – there’s actually no specialist hardware left in the PBX at all, it’s all done in software over a normal IP network.
There are a number of software-only PBX systems cropping up, some of which are free and open-source. The most complete and well-known of these is Asterisk. Asterisk basically provides pretty much all of the features you’d get on an enterprise phone system costing tens or even hundreds of thousands of pounds, but Asterisk is free and can be installed on any normal server. In fact, an entire server isn’t even necessary these days – at Digital Crocus we’re currently running our phone system as an Asterisk installation within its own Ubuntu Server installation which is running as a virtual machine under VirtualBox on one of our virtualization nodes. We were originally using Solaris as our host operating system for VirtualBox but have recently (happily) moved to Ubuntu Server for this purpose. We’re still using good old FreeBSD for our normal web hosting environment however.
Running Asterisk under virtualisation probably isn’t to be recommended at this stage: neither Asterisk nor virtualisation technology is mature enough for the two to work flawlessly together. We had some pretty severe timing issues under a Solaris host, which haven’t gone away entirely since moving to the Ubuntu host, although they are significantly improved. We also got some joy by tweaking the kernel timer options in the guest operating system.
Annoyingly, even if you aren’t planning on hooking up your Asterisk server to an ISDN termination and consequently don’t actually need any specialist hardware drivers, you still seem to need a load of Asterisk stuff loaded into the kernel in order to make the timing work properly. I’m guessing this is a legacy of the fact that Asterisk was built to always have the Zaptel (now ‘DAHDI’) ISDN drivers loaded and to rely on them for timing information. By the time the Asterisk developers realised they’d want to run Asterisk without a physical ISDN interface, their code already relied heavily on the ISDN timing information, which can’t easily be emulated without putting some code in kernel space.
Anyway, the upshot seems to be that Asterisk relies on a kernel module for accurate timing information, and without it (or with it being wrong) any playback of audio sounds garbled and shit. Asterisk will run without the kernel timing driver but doesn’t seem to work very well, certainly not under virtualisation. With the Asterisk DAHDI driver loaded under an old version of VirtualBox on Solaris, the timing information was completely bolloxed and playback sounded worse than with no kernel timing driver at all. Some joy was had by tweaking the kernel clock source, but playback quality was still poor. Moving to the latest version of VirtualBox and an Ubuntu host OS improved things drastically: playback is still jittery, but within acceptable limits.
There are other problems with Asterisk, some of which are the kind that a certain kind of geek revels in. There’s no decent graphical configuration engine – you can either configure it with a very basic web control panel, or edit the config files by hand, I’m afraid. Asterisk’s config files are sorta medium complex – not really any worse than those of daemons like Apache or BIND, and not quite as bad as Exim’s or (eew) Sendmail’s. In other words, you don’t need to be a total guru to set up a simple working installation of Asterisk.
However, the real troubles come when you inevitably encounter some bizarre and unexplained problem – these kinds of problems crop up on every system, but Asterisk certainly has its fair share. The problem with Asterisk is that it doesn’t make debugging easy. That’s not to say it doesn’t give you plenty of debugging output – it can and will – it’s just that this output is pretty difficult for the novice to interpret. It’s a hybrid of computer terminology and phone terminology: because the two technologies have grown up semi-separately, they’ve developed their own sets of techie lingo. In Asterisk the two are sorta mashed together, so you need to understand both sets of lingo fairly well, and have a fairly good understanding of how Asterisk actually works internally, before you really stand a chance of understanding any of the debugging output.
What I’m getting at really is that in order to get the best out of Asterisk, you basically need to become an Asterisk developer – you need to have a good understanding of how it works and not be afraid of fiddling with it until you get the results you need. That’s not to say that it’s particularly hard – I would think that any fairly experienced system administrator would be able to gain a good understanding of Asterisk within a few months of playing with it. However, it does require a significant investment of time.
If you’ve got the time to learn it, or you have someone on the team who already knows Asterisk, you can basically have an enterprise-class phone system for the cost of one server and a few hours’ work, plus a nominal fee for each handset. Asterisk puts enterprise-class phone systems into the hands of tiny companies, if they have the necessary expertise. That’s basically what I’ve set up for Digital Crocus – an Asterisk server in a virtual machine with various incoming numbers (0845s in the UK as well as a couple of regional numbers). My direct line is 01273929209 – if you call this number, your call will first be routed to our Asterisk server in Telehouse, which will try my VoIP phone first and then, if that’s uncontactable or there’s no answer, my mobile. All of this is done transparently, and all you’ll hear while it’s happening is a ringing tone. If you call one of the 0845 numbers for Digital Crocus, you get hold music.
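The "try the desk phone, then the mobile" behaviour is easy to picture as a loop. This is only a Python sketch of the idea, not how Asterisk does it internally (in Asterisk this lives in the dial plan). The endpoint names, ring times and the fake dialler are all hypothetical stand-ins so the example can run on its own.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# (endpoint name, seconds to ring before giving up); both values are made up.
Target = Tuple[str, int]

DIRECT_LINE: List[Target] = [("desk-voip-phone", 20), ("mobile", 30)]


def ring_in_order(targets: Sequence[Target],
                  try_dial: Callable[[str, int], bool]) -> Optional[str]:
    """Ring each endpoint in turn and return the first one that answers.

    try_dial(endpoint, ring_seconds) should return True if the call was
    answered, and False if the endpoint was unreachable or nobody picked up.
    """
    for endpoint, ring_seconds in targets:
        if try_dial(endpoint, ring_seconds):
            return endpoint
    return None  # nobody answered; a real system might send this to voicemail


def make_fake_dialler(answering: Set[str]):
    """Build a stand-in dialler that 'answers' only for the given endpoints."""
    attempts: List[str] = []

    def try_dial(endpoint: str, ring_seconds: int) -> bool:
        attempts.append(endpoint)  # record the order in which endpoints were tried
        return endpoint in answering

    return try_dial, attempts
```

For example, with a fake dialler where only the mobile answers, `ring_in_order(DIRECT_LINE, try_dial)` tries the desk phone first, falls through, and returns `"mobile"`. The caller just hears ringing the whole time.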
As for phone handsets, there are two VoIP handsets that I’m familiar with. The first is the Grandstream GXP2000, a pretty standard office desk phone: it supports up to 4 concurrent calls, has a fairly decent speakerphone, offers all the functionality you’d expect from a desk phone and costs about the same as an equivalent non-VoIP phone. There are some known issues with Grandstream firmware and I did find myself having to flash these phones quite a lot. However, I don’t have much of a problem with that – they’re complex devices, VoIP is new technology and it’s not quite stable yet.

Grandstream GXP2000
I’ve also recently bought myself a Siemens A580IP DECT phone for home use. This is a standard cordless phone that comes in three parts: the phone itself, the charging station and the IP base-station. This is handy because it allows me to plug the base-station into my ADSL router downstairs but still have the charging station upstairs where I want the actual phone. The handset itself is an unremarkable DECT cordless handset – it feels fairly well-built and the call quality seems fine.

Siemens A580IP
However, any VoIP phone will suffer if it’s sharing bandwidth with P2P file sharing software like BitTorrent – under such conditions call quality will degrade noticeably. Fortunately this isn’t too much of a problem for me because it’s rare that I use file sharing software – on the occasions when I do need to make a VoIP call and my ADSL line is saturated, I can always quit whatever’s using it until the call is over. However, a better solution is to use a broadband router that supports priority queuing – basically the ADSL or cable modem needs to know that it should prioritise VoIP traffic over everything else. In other words, no matter what else is in the queue, VoIP packets should always jump to the front. The easiest way to achieve this is to use a router with VoIP built-in – DrayTek offer a number of excellent routers with VoIP capabilities.
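To make the "jump to the front of the queue" idea concrete, here's a small Python sketch of strict priority queuing. It's a toy model under my own assumptions, not any particular router's implementation: the class names, the two traffic classes and the packet labels are all invented for illustration.

```python
import heapq
import itertools

# Lower number means higher priority. Two classes are enough for this sketch.
VOIP, BULK = 0, 1


class PriorityLink:
    """Toy outbound queue: VoIP packets are always sent before bulk packets."""

    def __init__(self):
        self._heap = []
        # The arrival counter keeps packets of the same class in first-in,
        # first-out order, and stops heapq from ever comparing the payloads.
        self._arrival = itertools.count()

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._heap, (traffic_class, next(self._arrival), packet))

    def dequeue(self):
        """Return the next packet to send, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, packet = heapq.heappop(self._heap)
        return packet


def drain(link):
    """Send everything currently queued and return the packets in send order."""
    sent = []
    while True:
        packet = link.dequeue()
        if packet is None:
            return sent
        sent.append(packet)
```

If you queue `torrent-1`, `torrent-2`, `voice-1`, `torrent-3` and `voice-2` in that order, `drain` hands back both voice packets first, followed by the torrent packets in their original order. Bear in mind that a router can only reorder traffic it is about to send, so this kind of queuing mostly helps the upstream direction.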