Keeping Broken Voice Data

Corrupted Speech Considered Useful is a research paper about VoIP and transmission errors. It dates from 2003, which may seem like a long time ago (especially in the life of VoIP), but it is still highly relevant because it references AMR, a codec that is now integrated into 3G mobile phones.

AMR is clever in that it is resilient to corruption. In the world of wireless technology that we now inhabit, corruption through bit errors is inevitable: the radio spectrum is a noisy place, and wireless networks are much less reliable than wired ones.

AMR has a data-partitioning scheme, where parts of the voice stream can be classified as A, B and C. The most sensitive parts of the voice stream are put in A, and less sensitive in B and C. If A is corrupted, then most likely the whole packet of speech will be irretrievably damaged, but if the errors happen in B or C, all that will happen is a momentary drop in quality.

VoIP systems mostly use UDP, a protocol for sending datagrams across IP networks. Unlike TCP (used by the web, email and so on), UDP does not retransmit, so packets that are discarded along the way when the network gets congested are simply lost. That suits multimedia, where you want the speech fragments or video frames to arrive in a timely manner: a video frame that arrives eventually (say after a few seconds) is of no use to the receiver, because the video has already moved on.

Alexander Graham Bell making a call: "I don't think UDP-Lite is widely adopted yet"

In the paper, they talk about UDP-Lite, which takes the idea of UDP a step further. Normally, the sender calculates a checksum over the whole packet contents and adds it to the packet. The receiver does the same calculation, and if it gets a different answer it discards the packet, because one or more bits changed en route and the packet is therefore corrupted. UDP-Lite relaxes this check: the checksum can cover just the first part of the packet (always including the header), and damage in the uncovered remainder does not stop the packet being passed to the application (or codec). In the context of AMR, this is a good thing, since the codec knows that the ‘A’ bits are most sensitive and can do its own checks to see whether the speech is ‘good’.

In short, UDP would drop the whole packet for one single damaged bit, whereas UDP-Lite allows a slightly broken packet to be received and processed. In most cases the speech is likely to sound better with a handful of corrupted bits than with 400 missing bits. Although modern codecs have packet loss concealment algorithms using interpolation, having more of the real voice data is generally better.
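A minimal sketch of the difference, under simplifying assumptions: the checksum below is the 16-bit one's-complement sum that UDP uses, but it leaves out the pseudo-header that real UDP includes. In UDP-Lite as specified in RFC 3828, the sender chooses how many leading bytes the checksum covers. The frame layout, the `SENSITIVE` length and the `accept` helper are hypothetical, chosen only for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of `data` (pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def accept(payload: bytes, checksum: int, coverage: int) -> bool:
    """Receiver-side check over only the first `coverage` bytes."""
    return internet_checksum(payload[:coverage]) == checksum


# Hypothetical frame: 4 sensitive bytes followed by 12 less sensitive bytes.
SENSITIVE = 4
frame = bytes(range(16))
full_sum = internet_checksum(frame)                  # UDP style: covers everything
partial_sum = internet_checksum(frame[:SENSITIVE])   # UDP-Lite style: covers the start only

# Flip one bit in the less sensitive tail, as a noisy radio link might.
tail_damaged = bytearray(frame)
tail_damaged[10] ^= 0x01
tail_damaged = bytes(tail_damaged)

udp_accepts = accept(tail_damaged, full_sum, len(tail_damaged))   # False: packet dropped
udp_lite_accepts = accept(tail_damaged, partial_sum, SENSITIVE)   # True: packet delivered

# Damage inside the covered part is still caught by the partial checksum.
head_damaged = bytearray(frame)
head_damaged[1] ^= 0x01
head_rejected = not accept(bytes(head_damaged), partial_sum, SENSITIVE)  # True
```

With partial coverage the receiver still rejects damage to the sensitive bytes, while a bit error in the tail reaches the codec, which can then decide for itself what to do with it.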

This is a pretty powerful idea, and it also extends to video, where now commonplace codecs like H.264 and MPEG-4 also have error-resilience features built in. UDP-Lite doesn’t seem to be in wide use, even 8 years after this paper, but surely now is its golden moment, given the importance of wireless and multimedia?

As you probably guessed, this is an area that Voxygen is actively exploring…

Posted in Digital Communication, Mobile, Multimedia, Voxygen Tech

150 Years of Digital Communication

Morse Code, possibly the most famous communication code ever, was part of an early telegraph system invented by Samuel Morse, described in US patent number 1647, and developed further through the 1840s.

Morse Code exploits the fact that letters of the alphabet do not occur with equal probability in written language. Legend has it that Morse and his colleagues analyzed a local newspaper in Morristown, NJ to find the statistical probabilities of letters in written English. By picking common letters to have short codes, it is possible to gain some compression in sending a message, a principle that is now used in entropy compression algorithms like Huffman coding. Morse represents the most common characters, E and T, as . and -, whereas the less common Y, for example, is sent as -.-- (four symbols).
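As a rough illustration of the saving, the sketch below counts the dots and dashes needed for a word made of common letters and compares it with a fixed-length code. It uses a handful of International Morse codes and ignores timing (a dash lasts longer than a dot) and the gaps between letters, so it is only an indication; the sample words and the `symbols` helper are made up for the example.

```python
import math

# A few International Morse codes; enough for the sample words below.
MORSE = {
    "E": ".", "T": "-", "A": ".-", "N": "-.", "I": "..",
    "S": "...", "O": "---", "Y": "-.--", "Q": "--.-", "Z": "--..",
}


def symbols(text: str) -> int:
    """Count the dots and dashes needed to send `text`."""
    return sum(len(MORSE[ch]) for ch in text.upper())


# A fixed-length dot/dash code for 26 letters needs 5 symbols per letter,
# because 2**4 = 16 is too few and 2**5 = 32 is enough.
FIXED = math.ceil(math.log2(26))

common_word = symbols("TEASE")          # 8 symbols for 5 common letters
fixed_word = FIXED * len("TEASE")       # 25 symbols with the fixed-length code
rare_letters = symbols("ZYQ")           # 12 symbols for 3 rare letters
```

The word built from frequent letters needs about a third of the symbols of the fixed-length code, while rare letters cost four symbols each, which is the trade that a variable-length code makes.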

Wireless Station from the 1940s

Morse Code proved remarkably adaptable, and went on from wires to radio and light. It has been adopted in automated systems as well as for human-to-human communication. Its applications have included news reporting, military and life-saving. This year is the centenary of the sinking of the Titanic, which famously reported its distress using the distress signal CQD (-.-. --.- -..).

Direct-dial voice communication is now commonplace and digital systems have moved on to much more sophisticated and faster signalling systems. Since 2003, the ITU has not required radio operators to be familiar with Morse, which reflects the unlikelihood that you would happen upon a distress call while searching the radio bands. Although Radio Amateurs can (and do) use Morse, it is no longer a condition of the license to undergo a Morse test.

Morse Code’s window is closing now, but for any technology to survive for over 150 years, especially in communications, is quite something.

Posted in Digital Communication, Industry

VoIP, Hogs and Chopped Corn

At Voxygen, our business is very much in the centre of making communications work, so technologies that help to measure and visualize effectiveness are always of interest. A few days ago we got to spend some time with the splendid staff at Malden Electronics, and receive some coaching on their speech quality measurement system, MultiDSLA.

The task in hand was to inject some audio into a smartphone during a VoIP call, and measure the output in terms of the PESQ and POLQA quality measurements. PESQ has been an ITU-T standard since 2001, so it is widely understood by telco engineers. POLQA is a newer standard, which has the benefit of better understanding telecom innovations like packetized voice (e.g. VoIP) and wideband audio.

Here you can see a typical plot, where the input (reference) signal is at the top, the output is in the middle, and the error (the difference between them) is at the bottom. The audio is made up of standardized recorded phrases (known as Harvard sentences), so that you can recreate the same test, and even automate a test set. For example, a phrase used in these tests was “The hogs were fed chopped corn and garbage.”

We looked at a couple of distortion effects that were bringing the quality figures down: one seemed to be due to a missing fragment/packet, and the other, more complex, was to do with automatic gain control. Missing packets are a fact of life in VoIP, which tends to use unreliable (UDP) transport, so packets can be dropped when competition for bandwidth gets high. Voice and video are real-time streams, where it’s more important that the packets are delivered in a timely fashion than that 100% of the packets arrive (or arrive in the right order). Packets that arrive late are junk and need to be discarded. Gaps can often be covered up by error concealment processing in the receiver, and in fact modern codecs like AMR (used in 3G) and G.718 send extra information with the speech to help the receiver do a more effective job if and when it needs to conceal a gap.
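To make the idea of concealment concrete, here is a deliberately naive sketch: each missing frame is replaced by a faded copy of the previous output frame. This is an illustrative stand-in, not what AMR or G.718 actually do; the `conceal` function, the `fade` factor and the tiny two-sample frames are all assumptions made for the example.

```python
from typing import List, Optional

Frame = List[float]


def conceal(frames: List[Optional[Frame]], fade: float = 0.5) -> List[Frame]:
    """Replace each missing frame (None) with a faded copy of the previous output frame."""
    out: List[Frame] = []
    for frame in frames:
        if frame is not None:
            out.append(frame)            # frame arrived in time: use it as-is
        elif out:
            out.append([s * fade for s in out[-1]])  # repeat the last frame, quieter
        else:
            out.append([])               # nothing received yet: no basis for a guess
    return out


# Two consecutive frames lost in the middle of a hypothetical stream.
received = [[0.25, 0.5], None, None, [1.0, -1.0]]
repaired = conceal(received)
# repaired == [[0.25, 0.5], [0.125, 0.25], [0.0625, 0.125], [1.0, -1.0]]
```

Fading on repeated losses avoids holding a loud sound for an unnaturally long time, which is one reason a short gap can pass unnoticed by the listener even though it lowers an automated score.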

One point of note is that the POLQA figures were consistently higher than PESQ for packet audio, as POLQA knows about gaps from missing packets, and is able to realign the data after a gap. PESQ, on the other hand, is the older standard: it doesn’t cope as well with gaps, and treats the slight misalignment between the reference signal and the output as a continuing error through to the end of the call.

The call quality (to the ear) was actually excellent for the tests, even the ones where we were analyzing the reasons for a lower quality score. Quality as perceived by the user is actually the “gold standard” in audio (and in video for that matter), but you can’t always have humans on hand to check the quality. The job of automated tools is to try (as far as possible) to get a machine to create an objective score that is similar enough to the quality as perceived by a human.

Posted in Industry, Mobile, SIP, Voxygen Projects

Visualization: from Cholera to Google APIs

If you haven’t seen any of Hans Rosling’s TED Talks, you should take a look. He’s an entertaining speaker, but importantly he uses a very powerful visual tool to get his point across, the Trendalyzer, that was created by his own organisation, Gapminder.

Broad St cholera clusters (annotated map)

Data visualization has a surprisingly long history. Dr John Snow was said to have created the science of epidemiology when he used a map to show how cholera deaths clustered around the water pump in Broad St, London in 1854.

This visual aid speaks to the power of visualization: the human brain is built to understand pictures, and Snow had found a way to communicate with decision-makers in a way that they could understand in an instant: the cholera was coming from the water.

William Playfair invented both the Bar Chart and the Pie Chart around 1800, but it wasn’t until the Crimean War (1853-56) that Florence Nightingale popularized a variant of the Pie Chart (the polar area diagram), using it to tell the story of the causes of mortality. It seems that the 1850s were some kind of “golden age” for visualization; it’s interesting how some ideas are of their time, and things will get invented, if not by one person then by another.

Returning to Hans Rosling: take a look at this talk about the developing world. This illustrates a couple of interesting things about data visualization, namely that: (1) Hypothesis creation and testing is easier with a visual tool, and (2) pictures can tell a story to a wider audience beyond the researchers.

Rosling is taking multivariate data here (i.e. data with multiple variables at each snapshot), and using graphics and animation to help us to understand the underlying story. One dataset shows us life expectancy vs. average income over time for multiple countries, with visualized population size. But it didn’t look hard the way he presented it, did it?

Magic Table chart from Google

If that chart grabs you, you’ll be happy to hear that it’s now part of the Google Chart API as the motion chart. With a few lines of JavaScript it’s possible to construct all kinds of powerful alternatives to the classic pie chart or Cartesian graph: from the mystifying parallel coordinates graph to the rather beautiful magic table that magnifies the data under the cursor with a fisheye lens.

If you’ve got a complex story to tell, it’s well worth seeking out a novel visualization to help cut through the fog. The human brain has an amazing facility with pictures, and far from the triviality of “adding colour” to a presentation, visualization can mean the difference between getting the point and glazing over.

Posted in Industry

Battery Power and Security

In mobile phones, power is everything. There’s a fixed power budget defined by the size of the battery, and if your battery is getting low, there are only two choices: use less or find more. People demand a lot from their smartphones, and it’s a common sight to find lost souls at the world’s airports looking for a power socket so that they can suck out a few more ergs to send them on their way.

Security is also a concern, and since you’re communicating over wireless you want to know that your voice (or email) is secure at least against casual interception. Encryption uses power, so it is necessary in the mobile environment to balance power usage against security.

An interesting paper that discusses this is Battery Power-Aware Encryption by Chandramouli, Bapatla, Subbalakshmi and Uma. Here the researchers experimented with the DES encryption algorithm, allowing a different number of rounds of encryption to run depending on how much power was available. In DES, you can get higher levels of security by doing more ‘rounds’ of calculations, but each round of course uses CPU, which in turn means power. There is a direct trade-off between encryption strength and power usage. This brings the interesting idea that within a fixed power budget, it’s possible to share out the available encryption rounds across the data stream (or voice conversation). At a particularly sensitive moment in a conversation, you could apply more rounds, and at other moments use fewer: an unequal protection scheme.

The research shows that, compared with a ‘fair’ scheme (where the rounds are shared out equally across the whole stream), an unequal protection scheme can increase the security level by a factor of 2^8, or 256, against a “brute force” attack, where the attacker/interceptor does not know anything about the encryption key.
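The idea of sharing a fixed budget of rounds unevenly can be sketched as follows. This is not the allocation method from the paper, only a simple proportional split; the function names, the sensitivity weights and the budget of 32 rounds are hypothetical, and the sketch says nothing about the resulting security level.

```python
from typing import List


def fair_rounds(budget: int, segments: int) -> List[int]:
    """Share the round budget equally across all segments."""
    return [budget // segments] * segments


def unequal_rounds(budget: int, sensitivity: List[int], minimum: int = 1) -> List[int]:
    """Give every segment `minimum` rounds, then share the rest by sensitivity weight.

    Assumes at least one weight is positive.
    """
    spare = budget - minimum * len(sensitivity)
    if spare < 0:
        raise ValueError("budget too small for the minimum number of rounds")
    total_weight = sum(sensitivity)
    rounds = [minimum + spare * w // total_weight for w in sensitivity]
    # Hand out any rounds lost to integer division, most sensitive segment first.
    leftover = budget - sum(rounds)
    order = sorted(range(len(rounds)), key=lambda i: -sensitivity[i])
    for i in order[:leftover]:
        rounds[i] += 1
    return rounds


# Four segments of a conversation; the third is marked as the sensitive one.
BUDGET = 32
fair = fair_rounds(BUDGET, 4)                            # [8, 8, 8, 8]
unequal = unequal_rounds(BUDGET, [1, 1, 5, 1], minimum=4)  # [6, 6, 14, 6]
```

Both schedules spend the same 32 rounds, and so roughly the same energy under the assumption that each round costs the same; the unequal one simply concentrates the work where the data is marked as sensitive.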

Away from the pure research lab, you would need to create an application that in some way understands the sensitivities within the data, so that it can ask for varying levels of security at different times. At the simplest level, you can imagine having a “secure button” to press at crucial moments, but how about varying the security on a second-by-second basis?

There are some cool application ideas in this that Voxygen is currently working on (and more on these later).

Posted in Design Thinking, Industry, Mobile