The Wild West – Using the Public Internet as a Video Backhaul Solution

October 28, 2013, 12:54 by Chris Perry

(Chris Perry is TelVue’s Director for Systems Engineering)

Over the past six or so months I’ve spent an increasingly large portion of my day talking about how to backhaul video across the public Internet.  Before diving in, let’s clarify what I mean by backhaul.  In situations where you are shooting a game, parade, or other event in the field, you have to transmit that video back to the station or whatever demarcation can get you back into your broadcast system.  Traditional methods such as satellite uplink and microwave transmission have an extensive amount of testing, research, and development behind them; however, there are two main downsides: cost and ease of use.  Sat transmission prices start around $5/minute and require a trained engineer to set up and maintain.  Microwave requires either an FCC license (for reliable transmissions) or an unlicensed transmitter/receiver (more susceptible to interference), and in either case you need line-of-sight or multiple hops in order to establish a connection.

Enter the Internet – prolific, speedy, and cheap.  Sounds great, doesn’t it?  And in many cases the Internet can prove to be an important part of a remote backhaul solution where traditional means are not available or affordable.  However, the Internet does not come without its own set of challenges for video transmission.  In this blog post I hope to address some of the issues and strengths of using the Internet as a live video backhaul solution, as well as some of the products that I’ve tested recently and my findings from those tests.

Let’s begin with a bit of information about the Internet itself.

Internet Hops

The Internet is a vast collection of ever-changing networks that interconnect and pass data from point to point with a varying number of necessary hops in the middle – and it’s a big hot mess.  There are some network administrators who will even tell you that the Internet, as it currently exists, shouldn’t work.  Several projects have attempted to map the public Internet (also sometimes referred to as the commodity Internet) and the results are nothing short of beautiful works of modern art: http://www.opte.org/maps/.  But therein lies the first problem with the Internet as a method of reliable video transmission: it was never built for it.  The “traditional methods” I talked about above are based on a true point-to-point transmission path; the Internet is based on whatever hop is next in succession.  A real-world way to see this is to run a traceroute – for example:

Windows:

Open the Command Prompt: Start -> Run -> “cmd” -> Enter

in the Shell type: tracert google.com

On Mac:

Open the Terminal in Applications -> Utilities

in the Shell type: traceroute google.com

If you run these commands correctly you will see a series of servers and their relative “distance” (measured in milliseconds).  “Distance” here is relative – a hop between two points thirty miles apart could have a very high latency, yet a trans-Atlantic fiber cable could have very little.  This is due in large part to other traffic on the Internet causing “congestion,” which slows everything else down.  Here is the output of a traceroute from my Mac while on a Verizon MiFi connection:

——————————————————————————————————–
MacBookAir:~ CPerry$ traceroute google.com

traceroute: Warning: google.com has multiple addresses; using 173.194.43.32
traceroute to google.com (173.194.43.32), 64 hops max, 52 byte packets

1  172.20.10.1 (172.20.10.1)  2.738 ms  2.201 ms  2.005 ms
2  193.sub-66-174-20.myvzw.com (66.174.20.193)  30.770 ms  34.777 ms  32.890 ms
3  49.sub-69-83-13.myvzw.com (69.83.13.49)  30.954 ms  39.276 ms  32.279 ms
4  194.sub-69-83-13.myvzw.com (69.83.13.194)  30.803 ms  33.884 ms  37.763 ms
5  * * *
6  101.sub-66-174-17.myvzw.com (66.174.17.101)  119.669 ms  34.654 ms  26.834 ms
7  * * *
8  0.ge-5-0-0.xl3.bos4.alter.net (152.63.18.193)  125.600 ms  31.569 ms  35.709 ms
9  0.xe-6-1-2.xt1.nyc4.alter.net (152.63.0.166)  41.893 ms  40.676 ms  39.657 ms
10  * * *
11  * * *
12  72.14.238.232 (72.14.238.232)  141.401 ms  39.185 ms  41.594 ms
13  72.14.237.254 (72.14.237.254)  46.898 ms  44.301 ms  47.663 ms
14  lga15s35-in-f0.1e100.net (173.194.43.32)  43.034 ms  45.602 ms  45.080 ms

——————————————————————————————————–

You can see in this instance there are 14 hops from my MacBook Air to a Google edge server.  Some of those hops (5, 7, 10, 11) couldn’t be identified by the traceroute program (most likely because those routers have ping responses disabled).  This means that if I were to send a live video stream from my Mac to Google it would have to pass through at least 14 hops!  Far from ideal for live video requiring low latency and high quality.  Take a few minutes and try doing a traceroute from your computer to an Internet site – you may be surprised by how many hops you have to go through to get there.

With this baseline established we can now look at how Internet backhaul systems can work.  Your basic system end-to-end looks sort of like this:

Video Source –> Encoder –> Internet –> Decoder –> Video Destination

Encoding Formats

As the chain above suggests, it all begins at your remote site; you’ve got a camera, or a camera and switcher setup, and you have a baseband video signal output from that switcher or camera.  In order to transport that across a network you have to encode that signal.  An encoder is generally a hardware box that has an A/V or SDI baseband input and a network port for video output; some systems such as TriCaster have built-in live encoders that don’t require the use of an external box for certain streaming formats.  Regardless, these encoders are going to digitize the video into a transportable format.  The most easily transportable format for Internet video is the widely available H.264.  Offering a balance of moderate latency, high compression, and high quality, H.264 is by far the most common encoding format for video on the Internet.  However, once you encode the video it must then be wrapped in some sort of container for actual transport.  Containers include MPEG-2 Transport Stream over UDP, RTMP (Flash), RTP, HLS (HTTP Live Streaming), and HDS (HTTP Dynamic Streaming), among others.  The encoding and wrapping steps are often one and the same – however, I want to make a distinction here as I will be discussing re-wrapping later on.  For TelVue’s CloudCast platform we use a combination of RTMP, HLS, and HDS to support live web streams to desktop and mobile players, depending on the application.
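To make the encode-versus-wrap distinction concrete, here is a minimal sketch using the free ffmpeg command-line tool (my own illustration, not one of the products discussed in this post); the bitrates, addresses, and stream key are placeholders, and the exact flags assume a reasonably recent ffmpeg build:

——————————————————————————————————–
# Same H.264/AAC encode, two different wrappers:

# 1) Wrapped as an MPEG-2 Transport Stream over UDP (best suited to closed networks):
ffmpeg -re -i source.mp4 -c:v libx264 -b:v 2500k -c:a aac -b:a 128k -f mpegts udp://10.0.0.50:5000

# 2) The same encode wrapped as RTMP (Flash) and pushed to a media server or CDN:
ffmpeg -re -i source.mp4 -c:v libx264 -b:v 2500k -c:a aac -b:a 128k -f flv rtmp://example.com/live/streamkey
——————————————————————————————————–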

Drilling down here:

  • Transport Stream (UDP) does an inherently terrible job with open Internet video transport.  However, UDP is low latency and low overhead (precisely because it includes no resend capability, which would add both latency and overhead).  These two factors mean that it’s optimal for closed networks such as municipal WAN connections or LANs, because it can be multicast across a network.  Multicast is an extremely powerful tool for sending a single video stream into a network and being able to pick it up anywhere, but it is not intended for distributing streaming video across the public Internet.
  • RTP is based on a counter that gets added to the video – sort of like timecode – that puts video packets back into the correct order to ensure smooth viewing.  RTP can do very well in situations where you have guaranteed bandwidth, but it can’t recover streams that flat-out vanish or suddenly encounter periods of major packet loss, as it still technically relies on UDP underneath.
  • RTMP (Real Time Messaging Protocol) can come in a number of different flavors, but typically relies on TCP.  TCP adds a layer of abstraction to the packetized video stream, providing an acknowledgement of packet reception that helps ensure the stream arrives correctly – if a packet is lost, TCP facilitates its retransmission.
  • HLS (HTTP Live Streaming) and HDS (HTTP Dynamic Streaming) are Apple’s and Adobe’s respective answers to live video transport across the Internet.  The trick here is that these formats are treated much more like VOD (Video on Demand).  HLS and HDS actually make small files, say 5 or 10 seconds long, that are downloaded in succession and stitched back together on the decode device/viewing end.  The result is that the decoder is now responsible for downloading all these little files, and if one gets lost you could lose 5 or 10 seconds of video, but not an entire stream.  Smaller file “chunks” mean you would lose a smaller piece of the stream.  These formats also typically mean higher latency, which is not a problem if you are sending a fully produced feed back to your station/demarcation; but if you need to have a “reporter talkback” with someone in the studio, 30 seconds of latency would be an insurmountable delay.  HLS and HDS also support ABR (Adaptive Bitrate).  When you encode ABR-based HLS or HDS you are actually doing multiple simultaneous encodes at different bitrates/resolutions/qualities – those are “key-frame aligned,” allowing the decoder/viewer to jump dynamically between the various streams as their download bitrate fluctuates.  As a result, HLS and HDS have become the standards for delivery to mobile devices, as bandwidth on cellular networks tends to fluctuate dramatically based on a number of factors including concurrent usage, coverage areas, and signal strength.  ABR streaming usually requires higher-end encoders, but can provide a better overall quality of experience for the viewer.  See the diagram and the short sketch below.

[Diagram: HLS-HDS adaptive bitrate streaming]
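To make the chunking behavior concrete, here is a minimal sketch of an HLS output using the free ffmpeg tool (again my own illustration; segment length, bitrate, and paths are placeholders): the encoder cuts the live encode into small .ts segment files and keeps rewriting a short playlist that the player downloads, fetching the segments in succession and stitching them back together.

——————————————————————————————————–
# Cut a live H.264/AAC encode into ~10-second HLS segments plus a rolling playlist:
ffmpeg -re -i source.mp4 -c:v libx264 -b:v 2500k -c:a aac -b:a 128k \
  -f hls -hls_time 10 -hls_list_size 6 /var/www/live/stream.m3u8
# The player requests stream.m3u8, then downloads stream0.ts, stream1.ts, ... as they appear.
——————————————————————————————————–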

Enter FEC, or “Forward Error Correction.“  FEC is effectively a stream wrapper that contains a method of recovering lost packets.  FEC is to packets what RAID is to disks: by adding some additional data (usually on the order of 10-20%) to the stream of packets, it makes it possible to lose a packet and still recover all the original data mathematically – just as RAID 5 makes it possible to lose one disk out of many and recover all the data from that disk without having to use twice as many disks.  A simple example is an extra parity packet computed as the XOR of a group of data packets: if any one packet in that group goes missing, it can be rebuilt from the parity packet and the packets that did arrive.  Originally used for satellite distribution models, FEC was introduced to ensure that live video arrives without errors, and more importantly without the use of a “feedback loop” (a request from a sat receiver back to a sat transmitter via the same transmission path would be impossible).  This, again, puts the burden on the decoder to stitch the video back together and deliver a final product – delayed slightly to account for the error correction (although latency is typically very low with FEC).  Sounds great, right?  While FEC is integral to sat transmission, the latency and “bursty-ness” of the Internet can cause plain FEC algorithms to fall apart and become unable to keep decoding the video in real time.  There are a few companies that have harnessed FEC in some very creative ways for streaming video across the public Internet – I will discuss those later on in this post.

More Factors

So now that you’ve encoded your video you need to transport it across the Internet.  And since the Internet is everywhere, this should be a piece of cake, right?  Well… yes – and no.  There are all sorts of considerations when connecting to the Internet: you have firewalls, bandwidth constraints, cellular networks, bandwidth caps, and, remember, all those hops in between where you are and where the video is going.  Any time that you plan to stream video for an event you should plan on doing a site survey, as well as an end-to-end fax (facilities) check.

  • Firewalls – For the most part, pushing a stream out through a firewall is not a big deal.  Some retransmission methods require a feedback loop to recover lost packets and thus require you to have control over the firewall you are behind (so a Starbucks WiFi hotspot would be problematic).  You also need to have control over the firewall at the decode end to correctly route the traffic to the decoder/server at the studio/demarcation.
  • Bandwidth – Having enough bandwidth available to stream is probably the single biggest part of the streaming equation.  Unfortunately it’s not as simple as saying “well, I have a 50Mbps x 50Mbps line from my ISP (Internet Service Provider).”  Here again, we are slaves to the hops in between having enough bandwidth to transport your stream back to the studio/demarcation.  For the most part this isn’t a problem, as the links are at least a gigabit (in reality 10 Gbps is the minimum size link for major ISPs), but during peak hours or special circumstances these links can become saturated and latency goes up.  When latency goes up, the chance of packet loss also increases, and thus the chance your stream will arrive intact is diminished.  For those interested in digging further into Internet routing, read up on “Border Gateway Protocol” (BGP).  Sometimes it’s possible for you to find an alternate route with less packet loss to your destination, but this frequently requires days of work, countless phone calls, and hearing lots of “no.”
  • Cellular networks and Bandwidth caps – As I mentioned above, cellular networks fluctuate based on a number of factors – not the least of which is usage.  In metropolitan areas, where there can be a huge number of people on the cell networks, quality of service becomes a huge issue.  A case in point: after the Boston Marathon bombings there were so many people using their cell phones in the city of Boston that the network buckled under the pressure and the ability to get a call through was greatly diminished.  This was also seen after Hurricane Sandy (not so much due to usage as due to damage), and after many other major events in recent history.  Sports stadiums have added sometimes hundreds of micro cell towers to their facilities to handle demand during games and other events.  All of this said – cell networks in the US are highly tuned for data transport – and provided you have a good signal in an area that is not over-saturated with other data users, cellular can be a great way to “go live from anywhere.”  Of course you have the dreaded ‘data cap’ that the cell companies have introduced to support metered billing; and if you’re a normal smartphone user you know that as long as you don’t watch hundreds of videos on your phone, caps are not normally an issue.  With video transmission, however, data caps can become a huge issue.  If you plan on building out a streaming solution based on cellular, take the time to calculate how much data you need.  We have a handy calculator on our website: http://www.telvue.com/support/calculator/ .  Quick example: 1 hour of HD video at 2.5 Mbps (720p) is about 1.13 GB of data (see the quick calculation after this list).  My two cents: wire-line is always best, if you can get it, as there are usually fewer variables to worry about.
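For reference, here is a minimal back-of-the-envelope version of that calculation that you can run in any shell (assuming 1 Mbps = 1,000 kbps; the TelVue calculator linked above is the more complete tool):

——————————————————————————————————–
# Rough data usage for 1 hour of video at 2.5 Mbps (2,500 kbps):
# kilobits/second x seconds, divided by 8 bits-per-byte, divided by 1,000 = megabytes
echo "$(( 2500 * 3600 / 8 / 1000 )) MB"   # prints "1125 MB" – roughly 1.1 GB
——————————————————————————————————–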

Broadcast Integration

If you’re still with me, your video is encoded and is now being transported back to your station/demarcation point.  Now it needs to be integrated into the broadcast plant.  There are several ways to do this, some more professional than others.  The “quick and dirty” method is to use a computer running VLC, or some other decode software, take the video full-screen, and feed the output of the computer directly into the necessary scan converter/encoder to broadcast the signal.  This method is great if you find yourself in a pinch where something has to be done quickly and there are personnel on hand to fix or restart the decoder if something happens, but it is not so great if you frequently use Internet-based backhauls and need an unattended solution.  For those we look to stand-alone decoders and media servers that are capable of re-wrapping the incoming stream into a format that can be used on an IPTV network, or for integration with our HyperCaster platform.  Most companies that make encoders for Internet transport have also made some sort of matched-pair decoder.  If you are looking for a more seamless solution for integration into an IPTV plant, a Wowza Media Server or Elemental Live is capable of taking an RTMP feed and turning it into UDP.  Elemental Live has the added benefit that it can transcode that H.264 media over to MPEG-2 if you are working within an MPEG-2 plant.  There are also methods of un-wrapping FEC and obtaining a UDP transport stream, again mitigating the need for a full decode/re-encode step, which can be costly.
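As a do-it-yourself illustration of the re-wrap idea (a sketch only, not a substitute for the Wowza or Elemental workflows), the free ffmpeg tool can pull an RTMP feed and re-wrap it as a UDP transport stream without re-encoding; the server URL and multicast address here are placeholders:

——————————————————————————————————–
# Pull an incoming RTMP feed and re-wrap it as an MPEG-2 Transport Stream over UDP,
# copying the existing H.264/AAC media rather than decoding and re-encoding it:
ffmpeg -i rtmp://mediaserver.example.com/live/streamkey -c copy -f mpegts udp://239.1.1.1:5000
——————————————————————————————————–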

Down to the brass tacks.  What products do you need to build something like this?  Solutions range from do-it-yourself to turn-key.  The question to bear in mind here is: “How often will this be used?” as the answer will justify (or not) the cost or the time involved in building something yourself.

  • Flash Media Live Encoder (FMLE) is a great piece of free software.  Using FMLE and a capture card you can very quickly build a streamer that can publish to a CDN (Content Delivery Network) such as Akamai, or back to a local media server (such as Wowza Media Server).
  • Wowza Media Server is capable of taking in RTP and RTMP feeds and creating UDP, which can be distributed onto an IPTV network (and thus fed right into our HyperCaster).  Wowza, while pretty nifty, can be challenging to set up if you are unfamiliar with its architecture.
  • The Elemental Live platform recently added the ability to take an RTMP feed in and turn it around digitally into a UDP stream; Elemental has the added benefit that it can transcode the stream to any other format or type.  Like the Wowza option, this pairs well with our HyperCaster platforms.
  • Teradek makes a whole series of encoders and decoders.  Some of the models have Ethernet or WiFi, while others have cellular.  These streamers are based on the RTMP protocol, which makes them good, relatively inexpensive devices that are virtually turn-key; but in high-packet-loss situations RTMP can be problematic.
  • LiveU offers some pretty nifty ways of transporting video across the Internet, with a heavy focus on the ability to use cell networks.  They are one of the few companies doing some incredible things using special types of FEC and cellular bonding.  Cellular bonding is a method of harnessing multiple different cell data cards, typically across multiple carriers, to create a larger, more stable Internet pipe suitable for video transport.  At the receiving end they have a special decoder that takes all the various streams and stitches them back together, creating a continuous live stream.  The FEC is only applied as necessary, and there is a control layer, or feedback loop, between the encoder and decoder to continuously monitor what’s going on and rapidly respond to a change in the link topology.  LiveU is very heavily used in news-gathering circles due to its portability and extremely low latency.
  • QVidium is another company doing some fascinating things with Internet video transmission.  They’ve created a special protocol called “ARQ” (Automatic Re-transmission reQuest), similar in spirit to FEC – except it uses a feedback loop.  Using UDP as a protocol for low latency, the decoder will call back to the encoder if there is packet loss and request that the lost packets be sent again.  Like the other methods mentioned, the burden of stitching the stream back together is on the decoder.  QVidium also has a special Linux-based proxy software that can un-wrap the ARQ and turn it directly into UDP – which is great if you need to distribute it right out onto an IPTV-based network, and which again integrates beautifully with StreamThru on the TelVue HyperCaster.

My two cents: keep an eye out for HLS and HDS options that will likely be coming down the pipe. The adaptive bitrate nature of these formats will be especially powerful in situations where a thirty-second delay is not an issue.

Make Good Choices

So which one is right for you?  Again, bear in mind how often it will be used, and for what application.  The “go live from anywhere” idea has been, and continues to be, a very exciting prospect, but investing in an expensive solution that will get used twice a year may not have the cost-benefit you are looking for, while the do-it-yourself setups could require more time and expertise than you are willing to commit.  It’s a balance between the two, for sure; there are even some rental options out there for products such as LiveU.  Most of these products also have the ability to choose quality over latency (or vice-versa): by increasing the delay on a stream, the decoder has additional time to fix any possible issues.  I’ve had very positive results with all the products mentioned above – but all were used in slightly different contexts, and there are dozens of other products on the market that I haven’t tested that aim to accomplish the same task.  Most issues I’ve encountered are specifically related to the Internet piece, which is one reason I spent so much time talking about its topology.

All of these encoders and decoders work great on a closed network; it’s the unpredictability of the public Internet that introduces a level of uncertainty about performance.  Just as important as raw performance is how well these encoders and decoders can recover from an unknown event precipitated by an unforeseen circumstance.  Yet, at the end of the day, a well-crafted solution for Internet backhaul proves that there is more to the public Internet than just memes and cat pictures.


2 Comments

  1. Thanks for the info on products especially. I have tested several scenarios and Elemental Live looks good for the final connection. For down and dirty we have used Skype/Skype mobile with OK results.

  2. cperry says:

    Great point, Greg – something that I didn’t really talk about was using Skype. Skype has some terrific advantages over other video calling services – primarily that the emphasis is on high-quality audio. In lots of situations viewers will tolerate instances of bad video, but the audio must be clear. Skype also has the advantage that lots of people use it already, and thus the interface is simple to use. You can always dedicate a decommissioned Mac Mini with an HDMI output for use as a Skype terminal server.
