
Sustaining Full Duplex Line Rate Capture to Disk on 100G Connections

July 26, 2016

Guest blog by Andrew Watters, Raellic CEO/Director

Advances in hardware in recent years have made it possible for anyone with a reasonable level of experience to build a device that captures all traffic on gigabit-level networks without packet loss.

I sell one such device called The Vision™. At today's performance levels, with spinning disks that can each write more than 200 MB/s, it only takes two hard drives in RAID 0 to guarantee full duplex line rate capture to disk on a saturated 1G connection. With three hard drives in RAID 0 topping out at over 600 MB/s, you could potentially tap two full duplex 1G connections and capture both streams to disk at the same time.
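The arithmetic behind that claim is simple: a saturated full duplex 1G link carries at most 2 Gb/s, or 250 MB/s, of traffic. Here is a minimal sanity-check sketch in Python, assuming the ~200 MB/s sustained write per disk cited above:

```python
import math

# Back-of-the-envelope check of the 1G claim above. The 200 MB/s
# sustained-write figure is the assumption from the text.
LINK_MBPS = 1000                       # one direction of a 1G link
DISK_MB_PER_S = 200                    # sustained sequential write per drive

full_duplex_mb_per_s = 2 * LINK_MBPS / 8   # 250 MB/s across both directions
drives = math.ceil(full_duplex_mb_per_s / DISK_MB_PER_S)
print(drives)                          # 2 -> two drives in RAID 0 suffice
```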

With such a low barrier to entry, 1G capture to disk is just not as sexy as it used to be, which raises the question: is there an upper limit on capture-to-disk performance?

Time Division Packet Steering

Emerging technologies have led me to believe that there is no upper limit, and I propose here the solution I came up with for continuous 100G full duplex line rate capture to disk. Apparently there is no industry-standard term for this strategy, so I suggest calling it "time division packet steering."

The genesis of this project was the Snowden files, which refer repeatedly to the government's ability to tap massive data rate connections such as undersea cables.

I kept ruminating on how I might do it with commercial off-the-shelf hardware, but I couldn't figure it out until it hit me one day: divide and conquer a high data rate stream into pieces small enough that each machine captures to disk at a rate within its hardware limits. The resulting design is troubling and interesting at the same time, because it would enable nation state-level entities to record all of their traffic and replay as much as several days of it preceding an event of interest. In the U.S., traffic replay would be really useful for investigating cyber incidents and terrorist attacks. In Iran, it would certainly be an instrument of control. I don't claim to have all the answers to such problems.

[Image: NSA PRISM upstream big data collection]

The Vision Omega™

In any event, I believe there are a couple of ways to capture to disk on a 100G connection without packet loss, some of which would require a lot of further research and development (email me about "wavelength division packet steering" if you like). The most straightforward solution using publicly available current technology appears to be the following:

[Diagram: The Vision Omega processing pipeline]

  1. Insert a Garland passive 100G TAP to copy and send 100% of the network traffic from each direction. Each copied stream goes into an FPGA-based NIC in one of two master machines (one master machine for each direction of traffic);
  2. Steer the packets received on the 100G NIC across twenty 10G ports on the same machine in round-robin order, so each port sees every 20th packet (see the sketch after this list);
  3. Send each 10G stream to a cluster unit in a 42U storage cluster;
  4. Capture the 10G stream on each cluster unit using a modified version of tcpdump;
  5. Write the stream to a RAID 0 array of three hard drives; and
  6. Merge the resulting capture files with tcpslice for later analysis.
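For intuition, here is a minimal sketch of step 2's per-packet round-robin steering, assuming a simple index-based dispatch; in the real design this logic would live on the FPGA NIC, not in software:

```python
NUM_PORTS = 20  # twenty 10G egress ports per master machine

def egress_port(packet_index: int) -> int:
    """Round-robin dispatch: successive packets rotate through the ports,
    so any single port carries a twentieth of the 100G stream."""
    return packet_index % NUM_PORTS

# Packets 0..21 land on ports 0..19, then wrap back to ports 0 and 1.
print([egress_port(i) for i in range(22)])
```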


Alternatively, I could send all packets received on the 100G NIC during a particular 50 ms time period to one of the 10G NICs, again on a round robin basis. In this way, the master machines split each direction of traffic into twenty smaller streams, and each of forty cluster units captures those streams at a maximum of 625 MB/s. This rate is near the maximum write performance of three top-of-the-line 8 TB hard drives in RAID 0. With a 42U cluster I would have forty active capture units plus two hot spares, capturing at the theoretical 100G maximum of 25,000 MB/s in full duplex. I would merge the resulting binary capture files using the standard tcpslice utility with high resolution (4 ns) timestamps.
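A minimal sketch of that time-division variant, assuming the FPGA exposes each packet's arrival timestamp; the window length and function name are illustrative:

```python
WINDOW_NS = 50 * 1_000_000  # 50 ms steering window, in nanoseconds
NUM_PORTS = 20              # 10G ports per master machine

def port_for(arrival_ns: int) -> int:
    """All packets arriving within the same 50 ms window share one port,
    and the window advances round-robin. Over time each port carries
    1/20th of one direction's 12,500 MB/s, i.e. the 625 MB/s budget."""
    return (arrival_ns // WINDOW_NS) % NUM_PORTS

# Two packets 50 ms apart land on consecutive ports.
print(port_for(0), port_for(50 * 1_000_000))  # 0 1
```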

The system takes up one and a quarter racks and would cost at least a quarter million dollars to build. If I used two storage clusters, one for each direction of traffic, the traffic replay window would double. I could add more storage clusters and master machines to scale up to any level of traffic without too many changes.

I suspect this is a smaller-scale version of how the U.S. government captures massive data rate connections: according to the Snowden files, they have been doing it for many years, and there doesn't seem to be another approach that makes sense, because:

  • Using SSDs doesn't make sense because their service life would be short in such a demanding environment.
  • Using RAM disks as buffers doesn't make sense because that would not support continuous line rate capture to disk, only bursts.

What would make sense is using the FPGA NICs to drop uninteresting traffic and capture only interesting traffic to disk. But that would defeat the purpose of 100% capture to disk and would not permit full traffic replay.

The system I propose also enables long-term, indexed storage in a database system of the customer's choice. I suggest iterating over the binary capture files with tshark, converting them to ASCII on the fly, and saving the results in a high performance database.
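A minimal sketch of that indexing pass, assuming tshark's field-extraction output and an SQLite target; the field list and schema are illustrative choices, not part of the design:

```python
import sqlite3
import subprocess

# frame.time_epoch, ip.src, ip.dst, and frame.len are standard tshark
# field names; the SQLite schema is an illustrative choice.
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "frame.len"]

def index_capture(pcap_path: str, db_path: str = "captures.db") -> None:
    # tshark -T fields emits one tab-separated line per packet.
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in FIELDS:
        cmd += ["-e", field]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    rows = [line.split("\t") for line in out.splitlines()]
    rows = [r for r in rows if len(r) == len(FIELDS) and all(r)]  # IP frames only
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS packets"
                " (ts REAL, src TEXT, dst TEXT, length INTEGER)")
    con.executemany("INSERT INTO packets VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()
```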

The Vision Omega with long-term, searchable storage appears to be very similar to the XKEYSCORE system revealed by Snowden. According to the XKEYSCORE slide in the FBI's PRISM PowerPoint, there were roughly 150 sites around the world with at least 700 servers, implying that each site captured a large portion of the internet but not all of it. This would probably cost tens of billions of dollars, but it could be done using the approach I propose.

There are, of course, numerous challenges to implementing this system. This is a starting point, not an ending point, and you have to start somewhere. I welcome and encourage comments, because I am going to be sweating bullets for a long time if my financial partner puts up the money to build this device. Contact me directly at director@raellic.com.


Heartbeat Packets Inside the Bypass TAP

If the inline security tool goes off-line, the TAP will bypass the tool and automatically keep the link flowing. The Bypass TAP detects a tool failure by sending heartbeat packets to the inline security tool. As long as the inline security tool is on-line, the heartbeat packets will be returned to the TAP, and the link traffic will continue to flow through the inline security tool.

If the heartbeat packets are not returned to the TAP (indicating that the inline security tool has gone off-line), the TAP will automatically 'bypass' the inline security tool and keep the link traffic flowing. The TAP also removes the heartbeat packets before sending the network traffic back onto the critical link.

While the TAP is in bypass mode, it continues to send heartbeat packets out to the inline security tool. Once the tool is back on-line, it will begin returning the heartbeat packets to the TAP, indicating that it is ready to go back to work. The TAP will then direct the network traffic back through the inline security tool, along with the heartbeat packets, placing the tool back inline.
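To make the sequence concrete, here is a toy model of that heartbeat logic, written as a software stand-in for what a real bypass TAP does in hardware; the timeout value, marker bytes, and method names are all illustrative assumptions:

```python
import time
from typing import Optional

class BypassTap:
    """Toy model of the heartbeat/bypass behavior described above.
    The timeout, marker bytes, and method names are illustrative."""

    HEARTBEAT = b"HEARTBEAT"  # placeholder marker for heartbeat frames

    def __init__(self, timeout_s: float = 0.3):
        self.timeout_s = timeout_s
        self.last_echo = time.monotonic()
        self.bypass = False

    def send_heartbeat(self) -> None:
        # Emitted continuously, even in bypass mode, so a recovered tool
        # can start echoing heartbeats and be placed back inline.
        pass  # a real TAP would transmit a heartbeat frame toward the tool

    def on_heartbeat_echoed(self) -> None:
        # The tool returned our heartbeat: it is on-line, so go inline again.
        self.last_echo = time.monotonic()
        self.bypass = False

    def tick(self) -> None:
        self.send_heartbeat()
        if time.monotonic() - self.last_echo > self.timeout_s:
            self.bypass = True  # no echo: bypass the tool, keep the link up

    def to_link(self, frame: bytes) -> Optional[bytes]:
        # Heartbeat frames are stripped before traffic re-enters the link.
        return None if frame.startswith(self.HEARTBEAT) else frame
```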

Some of you may have noticed a flaw in the logic behind this solution!  You say, “What if the TAP should fail? It is also inline, so the link will also fail!” The TAP would then be a single point of failure. That is a good catch – but as I explained in our blog on Bypass vs. Failsafe, if a TAP were to fail or lose power, it must provide failsafe protection to the link it is attached to. So our network TAP will go into failsafe mode, keeping the link flowing.

Glossary

  1. Single point of failure: a single component of an IT network whose failure can bring down a larger part of the system, or the entire system.

  2. Heartbeat packet: a soft detection technology that monitors the health of inline appliances.

  3. Critical link: a connection between two or more network devices or appliances whose failure disrupts the network.
