From 794e9853b8d947f3f70dcf60d277e8b00032e8f5 Mon Sep 17 00:00:00 2001
From: Jake Read <jake.read@cba.mit.edu>
Date: Wed, 24 Jan 2018 00:44:24 -0500
Subject: [PATCH] more cleanup

---
 README.md          | 151 +++++++++------------------------------------
 network-plotter.md |   1 +
 2 files changed, 29 insertions(+), 123 deletions(-)

diff --git a/README.md b/README.md
index c27bd40..3b0e6ff 100644
--- a/README.md
+++ b/README.md
@@ -1,115 +1,54 @@
 # Tiny Nets
 
-TinyNets presents a networking strategy for distributed robotic control systems. Resilient Stateless Multipath Message Passing for Very Fast Very Small Messages: RSMMRVFVSM ... 
+TinyNets is stateless, resilient multipath message passing for networked control systems. 
 
-## Networked Control Systems (NCS)
-
-The field of Networked Control Systems, or NCS, is unique from many other networking fields. NCS refers to any application where many devices are linked together to perform control of a physical system. They are common in robotics and avionics, where many sensors and actuators work together to perform a common goal (i.e. walking, stabilization, etc), and in manufacturing, where machine degrees of freedom are linked to close positioning control loops, and where multiple machines are linked to coordinate material handling and production scheduling. 
-
-Critically,
-- In NCS, **total throughput is valued but not a key metric.** 
-- Rather, message sizes are typically very small (between three to fifty bytes) and message delay time is the critical metric. Often, messages are only one-packet in length.  
-- **Determinism in Message Delivery Time is critical** - systems must guarantee that certain control loops 'close' within a defined set of time, less they become unstable. 
-- **Robustness is critical** - NCS should not contain any Single Points of Failure
-- **Statelessness is critical** - NCS should not pause operation under any circumstances to re-converge on routing solutions, as this adds fatal indeterminism to message delivery. 
-
-## The State of the Art in NCS
-
-State of the art Networked Control Systems employ simple Switched Ethernet, or proprietary versions thereof, in order to route traffic. Hardware endpoints are fitted with an Ethernet PHY and are connected in a heirarchy of switches. Ethernet MAC addresses are used, and all routing takes place on Layer 2. 
-
-Switched Ethernet has become the industry standard because of its relative interoperability and high speeds. Critically, the last 10 years has seen Switched Ethernet take up large portions of market share because it solves many problems associated with Fieldbusses. Most importantly, adding devices to a Fieldbus always caused a linear increase in message delivery time, as is not the case with Switched Ethernet.
-
-## Dissatisfaction with Switched Ethernet
-
-However, Switched Ethernet was not originally developed for Networked Control Systems, and many in industry have pointed out that it will not fulfill customer needs in the near- and long-term future.
-
-In Switched Ethernet, because a Minimum Spanning Tree is created, nodes in a particular layer compete for link-time on the layer above. Message delay time increases linearly with the probability that peers are transmitting at the same time, and with the number of peers on that layer. 
-
-In addition, Switched Ethernet contains Single Points of Failure, where a broken link or switch means that the network must re-run the Spanning Tree Protocol algorithm - a process that often takes seconds. Because Switched Ethernet graphs are highly heirarchical, it is often the case that failure on a single link can cause entire sections of the network to fail, or become unreachable. 
-
-Device endpoints in NCS are scaling down in size and up in number. Requiring that each endpoint carries with it an RJ45 Magnetic Jack and Ethernet PHY is dubious, and sets a lower limit on the size and complexity that sensors and actuators in an NCS must posses. 
-
-Switched Ethernet is non-programmable. I.E. Switches are black-box ICs and do not allow systems designers to arbitrarily add functions to a system on the networking layer. For example, many NCS designers would like to implement message priorities and load balancing, but this is not possible on Layer 2. 
-
-# Constraints and Cost Functions for TinyNet
-
-## Constraints
-
-In the design of TinyNets, we operate under the following constraints:
-
-- TinyNet should be trivially integrated on device endpoints. I.E. an endpoint should not require any additional hardware circuitry. This allows the network to scale down into micro-robotics applications.
-- TinyNets should run entirely in C or C++ on the processors used on endpoints and routers, meaning that network protocols can be openly modified within an Autonomous System to perform application-specific tasks. TinyNets is Open Source Software.
-- TinyNets should run with no global state. It should not have to re-converge on routing solutions in the face of broken or modified links, additions to the network, or changes in traffic patterns.
-
-## Direct Comparisons
-
-It will be difficult to perform one-to-one comparisons between our network and the state of the art, as we are proposing a completely new solution in response to problems in NCS that we believe cannot be addressed with incremental modifications to existing technologies. 
+See a longer report on the work (for 6.829) [here](https://gitlab.cba.mit.edu/jakeread/tinynets/blob/master/document/6-829_project-jr_dk_ns_pw_Tiny_Nets_finalReport.pdf). 
 
-## Proving our Merit
-
-However, we can offer analysis as to why we believe our approach is substantially better than current offerings - or has a better problem-solution fit than other technologies. 
-
-**Realtime / Convergence Free Multipath Routing in a Distance-Vector Routing Protocol**
-- Existing Multipath Routing Technologies offer multipath routing (which eliminates the switching-bottleneck issues associated with switched ethernet), however, they do so using link-state routing that requires each router to share common knowledge about the complete network graph. In the face of link outages or router failures, networks must re-converge - a process that interrupts flows and causes massive increases, or complete failures, in message deliveries. For example
- - ECMP (Equal Cost Multipath Routing): more of a tool than an actual strategy; simply considers multiple paths when there are multiple best paths, i.e. load balancing mechanism
- - OSPF (Open Shortest Path First): Computes shortest path tree using Dijstra -- must know entire graph; wikipedia states convergence time is on the order of seconds (links to Cisco default parameters that set timeouts to be multiple seconds); specifically for Ethernet and offers a multipath version
- - SPB (Shortest Path Bridging): allows multiple equal cost paths and claims network is unaffected when a node fails except for the path(s) affected by the node failure, i.e. still cannot find another path if there is a unique shortest path containing a broken link
- - TRILL (TRansparent Interconnect of Lots of Links): must know entire graph, otherwise extremely similar to our protocol (uses hop counts and has similar flooding procedure); operates in Layer 2 and uses Fabric Shortest Path First (FSPF) to calculate alternate routes in node failure scenarios
-
-We seek to demonstrate that these re-convergence times would cause operational failure in NCS, thus eliminating ECMP and OSPF as possible solutions to the NCS problem. 
-
-The three protocols in question (OSPF, SPB, TRILL) require knowing the entire graph to perform a global shortest path calculation. All three of them allow for multipath consideration when there are multiple best paths. 200 ms seems to be the lower bound on convergence times as FSPF is quoted to having as good as 200 ms convergence time in the book "IBM SAN Solution Design Best Practices for VMWare" book. The vanilla OSPF protocol that Cisco offers indicates around an order second convergence time while optimized versions offer similar timing to FSPF (see paper).
-
-200 ms sounds like a reasonable convergence time (and is quoted as being extremely fast) so to prove our merit, we need to demonstrate systems that do not have multiple shortest paths using the protocols above. This should highlight the main benefit of our protocol, that being the capability to perform real time alternate path calculations in a reasonable amount of time.
+## Networked Control Systems (NCS)
 
-We propose designing multiple experiments to showcase the benefit of our protocol:
-1. Grid structure network - test latency of corner communication and traffic of network during node failure. This test will serve as a control since there will be many shortest paths (if hop count is used as the metric).
-2. Ring structure network - we already have graphs for this from the other protocol and it will demonstrate the speed at which each protocol finds the only other path in the event of a node failure.
-3. Mesh network - fully connected network that will test our network utilization (ensure ringing doesn't happen or is at least bounded tightly). Test the latency of cross network communication in the event of a node failure. We should expect to see minimal decision time in our protocol and minimal flooding.
+*no one makes networks for machines, for reasons*
 
-**Avoiding Switching Bottlenecks with Multipath Routing**
-- In a careful literature review and analysis, we will show that Layer-2 Solutions (switched ethernet) necessarily cause switching bottlenecks that create Single Points of Failure and increases in Message Delivery Times to NCS.
+We developed TinyNet(s) in response to a lack of purpose-built networking solutions for realtime (or 'just in time') embedded systems, i.e. robotics, avionics, and manufacturing devices. NCS are strange (messages are typically very small, determinism is more important than throughput, hardware resources are limited) and the market is small enough that no technology exists for the niche. 
 
-From [worst-case-packet-delay-time paper], we know that the worst case communication delay over Ethernet occurs when the number of frames attempting to communicate over a single switch is the greatest. For example, when a spanning tree organizes itself such that 24 stations are connected to a single switching hub, a typical 144-bit message with a bit time of 0.1 us would take more than 1.5 ms to finish sending a single packet from all stations. If the packets could be interconnected without the tree structure required by Ethernet/IP, transmission time could be brought down to just over 300 us. 
+## NCS State of the Art
 
-![Effects of Varying Parameters on Communication Time](https://github.com/jakeread/tinynets/blob/master/document/worst-case-ethernet.png)
+*instead they use internet tech because it's available*
 
-The above figure compares strategies for reducing communication time. The parameter which has the largest parameter is the number of frames being communicated over a single switch. This is to be expected, since that parameter will have a multiplicative effect on the queuing delay. By sacrificing the spanning tree topology and leveraging multipath routing without the added processing delays and stateful nature of ECMP or other link-state routing methodologies, we will drastically reduce frame count and, therefore, communication time.
+Most NCS use Switched Ethernet! Critically, Ethernet on its own does not allow multipath routing due to the Spanning Tree Protocol<sup>2</sup>. However, we see Multipath as a must-have for resilient message passing, and a driving strategy for bringing determinism to a network. 
 
-## Our Cost Functions
+These problems exist in Datacenters. Multipathing in a Datacenter is achieved by adding Network Controllers to the infrastructure that implement some other routing scheme<sup>1</sup> by remotely manipulating routers' port-forwarding tables. However, these Network Controllers must know *the entire state* of the network, and take time - between 200ms and 10s - to re-configure forwarding schemes. A sudden 200ms increase in the RTT of a packet is a no-go for embedded systems, where control loops rely on sensor data, say, once every 200*us* to remain stable. 
 
-In addition to proving our merit with a careful literature review, we can devise cost functions and metrics to measure our development. These include:
+## The TinyNets Strategy
 
-**Optimal Network Utilization**
-- Using logic analyzers and in-system programming to track message delay times and routes, we will measure the efforts of TinyNet routing policies to optimize network utilization and minimize message delivery times.
+*so we invent a strategy that does the things we want*
 
-**Deterministic Packet Delivery in the face of Increasing Network Traffic**
-- Similarly, we will measure TinyNet performance (in terms of message delivery times) as total network traffic increases.
+TinyNet routers maintain a port forwarding table and update one another on their current queue sizes. With this small amount of data, each router makes a semi-intelligent port-forwarding decision for each flow of data. Essentially, the job of the Network Controller is distributed into each router, and routers use *local* information rather than *global* information to perform a distributed 'greedy' path-planning algorithm. This is a simple Distance-Vector scheme.
 
-**Robust Packet Delivery in the face of Lost Links and Routers**
-- We will demonstrate TinyNet adaptively re-routing messages as links are cut, or routers are removed from the network graph, without any network re-convergence.  
+## Results
 
-# Key Contributions
+*it works pretty well* 
 
-A strategy for stateless multipath routing that increases message delivery time determinism and network robustness.
+We show that TinyNets recovers fairly well from the loss of nodes:
 
-A real-time cost function, using next-hop buffer size (i.e. busyness metric) as well as historical hop-count for per-packet dynamic re-routing, that increases packet delivery-time determinism. 
+![failure-recovery](https://gitlab.cba.mit.edu/jakeread/tinynets/blob/master/document/node_failure_recovery.png) 
 
-A software-defined network architecture for arbitrary implementation in any embedded system, where computing, physical space, and time is limited. 
+In addition, we developed TinyNets to run on small microcontrollers that can simultaneously be used to do 'other stuff' - i.e. motor control, sensor reading, etc. 
 
-# Direct Applications
+![12-routers](https://gitlab.cba.mit.edu/jakeread/tinynets/blob/master/document/hardware-12_routers.jpg) 
 
-#### Micro-Robotics with Complexity
+![one-router](https://gitlab.cba.mit.edu/jakeread/tinynets/blob/master/document/atsam-router-board.png)
 
-NCS via Switched Ethernet is impossible to drive into micro-robotics, where a single RJ45 Jack is larger than many endpoints. TinyNets provides a strategy and implementable software stack that will allow roboticists to bring adaptable, real-time networking to highly interconnected and complicated robotic systems. 
+On the hardware we tested (a 300MHz embedded microcontroller) we record a packet processing time<sup>3</sup> of about 37us. 
 
-#### Avionics
+![p_time](https://gitlab.cba.mit.edu/jakeread/tinynets/blob/master/results/3node-delays/results-D_p.png)
 
-Existing NCS are prone to Single Points of Failure and rapidly scaling complexity and cost. These is the bane of aircraft systems designers. We believe that TinyNet can offer avionics networks a robust, simple, and easily extensible strategy for the design and implementation of NCS in aircraft control systems. 
+With a bitrate of ~2MBPS we see a total per-hop time of ~100us. This means that between two nodes we can run a 10kHz control loop. 
 
-##### Open Source Reconfigurable Hardware Systems
+# Footnotes
 
-In particular, hardware design for embedded systems in the open source (i.e. non-expert) realm. We want to offer a networking solution that allows open source designers to easily integrate their systems. We take the example of a proliferation of 3D Printer Control systems, none of which interop, and the interfaces between are a PITA. Expand on this. 
+1. A few: TRILL, OSPF, ECMP, SPB
+2. STP is necessary to avoid endless message looping on flood packets.
+3. The time it takes for a microcontroller to decide where to forward a packet.
 
 # TinyNet Protocol & Architecture
 
@@ -215,16 +154,9 @@ New arrivals to network do not announce, they simply begin transmitting. Their a
 ## Withdrawals
 Buffer Depth Updates are Periodic as well as event-based (on buffer-depth change). When no BDU is heard within a 250ms (or other setting) window, the node is considered withdrawn.
 
-# Hardware
-
-![first-board](https://github.com/jakeread/tinynets/blob/master/document/xmega128-fourport-v0-1.png)  
-
-See /circuit 
-See /embedded 
-
 # Next Steps
 
-Presents a good option for wired routing over robust networks, with some complexity pushed into the nodes. Only really advantageous when we want to be able to re-route messages upstream in order to move around bottlenecks. TN does load-balancing without thinking about it... other approaches would require some implementation. Perhaps there's a learning function for network path planning & routing? 
+Presents a good option for wired routing over robust networks, with some complexity pushed into the nodes. Only really advantageous when we want to be able to re-route messages upstream in order to move around bottlenecks. TN does load-balancing without thinking about it... other approaches would require some implementation. Napoleon's Messenger.
 
 If we include flows, needs per-flow, not per-packet, routing - if we are going to hop about per buffers etc. 
 
@@ -234,31 +166,4 @@ Still some question about duplicate message arrivals after message? OR don't pac
 
 TN wins over the blissfully simple APA when we want a *very big* network, say, want 1,000,000 objects to individually address 1,000,000 objects. Here we also need to introduce heirarchichal addressing. 
 
-'Napoleon's Messenger' 
-
-APA is *loads* simpler to implement on FPGA
-
-# Reading
-
-#### Networked Control Systems
-Ethernet in Networked Control, advantages and drawbacks.  
-See especially  
-**/litreview/papers/network-control-systems/survey-on-realtime-via-ethernet**  
-**/litreview/papers/network-control-systems/the-emergence-of-networked-controls**  
-
-##### Packet Delivery Times in the face of Increasing Network Traffic
-
-~ plot w/ x-axis is # nodes, transmitting at some % of time, y axis is per-packet delivery time ~
-~ graph is single branch / spanning tree
-
-Critically, adding multi-path to the graph above decreases the slope of this plot. Twice the total link-time on the layer above is available, or Nx, where N is the number of nodes added to the next. 
-
-* state of the art for this? there is lit review to do here - look up TRILL and SPB (shortest path bridging). Much of it comes from Data Center Networks. 
- - https://datacenteroverlords.com/2011/08/19/multi-path-ethernet-the-flying-cars-of-the-data-center/ 
- - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6888840
- - https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing
- - https://infoscience.epfl.ch/record/231114/files/main_nr.pdf
-
-Also critically, *it seems that, in a cursory overview* many multi-path techniques in use today basically do dynamic spanning-tree rebuilds. These spanning-tree rebuilds have long convergence times (162ms is excitingly fast). 
-
-However, to increase determinism routes need to be chosen dymanically - so that *very short timescale* changes can be made to a packet, mid-network-traverse, to route around busy switches. Of interest is finding evidence that this is not done in any existing system.
\ No newline at end of file
+Critically, measuring APA vs. TN - when does multi-pathing become really important? 
\ No newline at end of file
diff --git a/network-plotter.md b/network-plotter.md
index be2dcf0..1d40b7f 100644
--- a/network-plotter.md
+++ b/network-plotter.md
@@ -6,3 +6,4 @@ Certainly we have these axis:
  - Bitrate
  - RTT
  - Goodput - but this varies on packet size ?
+ - ? how to quantify design parameters ???
\ No newline at end of file
-- 
GitLab