Flow 1.0.0
Flow project: Full implementation reference.
bandwidth.hpp
Go to the documentation of this file.
1/* Flow
2 * Copyright 2023 Akamai Technologies, Inc.
3 *
4 * Licensed under the Apache License, Version 2.0 (the
5 * "License"); you may not use this file except in
6 * compliance with the License. You may obtain a copy
7 * of the License at
8 *
9 * https://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in
12 * writing, software distributed under the License is
13 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
14 * CONDITIONS OF ANY KIND, either express or implied.
15 * See the License for the specific language governing
16 * permissions and limitations under the License. */
17
18/// @file
19#pragma once
20
23#include <boost/weak_ptr.hpp>
24
25namespace flow::net_flow
26{
27// Types.
28
29/**
30 * A per-Peer_socket module that tries to estimate the bandwidth available to the outgoing flow.
31 * That is, it tries to get at the # of bytes per unit time the outgoing empty pipe can support,
32 * minus the # of bytes per unit time being transferred by all other flows in that pipe (NetFlow or
33 * otherwise). This is useful primarily for certain congestion control strategies (like
34 * Congestion_control_classic_with_bandwidth_est) but may also be good information to make available
35 * to the application layer (e.g., if the user wants to inform a media player what is a suitable bit
36 * rate to avoid rebuffering). The latter is the reason why Send_bandwidth_estimator is not part of
37 * a congestion control strategy but rather a separate class.
38 *
39 * How can we make this estimate? As a black box, we care about just one event, `on_acks(N)`, where `N`
40 * is the number of bytes that have been cleanly acknowledged by the receiver. The `net_flow` engine
41 * should inform this module about each such event. The running bandwidth estimate is then
42 * available via bandwidth_bytes_per_time().
43 *
44 * ### Object life cycle ###
45 * There is a strict 1-to-1 relationship between one Send_bandwidth_estimator
46 * instance and one Peer_socket. A Send_bandwidth_estimator is created shortly after Peer_socket
47 * is and is saved inside the latter. Conversely a pointer to the Peer_socket is stored inside the
48 * Send_bandwidth_estimator (for read-only access to stats such as SRTT). The containing
49 * Peer_socket must exist at all times while Send_bandwidth_estimator exists. Informally it's
50 * recommended that Peer_socket destructor or other method deletes its Send_bandwidth_estimator
51 * instance when it is no longer needed.
52 *
53 * Relationship with outside world:
54 *
55 * 1. Send_bandwidth_estimator is a black box to Node and Peer_socket code (no
56 * access to internals; access only to constructors/destructor and API).
57 * 2. Send_bandwidth_estimator has `const` (!) `friend` access to Peer_socket
58 * internals.
59 * 3. The programmer of Send_bandwidth_estimator must assume a certain event
60 * model to be followed by Node. This model is to be explicitly explained in the doc headers
61 * for any `on_...()` methods. Node must call the `on_...()` methods as soon as it
62 * detects the appropriate events, and it should aim to detect them as soon as possible after
63 * they occur.
64 * 4. The programmer of Send_bandwidth_estimator subclass may assume the existence
65 * and meaning of certain state of Peer_socket, which she can use to make internal
66 * computations. Any such state (i.e., in addition to the on-event calls, and their
67 * arguments, in (3)) must be explicitly documented in the class or method doc headers.
68 * 5. Informally, it is assumed that the Node is sending data at a rate close to the available
69 * capacity of the pipe at all times (and thus the resulting on_acks() events reflect this
70 * available maximum possible speed). (Of course this can only be a best effort and cannot be
71 * guaranteed; indeed Send_bandwidth_estimator itself can help the congestion control
72 * mechanism in use to maintain a sending rate close to the maximum possible, which in turn
73 * will feed good acknowledgment samples to Send_bandwidth_estimator. It sounds like a
74 * circular technique, but it can work well.)
75 *
76 * Note that the assumption in (5) can also not hold if the sending rate (and thus the rate of
77 * returning acknowledgments) is application-limited; that is the user is not sending data as fast
78 * Node allows, or in particular simply not sending data. In that case the bandwidth readings may
79 * be inaccurately low (very much so possibly). Therefore only rely on the results of this class when
80 * not application-limited in this way.
81 *
82 * ### Units ###
83 * bandwidth_bytes_per_time() returns the current bandwidth estimate per unit time U, as an
84 * *integer* number of bytes (rounded down). What is U? U is given as `Time_unit(1)`, where
85 * #Time_unit is a public alias of a boost.chrono duration type. Be careful in any arithmetic done
86 * with the value returned; both overflow and underflow can occur, if one does not take care to
87 * sanity-check the arithmetic. The justification for the value of the #Time_unit alias is given
88 * in the #Time_unit doc header.
89 *
90 * ### Thread safety ###
91 * Unless stated otherwise, a Congestion_control_strategy object is to be accessed
92 * from the containing Peer_socket's Node's thread W only.
93 *
94 * Implementation notes
95 * --------------------
96 *
97 * How do we estimate the available bandwidth? Basically, we assume the sender -- whenever giving
98 * us on_acks() events -- is sending data at the maximum possible rate. Therefore, if we divide the
99 * # of bytes reported in acknowledgments by the time period over which we've accumulated those
100 * acknowledgments, then we know the available bandwidth over that time period. Such bandwidth
101 * samples can be blended with past samples using a low-pass filter to get a smoothed bandwidth
102 * estimate. Of course the question arises: what time period to choose, and what filter to use? We
103 * use the Westwood+ algorithm that answers these questions. The basic idea is to maintain a byte
104 * count that starts at 0 at the beginning of the sample period and keeps accumulating as the
105 * acknowledgments come in; once the sample period > SRTT (smoothed round trip time of the socket),
106 * the bytes/time bandwidth sample is taken and added into the running filter; and the new sample
107 * begins with the byte count at 0. The justification for these choices is given, partially, in the
108 * code and, more completely, in the papers referenced below.
109 *
110 * Why not make this a strategy pattern (like Congestion_control_strategy hierarchy) instead of just
111 * a single class, so that multiple ways of bandwidth estimation can exist? Because, as far as I
112 * know, Westwood+ is the best available technique, so I don't see the point of doing anything else.
113 * If that changes, revisit this.
114 *
115 * @todo Look into Google BBR, a recent (as of 2016) state-of-the-art congestion control algorithm that
116 * aims to estimate bandwidth among other things. Depending on what what one decides, another bandwidth
117 * estimator class could be written.
118 *
119 * @see Linux kernel's `tcp_westwood.c` for inspiration behind this implementation; the top comment in
120 * that file also cites the main papers that describe the Westwood+ bandwidth estimator.
121 */
123 public log::Log_context,
124 private boost::noncopyable
125{
126public:
127
128 // Types.
129
130 /**
131 * Type that represents the number of bytes, either as an absolute amount or over some time
132 * period. To avoid any surprises we set its width explicitly instead of using `size_t`.
133 */
134 using n_bytes_t = uint64_t;
135
136 /**
137 * The primary time unit over which this class reports bandwidth. So when
138 * bandwidth_bytes_per_time() return the number N, that means its bandwidth estimate is N bytes
139 * per `Time_unit(1)` time.
140 *
141 * ### Implementation notes ###
142 * Why choose milliseconds? There are two conflicting constraints on
143 * #Time_unit. In the basic bandwidth sample computation, the formula is B = N/D, where N is the
144 * number of bytes acknowledged over time period D, and D is a #Fine_duration. Note that B is an
145 * integer (#n_bytes_t), as we try to avoid floating-point computations, so the division rounds
146 * down. #Fine_duration, we can assume, is at least nanoseconds (10^-9 seconds).
147 *
148 * If #Time_unit is something large, like seconds, then in the straightforward B = N/D formula
149 * nanoseconds must first be converted (with integer math only!) to seconds. This loses a ton of
150 * precision in D, which will typically be something like .001-.2 seconds (up to about an RTT).
151 * That's clearly unacceptable. If #Time_unit is something tiny, like nanoseconds, then N/D's
152 * minimum non-zero value (1) will represent 10^9 bytes per second (all lower bandwidths B
153 * rounding down to zero), which is clearly absurd. What about milliseconds? RTTs are
154 * classically given in milliseconds anyway, so the precision is acceptable (though #Fine_clock can
155 * in fact give much more precise values -- but that precision is not necessary here). Meanwhile,
156 * the lowest value B = N/D = 1 represents 1000 bytes per second, which is definitely a very low
157 * bandwidth and thus good enough. So, we use milliseconds.
158 *
159 * Another important reason milliseconds is a reasonable unit is in the outside CWND = B * RTTmin
160 * calculation (see Congestion_control_classic_with_bandwidth_est), where CWND is the congestion
161 * window, B is the bandwidth estimate we generate, and RTTmin is some RTT. Assuming RTTmin is
162 * also in units of #Time_unit, and we use a very large unit for B, then the result of that
163 * calculation may actually exceed 64 bits, which will cause overflow. On the other hand if we
164 * just express B in bytes/ms and RTTmin in ms, then the formula avoids that danger while being
165 * quite precise enough.
166 */
167 using Time_unit = boost::chrono::milliseconds;
168
169 // Constructors/destructor.
170
171 /**
172 * Constructs object by setting up logging and saving a pointer to the containing Peer_socket.
173 * bandwidth_bytes_per_time() will return 0 until enough on_acks() events have occurred to make it
174 * receive a higher value, but certainly at least until on_acks() is called once.
175 *
176 * Only a weak pointer of `sock` is stored: the `shared_ptr` itself is not saved, so the reference
177 * count of `sock` does not increase. This avoids a circular `shared_ptr` situation that would arise
178 * from `*this` pointing to `sock`, and `sock` pointing to `*this` (the two objects *do* need access
179 * to each other, as explained in class doc header).
180 *
181 * @param logger_ptr
182 * The Logger implementation to use subsequently.
183 * @param sock
184 * The Peer_socket for which this module will estimate available outgoing bandwidth.
185 */
187
188 // Methods.
189
190 /**
191 * Returns the current estimate of the available outgoing bandwidth per unit time for the
192 * containing socket's connection, in units of bytes per `Time_unit(1)`. This value may be zero if
193 * either there is not enough information to make a reasonable estimate, or if the estimated
194 * bandwidth is less than a certain low threshold.
195 *
196 * @return Ditto.
197 */
199
200 /**
201 * Informs the bandwidth estimator strategy that 1 or more previously sent packets whose status
202 * was In-flight just received acknowledgments, thus changing their state from In-flight to
203 * Acknowledged. For efficiency and simplicity of behavior, on_acks() should be called as few
204 * times as possible while still satisfying the requirement in the previous sentence. That is,
205 * suppose acknowledgments for N packets were received simultaneously. Then on_acks() must be
206 * called one time, not several times.
207 *
208 * on_acks() must be called *after* any RTT measurement based on the acked bytes has affected
209 * the recorded SRTT in `sock` passed in constructor.
210 *
211 * Additional Peer_socket state used: Peer_socket::m_snd_smoothed_round_trip_time.
212 *
213 * Assumptions about ACK sender (DATA received): Same as Congestion_control_strategy::on_acks().
214 * However Send_bandwidth_estimator does not strictly rely on those exact requirements; however it
215 * generally expects quick acknowledgments in order to be effective, and
216 * Congestion_control_strategy::on_acks()'s specific requirements happen to basically satisfy that
217 * informal requirement.
218 *
219 * @note bandwidth_bytes_per_time() may return a different value after this call. If something,
220 * like congestion control, relies on that value and can be reasonably be called right after
221 * or right before this on_acks(), recommend calling it right after this on_acks().
222 * @note Acknowledgments of data that are not currently In-flight due to being Dropped (a/k/a late
223 * ACKs) or Acknowledged (i.e., duplicate ACKs) must NOT be passed to this method.
224 * @note on_acks() makes no assumptions about how the reported individual packet acks were
225 * packaged by the ACK sender into actual ACK packets (how many ACKs there were, etc.).
226 * It just assumes every individual acknowledgment is reported to on_acks() as soon as
227 * possible and grouped into as few on_acks() calls as possible.
228 * @note For definition of In-flight, Acknowledged, and Dropped bytes, see
229 * Peer_socket::m_snd_flying_pkts_by_sent_when and Peer_socket::m_snd_flying_pkts_by_seq_num.
230 *
231 * @param bytes
232 * The sum of the number of bytes in the user data fields of the packets that have been
233 * Acknowledged. Must not be zero.
234 */
235 void on_acks(size_t bytes);
236
237private:
238 // Methods.
239
240 /**
241 * Applies the low-pass filter that takes the given previous result of the filter and blends in
242 * the given new sample. The values should be in the same units, which are presumably bytes per
243 * `Time_unit(1)`.
244 *
245 * @param prev_val_per_time
246 * Previous result of this filter.
247 * @param new_sample_val_per_time
248 * New sample to blend into that filter.
249 * @return The blended value.
250 */
251 static n_bytes_t apply_filter(n_bytes_t prev_val_per_time, n_bytes_t new_sample_val_per_time);
252
253 /**
254 * Utility that returns a handle to the containing Peer_socket. If somehow the containing
255 * Peer_socket has been deleted, `assert()` trips.
256 *
257 * @return Ditto.
258 */
260
261 // Data.
262
263 /**
264 * The containing socket (read-only access). Implementation may rely on various state stored
265 * inside the pointed-to Peer_socket.
266 *
267 * Why `weak_ptr`? See similar comment on Congestion_control_strategy::m_sock.
268 */
269 boost::weak_ptr<Peer_socket::Const_ptr::element_type> m_sock;
270
271 /**
272 * The current smoothed bandwidth estimate, to be returned by bandwidth_bytes_per_time(), in the
273 * units described by that method's doc header. Zero until the first sample is completed.
274 */
276
277 /**
278 * In the same units as #m_bytes_per_time_smoothed, the less smoothed bandwidth estimate, which is
279 * simply the progressive application of apply_filter() on each available bandwidth sample. It's
280 * not used by bandwidth_bytes_per_time() but instead is blended into #m_bytes_per_time_smoothed.
281 *
282 * I do not really understand why `tcp_westwood.c` (Linux kernel) does it that way; I get that it
283 * adds an extra level of smoothing, but I did not catch any obvious references to this in the
284 * papers. Perhaps this is some elementary math that is just assumed to be understood. Anyway, I
285 * took it from `tcp_westwood.c`. Their comment on the equivalent data member `bw_ns_est` says:
286 * "first bandwidth estimation..not too smoothed 8)" which is not too helpful to me.
287 */
289
290 /**
291 * The number of bytes acknowledged by receiver since #m_this_sample_start_time (the time when the
292 * current sample began to be compiled). When a sample is deemed sufficient, `m_bytes_this_sample
293 * / (now() - m_this_sample_start_time)` is the individual bandwidth sample fed into the filter.
294 */
296
297 /// The time at which the currently ongoing bandwidth sample began to accumulate.
299
300 /// `true` until on_acks() called for the first time; `false` forever thereafter. Used to begin the sample sequence.
302
303 /**
304 * `true` until a sample has been completed and fed into the filter; `false` forever thereafter. Used to begin the
305 * sample sequence.
306 */
308}; // class Send_bandwidth_estimator
309
310} // namespace flow::net_flow
Convenience class that simply stores a Logger and/or Component passed into a constructor; and returns...
Definition: log.hpp:1619
Interface that the user should implement, passing the implementing Logger into logging classes (Flow'...
Definition: log.hpp:1291
A per-Peer_socket module that tries to estimate the bandwidth available to the outgoing flow.
Definition: bandwidth.hpp:125
n_bytes_t m_bytes_this_sample
The number of bytes acknowledged by receiver since m_this_sample_start_time (the time when the curren...
Definition: bandwidth.hpp:295
n_bytes_t bandwidth_bytes_per_time() const
Returns the current estimate of the available outgoing bandwidth per unit time for the containing soc...
Definition: bandwidth.cpp:38
boost::chrono::milliseconds Time_unit
The primary time unit over which this class reports bandwidth.
Definition: bandwidth.hpp:167
static n_bytes_t apply_filter(n_bytes_t prev_val_per_time, n_bytes_t new_sample_val_per_time)
Applies the low-pass filter that takes the given previous result of the filter and blends in the give...
Definition: bandwidth.cpp:231
boost::weak_ptr< Peer_socket::Const_ptr::element_type > m_sock
The containing socket (read-only access).
Definition: bandwidth.hpp:269
uint64_t n_bytes_t
Type that represents the number of bytes, either as an absolute amount or over some time period.
Definition: bandwidth.hpp:134
Peer_socket::Const_ptr socket() const
Utility that returns a handle to the containing Peer_socket.
Definition: bandwidth.cpp:238
Fine_time_pt m_this_sample_start_time
The time at which the currently ongoing bandwidth sample began to accumulate.
Definition: bandwidth.hpp:298
bool m_no_samples_yet
true until a sample has been completed and fed into the filter; false forever thereafter.
Definition: bandwidth.hpp:307
Send_bandwidth_estimator(log::Logger *logger_ptr, Peer_socket::Const_ptr sock)
Constructs object by setting up logging and saving a pointer to the containing Peer_socket.
Definition: bandwidth.cpp:26
bool m_no_acks_yet
true until on_acks() called for the first time; false forever thereafter. Used to begin the sample se...
Definition: bandwidth.hpp:301
n_bytes_t m_bytes_per_time_less_smoothed
In the same units as m_bytes_per_time_smoothed, the less smoothed bandwidth estimate,...
Definition: bandwidth.hpp:288
n_bytes_t m_bytes_per_time_smoothed
The current smoothed bandwidth estimate, to be returned by bandwidth_bytes_per_time(),...
Definition: bandwidth.hpp:275
void on_acks(size_t bytes)
Informs the bandwidth estimator strategy that 1 or more previously sent packets whose status was In-f...
Definition: bandwidth.cpp:43
Const_target_ptr Const_ptr
Short-hand for ref-counted pointer to immutable values of type Target_type::element_type (a-la T cons...
Flow module containing the API and implementation of the Flow network protocol, a TCP-inspired stream...
Definition: node.cpp:25
Fine_clock::time_point Fine_time_pt
A high-res time point as returned by Fine_clock::now() and suitable for precise time math in general.
Definition: common.hpp:407