Flow 2.0.0
Flow project: Full implementation reference.
checkpt_timer.hpp
Go to the documentation of this file.
1/* Flow
2 * Copyright 2023 Akamai Technologies, Inc.
3 *
4 * Licensed under the Apache License, Version 2.0 (the
5 * "License"); you may not use this file except in
6 * compliance with the License. You may obtain a copy
7 * of the License at
8 *
9 * https://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in
12 * writing, software distributed under the License is
13 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
14 * CONDITIONS OF ANY KIND, either express or implied.
15 * See the License for the specific language governing
16 * permissions and limitations under the License. */
17
18/// @file
19#pragma once
20
23#include "flow/log/log.hpp"
24#include <boost/chrono/process_cpu_clocks.hpp>
25#include <string>
26#include <vector>
27
28namespace flow::perf
29{
30
31/**
32 * The central class in the `perf` Flow module, this efficiently times the user's operation, with a specified subset
33 * of timing methods; and with the optional ability to time intermediate *checkpoints* within the overall operation.
34 *
35 * ### How to use ###
36 * To use this: Construct a Checkpointing_timer; perform the operation in question; then access results via accessors;
37 * including in particular the string/stream output operation that lays out results in human-friendly fashion.
38 * Optionally, call checkpoint() to mark down intermediate timing results, so that the operation can be broken down
39 * into its component parts (the string/stream output will include these). This is optional, except that at least
40 * 1 checkpoint is required: the last checkpoint represents the end of the measured operation.
41 *
42 * The actual measurements are performed by sampling a time stamp from *each* of N clocks, where N <= M, and M is the
43 * number of values in perf::Clock_type `enum`. The N clocks are specified at construction by user; thus can measure
44 * just 1, in particular, as is often desired. This is a critical and potentially subtle decision; see doc headers
45 * of each perf::Clock_type member for tips.
46 *
47 * ### Performance ###
48 * This is a performance-measuring facility, and the performance of a performance-measuring-thing is quite important,
49 * especially since some of the things being timed can be very quick (single-digit microseconds potentially).
50 *
51 * Extreme care has been taken to keep the computational fat of all the book-keeping in the class very low
52 * (and empirical testing shows this was successful). In the cases where user cooperation is necessary to avoid undoing
53 * this, the doc headers point it out. Please read docs carefully to avoid timing-technique traps that might lead to
54 * wrong conclusions and frustration.
55 *
56 * ### Thread safety ###
57 * Checkpointing_timer itself is not thread-safe for non-`const` access to an object while any other access occurs
58 * concurrently. (Internally, there is no locking.) However, the nature of how one uses a timer object is that
59 * one does stuff to be measured, checkpoint(), more stuff, checkpoint(), ..., final checkpoint(). In other words,
60 * by its very nature, it expects only sequential non-`const` calls while measuring times -- so thread safety
61 * should not come up. (However, thread safety enters the picture with Checkpointing_timer::Aggregator; see below.)
62 *
63 * Aggregation
64 * -----------
65 * `Duration` result *aggregation* is the common-sense idea that to accurately measure the typical length of operation X
66 * is to repeat X many times (N times) and then view a sum-over-N-samples and/or mean-over-N-samples and/or
67 * mean-over-N-samples-scaled-times-M, where M is usually some convenience constant like 1,000,000. There are
68 * two patterns we provide/suggest to accomplish aggregation.
69 *
70 * ### Lowest-overhead aggregation approach: Single Checkpointing_timer ###
71 * Suppose you have operation X and are *not* interested in subdividing it into checkpoints; you only care about the
72 * total duration(s) (according to which `Clock_type`s interest you). Then the following technique leads to the
73 * lowest possible overhead:
74 *
75 * ~~~
76 * flow::perf::Checkpointing_timer sum_timer("op name", which_clocks, 1, logger); // 1 checkpoint only.
77 * const unsigned int n_samples = 1000000;
78 * for (unsigned int sample_idx = 0; sample_idx != n_samples; ++sample_idx)
79 * {
80 * // ...Operation X being measured goes here.
81 * }
82 * sum_timer.checkpoint("total"); // Mark down the total time taken.
83 * // Log the total duration(s), one per clock type!
84 * flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(sum_timer, n_samples, false);
85 * // And/or: Log mean duration(s), times N_SAMPLES_SCALE_CONVENIENCE, one per clock type!
86 * {
87 * // Just make a copy of the raw sum, then scale it x N_SAMPLES_SCALE_CONVENIENCE / n_samples.
88 * auto mean_scaled_timer(sum_timer);
89 * mean_scaled_timer.scale(N_SAMPLES_SCALE_CONVENIENCE, n_samples);
90 * flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(mean_scaled_timer, n_samples,
91 * true, N_SAMPLES_SCALE_CONVENIENCE);
92 * }
93 * ~~~
94 *
95 * This involves minimal overhead, as no timing calls occur at all until X is repeated a million times.
96 *
97 * ### Cross-product aggregation approach: Multiple `Checkpointing_timer`s + Checkpointing_timer::Aggregator ###
98 * If, by contrast, you want to divide X into checkpoints and see (say) how long checkpoint 3 takes on average
99 * (as well as the total as above), then use the dedicated Checkpointing_timer::Aggregator class for that purpose.
100 * This approach also allows one to do stuff (possibly unrelated stuff) between the many repetitions of operation X.
101 * Only X itself will be timed.
102 *
103 * To use `Aggregator`, construct one; then simply create a Checkpointing_timer for each repetition of X and time it
104 * as normal. Before or after each repetition, register it inside the agg using Aggregator::aggregate() which takes
105 * a pointer to Checkpointing_timer. The only requirement is that the number and respective names of the checkpoints
106 * in each Checkpointing_timer are the same for all of them.
107 *
108 * Once enough samples (`n_samples` in earlier example) have been thus collected, Aggregator::log_aggregated_results()
109 * will log every potential form (sum, mean, scaled-mean), not just for the total operation X but also
110 * each checkpoint(). For example, if X consists of steps X1, X2, X3, then one would see the mean duration of X1,
111 * X2, and X3 individually and their means would add up to the mean for all of X.
112 *
113 * This is quite powerful; and while the overhead of creating individual Checkpointing_timer objects, say, *might*
114 * affect the overall performance of the algorithm involving X, it shouldn't have any direct effect on the values
115 * actually being measured, since only each X invocation itself is measured; stuff in-between (slow or not) isn't.
116 * However:
117 * - Checkpointing_timer::Aggregator::aggregate() isn't thread-safe for concurrent access (so if that's relevant,
118 * you'll need a mutex lock around the call).
119 * - Always ensure you understand the structure of your measurement plan as it relates to concurrency.
120 * For example, it's meaningless to measure processor-time cost of X being done repeatedly, if various incarnations
121 * of X might be performed partially concurrently (say, if X = handling 1 HTTP request in a multi-threaded server
122 * under load). In that case processor-times will overlap and produce essentially meaningless (at the very least
123 * misleading) results. On the other hand, real-time measurements (Clock_type::S_REAL_HI_RES) make plenty of sense
124 * in that situation and would indeed reliably measure latency. Thread-time measurements
125 * (Clock_type::S_CPU_THREAD_TOTAL_HI_RES) can also make sense. Just saying: do not assume all clock types
126 * make sense in the face of concurrent operations; plan mindfully.
127 *
128 * @see Checkpointing_timer::Aggregator.
129 *
130 * @internal
131 *
132 * Implementation/design notes
133 * ---------------------------
134 * On the one hand we wanted to be able to measure multiple clock types; and since at least some of them are
135 * mutually complementary (Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES happen to measure the
136 * user + kernel components of overall processor-time) they need to be sampled simultaneously. On the other hand
137 * we know clock sampling itself can take non-trivial time; so it must be possible to specify the clocks of interest.
138 *
139 * Also, we wanted to be able to programmatically and/or via logs provide convenient human access to the results,
140 * including optional ability to time the various parts of a multi-step operation and then see which part took how
141 * long; a/k/a checkpoints.
142 *
143 * On the other hand it all needs to be maximally fast, adding very low overhead to the operation being measured.
144 * The way I thought of this was considering the absolute bare minimum of measuring durations: one simply samples the
145 * clock and records it somehow very fast (stack variable perhaps); then at the end one can take time point differences
146 * and log some durations. This is the minimum; so every computational addition to that must be considered for impact.
147 * In the end, it was successful; as measured by actually raw-timing the timing facilities. How is this accomplished?
148 *
149 * - The core operation is checkpoint(), when a final or intermediate time point is sampled; so that's the thing
150 * that must be fast.
151 * - Each sample is recorded in a Duration_set which is a *statically* sized array of N `Duration` values;
152 * and each Duration (chrono.duration) is a simple nanosecond (or whatever is the unit of flow::Fine_clock, but
153 * let's say nanoseconds) count; N is the # of clocks available.
154 * - The samples are recorded in a simple `vector<>`, so basically it is an array of nanosecond (or whatever)
155 * counts, with no pointer indirection inside each element.
156 * - Moreover, the `vector<>` is preallocated to be large enough to store the max # of checkpoints.
157 * The user specifies this maximum at construction, guaranteeing (re-)allocation never occurs while stuff is
158 * being timed. In other words checkpoint() -- when it gets down to the low level of what is actually records --
159 * places the sampled nanosecond count (for each clock) into a preallocated buffer whose address/size doesn't
160 * change, at the address that's simply `i * sizeof(<N nanosecond counts tightly packed>)`, where `i` is the
161 * count of checkpoints already recorded.
162 * - In a nod to the goal of making things human-readable, we add one more thing into the data structures explained
163 * so far. Each checkpoint (of which there is at least 1, the final one indicating total elapsed time) therefore
164 * has a string name supplied by user in checkpoint() call. Hence instead of storing the `Duration_set`s
165 * directly in the array, instead a Checkpoint `struct` wraps the Duration_set and name; but this doesn't add any
166 * per-sample pointer indirection, so in term of performance it only increases the above `sizeof()` to (internally)
167 * also include the `std::string` stuff (probably `char*` pointer and a couple `size_t`s).
168 * - Using move semantics, the user's `std::string` is not even copied when it is recorded; only the internal
169 * `string` buffer pointer and sizes are copied.
170 * - However, the user must form the `string` in the first place! This is a potential gotcha albeit only when
171 * one times very short operations. I actually performed analysis of the cost of this. The outcome is as
172 * follows:
173 * - A nearly-constant checkpoint name (like this: `T.checkpoint(string("sample"));`) is essentially free.
174 * - A dynamically computed checkpoint name, using an improved version of `ostringstream` -
175 * `T.checkpoint(util::ostream_op_string("sample", sample_idx));` - has some cost: ~0.5 microseconds
176 * per sample (MacBook Pro 2015). Whether this matters or not depends on the op one is measuring, but
177 * in my experience it's significant only for the most subtly-tiny ops. In any case, the official doc header
178 * for checkpoint() makes suggestions accordingly. (Spoiler alert: Ultimately - if necessary - one can always
179 * use a constant checkpoint name, or construct a dynamic name via ultra-fast C-style `char*` direct-writing.)
180 *
181 * The bottom line is, there's very low overhead added, as long as the user mindfully provides the max # of checkpoints
182 * at construction; and doesn't waste time on fancily naming the checkpoints.
183 *
184 * @todo `const std::string&` to `String_view` in all of flow::perf where applicable. Look into `string&&` also.
185 */
187 public log::Log_context
188{
189public:
190
191 // Types.
192
193 class Aggregator; // Fwd.
194
195 /**
196 * The entirety of the information recorded with each Checkpointing_timer::checkpoint() call;
197 * it contains the `Duration` values (for each `Clock_type`) with the time elapsed since either
198 * the preceding `T.checkpoint()` or construction.
199 *
200 * User doesn't usually need to access these, since the string/stream/log facilities of Checkpointing_timer and
201 * Checkpointing_timer::Aggregator would already print out the info; but this provides programmatic access if desired.
202 */
204 {
205 /// User-supplied name of the checkpoint.
206 const std::string m_name;
207 /**
208 * For the `T.checkpoint()` call that generated `*this`, for each enabled `Clock_type`: Time elapsed since either
209 * the preceding `T.checkpoint()` or (if it's the first `T.checkpoint()`) since `T` construction.
210 * Zero is recorded for each disabled `Clock_type`.
211 */
213 };
214
215 // Constructors/destructor.
216
217 /**
218 * Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record
219 * the time passed since right now. Please read the following docs to achieve low overhead.
220 *
221 * @param name_moved
222 * Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy:
223 * `string(your_name_value)`.
224 * @param which_clocks
225 * The clocks you want sampled in each checkpoint(). Please carefully read the doc headers of all
226 * perf::Clock_type `enum` members.
227 * @param max_n_checkpoints
228 * The number of times you shall call checkpoint() shall not exceed this.
229 * A tight value saves RAM; but more importantly a *correct* upper bound guarantees no internal reallocation
230 * will occur in checkpoint() which is important for maintaining low overhead.
231 * `assert()` trips if one breaks this promise.
232 * @param logger_ptr
233 * Logger to use for subsequently logging.
234 */
235 explicit Checkpointing_timer(log::Logger* logger_ptr, std::string&& name_moved,
236 Clock_types_subset which_clocks, size_t max_n_checkpoints);
237
238 /**
239 * Constructs a timer that is identical to but entirely independent of the source object.
240 * When performed outside of performance-critical sections, this is very useful for data massaging *after*
241 * having completely measured something in a Checkpointing_timer. E.g., one can create a copy and then
242 * scale() it to obtain a mean, etc.
243 *
244 * @see Checkpointing_timer doc header for discussion of aggregation approaches.
245 *
246 * @param src
247 * Object to copy.
248 */
249 explicit Checkpointing_timer(const Checkpointing_timer& src) = default; // Yes, do allow EXPLICIT copy construction.
250
251 /// For now at least there's no reason for move-construction.
253
254 // Methods.
255
256 /// Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for this so far.
257 void operator=(const Checkpointing_timer&) = delete;
258
259 /// No copy assignment; no move assignment either.
261
262 /**
263 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of real (wall clock) time.
264 *
265 * ### Rationale ###
266 * One application is when deciding on the constructor `which_clocks` arg (also a `Clock_types_subset`).
267 * If one wants to time some operation in such a way as to make only real-time clocks make sense, and the user is
268 * allowed to control which clock types to sample, then one can do:
269 *
270 * ~~~
271 * // They can specify a bitmask the_clock_set_from_config; then we potentially cull that further to exclude all
272 * // but the real-time clocks, because we feel only those make sense in our timing context.
273 * const auto which_clocks = Checkpointing_timer::real_clock_types() & the_clock_set_from_config;
274 * Checkpointing_timer timer(..., which_clocks, ...);
275 * // The timing has begun....
276 * ~~~
277 *
278 * @see perf::Clock_type
279 *
280 * @return See above.
281 */
283
284 /**
285 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by all
286 * threads of the calling process combined.
287 *
288 * ### Rationale ###
289 * See real_clock_types().
290 *
291 * @see perf::Clock_type
292 * @return See above.
293 */
295
296 /**
297 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by the
298 * calling thread.
299 *
300 * ### Rationale ###
301 * See real_clock_types().
302 *
303 * @see perf::Clock_type
304 * @return See above.
305 */
306 static Clock_types_subset thread_cpu_clock_types(); // Each clock measures current thread's CPU cycles.
307
308 /**
309 * Records a checkpoint, which takes a sample of all enabled `Clock_type`s' clocks and records the corresponding
310 * duration(s) since the last checkpoint() or (if none) since construction.
311 *
312 * In practice, at least one checkpoint() call is required for `*this` to be of any use at all. The final
313 * checkpoint() call's sample(s) determine the current value of since_start() which is the duration of the entire
314 * operation, as measured since `*this` construction. Intermediate calls (ones preceding the final one)
315 * are optional and represent intermediate steps and how long they each took.
316 *
317 * since_start() depends on exactly two things: when `*this` was constructed, and when the last checkpoint() call
318 * (at that time) was made.
319 *
320 * ### Naming vs. performance ###
321 * `name_moved` will be taken via move semantics, meaning no string buffer will be copied, only the pointer(s)
322 * and buffer size scalar(s) inside `std::string`. This helps performance. However, to *form* `name_moved`
323 * you will naturally spend some processor cycles which therefore risks slowing down the measured operation and
324 * polluting timing results. To avoid this, please follow the following suggestions.
325 *
326 * - Passing a constant string as follows is very cheap: `"some constant"` or `string("same")`.
327 * So if that's good enough for you, do that.
328 * - In most STL, due to the `std::string` SSO (optimization which stores a short-enough string directly inside
329 * the `string` object), if you keep the constant 15 characters or less in length, it's even cheaper;
330 * heap is not used as a result.
331 * - Constructing something via util::ostream_op_string() (and/or util::String_ostream) adds about 500 nanoseconds
332 * per checkpoint() call (e.g.: `util::ostream_op_string("some string", some_int)`) on a 2015 MacBook Pro.
333 * For many, many timing scenarios such sub-microsecond overheads are no big deal, but watch out if you're
334 * measuring something tiny-subtle.
335 * - If it is a problem, use a constant thing as in previous bullet. If that isn't sufficient, you can
336 * fairly easily use C-style direct-`char*`-writing to do it almost as quickly as that, or at least far closer
337 * to it than to the (syntactically-pretty but slow-in-*this*-context) stream-based string formation.
338 *
339 * ### Logging ###
340 * It will log a TRACE message. Hence if TRACE is enabled, timing results might be polluted, as logging can
341 * take non-trivial time.
342 *
343 * @param name_moved
344 * Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy:
345 * `string(your_name_value)`. See above regarding perf impact.
346 * @return Reference to immutable new Checkpoint `struct`, as it sits directly inside `*this`.
347 */
348 const Checkpoint& checkpoint(std::string&& name_moved);
349
350 /**
351 * Returns the duration(s) (for all enabled `Clock_type`s) between construction and the last checkpoint() call.
352 * Behavior is undefined if checkpoint() has not been called.
353 *
354 * @return See above.
355 */
357
358 /**
359 * Returns the checkpoints recorded so far. This returns a reference and thus avoids a copy. The reference is
360 * valid until `*this` destroyed.
361 *
362 * @return Checkpoints so far.
363 */
364 const std::vector<Checkpoint>& checkpoints() const;
365
366 /**
367 * Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint
368 * duration D shortened/lengthened by `mult_scale` and divided by `div_scale`. For example, if
369 * `float(mult_scale) / float(div_scale) == 0.5`, and 3 checkpoints were recorded as (6s, 4s, 12s)
370 * (`since_start()` = 12s + 4s + 6s = 22s), then after the scale() call the checkpoints will reflect
371 * (3s, 2s, 6s) (`since_start()` == 11s).
372 *
373 * In particular (and the original use case for this -- see doc header), if one recorded operation X repeated
374 * sequentially 100x, with `*this` constructed just before the first iteration and `checkpoint()`ed
375 * just after the 100th, then performing `scale(1, 100)` would effectively result in `*this` representing
376 * the mean duration of a single X iteration.
377 *
378 * Formally, behavior is undefined if one calls checkpoint() after calling scale(). Informally it is likely to
379 * result in nonsensical data for all checkpoints created starting with that one and for since_start().
380 *
381 * @param mult_scale
382 * The timer data are conceptually multiplied by this, first.
383 * @param div_scale
384 * The timer data are conceptually divided by this, next.
385 */
386 void scale(uint64_t mult_scale, uint64_t div_scale);
387
388 /**
389 * Sample all currently enabled `Clock_type`s' clocks and return those values, each of which is a time stamp
390 * relative to some Epoch value. (The Epoch value differs by Clock_type. Durations can be generated by subtracting
391 * one time point from another which mathematically makes Epoch's meaning irrelevant.)
392 * The value `Time_pt()` (internally: 0 a/k/a Epoch) is set for the disabled clocks. In particular checkpoint() will
393 * internally call this.
394 *
395 * ### Rationale ###
396 * It is unusual to use this directly by the user, as absolute time
397 * stamps aren't useful directly, while since_start(), checkpoints(),
398 * the I/O facilities, etc., provide access to the various computed durations, so one needn't do the subtracting
399 * manually. Nevertheless being able to get the raw value is useful for (1) sanity-checking more advanced results;
400 * and (2) to time the timer! (An example of the latter is when I was comparing the duration of checkpoint()
401 * with a constant `name_moved` vs. a dynamic-stream-created one vs. just getting the time stamp via now() and
402 * not calling checkpoint() at all.)
403 *
404 * @return See above.
405 */
406 Time_pt_set now() const;
407
408 /**
409 * Based on the passed in `clock_type` argument, the current time is returned for that one particular clock type.
410 *
411 * @param clock_type
412 * The type of clock to use. Please carefully read the doc headers of all
413 * perf::Clock_type `enum` members.
414 * @return The current time.
415 */
416 static Time_pt now(Clock_type clock_type);
417
418 /**
419 * Based on the passed in `which_clocks` argument, the current time is returned for the clock types enabled by it.
420 * Identical to non-`static` now() but `which_clocks` is specified as arg instead of at construction.
421 *
422 * @param which_clocks
423 * The types of clock to use. Please carefully read the doc headers of all
424 * perf::Clock_type `enum` members.
425 * @return The current time or times.
426 */
427 static Time_pt_set now(const Clock_types_subset& which_clocks);
428
429 /**
430 * Equivalent to `(*os) << (*this)`.
431 *
432 * @param os
433 * Pointer to `ostream` to which to write.
434 */
435 void output(std::ostream* os) const;
436
437 // Data.
438
439 /// The immutable human-readable name for this timer. It's `const` so no need for accessor for now.
440 const std::string m_name;
441
442private:
443 // Types.
444
445 /**
446 * `chrono` clock used, internally, for Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES queries.
447 * It can obtain a measurement of both user and kernel time spent by the process concurrently.
448 * In C++20 this isn't in std.chrono yet. Go, boost.chrono!
449 *
450 * @note It returns a "real" time too. However we use other, better clocks for that.
451 */
452 using Cpu_split_clock = boost::chrono::process_cpu_clock;
453
454 /// Short-hand for the combo (user/kernel) durations-since-epoch tracked by #Cpu_split_clock.
455 using Cpu_split_clock_durs_since_epoch = Cpu_split_clock::times;
456
457 // Methods.
458
459 /**
460 * Returns the current #Cpu_split_clock time, essentially as a combo `struct` containing raw (integer) counts of
461 * user and kernel ticks since that clock's epoch reference point.
462 *
463 * ### Rationale ###
464 * It's a helper for performance and accuracy: So we can access the clock only once and be able to able to get the
465 * various values in the result at the same time. now_cpu_lo_res() can get individual values from the return value
466 * here.
467 *
468 * @return See above.
469 */
471
472 /**
473 * Returns individual `Time_pt` value returnable by now() various overloads: either the user time or system time
474 * depending on arg.
475 *
476 * @param cpu_combo_now_raw
477 * Value returned by now_cpu_lo_res_raw() (or equivalent).
478 * @param user_else_sys
479 * If `true` the user time is returned; else the system time.
480 * @return See above.
481 */
482 static Time_pt now_cpu_lo_res(const Cpu_split_clock_durs_since_epoch& cpu_combo_now_raw, bool user_else_sys);
483
484 // Data.
485
486 /**
487 * Aggregator needs access to the Checkpointing_timer innards; generally speaking, stylistically, it is fine.
488 * It is an inner class to reflect that cooperation further.
489 */
490 friend class Aggregator;
491
492 /**
493 * Internally useful constructor that initializes an ill-formed `*this` without almost no meaningful member values,
494 * specifically useful at least when creating the aggregated Checkpointing_timer from multiple "real" timers,
495 * by Aggregator. Take care to initialize all fields properly before making
496 * `*this` publicly available.
497 *
498 * @param logger_ptr
499 * See `public` constructor.
500 * @param name_moved
501 * See `public` constructor.
502 * @param which_clocks
503 * See `public` constructor.
504 */
505 explicit Checkpointing_timer(log::Logger* logger_ptr, std::string&& name_moved, Clock_types_subset which_clocks);
506
507 /**
508 * The bit-set indexed by `size_t(Clock_type::...)` wherein that clock will be sampled in now() and checkpoint()
509 * if and only if its bit is set. All zeroes is allowed and will, accordingly, result in 0 `Duration`s and
510 * `Time_pt`s in all results. Subtlety: now() doesn't work properly until this is initialized.
511 */
513
514 /**
515 * The value of now() saved in `public` constructor, marking the time point when the timing began. Doesn't change
516 * after construction. since_start() (the total duration(s)) is based on this and #m_start_when.
517 *
518 * Ideally this would be `const` but isn't for some minor reasons (see `public` constructor).
519 */
521
522 /**
523 * The value of now() recorded during the last checkpoint() call; or equals #m_start_when if checkpoint() has not
524 * yet been called. since_start() (the total duration(s)) is based on this and #m_start_when.
525 *
526 * Ideally this would be `const` but had to be left technically mutable due to certain `array<>` initialization
527 * subtleties.
528 */
530
531 /**
532 * The checkpoints recorded so far, in order, one per checkpoint() call. To avoid any reallocations, this is
533 * reserved to contain a buffer at least `max_n_checkpoints` (see ctor) elements long.
534 */
535 std::vector<Checkpoint> m_checkpoints;
536};
537
538/**
539 * This companion/inner class of Checkpointing_timer provides aggregation of results from many `Checkpointing_timer`s
540 * each measuring some user operation being repeated many times; particularly when the operation consists of multiple
541 * steps (checkpoints) of interest.
542 *
543 * Please read the aggregation section in Checkpointing_timer doc header before using this.
544 *
545 * One small technical point: To make a Checkpointing_timer X available for `Aggregator::aggregate(X)`, you must
546 * construct it on the heap and provide a Checkpointing_timer_ptr accordingly:
547 *
548 * ~~~
549 * Checkpointing_timer_ptr X(new Checkpointing_timer(...));
550 * ~~~
551 *
552 * ### Thread safety ###
553 * A reiteration of the aforementioned Checkpointing_timer doc header: Non-`const` methods are generally not safe
554 * if called concurrently with any other method on the same Aggregator. In practice this should only matter for
555 * calling aggregate(). If you are timing things concurrently happening in different threads you may need to add
556 * a mutex lock around aggregate(); and in that case make sure it doesn't adversely affect your algorithm's overall
557 * performance (or, if it does, then that is acceptable).
558 */
561 private boost::noncopyable
562{
563public:
564 // Constructors/destructor.
565
566 /**
567 * Constructs a new, empty aggregator. After this call aggregate() to add source `Checkpointing_timer`s.
568 *
569 * @param name_moved
570 * Nickname for the agg, for logging and such.
571 * @param max_n_samples
572 * The number of times you shall call aggregate() shall not exceed this.
573 * See similar (in spirit) `max_n_checkpoints` in Checkpointing_timer constructor.
574 * @param logger_ptr
575 * Logger to use for subsequently logging.
576 */
577 explicit Aggregator(log::Logger* logger_ptr, std::string&& name_moved, size_t max_n_samples);
578
579 // Methods.
580
581 /**
582 * Register given Checkpointing_timer for aggregated results. The best time to call this is immediately after
583 * the last `timer_ptr->checkpoint()` call, as that way the call itself won't be included in the timer's duration(s).
584 *
585 * The second best time to call it is immediately after `*timer_ptr` is constructed; the call is quite cheap, so
586 * it really shouldn't affect results much. The third best time is somewhere between those 2 points in time; in that
587 * case its (quite small, probably negligible) time cost will be charged to whatever checkpoint was "running" at the
588 * time.
589 *
590 * ### Requirements on aggregated timers ###
591 * In and of itself `*timer_ptr` can be any Checkpointing_timer with at least one checkpoint() call done.
592 * In addition, behavior is undefined in log_aggregated_results() and create_aggregated_result() unless all of the
593 * following hold about all `aggregate()`d timers at the time of those calls.
594 * - Their `which_clocks` values (as passed to Checkpointing_timer constructor) are all equal.
595 * - They each have the same number of checkpoints.
596 * - For each `i`: Their `i`-th Checkpointing_timer::Checkpoint::m_name values are all mutually equal.
597 *
598 * Since aggregation collates the many measurements of the same multi-step operation, these should all be common
599 * sense; if one doesn't hold for you, there is some kind of very basic problem.
600 *
601 * ### Thread safety ###
602 * See Aggregator doc header. A mutex lock around this call may be desirable depending on situation.
603 *
604 * @param timer_ptr
605 * The timer to register.
606 */
607 void aggregate(Checkpointing_timer_ptr timer_ptr);
608
609 /// Act as if aggregate() has never been called since construction.
610 void clear();
611
612 /**
613 * Called after the final Checkpointing_timer has finished (final `checkpoint()` call) and been `aggregate()`d,
614 * this `INFO`-logs human-friendly readout of the aggregated results so far. This will include all 3 types of
615 * aggregation: sum; mean; and mean-scaled-by-convenience-constant; or some specified subset thereof.
616 *
617 * ### Performance ###
618 * It is possible to call it between `aggregate()`s, perhaps to output incremental results. It may be slow and should
619 * not be called within any perf-critical section and definitely not in any measured section. The most natural
620 * time to call it is after all data collection has finished.
621 *
622 * Tip: you can pass null as the logger to the ctor of each Checkpointing_timer and even the Aggregator
623 * and only use non-null `alternate_logger_ptr_or_null`. Then the cost of should-log checks will be zero
624 * all over except here (where presumably it does not matter).
625 *
626 * @param alternate_logger_ptr_or_null
627 * If null, logs are sent to get_logger() as normal; otherwise it is sent to the supplied log::Logger instead.
628 * Since logging this output is a first-class facility within Checkpointing_timer::Aggregator -- unlike
629 * vast majority of Flow logging which is a secondary (though important in practice) nicety -- it seemed
630 * sensible to provide extra control of this particular bit of logging.
631 * Also note that any (probably TRACE) logging outside of the timer-data output will still go to
632 * get_logger().
633 * @param show_sum
634 * Whether to show the SUM aggregated output type.
635 * @param show_mean
636 * Whether to show the MEAN aggregated output type.
637 * @param mean_scale_or_zero
638 * If zero do not show SCALED-MEAN aggregated output; otherwise show it, with this value as the
639 * scale-constant. (E.g., one could show the average latency per 100,000 requests which might be
640 * more conveniently large than a per-request mean.)
641 */
642 void log_aggregated_results(log::Logger* alternate_logger_ptr_or_null,
643 bool show_sum, bool show_mean, uint64_t mean_scale_or_zero) const;
644
645 /**
646 * Provides programmatic access to the aggregation result that would be logged by log_aggregated_results() -- in
647 * the form of a returned new Checkpointing_timer that stores those numbers.
648 *
649 * log_aggregated_results() basically calls this once for each type of aggregated data it wants to see.
650 * Accordingly, if you desire programmatic access to type X of aggregated results, call this and specify the
651 * type X of interest, as args. log_aggregated_results() doc header's perf notes apply here similarly.
652 *
653 * You can also use `info_log_result` to immediately log the results, as log_aggregated_results() would for that
654 * aggregation type. Conceivably you could then ignore the return value, depending on your goals.
655 *
656 * @param alternate_logger_ptr_or_null
657 * Same meaning as for log_aggregated_results(). However here this is only relevant if `info_log_result`;
658 * otherwise ignored (null is fine).
659 * @param info_log_result
660 * If and only if `true`, INFO messages are logged with human-readable representation of
661 * the returned data, similarly to (partial) log_aggregated_results() output.
662 * Actually it simply calls log_aggregated_result_in_timer() on the returned Checkpointing_timer.
663 * @param mean_scale_or_zero
664 * If zero then return SUM aggregated output; otherwise MEAN or SCALED-MEAN aggregated output, with this
665 * value as the scale-constant (the value 1 => an un-scaled MEAN output).
666 * (E.g., one could show the average latency per 100,000 requests which might be
667 * more conveniently large than a per-request mean.)
668 * @return Ref-counted pointer to a new Checkpointing_timer that stores the aggregated results.
669 * This timer will have the same `which_clocks`, and its `checkpoints()` will have the same length and
670 * in-order names, as each of the source timers; its name and get_logger() will come from Aggregator `name`
671 * (ctor arg) and get_logger() respectively.
672 */
673 Checkpointing_timer_ptr create_aggregated_result(log::Logger* alternate_logger_ptr_or_null, bool info_log_result,
674 uint64_t mean_scale_or_zero = 1) const;
675
676 /**
677 * Given an aggregated-result Checkpointing_timer (as if returned by create_aggregated_result()) logs the SUM or MEAN
678 * or SCALED-MEAN stored in it as INFO messages.
679 *
680 * Publicly, it exists for one use case (as of this writing), namely to log a "manually" compiled aggregated-result
681 * Checkpointing_timer. This *pattern* is explained in the Checkpointing_timer doc header in the aggregation section.
682 *
683 * Naturally, it is also used internally by create_aggregated_result() and log_aggregated_results(), hence why they
684 * all produce similar output.
685 *
686 * @param logger_ptr
687 * Same meaning as in log_aggregated_results(), except null is not a valid value (behavior is undefined).
688 * @param agg_timer
689 * Aggregated-result Checkpointing_timer, as if returned by create_aggregated_result().
690 * Behavior is undefined unless it was, in fact, returned by that method (or is a copy of such);
691 * or unless it was compiled as described in the aforementioned pattern, documented in Checkpointing_timer
692 * doc header.
693 * @param n_samples
694 * For appropriate logging, this must contain the # of samples (repetitions of the measured operation)
695 * collected to yield `agg_timer`.
696 * @param mean_scale_or_zero
697 * For appropriate logging, this must contain the type of aggregated numbers that are stored in `agg_timer`.
698 * See create_aggregated_result().
699 */
700 static void log_aggregated_result_in_timer(log::Logger* logger_ptr,
701 const Checkpointing_timer& agg_timer, unsigned int n_samples,
702 uint64_t mean_scale_or_zero = 1);
703
704 // Data.
705
706 /// The immutable human-readable name for this timer. It's `const` so no need for accessor for now.
707 const std::string m_name;
708
709private:
710 // Data.
711
712 /// The ref-counted pointers to source timers passed to aggregate() in that order.
713 std::vector<Checkpointing_timer_ptr> m_timers;
714}; // class Checkpointing_timer::Aggregator
715
716// Free functions: in *_fwd.hpp.
717
718// However the following refer to inner class and hence must be declared here and not _fwd.hpp.
719
720/**
721 * Prints string representation of the given `Checkpoint` to the given `ostream`. See
722 * Checkpointing_timer::checkpoint() and Checkpointing_timer::checkpoints().
723 *
724 * @relatesalso Checkpointing_timer::Checkpoint
725 *
726 * @param os
727 * Stream to which to write.
728 * @param checkpoint
729 * Object to serialize.
730 * @return `os`.
731 */
732std::ostream& operator<<(std::ostream& os, const Checkpointing_timer::Checkpoint& checkpoint);
733
734} // namespace flow::perf
Convenience class that simply stores a Logger and/or Component passed into a constructor; and returns...
Definition: log.hpp:1612
Interface that the user should implement, passing the implementing Logger into logging classes (Flow'...
Definition: log.hpp:1284
This companion/inner class of Checkpointing_timer provides aggregation of results from many Checkpoin...
void aggregate(Checkpointing_timer_ptr timer_ptr)
Register given Checkpointing_timer for aggregated results.
void log_aggregated_results(log::Logger *alternate_logger_ptr_or_null, bool show_sum, bool show_mean, uint64_t mean_scale_or_zero) const
Called after the final Checkpointing_timer has finished (final checkpoint() call) and been aggregate(...
const std::string m_name
The immutable human-readable name for this timer. It's const so no need for accessor for now.
static void log_aggregated_result_in_timer(log::Logger *logger_ptr, const Checkpointing_timer &agg_timer, unsigned int n_samples, uint64_t mean_scale_or_zero=1)
Given an aggregated-result Checkpointing_timer (as if returned by create_aggregated_result()) logs th...
Checkpointing_timer_ptr create_aggregated_result(log::Logger *alternate_logger_ptr_or_null, bool info_log_result, uint64_t mean_scale_or_zero=1) const
Provides programmatic access to the aggregation result that would be logged by log_aggregated_results...
Aggregator(log::Logger *logger_ptr, std::string &&name_moved, size_t max_n_samples)
Constructs a new, empty aggregator.
void clear()
Act as if aggregate() has never been called since construction.
std::vector< Checkpointing_timer_ptr > m_timers
The ref-counted pointers to source timers passed to aggregate() in that order.
The central class in the perf Flow module, this efficiently times the user's operation,...
void output(std::ostream *os) const
Equivalent to (*os) << (*this).
Checkpointing_timer(Checkpointing_timer &&)=delete
For now at least there's no reason for move-construction.
const std::vector< Checkpoint > & checkpoints() const
Returns the checkpoints recorded so far.
const std::string m_name
The immutable human-readable name for this timer. It's const so no need for accessor for now.
static Clock_types_subset process_cpu_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of processor time ...
void scale(uint64_t mult_scale, uint64_t div_scale)
Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every...
Time_pt_set now() const
Sample all currently enabled Clock_types' clocks and return those values, each of which is a time sta...
void operator=(Checkpointing_timer &&)=delete
No copy assignment; no move assignment either.
Time_pt_set m_last_checkpoint_when
The value of now() recorded during the last checkpoint() call; or equals m_start_when if checkpoint()...
static Cpu_split_clock_durs_since_epoch now_cpu_lo_res_raw()
Returns the current Cpu_split_clock time, essentially as a combo struct containing raw (integer) coun...
static Time_pt now_cpu_lo_res(const Cpu_split_clock_durs_since_epoch &cpu_combo_now_raw, bool user_else_sys)
Returns individual Time_pt value returnable by now() various overloads: either the user time or syste...
Checkpointing_timer(const Checkpointing_timer &src)=default
Constructs a timer that is identical to but entirely independent of the source object.
static Clock_types_subset real_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall cloc...
std::vector< Checkpoint > m_checkpoints
The checkpoints recorded so far, in order, one per checkpoint() call.
const Checkpoint & checkpoint(std::string &&name_moved)
Records a checkpoint, which takes a sample of all enabled Clock_types' clocks and records the corresp...
Checkpointing_timer(log::Logger *logger_ptr, std::string &&name_moved, Clock_types_subset which_clocks, size_t max_n_checkpoints)
Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will...
static Clock_types_subset thread_cpu_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of processor time ...
Time_pt_set m_start_when
The value of now() saved in public constructor, marking the time point when the timing began.
const Clock_types_subset m_which_clocks
The bit-set indexed by size_t(Clock_type::...) wherein that clock will be sampled in now() and checkp...
void operator=(const Checkpointing_timer &)=delete
Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for thi...
boost::chrono::process_cpu_clock Cpu_split_clock
chrono clock used, internally, for Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES que...
Cpu_split_clock::times Cpu_split_clock_durs_since_epoch
Short-hand for the combo (user/kernel) durations-since-epoch tracked by Cpu_split_clock.
Duration_set since_start() const
Returns the duration(s) (for all enabled Clock_types) between construction and the last checkpoint() ...
Flow module containing tools for profiling and optimization.
boost::shared_ptr< Checkpointing_timer > Checkpointing_timer_ptr
Short-hand for ref-counting pointer to Checkpointing_timer.
Definition: perf_fwd.hpp:43
std::ostream & operator<<(std::ostream &os, const Checkpointing_timer::Checkpoint &checkpoint)
Prints string representation of the given Checkpoint to the given ostream.
std::bitset< size_t(Clock_type::S_END_SENTINEL)> Clock_types_subset
Short-hand for a bit-set of N bits which represents the presence or absence of each of the N possible...
Fine_time_pt Time_pt
Short-hand for a high-precision boost.chrono point in time, formally equivalent to flow::Fine_time_pt...
Clock_type
Clock types supported by flow::perf module facilities, perf::Checkpointing_timer in particular.
The entirety of the information recorded with each Checkpointing_timer::checkpoint() call; it contain...
const std::string m_name
User-supplied name of the checkpoint.
Duration_set m_since_last
For the T.checkpoint() call that generated *this, for each enabled Clock_type: Time elapsed since eit...
Convenience wrapper around an array<Duration, N>, which stores a duration for each of the N possible ...
Convenience wrapper around an array<Time_pt, N>, which stores a time point for each of the N possible...