Flow 1.0.2
Flow project: Full implementation reference.
checkpt_timer.hpp
Go to the documentation of this file.
1/* Flow
2 * Copyright 2023 Akamai Technologies, Inc.
3 *
4 * Licensed under the Apache License, Version 2.0 (the
5 * "License"); you may not use this file except in
6 * compliance with the License. You may obtain a copy
7 * of the License at
8 *
9 * https://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in
12 * writing, software distributed under the License is
13 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
14 * CONDITIONS OF ANY KIND, either express or implied.
15 * See the License for the specific language governing
16 * permissions and limitations under the License. */
17
18/// @file
19#pragma once
20
23#include "flow/log/log.hpp"
24#include <boost/chrono/process_cpu_clocks.hpp>
25#include <string>
26#include <vector>
27
28namespace flow::perf
29{
30
31/**
32 * The central class in the `perf` Flow module, this efficiently times the user's operation, with a specified subset
33 * of timing methods; and with the optional ability to time intermediate *checkpoints* within the overall operation.
34 *
35 * ### How to use ###
36 * To use this: Construct a Checkpointing_timer; perform the operation in question; then access results via accessors;
37 * including in particular the string/stream output operation that lays out results in human-friendly fashion.
38 * Optionally, call checkpoint() to mark down intermediate timing results, so that the operation can be broken down
39 * into its component parts (the string/stream output will include these). This is optional, except that at least
40 * 1 checkpoint is required: the last checkpoint represents the end of the measured operation.
41 *
42 * The actual measurements are performed by sampling a time stamp from *each* of N clocks, where N <= M, and M is the
43 * number of values in perf::Clock_type `enum`. The N clocks are specified at construction by user; thus can measure
44 * just 1, in particular, as is often desired. This is a critical and potentially subtle decision; see doc headers
45 * of each perf::Clock_type member for tips.
46 *
47 * ### Performance ###
48 * This is a performance-measuring facility, and the performance of a performance-measuring-thing is quite important,
49 * especially since some of the things being timed can be very quick (single-digit microseconds potentially).
50 *
51 * Extreme care has been taken to keep the computational fat of all the book-keeping in the class very low
52 * (and empirical testing shows this was successful). In the cases where user cooperation is necessary to avoid undoing
53 * this, the doc headers point it out. Please read docs carefully to avoid timing-technique traps that might lead to
54 * wrong conclusions and frustration.
55 *
56 * ### Thread safety ###
57 * Checkpointing_timer itself is not thread-safe for non-`const` access to an object while any other access occurs
58 * concurrently. (Internally, there is no locking.) However, the nature of how one uses a timer object is that
59 * one does stuff to be measured, checkpoint(), more stuff, checkpoint(), ..., final checkpoint(). In other words,
60 * by its very nature, it expects only sequential non-`const` calls while measuring times -- so thread safety
61 * should not come up. (However, thread safety enters the picture with Checkpointing_timer::Aggregator; see below.)
62 *
63 * Aggregation
64 * -----------
65 * `Duration` result *aggregation* is the common-sense idea that to accurately measure the typical length of operation X
66 * is to repeat X many times (N times) and then view a sum-over-N-samples and/or mean-over-N-samples and/or
67 * mean-over-N-samples-scaled-times-M, where M is usually some convenience constant like 1,000,000. There are
68 * two patterns we provide/suggest to accomplish aggregation.
69 *
70 * ### Lowest-overhead aggregation approach: Single Checkpointing_timer ###
71 * Suppose you have operation X and are *not* interested in subdividing it into checkpoints; you only care about the
72 * total duration(s) (according to which `Clock_type`s interest you). Then the following technique leads to the
73 * lowest possible overhead:
74 *
75 * ~~~
76 * flow::perf::Checkpointing_timer sum_timer("op name", which_clocks, 1, logger); // 1 checkpoint only.
77 * const unsigned int n_samples = 1000000;
78 * for (unsigned int sample_idx = 0; sample_idx != n_samples; ++sample_idx)
79 * {
80 * // ...Operation X being measured goes here.
81 * }
82 * sum_timer.checkpoint("total"); // Mark down the total time taken.
83 * // Log the total duration(s), one per clock type!
84 * flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(sum_timer, n_samples, false);
85 * // And/or: Log mean duration(s), times N_SAMPLES_SCALE_CONVENIENCE, one per clock type!
86 * {
87 * // Just make a copy of the raw sum, then scale it x N_SAMPLES_SCALE_CONVENIENCE / n_samples.
88 * auto mean_scaled_timer(sum_timer);
89 * mean_scaled_timer.scale(N_SAMPLES_SCALE_CONVENIENCE, n_samples);
90 * flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(mean_scaled_timer, n_samples,
91 * true, N_SAMPLES_SCALE_CONVENIENCE);
92 * }
93 * ~~~
94 *
95 * This involves minimal overhead, as no timing calls occur at all until X is repeated a million times.
96 *
97 * ### Cross-product aggregation approach: Multiple `Checkpointing_timer`s + Checkpointing_timer::Aggregator ###
98 * If, by contrast, you want to divide X into checkpoints and see (say) how long checkpoint 3 takes on average
99 * (as well as the total as above), then use the dedicated Checkpointing_timer::Aggregator class for that purpose.
100 * This approach also allows one to do stuff (possibly unrelated stuff) between the many repetitions of operation X.
101 * Only X itself will be timed.
102 *
103 * To use `Aggregator`, construct one; then simply create a Checkpointing_timer for each repetition of X and time it
104 * as normal. Before or after each repetition, register it inside the agg using Aggregator::aggregate() which takes
105 * a pointer to Checkpointing_timer. The only requirement is that the number and respective names of the checkpoints
106 * in each Checkpointing_timer are the same for all of them.
107 *
108 * Once enough samples (`n_samples` in earlier example) have been thus collected, Aggregator::log_aggregated_results()
109 * will log every potential form (sum, mean, scaled-mean), not just for the total operation X but also
110 * each checkpoint(). For example, if X consists of steps X1, X2, X3, then one would see the mean duration of X1,
111 * X2, and X3 individually and their means would add up to the mean for all of X.
112 *
113 * This is quite powerful; and while the overhead of creating individual Checkpointing_timer objects, say, *might*
114 * affect the overall performance of the algorithm involving X, it shouldn't have any direct effect on the values
115 * actually being measured, since only each X invocation itself is measured; stuff in-between (slow or not) isn't.
116 * However:
117 * - Checkpointing_timer::Aggregator::aggregate() isn't thread-safe for concurrent access (so if that's relevant,
118 * you'll need a mutex lock around the call).
119 * - Always ensure you understand the structure of your measurement plan as it relates to concurrency.
120 * For example, it's meaningless to measure processor-time cost of X being done repeatedly, if various incarnations
121 * of X might be performed partially concurrently (say, if X = handling 1 HTTP request in a multi-threaded server
122 * under load). In that case processor-times will overlap and produce essentially meaningless (at the very least
123 * misleading) results. On the other hand, real-time measurements (Clock_type::S_REAL_HI_RES) make plenty of sense
124 * in that situation and would indeed reliably measure latency. Thread-time measurements
125 * (Clock_type::S_CPU_THREAD_TOTAL_HI_RES) can also make sense. Just saying: do not assume all clock types
126 * make sense in the face of concurrent operations; plan mindfully.
127 *
128 * @see Checkpointing_timer::Aggregator.
129 *
130 * @internal
131 *
132 * Implementation/design notes
133 * ---------------------------
134 * On the one hand we wanted to be able to measure multiple clock types; and since at least some of them are
135 * mutually complementary (Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES happen to measure the
136 * user + kernel components of overall processor-time) they need to be sampled simultaneously. On the other hand
137 * we know clock sampling itself can take non-trivial time; so it must be possible to specify the clocks of interest.
138 *
139 * Also, we wanted to be able to programmatically and/or via logs provide convenient human access to the results,
140 * including optional ability to time the various parts of a multi-step operation and then see which part took how
141 * long; a/k/a checkpoints.
142 *
143 * On the other hand it all needs to be maximally fast, adding very low overhead to the operation being measured.
144 * The way I thought of this was considering the absolute bare minimum of measuring durations: one simply samples the
145 * clock and records it somehow very fast (stack variable perhaps); then at the end one can take time point differences
146 * and log some durations. This is the minimum; so every computational addition to that must be considered for impact.
147 * In the end, it was successful; as measured by actually raw-timing the timing facilities. How is this accomplished?
148 *
149 * - The core operation is checkpoint(), when a final or intermediate time point is sampled; so that's the thing
150 * that must be fast.
151 * - Each sample is recorded in a Duration_set which is a *statically* sized array of N `Duration` values;
152 * and each Duration (chrono.duration) is a simple nanosecond (or whatever is the unit of flow::Fine_clock, but
153 * let's say nanoseconds) count; N is the # of clocks available.
154 * - The samples are recorded in a simple `vector<>`, so basically it is an array of nanosecond (or whatever)
155 * counts, with no pointer indirection inside each element.
156 * - Moreover, the `vector<>` is preallocated to be large enough to store the max # of checkpoints.
157 * The user specifies this maximum at construction, guaranteeing (re-)allocation never occurs while stuff is
158 * being timed. In other words checkpoint() -- when it gets down to the low level of what is actually records --
159 * places the sampled nanosecond count (for each clock) into a preallocated buffer whose address/size doesn't
160 * change, at the address that's simply `i * sizeof(<N nanosecond counts tightly packed>)`, where `i` is the
161 * count of checkpoints already recorded.
162 * - In a nod to the goal of making things human-readable, we add one more thing into the data structures explained
163 * so far. Each checkpoint (of which there is at least 1, the final one indicating total elapsed time) therefore
164 * has a string name supplied by user in checkpoint() call. Hence instead of storing the `Duration_set`s
165 * directly in the array, instead a Checkpoint `struct` wraps the Duration_set and name; but this doesn't add any
166 * per-sample pointer indirection, so in term of performance it only increases the above `sizeof()` to (internally)
167 * also include the `std::string` stuff (probably `char*` pointer and a couple `size_t`s).
168 * - Using move semantics, the user's `std::string` is not even copied when it is recorded; only the internal
169 * `string` buffer pointer and sizes are copied.
170 * - However, the user must form the `string` in the first place! This is a potential gotcha albeit only when
171 * one times very short operations. I actually performed analysis of the cost of this. The outcome is as
172 * follows:
173 * - A nearly-constant checkpoint name (like this: `T.checkpoint(string("sample"));`) is essentially free.
174 * - A dynamically computed checkpoint name, using an improved version of `ostringstream` -
175 * `T.checkpoint(util::ostream_op_string("sample", sample_idx));` - has some cost: ~0.5 microseconds
176 * per sample (MacBook Pro 2015). Whether this matters or not depends on the op one is measuring, but
177 * in my experience it's significant only for the most subtly-tiny ops. In any case, the official doc header
178 * for checkpoint() makes suggestions accordingly. (Spoiler alert: Ultimately - if necessary - one can always
179 * use a constant checkpoint name, or construct a dynamic name via ultra-fast C-style `char*` direct-writing.)
180 *
181 * The bottom line is, there's very low overhead added, as long as the user mindfully provides the max # of checkpoints
182 * at construction; and doesn't waste time on fancily naming the checkpoints.
183 *
184 * @todo `const std::string&` to `String_view` in all of flow::perf where applicable. Look into `string&&` also.
185 */
187 public log::Log_context
188{
189public:
190
191 // Types.
192
193 class Aggregator; // Fwd.
194
195 /**
196 * The entirety of the information recorded with each Checkpointing_timer::checkpoint() call;
197 * it contains the `Duration` values (for each `Clock_type`) with the time elapsed since either
198 * the preceding `T.checkpoint()` or construction.
199 *
200 * User doesn't usually need to access these, since the string/stream/log facilities of Checkpointing_timer and
201 * Checkpointing_timer::Aggregator would already print out the info; but this provides programmatic access if desired.
202 */
204 {
205 /// User-supplied name of the checkpoint.
206 const std::string m_name;
207 /**
208 * For the `T.checkpoint()` call that generated `*this`, for each enabled `Clock_type`: Time elapsed since either
209 * the preceding `T.checkpoint()` or (if it's the first `T.checkpoint()`) since `T` construction.
210 * Zero is recorded for each disabled `Clock_type`.
211 */
213 };
214
215 // Constructors/destructor.
216
217 /**
218 * Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record
219 * the time passed since right now. Please read the following docs to achieve low overhead.
220 *
221 * @param name_moved
222 * Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy:
223 * `string(your_name_value)`.
224 * @param which_clocks
225 * The clocks you want sampled in each checkpoint(). Please carefully read the doc headers of all
226 * perf::Clock_type `enum` members.
227 * @param max_n_checkpoints
228 * The number of times you shall call checkpoint() shall not exceed this.
229 * A tight value saves RAM; but more importantly a *correct* upper bound guarantees no internal reallocation
230 * will occur in checkpoint() which is important for maintaining low overhead.
231 * `assert()` trips if one breaks this promise.
232 * @param logger_ptr
233 * Logger to use for subsequently logging.
234 */
235 explicit Checkpointing_timer(log::Logger* logger_ptr, std::string&& name_moved,
236 Clock_types_subset which_clocks, size_t max_n_checkpoints);
237
238 /**
239 * Constructs a timer that is identical to but entirely independent of the source object.
240 * When performed outside of performance-critical sections, this is very useful for data massaging *after*
241 * having completely measured something in a Checkpointing_timer. E.g., one can create a copy and then
242 * scale() it to obtain a mean, etc.
243 *
244 * @see Checkpointing_timer doc header for discussion of aggregation approaches.
245 *
246 * @param src
247 * Object to copy.
248 */
249 explicit Checkpointing_timer(const Checkpointing_timer& src) = default; // Yes, do allow EXPLICIT copy construction.
250
251 /// For now at least there's no reason for move-construction.
253
254 // Methods.
255
256 /// Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for this so far.
257 void operator=(const Checkpointing_timer&) = delete;
258
259 /// No copy assignment; no move assignment either.
261
262 /**
263 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of real (wall clock) time.
264 *
265 * ### Rationale ###
266 * One application is when deciding on the constructor `which_clocks` arg (also a `Clock_types_subset`).
267 * If one wants to time some operation in such a way as to make only real-time clocks make sense, and the user is
268 * allowed to control which clock types to sample, then one can do:
269 *
270 * ~~~
271 * // They can specify a bitmask the_clock_set_from_config; then we potentially cull that further to exclude all
272 * // but the real-time clocks, because we feel only those make sense in our timing context.
273 * const auto which_clocks = Checkpointing_timer::real_clock_types() & the_clock_set_from_config;
274 * Checkpointing_timer timer(..., which_clocks, ...);
275 * // The timing has begun....
276 * ~~~
277 *
278 * @see perf::Clock_type
279 *
280 * @return See above.
281 */
283
284 /**
285 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by all
286 * threads of the calling process combined.
287 *
288 * ### Rationale ###
289 * See real_clock_types().
290 *
291 * @see perf::Clock_type
292 * @return See above.
293 */
295
296 /**
297 * Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by the
298 * calling thread.
299 *
300 * ### Rationale ###
301 * See real_clock_types().
302 *
303 * @see perf::Clock_type
304 * @return See above.
305 */
306 static Clock_types_subset thread_cpu_clock_types(); // Each clock measures current thread's CPU cycles.
307
308 /**
309 * Records a checkpoint, which takes a sample of all enabled `Clock_type`s' clocks and records the corresponding
310 * duration(s) since the last checkpoint() or (if none) since construction.
311 *
312 * In practice, at least one checkpoint() call is required for `*this` to be of any use at all. The final
313 * checkpoint() call's sample(s) determine the current value of since_start() which is the duration of the entire
314 * operation, as measured since `*this` construction. Intermediate calls (ones preceding the final one)
315 * are optional and represent intermediate steps and how long they each took.
316 *
317 * since_start() depends on exactly two things: when `*this` was constructed, and when the last checkpoint() call
318 * (at that time) was made.
319 *
320 * ### Naming vs. performance ###
321 * `name_moved` will be taken via move semantics, meaning no string buffer will be copied, only the pointer(s)
322 * and buffer size scalar(s) inside `std::string`. This helps performance. However, to *form* `name_moved`
323 * you will naturally spend some processor cycles which therefore risks slowing down the measured operation and
324 * polluting timing results. To avoid this, please follow the following suggestions.
325 *
326 * - Passing a constant string as follows is very cheap: `string("some constant")`. All similar constructions
327 * from a pre-made `const char*` or `std::string` are very cheap. So if that's good enough for you, do that.
328 * - Constructing something via util::ostream_op_string() (and/or util::String_ostream) adds about 500 nanoseconds
329 * per checkpoint() call (e.g.: `util::ostream_op_string("some string", some_int)`) on a 2015 MacBook Pro.
330 * For many, many timing scenarios such sub-microsecond overheads are no big deal, but watch out if you're
331 * measuring something tiny-subtle.
332 * - If it is a problem, use a constant thing as in previous bullet. If that isn't sufficient, you can
333 * fairly easily use C-style direct-`char*`-writing to do it almost as quickly as that, or at least far closer
334 * to it than to the (syntactically-pretty but slow-in-*this*-context) stream-based string formation.
335 *
336 * ### Logging ###
337 * It will log a TRACE message. Hence if TRACE is enabled, timing results might be polluted, as logging can
338 * take non-trivial time.
339 *
340 * @param name_moved
341 * Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy:
342 * `string(your_name_value)`.
343 * @return Reference to immutable new Checkpoint `struct`, as it sits directly inside `*this`.
344 */
345 const Checkpoint& checkpoint(std::string&& name_moved);
346
347 /**
348 * Returns the duration(s) (for all enabled `Clock_type`s) between construction and the last checkpoint() call.
349 * Behavior is undefined if checkpoint() has not been called.
350 *
351 * @return See above.
352 */
354
355 /**
356 * Returns the checkpoints recorded so far. This returns a reference and thus avoids a copy. The reference is
357 * valid until `*this` destroyed.
358 *
359 * @return Checkpoints so far.
360 */
361 const std::vector<Checkpoint>& checkpoints() const;
362
363 /**
364 * Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint
365 * duration D shortened/lengthened by `mult_scale` and divided by `div_scale`. For example, if
366 * `float(mult_scale) / float(div_scale) == 0.5`, and 3 checkpoints were recorded as (6s, 4s, 12s)
367 * (`since_start()` = 12s + 4s + 6s = 22s), then after the scale() call the checkpoints will reflect
368 * (3s, 2s, 6s) (`since_start()` == 11s).
369 *
370 * In particular (and the original use case for this -- see doc header), if one recorded operation X repeated
371 * sequentially 100x, with `*this` constructed just before the first iteration and `checkpoint()`ed
372 * just after the 100th, then performing `scale(1, 100)` would effectively result in `*this` representing
373 * the mean duration of a single X iteration.
374 *
375 * Formally, behavior is undefined if one calls checkpoint() after calling scale(). Informally it is likely to
376 * result in nonsensical data for all checkpoints created starting with that one and for since_start().
377 *
378 * @param mult_scale
379 * The timer data are conceptually multiplied by this, first.
380 * @param div_scale
381 * The timer data are conceptually divided by this, next.
382 */
383 void scale(uint64_t mult_scale, uint64_t div_scale);
384
385 /**
386 * Sample all currently enabled `Clock_type`s' clocks and return those values, each of which is a time stamp
387 * relative to some Epoch value. (The Epoch value differs by Clock_type. Durations can be generated by subtracting
388 * one time point from another which mathematically makes Epoch's meaning irrelevant.)
389 * The value `Time_pt()` (internally: 0 a/k/a Epoch) is set for the disabled clocks. In particular checkpoint() will
390 * internally call this.
391 *
392 * ### Rationale ###
393 * It is unusual to use this directly by the user, as absolute time
394 * stamps aren't useful directly, while since_start(), checkpoints(),
395 * the I/O facilities, etc., provide access to the various computed durations, so one needn't do the subtracting
396 * manually. Nevertheless being able to get the raw value is useful for (1) sanity-checking more advanced results;
397 * and (2) to time the timer! (An example of the latter is when I was comparing the duration of checkpoint()
398 * with a constant `name_moved` vs. a dynamic-stream-created one vs. just getting the time stamp via now() and
399 * not calling checkpoint() at all.)
400 *
401 * @return See above.
402 */
403 Time_pt_set now() const;
404
405 /**
406 * Based on the passed in `clock_type` argument, the current time is returned for that one particular clock type.
407 *
408 * @param clock_type
409 * The type of clock to use. Please carefully read the doc headers of all
410 * perf::Clock_type `enum` members.
411 * @return The current time.
412 */
413 static Time_pt now(Clock_type clock_type);
414
415 /**
416 * Based on the passed in `which_clocks` argument, the current time is returned for the clock types enabled by it.
417 * Identical to non-`static` now() but `which_clocks` is specified as arg instead of at construction.
418 *
419 * @param which_clocks
420 * The types of clock to use. Please carefully read the doc headers of all
421 * perf::Clock_type `enum` members.
422 * @return The current time or times.
423 */
424 static Time_pt_set now(const Clock_types_subset& which_clocks);
425
426 /**
427 * Equivalent to `(*os) << (*this)`.
428 *
429 * @param os
430 * Pointer to `ostream` to which to write.
431 */
432 void output(std::ostream* os) const;
433
434 // Data.
435
436 /// The immutable human-readable name for this timer. It's `const` so no need for accessor for now.
437 const std::string m_name;
438
439private:
440 // Types.
441
442 /**
443 * `chrono` clock used, internally, for Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES queries.
444 * It can obtain a measurement of both user and kernel time spent by the process concurrently.
445 * In C++20 this isn't in std.chrono yet. Go, boost.chrono!
446 *
447 * @note It returns a "real" time too. However we use other, better clocks for that.
448 */
449 using Cpu_split_clock = boost::chrono::process_cpu_clock;
450
451 /// Short-hand for the combo (user/kernel) durations-since-epoch tracked by #Cpu_split_clock.
452 using Cpu_split_clock_durs_since_epoch = Cpu_split_clock::times;
453
454 // Methods.
455
456 /**
457 * Returns the current #Cpu_split_clock time, essentially as a combo `struct` containing raw (integer) counts of
458 * user and kernel ticks since that clock's epoch reference point.
459 *
460 * ### Rationale ###
461 * It's a helper for performance and accuracy: So we can access the clock only once and be able to able to get the
462 * various values in the result at the same time. now_cpu_lo_res() can get individual values from the return value
463 * here.
464 *
465 * @return See above.
466 */
468
469 /**
470 * Returns individual `Time_pt` value returnable by now() various overloads: either the user time or system time
471 * depending on arg.
472 *
473 * @param cpu_combo_now_raw
474 * Value returned by now_cpu_lo_res_raw() (or equivalent).
475 * @param user_else_sys
476 * If `true` the user time is returned; else the system time.
477 * @return See above.
478 */
479 static Time_pt now_cpu_lo_res(const Cpu_split_clock_durs_since_epoch& cpu_combo_now_raw, bool user_else_sys);
480
481 // Data.
482
483 /**
484 * Aggregator needs access to the Checkpointing_timer innards; generally speaking, stylistically, it is fine.
485 * It is an inner class to reflect that cooperation further.
486 */
487 friend class Aggregator;
488
489 /**
490 * Internally useful constructor that initializes an ill-formed `*this` without almost no meaningful member values,
491 * specifically useful at least when creating the aggregated Checkpointing_timer from multiple "real" timers,
492 * by Aggregator. Take care to initialize all fields properly before making
493 * `*this` publicly available.
494 *
495 * @param logger_ptr
496 * See `public` constructor.
497 * @param name_moved
498 * See `public` constructor.
499 * @param which_clocks
500 * See `public` constructor.
501 */
502 explicit Checkpointing_timer(log::Logger* logger_ptr, std::string&& name_moved, Clock_types_subset which_clocks);
503
504 /**
505 * The bit-set indexed by `size_t(Clock_type::...)` wherein that clock will be sampled in now() and checkpoint()
506 * if and only if its bit is set. All zeroes is allowed and will, accordingly, result in 0 `Duration`s and
507 * `Time_pt`s in all results. Subtlety: now() doesn't work properly until this is initialized.
508 */
510
511 /**
512 * The value of now() saved in `public` constructor, marking the time point when the timing began. Doesn't change
513 * after construction. since_start() (the total duration(s)) is based on this and #m_start_when.
514 *
515 * Ideally this would be `const` but isn't for some minor reasons (see `public` constructor).
516 */
518
519 /**
520 * The value of now() recorded during the last checkpoint() call; or equals #m_start_when if checkpoint() has not
521 * yet been called. since_start() (the total duration(s)) is based on this and #m_start_when.
522 *
523 * Ideally this would be `const` but had to be left technically mutable due to certain `array<>` initialization
524 * subtleties.
525 */
527
528 /**
529 * The checkpoints recorded so far, in order, one per checkpoint() call. To avoid any reallocations, this is
530 * reserved to contain a buffer at least `max_n_checkpoints` (see ctor) elements long.
531 */
532 std::vector<Checkpoint> m_checkpoints;
533};
534
535/**
536 * This companion/inner class of Checkpointing_timer provides aggregation of results from many `Checkpointing_timer`s
537 * each measuring some user operation being repeated many times; particularly when the operation consists of multiple
538 * steps (checkpoints) of interest.
539 *
540 * Please read the aggregation section in Checkpointing_timer doc header before using this.
541 *
542 * One small technical point: To make a Checkpointing_timer X available for `Aggregator::aggregate(X)`, you must
543 * construct it on the heap and provide a Checkpointing_timer_ptr accordingly:
544 *
545 * ~~~
546 * Checkpointing_timer_ptr X(new Checkpointing_timer(...));
547 * ~~~
548 *
549 * ### Thread safety ###
550 * A reiteration of the aforementioned Checkpointing_timer doc header: Non-`const` methods are generally not safe
551 * if called concurrently with any other method on the same Aggregator. In practice this should only matter for
552 * calling aggregate(). If you are timing things concurrently happening in different threads you may need to add
553 * a mutex lock around aggregate(); and in that case make sure it doesn't adversely affect your algorithm's overall
554 * performance (or, if it does, then that is acceptable).
555 */
558 private boost::noncopyable
559{
560public:
561 // Constructors/destructor.
562
563 /**
564 * Constructs a new, empty aggregator. After this call aggregate() to add source `Checkpointing_timer`s.
565 *
566 * @param name_moved
567 * Nickname for the agg, for logging and such.
568 * @param max_n_samples
569 * The number of times you shall call aggregate() shall not exceed this.
570 * See similar (in spirit) `max_n_checkpoints` in Checkpointing_timer constructor.
571 * @param logger_ptr
572 * Logger to use for subsequently logging.
573 */
574 explicit Aggregator(log::Logger* logger_ptr, std::string&& name_moved, size_t max_n_samples);
575
576 // Methods.
577
578 /**
579 * Register given Checkpointing_timer for aggregated results. The best time to call this is immediately after
580 * the last `timer_ptr->checkpoint()` call, as that way the call itself won't be included in the timer's duration(s).
581 *
582 * The second best time to call it is immediately after `*timer_ptr` is constructed; the call is quite cheap, so
583 * it really shouldn't affect results much. The third best time is somewhere between those 2 points in time; in that
584 * case its (quite small, probably negligible) time cost will be charged to whatever checkpoint was "running" at the
585 * time.
586 *
587 * ### Requirements on aggregated timers ###
588 * In and of itself `*timer_ptr` can be any Checkpointing_timer with at least one checkpoint() call done.
589 * In addition, behavior is undefined in log_aggregated_results() and create_aggregated_result() unless all of the
590 * following hold about all `aggregate()`d timers at the time of those calls.
591 * - Their `which_clocks` values (as passed to Checkpointing_timer constructor) are all equal.
592 * - They each have the same number of checkpoints.
593 * - For each `i`: Their `i`-th Checkpointing_timer::Checkpoint::m_name values are all mutually equal.
594 *
595 * Since aggregation collates the many measurements of the same multi-step operation, these should all be common
596 * sense; if one doesn't hold for you, there is some kind of very basic problem.
597 *
598 * ### Thread safety ###
599 * See Aggregator doc header. A mutex lock around this call may be desirable depending on situation.
600 *
601 * @param timer_ptr
602 * The timer to register.
603 */
604 void aggregate(Checkpointing_timer_ptr timer_ptr);
605
606 /// Act as if aggregate() has never been called since construction.
607 void clear();
608
609 /**
610 * Called after the final Checkpointing_timer has finished (final `checkpoint()` call) and been `aggregate()`d,
611 * this `INFO`-logs human-friendly readout of the aggregated results so far. This will include all 3 types of
612 * aggregation: sum; mean; and mean-scaled-by-convenience-constant; or some specified subset thereof.
613 *
614 * ### Performance ###
615 * It is possible to call it between `aggregate()`s, perhaps to output incremental results. It may be slow and should
616 * not be called within any perf-critical section and definitely not in any measured section. The most natural
617 * time to call it is after all data collection has finished.
618 *
619 * @param alternate_logger_ptr_or_null
620 * If null, logs are sent to get_logger() as normal; otherwise it is sent to the supplied log::Logger instead.
621 * Since logging this output is a first-class facility within Checkpointing_timer::Aggregator -- unlike
622 * vast majority of Flow logging which is a secondary (though important in practice) nicety -- it seemed
623 * sensible to provide extra control of this particular bit of logging.
624 * Also note that any (probably TRACE) logging outside of the timer-data output will still go to
625 * get_logger().
626 * @param show_sum
627 * Whether to show the SUM aggregated output type.
628 * @param show_mean
629 * Whether to show the MEAN aggregated output type.
630 * @param mean_scale_or_zero
631 * If zero do not show SCALED-MEAN aggregated output; otherwise show it, with this value as the
632 * scale-constant. (E.g., one could show the average latency per 100,000 requests which might be
633 * more conveniently large than a per-request mean.)
634 */
635 void log_aggregated_results(log::Logger* alternate_logger_ptr_or_null,
636 bool show_sum, bool show_mean, uint64_t mean_scale_or_zero) const;
637
638 /**
639 * Provides programmatic access to the aggregation result that would be logged by log_aggregated_results() -- in
640 * the form of a returned new Checkpointing_timer that stores those numbers.
641 *
642 * log_aggregated_results() basically calls this once for each type of aggregated data it wants to see.
643 * Accordingly, if you desire programmatic access to type X of aggregated results, call this and specify the
644 * type X of interest, as args. log_aggregated_results() doc header's perf notes apply here similarly.
645 *
646 * You can also use `info_log_result` to immediately log the results, as log_aggregated_results() would for that
647 * aggregation type. Conceivably you could then ignore the return value, depending on your goals.
648 *
649 * @param alternate_logger_ptr_or_null
650 * Same meaning as for log_aggregated_results(). However here this is only relevant if `info_log_result`;
651 * otherwise ignored (null is fine).
652 * @param info_log_result
653 * If and only if `true`, INFO messages are logged with human-readable representation of
654 * the returned data, similarly to (partial) log_aggregated_results() output.
655 * Actually it simply calls log_aggregated_result_in_timer() on the returned Checkpointing_timer.
656 * @param mean_scale_or_zero
657 * If zero then return SUM aggregated output; otherwise MEAN or SCALED-MEAN aggregated output, with this
658 * value as the scale-constant (the value 1 => an un-scaled MEAN output).
659 * (E.g., one could show the average latency per 100,000 requests which might be
660 * more conveniently large than a per-request mean.)
661 * @return Ref-counted pointer to a new Checkpointing_timer that stores the aggregated results.
662 * This timer will have the same `which_clocks`, and its `checkpoints()` will have the same length and
663 * in-order names, as each of the source timers; its name and get_logger() will come from Aggregator `name`
664 * (ctor arg) and get_logger() respectively.
665 */
666 Checkpointing_timer_ptr create_aggregated_result(log::Logger* alternate_logger_ptr_or_null, bool info_log_result,
667 uint64_t mean_scale_or_zero = 1) const;
668
669 /**
670 * Given an aggregated-result Checkpointing_timer (as if returned by create_aggregated_result()) logs the SUM or MEAN
671 * or SCALED-MEAN stored in it as INFO messages.
672 *
673 * Publicly, it exists for one use case (as of this writing), namely to log a "manually" compiled aggregated-result
674 * Checkpointing_timer. This *pattern* is explained in the Checkpointing_timer doc header in the aggregation section.
675 *
676 * Naturally, it is also used internally by create_aggregated_result() and log_aggregated_results(), hence why they
677 * all produce similar output.
678 *
679 * @param logger_ptr
680 * Same meaning as in log_aggregated_results(), except null is not a valid value (behavior is undefined).
681 * @param agg_timer
682 * Aggregated-result Checkpointing_timer, as if returned by create_aggregated_result().
683 * Behavior is undefined unless it was, in fact, returned by that method (or is a copy of such);
684 * or unless it was compiled as described in the aforementioned pattern, documented in Checkpointing_timer
685 * doc header.
686 * @param n_samples
687 * For appropriate logging, this must contain the # of samples (repetitions of the measured operation)
688 * collected to yield `agg_timer`.
689 * @param mean_scale_or_zero
690 * For appropriate logging, this must contain the type of aggregated numbers that are stored in `agg_timer`.
691 * See create_aggregated_result().
692 */
693 static void log_aggregated_result_in_timer(log::Logger* logger_ptr,
694 const Checkpointing_timer& agg_timer, unsigned int n_samples,
695 uint64_t mean_scale_or_zero = 1);
696
697 // Data.
698
699 /// The immutable human-readable name for this timer. It's `const` so no need for accessor for now.
700 const std::string m_name;
701
702private:
703 // Data.
704
705 /// The ref-counted pointers to source timers passed to aggregate() in that order.
706 std::vector<Checkpointing_timer_ptr> m_timers;
707}; // class Checkpointing_timer::Aggregator
708
709// Free functions: in *_fwd.hpp.
710
711// However the following refer to inner class and hence must be declared here and not _fwd.hpp.
712
713/**
714 * Prints string representation of the given `Checkpoint` to the given `ostream`. See
715 * Checkpointing_timer::checkpoint() and Checkpointing_timer::checkpoints().
716 *
717 * @relatesalso Checkpointing_timer::Checkpoint
718 *
719 * @param os
720 * Stream to which to write.
721 * @param checkpoint
722 * Object to serialize.
723 * @return `os`.
724 */
725std::ostream& operator<<(std::ostream& os, const Checkpointing_timer::Checkpoint& checkpoint);
726
727} // namespace flow::perf
Convenience class that simply stores a Logger and/or Component passed into a constructor; and returns...
Definition: log.hpp:1619
Interface that the user should implement, passing the implementing Logger into logging classes (Flow'...
Definition: log.hpp:1291
This companion/inner class of Checkpointing_timer provides aggregation of results from many Checkpoin...
void aggregate(Checkpointing_timer_ptr timer_ptr)
Register given Checkpointing_timer for aggregated results.
void log_aggregated_results(log::Logger *alternate_logger_ptr_or_null, bool show_sum, bool show_mean, uint64_t mean_scale_or_zero) const
Called after the final Checkpointing_timer has finished (final checkpoint() call) and been aggregate(...
const std::string m_name
The immutable human-readable name for this timer. It's const so no need for accessor for now.
static void log_aggregated_result_in_timer(log::Logger *logger_ptr, const Checkpointing_timer &agg_timer, unsigned int n_samples, uint64_t mean_scale_or_zero=1)
Given an aggregated-result Checkpointing_timer (as if returned by create_aggregated_result()) logs th...
Checkpointing_timer_ptr create_aggregated_result(log::Logger *alternate_logger_ptr_or_null, bool info_log_result, uint64_t mean_scale_or_zero=1) const
Provides programmatic access to the aggregation result that would be logged by log_aggregated_results...
Aggregator(log::Logger *logger_ptr, std::string &&name_moved, size_t max_n_samples)
Constructs a new, empty aggregator.
void clear()
Act as if aggregate() has never been called since construction.
std::vector< Checkpointing_timer_ptr > m_timers
The ref-counted pointers to source timers passed to aggregate() in that order.
The central class in the perf Flow module, this efficiently times the user's operation,...
void output(std::ostream *os) const
Equivalent to (*os) << (*this).
Checkpointing_timer(Checkpointing_timer &&)=delete
For now at least there's no reason for move-construction.
const std::vector< Checkpoint > & checkpoints() const
Returns the checkpoints recorded so far.
const std::string m_name
The immutable human-readable name for this timer. It's const so no need for accessor for now.
static Clock_types_subset process_cpu_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of processor time ...
void scale(uint64_t mult_scale, uint64_t div_scale)
Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every...
Time_pt_set now() const
Sample all currently enabled Clock_types' clocks and return those values, each of which is a time sta...
void operator=(Checkpointing_timer &&)=delete
No copy assignment; no move assignment either.
Time_pt_set m_last_checkpoint_when
The value of now() recorded during the last checkpoint() call; or equals m_start_when if checkpoint()...
static Cpu_split_clock_durs_since_epoch now_cpu_lo_res_raw()
Returns the current Cpu_split_clock time, essentially as a combo struct containing raw (integer) coun...
static Time_pt now_cpu_lo_res(const Cpu_split_clock_durs_since_epoch &cpu_combo_now_raw, bool user_else_sys)
Returns individual Time_pt value returnable by now() various overloads: either the user time or syste...
Checkpointing_timer(const Checkpointing_timer &src)=default
Constructs a timer that is identical to but entirely independent of the source object.
static Clock_types_subset real_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall cloc...
std::vector< Checkpoint > m_checkpoints
The checkpoints recorded so far, in order, one per checkpoint() call.
const Checkpoint & checkpoint(std::string &&name_moved)
Records a checkpoint, which takes a sample of all enabled Clock_types' clocks and records the corresp...
Checkpointing_timer(log::Logger *logger_ptr, std::string &&name_moved, Clock_types_subset which_clocks, size_t max_n_checkpoints)
Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will...
static Clock_types_subset thread_cpu_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of processor time ...
Time_pt_set m_start_when
The value of now() saved in public constructor, marking the time point when the timing began.
const Clock_types_subset m_which_clocks
The bit-set indexed by size_t(Clock_type::...) wherein that clock will be sampled in now() and checkp...
void operator=(const Checkpointing_timer &)=delete
Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for thi...
boost::chrono::process_cpu_clock Cpu_split_clock
chrono clock used, internally, for Clock_type::S_CPU_USER_LO_RES and Clock_type::S_CPU_SYS_LO_RES que...
Cpu_split_clock::times Cpu_split_clock_durs_since_epoch
Short-hand for the combo (user/kernel) durations-since-epoch tracked by Cpu_split_clock.
Duration_set since_start() const
Returns the duration(s) (for all enabled Clock_types) between construction and the last checkpoint() ...
Flow module containing tools for profiling and optimization.
boost::shared_ptr< Checkpointing_timer > Checkpointing_timer_ptr
Short-hand for ref-counting pointer to Checkpointing_timer.
Definition: perf_fwd.hpp:43
std::ostream & operator<<(std::ostream &os, const Checkpointing_timer::Checkpoint &checkpoint)
Prints string representation of the given Checkpoint to the given ostream.
std::bitset< size_t(Clock_type::S_END_SENTINEL)> Clock_types_subset
Short-hand for a bit-set of N bits which represents the presence or absence of each of the N possible...
Fine_time_pt Time_pt
Short-hand for a high-precision boost.chrono point in time, formally equivalent to flow::Fine_time_pt...
Clock_type
Clock types supported by flow::perf module facilities, perf::Checkpointing_timer in particular.
The entirety of the information recorded with each Checkpointing_timer::checkpoint() call; it contain...
const std::string m_name
User-supplied name of the checkpoint.
Duration_set m_since_last
For the T.checkpoint() call that generated *this, for each enabled Clock_type: Time elapsed since eit...
Convenience wrapper around an array<Duration, N>, which stores a duration for each of the N possible ...
Convenience wrapper around an array<Time_pt, N>, which stores a time point for each of the N possible...