Flow 1.0.0
Flow project: Public API.
|
The central class in the perf
Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation.
More...
#include <checkpt_timer.hpp>
Classes | |
class | Aggregator |
This companion/inner class of Checkpointing_timer provides aggregation of results from many Checkpointing_timer s each measuring some user operation being repeated many times; particularly when the operation consists of multiple steps (checkpoints) of interest. More... | |
struct | Checkpoint |
The entirety of the information recorded with each Checkpointing_timer::checkpoint() call; it contains the Duration values (for each Clock_type ) with the time elapsed since either the preceding T.checkpoint() or construction. More... | |
Public Member Functions | |
Checkpointing_timer (log::Logger *logger_ptr, std::string &&name_moved, Clock_types_subset which_clocks, size_t max_n_checkpoints) | |
Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now. More... | |
Checkpointing_timer (const Checkpointing_timer &src)=default | |
Constructs a timer that is identical to but entirely independent of the source object. More... | |
Checkpointing_timer (Checkpointing_timer &&)=delete | |
For now at least there's no reason for move-construction. | |
void | operator= (const Checkpointing_timer &)=delete |
Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for this so far. | |
void | operator= (Checkpointing_timer &&)=delete |
No copy assignment; no move assignment either. | |
const Checkpoint & | checkpoint (std::string &&name_moved) |
Records a checkpoint, which takes a sample of all enabled Clock_type s' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction. More... | |
Duration_set | since_start () const |
Returns the duration(s) (for all enabled Clock_type s) between construction and the last checkpoint() call. More... | |
const std::vector< Checkpoint > & | checkpoints () const |
Returns the checkpoints recorded so far. More... | |
void | scale (uint64_t mult_scale, uint64_t div_scale) |
Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by mult_scale and divided by div_scale . More... | |
Time_pt_set | now () const |
Sample all currently enabled Clock_type s' clocks and return those values, each of which is a time stamp relative to some Epoch value. More... | |
void | output (std::ostream *os) const |
Equivalent to (*os) << (*this) . More... | |
Public Member Functions inherited from flow::log::Log_context | |
Log_context (Logger *logger=0) | |
Constructs Log_context by storing the given pointer to a Logger and a null Component. More... | |
template<typename Component_payload > | |
Log_context (Logger *logger, Component_payload component_payload) | |
Constructs Log_context by storing the given pointer to a Logger and a new Component storing the specified generically typed payload (an enum value). More... | |
Log_context (const Log_context &src) | |
Copy constructor that stores equal Logger* and Component values as the source. More... | |
Log_context (Log_context &&src) | |
Move constructor that makes this equal to src , while the latter becomes as-if default-constructed. More... | |
Log_context & | operator= (const Log_context &src) |
Assignment operator that behaves similarly to the copy constructor. More... | |
Log_context & | operator= (Log_context &&src) |
Move assignment operator that behaves similarly to the move constructor. More... | |
void | swap (Log_context &other) |
Swaps Logger pointers and Component objects held by *this and other . More... | |
Logger * | get_logger () const |
Returns the stored Logger pointer, particularly as many FLOW_LOG_*() macros expect. More... | |
const Component & | get_log_component () const |
Returns reference to the stored Component object, particularly as many FLOW_LOG_*() macros expect. More... | |
Static Public Member Functions | |
static Clock_types_subset | real_clock_types () |
Returns the bit-set containing only those Clock_type s enabled that measure passage of real (wall clock) time. More... | |
static Clock_types_subset | process_cpu_clock_types () |
Returns the bit-set containing only those Clock_type s enabled that measure passage of processor time by all threads of the calling process combined. More... | |
static Clock_types_subset | thread_cpu_clock_types () |
Returns the bit-set containing only those Clock_type s enabled that measure passage of processor time by the calling thread. More... | |
static Time_pt | now (Clock_type clock_type) |
Based on the passed in clock_type argument, the current time is returned for that one particular clock type. More... | |
static Time_pt_set | now (const Clock_types_subset &which_clocks) |
Based on the passed in which_clocks argument, the current time is returned for the clock types enabled by it. More... | |
Public Attributes | |
const std::string | m_name |
The immutable human-readable name for this timer. It's const so no need for accessor for now. | |
Related Functions | |
(Note that these are not member functions.) | |
std::ostream & | operator<< (std::ostream &os, const Checkpointing_timer &timer) |
Prints string representation of the given Checkpointing_timer (whether with original data or an aggregated-result timer) to the given ostream . More... | |
The central class in the perf
Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation.
To use this: Construct a Checkpointing_timer; perform the operation in question; then access results via accessors; including in particular the string/stream output operation that lays out results in human-friendly fashion. Optionally, call checkpoint() to mark down intermediate timing results, so that the operation can be broken down into its component parts (the string/stream output will include these). This is optional, except that at least 1 checkpoint is required: the last checkpoint represents the end of the measured operation.
The actual measurements are performed by sampling a time stamp from each of N clocks, where N <= M, and M is the number of values in perf::Clock_type enum
. The N clocks are specified at construction by user; thus can measure just 1, in particular, as is often desired. This is a critical and potentially subtle decision; see doc headers of each perf::Clock_type member for tips.
This is a performance-measuring facility, and the performance of a performance-measuring-thing is quite important, especially since some of the things being timed can be very quick (single-digit microseconds potentially).
Extreme care has been taken to keep the computational fat of all the book-keeping in the class very low (and empirical testing shows this was successful). In the cases where user cooperation is necessary to avoid undoing this, the doc headers point it out. Please read docs carefully to avoid timing-technique traps that might lead to wrong conclusions and frustration.
Checkpointing_timer itself is not thread-safe for non-const
access to an object while any other access occurs concurrently. (Internally, there is no locking.) However, the nature of how one uses a timer object is that one does stuff to be measured, checkpoint(), more stuff, checkpoint(), ..., final checkpoint(). In other words, by its very nature, it expects only sequential non-const
calls while measuring times – so thread safety should not come up. (However, thread safety enters the picture with Checkpointing_timer::Aggregator; see below.)
Duration
result aggregation is the common-sense idea that to accurately measure the typical length of operation X is to repeat X many times (N times) and then view a sum-over-N-samples and/or mean-over-N-samples and/or mean-over-N-samples-scaled-times-M, where M is usually some convenience constant like 1,000,000. There are two patterns we provide/suggest to accomplish aggregation.
Suppose you have operation X and are not interested in subdividing it into checkpoints; you only care about the total duration(s) (according to which Clock_type
s interest you). Then the following technique leads to the lowest possible overhead:
This involves minimal overhead, as no timing calls occur at all until X is repeated a million times.
Checkpointing_timer
s + Checkpointing_timer::AggregatorIf, by contrast, you want to divide X into checkpoints and see (say) how long checkpoint 3 takes on average (as well as the total as above), then use the dedicated Checkpointing_timer::Aggregator class for that purpose. This approach also allows one to do stuff (possibly unrelated stuff) between the many repetitions of operation X. Only X itself will be timed.
To use Aggregator
, construct one; then simply create a Checkpointing_timer for each repetition of X and time it as normal. Before or after each repetition, register it inside the agg using Aggregator::aggregate() which takes a pointer to Checkpointing_timer. The only requirement is that the number and respective names of the checkpoints in each Checkpointing_timer are the same for all of them.
Once enough samples (n_samples
in earlier example) have been thus collected, Aggregator::log_aggregated_results() will log every potential form (sum, mean, scaled-mean), not just for the total operation X but also each checkpoint(). For example, if X consists of steps X1, X2, X3, then one would see the mean duration of X1, X2, and X3 individually and their means would add up to the mean for all of X.
This is quite powerful; and while the overhead of creating individual Checkpointing_timer objects, say, might affect the overall performance of the algorithm involving X, it shouldn't have any direct effect on the values actually being measured, since only each X invocation itself is measured; stuff in-between (slow or not) isn't. However:
|
explicit |
Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now.
Please read the following docs to achieve low overhead.
name_moved | Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: string(your_name_value) . |
which_clocks | The clocks you want sampled in each checkpoint(). Please carefully read the doc headers of all perf::Clock_type enum members. |
max_n_checkpoints | The number of times you shall call checkpoint() shall not exceed this. A tight value saves RAM; but more importantly a correct upper bound guarantees no internal reallocation will occur in checkpoint() which is important for maintaining low overhead. assert() trips if one breaks this promise. |
logger_ptr | Logger to use for subsequently logging. |
|
explicitdefault |
Constructs a timer that is identical to but entirely independent of the source object.
When performed outside of performance-critical sections, this is very useful for data massaging after having completely measured something in a Checkpointing_timer. E.g., one can create a copy and then scale() it to obtain a mean, etc.
src | Object to copy. |
const Checkpointing_timer::Checkpoint & flow::perf::Checkpointing_timer::checkpoint | ( | std::string && | name_moved | ) |
Records a checkpoint, which takes a sample of all enabled Clock_type
s' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction.
In practice, at least one checkpoint() call is required for *this
to be of any use at all. The final checkpoint() call's sample(s) determine the current value of since_start() which is the duration of the entire operation, as measured since *this
construction. Intermediate calls (ones preceding the final one) are optional and represent intermediate steps and how long they each took.
since_start() depends on exactly two things: when *this
was constructed, and when the last checkpoint() call (at that time) was made.
name_moved
will be taken via move semantics, meaning no string buffer will be copied, only the pointer(s) and buffer size scalar(s) inside std::string
. This helps performance. However, to form name_moved
you will naturally spend some processor cycles which therefore risks slowing down the measured operation and polluting timing results. To avoid this, please follow the following suggestions.
string("some constant")
. All similar constructions from a pre-made const char*
or std::string
are very cheap. So if that's good enough for you, do that.util::ostream_op_string("some string", some_int)
) on a 2015 MacBook Pro. For many, many timing scenarios such sub-microsecond overheads are no big deal, but watch out if you're measuring something tiny-subtle.char*
-writing to do it almost as quickly as that, or at least far closer to it than to the (syntactically-pretty but slow-in-*this*-context) stream-based string formation.It will log a TRACE message. Hence if TRACE is enabled, timing results might be polluted, as logging can take non-trivial time.
name_moved | Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: string(your_name_value) . |
struct
, as it sits directly inside *this
. const std::vector< Checkpointing_timer::Checkpoint > & flow::perf::Checkpointing_timer::checkpoints | ( | ) | const |
Returns the checkpoints recorded so far.
This returns a reference and thus avoids a copy. The reference is valid until *this
destroyed.
Time_pt_set flow::perf::Checkpointing_timer::now | ( | ) | const |
Sample all currently enabled Clock_type
s' clocks and return those values, each of which is a time stamp relative to some Epoch value.
(The Epoch value differs by Clock_type. Durations can be generated by subtracting one time point from another which mathematically makes Epoch's meaning irrelevant.) The value Time_pt()
(internally: 0 a/k/a Epoch) is set for the disabled clocks. In particular checkpoint() will internally call this.
It is unusual to use this directly by the user, as absolute time stamps aren't useful directly, while since_start(), checkpoints(), the I/O facilities, etc., provide access to the various computed durations, so one needn't do the subtracting manually. Nevertheless being able to get the raw value is useful for (1) sanity-checking more advanced results; and (2) to time the timer! (An example of the latter is when I was comparing the duration of checkpoint() with a constant name_moved
vs. a dynamic-stream-created one vs. just getting the time stamp via now() and not calling checkpoint() at all.)
|
static |
Based on the passed in clock_type
argument, the current time is returned for that one particular clock type.
clock_type | The type of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members. |
|
static |
Based on the passed in which_clocks
argument, the current time is returned for the clock types enabled by it.
Identical to non-static
now() but which_clocks
is specified as arg instead of at construction.
which_clocks | The types of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members. |
void flow::perf::Checkpointing_timer::output | ( | std::ostream * | os | ) | const |
Equivalent to (*os) << (*this)
.
os | Pointer to ostream to which to write. |
|
static |
Returns the bit-set containing only those Clock_type
s enabled that measure passage of processor time by all threads of the calling process combined.
See real_clock_types().
|
static |
Returns the bit-set containing only those Clock_type
s enabled that measure passage of real (wall clock) time.
One application is when deciding on the constructor which_clocks
arg (also a Clock_types_subset
). If one wants to time some operation in such a way as to make only real-time clocks make sense, and the user is allowed to control which clock types to sample, then one can do:
void flow::perf::Checkpointing_timer::scale | ( | uint64_t | mult_scale, |
uint64_t | div_scale | ||
) |
Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by mult_scale
and divided by div_scale
.
For example, if float(mult_scale) / float(div_scale) == 0.5
, and 3 checkpoints were recorded as (6s, 4s, 12s) (since_start()
= 12s + 4s + 6s = 22s), then after the scale() call the checkpoints will reflect (3s, 2s, 6s) (since_start()
== 11s).
In particular (and the original use case for this – see doc header), if one recorded operation X repeated sequentially 100x, with *this
constructed just before the first iteration and checkpoint()
ed just after the 100th, then performing scale(1, 100)
would effectively result in *this
representing the mean duration of a single X iteration.
Formally, behavior is undefined if one calls checkpoint() after calling scale(). Informally it is likely to result in nonsensical data for all checkpoints created starting with that one and for since_start().
mult_scale | The timer data are conceptually multiplied by this, first. |
div_scale | The timer data are conceptually divided by this, next. |
Duration_set flow::perf::Checkpointing_timer::since_start | ( | ) | const |
Returns the duration(s) (for all enabled Clock_type
s) between construction and the last checkpoint() call.
Behavior is undefined if checkpoint() has not been called.
|
static |
Returns the bit-set containing only those Clock_type
s enabled that measure passage of processor time by the calling thread.
See real_clock_types().
|
related |
Prints string representation of the given Checkpointing_timer
(whether with original data or an aggregated-result timer) to the given ostream
.
Note this is multi-line output that does not end in newline.
os | Stream to which to write. |
timer | Object to serialize. |
os
.