Flow 1.0.1
Flow project: Public API.
Classes | Public Member Functions | Static Public Member Functions | Public Attributes | Related Functions | List of all members
flow::perf::Checkpointing_timer Class Reference

The central class in the perf Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation. More...

#include <checkpt_timer.hpp>

Inheritance diagram for flow::perf::Checkpointing_timer:
[legend]
Collaboration diagram for flow::perf::Checkpointing_timer:
[legend]

Classes

class  Aggregator
 This companion/inner class of Checkpointing_timer provides aggregation of results from many Checkpointing_timers each measuring some user operation being repeated many times; particularly when the operation consists of multiple steps (checkpoints) of interest. More...
 
struct  Checkpoint
 The entirety of the information recorded with each Checkpointing_timer::checkpoint() call; it contains the Duration values (for each Clock_type) with the time elapsed since either the preceding T.checkpoint() or construction. More...
 

Public Member Functions

 Checkpointing_timer (log::Logger *logger_ptr, std::string &&name_moved, Clock_types_subset which_clocks, size_t max_n_checkpoints)
 Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now. More...
 
 Checkpointing_timer (const Checkpointing_timer &src)=default
 Constructs a timer that is identical to but entirely independent of the source object. More...
 
 Checkpointing_timer (Checkpointing_timer &&)=delete
 For now at least there's no reason for move-construction.
 
void operator= (const Checkpointing_timer &)=delete
 Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for this so far.
 
void operator= (Checkpointing_timer &&)=delete
 No copy assignment; no move assignment either.
 
const Checkpointcheckpoint (std::string &&name_moved)
 Records a checkpoint, which takes a sample of all enabled Clock_types' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction. More...
 
Duration_set since_start () const
 Returns the duration(s) (for all enabled Clock_types) between construction and the last checkpoint() call. More...
 
const std::vector< Checkpoint > & checkpoints () const
 Returns the checkpoints recorded so far. More...
 
void scale (uint64_t mult_scale, uint64_t div_scale)
 Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by mult_scale and divided by div_scale. More...
 
Time_pt_set now () const
 Sample all currently enabled Clock_types' clocks and return those values, each of which is a time stamp relative to some Epoch value. More...
 
void output (std::ostream *os) const
 Equivalent to (*os) << (*this). More...
 
- Public Member Functions inherited from flow::log::Log_context
 Log_context (Logger *logger=0)
 Constructs Log_context by storing the given pointer to a Logger and a null Component. More...
 
template<typename Component_payload >
 Log_context (Logger *logger, Component_payload component_payload)
 Constructs Log_context by storing the given pointer to a Logger and a new Component storing the specified generically typed payload (an enum value). More...
 
 Log_context (const Log_context &src)
 Copy constructor that stores equal Logger* and Component values as the source. More...
 
 Log_context (Log_context &&src)
 Move constructor that makes this equal to src, while the latter becomes as-if default-constructed. More...
 
Log_contextoperator= (const Log_context &src)
 Assignment operator that behaves similarly to the copy constructor. More...
 
Log_contextoperator= (Log_context &&src)
 Move assignment operator that behaves similarly to the move constructor. More...
 
void swap (Log_context &other)
 Swaps Logger pointers and Component objects held by *this and other. More...
 
Loggerget_logger () const
 Returns the stored Logger pointer, particularly as many FLOW_LOG_*() macros expect. More...
 
const Componentget_log_component () const
 Returns reference to the stored Component object, particularly as many FLOW_LOG_*() macros expect. More...
 

Static Public Member Functions

static Clock_types_subset real_clock_types ()
 Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall clock) time. More...
 
static Clock_types_subset process_cpu_clock_types ()
 Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by all threads of the calling process combined. More...
 
static Clock_types_subset thread_cpu_clock_types ()
 Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by the calling thread. More...
 
static Time_pt now (Clock_type clock_type)
 Based on the passed in clock_type argument, the current time is returned for that one particular clock type. More...
 
static Time_pt_set now (const Clock_types_subset &which_clocks)
 Based on the passed in which_clocks argument, the current time is returned for the clock types enabled by it. More...
 

Public Attributes

const std::string m_name
 The immutable human-readable name for this timer. It's const so no need for accessor for now.
 

Related Functions

(Note that these are not member functions.)

std::ostream & operator<< (std::ostream &os, const Checkpointing_timer &timer)
 Prints string representation of the given Checkpointing_timer (whether with original data or an aggregated-result timer) to the given ostream. More...
 

Detailed Description

The central class in the perf Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation.

How to use

To use this: Construct a Checkpointing_timer; perform the operation in question; then access results via accessors; including in particular the string/stream output operation that lays out results in human-friendly fashion. Optionally, call checkpoint() to mark down intermediate timing results, so that the operation can be broken down into its component parts (the string/stream output will include these). This is optional, except that at least 1 checkpoint is required: the last checkpoint represents the end of the measured operation.

The actual measurements are performed by sampling a time stamp from each of N clocks, where N <= M, and M is the number of values in perf::Clock_type enum. The N clocks are specified at construction by user; thus can measure just 1, in particular, as is often desired. This is a critical and potentially subtle decision; see doc headers of each perf::Clock_type member for tips.

Performance

This is a performance-measuring facility, and the performance of a performance-measuring-thing is quite important, especially since some of the things being timed can be very quick (single-digit microseconds potentially).

Extreme care has been taken to keep the computational fat of all the book-keeping in the class very low (and empirical testing shows this was successful). In the cases where user cooperation is necessary to avoid undoing this, the doc headers point it out. Please read docs carefully to avoid timing-technique traps that might lead to wrong conclusions and frustration.

Thread safety

Checkpointing_timer itself is not thread-safe for non-const access to an object while any other access occurs concurrently. (Internally, there is no locking.) However, the nature of how one uses a timer object is that one does stuff to be measured, checkpoint(), more stuff, checkpoint(), ..., final checkpoint(). In other words, by its very nature, it expects only sequential non-const calls while measuring times – so thread safety should not come up. (However, thread safety enters the picture with Checkpointing_timer::Aggregator; see below.)

Aggregation

Duration result aggregation is the common-sense idea that to accurately measure the typical length of operation X is to repeat X many times (N times) and then view a sum-over-N-samples and/or mean-over-N-samples and/or mean-over-N-samples-scaled-times-M, where M is usually some convenience constant like 1,000,000. There are two patterns we provide/suggest to accomplish aggregation.

Lowest-overhead aggregation approach: Single Checkpointing_timer

Suppose you have operation X and are not interested in subdividing it into checkpoints; you only care about the total duration(s) (according to which Clock_types interest you). Then the following technique leads to the lowest possible overhead:

flow::perf::Checkpointing_timer sum_timer("op name", which_clocks, 1, logger); // 1 checkpoint only.
const unsigned int n_samples = 1000000;
for (unsigned int sample_idx = 0; sample_idx != n_samples; ++sample_idx)
{
// ...Operation X being measured goes here.
}
sum_timer.checkpoint("total"); // Mark down the total time taken.
// Log the total duration(s), one per clock type!
// And/or: Log mean duration(s), times N_SAMPLES_SCALE_CONVENIENCE, one per clock type!
{
// Just make a copy of the raw sum, then scale it x N_SAMPLES_SCALE_CONVENIENCE / n_samples.
auto mean_scaled_timer(sum_timer);
mean_scaled_timer.scale(N_SAMPLES_SCALE_CONVENIENCE, n_samples);
true, N_SAMPLES_SCALE_CONVENIENCE);
}
static void log_aggregated_result_in_timer(log::Logger *logger_ptr, const Checkpointing_timer &agg_timer, unsigned int n_samples, uint64_t mean_scale_or_zero=1)
Given an aggregated-result Checkpointing_timer (as if returned by create_aggregated_result()) logs th...
Definition: checkpt_timer.cpp:471
The central class in the perf Flow module, this efficiently times the user's operation,...
Definition: checkpt_timer.hpp:188

This involves minimal overhead, as no timing calls occur at all until X is repeated a million times.

Cross-product aggregation approach: Multiple Checkpointing_timers + Checkpointing_timer::Aggregator

If, by contrast, you want to divide X into checkpoints and see (say) how long checkpoint 3 takes on average (as well as the total as above), then use the dedicated Checkpointing_timer::Aggregator class for that purpose. This approach also allows one to do stuff (possibly unrelated stuff) between the many repetitions of operation X. Only X itself will be timed.

To use Aggregator, construct one; then simply create a Checkpointing_timer for each repetition of X and time it as normal. Before or after each repetition, register it inside the agg using Aggregator::aggregate() which takes a pointer to Checkpointing_timer. The only requirement is that the number and respective names of the checkpoints in each Checkpointing_timer are the same for all of them.

Once enough samples (n_samples in earlier example) have been thus collected, Aggregator::log_aggregated_results() will log every potential form (sum, mean, scaled-mean), not just for the total operation X but also each checkpoint(). For example, if X consists of steps X1, X2, X3, then one would see the mean duration of X1, X2, and X3 individually and their means would add up to the mean for all of X.

This is quite powerful; and while the overhead of creating individual Checkpointing_timer objects, say, might affect the overall performance of the algorithm involving X, it shouldn't have any direct effect on the values actually being measured, since only each X invocation itself is measured; stuff in-between (slow or not) isn't. However:

See also
Checkpointing_timer::Aggregator.

Constructor & Destructor Documentation

◆ Checkpointing_timer() [1/2]

flow::perf::Checkpointing_timer::Checkpointing_timer ( log::Logger logger_ptr,
std::string &&  name_moved,
Clock_types_subset  which_clocks,
size_t  max_n_checkpoints 
)
explicit

Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now.

Please read the following docs to achieve low overhead.

Parameters
name_movedNickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: string(your_name_value).
which_clocksThe clocks you want sampled in each checkpoint(). Please carefully read the doc headers of all perf::Clock_type enum members.
max_n_checkpointsThe number of times you shall call checkpoint() shall not exceed this. A tight value saves RAM; but more importantly a correct upper bound guarantees no internal reallocation will occur in checkpoint() which is important for maintaining low overhead. assert() trips if one breaks this promise.
logger_ptrLogger to use for subsequently logging.

◆ Checkpointing_timer() [2/2]

flow::perf::Checkpointing_timer::Checkpointing_timer ( const Checkpointing_timer src)
explicitdefault

Constructs a timer that is identical to but entirely independent of the source object.

When performed outside of performance-critical sections, this is very useful for data massaging after having completely measured something in a Checkpointing_timer. E.g., one can create a copy and then scale() it to obtain a mean, etc.

See also
Checkpointing_timer doc header for discussion of aggregation approaches.
Parameters
srcObject to copy.

Member Function Documentation

◆ checkpoint()

const Checkpointing_timer::Checkpoint & flow::perf::Checkpointing_timer::checkpoint ( std::string &&  name_moved)

Records a checkpoint, which takes a sample of all enabled Clock_types' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction.

In practice, at least one checkpoint() call is required for *this to be of any use at all. The final checkpoint() call's sample(s) determine the current value of since_start() which is the duration of the entire operation, as measured since *this construction. Intermediate calls (ones preceding the final one) are optional and represent intermediate steps and how long they each took.

since_start() depends on exactly two things: when *this was constructed, and when the last checkpoint() call (at that time) was made.

Naming vs. performance

name_moved will be taken via move semantics, meaning no string buffer will be copied, only the pointer(s) and buffer size scalar(s) inside std::string. This helps performance. However, to form name_moved you will naturally spend some processor cycles which therefore risks slowing down the measured operation and polluting timing results. To avoid this, please follow the following suggestions.

  • Passing a constant string as follows is very cheap: string("some constant"). All similar constructions from a pre-made const char* or std::string are very cheap. So if that's good enough for you, do that.
  • Constructing something via util::ostream_op_string() (and/or util::String_ostream) adds about 500 nanoseconds per checkpoint() call (e.g.: util::ostream_op_string("some string", some_int)) on a 2015 MacBook Pro. For many, many timing scenarios such sub-microsecond overheads are no big deal, but watch out if you're measuring something tiny-subtle.
    • If it is a problem, use a constant thing as in previous bullet. If that isn't sufficient, you can fairly easily use C-style direct-char*-writing to do it almost as quickly as that, or at least far closer to it than to the (syntactically-pretty but slow-in-*this*-context) stream-based string formation.

Logging

It will log a TRACE message. Hence if TRACE is enabled, timing results might be polluted, as logging can take non-trivial time.

Parameters
name_movedNickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: string(your_name_value).
Returns
Reference to immutable new Checkpoint struct, as it sits directly inside *this.

◆ checkpoints()

const std::vector< Checkpointing_timer::Checkpoint > & flow::perf::Checkpointing_timer::checkpoints ( ) const

Returns the checkpoints recorded so far.

This returns a reference and thus avoids a copy. The reference is valid until *this destroyed.

Returns
Checkpoints so far.

◆ now() [1/3]

Time_pt_set flow::perf::Checkpointing_timer::now ( ) const

Sample all currently enabled Clock_types' clocks and return those values, each of which is a time stamp relative to some Epoch value.

(The Epoch value differs by Clock_type. Durations can be generated by subtracting one time point from another which mathematically makes Epoch's meaning irrelevant.) The value Time_pt() (internally: 0 a/k/a Epoch) is set for the disabled clocks. In particular checkpoint() will internally call this.

Rationale

It is unusual to use this directly by the user, as absolute time stamps aren't useful directly, while since_start(), checkpoints(), the I/O facilities, etc., provide access to the various computed durations, so one needn't do the subtracting manually. Nevertheless being able to get the raw value is useful for (1) sanity-checking more advanced results; and (2) to time the timer! (An example of the latter is when I was comparing the duration of checkpoint() with a constant name_moved vs. a dynamic-stream-created one vs. just getting the time stamp via now() and not calling checkpoint() at all.)

Returns
See above.

◆ now() [2/3]

Time_pt flow::perf::Checkpointing_timer::now ( Clock_type  clock_type)
static

Based on the passed in clock_type argument, the current time is returned for that one particular clock type.

Parameters
clock_typeThe type of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members.
Returns
The current time.

◆ now() [3/3]

Time_pt_set flow::perf::Checkpointing_timer::now ( const Clock_types_subset which_clocks)
static

Based on the passed in which_clocks argument, the current time is returned for the clock types enabled by it.

Identical to non-static now() but which_clocks is specified as arg instead of at construction.

Parameters
which_clocksThe types of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members.
Returns
The current time or times.

◆ output()

void flow::perf::Checkpointing_timer::output ( std::ostream *  os) const

Equivalent to (*os) << (*this).

Parameters
osPointer to ostream to which to write.

◆ process_cpu_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::process_cpu_clock_types ( )
static

Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by all threads of the calling process combined.

Rationale

See real_clock_types().

See also
perf::Clock_type
Returns
See above.

◆ real_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::real_clock_types ( )
static

Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall clock) time.

Rationale

One application is when deciding on the constructor which_clocks arg (also a Clock_types_subset). If one wants to time some operation in such a way as to make only real-time clocks make sense, and the user is allowed to control which clock types to sample, then one can do:

// They can specify a bitmask the_clock_set_from_config; then we potentially cull that further to exclude all
// but the real-time clocks, because we feel only those make sense in our timing context.
const auto which_clocks = Checkpointing_timer::real_clock_types() & the_clock_set_from_config;
Checkpointing_timer timer(..., which_clocks, ...);
// The timing has begun....
static Clock_types_subset real_clock_types()
Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall cloc...
Definition: checkpt_timer.cpp:77
See also
perf::Clock_type
Returns
See above.

◆ scale()

void flow::perf::Checkpointing_timer::scale ( uint64_t  mult_scale,
uint64_t  div_scale 
)

Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by mult_scale and divided by div_scale.

For example, if float(mult_scale) / float(div_scale) == 0.5, and 3 checkpoints were recorded as (6s, 4s, 12s) (since_start() = 12s + 4s + 6s = 22s), then after the scale() call the checkpoints will reflect (3s, 2s, 6s) (since_start() == 11s).

In particular (and the original use case for this – see doc header), if one recorded operation X repeated sequentially 100x, with *this constructed just before the first iteration and checkpoint()ed just after the 100th, then performing scale(1, 100) would effectively result in *this representing the mean duration of a single X iteration.

Formally, behavior is undefined if one calls checkpoint() after calling scale(). Informally it is likely to result in nonsensical data for all checkpoints created starting with that one and for since_start().

Parameters
mult_scaleThe timer data are conceptually multiplied by this, first.
div_scaleThe timer data are conceptually divided by this, next.

◆ since_start()

Duration_set flow::perf::Checkpointing_timer::since_start ( ) const

Returns the duration(s) (for all enabled Clock_types) between construction and the last checkpoint() call.

Behavior is undefined if checkpoint() has not been called.

Returns
See above.

◆ thread_cpu_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::thread_cpu_clock_types ( )
static

Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by the calling thread.

Rationale

See real_clock_types().

See also
perf::Clock_type
Returns
See above.

Friends And Related Function Documentation

◆ operator<<()

std::ostream & operator<< ( std::ostream &  os,
const Checkpointing_timer timer 
)
related

Prints string representation of the given Checkpointing_timer (whether with original data or an aggregated-result timer) to the given ostream.

Note this is multi-line output that does not end in newline.

Parameters
osStream to which to write.
timerObject to serialize.
Returns
os.

The documentation for this class was generated from the following files: