The central class in the perf Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation. More...

#include <checkpt_timer.hpp>

Inheritance diagram for flow::perf::Checkpointing_timer:

[legend]

Collaboration diagram for flow::perf::Checkpointing_timer:

[legend]

Classes
class	Aggregator
	This companion/inner class of Checkpointing_timer provides aggregation of results from many `Checkpointing_timer`s each measuring some user operation being repeated many times; particularly when the operation consists of multiple steps (checkpoints) of interest. More...

struct	Checkpoint
	The entirety of the information recorded with each Checkpointing_timer::checkpoint() call; it contains the `Duration` values (for each `Clock_type`) with the time elapsed since either the preceding `T.checkpoint()` or construction. More...

Public Member Functions
	Checkpointing_timer (log::Logger *logger_ptr, std::string &&name_moved, Clock_types_subset which_clocks, size_t max_n_checkpoints)
	Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now. More...

	Checkpointing_timer (const Checkpointing_timer &src)=default
	Constructs a timer that is identical to but entirely independent of the source object. More...

	Checkpointing_timer (Checkpointing_timer &&)=delete
	For now at least there's no reason for move-construction.

void	operator= (const Checkpointing_timer &)=delete
	Disallow overwriting. Copy-construction exists for a specific pattern to be possible; no need for this so far.

void	operator= (Checkpointing_timer &&)=delete
	No copy assignment; no move assignment either.

const Checkpoint &	checkpoint (std::string &&name_moved)
	Records a checkpoint, which takes a sample of all enabled `Clock_type`s' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction. More...

Duration_set	since_start () const
	Returns the duration(s) (for all enabled `Clock_type`s) between construction and the last checkpoint() call. More...

const std::vector< Checkpoint > &	checkpoints () const
	Returns the checkpoints recorded so far. More...

void	scale (uint64_t mult_scale, uint64_t div_scale)
	Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by `mult_scale` and divided by `div_scale`. More...

Time_pt_set	now () const
	Sample all currently enabled `Clock_type`s' clocks and return those values, each of which is a time stamp relative to some Epoch value. More...

void	output (std::ostream *os) const
	Equivalent to `(os) << (this)`. More...

Public Member Functions inherited from flow::log::Log_context
	Log_context (Logger *logger=0)
	Constructs Log_context by storing the given pointer to a Logger and a null Component. More...

template<typename Component_payload >
	Log_context (Logger *logger, Component_payload component_payload)
	Constructs Log_context by storing the given pointer to a Logger and a new Component storing the specified generically typed payload (an `enum` value). More...

	Log_context (const Log_context &src)
	Copy constructor that stores equal `Logger*` and Component values as the source. More...

	Log_context (Log_context &&src)
	Move constructor that makes this equal to `src`, while the latter becomes as-if default-constructed. More...

Log_context &	operator= (const Log_context &src)
	Assignment operator that behaves similarly to the copy constructor. More...

Log_context &	operator= (Log_context &&src)
	Move assignment operator that behaves similarly to the move constructor. More...

void	swap (Log_context &other)
	Swaps Logger pointers and Component objects held by `*this` and `other`. More...

Logger *	get_logger () const
	Returns the stored Logger pointer, particularly as many `FLOW_LOG_*()` macros expect. More...

const Component &	get_log_component () const
	Returns reference to the stored Component object, particularly as many `FLOW_LOG_*()` macros expect. More...

Static Public Member Functions
static Clock_types_subset	real_clock_types ()
	Returns the bit-set containing only those `Clock_type`s enabled that measure passage of real (wall clock) time. More...

static Clock_types_subset	process_cpu_clock_types ()
	Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by all threads of the calling process combined. More...

static Clock_types_subset	thread_cpu_clock_types ()
	Returns the bit-set containing only those `Clock_type`s enabled that measure passage of processor time by the calling thread. More...

static Time_pt	now (Clock_type clock_type)
	Based on the passed in `clock_type` argument, the current time is returned for that one particular clock type. More...

static Time_pt_set	now (const Clock_types_subset &which_clocks)
	Based on the passed in `which_clocks` argument, the current time is returned for the clock types enabled by it. More...

Public Attributes
const std::string	m_name
	The immutable human-readable name for this timer. It's `const` so no need for accessor for now.

Related Functions
(Note that these are not member functions.)
std::ostream &	operator<< (std::ostream &os, const Checkpointing_timer &timer)
	Prints string representation of the given `Checkpointing_timer` (whether with original data or an aggregated-result timer) to the given `ostream`. More...

Detailed Description

The central class in the perf Flow module, this efficiently times the user's operation, with a specified subset of timing methods; and with the optional ability to time intermediate checkpoints within the overall operation.

How to use

To use this: Construct a Checkpointing_timer; perform the operation in question; then access results via accessors; including in particular the string/stream output operation that lays out results in human-friendly fashion. Optionally, call checkpoint() to mark down intermediate timing results, so that the operation can be broken down into its component parts (the string/stream output will include these). This is optional, except that at least 1 checkpoint is required: the last checkpoint represents the end of the measured operation.

The actual measurements are performed by sampling a time stamp from each of N clocks, where N <= M, and M is the number of values in perf::Clock_type enum. The N clocks are specified at construction by user; thus can measure just 1, in particular, as is often desired. This is a critical and potentially subtle decision; see doc headers of each perf::Clock_type member for tips.

Performance

This is a performance-measuring facility, and the performance of a performance-measuring-thing is quite important, especially since some of the things being timed can be very quick (single-digit microseconds potentially).

Extreme care has been taken to keep the computational fat of all the book-keeping in the class very low (and empirical testing shows this was successful). In the cases where user cooperation is necessary to avoid undoing this, the doc headers point it out. Please read docs carefully to avoid timing-technique traps that might lead to wrong conclusions and frustration.

Thread safety

Checkpointing_timer itself is not thread-safe for non-const access to an object while any other access occurs concurrently. (Internally, there is no locking.) However, the nature of how one uses a timer object is that one does stuff to be measured, checkpoint(), more stuff, checkpoint(), ..., final checkpoint(). In other words, by its very nature, it expects only sequential non-const calls while measuring times – so thread safety should not come up. (However, thread safety enters the picture with Checkpointing_timer::Aggregator; see below.)

Aggregation

Duration result aggregation is the common-sense idea that to accurately measure the typical length of operation X is to repeat X many times (N times) and then view a sum-over-N-samples and/or mean-over-N-samples and/or mean-over-N-samples-scaled-times-M, where M is usually some convenience constant like 1,000,000. There are two patterns we provide/suggest to accomplish aggregation.

Lowest-overhead aggregation approach: Single Checkpointing_timer

Suppose you have operation X and are not interested in subdividing it into checkpoints; you only care about the total duration(s) (according to which Clock_types interest you). Then the following technique leads to the lowest possible overhead:

flow::perf::Checkpointing_timer sum_timer(logger, "op name", which_clocks, 1); // 1 checkpoint only.
const unsigned int n_samples = 1000000;
for (unsigned int sample_idx = 0; sample_idx != n_samples; ++sample_idx)
{
  // ...Operation X being measured goes here.
}
sum_timer.checkpoint("total"); // Mark down the total time taken.
// Log the total duration(s), one per clock type!
flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(sum_timer, n_samples, false);
// And/or: Log mean duration(s), times N_SAMPLES_SCALE_CONVENIENCE, one per clock type!
{
  // Just make a copy of the raw sum, then scale it x N_SAMPLES_SCALE_CONVENIENCE / n_samples.
  auto mean_scaled_timer(sum_timer);
  mean_scaled_timer.scale(N_SAMPLES_SCALE_CONVENIENCE, n_samples);
  flow::perf::Checkpointing_timer::Aggregator::log_aggregated_result_in_timer(mean_scaled_timer, n_samples,
                                                                              true, N_SAMPLES_SCALE_CONVENIENCE);
}

This involves minimal overhead, as no timing calls occur at all until X is repeated a million times.

Cross-product aggregation approach: Multiple `Checkpointing_timer`s + Checkpointing_timer::Aggregator

If, by contrast, you want to divide X into checkpoints and see (say) how long checkpoint 3 takes on average (as well as the total as above), then use the dedicated Checkpointing_timer::Aggregator class for that purpose. This approach also allows one to do stuff (possibly unrelated stuff) between the many repetitions of operation X. Only X itself will be timed.

To use Aggregator, construct one; then simply create a Checkpointing_timer for each repetition of X and time it as normal. Before or after each repetition, register it inside the agg using Aggregator::aggregate() which takes a pointer to Checkpointing_timer. The only requirement is that the number and respective names of the checkpoints in each Checkpointing_timer are the same for all of them.

Once enough samples (n_samples in earlier example) have been thus collected, Aggregator::log_aggregated_results() will log every potential form (sum, mean, scaled-mean), not just for the total operation X but also each checkpoint(). For example, if X consists of steps X1, X2, X3, then one would see the mean duration of X1, X2, and X3 individually and their means would add up to the mean for all of X.

This is quite powerful; and while the overhead of creating individual Checkpointing_timer objects, say, might affect the overall performance of the algorithm involving X, it shouldn't have any direct effect on the values actually being measured, since only each X invocation itself is measured; stuff in-between (slow or not) isn't. However:

Checkpointing_timer::Aggregator::aggregate() isn't thread-safe for concurrent access (so if that's relevant, you'll need a mutex lock around the call).
Always ensure you understand the structure of your measurement plan as it relates to concurrency. For example, it's meaningless to measure processor-time cost of X being done repeatedly, if various incarnations of X might be performed partially concurrently (say, if X = handling 1 HTTP request in a multi-threaded server under load). In that case processor-times will overlap and produce essentially meaningless (at the very least misleading) results. On the other hand, real-time measurements (Clock_type::S_REAL_HI_RES) make plenty of sense in that situation and would indeed reliably measure latency. Thread-time measurements (Clock_type::S_CPU_THREAD_TOTAL_HI_RES) can also make sense. Just saying: do not assume all clock types make sense in the face of concurrent operations; plan mindfully.

See also: Checkpointing_timer::Aggregator.

Constructor & Destructor Documentation

◆ Checkpointing_timer() [1/2]

flow::perf::Checkpointing_timer::Checkpointing_timer	(	log::Logger *	logger_ptr,
		std::string &&	name_moved,
		Clock_types_subset	which_clocks,
		size_t	max_n_checkpoints
	)

explicit

Constructs a new timer and immediately begins measuring time, in that the next checkpoint() call will record the time passed since right now.

Please read the following docs to achieve low overhead.

Parameters

name_moved	Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: `string(your_name_value)`.
which_clocks	The clocks you want sampled in each checkpoint(). Please carefully read the doc headers of all perf::Clock_type `enum` members.
max_n_checkpoints	The number of times you shall call checkpoint() shall not exceed this. A tight value saves RAM; but more importantly a correct upper bound guarantees no internal reallocation will occur in checkpoint() which is important for maintaining low overhead. `assert()` trips if one breaks this promise.
logger_ptr	Logger to use for subsequently logging.

◆ Checkpointing_timer() [2/2]

flow::perf::Checkpointing_timer::Checkpointing_timer ( const Checkpointing_timer & src )

explicitdefault

Constructs a timer that is identical to but entirely independent of the source object.

When performed outside of performance-critical sections, this is very useful for data massaging after having completely measured something in a Checkpointing_timer. E.g., one can create a copy and then scale() it to obtain a mean, etc.

See also: Checkpointing_timer doc header for discussion of aggregation approaches.

Parameters

src	Object to copy.

Member Function Documentation

◆ checkpoint()

const Checkpointing_timer::Checkpoint & flow::perf::Checkpointing_timer::checkpoint ( std::string && name_moved )

Records a checkpoint, which takes a sample of all enabled Clock_types' clocks and records the corresponding duration(s) since the last checkpoint() or (if none) since construction.

In practice, at least one checkpoint() call is required for *this to be of any use at all. The final checkpoint() call's sample(s) determine the current value of since_start() which is the duration of the entire operation, as measured since *this construction. Intermediate calls (ones preceding the final one) are optional and represent intermediate steps and how long they each took.

since_start() depends on exactly two things: when *this was constructed, and when the last checkpoint() call (at that time) was made.

Naming vs. performance

name_moved will be taken via move semantics, meaning no string buffer will be copied, only the pointer(s) and buffer size scalar(s) inside std::string. This helps performance. However, to form name_moved you will naturally spend some processor cycles which therefore risks slowing down the measured operation and polluting timing results. To avoid this, please follow the following suggestions.

Passing a constant string as follows is very cheap: "some constant" or string("same"). So if that's good enough for you, do that.
- In most STL, due to the std::string SSO (optimization which stores a short-enough string directly inside the string object), if you keep the constant 15 characters or less in length, it's even cheaper; heap is not used as a result.
Constructing something via util::ostream_op_string() (and/or util::String_ostream) adds about 500 nanoseconds per checkpoint() call (e.g.: util::ostream_op_string("some string", some_int)) on a 2015 MacBook Pro. For many, many timing scenarios such sub-microsecond overheads are no big deal, but watch out if you're measuring something tiny-subtle.
- If it is a problem, use a constant thing as in previous bullet. If that isn't sufficient, you can fairly easily use C-style direct-char*-writing to do it almost as quickly as that, or at least far closer to it than to the (syntactically-pretty but slow-in-*this*-context) stream-based string formation.

Logging

It will log a TRACE message. Hence if TRACE is enabled, timing results might be polluted, as logging can take non-trivial time.

Parameters

name_moved Nickname for the timer, for logging and such. If you want to preserve your string, pass in a copy: string(your_name_value). See above regarding perf impact.

Returns: Reference to immutable new Checkpoint struct, as it sits directly inside *this.

◆ checkpoints()

const std::vector< Checkpointing_timer::Checkpoint > & flow::perf::Checkpointing_timer::checkpoints ( ) const

Returns the checkpoints recorded so far.

This returns a reference and thus avoids a copy. The reference is valid until *this destroyed.

Returns: Checkpoints so far.

◆ now() [1/3]

Time_pt_set flow::perf::Checkpointing_timer::now ( ) const

Sample all currently enabled Clock_types' clocks and return those values, each of which is a time stamp relative to some Epoch value.

(The Epoch value differs by Clock_type. Durations can be generated by subtracting one time point from another which mathematically makes Epoch's meaning irrelevant.) The value Time_pt() (internally: 0 a/k/a Epoch) is set for the disabled clocks. In particular checkpoint() will internally call this.

Rationale

It is unusual to use this directly by the user, as absolute time stamps aren't useful directly, while since_start(), checkpoints(), the I/O facilities, etc., provide access to the various computed durations, so one needn't do the subtracting manually. Nevertheless being able to get the raw value is useful for (1) sanity-checking more advanced results; and (2) to time the timer! (An example of the latter is when I was comparing the duration of checkpoint() with a constant name_moved vs. a dynamic-stream-created one vs. just getting the time stamp via now() and not calling checkpoint() at all.)

Returns: See above.

◆ now() [2/3]

Time_pt flow::perf::Checkpointing_timer::now ( Clock_type clock_type )

static

Based on the passed in clock_type argument, the current time is returned for that one particular clock type.

Parameters

clock_type The type of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members.

Returns: The current time.

◆ now() [3/3]

Time_pt_set flow::perf::Checkpointing_timer::now ( const Clock_types_subset & which_clocks )

static

Based on the passed in which_clocks argument, the current time is returned for the clock types enabled by it.

Identical to non-static now() but which_clocks is specified as arg instead of at construction.

Parameters

which_clocks The types of clock to use. Please carefully read the doc headers of all perf::Clock_type enum members.

Returns: The current time or times.

◆ output()

void flow::perf::Checkpointing_timer::output ( std::ostream * os ) const

Equivalent to (*os) << (*this).

Parameters

os	Pointer to `ostream` to which to write.

◆ process_cpu_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::process_cpu_clock_types ( )

static

Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by all threads of the calling process combined.

Rationale

See real_clock_types().

See also: perf::Clock_type

Returns: See above.

◆ real_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::real_clock_types ( )

static

Returns the bit-set containing only those Clock_types enabled that measure passage of real (wall clock) time.

Rationale

One application is when deciding on the constructor which_clocks arg (also a Clock_types_subset). If one wants to time some operation in such a way as to make only real-time clocks make sense, and the user is allowed to control which clock types to sample, then one can do:

// They can specify a bitmask the_clock_set_from_config; then we potentially cull that further to exclude all
// but the real-time clocks, because we feel only those make sense in our timing context.
const auto which_clocks = Checkpointing_timer::real_clock_types() & the_clock_set_from_config;
Checkpointing_timer timer(..., which_clocks, ...);
// The timing has begun....

See also: perf::Clock_type

Returns: See above.

◆ scale()

void flow::perf::Checkpointing_timer::scale	(	uint64_t	mult_scale,
		uint64_t	div_scale
	)

Called anytime after the last checkpoint(), this modifies the data collected so far to be as if every checkpoint duration D shortened/lengthened by mult_scale and divided by div_scale.

For example, if float(mult_scale) / float(div_scale) == 0.5, and 3 checkpoints were recorded as (6s, 4s, 12s) (since_start() = 12s + 4s + 6s = 22s), then after the scale() call the checkpoints will reflect (3s, 2s, 6s) (since_start() == 11s).

In particular (and the original use case for this – see doc header), if one recorded operation X repeated sequentially 100x, with *this constructed just before the first iteration and checkpoint()ed just after the 100th, then performing scale(1, 100) would effectively result in *this representing the mean duration of a single X iteration.

Formally, behavior is undefined if one calls checkpoint() after calling scale(). Informally it is likely to result in nonsensical data for all checkpoints created starting with that one and for since_start().

Parameters

mult_scale	The timer data are conceptually multiplied by this, first.
div_scale	The timer data are conceptually divided by this, next.

◆ since_start()

Duration_set flow::perf::Checkpointing_timer::since_start ( ) const

Returns the duration(s) (for all enabled Clock_types) between construction and the last checkpoint() call.

Behavior is undefined if checkpoint() has not been called.

Returns: See above.

◆ thread_cpu_clock_types()

Clock_types_subset flow::perf::Checkpointing_timer::thread_cpu_clock_types ( )

static

Returns the bit-set containing only those Clock_types enabled that measure passage of processor time by the calling thread.

Rationale

See real_clock_types().

See also: perf::Clock_type

Returns: See above.

Friends And Related Function Documentation

◆ operator<<()

std::ostream & operator<<	(	std::ostream &	os,
		const Checkpointing_timer &	timer
	)

Note this is multi-line output that does not end in newline.

Parameters

os	Stream to which to write.
timer	Object to serialize.

Returns: os.

The documentation for this class was generated from the following files:

Classes

Public Member Functions

Static Public Member Functions

Public Attributes

Related Functions

Detailed Description

How to use

Performance

Thread safety

Aggregation

Lowest-overhead aggregation approach: Single Checkpointing_timer

Cross-product aggregation approach: Multiple Checkpointing_timers + Checkpointing_timer::Aggregator

Constructor & Destructor Documentation

◆ Checkpointing_timer() [1/2]

◆ Checkpointing_timer() [2/2]

Member Function Documentation

◆ checkpoint()

Naming vs. performance

Logging

◆ checkpoints()

◆ now() [1/3]

Rationale

◆ now() [2/3]

◆ now() [3/3]

◆ output()

◆ process_cpu_clock_types()

Rationale

◆ real_clock_types()

Rationale

◆ scale()

◆ since_start()

◆ thread_cpu_clock_types()

Rationale

Friends And Related Function Documentation

◆ operator<<()

Cross-product aggregation approach: Multiple `Checkpointing_timer`s + Checkpointing_timer::Aggregator