Flow 1.0.0
Flow project: Full implementation reference.
Flow module containing tools for profiling and optimization.
Classes
class | Checkpointing_timer
The central class in the perf Flow module: it efficiently times the user's operation using a specified subset of timing methods, with the optional ability to time intermediate checkpoints within the overall operation.
struct | Duration_set
Convenience wrapper around an array<Duration, N>, which stores a duration for each of the N possible clock types in perf::Clock_type.
struct | Time_pt_set
Convenience wrapper around an array<Time_pt, N>, which stores a time point for each of the N possible clock types in perf::Clock_type.
Typedefs
using | Time_pt = Fine_time_pt
Short-hand for a high-precision boost.chrono point in time, formally equivalent to flow::Fine_time_pt.
using | Duration = Fine_duration
Short-hand for a high-precision boost.chrono duration, formally equivalent to flow::Fine_duration.
using | duration_rep_t = Duration::rep
The raw type used in Duration to store its clock ticks.
using | Clock_types_subset = std::bitset< size_t(Clock_type::S_END_SENTINEL)>
Short-hand for a bit-set of N bits which represents the presence or absence of each of the N possible clock types in perf::Clock_type.
using | Checkpointing_timer_ptr = boost::shared_ptr< Checkpointing_timer >
Short-hand for ref-counting pointer to Checkpointing_timer.
Enumerations
enum class | Clock_type : size_t { S_REAL_HI_RES = 0 , S_CPU_USER_LO_RES , S_CPU_SYS_LO_RES , S_CPU_TOTAL_HI_RES , S_CPU_THREAD_TOTAL_HI_RES , S_END_SENTINEL }
Clock types supported by flow::perf module facilities, perf::Checkpointing_timer in particular.
Functions
std::ostream & | operator<< (std::ostream &os, const Checkpointing_timer::Checkpoint &checkpoint)
Prints string representation of the given Checkpoint to the given ostream.
std::ostream & | operator<< (std::ostream &os, const Checkpointing_timer &timer)
Prints string representation of the given Checkpointing_timer (whether with original data or an aggregated-result timer) to the given ostream.
Duration_set | operator- (const Time_pt_set &to, const Time_pt_set &from)
Returns a Duration_set representing, for each Clock_type stored, the time that passed from the time point set from until the time point set to (negative if to happened earlier).
Duration_set & | operator+= (Duration_set &target, const Duration_set &to_add)
Advances each Duration in the target Duration_set by the given respective addend Durations (negative Duration causes advancing backwards).
Time_pt_set & | operator+= (Time_pt_set &target, const Duration_set &to_add)
Advances each Time_pt in the target Time_pt_set by the given respective addend Durations (negative Duration causes advancing backwards).
Duration_set & | operator*= (Duration_set &target, uint64_t mult_scale)
Scales each Duration in the target Duration_set by the given numerical constant.
Duration_set & | operator/= (Duration_set &target, uint64_t div_scale)
Divides each Duration in the target Duration_set by the given numerical constant.
std::ostream & | operator<< (std::ostream &os, Clock_type clock_type)
Prints string representation of the given clock type enum value to the given ostream.
std::ostream & | operator<< (std::ostream &os, const Duration_set &duration_set)
Prints string representation of the given Duration_set value to the given ostream.
template<typename Accumulator , typename Func >
auto | timed_function (Clock_type clock_type, Accumulator *accumulator, Func &&function)
Constructs a closure that times and executes void-returning function(), adding the elapsed time with clock type clock_type – as raw ticks of perf::Duration – to accumulator.
template<typename Accumulator , typename Func >
auto | timed_function_nvr (Clock_type clock_type, Accumulator *accumulator, Func &&function)
Constructs a closure that times and executes non-void-returning function(), adding the elapsed time with clock type clock_type – as raw ticks of perf::Duration – to accumulator.
template<typename Accumulator , typename Handler >
auto | timed_handler (Clock_type clock_type, Accumulator *accumulator, Handler &&handler)
Identical to timed_function() but suitable for boost.asio-targeted handler functions.
Flow module containing tools for profiling and optimization.
As of this writing (around the time the flow::perf Flow module was created) this centers on Checkpointing_timer, a facility for measuring real and processor time elapsed during an arbitrary measured operation. That said, generally speaking, this module is meant to be a "kitchen-sink" set of facilities fitting the sentence at the very top of this doc header.
using flow::perf::Checkpointing_timer_ptr = boost::shared_ptr<Checkpointing_timer>
Short-hand for ref-counting pointer to Checkpointing_timer.
The original use case is to allow Checkpointing_timer::Aggregator to generate and return Checkpointing_timer objects with minimal headaches for the user.
Definition at line 43 of file perf_fwd.hpp.
using flow::perf::Clock_types_subset = std::bitset<size_t(Clock_type::S_END_SENTINEL)>
Short-hand for a bit-set of N bits which represents the presence or absence of each of the N possible clock types in perf::Clock_type.
This is what we use to represent such things, as it is more compact and (we suspect) faster in typical operations, especially "is clock type T enabled?".
If C is a Clock_types_subset, and T is a Clock_type, then bit C[size_t(T)] is true if and only if T is in C.
Potential gotcha: bit-sets are indexed right-to-left (LSB-to-MSB); so if the 0th (in enum) clock type is enabled and others are disabled, then a print-out of such a Clock_types_subset would be 0...0001, not 1000...0. So watch out when reading logs.
Definition at line 164 of file clock_type_fwd.hpp.
using flow::perf::Duration = Fine_duration
Short-hand for a high-precision boost.chrono duration, formally equivalent to flow::Fine_duration.
The alias exists half for brevity and half to declare the standard duration type used in the flow::perf Flow module.
Definition at line 39 of file clock_type_fwd.hpp.
using flow::perf::duration_rep_t = Duration::rep
The raw type used in Duration to store its clock ticks.
It is likely int64_t, but try not to rely on that directly.
Useful, e.g., in atomic<duration_rep_t>, when one wants to perform high-performance operations like += and fetch_add() on atomic<>s: these do not exist for chrono::duration, because the latter is not an integral type.
Definition at line 50 of file clock_type_fwd.hpp.
using flow::perf::Time_pt = Fine_time_pt
Short-hand for a high-precision boost.chrono point in time, formally equivalent to flow::Fine_time_pt.
The alias exists half for brevity and half to declare the standard time point type used in the flow::perf Flow module.
Definition at line 33 of file clock_type_fwd.hpp.
enum class flow::perf::Clock_type : size_t
Clock types supported by flow::perf module facilities, perf::Checkpointing_timer in particular.
These are used, among other things, as array/vector indices and therefore numerically equal 0, 1, .... Clock_type::S_END_SENTINEL is an invalid clock type whose numerical value equals the number of clock types available.
S_REAL_HI_RES would always be preferable. Nevertheless it would be interesting to "officially" see its characteristics including in particular (1) resolution and (2) its own perf cost especially vs. S_REAL_HI_RES which we know is quite fast itself. This may also help a certain to-do listed as of this writing in the doc header of flow::log FLOW_LOG_WITHOUT_CHECKING() (the main worker bee of the log system, the one that generates each log time stamp).
Enumerator | |
---|---|
S_REAL_HI_RES | Measures real time (not processor time), using the highest-resolution system clock available that guarantees steady, monotonic time passage with no discontinuities. In POSIX Observations, informal suggestions: Of all clocks observed so far, it has the best resolution and is also the cheapest computationally. However, it measures real time, so (for example) another thread or process pegging the processor concurrently can affect the time being measured. That, in particular, is not necessarily a problem in test rigs; but even so it cannot measure (for example) how costly one thread is over another; nor does it separate idle time from computing time from I/O time from.... Due to the high resolution and low computational cost, one should strive to use this clock whenever possible; but it is not always possible (as additional info may be required, as outlined just above). |
S_CPU_USER_LO_RES | Along with In POSIX Observations, informal suggestions:
See discussion on |
S_CPU_SYS_LO_RES | Counterpart of |
S_CPU_TOTAL_HI_RES | Measures processor time (user- and kernel-level total) spent by the current process; this is the higher-resolution process-time facility. In POSIX Observations, informal suggestions: Firstly see Processor time actually measures processor cycles being spent to make computations. (I/O ops and idle time are not counted.) Every cycle spent by any processor core is either charged to this process or another process; if the former then it's counted; otherwise it isn't. Next, the cycle count is multiplied by its standard constant time duration (which is based directly on the clock frequency, the GHz thing). That is the result. Multiple threads acting concurrently would all count if present, so remember that. Further, it is apparently not straightforward what the system will charge to process A vs. process B. For this reason, processor-time results of very short operations (on the order of a few system calls, say) are notoriously inconsistent: you should strive to measure longer operations, or an operation repeated many times in a row. This stands in stark contrast to See also |
S_CPU_THREAD_TOTAL_HI_RES | Similar to In POSIX Observations, informal suggestions: See |
S_END_SENTINEL | Final, invalid clock type; its numerical value equals the number of clocks currently supported. If you add or remove clock types, other places (in addition to this |
Definition at line 66 of file clock_type_fwd.hpp.
Duration_set & operator*= (Duration_set & target, uint64_t mult_scale)
Scales each Duration in the target Duration_set by the given numerical constant.
operator*=(Duration_set) by a potentially negative number; same for division.
target | The set of Durations each of which may be modified. |
mult_scale | Constant by which to multiply each target Duration. |
Returns target, to enable standard *= semantics.
Definition at line 58 of file clock_type.cpp.
Duration_set & operator+= (Duration_set & target, const Duration_set & to_add)
Advances each Duration in the target Duration_set by the given respective addend Durations (negative Duration causes advancing backwards).
target | The set of Durations each of which may be modified. |
to_add | The set of Durations each of which is added to a target Duration. |
Returns target, to enable standard += semantics.
Definition at line 38 of file clock_type.cpp.
Time_pt_set & flow::perf::operator+= (Time_pt_set & target, const Duration_set & to_add)
Advances each Time_pt in the target Time_pt_set by the given respective addend Durations (negative Duration causes advancing backwards).
target | The set of Time_pts each of which may be modified. |
to_add | The set of Durations each of which is added to a target Time_pt. |
Returns target, to enable standard += semantics.
Definition at line 48 of file clock_type.cpp.
References flow::perf::Duration_set::m_values, and flow::perf::Time_pt_set::m_values.
Duration_set operator- (const Time_pt_set & to, const Time_pt_set & from)
Returns a Duration_set representing, for each Clock_type stored, the time that passed from the time point set from until the time point set to (negative if to happened earlier).
to | The minuend set of time points. |
from | The subtrahend set of time points. |
Definition at line 26 of file clock_type.cpp.
Duration_set & operator/= (Duration_set & target, uint64_t div_scale)
Divides each Duration in the target Duration_set by the given numerical constant.
target | The set of Durations each of which may be modified. |
div_scale | Constant by which to divide each target Duration. |
Returns target, to enable standard /= semantics.
Definition at line 68 of file clock_type.cpp.
std::ostream & flow::perf::operator<< (std::ostream & os, Clock_type clock_type)
Prints string representation of the given clock type enum value to the given ostream.
os | Stream to which to write. |
clock_type | Object to serialize. |
Returns os.
Definition at line 78 of file clock_type.cpp.
References S_CPU_SYS_LO_RES, S_CPU_THREAD_TOTAL_HI_RES, S_CPU_TOTAL_HI_RES, S_CPU_USER_LO_RES, S_END_SENTINEL, and S_REAL_HI_RES.
std::ostream & operator<< (std::ostream & os, const Checkpointing_timer & timer)
Prints string representation of the given Checkpointing_timer (whether with original data or an aggregated-result timer) to the given ostream.
Note this is multi-line output that does not end in newline.
os | Stream to which to write. |
timer | Object to serialize. |
Returns os.
Definition at line 340 of file checkpt_timer.cpp.
std::ostream & operator<< (std::ostream & os, const Checkpointing_timer::Checkpoint & checkpoint)
Prints string representation of the given Checkpoint to the given ostream.
See Checkpointing_timer::checkpoint() and Checkpointing_timer::checkpoints().
os | Stream to which to write. |
checkpoint | Object to serialize. |
Returns os.
Definition at line 335 of file checkpt_timer.cpp.
std::ostream & operator<< (std::ostream & os, const Duration_set & duration_set)
Prints string representation of the given Duration_set value to the given ostream.
os | Stream to which to write. |
duration_set | Object to serialize. |
Returns os.
Definition at line 93 of file clock_type.cpp.
auto flow::perf::timed_function (Clock_type clock_type, Accumulator * accumulator, Func && function)
Constructs a closure that times and executes void-returning function(), adding the elapsed time with clock type clock_type – as raw ticks of perf::Duration – to accumulator.
Consider other overload(s) and similarly named functions as well. With this one you get:
function() is treated as returning void (any return value is ignored).
function() is a generally-used timed function: not necessarily a boost.asio or flow::async handler. Any associated executor (such as a strand) will be lost. See timed_handler(), if you have a handler.
+=(duration_rep_t), where perf::duration_rep_t is – as a reminder – a raw integer type like int64_t. If accumulation may occur in a multi-threaded situation concurrently, this can improve performance vs. using an explicit lock, if one uses Accumulator = atomic<duration_rep_t>.
chrono-style type safety: It is up to you to interpret the *accumulator-stored ticks as their appropriate units.
Time a function that happens to take a couple of args. Don't worry about the timing also happening concurrently: not using atomic.
Same thing but with an atomic to support timing/execution occurring concurrently:
Accumulator A type requirements/recommendations
It must have A += duration_rep_t(...). This operation must be safe for concurrent execution with itself, if timed_function() is potentially used concurrently. In that case consider atomic<duration_rep_t>. If concurrency is not a concern, you can just use duration_rep_t to avoid the strict-ordering overhead involved in atomic plus-equals operation.
Accumulator is understood to store raw ticks of Duration – not actual Duration – for performance reasons (to wit: so that atomic plus-equals can be made use of, if it exists). If you need a Duration ultimately – and for type safety you really should – it is up to you to construct a Duration from the accumulated duration_rep_t. This is trivially done via the Duration(duration_rep_t) constructor.
atomic<duration_rep_t>, uses += for accumulation which may be lock-free but uses strict ordering; a version that uses fetch_add() with relaxed ordering may be desirable for extra performance at the cost of not-always-up-to-date accumulation results in all threads. As of this writing this can be done by the user by providing a custom type that defines += as explicitly using fetch_add() with relaxed ordering; but we could provide an API for this.
Clock_type, but simultaneous multi-clock timing using the perf::Clock_types_subset paradigm (as used, e.g., in Checkpointing_timer) would be a useful and consistent API. E.g., one could measure user and system elapsed time simultaneously. As of this writing this does not exist only due to time constraints: a perf-frugal version targeting one clock type was necessary.
Accumulator | Integral accumulator of clock ticks. See above for details. |
Func | A function that is called void-style taking any arbitrary number of args, possibly none. |
clock_type | The type of clock to use for timing function(). |
accumulator | The accumulator to add time elapsed when calling function() to. See instructions above regarding concurrency, atomic, etc. |
function | The function to execute and time. |
Returns a closure that times and executes function(), adding the elapsed time to accumulator.
Definition at line 32 of file timed_function.hpp.
References flow::perf::Checkpointing_timer::now().
Referenced by flow::async::Timed_concurrent_task_loop_impl< Time_accumulator >::post(), flow::async::Timed_concurrent_task_loop_impl< Time_accumulator >::schedule_at(), flow::async::Timed_concurrent_task_loop_impl< Time_accumulator >::schedule_from_now(), and timed_handler().
auto flow::perf::timed_function_nvr (Clock_type clock_type, Accumulator * accumulator, Func && function)
Constructs a closure that times and executes non-void-returning function(), adding the elapsed time with clock type clock_type – as raw ticks of perf::Duration – to accumulator.
"Nvr" stands for non-void-returning.
Consider other overload(s) and similarly named functions as well. With this one you get:
function() is treated as returning non-void (any return value returned by it is then returned by the returned closure accordingly).
function() cannot be a boost.asio handler, which are always void-returning. So there is no timed_handler() counterpart to the present function.
Similar to the 2nd example in timed_function() doc header: Time a function that happens to take a couple of args, allowing for concurrency by using an atomic. The difference: timed_func() returns a value.
Accumulator A type requirements/recommendations
See timed_function().
Accumulator | See timed_function(). |
Func | A function that is called non-void-style taking any arbitrary number of args, possibly none. |
clock_type | The type of clock to use for timing function(). |
accumulator | The accumulator to add time elapsed when calling function() to. See instructions above regarding concurrency, atomic, etc. |
function | The function to execute and time. |
Returns a closure that times and executes function(), adding the elapsed time to accumulator.
Definition at line 47 of file timed_function.hpp.
References flow::perf::Checkpointing_timer::now().
auto flow::perf::timed_handler (Clock_type clock_type, Accumulator * accumulator, Handler && handler)
Identical to timed_function() but suitable for boost.asio-targeted handler functions.
In other words, if you want to post(handler) or async_...(handler) in a boost.asio Task_engine, and you'd like to time handler() when it is executed by boost.asio, then use timed_handler(..., handler).
Consider other overload(s) and similarly named functions as well. With this one you get:
handler() is a boost.asio or flow::async handler.
timed_function(handler) would "work" too, in that it would compile and at first glance appear to work fine. The problem: If handler is bound to an executor – most commonly a boost.asio strand (util::Strand) – then using timed_function() would "unbind it." So if it was bound to Strand S, meant to make certain handler() never executed concurrently with other handlers bound to S, then that constraint would (silently!) no longer be observed – leading to terrible intermittent concurrency bugs.
void (meaning anything else they might return is ignored). Hence there is no timed_handler_nvr(), even though there is a timed_function_nvr().
Similar to the 2nd example in timed_function() doc header: Time a function that happens to take a couple of args, allowing for concurrency by using an atomic. The difference: it is first bound to a strand. In this case we post() the handler, so it takes no args in this example. However, if used with, say, boost::asio::ip::tcp::socket::async_read_some(), it would take args such as bytes-received and error code.
Accumulator A type requirements/recommendations
See timed_function().
Accumulator | See timed_function(). |
Handler | Handler meant to be post()ed or otherwise async-executed on a Task_engine. Can take any arbitrary number of args, possibly none. |
clock_type | See timed_function(). |
accumulator | See timed_function(). |
handler | The handler to execute and time. |
Returns a closure that times and executes handler(), adding the elapsed time to accumulator; bound to the same executor (if any; e.g., a util::Strand) to which handler is bound.
Definition at line 33 of file timed_handler.hpp.
References timed_function().
Referenced by flow::async::Timed_concurrent_task_loop_impl< Time_accumulator >::asio_handler_timed().