Flow 1.0.2
Flow project: Full implementation reference.
perf_fwd.hpp
1/* Flow
2 * Copyright 2023 Akamai Technologies, Inc.
3 *
4 * Licensed under the Apache License, Version 2.0 (the
5 * "License"); you may not use this file except in
6 * compliance with the License. You may obtain a copy
7 * of the License at
8 *
9 * https://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in
12 * writing, software distributed under the License is
13 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
14 * CONDITIONS OF ANY KIND, either express or implied.
15 * See the License for the specific language governing
16 * permissions and limitations under the License. */
17
18/// @file
19#pragma once
20
22
23/**
24 * Flow module containing tools for profiling and optimization.
25 *
26 * As of this writing (around the time the flow::perf Flow module was created) this centers on Checkpointing_timer,
27 * a facility for measuring real and processor time elapsed during an arbitrary measured operation. That said,
28 * generally speaking, this module is meant to be a "kitchen-sink" set of facilities fitting the sentence at the
29 * very top of this doc header.
30 */
31namespace flow::perf
32{
33// Types.
34
35// Find doc headers near the bodies of these compound types.
36
37class Checkpointing_timer;
38
39/**
40 * Short-hand for ref-counting pointer to Checkpointing_timer. Original use case is to allow
41 * Checkpointing_timer::Aggregator to generate and return Checkpointing_timer objects with minimal headaches for the user.
42 */
43using Checkpointing_timer_ptr = boost::shared_ptr<Checkpointing_timer>;
44
45// Free functions.
46
47/**
48 * Constructs a closure that times and executes `void`-returning `function()`, adding the elapsed time with
49 * clock type `clock_type` -- as raw ticks of perf::Duration -- to `accumulator`.
50 *
51 * Consider other overload(s) and similarly named functions as well. With this one you get:
52 * - `function()` is treated as returning `void` (any return value is ignored).
53 * - `function()` is a generally-used timed function: not necessarily a `boost.asio` or flow::async *handler*.
54 * Any associated executor (such as a `strand`) *will* be lost. See timed_handler(), if you have a handler.
55 * - One specific perf::Clock_type, not some subset given as perf::Clock_types_subset. For performance this may
56 * be significant, even though operations on the latter are still light-weight.
57 * - Accumulation (the plus-equals operation) done by performing `+=(duration_rep_t)`, where perf::duration_rep_t
58 * is -- as a reminder -- a raw integer type like `int64_t`. If accumulation may occur in a multi-threaded
59 * situation concurrently, this can improve performance vs. using an explicit lock, if one
60 * uses `Accumulator` = `atomic<duration_rep_t>`.
61 * - Lack of `chrono`-style type safety: It is up to you to interpret the `*accumulator`-stored ticks as their
62 * appropriate units.
63 *
64 * ### Synopsis/examples ###
65 * Time a function that happens to take a couple of args. Don't worry about the timing also happening concurrently:
66 * not using `atomic`.
67 *
68 * ~~~
69 * flow::perf::duration_rep_t accumulated_ticks(0);
70 * const auto timed_func
71 * = flow::perf::timed_function
72 * (flow::perf::Clock_type::S_CPU_THREAD_TOTAL_HI_RES, &accumulated_ticks,
73 * [](int x, int y) { for (auto i = 0; i < (x * y); ++i) {} });
74 * // ...
75 * // Later, run it -- this will add to accumulated_ticks. Can do this many times but not concurrently.
76 * timed_func(7, 7); // Note it can only be called void-style.
77 * // ...
78 * // Later, here's the result. Note the construction from type-unsafe ticks to type-safe Duration.
79 * const flow::perf::Duration total_dur(accumulated_ticks);
80 * // Can convert to whatever units type-safely now (duration_cast<> in this case allows for precision loss).
81 * const auto total_dur_us = chrono::duration_cast<chrono::microseconds>(total_dur);
82 * ~~~
83 *
84 * Same thing but with an `atomic` to support timing/execution occurring concurrently:
85 *
86 * ~~~
87 * std::atomic<flow::perf::duration_rep_t> accumulated_ticks(0);
88 * const auto timed_func
89 * = flow::perf::timed_function
90 * (flow::perf::Clock_type::S_CPU_THREAD_TOTAL_HI_RES, &accumulated_ticks,
91 * [](int x, int y) { for (auto i = 0; i < (x * y); ++i) {} });
92 * // ...
93 * // Later, run it -- this will add to accumulated_ticks. Can do this many times *and* concurrently in N threads.
94 * timed_func(7, 7); // Note it can only be called void-style.
95 * // ...
96 * // Later, here's the result. Note the construction from type-unsafe ticks to type-safe Duration.
97 * const flow::perf::Duration total_dur(accumulated_ticks);
98 * // Can convert to whatever units type-safely now (duration_cast<> in this case allows for precision loss).
99 * const auto total_dur_us = chrono::duration_cast<chrono::microseconds>(total_dur);
100 * ~~~
101 *
102 * ### `Accumulator A` type requirements/recommendations ###
103 * It must have `A += duration_rep_t(...)`. This operation must be safe for concurrent execution with itself, if
104 * timed_function() is potentially used concurrently. In that case consider `atomic<duration_rep_t>`. If concurrency
105 * is not a concern, you can just use `duration_rep_t` to avoid the strict-ordering overhead involved in `atomic`
106 * plus-equals operation.
107 *
108 * `Accumulator` is understood to store raw ticks of #Duration -- not actual #Duration -- for performance reasons
109 * (to wit: so that `atomic` plus-equals can be made use of, if it exists). If you need a #Duration
110 * ultimately -- and for type safety you really *should* -- it is up to you to construct a #Duration from the
111 * accumulated `duration_rep_t`. This is trivially done via the `Duration(duration_rep_t)` constructor.
112 *
113 * @todo timed_function(), when operating on an `atomic<duration_rep_t>`, uses `+=` for accumulation which may be
114 * lock-free but uses strict ordering; a version that uses `fetch_add()` with relaxed ordering may be desirable
115 * for extra performance at the cost of not-always-up-to-date accumulation results in all threads.
116 * As of this writing this can be done by the user by providing a custom type that defines `+=` as explicitly
117 * using `fetch_add()` with relaxed ordering; but we could provide an API for this.
118 *
119 * @todo timed_function() overload exists for a single `Clock_type`, but simultaneous multi-clock timing using the
120 * perf::Clock_types_subset paradigm (as used, e.g., in Checkpointing_timer) would be a useful and consistent API.
121 * E.g., one could measure user and system elapsed time simultaneously. As of this writing this does not exist only
122 * because of time constraints: a version with minimal performance overhead, targeting one clock type, was necessary.
123 *
124 * @tparam Accumulator
125 * Integral accumulator of clock ticks. See above for details.
126 * @tparam Func
127 * A function that is called `void`-style taking any arbitrary number of args, possibly none.
128 * @param clock_type
129 * The type of clock to use for timing `function()`.
130 * @param accumulator
131 * The accumulator to add time elapsed when calling `function()` to. See instructions above regarding
132 * concurrency, `atomic`, etc.
133 * @param function
134 * The function to execute and time.
135 * @return A closure that will time and execute `function()`, adding the elapsed time to `accumulator`.
136 */
137template<typename Accumulator, typename Func>
138auto timed_function(Clock_type clock_type, Accumulator* accumulator, Func&& function);
139
140/**
141 * Constructs a closure that times and executes non-`void`-returning `function()`, adding the elapsed time with
142 * clock type `clock_type` -- as raw ticks of perf::Duration -- to `accumulator`. "Nvr" stands for
143 * non-`void`-returning.
144 *
145 * Consider other overload(s) and similarly named functions as well. With this one you get:
146 * - `function()` is treated as returning non-`void` (any return value returned by it is then returned
147 * by the returned closure accordingly).
148 * - Hence `function()` cannot be a `boost.asio` handler, since such handlers are always `void`-returning.
149 * So there is no timed_handler() counterpart to the present function.
150 * - Otherwise identical to the similar timed_function().
151 *
152 * ### Synopsis/examples ###
153 * Similar to the 2nd example in timed_function() doc header: Time a function that happens to take a couple of args,
154 * allowing for concurrency by using an `atomic`. The difference: `timed_func()` returns a value.
155 *
156 * ~~~
157 * std::atomic<flow::perf::duration_rep_t> accumulated_ticks(0);
158 * const auto timed_func
159 * = flow::perf::timed_function_nvr
160 * (flow::perf::Clock_type::S_CPU_THREAD_TOTAL_HI_RES, &accumulated_ticks,
161 * [](int x, int y) -> int { int i = 0; for (; i < (x * y); ++i) {} return i; });
162 * // ...
163 * // Later, run it -- this will add to accumulated_ticks. Can do this many times *and* concurrently in N threads.
164 * const auto result = timed_func(7, 7); // Note it is called non-void-style, with the return value passed-through.
165 * // ...
166 * // Later, here's the result. Note the construction from type-unsafe ticks to type-safe Duration.
167 * const flow::perf::Duration total_dur(accumulated_ticks);
168 * // Can convert to whatever units type-safely now (duration_cast<> in this case allows for precision loss).
169 * const auto total_dur_us = chrono::duration_cast<chrono::microseconds>(total_dur);
170 * ~~~
171 *
172 * ### `Accumulator A` type requirements/recommendations ###
173 * See timed_function().
174 *
175 * @tparam Accumulator
176 * See timed_function().
177 * @tparam Func
178 * A function that is called non-`void`-style taking any arbitrary number of args, possibly none.
179 * @param clock_type
180 * The type of clock to use for timing `function()`.
181 * @param accumulator
182 * The accumulator to add time elapsed when calling `function()` to. See timed_function() regarding
183 * concurrency, `atomic`, etc.
184 * @param function
185 * The function to execute and time.
186 * @return A closure that will time and execute `function()`, adding the elapsed time to `accumulator`.
187 */
188template<typename Accumulator, typename Func>
189auto timed_function_nvr(Clock_type clock_type, Accumulator* accumulator, Func&& function);
190
191/**
192 * Identical to timed_function() but suitable for boost.asio-targeted handler functions. In other words, if you want
193 * to `post(handler)` or `async_...(handler)` in a boost.asio `Task_engine`, and you'd like to time `handler()` when
194 * it is executed by boost.asio, then use `timed_handler(..., handler)`.
195 *
196 * Consider other overload(s) and similarly named functions as well. With this one you get:
197 * - `handler()` is a `boost.asio` or flow::async *handler*.
198 * - Otherwise identical to the similar timed_function().
199 *
200 * @note This is suitable for using the Flow-recommended boost.asio wrapper/helper API, flow::async.
201 * @warning Using `timed_function(handler)` would "work" too, in that it would compile and at a first glance appear to
202 * work fine. The problem: If `handler` is bound to an executor -- most commonly a boost.asio strand
203 * (util::Strand) -- then using timed_function() would "unbind it." So if it was bound to `Strand S`, meant
204 * to make certain `handler()` never executed concurrently with other handlers bound to `S`, then that
205 * constraint would (silently!) no longer be observed -- leading to terrible intermittent concurrency bugs.
206 * @note boost.asio handlers always return `void` (meaning anything else they might return is ignored). Hence there is
207 * no `timed_handler_nvr()`, even though there is a timed_function_nvr().
208 *
209 * ### Synopsis/examples ###
210 * Similar to the 2nd example in timed_function() doc header: Time a function that happens to take a couple of args,
211 * allowing for concurrency by using an `atomic`. The difference: it is first bound to a strand.
212 * In this case we `post()` the handler, so it takes no args in this example. However, if used with, say,
213 * `boost::asio::ip::tcp::socket::async_read_some()`, it would take args such as bytes-received and error code.
214 *
215 * ~~~
216 * flow::util::Task_engine multi_threaded_engine; // boost.asio Task_engine later associated with 2+ threads.
217 * // ...
218 * // Strand guaranteeing non-concurrency for any handler functions bound to it, perhaps pertaining to HTTP request R.
219 * flow::util::Strand this_request_strand(multi_threaded_engine);
220 * std::atomic<flow::perf::duration_rep_t> accumulated_ticks(0);
221 * auto timed_hnd
222 * = flow::perf::timed_handler
223 * (flow::perf::Clock_type::S_CPU_THREAD_TOTAL_HI_RES, &accumulated_ticks,
224 * boost::asio::bind_executor(this_request_strand,
225 * []() { for (unsigned int i = 0; i < 1000000; ++i) {} }));
226 * // Post it for ASAP execution -- *when* it is asynchronously executed in some thread, will add to accumulated_ticks.
227 * // timed_hnd() is bound to this_request_strand, because the function we passed to timed_handler() was so bound.
228 * boost::asio::post(multi_threaded_engine, timed_hnd);
229 * // ...
230 * // Later, here's the result. Note the construction from type-unsafe ticks to type-safe Duration.
231 * const flow::perf::Duration total_dur(accumulated_ticks);
232 * // Can convert to whatever units type-safely now (duration_cast<> in this case allows for precision loss).
233 * const auto total_dur_us = chrono::duration_cast<chrono::microseconds>(total_dur);
234 * ~~~
235 *
236 * ### `Accumulator A` type requirements/recommendations ###
237 * See timed_function().
238 *
239 * @tparam Accumulator
240 * See timed_function().
241 * @tparam Handler
242 * Handler meant to be `post()`ed or otherwise async-executed on a `Task_engine`. Can take any arbitrary number
243 * of args, possibly none.
244 * @param clock_type
245 * See timed_function().
246 * @param accumulator
247 * See timed_function().
248 * @param handler
249 * The handler to execute and time.
250 * @return A closure that will time and execute `handler()`, adding the elapsed time to `accumulator`; bound
251 * to the same executor (if any; e.g., a util::Strand) to which `handler` is bound.
252 */
253template<typename Accumulator, typename Handler>
254auto timed_handler(Clock_type clock_type, Accumulator* accumulator, Handler&& handler);
255
256/**
257 * Prints string representation of the given `Checkpointing_timer` (whether with original data or an
258 * aggregated-result timer) to the given `ostream`. Note this is multi-line output that does *not* end in newline.
259 *
260 * @relatesalso Checkpointing_timer
261 *
262 * @param os
263 * Stream to which to write.
264 * @param timer
265 * Object to serialize.
266 * @return `os`.
267 */
268std::ostream& operator<<(std::ostream& os, const Checkpointing_timer& timer);
269
270} // namespace flow::perf
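The doc headers above describe timed_function() as returning a closure that times a call and adds raw ticks to `*accumulator`. The following is a minimal sketch of that general idea only, not Flow's implementation: it assumes `std::chrono::steady_clock` in place of a perf::Clock_type, and the names `sketch_rep_t`, `Sketch_duration`, `sketch_timed_function()`, and `sketch_demo()` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT flow::perf types.
using sketch_rep_t = std::int64_t;
using Sketch_duration = std::chrono::nanoseconds;

// Returns a closure that runs `function(args...)` and adds the elapsed
// steady-clock time, as raw nanosecond ticks, to `*accumulator`.
// `Accumulator` only needs `+=` taking a sketch_rep_t, so both a plain
// integer and std::atomic<sketch_rep_t> qualify.
template<typename Accumulator, typename Func>
auto sketch_timed_function(Accumulator* accumulator, Func&& function)
{
  return [accumulator, f = std::forward<Func>(function)](auto&&... args)
  {
    const auto start = std::chrono::steady_clock::now();
    f(std::forward<decltype(args)>(args)...); // Any return value is ignored.
    const auto elapsed
      = std::chrono::duration_cast<Sketch_duration>(std::chrono::steady_clock::now() - start);
    *accumulator += static_cast<sketch_rep_t>(elapsed.count());
  };
}

// Runs a timed function twice; returns how many times its body ran and
// stores the accumulated raw ticks in `*ticks_out`.
int sketch_demo(sketch_rep_t* ticks_out)
{
  sketch_rep_t accumulated_ticks = 0;
  int calls = 0;
  const auto timed_func
    = sketch_timed_function(&accumulated_ticks,
                            [&calls](int x, int y) { for (auto i = 0; i < (x * y); ++i) {} ++calls; });
  timed_func(7, 7);
  timed_func(3, 3);
  // Type-unsafe ticks become a type-safe duration only at this point.
  const Sketch_duration total_dur(accumulated_ticks);
  *ticks_out = total_dur.count();
  return calls;
}
```

Keeping the accumulator as a raw integer is what lets the same closure work with an `atomic`; the conversion to a duration type is deferred to the reader of the result.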
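The first `@todo` in the timed_function() doc header notes that a user can supply a custom accumulator type whose `+=` uses `fetch_add()` with relaxed ordering. A minimal sketch of such a type follows; `Relaxed_accumulator`, `sketch_rep_t`, and `sketch_accumulate()` are hypothetical names, and `sketch_rep_t` merely stands in for perf::duration_rep_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for flow::perf::duration_rep_t.
using sketch_rep_t = std::int64_t;

// Accumulator whose += is safe to call concurrently from many threads but
// uses relaxed ordering instead of the sequentially consistent ordering of
// std::atomic's built-in operator+=.
class Relaxed_accumulator
{
public:
  Relaxed_accumulator& operator+=(sketch_rep_t ticks)
  {
    m_ticks.fetch_add(ticks, std::memory_order_relaxed);
    return *this;
  }

  // Relaxed read: may lag behind additions still in flight in other threads.
  sketch_rep_t load() const
  {
    return m_ticks.load(std::memory_order_relaxed);
  }

private:
  std::atomic<sketch_rep_t> m_ticks{0};
};

// Adds `ticks_per_add` to a fresh accumulator `n_adds` times (single-threaded
// here for simplicity) and returns the total.
sketch_rep_t sketch_accumulate(unsigned int n_adds, sketch_rep_t ticks_per_add)
{
  Relaxed_accumulator acc;
  for (unsigned int i = 0; i < n_adds; ++i)
  {
    acc += ticks_per_add;
  }
  return acc.load();
}
```

Because the type satisfies the documented `A += duration_rep_t(...)` requirement, a pointer to it could presumably be passed as the `accumulator` argument; the trade-off, as the `@todo` says, is that other threads may briefly observe a not-yet-updated total.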