thread_lcl.hpp
1/* Flow
2 * Copyright 2023 Akamai Technologies, Inc.
3 *
4 * Licensed under the Apache License, Version 2.0 (the
5 * "License"); you may not use this file except in
6 * compliance with the License. You may obtain a copy
7 * of the License at
8 *
9 * https://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in
12 * writing, software distributed under the License is
13 * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
14 * CONDITIONS OF ANY KIND, either express or implied.
15 * See the License for the specific language governing
16 * permissions and limitations under the License. */
17
18/// @file
19#pragma once
20
21#include <flow/util/util.hpp>
23#include <flow/log/log.hpp>
24#include <boost/thread.hpp>
25#include <boost/unordered_map.hpp>
26#include <boost/shared_ptr.hpp>
27#include <boost/weak_ptr.hpp>
28#include <string>
29#include <vector>
30#include <typeinfo>
31#include <type_traits>
32
33namespace flow::util
34{
35// Types.
36
37/**
38 * Similar to `boost::thread_specific_ptr<T>` but with built-in lazy-init semantics; and more importantly on
39 * destruction deletes any outstanding `T`s belonging to threads that are still up; plus allows iteration
40 * through all per-thread data.
41 *
42 * An object of this type manages thread-local data as encapsulated in the user-supplied-as-template-arg type,
43 * each of which is instantiated (constructed) as needed for each given thread (on first this_thread_state() accessor
44 * call in that thread); and cleaned (via destructor) at each thread's exit or `*this` destruction -- whichever
45 * occurs earlier.
46 *
47 * ### Overview/rationale ###
48 * Fundamentally `Thread_local_state_registry<T>` is quite similar to `boost::thread_specific_ptr<T>`.
49 * There are some minor differences (it is more rigid, always using `new` and `delete` instead of leaving it
50 * to the user; mandating a lazy-initialization semantic instead of leaving it to the user;
51 * disallowing any reset to null and back from null), but these are just happenstance/preference-based.
52 * Likely we'd have just used the Boost guy, if that's all we wanted.
53 *
54 * The main reason `Thread_local_state_registry<T>` exists comprises the following (mutually related) features:
55 * -# The ability to look at all per-thread data accumulated so far (from any
56 * thread). See state_per_thread() accessor (+ while_locked()).
57 * -# If `~Thread_local_state_registry` (a `*this` dtor) executes before a given thread X (that has earlier
58 * caused the creation of a thread-local `T`, by calling `this->this_thread_state()` from X) is joined (exits),
59 * then:
60 * - That dtor, from whichever thread invoked it, deletes that thread-local `T` (for all `T`).
61 *
62 * Feature 1 is a clear value-add over `thread_specific_ptr` (or just `static thread_local`). If one can get by
63 * without dealing with the set of per-thread objects from any 1 thread at a given time, that's good; it will involve
64 * far fewer corner cases and worries. Unfortunately it is not always possible to do that. In that case you want
65 * a *registry* of your `Thread_local_state`s; a `*this` provides this.
66 *
67 * As for feature 2: Consider `static thread_specific_ptr` (or `static thread_local`, broadly speaking, without getting
68 * into the formal details of C++ language guarantees as to how such per-thread items are cleaned up). By definition
69 * of `static` the `thread_specific_ptr` will outlive any threads to have been spawned by the time `main()` exits.
70 * Therefore an implicit `.reset()` will execute when extant threads are joined, and each thread-local object will
71 * be cleaned up. No problem! However, a non-`static thread_specific_ptr` offers no such behavior or guarantee:
72 * If the `~thread_specific_ptr` dtor runs in thread X, at most that thread's TL object shall be auto-`.reset()`.
73 * The other extant threads' TL objects will live on (leak). (Nor can one iterate through them; that would be
74 * feature 1.)
75 *
76 * However a `*this` being destroyed in thread X will cause an automatic looping through the extant threads'
77 * objects (if any) and their cleanup as well. So if you want that, use a `*this` instead of
78 * non-`static thread_specific_ptr`.
79 *
80 * As a secondary reason (ignoring the above 2 features) `Thread_local_state_registry` has a more straightforward/rigid
81 * API that enforces certain assumptions/conventions (some of this is mentioned above). These might be appealing
82 * depending on one's taste/reasoning.
83 *
84 * ### How to use ###
85 * Ensure the template-arg #Thread_local_state has a proper destructor; and either ensure it has default (no-arg) ctor
86 * (in which case it'll be created via `new Thread_local_state`), or assign #m_create_state_func (to choose
87 * a method of creation yourself).
88 *
89 * Addendum: the default is `new Thread_local_state{this->get_logger()}` if and only if `Thread_local_state*` is
90 * convertible to `Log_context_mt*` (i.e., the latter is a `public` super-class of the former).
91 * See also Logging section below.
92 *
93 * From any thread where you need a #Thread_local_state, call `this->this_thread_state()`. The first time
94 * in a given thread, this shall perform and save `new Thread_local_state`; subsequent times it shall return
95 * the same pointer. (You can also save the pointer and reuse it; just be careful.)
96 *
97 * A given thread's #Thread_local_state object shall be deleted via `delete Thread_local_state` when one of
98 * the following occurs, whichever happens first:
99 * - The thread exits (is joined). (Deletion occurs shortly before.)
100 * - `*this` is destroyed (in some -- any -- thread, possibly a totally different one; or one of the ones
101 * for which this_thread_state() was called).
102 *
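 * For illustration, a minimal usage sketch (the `Widget_stats` type and `s_reg` object are hypothetical, not part
 * of this API):
 *
 * ~~~
 * struct Widget_stats { uint64_t m_n_ops = 0; }; // Default-ctible => default creation behavior suffices.
 * flow::util::Thread_local_state_registry<Widget_stats> s_reg{nullptr, "widget-stats"};
 *
 * void on_op() // Called from arbitrary worker threads.
 * {
 *   ++(s_reg.this_thread_state()->m_n_ops); // Lazily creates this thread's Widget_stats on first call.
 * }
 * ~~~
 *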
103 * That's it. It is simple. However we emphasize that using this properly may take some care: particularly
104 * as concerns the contents of the #Thread_local_state dtor (whether auto-generated or explicit or some mix), which
105 * might run from the relevant thread X in which the underlying object was created and used; but it might
106 * also instead run from whichever thread invokes the Thread_local_state_registry dtor first.
107 *
108 * See doc header for ~Thread_local_state_registry() (dtor) for more on that particular subject.
109 *
110 * ### How to iterate over/possibly modify other threads' data (safely) ###
111 * In short please see state_per_thread() doc header.
112 *
113 * Also, particularly if you are employing some variation on the "thread-cached access to central state" pattern,
114 * it is potentially critical to read the relevant notes in the this_thread_state() doc header.
115 *
116 * ### Logging ###
117 * Logging is somewhat more subtle than is typically the case, because a Thread_local_state_registry is often
118 * declared `static` or even global, which means a log::Logger might not be available at times such as before `main()`,
119 * after `main()`, or inside `main()` but outside the stretch when some main `Logger` is available. Therefore it is
120 * often useful to, e.g., start with `Logger* = nullptr`, change it to something else, then change it back.
121 *
122 * Please use super-class log::Log_context_mt::set_logger() accordingly.
123 *
124 * However to avoid any trouble if this_thread_state() is called during a `Logger` change:
125 * - Note that it is `Log_context_mt`, not `Log_context`, so this is thread-safe.
126 * - Internally, perf-wise, we take steps to avoid this having any appreciable effect on fast-path performance.
127 * - If and only if `Thread_local_state*` is convertible to `Log_context_mt*` (i.e., the latter is a `public`
128 * super-class of the former), set_logger() shall invoke `state->set_logger()` to each `state` in
129 * state_per_thread() (i.e., all `Thread_local_state`s currently extant).
130 * - Note that you can override `Log_context_mt::set_logger()` in your `Thread_local_state` so as to, e.g.,
131 * further propagate the new logger to other parts of `Thread_local_state` internals.
132 *
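 * For example, a sketch of the typical `static`-registry logging lifecycle (`My_state`, `Some_logger`, `s_reg` are
 * hypothetical names):
 *
 * ~~~
 * static Thread_local_state_registry<My_state> s_reg{nullptr, "my-reg"}; // Pre-main(): no Logger yet.
 *
 * int main()
 * {
 *   Some_logger logger; // Whatever log::Logger implementation you use.
 *   s_reg.set_logger(&logger); // From here on, *this (and, if applicable, each My_state) logs via it.
 *   // ... application runs ...
 *   s_reg.set_logger(nullptr); // Before `logger` is destroyed.
 * }
 * ~~~
 *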
133 * @tparam Thread_local_state_t
134 * Managed object type -- see above. We repeat: must have no-arg (default) ctor, or be compatible with your
135 * custom #m_create_state_func; dtor must perform
136 * appropriate cleanup which in particular shall be run from exactly one of exactly the following 2 contexts:
137 * (1) from the thread in which it was created via this_thread_state(), just before the thread exits;
138 * (2) from within the `*this` dtor from whichever thread that was invoked (which may be the creation-thread;
139 * one of the other creation-threads; or some other thread entirely). See dtor doc header.
140 */
141template<typename Thread_local_state_t>
142class Thread_local_state_registry :
143 public log::Log_context_mt,
144 private boost::noncopyable
145{
146public:
147 // Types.
148
149 /// Short-hand for template parameter type. See our class doc header for requirements.
150 using Thread_local_state = Thread_local_state_t;
151
152 /// General info (as of this writing for logging only) about a given entry (thread/object) in state_per_thread().
153 struct Metadata
154 {
155 // Data.
156
157 /// Thread nickname as per log::Logger::set_thread_info(). (Reminder: Might equal `m_thread_id`.)
158 std::string m_thread_nickname;
159 /// Thread ID.
160 Thread_id m_thread_id;
161 };
162
163 /// Return type of state_per_thread().
164 using State_per_thread_map = boost::unordered_map<Thread_local_state*, Metadata>;
165
166 /// Short-hand for mutex lock; made public for use in while_locked() and state_per_thread().
168
169 // Constants.
170
171 /**
172 * `true` if and only if #Thread_local_state is a public sub-class of log::Log_context_mt which has
173 * implications on set_logger() and default #m_create_state_func behavior.
174 */
175 static constexpr bool S_TL_STATE_HAS_MT_LOG_CONTEXT = std::is_convertible_v<Thread_local_state*,
176 log::Log_context_mt*>;
177
178 static_assert(!std::is_convertible_v<Thread_local_state*, log::Log_context*>,
179 "Thread_local_state_t template param type should not derive from log::Log_context, as "
180 "then set_logger() is not thread-safe; please use Log_context_mt (but be mindful of "
181 "locking-while-logging perf effects in fast-paths).");
182
183 // Data.
184
185 /**
186 * this_thread_state(), when needing to create a thread's local new #Thread_local_state to return, makes
187 * a stack copy of this member, calls that copy with no args, and uses the `Thread_local_state*` result
188 * as the return value for that thread.
189 *
190 * If, when needed, this value is null (`m_create_state_func.empty() == true`), then:
191 * - If #Thread_local_state is default-ctible and #S_TL_STATE_HAS_MT_LOG_CONTEXT is `false`:
192 * uses `new Thread_local_state`.
193 * - If #Thread_local_state is ctible in form `Thread_local_state{lgr}` (where `lgr` is a `log::Logger*`),
194 * and #S_TL_STATE_HAS_MT_LOG_CONTEXT is `true`:
195 * uses `new Thread_local_state{get_logger()}`.
196 * - Otherwise: Behavior is undefined (assertion may trip at this time).
197 *
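 * For example (hypothetical `My_state` type whose ctor takes a config pointer `cfg`; sketch only):
 *
 * ~~~
 * registry.m_create_state_func = [cfg]() { return new My_state{cfg}; };
 * // (Assign before any thread's first this_thread_state() call; see "Thread safety" below.)
 * ~~~
 *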
198 * `m_create_state_func` must return a pointer that can be `delete`d in standard fashion.
199 *
200 * ### Thread safety ###
201 * It is not safe to assign this while a thread-first this_thread_state() is invoked.
202 */
203 Function<Thread_local_state* ()> m_create_state_func;
204
205 // Constructors/destructor.
206
207 /**
208 * Create empty registry. Subsequently you may call this_thread_state() from any thread where you want to use
209 * (when called first time, create) thread-local state (a #Thread_local_state).
210 *
211 * @param logger_ptr
212 * Logger to use for logging subsequently.
213 * @param nickname_str
214 * See nickname().
215 * @param create_state_func
216 * Initial value for #m_create_state_func. Default is an `.empty()` (see member doc header for info).
217 */
218 explicit Thread_local_state_registry(log::Logger* logger_ptr, String_view nickname_str,
219 decltype(m_create_state_func)&& create_state_func = {});
220
221 /**
222 * Deletes each #Thread_local_state to have been created so far by calls to this_thread_state() from various
223 * threads (possibly but not necessarily including this thread).
224 *
225 * ### Careful! ###
226 * No thread (not the calling or any other thread) must access a #Thread_local_state returned from `*this`, once
227 * this dtor begins executing. This is usually pretty natural to guarantee by having
228 * your Thread_local_state_registry properly placed among various private data members and APIs accessing them.
229 *
230 * The dtor in the type #Thread_local_state itself must correctly run from *any* thread.
231 * - For many things that's no problem... just normal C++ data and `unique_ptr`s and such.
232 * - For some resources it might be a problem, namely for resources that are thread-local in nature that must
233 * be explicitly freed via API calls.
234 * Example: flushing a memory manager's thread-cache X created for/in thread T might be only possible
235 * in thread T; while also being a quite-natural thing to do in that thread, during thread T cleanup.
236 * From any other thread it might lead to undefined behavior.
237 * - In this case recall this *fact*:
238 * `~Thread_local_state()` shall run *either* from its relevant thread; *or* from
239 * the daddy `~Thread_local_state_registry()`.
240 * - Usually in the latter case, everything is going down anyway -- hence typically it is not necessary to
241 * specifically clean such per-thread resources as thread-caches.
242 * - So it is simple to:
243 * - Save a data member containing, e.g., `boost::this_thread::get_id()` in #Thread_local_state.
244 * - In its dtor check whether the thread-ID at *that* time equals the saved one. If so -- great,
245 * clean the thing. If not -- just don't (it is probably moot as shown above).
246 * - If it is not moot, you'll have to come up with something clever. Unlikely though.
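 *
 * For illustration, a sketch of that thread-ID check (`Cache_state` and `flush_thread_cache()` are hypothetical):
 *
 * ~~~
 * struct Cache_state
 * {
 *   boost::thread::id m_owner_thread_id = boost::this_thread::get_id(); // Saved at creation, in the owner thread.
 *   ~Cache_state()
 *   {
 *     if (boost::this_thread::get_id() == m_owner_thread_id)
 *     {
 *       flush_thread_cache(); // Per-thread cleanup that is only legal in the owner thread.
 *     }
 *     // else: Running from ~Thread_local_state_registry() in another thread; skip (probably moot; see above).
 *   }
 * };
 * ~~~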
247 */
248 ~Thread_local_state_registry();
249
250 // Methods.
251
252 /**
253 * Returns pointer to this thread's thread-local object, first constructing it via #m_create_state_func if
254 * it is the first `this->this_thread_state()` call in this thread. In a given thread this shall always return
255 * the same pointer.
256 *
257 * The pointee shall be destroyed from `*this` dtor or just before this thread's exit, from this thread, whichever
258 * occurs first. You may not call this, or use the returned pointer, after either routine begins executing.
259 *
260 * ### Thread-caching of central canonical state: Interaction with while_locked() and state_per_thread() ###
261 * The following is irrelevant in the fast-path, wherein this is *not* the first call to this method in the
262 * current thread. It is relevant only in the slow-path, wherein this *is* the first call to this method in the
263 * current thread. In that case we make the following formal guarantee:
264 *
265 * - A ctor of #Thread_local_state is invoked (as you know).
266 * - It is invoked while_locked().
267 *
268 * The most immediate consequence of the latter is simply: Do not call while_locked() inside #Thread_local_state
269 * ctor; it will deadlock. That aside though:
270 *
271 * What's the rationale for this guarantee? Answer: In many cases it does not matter, and other than the last bullet
272 * one would not need to worry about it. It *can* however matter in more complex setups, namely the
273 * pattern "thread-caching of central canonical state." In this pattern:
274 *
275 * - Some kind of *central state* (e.g., *canonical* info being distributed into thread-local caches) must be
276 * - seeded (copied, as a *pull*) into any new #Thread_local_state; and
277 * - updated (copied, as a *push*) into any existing #Thread_local_state, if the canonical state itself is
278 * modified (usually assumed to be rare).
279 * - Suppose one invokes while_locked() whenever modifying the central (canonical) state (perhaps infrequently).
280 * - And we also guarantee it is already in effect inside #Thread_local_state ctor.
281 * - Hence, as is natural, we do the seeding/pulling of the central state inside that ctor.
282 * - In that case while_locked() being active in the call stack (<=> its implied mutex being locked) guarantees
283 * the synchronization of the following state:
284 * - which `Thread_local_state`s exist (in the sense that they might be returned via
285 * this_thread_state() in the future) in the process;
286 * - the cached copies of the canonical state in all existent (as defined in previous bullet)
287 * `Thread_local_state`s;
288 * - the canonical state (equal to every cached copy!).
289 *
290 * This simplifies thinking about the situation immensely, as to the extent that the central state is distributed
291 * to threads via thread-local `Thread_local_state` objects, the whole thing is monolithic: the state is synchronized
292 * to all relevant threads at all times. That said the following is an important corollary, at least for this
293 * use-case:
294 *
295 * - Assumption: To be useful, the central-state-copy must be accessed by users in relevant threads, probably
296 * via an accessor; so something like: `const auto cool_state_copy = X.this_thread_state()->cool_state()`.
297 * - There are of course variations on this; it could be a method of `Thread_local_state` that uses
298 * the value of the `private` central-state-copy for some computation. We'll use the accessor setup for
299 * exposition purposes.
300 * - Fact: The central-state-copy inside `*X.this_thread_state()` for the current thread can change at any time.
301 * - Therefore: cool_state() accessor, internally, *must* lock/unlock *some* mutex in order to guarantee
302 * synchronization.
303 * - It would be safe for cool_state() to "simply" use while_locked(). However, in any perf-sensitive scenario
304 * (which is essentially guaranteed to be the case: otherwise why set up thread-cached access to the cool-state
305 * in the first place?) this is utterly unacceptable. Now any threads using `->cool_state()` on their
306 * thread-local `Thread_local_state`s will contend for the same central mutex; it defeats the whole purpose.
307 * - Hence the corollary:
308 * - Probably you want to introduce your own additional mutex as a data member of #Thread_local_state. Call
309 * it `m_cool_state_mutex`, say.
310 * - In `->cool_state()` impl, lock it, get the copy of the central-state-copy cool-state, unlock it, return
311 * the copy.
312 * - In the *push* code invoked when the *canonical* central-state is updated -- as noted, this occurs
313 * while_locked() already -- similarly, when pushing to per-thread `Thread_local_state* x`, lock
314 * `x->m_cool_state_mutex`, update the central-state-copy of `*x`, unlock.
315 * - Since there are up to 2 mutexes involved (while_locked() central mutex, `x->m_cool_state_mutex` "little"
316 * mutex), there is some danger of deadlock; but if you are careful then it will be fine:
317 * - Fast-path is in `x->cool_state()`: Only lock `x->m_cool_state_mutex`.
318 * - Slow-path 1 is in the central-state-updating (push) code: `while_locked(F)`; inside `F()` lock
319 * `x->m_cool_state_mutex` for each `x` in state_per_thread().
320 * - Slow-path 2 is in the `*x` ctor: central-state-init (pull) code: `while_locked()` is automatically
321 * in effect inside this_thread_state(); no need to lock `x->m_cool_state_mutex`, since no-place has
322 * access to `*x` until its ctor finishes and hence this_thread_state() returns.
323 * - The bad news: Your impl is no longer lock-free even in the fast-path: `X.this_thread_state()->cool_state()`
324 * does lock and unlock a mutex.
325 * - The good news: This mutex is ~never contended: At most 2 threads can even theoretically vie for it
326 * at a time; and except when *canonical* state must be updated (typically rare), it is only 1 thread.
327 * A regular mutex being locked/unlocked, sans contention, is quite cheap. This should more than defeat
328 * the preceding "bad news" bullet.
329 *
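 * A compressed sketch of that corollary (`My_tl_state`, `Cool_state`, and `update_cool_state()` are hypothetical;
 * `cool_state()` and `m_cool_state_mutex` come from the bullets above):
 *
 * ~~~
 * class My_tl_state // The Thread_local_state template arg.
 * {
 * public:
 *   Cool_state cool_state() const // Fast-path: locks only the ~uncontended per-object "little" mutex.
 *   {
 *     std::lock_guard<std::mutex> lock{m_cool_state_mutex};
 *     return m_cool_state; // Copy of this thread's cached copy of the canonical state.
 *   }
 *   void update_cool_state(const Cool_state& canonical) // Push: invoked from within registry while_locked().
 *   {
 *     std::lock_guard<std::mutex> lock{m_cool_state_mutex};
 *     m_cool_state = canonical;
 *   }
 * private:
 *   mutable std::mutex m_cool_state_mutex; // The "little" mutex from the bullets above.
 *   Cool_state m_cool_state; // Seeded (pulled) in ctor, which runs while_locked() as guaranteed above.
 * };
 * ~~~
 *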
330 * @internal
331 *
332 * @todo Add a Flow unit-test or functional-test for the pattern "thread-caching of central canonical state," as
333 * described in the doc header for Thread_local_state_registry::this_thread_state().
334 *
335 * @endinternal
336 *
337 * @return See above. Never null.
338 */
339 Thread_local_state* this_thread_state();
340
341 /**
342 * Returns pointer to this thread's thread-local object, if it has been created via an earlier this_thread_state()
343 * call; or null if that has not yet occurred.
344 *
345 * @return See above.
346 */
347 Thread_local_state* this_thread_state_or_null() const;
348
349 /**
350 * Returns reference to immutable container holding info for each thread in which this_thread_state() has been
351 * called: the keys are resulting `Thread_local_state*` pointers; the values are potentially interesting thread
352 * info such as thread ID.
353 *
354 * ### What you may do ###
355 * You may access the returned data structure, including the #Thread_local_state pointees, in read-only mode.
356 *
357 * You may write to each individual #Thread_local_state pointee. Moreover you are guaranteed (see
358 * "Thread safety" below) that no while_locked() user is doing the same simultaneously (by while_locked()
359 * contract).
360 *
361 * If you *do* write to a particular pointee, remember these points:
362 * - Probably (unless you intentionally avoid it) you're writing to it *not* from the thread to which it
363 * belongs (in the sense that this_thread_state() would be called to obtain the same pointer).
364 * - Therefore you must synchronize any such concurrent read/write accesses from this thread and the owner
365 * thread (your own code therein presumably). You can use a mutex, or the datum could be `atomic<>`; etc.
366 * - Generally speaking, one uses thread-local stuff to avoid locking, so think hard before you do this.
367 * That said, locking is only expensive assuming lock contention; and if state_per_thread() work
368 * from a not-owner thread is rare, this might not matter perf-wise. It *does* matter complexity-wise
369 * though (typically), so informally we'd recommend avoiding it.
370 * - Things like `atomic<bool>` flags are pretty decent in these situations. E.g., one can put into
371 * #Thread_local_state an `atomic<bool> m_do_flush{false}`; set it to `true` (with most-relaxed atomic mode)
372 * via while_locked() + state_per_thread() block when wanting a thread to perform an (e.g.) "flush" action;
373 * and in the owner-thread do checks like:
374 * `if (this_thread_state()->m_do_flush.exchange(false, relaxed)) { flush_stuff(); }`.
375 * It is speedy and easy; see the sketch after this list.
376 * - You could also surround any access, from the proper owner thread, to that `Thread_local_state` pointee
377 * with while_locked(). Again, usually one uses thread-local stuff to avoid such central-locking actions;
378 * but it is conceivable to use it judiciously.
379 *
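 * A minimal sketch of that `m_do_flush` flag pattern (names as in the bullet above; `registry` is a hypothetical
 * Thread_local_state_registry object):
 *
 * ~~~
 * // Arming side -- from any thread, e.g. some control/admin thread:
 * registry.while_locked([&](const auto& lock)
 * {
 *   for (const auto& state_and_mdt : registry.state_per_thread(lock))
 *   {
 *     state_and_mdt.first->m_do_flush.store(true, std::memory_order_relaxed); // Owner thread will notice.
 *   }
 * });
 *
 * // Owner-thread side -- checked opportunistically, e.g. at the top of some periodic task:
 * if (registry.this_thread_state()->m_do_flush.exchange(false, std::memory_order_relaxed))
 * {
 *   flush_stuff(); // Hypothetical per-thread action requested above.
 * }
 * ~~~
 *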
380 * ### Thread safety ###
381 * Behavior is undefined, if this is called *not* from within while_locked().
382 * Rationale: It might seem like it would have been safe to "just" make a copy of this container (while locking
383 * its contents briefly) and return that. In and of itself that's true, and as long as one never dereferences
384 * any `Thread_local_state` pointees, it is safe. (E.g., one could look at the thread IDs/nicknames in the
385 * thus-stored Metadata objects and log them. Not bad.) However dereferencing such a #Thread_local_state pointee
386 * is not safe outside while_locked(): at any moment its rightful-owning thread might exit and therefore
387 * `delete` it.
388 *
389 * @param safety_lock
390 * Please pass the argument to `task()` given to while_locked().
391 * @return See above.
392 */
393 const State_per_thread_map& state_per_thread(const Lock& safety_lock) const;
394
395 /**
396 * Locks the non-recursive registry mutex, such that no access or modification of the (deep or shallow) contents
397 * of state_per_thread() shall concurrently occur from within `*this`
398 * or other `this->while_locked()` call(s); executes given task; and unlocks said mutex.
399 *
400 * It is informally expected, though not required, that `task()` shall use state_per_thread().
401 * Please see state_per_thread() doc header.
402 *
403 * Behavior is undefined (actually: deadlock) if task() calls `this->while_locked()` (the mutex is non-recursive).
404 *
405 * ### Interaction with #Thread_local_state ctor ###
406 * See this_thread_state() doc header. To briefly restate, though: #Thread_local_state ctor, when invoked by
407 * this_thread_state() on first call in a given thread, is invoked inside a while_locked(). Therefore do not
408 * call while_locked() from such a ctor, as it will deadlock. From a more positive perspective, informally speaking:
409 * you may rely on while_locked() being active at all points inside a #Thread_local_state ctor.
410 *
411 * @tparam Task
412 * Function object matching signature `void F(const Lock&)`.
413 * @param task
414 * This will be invoked as follows: `task(lock)`.
415 */
416 template<typename Task>
417 void while_locked(const Task& task);
418
419 /**
420 * Returns nickname, a brief string suitable for logging. This is included in the output by the `ostream<<`
421 * operator as well. This always returns the same value.
422 *
423 * @return See above.
424 */
425 const std::string& nickname() const;
426
427 /**
428 * Performs `Log_context_mt::set_logger(logger_ptr)`; and -- if #S_TL_STATE_HAS_MT_LOG_CONTEXT is `true` --
429 * propagates it to each extant #Thread_local_state via `state->set_logger(logger_ptr)`.
430 *
431 * @see also #m_create_state_func doc header w/r/t the effect of #S_TL_STATE_HAS_MT_LOG_CONTEXT on that by
432 * default.
433 *
434 * ### Thread safety ###
435 * It is safe to call this concurrently with (any thread-first invocation of) this_thread_state() on `*this`.
436 *
437 * @param logger_ptr
438 * Logger to use for logging subsequently. Reminder: can be null.
439 */
440 void set_logger(log::Logger* logger_ptr);
441
442private:
443 // Types.
444
445 /// Short-hand for mutex type.
446 using Mutex = Lock::mutex_type;
447
448 /**
449 * The entirety of the cross-thread registry state, in a `struct` so as to be able to wrap it in a `shared_ptr`.
450 * See doc header for Registry_ctl::m_state_per_thread for key info.
451 */
452 struct Registry_ctl
453 {
454 // Data.
455
456 /// Protects the Registry_ctl (or `m_state_per_thread`; same difference).
457 mutable Mutex m_mutex;
458
459 /**
460 * Registry containing each #Thread_local_state, one per distinct thread to have created one via
461 * this_thread_state() and not yet exited (rather, not yet executed the on-thread-exit cleanup
462 * of its #Thread_local_state). In addition the mapped values are informational metadata Metadata.
463 *
464 * ### Creation and cleanup of each `Thread_local_state` (using this member) ###
465 * So, in a given thread T:
466 * - The first (user-invoked) this_thread_state() call shall: lock #m_mutex, insert into #m_state_per_thread,
467 * unlock.
468 * - If `*this` is around when T is exiting, the on-thread-exit cleanup function shall:
469 * obtain `shared_ptr<Registry_ctl>` (via `weak_ptr` observer); then lock, delete from #m_state_per_thread,
470 * `delete` the #Thread_local_state itself, unlock.
471 * - If `*this` is not around when T is exiting:
472 * - The cleanup function will not run at all, as the `thread_specific_ptr` controlling that is gone.
473 * - To free resources in timely fashion, the dtor shall (similarly to cleanup function):
474 * lock, delete from #m_state_per_thread (`.clear()` them all), `delete` the #Thread_local_state itself,
475 * unlock.
476 * - If `*this` is around when T is exiting, but `*this` is being destroyed, and `shared_ptr<Registry_ctl>`
477 * has been destroyed already (as seen via `weak_ptr` observer); then the `*this` dtor has run already,
478 * so cleanup function will do (almost) nothing and be right to do so.
479 */
480 State_per_thread_map m_state_per_thread;
481 }; // struct Registry_ctl
482
483 /**
484 * The actual user #Thread_local_state stored per thread as lazily-created in this_thread_state(); plus
485 * a small bit of internal book-keeping. What book-keeping, you ask? Why not just a #Thread_local_state, you ask?
486 * Answer:
487 *
488 * ### Rationale w/r/t the `weak_ptr` ###
489 * The essential problem is that in cleanup() (which is called by thread X that earlier issued
490 * `Thread_local_state* x` via this_thread_state() if and only if at X exit `*this` still exists, and therefore
491 * so does #m_this_thread_state_or_null) we cannot be sure that `x` isn't being concurrently `delete`d and
492 * removed from #m_ctl by the (unlikely but possibly) concurrently executing `*this` dtor. To do that
493 * we must first lock `m_ctl->m_mutex`. However, `*m_ctl` might concurrently disappear! This is perfect
494 * for `weak_ptr`: we can "just" capture a `weak_ptr` of `shared_ptr` #m_ctl and either grab a co-shared-pointer
495 * of `m_ctl` via `weak_ptr::lock()`; or fail to do so which simply means the dtor will do the cleanup anyway.
496 *
497 * Perfect! Only one small problem: `thread_specific_ptr` does take a cleanup function... but not a cleanup
498 * *function object*. It takes a straight-up func-pointer. Therefore we cannot "just capture" anything. This
499 * might seem like some bizarre anachronism, where boost.thread guys made an old-school API and never updated it.
500 * This is not the case though. A function pointer is a pointer to code -- which will always exist. A functor
501 * stores captures. So now they have to decide where/how to store that. To store it as regular non-thread-local
502 * data would mean needing a mutex, and in any case it breaks their guiding principle of holding only thread-local
503 * data -- either natively via pthreads/whatever or via `thread_local`. Storing it as thread-local means it's
504 * just more thread-local state that either itself has to be cleaned up -- which means user could just place it
505 * inside the stored type in the first place -- or something that will exist/leak beyond the `thread_specific_ptr`
506 * itself assigned to that thread.
507 *
508 * ### Rationale w/r/t the `weak_ptr` being in the `thread_specific_ptr` itself ###
509 * To summarize then: The member #m_ctl_observer is, simply, the (per-thread) `weak_ptr` to the registry's #m_ctl,
510 * so that `cleanup(X)` can obtain Registry_ctl::m_state_per_thread and delete the `Thread_local_state* X` from
511 * that map (see Registry_ctl::m_state_per_thread doc header). Simple, right? Well....
512 *
513 * If cleanup() runs and finishes before dtor starts, then things are simple enough! Grab `m_ctl` from
514 * `m_ctl_observer`. Erase the `Thread_local_state` entry from Registry_ctl::m_state_per_thread. Delete the `Thread_local_state`
515 * and the Tl_context (passed to cleanup() by `thread_specific_ptr`).
516 *
517 * If dtor runs before a given thread exits, then again: simple enough. Dtor can just do (for each thread's stuff)
518 * what cleanup() would have done; hence for the thread in question it would delete the `Thread_local_state` and
519 * `Tl_context` and delete the entry from Registry_ctl::m_state_per_thread. cleanup() will just not run.
520 *
521 * The problems begin in the unlikely but eminently possible, and annoying, scenario wherein they both run at
522 * about the same time, but the dtor gets to the `m_mutex` first and deletes all the `Tl_context`s as well as
523 * clearing the map. cleanup() is already running though... and it needs the `weak_ptr m_ctl_observer` so it
524 * can even try to cooperate with the dtor, via `m_ctl_observer.lock()` to begin-with... except the `Tl_context`
525 * was just deleted: crash/etc.
526 *
527 * It's a chicken/egg problem: *the* chicken/egg problem. The `weak_ptr` cannot itself be part of the watched/deleted
528 * state, as it is used to synchronize access to it between dtor and cleanup(), if they run concurrently.
529 * So what do we do? Well... won't lie to you... we leak the `weak_ptr` and `Tl_context` that stores it
530 * (roughly 24-32ish bytes in x86-64), in the case where dtor runs first, and cleanup() doesn't (meaning, a thread
531 * outlives the `*this`). (If cleanup() runs, meaning the `*this` outlives a thread, such as if `*this` is
532 * being stored `static`ally or globally, then no leak.) It is a tiny leak, per thread (that outlives
533 * a `Thread_local_state_registry` object), per `Thread_local_state_registry` object.
534 *
535 * Any way to avoid it? Probably. Possibly. One approach (which we tried) is to store a
536 * `static thread_local unordered_map<Thread_local_state*, weak_ptr<Registry_ctl>>` and save the observer-ptr
537 * in that, while Tl_context is not necessary, and #m_this_thread_state_or_null holds a `Thread_local_state*`
538 * directly. The thread-outlives-`*this` scenario just then means any "leak" is only until the thread
539 * exits (at which point the whole `unordered_map` goes away by C++ rules of `thread_local`s, including any
540 * "leaked" entries in that map). That is better. The problem (which we observed -- painfully) is
541 * it cannot be guaranteed that this new `static thread_local` map de-init occurs *after* every cleanup() runs;
542 * it might happen before: then it's all over; cleanup() cannot trust the map's contents and might even crash.
543 *
544 * Now the problem is the interplay between a `thread_specific_ptr` and built-in `thread_local`; 2 `thread_local`s
545 * and their relative de-init order is already an obscure enough topic; but the `thread_specific_ptr` behavior in this
546 * sense is unspecified (and empirically speaking I (ygoldfel) couldn't see anything obvious; in terms of impl
547 * it might be doing native stuff in Linux as opposed to `thread_local`... but I digress... it is not workable).
548 *
549 * It is possibly (probably?) doable to abandon `thread_specific_ptr` and (essentially) reimplement that part
550 * by using `thread_local` directly. However that thing must be `static`, so now we have to reimplement a
551 * map from `this`es to `Thread_local_state*`... and so on. Having done that -- difficult/tedious enough -- now
552 * we have to wrangle `static thread_local` relative de-init order. Supposedly the order is guaranteed by the
553 * standard but... it's not particularly pleasant a prospect to deal with it. Hence I am not making this a formal
554 * to-do; even though a part of me thinks that's maybe the most solid approach and puts things in our control most
555 * firmly.
556 *
557 * Just, the Tl_context wrapper-with-small-possible-leak-per-thread design is fairly pragmatic without having to
558 * engage in all kinds of masochism. Still it's a bit yucky in an aesthetic sense.
559 */
560 struct Tl_context
561 {
562 /// Observer of (existent or non-existent) daddy's #m_ctl. See Tl_context doc header for explanation.
563 boost::weak_ptr<Registry_ctl> m_ctl_observer;
564 /**
565 * The main user state. Never null; but `*m_state` has been freed (`delete`d) if and only if the pointer
566 * `m_state` is no longer in `m_ctl_observer.lock()->m_state_per_thread`, or if `m_ctl_observer.lock() == nullptr`.
567 */
568 Thread_local_state* m_state;
569 };
570
571 /**
572 * Simply wraps a `boost::thread_specific_ptr<Tl_context>`, adding absolutely no data or algorithms, purely to
573 * work-around a combination of (some?) clang versions and (some?) GNU STL impls giving a bogus compile error,
574 * when one tries `optional<Thread_local_state_registry>`. The error is
575 * "the parameter for this explicitly-defaulted copy constructor is const, but a member or base requires it to be
576 * non-const" and references impl details of STL's `optional`. The solution is to wrap the thing in a thing
577 * that is itself already noncopyable in the proper way, unlike `thread_specific_ptr` (which, at least as of
578 * Boost-1.87, still has a copy-forbidding ctor/assigner that takes non-`const` ref).
579 */
580 struct Tsp_wrapper
581 {
582 // Data.
583
584 /// What we wrap and forward-to-and-fro.
585 boost::thread_specific_ptr<Tl_context> m_tsp;
586
587 // Constructors/destructor.
588
589 /**
590 * Constructs payload.
591 * @param ctor_args
592 * Args to #m_tsp ctor.
593 */
594 template<typename... Ctor_args>
595 Tsp_wrapper(Ctor_args&&... ctor_args);
596
597 // Methods.
598
599 /**
600 * Forbid copy.
601 * @param src
602 * Yeah.
603 */
604 Tsp_wrapper(const Tsp_wrapper& src) = delete;
605 /**
606 * Forbid copy.
607 * @param src
608 * Yeah.
609 * @return Right.
610 */
611 Tsp_wrapper& operator=(const Tsp_wrapper& src) = delete;
612 }; // struct Tsp_wrapper
613
614 // Methods.
615
616 /**
617 * Called by `thread_specific_ptr` for a given thread's `m_this_thread_state_or_null.m_tsp.get()`,
618 * if `*this` dtor has not yet destroyed #m_this_thread_state_or_null. With proper synchronization:
619 * does `delete ctx->m_state` and `delete ctx` and removes the former from Registry_ctl::m_state_per_thread.
620 * It is possible that the `*this` dtor runs concurrently (if a relevant thread is exiting right around
621 * the time the user chooses to invoke dtor) and manages to `delete ctx->m_state` first; however it will *not*
622 * delete the surrounding `ctx`; so that cleanup() can be sure it can access `*ctx` -- but not necessarily
623 * `*ctx->m_state`.
624 *
625 * @param ctx
626 * Value stored in #m_this_thread_state_or_null; where `->m_state` was returned by at least one
627 * this_thread_state() in this thread. Not null.
628 */
629 static void cleanup(Tl_context* ctx);
630
631 // Data.
632
633 /// See nickname().
634 const std::string m_nickname;
635
636 /**
637 * In a given thread T, `m_this_thread_state_or_null.get()` is null if this_thread_state() has not yet been
638 * called by `*this` user; else (until either `*this` dtor runs, or at-thread-exit cleanup function runs)
639 * pointer to T's thread-local Tl_context object which consists mainly of a pointer to T's
640 * thread-local #Thread_local_state object; plus a bit of book-keeping. (See Tl_context for details on the
641 * latter.)
642 *
643 * ### Cleanup: key discussion ###
644 * People tend to declare `thread_specific_ptr x` either `static` or global, because in that case:
645 * - Either `delete x.get()` (default) or `cleanup_func(x.get())` (if one defines custom cleanup func)
646 * runs for each thread...
647 * - ...and *after* that during static/global deinit `x` own dtor runs. (It does do `x.reset()` in *that*
648 * thread but only that thread; so at "best" one thread's cleanup occurs during `thread_specific_ptr` dtor.)
649 *
650 * We however declare it as a non-`static` data member. That's different. When #m_this_thread_state_or_null
651 * is destroyed (during `*this` destruction), if a given thread T (that is not the thread in which dtor is
652 * executing) has called this_thread_state() -- thus has `m_this_thread_state_or_null.m_tsp.get() != nullptr` -- and
653 * is currently running, then its #Thread_local_state shall leak. Cleanup functions run only while the owner
654 * `thread_specific_ptr` exists. Boost.thread docs specifically say this.
655 *
656 * Therefore, in our case, we can make it `static`: but then any cleanup is deferred until thread exit;
657 * and while it is maybe not the end of the world, we strive to be better; a major part of the point of the registry
658 * is to do timely cleanup. So then instead of that we:
659 * - keep a non-thread-local registry Registry_ctl::m_state_per_thread of each thread's thread-local
660 * #Thread_local_state;
661 * - in dtor iterate through that registry and delete 'em.
662 *
663 * Let `p` stand for `m_this_thread_state_or_null.m_tsp.get()->m_state`: if `p != nullptr`, that alone does not
664 * guarantee that `*p` is valid. It is valid if and only if #m_ctl is a live `shared_ptr` (as determined
665 * via `weak_ptr`), and `p` is in Registry_ctl::m_state_per_thread. If #m_ctl is not live
666 * (`weak_ptr::lock()` is null), then `*this` is destroyed or very soon to be destroyed, and its dtor thus
667 * has `delete`d `p`. If #m_ctl is live, but `p` is not in `m_ctl->m_state_per_thread`, then
668 * the same is true: just we happened to have caught the short time period after the dtor deleting all states
669 * and clearing `m_state_per_thread`, but while the surrounding Registry_ctl still exists.
670 *
671 * So is it safe to access `*p`, when we do access it? Answer: We access it in exactly 2 places:
672 * - When doing `delete p` (in dtor, or in on-thread-exit cleanup function for the relevant thread).
673 * This is safe, because `p` is live if and only if it is still in Registry_ctl::m_state_per_thread
674 * (all this being mutex-synchronized).
675 * - By user code, probably following this_thread_state() to obtain `p`. This is safe, because:
676 * It is illegal for them to access `*this`-owned state after destroying `*this`.
677 *
678 * As for the stuff in `m_this_thread_state_or_null.m_tsp.get()` other than `p` -- the Tl_context surrounding
679 * it -- again: see Tl_context doc header.
680 */
681 Tsp_wrapper m_this_thread_state_or_null;
682
683 /// The non-thread-local state. See Registry_ctl docs. `shared_ptr` is used only for `weak_ptr`.
684 boost::shared_ptr<Registry_ctl> m_ctl;
685}; // class Thread_local_state_registry
686
687/**
688 * Optional-use companion to Thread_local_state_registry that enables the `Polled_shared_state` pattern wherein,
689 * from some arbitrary thread, the user causes the extant thread-locally-activated threads to opportunistically collaborate
690 * on/using locked shared state, with the no-op fast-path being gated by a high-performance-low-strictness
691 * atomic-flag being `false`.
692 *
693 * This `Polled_shared_state` pattern (I, ygoldfel, made that up... don't know if it's a known thing) is
694 * maybe best explained by example. Suppose we're using Thread_local_state_registry with
695 * `Thread_local_state` type being `T`. Suppose that sometimes some event occurs, in an arbitrary thread (for
696 * simplicity let's say that is not in any thread activated by the `Thread_local_state_registry<T>`) that
697 * requires each state to execute `thread_locally_launch_rocket()`. Lastly, suppose that upon launching the
698 * *last* rocket required, we must report success via `report_success()` from whichever thread did it.
699 *
700 * However there are 2-ish problems at least:
701 * - We're not in any of those threads; we need to inform them somehow they each need to
702 * `thread_locally_launch_rocket()`. There's no way to signal them to do it immediately necessarily;
703 * but we can do it opportunistically to any thread that has already called `this_thread_state()` (been activated).
704 * - Plus there's the non-trivial accounting regarding "last one to launch does a special finishing step" that
705 * requires keeping track of work-load in some shared state.
706 * - Not to mention the fact that the "let's launch missiles!" event might occur again before the planned launches
707 * have had a chance to proceed; since then more threads may have become activated and would need to be
708 * added to the list of "planned launches."
709 * - Typically we don't need to launch any rockets; and the *fast-path* is that we in fact don't.
710 * It is important that each activated thread can ask "do we need to launch-rocket?" and get the probable
711 * answer "no" extremely quickly: without locking any mutex, and even more importantly without any contention if
712 * we do. If the answer is "yes," which is assumed to be rare, *then* even lock-contended-locking is okay.
713 *
714 * To handle these challenges the pattern is as follows.
715 * - The #Shared_state template param here is perhaps `set<T*>`: the set of `T`s (each belonging to an
716 * activated thread that has called `Thread_local_state_registry<T>::this_thread_state()`) that should execute,
717 * and have not yet executed, `thread_locally_launch_rocket()`.
718 * - Wherever the `Thread_local_state_registry<T>` is declared/instantiated -- e.g., `static`ally --
719 * also declare `Polled_shared_state<set<T*>>`, *immediately before* the registry.
720 * - In `T` ctor -- which by definition executes only in an activated thread and only once -- prepare
721 * an opaque atomic-flag-state by executing this_thread_poll_state() and saving the returned `void*`
722 * into a non-`static` data member of `T` (say, `void* const m_missile_launch_needed_poll_state`).
723 * - If the "let's launch missiles" event occurs, in its code do:
724 *
725 * ~~~
726 * registry.while_locked([&](const auto& lock) // Any access across per-thread state is done while_locked().
727 * {
728 * const auto& state_per_thread = registry.state_per_thread(lock);
729 * if (state_per_thread.empty()) { return; } // No missiles to launch for sure; forget it.
730 *
731 * // Load the shared state (while_locked()):
732 * missiles_to_launch_polled_shared_state.while_locked([&](set<T*>* threads_that_shall_launch_missiles)
733 * {
734 * // *threads_that_shall_launch_missiles is protected against concurrent change.
735 * for (const auto& state_and_mdt : state_per_thread)
736 * {
737 * T* const active_per_thread_t = state_and_mdt.first;
738 * threads_that_shall_launch_missiles->insert(active_per_thread_t);
739 * }
740 * });
741 *
742 * // *AFTER!!!* loading the shared state, arm the poll-flag:
743 * for (const auto& state_and_mdt : state_per_thread)
744 * {
745 * T* const active_per_thread_t = state_and_mdt.first;
746 * missiles_to_launch_polled_shared_state.arm_next_poll(active_per_thread_t->m_missile_launch_needed_poll_state);
747 * // (We arm every per-thread T; but it is possible and fine to do it only for some.)
748 * // Also note it might already be armed; this would keep it armed; no problem. Before the for()
749 * // the set<> might already have entries (launches planned, now we're adding possibly more to it).
750 * }
751 * });
752 * ~~~
753 *
754 * So that's the setup/arming; and now to consume it:
755 * - In each relevant thread, such that `this_thread_state()` has been called in it (and therefore a `T` exists),
756 * whenever the opportunity arises, check the poll-flag, and in the rare case where it is armed,
757 * do `thread_locally_launch_rocket()`:
758 *
759 * ~~~
760 * void opportunistically_launch_when_triggered() // Assumes: bool(registry.this_thread_state_or_null()) == true.
761 * {
762 * T* const this_thread_state = registry.this_thread_state();
763 * if (!missiles_to_launch_polled_shared_state.poll_armed(this_thread_state->m_missile_launch_needed_poll_state))
764 * { // Fast-path! Nothing to do re. missile-launching.
765 * return;
766 * }
767 * // else: Slow-path. Examine the shared-state; do what's needed. Note: poll_armed() would now return false.
768 * missiles_to_launch_polled_shared_state.while_locked([&](set<T*>* threads_that_shall_launch_missiles)
769 * {
770 * if (threads_that_shall_launch_missiles->erase(this_thread_state) == 0)
771 * {
772 * // Already-launched? Bug? It depends on your algorithm. But the least brittle thing to do is likely:
773 * return; // Nothing to do (for us) after all.
774 * }
775 * // else: Okay: we need to launch, and we will, and we've marked our progress about it.
776 * thread_locally_launch_rocket();
777 *
778 * if (threads_that_shall_launch_missiles->empty())
779 * {
780 * report_success(); // We launched the last required missile... report success.
781 * }
782 * });
783 * }
784 * ~~~
785 *
786 * Hopefully that explains it. It is a little rigid and a little flexible; the nature of #Shared_state is
787 * arbitrary, and the above is probably the simplest form of it (but typically we suspect it will usually involve
788 * some container(s) tracking some subset of extant `T*`s).
789 *
790 * Though, perhaps an even simpler scenario might be #Shared_state being an empty `struct Dummy {};`,
791 * so that the atomic-flags being armed are the only info actually being transmitted.
792 * In the above example that would have been enough -- if not for the requirement to `report_success()`,
793 * when the last missile is launched.
794 *
795 * ### Performance ###
796 * The fast-path reasoning is that (1) the arming event occurs rarely and therefore is not part of any fast-path;
797 * and (2) thread-local logic can detect `poll_armed() == false` first-thing and do nothing further.
798 * Internally we facilitate speed further by poll_armed() using an `atomic<bool>` with an optimized memory-ordering
799 * setting that is nevertheless safe (impl details omitted here). Point is, `if (!....poll_armed()) { return }` shall
800 * be a quite speedy check.
801 *
802 * Last but not least: If #Shared_state is empty (formally: `is_empty_v<Shared_state> == true`; informally:
803 * use, e.g., `struct Dummy {};`), then while_locked() will not be generated, and trying to write code that
804 * calls it will cause a compile-time `static_assert()` fail. As noted earlier, using Polled_shared_state (despite
805 * the name) for no shared state at all -- just the thread-local distributed flag arming/polling -- is a perfectly
806 * valid approach.
807 *
808 * @tparam Shared_state_t
809 * A single object of this type shall be constructed and can be accessed, whether for reading or writing,
810 * using Polled_shared_state::while_locked(). It must be constructible via the ctor signature you choose
811 * to use when constructing `*this` Polled_shared_state ctor (template). The ctor args shall be forwarded
812 * to the `Shared_state_t` ctor. Note that it is not required to actually use a #Shared_state and
813 * Polled_shared_state::while_locked(). In that case please let `Shared_state_t` be an empty `struct` type.
814 */
815template<typename Shared_state_t>
816class Polled_shared_state :
817 private boost::noncopyable
818{
819public:
820 // Types.
821
822 /// Short-hand for template parameter type.
823 using Shared_state = Shared_state_t;
824
825 // Constructors/destructor.
826
827 /**
828 * Forwards to the stored object's #Shared_state ctor. You should also, in thread-local context,
829 * memorize ptr returned by this_thread_poll_state().
830 *
831 * Next: outside thread-local context use while_locked() to check/modify #Shared_state contents safely; then
832 * for each relevant per-thread context `this->arm_next_poll(x)`, where `x` is the saved this_thread_poll_state();
833 * this shall cause `this->poll_armed()` in that thread-local context to return `true` (once, until you
834 * again arm_next_poll() it).
835 *
836 * @tparam Ctor_args
837 * See above.
838 * @param shared_state_ctor_args
839 * See above.
840 */
841 template<typename... Ctor_args>
842 Polled_shared_state(Ctor_args&&... shared_state_ctor_args);
843
844 // Methods.
845
846 /**
847 * Locks the non-recursive shared-state mutex, such that no access or modification of the contents
848 * of the #Shared_state shall concurrently occur; executes given task; and unlocks said mutex.
849 *
850 * Behavior is undefined (actually: deadlock) if task() calls `this->while_locked()` (the mutex is non-recursive).
851 *
852 * @tparam Task
853 * Function object matching signature `void F(Shared_state*)`.
854 * @param task
855 * This will be invoked as follows: `task(shared_state)`. `shared_state` shall point to the object
856 * stored in `*this` and constructed in our ctor.
857 */
858 template<typename Task>
859 void while_locked(const Task& task);
860
861 /**
862 * To be called from a thread-local context in which you'll be checking poll_armed(), returns opaque pointer
863 * to save in your Thread_local_state_registry::Thread_local_state and pass to poll_armed().
864 *
865 * @return See above.
866 */
867 void* this_thread_poll_state();
868
869 /**
870 * To be called from any context (typically not the targeted thread-local context in which you'll be checking
871 * poll_armed, though that works too), this causes the next poll_armed() called in the thread in which
872 * `thread_poll_state` was returned to return `true` (once).
873 *
874 * Tip: Typically one would use arm_next_poll() inside a Thread_local_state_registry::while_locked()
875 * statement, perhaps cycling through all of Thread_local_state_registry::state_per_thread() and
876 * arming the poll-flags of all or some subset of those `Thread_local_state`s.
877 *
878 * @param thread_poll_state
879 * Value from this_thread_poll_state() called from within the thread whose next poll_armed() you are
880 * targeting.
881 */
882 void arm_next_poll(void* thread_poll_state);
883
884 /**
885 * If the given thread's poll-flag is not armed, no-ops and returns `false`; otherwise returns `true` and resets
886 * poll-flag to `false`. Use arm_next_poll(), typically from a different thread, to affect when
887 * this methods does return `true`. Usually that means there has been some meaningful change to
888 * this method returns `true`. Usually that means there has been some meaningful change to
889 *
890 * @param thread_poll_state
891 * See arm_next_poll().
892 * @return See above.
893 */
894 bool poll_armed(void* thread_poll_state);
895
896private:
897 // Data.
898
899 /**
900 * An atomic "do-something" flag per thread; usually/initially `false`; armed to `true` by arm_next_poll()
901 * and disarmed by poll_armed().
902 */
904
905 /// Protects #m_shared_state.
907
908 /// The managed #Shared_state.
909 Shared_state m_shared_state;
910}; // class Polled_shared_state
911
912// Free functions: in *_fwd.hpp.
913
914// Thread_local_state_registry template implementations.
915
916template<typename Thread_local_state_t>
917Thread_local_state_registry<Thread_local_state_t>::Thread_local_state_registry
918 (log::Logger* logger_ptr, String_view nickname_str,
919 decltype(m_create_state_func)&& create_state_func) :
920
921 log::Log_context_mt(logger_ptr, Flow_log_component::S_UTIL),
922
923 m_create_state_func(std::move(create_state_func)),
924 m_nickname(nickname_str),
925 m_this_thread_state_or_null(cleanup),
926 m_ctl(boost::make_shared<Registry_ctl>())
927{
928 FLOW_LOG_INFO("Tl_registry[" << *this << "]: "
929 "Registry created (watched type has ID [" << typeid(Thread_local_state).name() << "]).");
930}
931
932template<typename Thread_local_state_t>
933typename Thread_local_state_registry<Thread_local_state_t>::Thread_local_state*
934 Thread_local_state_registry<Thread_local_state_t>::this_thread_state_or_null() const
935{
936 const auto ctx = m_this_thread_state_or_null.m_tsp.get();
937 return ctx ? ctx->m_state : nullptr;
938}
939
940template<typename Thread_local_state_t>
941typename Thread_local_state_registry<Thread_local_state_t>::Thread_local_state*
942 Thread_local_state_registry<Thread_local_state_t>::this_thread_state()
943{
944 using log::Logger;
945
946 auto ctx = m_this_thread_state_or_null.m_tsp.get();
947 if (!ctx)
948 {
949 // (Slow-path. It is OK to log and do other not-so-fast things.)
950
951 /* We shall be accessing (inserting into) m_state_per_thread which understandably requires while_locked().
952 * So bracket the following with while_locked(). Notice, though, that we do this seemingly earlier than needed:
953 * Inside, we (1) construct the new Thread_local_state; and only then (2) add it into m_state_per_thread.
954 * The mutex-lock is only necessary for (2). So why lock it now? Answer: We promised to do so. Why did we?
955 * Answer: See method doc header for rationale. */
956
957 while_locked([&](auto&&...) // Versus this_thread_state()/cleanup().
958 {
959 // Time to lazy-init. As advertised:
960 decltype(m_create_state_func) create_state_func;
961 if (m_create_state_func.empty())
962 {
963 if constexpr(S_TL_STATE_HAS_MT_LOG_CONTEXT && std::is_constructible_v<Thread_local_state, Logger*>)
964 {
965 create_state_func = [&]() -> auto { return new Thread_local_state{get_logger()}; };
966 }
967 else if constexpr((!S_TL_STATE_HAS_MT_LOG_CONTEXT) && std::is_default_constructible_v<Thread_local_state>)
968 {
969 create_state_func = []() -> auto { return new Thread_local_state; };
970 }
971 else
972 {
973 FLOW_LOG_FATAL("Chose not to supply m_create_state_func at time of needing a new Thread_local_state. "
974 "In this case you must either have <derived from Log_context_mt and supplied ctor "
975 "form T_l_s{lgr} (where lgr is a Logger*)> or <*not* derived from Log_context_mt but "
976 "made T_l_s default-ctible>. Breaks contract; aborting.");
977 assert(false && "Chose not to supply m_create_state_func at time of needing a new Thread_local_state. "
978 "In this case you must either have <derived from Log_context_mt and supplied ctor "
979 "form T_l_s{lgr} (where lgr is a Logger*)> or <*not* derived from Log_context_mt but "
980 "made T_l_s default-ctible>. Breaks contract.");
981 std::abort();
982 }
983 /* Subtlety: The is_*_constructible_v checks may seem like mere niceties -- why not just let it not-compile
984 * if they don't provide the expected ctor form given S_TL_STATE_HAS_MT_LOG_CONTEXT being true or false --
985 * or even actively bad (why not just let it not-compile, so they know the problem at compile-time; or
986 * why not static_assert() it?). Not so: Suppose they *did* provide m_create_state_func, always, but
987 * lack the ctor form needed for the case where they hadn't. Then this code path would still try to get
988 * compiled -- and fail to compile -- even though there's no intention of it even ever executing. That would
989 * be annoying and unjust. The only downside of the solution is the null-m_create_state_func path can only
990 * fail at run-time, not compile-time; on balance that is better than the unjust alternative. */
991 } // if (m_create_state_func.empty())
992 else // if (!m_create_state_func.empty())
993 {
994 create_state_func = m_create_state_func; // We specifically said we'd copy it.
995 }
996
997 ctx = new Tl_context;
998 ctx->m_ctl_observer = m_ctl;
999 ctx->m_state = create_state_func();
1000
1001 m_this_thread_state_or_null.m_tsp.reset(ctx);
1002
1003 /* Now to set up the later cleanup, either at thread-exit, or from our ~dtor(), whichever happens first;
1004 * and also to provide access to us via enumeration via state_per_thread(). */
1005
1006 typename decltype(Registry_ctl::m_state_per_thread)::value_type state_and_mdt{ctx->m_state, Metadata{}};
1007 auto& mdt = state_and_mdt.second;
1008 /* Save thread info (for logging). (Note: Logger::set_thread_info() semantics are a bit surprising, out of
1009 * the log-writing context. It outputs nickname if available; else thread ID if not.) */
1010 log::Logger::set_thread_info(&mdt.m_thread_nickname,
1011 &mdt.m_thread_id);
1012 if (mdt.m_thread_id == Thread_id{})
1013 {
1014 mdt.m_thread_id = this_thread::get_id(); // Hence get it ourselves.
1015 }
1016 // else { nickname is blank. Nothing we can do about that though. }
1017
1018 FLOW_LOG_INFO("Tl_registry[" << *this << "]: Adding thread-local-state @[" << ctx->m_state << "] "
1019 "for thread ID [" << mdt.m_thread_id << "]/nickname [" << mdt.m_thread_nickname << "]; "
1020 "(watched type has ID [" << typeid(Thread_local_state).name() << "]).");
1021#ifndef NDEBUG
1022 const auto result =
1023#endif
1024 m_ctl->m_state_per_thread.insert(state_and_mdt);
1025 assert(result.second && "How did `state` ptr value get `new`ed, if another thread has not cleaned up same yet?");
1026 }); // while_locked()
1027 } // if (!ctx)
1028 // else if (ctx) { Fast path: state already init-ed. Do not log or do anything unnecessary. }
1029
1030 return ctx->m_state;
1031} // Thread_local_state_registry::this_thread_state()
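/* A minimal usage sketch of the lazy-init path above. (Request_stats, `reg`, and on_request() are hypothetical
 * illustrations, not declared anywhere in this header; Request_stats is default-ctible and does not derive from
 * Log_context_mt, so the default m_create_state_func path applies.)
 *
 *   struct Request_stats { std::atomic<unsigned int> m_n_requests{0}; };
 *
 *   flow::util::Thread_local_state_registry<Request_stats> reg{nullptr, "req_stats"};
 *
 *   void on_request() // Called from any number of worker threads.
 *   {
 *     // First call in a given thread constructs that thread's Request_stats (slow path above);
 *     // subsequent calls merely return the already-stored pointer (fast path).
 *     ++(reg.this_thread_state()->m_n_requests);
 *   }
 */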
1032
1033template<typename Thread_local_state_t>
1034Thread_local_state_registry<Thread_local_state_t>::~Thread_local_state_registry()
1035{
1036 using std::vector;
1037
1038 FLOW_LOG_INFO("Tl_registry[" << *this << "]: "
1039 "Registry shutting down (watched type has ID [" << typeid(Thread_local_state).name() << "]). "
1040 "Will now delete thread-local-state for each thread that has not exited before this point.");
1041 vector<Thread_local_state*> states_to_delete;
1042 while_locked([&](auto&&...) // Versus cleanup() (possibly even 2+ of them).
1043 {
1044 for (const auto& state_and_mdt : m_ctl->m_state_per_thread)
1045 {
1046 const auto state = state_and_mdt.first;
1047 const auto& mdt = state_and_mdt.second;
1048 FLOW_LOG_INFO("Tl_registry[" << *this << "]: Deleting thread-local-state @[" << state << "] "
1049 "for thread ID [" << mdt.m_thread_id << "]/nickname [" << mdt.m_thread_nickname << "].");
1050
1051 // Let's not `delete state` while locked, if only to match cleanup() avoiding it.
1052 states_to_delete.push_back(state);
1053 }
1054 m_ctl->m_state_per_thread.clear();
1055 }); // while_locked()
1056
1057 for (const auto state : states_to_delete)
1058 {
1059 delete state;
1060 /* Careful! We delete `state` (the Thread_local_state) but *not* the Tl_context (we didn't even store
1061 * it in the map) that is actually stored in the thread_specific_ptr m_this_thread_state_or_null.m_tsp.
1062 * See Tl_context doc header for explanation. In short by leaving it alive we leave cleanup() able to
1063 * run concurrently with ourselves -- unlikely but possible. */
1064 }
1065
1066 /* Subtlety: When m_this_thread_state_or_null is auto-destroyed shortly, it will auto-execute
1067 * m_this_thread_state_or_null.m_tsp.reset() -- in *this* thread only. If in fact this_thread_state() has been
1068 * called in this thread, then it'll try to do cleanup(m_this_thread_state_or_null.m_tsp.get()); nothing good
1069 * can come of that really. We could try to prevent it by doing m_this_thread_state_or_null.m_tsp.reset()... but
1070 * same result. Instead we do the following which simply replaces the stored (now useless) Tl_context* with null, and
1071 * that's it. We already deleted its ->m_state, so that's perfect. (Again: per Tl_context doc header, it is
1072 * intentional that we do not `delete` the release()d `m_this_thread_state_or_null.m_tsp.get()` itself but only
1073 * "most" of it, namely (it)->m_state.) */
1074 m_this_thread_state_or_null.m_tsp.release();
1075
1076 // After the }, m_ctl is nullified, and lastly m_this_thread_state_or_null is destroyed (a no-op in our context).
1077} // Thread_local_state_registry::~Thread_local_state_registry()
1078
1079template<typename Thread_local_state_t>
1080void Thread_local_state_registry<Thread_local_state_t>::cleanup(Tl_context* ctx) // Static.
1081{
1082 /* If the relevant *this has been destroyed, typically we would not be called.
1083 * However it is possible that our thread T is exiting, and just then user in another thread chose to
1084 * invoke *this dtor. Therefore we must carefully use locking and weak_ptr (as you'll see) to contend
1085 * with this possibility; it might be worthwhile to read cleanup() and the dtor in parallel.
1086 *
1087 * By the way: Among other things, the relevant *this's Log_context might be around at one point but not another;
1088 * and by contract same with the underlying Logger. So we cannot necessarily use ->get_logger() or log through
1089 * it; we will just have to be quiet; that's life. */
1090
1091 auto& weak_ptr_to_ctl = ctx->m_ctl_observer;
1092 const auto shared_ptr_to_ctl = weak_ptr_to_ctl.lock();
1093 if (!shared_ptr_to_ctl)
1094 {
1095 /* Relevant Thread_local_state_registry dtor was called late enough to coincide with current thread about to exit
1096 * but not quite late enough for its thread_specific_ptr m_this_thread_state_or_null.m_tsp to be destroyed
1097 * (hence we, cleanup(), were called for this thread -- possibly similarly for other thread(s) too).
1098 * Its shared_ptr m_ctl did already get destroyed though. So -- we need not worry about cleanup after all.
1099 * This is rare and fun, but it is no different from that dtor simply running before this thread exited.
1100 * It will be/is cleaning up our stuff (and everything else) -- except the *ctx wrapper itself. So clean that
1101 * up (not actual ctx->m_state payload!) -- as T_l_s_r dtor specifically never does -- and GTFO. */
1102 delete ctx;
1103 return;
1104 }
1105 // else
1106
1107 /* Either the relevant Thread_local_state_registry dtor has not at all run yet, or perhaps it has started to run --
1108 * but we were able to grab the m_ctl fast enough. So now either they'll grab m_ctl->m_mutex first, or we will. */
1109 bool do_delete;
1110 {
1111 Lock lock{shared_ptr_to_ctl->m_mutex}; // Versus this_thread_state()/dtor/other thread's/threads' cleanup()(s).
1112 do_delete = (shared_ptr_to_ctl->m_state_per_thread.erase(ctx->m_state) == 1);
1113 } // shared_ptr_to_ctl->m_mutex unlocks here.
1114
1115 /* We don't want to `delete ctx->m_state` inside the locked section; it might not necessarily always be
1116 * criminal -- but in some exotic yet real situations the Thread_local_state dtor might launch a new, presumably
1117 * detached, thread that would itself call this_thread_state(), which would deadlock trying to lock the same
1118 * mutex, if the dtor call doesn't finish fast enough. */
1119 if (do_delete)
1120 {
1121 delete ctx->m_state; // We got the mutex first. Their ~Thread_local_state() dtor runs here.
1122 }
1123 /* else { Guess the concurrently-running dtor got there first! It `delete`d ctx->m_state and
1124 * m_state_per_thread.clear()ed instead of us. } */
1125
1126 delete ctx; // Either way we can free the little Tl_context; dtor never does that (known/justified leak by dtor).
1127} // Thread_local_state_registry::cleanup() // Static.
1128
1129template<typename Thread_local_state_t>
1130template<typename Task>
1131void Thread_local_state_registry<Thread_local_state_t>::while_locked(const Task& task)
1132{
1133 Lock lock{m_ctl->m_mutex};
1134 task(lock);
1135}
1136
1137template<typename Thread_local_state_t>
1138const typename Thread_local_state_registry<Thread_local_state_t>::State_per_thread_map&
1139  Thread_local_state_registry<Thread_local_state_t>::state_per_thread(const Lock& safety_lock) const
1140{
1141 assert(safety_lock.owns_lock() && "Please call with the Lock that while_locked() passed to your task().");
1142
1143 return m_ctl->m_state_per_thread;
1144}
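/* A minimal enumeration sketch, continuing the hypothetical Request_stats/`reg` example shown after
 * this_thread_state() above (not part of this header). state_per_thread() must be given the Lock that
 * while_locked() passes to the task; the assert above checks that it is at least locked.
 *
 *   reg.while_locked([&](const auto& lock)
 *   {
 *     for (const auto& state_and_mdt : reg.state_per_thread(lock))
 *     {
 *       std::cout << "Thread [" << state_and_mdt.second.m_thread_nickname << "]/["
 *                 << state_and_mdt.second.m_thread_id << "]: "
 *                 << state_and_mdt.first->m_n_requests.load() << " request(s).\n";
 *     }
 *   });
 */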
1145
1146template<typename Thread_local_state_t>
1147const std::string& Thread_local_state_registry<Thread_local_state_t>::nickname() const
1148{
1149 return m_nickname;
1150}
1151
1152template<typename Thread_local_state_t>
1153void Thread_local_state_registry<Thread_local_state_t>::set_logger(log::Logger* logger_ptr)
1154{
1155 using log::Log_context_mt;
1156
1157 Log_context_mt::set_logger(logger_ptr);
1158
1159 if constexpr(S_TL_STATE_HAS_MT_LOG_CONTEXT)
1160 {
1161 while_locked([&](auto&&...)
1162 {
1163 for (const auto& state_and_mdt : m_ctl->m_state_per_thread)
1164 {
1165 const auto state = state_and_mdt.first;
1166
1167 state->set_logger(logger_ptr);
1168 }
1169 });
1170 }
1171} // Thread_local_state_registry::set_logger()
1172
1173template<typename Thread_local_state_t>
1174template<typename... Ctor_args>
1175Thread_local_state_registry<Thread_local_state_t>::Tsp_wrapper::Tsp_wrapper(Ctor_args&&... ctor_args) :
1176 m_tsp(std::forward<Ctor_args>(ctor_args)...)
1177{
1178 // Yeah.
1179}
1180
1181template<typename Thread_local_state_t>
1182std::ostream& operator<<(std::ostream& os, const Thread_local_state_registry<Thread_local_state_t>& val)
1183{
1184 return os << '[' << val.nickname() << "]@" << &val;
1185}
1186
1187// Polled_shared_state template implementations.
1188
1189template<typename Shared_state_t>
1190template<typename... Ctor_args>
1191Polled_shared_state<Shared_state_t>::Polled_shared_state(Ctor_args&&... shared_state_ctor_args) :
1192 m_poll_flag_registry(nullptr, "",
1193 []() -> auto
1194 { using Atomic_flag = typename decltype(m_poll_flag_registry)::Thread_local_state;
1195 return new Atomic_flag{false}; }),
1196 m_shared_state(std::forward<Ctor_args>(shared_state_ctor_args)...)
1197{
1198 // Yep.
1199}
1200
1201template<typename Shared_state_t>
1202template<typename Task>
1203void Polled_shared_state<Shared_state_t>::while_locked(const Task& task)
1204{
1205 static_assert(!(std::is_empty_v<Shared_state>),
1206 "There is no need to call while_locked() when your Shared_state type is empty; "
1207 "an empty Shared_state is useful when only the Polled_shared_state thread-local flag arm/poll "
1208 "feature is needed -- but then there is nothing to lock.");
1209
1210 flow::util::Lock_guard<decltype(m_shared_state_mutex)> lock{m_shared_state_mutex};
1211 task(&m_shared_state);
1212}
1213
1214template<typename Shared_state_t>
1215void* Polled_shared_state<Shared_state_t>::this_thread_poll_state()
1216{
1217 return static_cast<void*>(m_poll_flag_registry.this_thread_state());
1218}
1219
1220template<typename Shared_state_t>
1221void Polled_shared_state<Shared_state_t>::arm_next_poll(void* thread_poll_state)
1222{
1223 using Atomic_flag = typename decltype(m_poll_flag_registry)::Thread_local_state;
1224
1225 static_cast<Atomic_flag*>(thread_poll_state)->store(true, std::memory_order_release);
1226
1227 /* Explanation of memory_order_release here + memory_order_acquire when loading (exchanging) it in
1228 * poll_armed():
1229 *
1230 * Our goal is to signal thread_poll_state's thread to -- when it next has a chance (opportunistic piggy-backing) --
1231 * do stuff. Yet since it is opportunistic piggy-backing, the fast-path (where nothing needs to be done)
1232 * needs to be lightning-fast. So we "just" set that bool flag. However
1233 * we also need to tell it some more info and/or even have it update it, namely whatever
1234 * Shared_state the user should have set-up before calling this arm_next_poll(). So we do it in the order:
1235 * -1- user updates Shared_state while_locked()
1236 * -2- set flag = true
1237 * and in the proper thread later
1238 * -A- check flag (poll_armed()), and if that returns true
1239 * -B- user reads/updates Shared_state while_locked()
1240 * The danger is that, to the other thread, our steps -1-2- will be reordered to -2-1-, and it will
1241 * see flag=true with Shared_state not-ready/empty/whatever (disaster). However by using RELEASE for -2- and ACQUIRE
1242 * for -A-, together with the presence of mutex-locking around -1- and -B- (while_locked(); plus the if-relationship
1243 * between -A-B-), we guarantee the ordering -1-2-A-B- as required.
1244 *
1245 * Regarding perf: probably even the strict memory_order_seq_cst in both here and poll_armed() would've been
1246 * reasonably quick; but we did better than that, approaching the minimally-strict memory_order_relaxed (but
1247 * not quite). */
1248} // Polled_shared_state::arm_next_poll()
1249
1250template<typename Shared_state_t>
1251bool Polled_shared_state<Shared_state_t>::poll_armed(void* thread_poll_state)
1252{
1253 using Atomic_flag = typename decltype(m_poll_flag_registry)::Thread_local_state; // I.e., std::atomic<bool>.
1254
1255 /* Replace true (armed) with false (no longer armed); return true (was armed)...
1256 * ...but if was already false (not armed), do nothing ("replace" it with false); return false (was not armed).
1257 * memory_order_acquire: See explanation in arm_next_poll(). */
1258 return static_cast<Atomic_flag*>(thread_poll_state)->exchange(false, std::memory_order_acquire);
1259
1260 /* (I (ygoldfel) initially wrote it as:
1261 * bool exp = true; return ...->compare_exchange_strong(exp, false, ...release);
1262 * because it "felt" somehow more robust to "formally" do-nothing, if it is already `false`... but it is clearly
1263 * slower/weirder to most eyes.) */
1264}
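/* A minimal two-sided usage sketch of the arm/poll protocol implemented just above. (Pending_cmds, `psd`, and
 * the two free functions are hypothetical illustrations, not part of this header; the void* handle would be
 * obtained in the target thread via psd.this_thread_poll_state() and cached/shared from there.)
 *
 *   struct Pending_cmds { std::vector<std::string> m_cmds; };
 *   flow::util::Polled_shared_state<Pending_cmds> psd;
 *
 *   // Target thread's hot path: opportunistic piggy-backed poll.
 *   void target_thread_poll(void* poll_state) // poll_state = this thread's psd.this_thread_poll_state().
 *   {
 *     if (psd.poll_armed(poll_state)) // -A- (acquire). Fast path, when not armed, is a single atomic exchange.
 *     {
 *       psd.while_locked([](Pending_cmds* cmds) // -B-: read/consume the published Shared_state under the mutex.
 *       {
 *         cmds->m_cmds.clear(); // (Process the commands here; then clear them.)
 *       });
 *     }
 *   }
 *
 *   // Any other thread: publish work, then arm the target thread's flag.
 *   void other_thread_submit(void* target_poll_state, std::string&& cmd)
 *   {
 *     psd.while_locked([&](Pending_cmds* cmds) { cmds->m_cmds.emplace_back(std::move(cmd)); }); // -1-
 *     psd.arm_next_poll(target_poll_state); // -2- (release).
 *   }
 */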
1265
1266} // namespace flow::util