Flow-IPC 1.0.2
Flow-IPC project: Public API.
Here we discuss the direct use of SHM arenas for allocation of C++ data structures; and how to share such structures between processes via ipc::transport. (Or go back to the prerequisite preceding page: Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics.)
If you've been reading this Manual sequentially to this point, you've seen shared memory (SHM) mentioned many times already. Nevertheless our philosophy on how much the user – you – should have to think about SHM is as follows.
We don't want you to think about it any more than absolutely necessary. It is a means to an end: generally speaking avoiding copying data shared among processes, for performance and algorithmic ease. As such much of the work with Flow-IPC should proceed without code mentioning SHM at all. Case in point: struc::Channel allows one to transmit messages with zero-copy performance, which means SHM is internally used, but you, the user, get that benefit automatically: All you have to do, essentially, is say in code, "I want zero-copy performance." To do that, ultimately, all you do is (on each side): when setting up your IPC session(s), specify a certain Session or Session_server base type which will enable SHM-backed performance with no further references to that fact in the rest of your code. To recap the recipes given in Sessions: Setting Up an IPC Context and Structured Message Transport –
Session-server:
Then it just works. Now, naturally, when working with structured data in, e.g., our example Cool_structured_channel, we may want to think beyond mere messaging and treat messages as shared data structures, in which case you will need to think about shared memory somewhat – but in a limited way. We went over that in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics. Even there, though, SHM-related thinking is limited to algorithmic decisions: when is a given Msg_out in existence; when does its lifetime end; which side should write; how to prevent concurrent non-read-only access (synchronization). You still don't need to think about SHM pools or SHM arenas. Notably, annoying and difficult things involving SHM pool naming and cleanup, including in case of crash, are performed invisibly and without your participation.
As pointed out in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics, however, working on shared data structures living directly and exclusively inside capnp-generated trees may be insufficient for some use cases. To recap possible ways in which it may be insufficient:
- capnp schemas offer structs, unions, and dynamic-length lists... But: high-performance structures such as sorted trees and hash tables would require at least an entire layer of code to conveniently treat a capnp-struct substrate in that fashion. Plus the way capnp allocates space inside its segments when a data structure is modified is not space-efficient or necessarily fast: It's not even trying to replicate a high-performance heap allocation algorithm. (The simplest example is what happens when a List size increases: capnp does not as of this writing allow the old-sized list's space to be reused subsequently at all; yet it continues taking RAM.)
- A received message (Msg_in) is a read-only view of the original out-message (even if, as we recommend, you use SHM-backing, and therefore it's really "viewing" the original structure directly where it was originally written). (As noted in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics this could be changed in the foreseeable future; but not as of this writing.) What if you'd like the recipient to write to the data structure?

All that said, perhaps the best way to look at it is not: what can't you do with a struc::Channel message directly? Rather look at it affirmatively: When two threads in a monolithic process want to participate in an algorithm on data structure X, how is that written? Well, one thread or the other creates X on the heap and writes it initially; then a pointer is passed to the other thread; and then both of them simply access X collaboratively: reading, writing, synchronizing via mutex if necessary; and so on. So struc::Msg_out and struc::Msg_in are all well and good, but what if you simply want two threads in two different processes to collaborate in working on a given C++ data structure?
In that case you will want to use SHM directly.
Flow-IPC provides this ability and goes significantly further than other known SHM-access libraries, including the delightful boost.interprocess, in hiding pain from you. You won't have to worry about creating SHM pools, cleanup, or naming. Perhaps most notably you won't have to write your own STL allocators or else forgo the use of STL-compliant data structures due to their default code making them unsuitable for use with SHM. Flow-IPC provides a workflow to enable STL-compliant containers to be used directly in SHM. Manual/intrusive data structures, such as home-grown linked lists, are also supported. (boost.interprocess tries to do all this too, but using it involves painful stateful allocators and various other limitations. Certainly it does not leverage a commercial-grade memory manager like jemalloc; but we do. This is not actually a criticism of boost.interprocess; what it provides is very useful, and we leverage it ourselves. It is just that its ambition is lesser than ours: we build on it. Even excluding SHM-jemalloc and staying with SHM-classic we provide a powerful layer of usability on top of boost.interprocess.)
As you'll soon see, to allocate something in SHM and then share it with another process (where it can further access that "something," including modifying it if desired), one needs a SHM arena in which to allocate in the first place. Where do you get this arena, and how long will it keep existing? Generally there are 2 mutually exclusive answers to this question:
1. With ipc::session: session.session_shm() returns a pointer to a SHM-arena object. You can allocate via that, and off you go.
2. Without ipc::session: you create and manage the arena(s) yourself.

In this page we limit discussion to way 1 (with ipc::session). For way 2: documentation is available in the Reference – see various items, starting at sub-namespaces, under ipc::shm. For example see ipc::shm::classic::Pool_arena docs.
There are two SHM-providers as of this writing: SHM-classic and SHM-jemalloc.
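The majority of the API is identical between them; much code could be written generically, with one being substituted for the other at compile-time at will. (In the code snippet above you can see where one would make the change, on either side.) SHM-classic does have a couple of capabilities that SHM-jemalloc lacks. To recap these here for convenience:
- With SHM-classic a borrower process can write to, and allocate on behalf of, a borrowed data structure; with SHM-jemalloc the borrower is limited to read-only access.
- SHM-classic supports proxying (a borrower re-lending a handle to yet another process); SHM-jemalloc, in its current version, does not.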
Conversely SHM-jemalloc, while having a somewhat reduced API set (enumerated above), has some major advantages too. See a safety-oriented discussion and nearby recap of non-safety considerations. A lengthier description can be found in the Reference.
In further discussion we will be agnostic as to the chosen SHM-provider wherever possible, with provider-specific notes where there's a difference in behavior or capability.
The first step is to select SHM-backed sessions at compile-time; everything flows from that decision. See above for recipe.
SHM-backed capabilities become available, like everything else in the ipc::session paradigm, once a session has been established in PEER state (Sessions: Setting Up an IPC Context). Internally during the session-opening procedure the following occurs. (It occurs invisibly, and without ipc::session you'd have to create relevant arena(s) yourself which is very different between the 2 SHM-providers and much more difficult with SHM-jemalloc, though even with SHM-classic very annoying naming decisions would need to be made; not to mention subsequent cleanup headaches.)
- The session-scope arena (.session_shm()) is created. This arena's lifetime (and therefore the max lifetime of any objects allocated by either side within it) is until the end of the session.
- The app-scope arena (.app_shm()), for that specific distinct Client_app, is created if it does not yet exist. This arena's lifetime is until the Session_server is destroyed. That is: its lifetime spans all future sessions, not just the one that triggered its creation (in on-demand fashion).

.app_shm() accessors.
.app_shm() accessors do not exist on the session-client end.
pool_size_limit_mi(). Rest assured that all these gigabytes are not taken away from general OS use from the get-go: Rather a given page (default size 4Ki) is taken away from general use only once it is "touched" by an allocation or write.

Unfortunately, however, at least in Linux there are some kernel parameters governing how much virtual SHM space can be reserved in such a way, even though ~no physical RAM is taken by just creating a pool sized in the gigabytes. In particular if system-wide active pools exceed a certain parameter Session_server::async_accept() can yield a "No space left on device" (ENOSPC) error. In this case one can: (1) tweak the kernel parameter(s) as admin; (2) reduce a given Session_server's pool size via the aforementioned mutator; or (3) use the SHM-jemalloc provider, which adjusts dynamically and creates/destroys smaller pools internally as needed.
Before something can be put into SHM and subsequently shared, one must allocate in a SHM arena. This idea is no different from how the regular heap is used, obviously. However instead of saying "allocate in the heap," one says "allocate in specific arena X."
The first step is to specify the arena. To do so use a .session_shm() or .app_shm() accessor; in all cases it returns a pointer to an arena object.
The second step is to construct something within it. While certain lower-level capabilities exist, we omit them here and with rare advanced exceptions recommend against their use. As far as we are concerned, to allocate, you must use: x = arena->construct<T>(...), where ... are constructor args to T (possibly none); it returns Arena::Handle<T> which is, simply, shared_ptr<T>. (Whether it's boost::shared_ptr or std::shared_ptr is formally unspecified; but they have equivalent semantics.)
To distinguish between it and other things, the returned thing from .construct<T>() – x above – is called a first-class SHM handle (or just SHM handle or handle). It is a shared_ptr<T>, but this particular shared-pointer has an important property that is not otherwise obvious given its apparent type. Namely it is fitted (invisibly) with a custom deleter supplied by Flow-IPC. Equally importantly, the handle is the entity that can be transmitted (lent) to another process. That is the main reason it is a first class item. Certainly subordinate allocations occur under that handle in the future – for example a vector allocating its buffer – but these are not themselves transmitted (lent) between processes. Only the first-class handle is.
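The custom-deleter idea can be illustrated without Flow-IPC at all. What follows is a minimal sketch, assuming a hypothetical Toy_arena that merely counts live objects and takes its memory from the regular heap. It is not the library's implementation; it only shows how a shared_ptr returned by a construct<T>()-style method can hand its memory back to the arena that produced it once the last copy of the handle goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a SHM arena (NOT a Flow-IPC type): it counts live
// objects and, for simplicity, takes its memory from the regular heap.
// (Over-aligned types are not handled; this is a toy.)
class Toy_arena {
public:
  template<typename T, typename... Args>
  std::shared_ptr<T> construct(Args&&... args) {
    void* raw = ::operator new(sizeof(T));
    T* obj = nullptr;
    try {
      obj = new (raw) T(std::forward<Args>(args)...);
    } catch (...) {
      ::operator delete(raw); // Constructor threw: give the memory back.
      throw;
    }
    ++m_live_objects;
    // The custom deleter is what makes this shared_ptr a "handle": at
    // ref-count-zero the object is destroyed and its memory returns to *this.
    return std::shared_ptr<T>(obj, [this](T* p) {
      p->~T();
      ::operator delete(p);
      --m_live_objects;
    });
  }

  std::size_t live_objects() const { return m_live_objects; }

private:
  std::size_t m_live_objects = 0;
};

// Usage: copies of the handle share ownership; the arena gets the memory back
// only when the last copy is gone.
inline bool demo_custom_deleter() {
  Toy_arena arena;
  {
    std::shared_ptr<int> x = arena.construct<int>(42);
    std::shared_ptr<int> x2 = x;
    x.reset(); // x2 still keeps the object alive.
    if ((*x2 != 42) || (arena.live_objects() != 1)) { return false; }
  }
  return arena.live_objects() == 0;
}
```

Note that user code never calls delete: the apparent type is an ordinary shared_ptr<T>, and the arena-aware behavior rides along inside the deleter.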
- The custom deleter, fitted by .construct<>(), is such that the underlying RAM is deallocated automatically. (Goes without saying possibly but do not try to manually delete x.get() or save an x.get() beyond x's lifetime, or anything like that. Once a shared_ptr, always a shared_ptr.) Deallocation means return at least for further allocations in the same arena. (Internally it might also lead to return of RAM for general OS use in some situations. That's a hairy optimization detail that depends on the SHM-provider. Generally SHM-jemalloc is into such things, while SHM-classic less so.)
- If one never transmits x via IPC, then it acts like any other shared_ptr: ref-count-zero is reached; so the resource is returned (albeit to SHM-arena as opposed to the general heap). If one transmits it via IPC N (N >= 1) times, however, then: The underlying data structure is deallocated once all first-class handles have reached ref-count-zero in their respective processes. That includes:
  - The shared_ptr group of the original x = ....construct<T>(...) call.
  - For each x_borrowed = ....borrow<T>(...) upon receipt of x by a process over IPC: The shared_ptr group of that x_borrowed.
- The original shared_ptr, plus each similarly-functioning shared_ptr obtained by a receiving process upon IPC transmission of the original, together form a multi-process meta-shared-pointer group; and the underlying memory is deallocated no earlier than that entire meta-shared-pointer group's ref-count reaches zero. (To-do: a diagram would help here.)

A super-important topic is what T can be. Without any complications, it can at least be a pointer-free plain-old data-type (POD) of arbitrary (but known at compile-time) depth/complexity:
- struct, union, class aggregating any of the above (including themselves – meaning multi-level structs et al).
- array<..., N> is allowed in a POD: it is a class or struct containing a native fixed-size array of ...s. N is a constant known at compile time. Informally we recommend the use of array<T, N> x over T x[N]: there is literally no downside (including perf).

That's pretty good. Yet it is not good enough for a huge range of algorithms. Ultimately that's because a POD can't have pointers (or another way of saying it: dynamically-sized arrays). Support for pointers/dynamically-sized arrays is required, among other things, to be able to store STL-compliant containers. For that matter intrusive/manual data structures (such as explicit linked lists or trees) also require pointers. Fortunately:
Flow-IPC provides extensive support for SHM-stored pointers and, by extension, STL-compliant containers (and intrusive/manual data structures).
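Before moving on to pointers, the pointer-free POD case described above can be partially checked at compile time. The following is a small sketch using only standard type traits; Point, Reading, Has_raw_pointer and looks_like_pod() are hypothetical names made up for this illustration, not Flow-IPC facilities. Note the caveat in the comments: the standard traits cannot detect a raw pointer member, so "pointer-free" remains your responsibility.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical example types (not from Flow-IPC).
struct Point { double m_x; double m_y; };

struct Reading {
  std::uint64_t m_timestamp;
  Point m_position;                // Nested struct.
  std::array<float, 4> m_samples;  // Fixed-size array, size known at compile time.
  union { std::int32_t m_as_int; float m_as_float; } m_extra;
};

// Compile-time approximation of "POD": trivially copyable and standard-layout.
template<typename T>
constexpr bool looks_like_pod() {
  return std::is_trivially_copyable<T>::value && std::is_standard_layout<T>::value;
}

static_assert(looks_like_pod<Reading>(), "Reading should qualify as a plain POD.");
static_assert(!looks_like_pod<std::vector<int>>(), "A heap-allocating container is not a POD.");

// Caveat: a raw pointer member passes the traits, yet such a type is NOT
// pointer-free and therefore not safe to share as-is.
struct Has_raw_pointer { int* m_ptr; };
static_assert(looks_like_pod<Has_raw_pointer>(), "The traits alone cannot reject raw pointers.");

// On mainstream standard libraries std::array<T, N> adds no size overhead over T[N].
inline bool array_has_no_size_overhead() {
  return sizeof(std::array<float, 4>) == sizeof(float[4]);
}
```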
At its formal core, in order for such a T (one involving pointers) to be .construct<T>()ible, a pointer field m_p inside (directly or indirectly via more such pointers) T must be as follows. Suppose the pointee's type is P.
- Its type must be not P* (raw pointer) but rather: Arena::Pointer<P>, where Arena::construct<T>() was the method used to construct.
  - Pointer<P> is called a fancy pointer (yes, actual technical name in C++-world; no, it is not the same as smart pointer) to P. (For the curious: With SHM-classic it is boost::interprocess::offset_ptr<P>. With SHM-jemalloc it is a custom type written by us, internally storing a SHM pool ID and offset within the IDed pool.)
- m_p (of type Pointer<P>) must have been obtained as follows: m_p = Pointer<P>(static_cast<P*>(arena.allocate(sizeof(P)))).
- When T::~T() is invoked, it must ensure that the following occurs: arena.deallocate(static_cast<void*>(m_p.get())).

Indeed, if your plan is to store actual pointer-like fields in the data structure rooted in T (context: .construct<T>()) – meaning T is an intrusive/manual data structure – then you must ensure all that somewhat-hairy stuff. Whereas when working with the regular heap you'd just:
- Declare P* m_p. No fancy-pointer needed.
- Use m_p = new P to allocate.
- When T::~T() is invoked: delete m_p.

So that would definitely be taxing to code. And indeed, particularly with legacy code, such measures may be necessary. We recommend against it whenever possible. Instead use STL-compliant SHM-friendly containers. These include: boost::container::* (basic_string, list, vector, map, deque, etc.), boost::unordered_* (map, set, etc.), and flow::util::Basic_blob (and Blob et al).
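To see why a raw pointer is unsuitable and what a fancy pointer buys you, consider the following sketch. It assumes a hypothetical Toy_offset_ptr that, in the spirit of an offset-based pointer, stores the distance from itself to the pointee instead of an absolute address. It is neither Flow-IPC's nor Boost's type. The demo copies a two-node region to a different address (standing in for the same SHM region being mapped at a different base address in another process) and shows that the relative link still resolves inside the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy fancy pointer (NOT Flow-IPC's or Boost's type). It stores
// the distance from itself to the pointee rather than an absolute address.
// It is deliberately left trivially copyable so the demo may memcpy it along
// with its pointee; a production-grade fancy pointer would instead recompute
// the offset when copied to a different address.
template<typename T>
class Toy_offset_ptr {
public:
  void set(T* p) {
    m_offset = p ? (reinterpret_cast<const char*>(p) - reinterpret_cast<const char*>(this)) : 0;
  }

  T* get() const {
    if (m_offset == 0) { return nullptr; } // Offset 0 (pointing at oneself) encodes null.
    return reinterpret_cast<T*>(const_cast<char*>(reinterpret_cast<const char*>(this)) + m_offset);
  }

private:
  std::ptrdiff_t m_offset = 0;
};

// A manual/intrusive linked-list node using the toy fancy pointer.
struct Node {
  int m_value = 0;
  Toy_offset_ptr<Node> m_next;
};

inline bool demo_offset_pointer_survives_relocation() {
  Node region_a[2];
  region_a[0].m_value = 1;
  region_a[1].m_value = 2;
  region_a[0].m_next.set(&region_a[1]);

  // Simulate "the same bytes at a different base address."
  Node region_b[2];
  std::memcpy(static_cast<void*>(region_b), static_cast<const void*>(region_a), sizeof(region_a));

  // The link inside the copy resolves to the copy's own second node.
  Node* const next_in_b = region_b[0].m_next.get();
  return (next_in_b == &region_b[1]) && (next_in_b->m_value == 2);
}
```

Had m_next been a raw Node*, the copied node would still point into region_a, which in the cross-process case would be an address that means nothing to the other process.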
std:: containers are not SHM-friendly, at least in gcc as of gcc-9; they assume raw-pointer-using allocators. By contrast boost::container::* corrected those issues. std::vector happens to be okay in gcc-8 at least, but it's safer to just go with boost::container. E.g., its std::list bro is broken in this regard.

Once you've chosen your STL-compliant SHM-friendly container type – or developed your own! – you must take care to, also, specify (as the Allocator template parameter to the container type) the SHM-allocating allocator we've provided. (boost.interprocess provides an allocator template for similar use; but it is stateful, which is a huge pain in the butt – plus it uses additional RAM to store the allocator pointer.) Use this allocator:
For example here's an aggregate T consisting of a few things, including scalars and containers, being constructed:
As you can see the power is there; one just needs to remember to keep supplying the proper allocator and SHM-friendly container templates at all levels. The code is not exactly the same, but with some convenience aliases it's quite similar.
- If you intend to transmit a T-typed object via IPC, then of course you must construct<T>() it as shown above.
- T can also be used for other purposes and yet still be held partially in SHM. For example you might build up a T = Widget::String from the example above and then move() or copy it onto x->m_str, where x = arena.construct<Widget>(), such that you intend to actually IPC-transmit first-class SHM-handle x. In that case you can still construct<T>() the intermediate Widget::String – no problem – and use it in whatever way you need, even if you aren't going to transmit that guy but just use it locally and then let it be deallocated.
- You can also declare a T on the stack (among other things; could be heap too). E.g.: x = session.session_shm()->construct<Widget>(); Widget::String temp_str; ...mutate temp_str...; x->m_str = std::move(temp_str);. Note temp_str did not need to be itself construct<>()ed. The String itself (the outer fixed-size thing of size sizeof(String)) is on the stack; but whatever it needed to actually allocate on its own behalf it would have used SHM for (due to the allocator properly being configured as part of the String alias).

So you've constructed it. How do you fill it out further? Well, if T is a POD as defined above, then you just do it; no different from the usual. E.g. here we modify the POD-ish parts of a Widget:
If, however, what you're doing will require some part of T to allocate on its behalf, then it'll need to use its allocator. But wait... why won't that just work? After all we've specified the Allocator template param so nicely in our Widget declaration! Answer: We've declared the allocator type to use, yes, and that is critical. However, the container code needs to know which arena to actually allocate in. arena->construct<T>() is Flow-IPC code, and it knows it's being invoked on *arena, so it does (internally) what's needed. However consider this:
basic_string::resize(), internally, will invoke its "stored" allocator of type Shm_allocator to allocate a buffer of at least 128 characters. But Shm_allocator = Stateless_allocator<...> is a stateless allocator. That means the allocator object "stored" inside the container outer structure has size 0: it takes no space and has no state. So the container code can't "know" from what Arena to allocate space!
To make allocating mutators (in this case .resize()) work you must activate the arena. This is done in thread-local fashion by using a RAII-style object, an ipc::shm::stl::Arena_activator:
Note well: Arena_activators stack as shown. So you can work with multiple arenas in close proximity. The key point is it affects only the current thread.
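The interplay between a stateless allocator and a thread-local "active arena" can be sketched in plain C++. Everything below (Toy_arena, t_active_arena, Toy_arena_activator, Toy_stateless_allocator) is a hypothetical stand-in invented for illustration, not the Flow-IPC implementation; the toy arena merely counts bytes and uses the regular heap, and the toy uses std::vector only because the memory here is ordinary heap memory. The sketch shows three things: the allocator object carries no state, allocation fails when no arena is active, and activators nest by restoring the previously active arena on scope exit.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a SHM arena: counts live bytes, uses the regular heap.
struct Toy_arena {
  std::size_t m_live_bytes = 0;
  void* allocate(std::size_t n) { void* p = ::operator new(n); m_live_bytes += n; return p; }
  void deallocate(void* p, std::size_t n) { ::operator delete(p); m_live_bytes -= n; }
};

// The arena currently active in this thread (none by default).
thread_local Toy_arena* t_active_arena = nullptr;

// RAII activator: sets this thread's active arena and restores the previous
// one on scope exit, which is what lets activators nest.
class Toy_arena_activator {
public:
  explicit Toy_arena_activator(Toy_arena* arena) : m_prev(t_active_arena) { t_active_arena = arena; }
  ~Toy_arena_activator() { t_active_arena = m_prev; }
  Toy_arena_activator(const Toy_arena_activator&) = delete;
  Toy_arena_activator& operator=(const Toy_arena_activator&) = delete;
private:
  Toy_arena* m_prev;
};

// Stateless allocator: it has no members, so it must consult the thread-local.
template<typename T>
struct Toy_stateless_allocator {
  using value_type = T;
  Toy_stateless_allocator() = default;
  template<typename U>
  Toy_stateless_allocator(const Toy_stateless_allocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    if (!t_active_arena) { throw std::logic_error("No arena is active in this thread."); }
    return static_cast<T*>(t_active_arena->allocate(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) noexcept {
    assert(t_active_arena); // Deallocation needs an active arena too.
    t_active_arena->deallocate(p, n * sizeof(T));
  }
};
template<typename T, typename U>
bool operator==(const Toy_stateless_allocator<T>&, const Toy_stateless_allocator<U>&) noexcept { return true; }
template<typename T, typename U>
bool operator!=(const Toy_stateless_allocator<T>&, const Toy_stateless_allocator<U>&) noexcept { return false; }

using Toy_vector = std::vector<int, Toy_stateless_allocator<int>>;

inline bool demo_nested_activation() {
  Toy_arena arena_a;
  Toy_arena arena_b;
  {
    Toy_arena_activator activate_a(&arena_a);
    Toy_vector v_a;
    v_a.resize(8); // Allocates from arena_a.
    if (arena_a.m_live_bytes < 8 * sizeof(int)) { return false; }
    {
      Toy_arena_activator activate_b(&arena_b); // Stacks on top of arena_a.
      Toy_vector v_b;
      v_b.resize(4); // Allocates from arena_b.
      if (arena_b.m_live_bytes < 4 * sizeof(int)) { return false; }
    } // v_b is freed into arena_b; then arena_a becomes active again.
    if (arena_b.m_live_bytes != 0) { return false; }
    v_a.resize(64); // Reallocation: new buffer from, old buffer back to, arena_a.
  } // v_a is freed while arena_a is still active.
  return (arena_a.m_live_bytes == 0) && (arena_b.m_live_bytes == 0);
}

inline bool demo_allocation_without_activation_fails() {
  try {
    Toy_vector v;
    v.resize(1); // No arena active: the allocator cannot know where to allocate.
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

One design consequence is visible in the demo: the container must also be destroyed (or shrunk) while the right arena is active, because deallocation goes through the same stateless allocator.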
- Activation is required whenever you perform an operation on an object with an Allocator template parameter equal to Stateless_allocator<something>, and that operation might want to allocate or deallocate (via its Allocator). In typical use of STL-compliant containers we don't usually have to think about such technicalities; most people don't really know or care about STL-allocators at all. Hence knowing when specifically you must do it is arguably a somewhat tall order. Possibly.
- If your work is const in nature – you're not modifying your T at all – then it is never necessary. (In point of fact, with SHM-jemalloc not only is it not necessary, it is also impossible on the receiving side of an IPC-transmission of a SHM-handle. There is simply no arena to activate in that situation. And as of this writing, writing on the receiving side is disallowed at the kernel level anyway. With SHM-classic, though, both allocation and writing are possible. Nevertheless even then: if your receiving-side algorithm is read-only w/r/t a SHM-stored T, then you need not – and should not – activate any arena.)
- If your work is non-const in nature – you are making some modifications to T within it:
  const block.
- It is not necessary when .construct<T>()ing the outer object (first-class SHM-handle). .construct<T>() will take care of it.

We've made references to transmission/lending a few times already. E.g., we've mentioned a construct()ed T gets deallocated once all handles – the original and any borrowed ones – have reached ref-count zero. Time to fill that big gap. How would those other handles spring into existence, presumably in other processes?
It's all well and good to construct and modify a thing in SHM, but it's not useful until another process receives it via IPC and begins their own work, whether read-only or read-write. Until then you might as well have just constructed in the regular heap in the first place.
These are the steps in sharing a first-class SHM-handle x = some_arena.construct<T>(...). Note that only a first-class handle can be shared: you can't share "just" x->m_str or something. And recall that x is shared_ptr<T> (which is also aliased from Arena::Handle<T> for stylistic purposes). The original creator of the handle – the guy to call construct() – is called the handle's owner or owner process. The owner can transmit a handle to another process, called the borrower (process). When doing so the owner is called the lender (process).
1. Owner/lender process (which has handle x): It prepares x for transmission by calling: const auto lend_blob = session.lend_object(x). lend_blob is a small encoding of certain information (a few bytes). It needs to be copied into/out of a transport: which is far superior to doing that to the entirety of the actual object!
2. Borrower and lender: Transmit lend_blob to the former from the latter, via any IPC technique whatsoever.
3. Borrower process (which wants a handle x_borrowed, like x): It obtains x_borrowed by calling: auto x_borrowed = session.borrow_object<T>(lend_blob).

Et voilà! You've got yourself a guy just like x but in another process. x_borrowed is, also, a shared_ptr<T>. One thing to note here is that the arena is not a part of this procedure. The session is; and the session in and of itself determines who's the recipient. Another thing to note is that the borrower code of yours must know the type T; if it does not match then behavior is undefined.
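The three steps, and the "meta-shared-pointer group" lifetime rule stated earlier, can be mimicked inside a single process. The sketch below is only a simulation under loud assumptions: Toy_session is a hypothetical class invented here, both "sides" live in one process, and the blob simply encodes a numeric ID. Real lending crosses a process boundary, and this is not how Flow-IPC encodes or tracks anything internally. The point is solely the observable behavior: the object stays alive until both the owner's handle and the borrowed handle are gone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <vector>

// Hypothetical in-process simulation of lend/borrow (NOT a Flow-IPC type).
// Toy limitation: each lend_blob is meant to be borrowed exactly once, and the
// session must outlive every borrowed handle.
class Toy_session {
public:
  using Blob = std::vector<std::uint8_t>;

  // "Lender side": remember the handle (keeping the object alive while it is
  // on loan) and encode a small ID identifying it.
  template<typename T>
  Blob lend_object(const std::shared_ptr<T>& handle) {
    const std::uint32_t id = m_next_id++;
    m_lent[id] = handle;
    Blob blob(sizeof(id));
    std::memcpy(blob.data(), &id, sizeof(id));
    return blob;
  }

  // "Borrower side": decode the ID and produce a handle whose deleter releases
  // the loan rather than freeing memory directly.
  template<typename T>
  std::shared_ptr<T> borrow_object(const Blob& blob) {
    std::uint32_t id = 0;
    if (blob.size() != sizeof(id)) { return nullptr; }
    std::memcpy(&id, blob.data(), sizeof(id));
    const auto it = m_lent.find(id);
    if (it == m_lent.end()) { return nullptr; }
    T* const raw = static_cast<T*>(it->second.get());
    return std::shared_ptr<T>(raw, [this, id](T*) { m_lent.erase(id); });
  }

private:
  std::uint32_t m_next_id = 1;
  std::map<std::uint32_t, std::shared_ptr<void>> m_lent;
};

inline bool demo_lend_borrow() {
  Toy_session session;
  std::shared_ptr<int> x = std::make_shared<int>(7);
  const std::weak_ptr<int> watcher = x; // Lets us observe when the int is destroyed.

  const Toy_session::Blob lend_blob = session.lend_object(x);                  // Step 1.
  // Step 2 would be: transmit lend_blob via IPC. Here it is just a local variable.
  std::shared_ptr<int> x_borrowed = session.borrow_object<int>(lend_blob);     // Step 3.

  x.reset(); // The owner lets go, but the borrower still holds a handle.
  if (watcher.expired() || !x_borrowed || (*x_borrowed != 7)) { return false; }

  x_borrowed.reset(); // The last handle in the group is gone: the object is destroyed.
  return watcher.expired();
}
```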
One can also directly use the .lend_object() and .borrow_object() methods on SHM-arena and SHM-session objects provided by ipc::shm. However this opens up various subtleties beyond our scope here. The Reference will help shed light. This technique may be useful in particular if working with arenas outside the ipc::session paradigm.

Note that we wrote the above in terms of lender and borrower; not specifically owner and borrower. That is because a borrower can act as a lender and transmit the handle to yet another process. This is called proxying. However, in its current version, SHM-jemalloc does not support proxying (a future version likely will, as the feature was designed from the start). SHM-classic does fully support proxying. That said, within an ipc::session-based IPC universe proxying is somewhat unlikely. We'll cut that discussion off here, as it gets into hairy super-advanced topics.
The owner-T and the borrower-T (template param to borrow_object() and therefore to shared_ptr for x_borrowed) must "match." What does that mean? Does it mean they must be the same type/bit-compatible?

If T is a POD (as defined earlier in this Manual page), then yes: T should just be the same type, and that's that. Now let's say T uses STL-compliant members (at any depth) and/or pointers (ditto). Then: The short answer depends on whether your code is generically meant to work with SHM-classic or SHM-jemalloc interchangeably – or only one of them specifically.
- Generically (and with SHM-jemalloc): Owner-side T must (as we've explained) use Session::Allocator-equipped types and/or Session::Allocator::Pointer-typed pointers. By contrast borrower-side T must be an identical type except the allocator type for both of those things within T shall be not Session::Allocator but rather: Session::Borrower_allocator.
- With SHM-classic: T can be identical on both sides. In any case ipc::session::shm::classic::Session_mv::Borrower_allocator just aliases to "...::Allocator".
Session::Borrower_allocator.
- In summary: if T is a POD, use the same T on all sides. If it uses STL-compliant/pointer stuff, then the most generic way is to use Session::Allocator in owner code; Session::Borrower_allocator in borrower code. And if targeting SHM-classic specifically, it is okay – for conciseness though not genericness – to just use the same type T on both sides, period.

To close the loop: how to, in fact, IPC-transmit lend_blob – the thing returned by lend_object() and fed to borrow_object()? Well, it's just a little blob, so you can do it however you want. However, if you're using struc::Channel (Structured Message Transport) – or for some odd reason capnp but without struc::Channel – then we've made a couple of utilities to reduce your boiler-plate. Here's how to use them:
Example schema:
Example owner/lender code:
And counterpart borrower code:
Simple! That said we opportunistically note: The borrower-side is using Widget_brw, a type we have not explicitly provided the code for in the actual example. Per the side-bar above, with SHM-classic it could just be Widget (same as in owner); but with SHM-jemalloc and generically you'd need to define a mirror of Widget called Widget_brw which would use Session::Borrower_allocator instead of Session::Allocator all-over. We'll leave that as an exercise to the reader. Tip: You do not need to copy paste the same type twice. Use template trickery – perhaps std::conditional_t – to conveniently pick between the two *llocator templates depending on a compile-time bool S_OWN_ELSE_BRW template parameter perhaps.
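One possible shape of that template trickery is sketched below. The two allocator templates here, Toy_owner_allocator and Toy_borrower_allocator, are hypothetical heap-backed stand-ins for Session::Allocator and Session::Borrower_allocator, and Widget_tmpl with its members is made up for this illustration; the toy also uses std::vector for self-containment, whereas SHM-stored code would use a SHM-friendly container as discussed above. Only the std::conditional_t selection technique is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the owner-side allocator template.
template<typename T>
struct Toy_owner_allocator {
  using value_type = T;
  Toy_owner_allocator() = default;
  template<typename U>
  Toy_owner_allocator(const Toy_owner_allocator<U>&) noexcept {}
  T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
  friend bool operator==(const Toy_owner_allocator&, const Toy_owner_allocator&) noexcept { return true; }
  friend bool operator!=(const Toy_owner_allocator&, const Toy_owner_allocator&) noexcept { return false; }
};

// Hypothetical stand-in for the borrower-side allocator template.
template<typename T>
struct Toy_borrower_allocator {
  using value_type = T;
  Toy_borrower_allocator() = default;
  template<typename U>
  Toy_borrower_allocator(const Toy_borrower_allocator<U>&) noexcept {}
  T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
  friend bool operator==(const Toy_borrower_allocator&, const Toy_borrower_allocator&) noexcept { return true; }
  friend bool operator!=(const Toy_borrower_allocator&, const Toy_borrower_allocator&) noexcept { return false; }
};

// One definition serves both sides; the bool picks the allocator family.
template<bool S_OWN_ELSE_BRW>
struct Widget_tmpl {
  template<typename T>
  using Allocator = std::conditional_t<S_OWN_ELSE_BRW, Toy_owner_allocator<T>, Toy_borrower_allocator<T>>;
  using Int_vector = std::vector<int, Allocator<int>>;

  int m_int = 0;
  Int_vector m_vec;
};

using Widget = Widget_tmpl<true>;      // For use in owner code.
using Widget_brw = Widget_tmpl<false>; // For use in borrower code.

static_assert(std::is_same<Widget::Int_vector::allocator_type, Toy_owner_allocator<int>>::value,
              "Owner-side Widget should use the owner allocator.");
static_assert(std::is_same<Widget_brw::Int_vector::allocator_type, Toy_borrower_allocator<int>>::value,
              "Borrower-side Widget should use the borrower allocator.");

inline bool demo_widget_variants() {
  Widget w;
  w.m_vec.push_back(10);
  Widget_brw w_brw;
  w_brw.m_vec.push_back(20);
  return (w.m_vec.size() == 1) && (w_brw.m_vec.size() == 1) && (sizeof(Widget) == sizeof(Widget_brw));
}
```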
The answer to this is sprinkled throughout the above. Nevertheless it seemed prudent to put a fine point on the answer as well as perhaps provide some algorithmic tips.
Firstly, in every case, you can access it in read-only fashion. The syntax is just C++ syntax.
Secondly, with SHM-classic specifically, you can do anything else including modifying the data structure which in turn includes operations that would further allocate in SHM.
boost::interprocess::interprocess_mutex (and a recursive variant) and boost::interprocess::interprocess_condition (et al). Informally these appear to use the same stuff as boost.thread mutex, condition_variable (namely in POSIX they use pthread primitives), so you could probably just use those.
mutex (et al) writes to it in its memory location... so while your algorithm may conceptually avoid writing from 1 of the 2 sides, it is actually still writing which means safety worries still apply: Could crash during a lock... and other stuff. So that leaves basically (1). The answer is: possibly. By default, all else being equal, it is best avoided for safety reasons (detailed in Safety and Permissions). That said tons of applications and use cases need not be so paranoid, and they very well might enjoy using an algorithm wherein 2 threads separated by a process boundary collaborate in read/write fashion on a common data structure.

So that leaves 2 remaining situations:
Either way: Your code then shall not write to a borrowed in-SHM data structure: it must read only. If the owner does want to write post-transmission, it must ensure such access is synchronized with concurrent reads on the borrower side. You may not use a mutex and/or condition variable in-SHM to arrange such synchronization. (A mutex-lock operation is itself a write and must not be used.) Don't despair: as mentioned earlier one can arrange synchronization via other algorithmic means such as IPC-messaging.
Again: You're not giving away those abilities of SHM-classic by choosing SHM-jemalloc for free. You get goodies in return: safety goodies and allocation-perf goodies. See back here.
The next page is: Transport Core Layer: Transmitting Unstructured Data.