Flow-IPC 1.0.2
Flow-IPC project: Public API.
Shared Memory: Direct Allocation, Transport

Here we discuss the direct use of SHM arenas for allocation of C++ data structures; and how to share such structures between processes via ipc::transport. (Or go back to the prerequisite preceding page: Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics.)

How shared memory (SHM) fits into Flow-IPC

If you've been reading this Manual sequentially to this point, you've seen shared memory (SHM) mentioned many times already. Nevertheless our philosophy on how much the user – you – should have to think about SHM is as follows.

We don't want you to think about it any more than absolutely necessary. It is a means to an end: generally speaking avoiding copying data shared among processes, for performance and algorithmic ease. As such much of the work with Flow-IPC should proceed without code mentioning SHM at all. Case in point: struc::Channel allows one to transmit messages with zero-copy performance, which means SHM is internally used, but you, the user, get that benefit automatically: All you have to do, essentially, is say in code, "I want zero-copy performance." To do that, ultimately, all you do is (on each side): when setting up your IPC session(s), specify a certain Session or Session_server base type which will enable SHM-backed performance with no further references to that fact in the rest of your code. To recap the recipes given in Sessions: Setting Up an IPC Context and Structured Message Transport:

Session-client:

template<ipc::session::schema::MqType S_MQ_TYPE_OR_NONE, bool S_TRANSMIT_NATIVE_HANDLES,
         typename Mdt_payload = ::capnp::Void>
using Client_session_t
  = ipc::session::shm::classic::Client_session<S_MQ_TYPE_OR_NONE, S_TRANSMIT_NATIVE_HANDLES, Mdt_payload>;
// Right here ----------^-- is where the zero-copy (SHM-backed) engine is specified.
// Alternatively, to specify a different SHM-backed engine, substitute:
//   ipc::session::shm::arena_lend::jemalloc::Client_session<...>
// Or if you don't want zero-copy, substitute:
//   ipc::session::Client_session<...>
// Now everything will flow from the above, alias to alias to alias... no SHM details being mentioned:
using Session = Client_session_t<session::schema::MqType::NONE, true>;
template<typename Message_body>
using Structured_channel_t = Session::Structured_channel<Message_body>;
using Cool_structured_channel = Structured_channel_t<my_meta_app::capnp::CoolMsg>;
// ...etc....

Session-server:

template<ipc::session::schema::MqType S_MQ_TYPE_OR_NONE, bool S_TRANSMIT_NATIVE_HANDLES,
         typename Mdt_payload = ::capnp::Void>
using Session_server_t
  = ipc::session::shm::classic::Session_server<S_MQ_TYPE_OR_NONE, S_TRANSMIT_NATIVE_HANDLES, Mdt_payload>;
// Alternatively (different SHM-backed engine), substitute:
//   ipc::session::shm::arena_lend::jemalloc::Session_server<...>
// Alternatively (no zero-copy/SHM-backing), substitute:
//   ipc::session::Session_server<...>
// Now everything will flow from the above, alias to alias to alias... no SHM details being mentioned:
using Session_server = Session_server_t<session::schema::MqType::NONE, true>;
using Session = Session_server::Server_session_obj;
template<typename Message_body>
using Structured_channel_t = Session::Structured_channel<Message_body>;
using Cool_structured_channel = Structured_channel_t<my_meta_app::capnp::CoolMsg>;
// ...etc....

Then it just works. Now, naturally, when working with structured data in, e.g., our example Cool_structured_channel, we may want to think beyond mere messaging and treat messages as shared data structures, in which case you will need to think about shared memory somewhat – but in a limited way. We went over that in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics. Even there, though, SHM-related thinking is limited to algorithmic decisions: when is a given Msg_out in existence; when does its lifetime end; which side should write; how to prevent concurrent non-read-only access (synchronization). You still don't need to think about SHM pools or SHM arenas. Notably, annoying and difficult things involving SHM pool naming and cleanup, including in case of crash, are performed invisibly and without your participation.

As pointed out in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics, however, working on shared data structures living directly and exclusively inside capnp-generated trees may be insufficient for some use cases. To recap possible ways in which it may be insufficient:

  • A capnp tree can express almost any data structure, as it supports various scalars, structs, unions, and dynamic-length lists... But: high-performance structures such as sorted trees and hash tables would require at least an entire layer of code to conveniently treat a capnp-struct substrate in that fashion. Plus, the way capnp allocates space inside its segments when a data structure is modified is neither space-efficient nor necessarily fast: It's not even trying to replicate a high-performance heap allocation algorithm. (The simplest example is what happens when a List size increases: capnp does not as of this writing allow the old-sized list's space to be reused subsequently at all; yet it continues taking RAM.)
  • As of this writing a received message instance (Msg_in) is a read-only view of the original out-message (even if, as we recommend, you use SHM-backing, and therefore it's really "viewing" the original structure directly where it was originally written). (As noted in Structured Message Transport: Messages As Long-lived Data Structures / Advanced Topics this could be changed in the foreseeable future; but not as of this writing.) What if you'd like the recipient to write to the data structure?

All that said, perhaps the best way to look at it is not: what can't you do with struc::Channel messages directly? Rather look at it affirmatively: When two threads in a monolithic process want to participate in an algorithm on data structure X, how is that written? Well, one thread or the other creates X on the heap and writes it initially; then a pointer is passed to the other thread; and then both of them simply access X collaboratively: reading, writing, synchronizing via mutex if necessary; and so on. So struc::Msg_out and struc::Msg_in are all well and good, but what if you simply want two threads in two different processes to collaborate in working on a given C++ data structure?

In that case you will want to use SHM directly.

Flow-IPC provides this ability and goes significantly further than known other SHM-access libraries, including the delightful boost.interprocess, in hiding pain from you. You won't have to worry about creating SHM pools, cleanup, naming. Perhaps most notably you won't have to write your own STL allocators or else forego the use of STL-compliant data structures due to their default code making them unsuitable for use with SHM. Flow-IPC provides a workflow to enable STL-compliant containers to be used directly in SHM. Manual/intrusive data structures, such as home-grown linked lists, are also supported. (boost.interprocess tries to do all this too, but using it involves painful stateful allocators and various other limitations. Certainly it does not leverage a commercial-grade memory manager like jemalloc; but we do. This is not actually a criticism of boost.interprocess; what it provides is very useful, and we leverage it ourselves. Just its ambition is lesser than ours: we build on it. Even excluding SHM-jemalloc and staying with SHM-classic we provide a powerful layer of usability on top of boost.interprocess.)

ipc::session and SHM

As you'll soon see, to allocate something in SHM and then share it with another process (where it can further access that "something," including modifying it if desired), one needs a SHM arena in which to allocate in the first place. Where do you get this arena, and how long will it keep existing? Generally there are 2 mutually exclusive answers to this question:

  • Way 1 (with ipc::session): Select SHM-backed sessions, and the arena(s) are created, named, and cleaned up for you as part of session establishment.
  • Way 2 (without ipc::session): Create and manage the arena(s) yourself, using the ipc::shm API directly.

In this page we limit discussion to way 1 (with ipc::session). For way 2: documentation is available in the Reference – see various items, starting at sub-namespaces, under ipc::shm. For example see ipc::shm::classic::Pool_arena docs.

Choice of SHM-provider

There are two as of this writing:

  • SHM-classic (ipc::shm::classic), which builds on boost.interprocess.
  • SHM-jemalloc (ipc::shm::arena_lend::jemalloc), an arena-lending provider that leverages the jemalloc memory manager.

The majority of the API is identical between them; much code could be written generically, with one being substituted for the other at compile-time at will. (In the code snippet above you can see where one would make the change, on either side.) SHM-classic does have a couple of capabilities that SHM-jemalloc lacks. To recap these here for convenience:

  • A session-client process can create app-scope objects. (In SHM-jemalloc only a session-server process can do so. This is a basic limitation of arena-lending SHM-providers and is unlikely to change in the foreseeable future.)
  • A receiving (borrowing) process – not just the original allocating process – can write to the data structure. (In SHM-jemalloc receiving-side writing is precluded at the kernel level. We could optionally disable this limitation, as internally it is nothing more than a flag to a sys-call. However this would lose valuable safety aspects that SHM-jemalloc holds over SHM-classic as an advantage for some applications. So it would be a trade-off. Nevertheless it's likely to become available in the foreseeable future.)

Conversely SHM-jemalloc, while having a somewhat reduced API set (enumerated above), has some major advantages too. See a safety-oriented discussion and nearby recap of non-safety considerations. A lengthier description can be found in the Reference.

In further discussion we will be agnostic as to the chosen SHM-provider wherever possible, with provider-specific notes where there's a difference in behavior or capability.

Direct use of SHM: How-to

The first step is to select SHM-backed sessions at compile-time; everything flows from that decision. See above for recipe.

The 1-2 arenas available upon session-establishment

SHM-backed capabilities become available, like everything else in the ipc::session paradigm, once a session has been established in PEER state (Sessions: Setting Up an IPC Context). Internally during the session-opening procedure the following occurs. (It occurs invisibly, and without ipc::session you'd have to create relevant arena(s) yourself which is very different between the 2 SHM-providers and much more difficult with SHM-jemalloc, though even with SHM-classic very annoying naming decisions would need to be made; not to mention subsequent cleanup headaches.)

  • A session-scope arena (a/k/a .session_shm()) is created. This arena's lifetime (and therefore the max lifetime of any objects allocated by either side within it) is until the end of the session.
  • If and only if this has not yet occurred earlier for the distinct ipc::session::Client_app involved in this session:
    • An app-scope arena (a/k/a .app_shm()), for that specific distinct Client_app, is created. This arena's lifetime is until the Session_server is destroyed. That is: its lifetime spans all future sessions, not just the one that triggered its creation (in on-demand fashion).
  • If the app-scope arena has already been created, it is not re-created; but it is made equally accessible via .app_shm() accessors.
Note
With SHM-jemalloc the app-scope arena is accessible (can be allocated in) only on the session-server end. The .app_shm() accessors do not exist on the session-client end.

SHM-arena capacity
How much "stuff" is it possible to allocate in each arena? The answer is: it is essentially unlimited.
More specifically, with SHM-jemalloc it is simply unlimited for all practical purposes.
With SHM-classic there is as of this writing a hard-coded limit in the gigabytes (query ipc::session::shm::classic::Session_server::pool_size_limit_mi() to get the value). If it proves insufficient, you can increase it via the similarly-named mutator pool_size_limit_mi(). Rest assured that all these gigabytes are not taken-away from general OS use from the get-go: Rather a given page (default size 4Ki) is taken-away from general use, only once it is "touched" by an allocation or write.
Unfortunately, however, at least in Linux there are some kernel parameters governing how much virtual SHM space can be reserved in such a way, even though ~no physical RAM is taken by just creating a pool sized in the gigabytes. In particular, if system-wide active pools exceed a certain parameter, Session_server::async_accept() can yield a "No space left on device" (ENOSPC) error. In this case one can: (1) tweak the kernel parameter(s) as admin; (2) reduce a given Session_server's pool size via the aforementioned mutator; or (3) use the SHM-jemalloc provider, which adjusts dynamically and creates/destroys smaller pools internally as needed.

Allocating in an arena

Before something can be put into SHM and subsequently shared, one must allocate in a SHM arena. This idea is no different from how the regular heap is used, obviously. However instead of saying "allocate in the heap," one says "allocate in specific arena X."

The first step is to specify the arena. To do so use a .session_shm() or .app_shm() accessor; in all cases it returns a pointer to an arena object.

auto session_shm = session.session_shm(); // session_shm is a pointer to an arena object. *session_shm is the arena exclusively associated with `session`.
auto app_shm = session.app_shm(); // .app_shm() is the arena shared among all sessions with the same Client_app (by `.m_name`) as `session`.
// Alternative way to access this, in case a specific `session` is not easily available. Returns null, if no session with Client_app a_client_app has yet opened.
auto app_shm_via_server = session_server.app_shm(a_client_app);
auto cool_obj = session_shm->construct<Cool_obj>(...); // And off we go allocating in an arena. (Note `->`: session_shm is a pointer.)

Second step is to construct something within it. While certain lower-level capabilities exist, we omit them here and with rare advanced exceptions recommend against their use. As far as we are concerned, to allocate, you must use: x = arena->construct<T>(...), where ... are constructor args to T (possibly none); it returns Arena::Handle<T> which is, simply, shared_ptr<T>. (Whether it's boost::shared_ptr or std::shared_ptr is formally unspecified; but they have equivalent semantics.)

To distinguish between it and other things, the thing returned from .construct<T>() – x above – is called a first-class SHM handle (or just SHM handle or handle). It is a shared_ptr<T>, but this particular shared-pointer has an important property that is not otherwise obvious given its apparent type. Namely it is fitted (invisibly) with a custom deleter supplied by Flow-IPC. Equally importantly, the handle is the entity that can be transmitted (lent) to another process. That is the main reason it is a first-class item. Certainly subordinate allocations occur under that handle in the future – for example a vector allocating its buffer – but these are not themselves transmitted (lent) between processes. Only the first-class handle is.
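
To make the custom-deleter idea concrete, here is a minimal sketch of how a construct<T>()-style method can hand back a plain shared_ptr<T> that nonetheless returns its memory to a specific arena. Toy_arena, its members, and this Cool_obj definition are hypothetical stand-ins written for illustration; the toy arena merely forwards to the regular heap and is not how Flow-IPC implements its arenas.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a SHM arena: it counts live allocations and
// forwards to the regular heap. A real arena would carve memory out of SHM.
class Toy_arena
{
public:
  template<typename T>
  using Handle = std::shared_ptr<T>;

  // Allocates raw storage from "the arena", constructs T in place, and returns
  // a handle whose custom deleter destroys T and gives the storage back.
  template<typename T, typename... Args>
  Handle<T> construct(Args&&... args)
  {
    void* const raw = allocate(sizeof(T));
    T* obj = nullptr;
    try
    {
      obj = new (raw) T(std::forward<Args>(args)...);
    }
    catch (...)
    {
      deallocate(raw); // Constructor threw: do not leak the storage.
      throw;
    }
    return Handle<T>(obj, [this](T* p) { p->~T(); deallocate(p); });
  }

  std::size_t live_allocations() const { return m_live; }

private:
  void* allocate(std::size_t n) { ++m_live; return ::operator new(n); }
  void deallocate(void* p) { --m_live; ::operator delete(p); }

  std::size_t m_live = 0;
};

// Hypothetical payload type.
struct Cool_obj
{
  int m_a;
  double m_b;
  Cool_obj(int a, double b) : m_a(a), m_b(b) {}
};

// Returns the number of live arena allocations after the last handle is gone.
inline std::size_t demo_handle_lifetime()
{
  Toy_arena arena;
  {
    auto x = arena.construct<Cool_obj>(7, 2.5);
    auto x2 = x;                           // Same shared_ptr group; ref-count is 2.
    assert(arena.live_allocations() == 1);
    x.reset();                             // Ref-count 1: nothing is deallocated yet.
    assert(arena.live_allocations() == 1);
  }                                        // x2 dies: the deleter runs and returns the storage.
  return arena.live_allocations();
}
```

Calling demo_handle_lifetime() returns 0: the caller only ever sees a shared_ptr, and the arena bookkeeping happens inside the deleter.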


Garbage-collection of SHM-allocated objects
A key feature and convention is that a SHM-handle, as returned by .construct<>(), is such that the underlying RAM is deallocated automatically. (Goes without saying possibly but do not try to manually delete x.get() or save an x.get() beyond x lifetime, or anything like that. Once a shared_ptr, always a shared_ptr.) Deallocation means return at least for further allocations in the same arena. (Internally it might also lead to return of RAM for general OS use in some situations. That's a hairy optimization detail that depends on the SHM-provider. Generally SHM-jemalloc is into such things, while SHM-classic less so.)
A very important point is that the auto-deallocation occurs on a cross-process basis. If one never transmits (lends) x via IPC, then it acts like any other shared_ptr: ref-count-zero is reached; so the resource is returned (albeit to SHM-arena as opposed to the general heap). If one transmits it via IPC N (N >= 1) times, however, then: The underlying data structure is deallocated once all first-class handles have reached ref-count-zero in their respective processes. That includes:
  • The shared_ptr group of the original x = ....construct<T>(...) call.
  • For each x_borrowed = ....borrow<T>(...) upon receipt of x by a process over IPC: The shared_ptr group of that x_borrowed.
Perhaps in plainer English (?): The original constructed handle shared_ptr, plus each similarly-functioning shared_ptr obtained by a receiving process upon IPC transmission of the original, together form a multi-process meta-shared-pointer group; and the underlying memory is deallocated no earlier than that entire meta-shared-pointer group's ref-count reaches zero. (To-do: a diagram would help here.)

A super-important topic is what T can be. Without any complications, it can at least be a pointer-free plain-old data-type (POD) of arbitrary (but known at compile-time) depth/complexity:

  • Integers, floating-point numbers, Booleans.
  • struct, union, class aggregating any of the above (including themselves – meaning multi-level structs et al).
  • Native, fixed-size arrays collecting any of the above (including themselves).
Note
An array<..., N> is allowed in a POD: it is a class or struct containing a native fixed-size array of ...s. N is a constant known at compile time. Informally we recommend the use of array<T, N> x over T x[N]: there is literally no downside (including perf).
Raw pointers are not allowed in a POD in this context (nor are they, generally, allowed in the larger context of what can go into SHM-stored T).
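
As an informal aid, standard type traits can catch some accidental non-POD members at compile time. This is a sketch with hypothetical type names (Sample, Sample_batch, Bad_node, is_layout_stable()); note the important limitation that no standard trait can detect a raw pointer member, so that part remains a matter of code review.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Pointer-free aggregate of scalars.
struct Sample
{
  std::uint32_t m_id;
  float m_reading;
  bool m_valid;
};

// Multi-level POD: a scalar plus a fixed-size array of PODs (N known at compile time).
struct Sample_batch
{
  std::uint64_t m_seq;
  std::array<Sample, 16> m_samples;
};

// NOT suitable as a SHM-stored POD: it holds a raw pointer.
struct Bad_node
{
  int m_value;
  Bad_node* m_next;
};

// True if T can be copied byte-wise and has a predictable layout.
template<typename T>
constexpr bool is_layout_stable()
{
  return std::is_trivially_copyable<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_layout_stable<Sample_batch>(), "Sample_batch should be a pointer-free POD.");

inline bool demo_pod_checks()
{
  // std::string manages heap memory, so the traits reject it.
  const bool string_rejected = !is_layout_stable<std::string>();
  // Caveat: the traits accept Bad_node even though its raw pointer makes it
  // unsuitable. Pointer members must be caught by review, not by these checks.
  const bool raw_pointer_slips_through = is_layout_stable<Bad_node>();
  return is_layout_stable<Sample_batch>() && string_rejected && raw_pointer_slips_through;
}
```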

That's pretty good. Yet it is not good enough for a huge range of algorithms. Ultimately that's because a POD can't have pointers (or another way of saying it: dynamically-sized arrays). Support for pointers/dynamically-sized arrays is required, among other things, to be able to store STL-compliant containers. For that matter intrusive/manual data structures (such as explicit linked lists or trees) also require pointers. Fortunately:

Allocating non-PODs (data structures with pointers, dynamically-sized items, and/or STL-compliant containers)

Flow-IPC provides extensive support for SHM-stored pointers and, by extension, STL-compliant containers (and intrusive/manual data structures).

At its formal core, in order for such a T (one involving pointers) to be .construct<T>()ible, a pointer field m_p inside (directly or indirectly via more such pointers) T must be as follows. Suppose the pointee's type is P.

  • Its type must not be P* (raw pointer) but rather: Arena::Pointer<P>, where Arena::construct<T>() was the method used to construct.
    • Pointer<P> is called a fancy pointer (yes, actual technical name in C++-world; no, it is not the same as smart pointer) to P. (For the curious: With SHM-classic it is boost::interprocess::offset_ptr<P>. With SHM-jemalloc it is a custom type written by us, internally storing a SHM pool ID and offset within the IDed pool.)
  • m_p (of type Pointer<P>) must have been obtained as follows: m_p = Pointer<P>(static_cast<P*>(arena.allocate(sizeof(P)))).
  • If T::~T() is invoked, it must ensure that the following occurs: arena.deallocate(static_cast<void*>(m_p.get())).
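
To see why a raw P* cannot be stored in SHM while a fancy pointer can, consider the following sketch. Toy_offset_ptr is a hypothetical, greatly simplified self-relative pointer in the spirit of boost::interprocess::offset_ptr; it is not that class. Two local buffers stand in for the same pool mapped at two different addresses in two processes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Toy self-relative pointer: stores the byte distance from itself to the
// pointee instead of an absolute address. Offset 0 means null.
// Limitation of this toy: copying a single Toy_offset_ptr to another address
// does not recompute the offset; only whole-region copies are supported.
template<typename P>
class Toy_offset_ptr
{
public:
  void set(P* target)
  {
    m_offset = target ? (reinterpret_cast<std::intptr_t>(target) - reinterpret_cast<std::intptr_t>(this))
                      : 0;
  }

  P* get() const
  {
    return m_offset ? reinterpret_cast<P*>(reinterpret_cast<std::intptr_t>(this) + m_offset)
                    : nullptr;
  }

private:
  std::intptr_t m_offset = 0;
};

struct Node
{
  int m_value;
  Toy_offset_ptr<Node> m_next;
};

struct Raw_node
{
  int m_value;
  Raw_node* m_next;
};

inline bool demo_offset_ptr()
{
  alignas(Node) unsigned char region_a[2 * sizeof(Node)]; // "Mapping" in process A.
  alignas(Node) unsigned char region_b[2 * sizeof(Node)]; // Same bytes at another address.

  // Build a two-node list inside region A.
  Node* const a0 = new (region_a) Node{};
  Node* const a1 = new (region_a + sizeof(Node)) Node{};
  a0->m_value = 1;
  a1->m_value = 2;
  a0->m_next.set(a1);

  // Copy the bytes to region B, as if the same pool were mapped elsewhere.
  Node* const b0 = new (region_b) Node{};
  Node* const b1 = new (region_b + sizeof(Node)) Node{};
  std::memcpy(b0, a0, sizeof(Node));
  std::memcpy(b1, a1, sizeof(Node));
  a1->m_value = 99; // Change A only, to prove B's link does not lead back into A.

  // The self-relative link resolves within region B.
  const bool fancy_ok = (b0->m_next.get() == b1) && (b0->m_next.get()->m_value == 2);

  // A raw pointer, by contrast, still holds region A's absolute address.
  Raw_node raw_a[2] = { { 1, nullptr }, { 2, nullptr } };
  raw_a[0].m_next = &raw_a[1];
  Raw_node raw_b[2];
  std::memcpy(raw_b, raw_a, sizeof raw_a);
  const bool raw_is_stale = (raw_b[0].m_next == &raw_a[1]);

  return fancy_ok && raw_is_stale;
}
```

In a real second process the stale raw address would point at unrelated or unmapped memory, which is why the requirement above is not optional.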

Indeed, if your plan is to store actual pointer-like fields in the data structure rooted in T (context: .construct<T>()) – meaning T is an intrusive/manual data structure – then you must ensure all that somewhat-hairy stuff. Whereas when working with the regular heap you'd just:

  • Use P* m_p. No fancy-pointer needed.
  • m_p = new P to allocate.
  • If T::~T() is invoked: delete m_p.

So that would definitely be taxing to code. And indeed, particularly with legacy code, such measures may be necessary. We recommend against it whenever possible. Instead use STL-compliant SHM-friendly containers. These include: boost::container::* (basic_string, list, vector, map, deque, etc.), boost::unordered_* (map, set, etc.), and flow::util::Basic_blob (and Blob et al).

Warning
As a rule std:: containers are not SHM-friendly, at least in gcc as of gcc-9; they assume raw-pointer-using allocators. By contrast boost::container::* corrected those issues. std::vector happens to be okay in gcc-8 at least, but it's safer to just go with boost::container. E.g., its std::list bro is broken in this regard.

Once you've chosen your STL-compliant SHM-friendly container type – or developed your own! – you must take care to, also, specify (as the Allocator template parameter to the container type) the SHM-allocating allocator we've provided. (boost.interprocess provides an allocator template for similar use; but it is stateful, which is a huge pain in the butt – plus it uses additional RAM to store the allocator pointer.) Use this allocator:

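template<typename T>
using Shm_allocator = Session::Allocator<T>;
// Alternatively (define Shm_allocator one way or the other, not both):
using Arena = Session::Arena;
template<typename T>
using Shm_allocator = ipc::shm::stl::Stateless_allocator<T, Arena>;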

For example here's an aggregate T consisting of a few things, including scalars and containers, being constructed:

struct Widget
{
  struct Node
  {
    int m_int;
    boost::container::vector<float, Shm_allocator<float>> m_float_vec;
  };
  using String = boost::container::basic_string<char, std::char_traits<char>, Shm_allocator<char>>;
  bool m_flag;
  String m_str;
  boost::unordered_map<String, Node, std::hash<String>, std::equal_to<String>,
                       Shm_allocator<std::pair<const String, Node>>>
    m_str_to_node_map;
};
auto x = session.session_shm()->construct<Widget>(); // This uses a default constructor, but generally one can (carefully, when initializing) use args.
x->m_flag = true;
// ...
// Here is what it would have looked like without SHM support (for comparison):
struct Widget
{
  struct Node
  {
    int m_int;
    std::vector<float> m_float_vec;
  };
  bool m_flag;
  std::string m_str;
  boost::unordered_map<std::string, Node> m_str_to_node_map;
};
auto x = new Widget;
x->m_flag = true;
// ...

As you can see the power is there; one just needs to remember to keep supplying the proper allocator and SHM-friendly container templates at all levels. The code is not exactly the same, but with some convenience aliases it's quite similar.


Stack (et al) use
If you intend to transmit (lend) a T-typed object via IPC, then of course you must construct<T>() it as shown above.
However a T can also be used for other purposes and yet still be held partially in SHM. For example you might build up a T = Widget::String from the example above and then move() or copy it onto x->m_str, where x = arena.construct<Widget>(), such that you intend to actually IPC-transmit first-class SHM-handle x. In that case you can still construct<T>() the intermediate Widget::String – no problem – and use it in whatever way you need, even if you aren't going to transmit that guy but just use it locally and then let it be deallocated.
However in such situations it would make your code less verbose and potentially faster – by avoiding unnecessary SHM use – to place the T on the stack (among other things; could be heap too). E.g.: x = session.session_shm()->construct<Widget>(); Widget::String temp_str; ...mutate temp_str...; x->m_str = std::move(temp_str);. Note temp_str did not itself need to be construct<>()ed. The String itself (the outer fixed-size thing of size sizeof(String)) is on the stack; but whatever it needed to actually allocate on its own behalf it would have used SHM for (due to the allocator properly being configured as part of the String alias).
If you are familiar with C++ allocators, then this is perhaps no surprise.

Mutating a SHM-stored T

So you've constructed it. How do you fill it out further? Well, if T is a POD as defined above, then you just do it; no different from the usual. E.g. here we modify the POD-ish parts of a Widget:

x->m_flag = true;
x->m_str[0] = 'X'; // Assumes, of course, that by this point: x->m_str.size() > 0. Else this would be undefined behavior/buffer overflow.

If, however, what you're doing will require some part of T to allocate on its behalf, then it'll need to use its allocator. But wait... why won't that just work? After all we've specified the Allocator template param so nicely in our Widget declaration! Answer: We've declared the allocator type to use, yes, and that is critical. However, the container code needs to know which arena to actually allocate-in. arena->construct<T>() is Flow-IPC code, and it knows it's being invoked on *arena, so it does (internally) what's needed. However consider this:

assert(x->m_str.empty());
x->m_str.resize(128); // This should make x->m_str contain 128 NUL characters.

basic_string::resize(), internally, will invoke its "stored" allocator of type Shm_allocator to allocate a buffer of at least 128 characters. But Shm_allocator = Stateless_allocator<...> is a stateless allocator. That means the allocator object "stored" inside the container outer structure has size 0: it takes no space and has no state. So the container code can't "know" from what Arena to allocate space!

To make allocating mutators (in this case .resize()) work you must activate the arena. This is done in thread-local fashion by using a RAII-style object, an ipc::shm::stl::Arena_activator:

using Activator = ipc::shm::stl::Arena_activator<Session::Arena>;
auto x = session.session_shm()->construct<Widget>();
auto y = session.app_shm()->construct<Widget>();
{
  Activator ctx_sess(session.session_shm()); // When modifying Shm_allocator-using things best to do this at the top.
  x->m_flag = true; // ctx_sess is active but has zero effect on this (harmless and zero perf impact).
  x->m_str.resize(128); // ctx_sess is active and will cause proper arena to be used for allocation.
  {
    Activator ctx_app(session.app_shm()); // Activators "stack": the latest one to be cted and not dted is active.
    y->m_str.resize(128); // ctx_app is active and will cause proper arena to be used for allocation.
  }
  // ctx_sess is active again.
  x->m_str.resize(1024);
}
// If no Activator is active here (in this thread) this would trip internal assert() inside Shm_allocator.
x->m_str.resize(128);

Note well: Arena_activators stack as shown. So you can work with multiple arenas in close proximity. The key point is it affects only the current thread.
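
For readers curious how a zero-size allocator can still find "its" arena, here is a minimal sketch of the thread-local activation pattern. Counting_arena, active_arena(), Toy_activator, and Toy_stateless_allocator are all hypothetical names for this illustration; the toy arenas forward to the regular heap, and nothing here is Flow-IPC's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical arena: counts allocations and forwards to the regular heap.
struct Counting_arena
{
  std::size_t m_allocs = 0;
  void* allocate(std::size_t n) { ++m_allocs; return ::operator new(n); }
  void deallocate(void* p) { ::operator delete(p); }
};

// Per-thread slot holding the currently active arena (null if none).
inline Counting_arena*& active_arena()
{
  thread_local Counting_arena* s_arena = nullptr;
  return s_arena;
}

// RAII activator: makes `arena` active for this thread; restores the previous
// one on destruction. Nested activators therefore behave like a stack.
class Toy_activator
{
public:
  explicit Toy_activator(Counting_arena* arena) : m_prev(active_arena()) { active_arena() = arena; }
  ~Toy_activator() { active_arena() = m_prev; }
  Toy_activator(const Toy_activator&) = delete;
  Toy_activator& operator=(const Toy_activator&) = delete;

private:
  Counting_arena* m_prev;
};

// Stateless allocator: no data members; it consults the thread-local slot.
template<typename T>
struct Toy_stateless_allocator
{
  using value_type = T;

  Toy_stateless_allocator() = default;
  template<typename U>
  Toy_stateless_allocator(const Toy_stateless_allocator<U>&) {}

  T* allocate(std::size_t n)
  {
    Counting_arena* const arena = active_arena();
    assert(arena && "No arena is active in this thread.");
    return static_cast<T*>(arena->allocate(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t)
  {
    Counting_arena* const arena = active_arena();
    assert(arena && "No arena is active in this thread.");
    arena->deallocate(p);
  }
};

template<typename T, typename U>
bool operator==(const Toy_stateless_allocator<T>&, const Toy_stateless_allocator<U>&) { return true; }
template<typename T, typename U>
bool operator!=(const Toy_stateless_allocator<T>&, const Toy_stateless_allocator<U>&) { return false; }

static_assert(std::is_empty<Toy_stateless_allocator<int>>::value, "The allocator carries no state.");

inline bool demo_activator()
{
  using Int_vec = std::vector<int, Toy_stateless_allocator<int>>;
  Counting_arena sess_arena;
  Counting_arena app_arena;
  bool ok = true;
  {
    Toy_activator ctx_sess(&sess_arena);
    Int_vec v; // Declared after ctx_sess, so it is destroyed while ctx_sess is still active.
    v.resize(128);
    ok = ok && (sess_arena.m_allocs >= 1) && (app_arena.m_allocs == 0);
    const std::size_t sess_before = sess_arena.m_allocs;
    {
      Toy_activator ctx_app(&app_arena); // Now the app arena is the active one.
      Int_vec w;
      w.resize(16);
      ok = ok && (app_arena.m_allocs >= 1) && (sess_arena.m_allocs == sess_before);
    }
    const std::size_t app_before = app_arena.m_allocs;
    v.resize(1024); // ctx_sess is active again.
    ok = ok && (sess_arena.m_allocs > sess_before) && (app_arena.m_allocs == app_before);
  }
  return ok && (active_arena() == nullptr);
}
```

One design consequence is visible in the demo: a container must also be destroyed (or shrunk) while a suitable arena is active, since deallocation goes through the same thread-local lookup. For a first-class SHM handle Flow-IPC's custom deleter handles that case, as discussed below.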

Note
boost.interprocess provides a stateful allocator. Using those is quite painful, especially when using containers of containers, but let's not get into it here. Our syntax is much less painful, though there's no free lunch: we have to use the activator statement to make it work, and there is some compute used internally to those to make the thread-local state apply. We feel this trade-off is well worth it in our favor.

When MUST/SHOULD you activate an arena?
We've answered this already: whenever you're modifying a data structure living in SHM, such that it is of a type fitted with Allocator template parameter equal to Stateless_allocator<something>, and that operation might want to allocate or deallocate (via its Allocator). In typical use of STL-compliant containers we don't usually have to think about such technicalities; most people don't really know or care about STL-allocators at all. Hence knowing when specifically you must do it is arguably a somewhat tall order. Possibly.
So, informally, a better question might be: what are convenient rules of thumb that'll ensure you do the right thing, without worrying too much about allocator technicalities? We suggest these:
  • If your code block is const in nature – you're not modifying your T at all – then it is never necessary. (In point of fact, with SHM-jemalloc not only is it not necessary, it is also impossible on the receiving side of an IPC-transmission of a SHM-handle. There is simply no arena to activate in that situation. And as of this writing, writing on the receiving side is disallowed at the kernel level anyway. With SHM-classic, though, both allocation and writing are possible. Nevertheless even then: if your receiving-side algorithm is read-only w/r/t a SHM-stored T, then you need not – and should not – activate any arena.)
  • If your code block is non-const in nature – you are making some modifications to T within it:
    • Generally you should just activate the arena at the top of the code block: better safe than sorry. Those statements that don't touch an allocator won't be affected; but those that do will just work nicely. And you needn't worry about the details of which is which, if it's active at the top.
    • However, if you're extremely perf-conscious, and some code paths don't touch anything allocator-related, then you could be somewhat conditional about it, activating the arena in smaller sub-blocks of your overall non-const block.
As a reminder, it is not necessary to activate an arena when .construct<T>()ing the outer object (first-class SHM-handle). .construct<T>() will take care of it.
It is also emphatically not necessary to activate an arena when the outer object might get destroyed (due to the SHM-handle reaching ref-count zero). Our internal custom-deleter code will take care of it.

Transmission of SHM-stored data structures: Lend and borrow

We've made references to transmission/lending a few times already. E.g., we've mentioned a construct()ed T gets deallocated once all handles – the original and any borrowed ones – have reached ref-count zero. Time to fill that big gap. How would those other handles spring into existence, presumably in other processes?

It's all well and good to construct and modify a thing in SHM, but it's not useful until another process receives it via IPC and begins their own work, whether read-only or read-write. Until then you might as well have just constructed in the regular heap in the first place.

These are the steps in sharing a first-class SHM-handle x = some_arena.construct<T>(...). Note that only a first-class handle can be shared: you can't share "just" x->m_str or something. And recall that x is a shared_ptr<T> (for which Arena::Handle<T> is an alias, for stylistic purposes). The original creator of the handle – the guy to call construct() – is called the handle's owner or owner process. The owner can transmit a handle to another process, called the borrower (process). When doing so the owner is called the lender (process).

  • Lender process (has x): It prepares x for transmission by calling: const auto lend_blob = session.lend_object(x). lend_blob is a small encoding of certain information (a few bytes). It needs to be copied into/out of a transport: which is far superior to doing that to the entirety of the actual object!
  • Lender process + borrower process in tandem: Transmit lend_blob from the former to the latter, via any IPC technique whatsoever.
  • Borrower process (wants its own x_borrowed, like x): It obtains x_borrowed by calling: auto x_borrowed = session.borrow_object<T>(lend_blob).

Et voilà! You've got yourself a guy just like x but in another process. x_borrowed is, also, a shared_ptr<T>. One thing to note here is that the arena is not a part of this procedure. The session is; and the session in and of itself determines who's the recipient. Another thing to note is that the borrower code of yours must know the type T; if it does not match then behavior is undefined.

Note
You can alternatively use the similar .lend_object() and .borrow_object() methods on SHM-arena and SHM-session objects provided by ipc::shm. However this opens up various subtleties beyond our scope here. The Reference (in particular at least here, here, and here) will help shed light. This technique may be useful in particular if working with arenas outside the ipc::session paradigm.

Note that we wrote the above in terms of lender and borrower; not specifically owner and borrower. That is because a borrower can act as a lender and transmit the handle to yet another process. This is called proxying. However, in its current version, SHM-jemalloc does not support proxying (a future version likely will, as the feature was designed from the start). SHM-classic does fully support proxying. That said, within an ipc::session-based IPC universe proxying is somewhat unlikely. We'll cut that discussion off here, as it gets into hairy super-advanced topics.


What does it mean for the owner and borrower types to match?
We glossed over it; we simply said the owner T and the borrower-T (template param to borrow_object() and therefore to shared_ptr for x_borrowed) must "match." What does that mean? Does it mean they must be the same type/bit-compatible?
Yes and no. Depends. First if T is a POD (as defined earlier in this Manual page), then yes: T should just be the same type, and that's that. Now let's say T uses STL-compliant members (at any depth) and/or pointers (ditto). Then: The short answer depends on whether your code is generically meant to work with SHM-classic or SHM-jemalloc interchangeably – or only one of them specifically.
  • If it should work generically with either:
    • Owner-side T must (as we've explained) use Session::Allocator-equipped types and/or Session::Allocator::Pointer-typed pointers. By contrast borrower-side T must be an identical type except the allocator type for both of those things within T shall be not Session::Allocator but rather: Session::Borrower_allocator.
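  • If it intends to specifically work with (and use specific features/properties of) SHM-classic: Borrower-side T may simply be the same type as owner-side T.
  • If it intends to specifically work with (and use specific features/properties of) SHM-jemalloc: Borrower-side T must use Session::Borrower_allocator, exactly as in the generic case.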
To recap: if it's a POD, same T on all sides. If it uses STL-compliant/pointer stuff, then the most generic way is to use Session::Allocator in owner code; Session::Borrower_allocator in borrower code. And if targeting SHM-classic specifically, it is okay – for conciseness though not genericness – to just use the same type T on both sides, period.
What's going on here, you ask? We'd rather not get into it here; we've provided the recipe. But various docs inside ipc::shm explain all the subtleties. Long story short: SHM-classic is highly symmetric and (relatively) simple, so the borrower and owner are really internally operating on the same SHM-pool. In SHM-jemalloc only the owner even has the arena per se; the borrower has only a read-only view into parts of it – so it needs a special, degenerate "borrower" allocator which is used not to allocate but to only interpret pointers properly. (Whereas on the owner side it is used for that and allocation code.)

To close the loop: how to, in fact, IPC-transmit lend_blob – the thing returned by lend_object() and fed to borrow_object()? Well, it's just a little blob, so you can do it however you want. However, if you're using struc::Channel (Structured Message Transport) – or for some odd reason capnp but without struc::Channel – then we've made a couple of utilities to reduce your boiler-plate. Here's how to use them:

Example schema:

@0xa780a4869d13f307;
using Cxx = import "/capnp/c++.capnp";
using Common = import "/ipc/transport/struc/shm/schema/common.capnp"; # Flow-IPC supplies this.
using ShmHandle = Common.ShmHandle; # It's just a blob holder really.
$Cxx.namespace("my_meta_app::capnp");
# ...
struct SomeMsg
{
  # Example of message struct or sub-struct that among possible other things conveys a first-class SHM-handle.
  # ...
  widgetHandle @7 :ShmHandle;
  # ...
}
# ...

Example owner/lender code:

auto x = session.session_shm()->construct<Widget>();
// ...Fill out *x....
auto msg = cool_channel.create_msg();
// Ready the lend_blob-storage inside the out-message.
auto widget_handle_root = msg.body_root()->initSomeMsg().initWidgetHandle();
// Perform the lend_object() step and load the result into the out-message.
ipc::transport::struc::shm::capnp_set_lent_shm_handle(&widget_handle_root, // Output.
                                                      session.lend_object(x)); // Input.
// IPC-transmit it via struc::Channel.
cool_channel.send(msg);

And counterpart borrower code:

flow::util::Blob_sans_log_context lend_blob;
ipc::transport::struc::shm::capnp_get_shm_handle_to_borrow(msg->body_root().getSomeMsg().getWidgetHandle(), &lend_blob);
auto x_borrowed = session.borrow_object<Widget_brw>(lend_blob);
FLOW_LOG_INFO("Hey, let's read inside SHM after receiving SHM-handle: [" << x_borrowed->m_flag << "].");

Simple! That said we opportunistically note: The borrower-side is using Widget_brw, a type we have not explicitly provided the code for in the actual example. Per the side-bar above, with SHM-classic it could just be Widget (same as in owner); but with SHM-jemalloc and generically you'd need to define a mirror of Widget called Widget_brw which would use Session::Borrower_allocator instead of Session::Allocator throughout. We'll leave that as an exercise to the reader. Tip: You do not need to copy/paste the same type twice. Use template trickery – perhaps std::conditional_t – to conveniently pick between the two allocator templates depending on a compile-time bool S_OWN_ELSE_BRW template parameter.
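One possible shape for that template trickery is sketched below. It is an illustration under stated assumptions, not the library's code: Toy_owner_allocator and Toy_borrower_allocator are hypothetical stand-ins for Session::Allocator and Session::Borrower_allocator, and the members of Widget_t are invented for the demo. The toy borrower allocator simply refuses to allocate; it does not model the pointer interpretation the real one performs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <vector>

// HYPOTHETICAL stand-in for the owner-side allocator: allocates normally.
template<typename T>
struct Toy_owner_allocator
{
  using value_type = T;
  Toy_owner_allocator() = default;
  template<typename U>
  Toy_owner_allocator(const Toy_owner_allocator<U>&) noexcept {}
  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>().deallocate(p, n); }
};
template<typename T, typename U>
bool operator==(const Toy_owner_allocator<T>&, const Toy_owner_allocator<U>&) noexcept { return true; }
template<typename T, typename U>
bool operator!=(const Toy_owner_allocator<T>&, const Toy_owner_allocator<U>&) noexcept { return false; }

// HYPOTHETICAL stand-in for the borrower-side allocator: never allocates.
template<typename T>
struct Toy_borrower_allocator
{
  using value_type = T;
  Toy_borrower_allocator() = default;
  template<typename U>
  Toy_borrower_allocator(const Toy_borrower_allocator<U>&) noexcept {}
  T* allocate(std::size_t) { throw std::bad_alloc(); }
  void deallocate(T*, std::size_t) noexcept {}
};
template<typename T, typename U>
bool operator==(const Toy_borrower_allocator<T>&, const Toy_borrower_allocator<U>&) noexcept { return true; }
template<typename T, typename U>
bool operator!=(const Toy_borrower_allocator<T>&, const Toy_borrower_allocator<U>&) noexcept { return false; }

// One definition serves both sides; the bool picks the allocator family.
template<bool S_OWN_ELSE_BRW>
struct Widget_t
{
  template<typename T>
  using Alloc = std::conditional_t<S_OWN_ELSE_BRW,
                                   Toy_owner_allocator<T>,
                                   Toy_borrower_allocator<T>>;
  using String = std::basic_string<char, std::char_traits<char>, Alloc<char>>;

  bool m_flag = false;
  String m_str;
  std::vector<int, Alloc<int>> m_vals;
};

using Widget = Widget_t<true>;      // Owner-side type.
using Widget_brw = Widget_t<false>; // Borrower-side mirror.
```

The two aliases share every member declaration, so a change to the structure cannot make the owner and borrower definitions drift apart.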

What can you do with a borrowed (received) data structure in SHM?

The answer to this is sprinkled throughout the above. Nevertheless it seemed prudent to put a fine point on the answer as well as perhaps provide some algorithmic tips.

Firstly, in every case, you can access it in read-only fashion. The syntax is just C++ syntax.

Secondly, with SHM-classic specifically, you can do anything else including modifying the data structure which in turn includes operations that would further allocate in SHM.

Note
The fact you can doesn't mean you should. Not that we're saying you shouldn't either. In any case consider you may need to synchronize access, if (upon receipt of a handle) at least one side will be writing, while another side would be reading. If your synchronization scheme isn't based on an IPC messaging protocol (e.g., "you write, then send something, then I write, then I send something, then you write..."), then you might need a mutex and/or condition variables. In this context you'd place a mutex and/or condition variable into SHM and have each side lock on it. boost.interprocess provides boost::interprocess::interprocess_mutex (and a recursive variant) and boost::interprocess::interprocess_condition (et al) for this purpose. In POSIX these are built on pthread primitives, as are the boost.thread mutex and condition_variable; however a pthread mutex or condition variable is only specified to work across processes if it was initialized as process-shared, so the boost.interprocess types are the safer choice.
So... even if you can (which you can with SHM-classic): Should you (1) have both sides write to an in-SHM structure; or at least (2) have only 1 side write but synchronize using an in-SHM mutex? Answer: Firstly realize that (2) is really (1) in disguise: Locking a mutex (et al) writes to it in its memory location... so while your algorithm may conceptually avoid writing from 1 of the 2 sides, it is actually still writing which means safety worries still apply: Could crash during a lock... and other stuff. So that leaves basically (1). The answer is: possibly. By default, all else being equal, it is best avoided for safety reasons (detailed in Safety and Permissions). That said tons of applications and use cases need not be so paranoid, and they very well might enjoy using an algorithm, wherein 2 threads separated by a process boundary collaborate in read/write fashion on a common data structure.

So that leaves 2 remaining situations:

  • You're writing code for SHM-jemalloc specifically.
  • You're writing generic code that would work with either SHM-provider.

Either way: Your code then shall not write to a borrowed in-SHM data structure: it must read only. If the owner does want to write post-transmission, it must ensure such access is synchronized with concurrent reads on the borrower side. You may not use a mutex and/or condition variable in-SHM to arrange such synchronization. (A mutex-lock operation is itself a write and must not be used.) Don't despair: as mentioned earlier one can arrange synchronization via other algorithmic means such as IPC-messaging.
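The message-based alternative can be sketched as a turn-taking protocol. The following is a single-process illustration only: Toy_channel is a hypothetical stand-in for an IPC channel (it uses threads and a regular mutex internally, which a real cross-process channel would not), and shared_value stands in for a structure in SHM. The point is that the shared data itself carries no lock: the owner writes only while the borrower is waiting for a message, and the borrower only reads.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// HYPOTHETICAL in-process stand-in for an IPC channel carrying tiny messages.
class Toy_channel
{
public:
  void send(int msg)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(msg);
    }
    m_cond.notify_one();
  }

  int receive() // Blocks until a message is available.
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return !m_queue.empty(); });
    const int msg = m_queue.front();
    m_queue.pop();
    return msg;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::queue<int> m_queue;
};

// Owner writes, borrower reads; turns are handed over purely by messages.
std::vector<int> run_turn_taking(int rounds)
{
  int shared_value = 0; // Stand-in for the in-SHM structure; no lock lives in it.
  Toy_channel to_borrower;
  Toy_channel to_owner;
  std::vector<int> seen;

  std::thread borrower([&]
  {
    for (int i = 0; i < rounds; ++i)
    {
      to_borrower.receive();        // Wait for "you may read now".
      seen.push_back(shared_value); // Read-only access.
      to_owner.send(0);             // Tell the owner "done reading".
    }
  });

  for (int i = 0; i < rounds; ++i)
  {
    shared_value = (i + 1) * 10; // Owner writes while the borrower is waiting.
    to_borrower.send(0);         // Hand the turn to the borrower.
    to_owner.receive();          // Do not write again until it has finished.
  }
  borrower.join();
  return seen;
}
```

Each message acts as the synchronization point, which is why no write to the borrowed structure (not even a mutex lock) is needed on the borrower side.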

Again: You're not giving away those abilities of SHM-classic by choosing SHM-jemalloc for free. You get goodies in return: safety goodies and allocation-perf goodies. See back here.

The next page is: Transport Core Layer: Transmitting Unstructured Data.

