Locality Group APIs
This chapter describes the APIs that applications use to interact with locality groups.
This chapter discusses the following topics:
-
Locality Groups Overview describes the locality group abstraction.
-
Verifying the Interface Version describes the functions that give information about the interface.
-
Initializing the Locality Group Interface describes function calls that initialize and shut down the portion of the interface that is used to traverse the locality group hierarchy and to discover the contents of a locality group.
-
Locality Group Hierarchy describes function calls that navigate the locality group hierarchy and functions that get characteristics of the locality group hierarchy.
-
Locality Group Contents describes function calls that retrieve information about a locality group's contents.
-
Locality Group Characteristics describes function calls that retrieve information about a locality group's characteristics.
-
Locality Groups and Thread and Memory Placement describes how to affect the locality group placement of a thread and its memory.
-
Examples of API Usage contains code that performs example tasks by using the APIs that are described in this chapter.
1.1. Locality Groups Overview
Shared memory multiprocessor computers contain multiple CPUs. Each CPU can access all of the memory in the machine. In some shared memory multiprocessors, the memory architecture enables each CPU to access some areas of memory more quickly than other areas.
When a machine with such a memory architecture runs the illumos software, providing information to the kernel about the shortest access times between a given CPU and a given area of memory can improve the system's performance. The locality group (lgroup) abstraction has been introduced to handle this information. The lgroup abstraction is part of the Memory Placement Optimization (MPO) feature.
An lgroup is a set of CPU–like and memory–like devices in which each CPU in the set can access any memory in that set within a bounded latency interval. The value of the latency interval represents the least common latency between all the CPUs and all the memory in that lgroup. The latency bound that defines an lgroup does not restrict the maximum latency between members of that lgroup. The value of the latency bound is the shortest latency that is common to all possible CPU-memory pairs in the group.
Lgroups are hierarchical. The lgroup hierarchy is a Directed Acyclic Graph (DAG) and is similar to a tree, except that an lgroup might have more than one parent. The root lgroup contains all the resources in the system and can include child lgroups. Furthermore, the root lgroup can be characterized as having the highest latency value of all the lgroups in the system. All of its child lgroups will have lower latency values. The lgroups closer to the root have a higher latency while lgroups closer to leaves have lower latency.
A computer in which all the CPUs can access all the memory in the same amount of time can be represented with a single lgroup (see Single Locality Group Schematic). A computer in which some of the CPUs can access some areas of memory in a shorter time than other areas can be represented by using multiple lgroups (see Multiple Locality Groups Schematic).


The organization of the lgroup hierarchy simplifies the task of finding the nearest resources in the system. Each thread is assigned a home lgroup upon creation. The operating system attempts to allocate resources for the thread from the thread's home lgroup by default. For example, the illumos kernel attempts to schedule a thread to run on the CPUs in the thread's home lgroup and allocate the thread's memory in the thread's home lgroup by default. If the desired resources are not available from the thread's home lgroup, the kernel can traverse the lgroup hierarchy to find the next nearest resources from parents of the home lgroup. If the desired resources are not available in the home lgroup's parents, the kernel continues to traverse the lgroup hierarchy to the successive ancestor lgroups of the home lgroup. The root lgroup is the ultimate ancestor of all other lgroups in a machine and contains all of the machine's resources.
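The ancestor search described above can be sketched with a toy model. The code below is only an illustration under stated assumptions: toy_lgrp_t, its fields, and toy_nearest_with_memory are hypothetical names, not the kernel's data structures or part of liblgrp, and free_mem is assumed to count the free memory anywhere inside an lgroup, including its descendants. The search starts at the home lgroup and moves outward one level of parents at a time.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2
#define TOY_MAX_LGRPS   16

/* Toy stand-in for an lgroup; hypothetical, not a liblgrp type. */
typedef struct toy_lgrp {
    int id;
    size_t free_mem;                            /* free memory inside this lgroup */
    struct toy_lgrp *parents[TOY_MAX_PARENTS];  /* a DAG node can have several */
    int nparents;
} toy_lgrp_t;

/*
 * Breadth-first walk from the home lgroup toward the root. Returns the
 * first lgroup that can satisfy the request, or NULL if none can.
 */
const toy_lgrp_t *
toy_nearest_with_memory(const toy_lgrp_t *home, size_t need)
{
    const toy_lgrp_t *queue[TOY_MAX_LGRPS];
    int head = 0, tail = 0;

    assert(home != NULL);
    queue[tail++] = home;
    while (head < tail) {
        const toy_lgrp_t *g = queue[head++];

        if (g->free_mem >= need)
            return (g);
        /* Queue the parents; they hold the next nearest resources. */
        for (int i = 0; i < g->nparents && tail < TOY_MAX_LGRPS; i++)
            queue[tail++] = g->parents[i];
    }
    return (NULL);
}
```

Because closer lgroups are visited first, the result is the lowest-latency lgroup in this toy model that can satisfy the request. The real kernel policy also weighs load and other constraints that this sketch ignores.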
The lgroup APIs export the lgroup abstraction for applications to use for observability and performance tuning. A new library, called liblgrp, contains the new APIs. Applications can use the APIs to perform the following tasks:
-
Traverse the group hierarchy
-
Discover the contents and characteristics of a given lgroup
-
Affect the thread and memory placement on lgroups
1.2. Verifying the Interface Version
The lgrp_version(3LGRP) function must be used to verify the presence of a supported lgroup interface before using the lgroup API. The lgrp_version function has the following syntax:
#include <sys/lgrp_user.h>
int lgrp_version(const int version);
The lgrp_version function takes a version number for the lgroup interface as an argument and returns the lgroup interface version that the system supports. When the current implementation of the lgroup API supports the version number in the version argument, the lgrp_version function returns that version number. Otherwise, the lgrp_version function returns LGRP_VER_NONE.
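#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>

/* Fragment: perform this check before any other lgroup call. */
if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT) {
    fprintf(stderr, "Built with unsupported lgroup interface %d\n",
        LGRP_VER_CURRENT);
    exit(1);
}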
1.3. Initializing the Locality Group Interface
Applications must call lgrp_init(3LGRP) before using the APIs that traverse the lgroup hierarchy and discover its contents. The call to lgrp_init gives the application a consistent snapshot of the lgroup hierarchy. The application developer can specify whether the snapshot contains only the resources that are available to the calling thread specifically or the resources that are available to the operating system in general. The lgrp_init function returns a cookie that is used for the following tasks:
-
Navigating the lgroup hierarchy
-
Determining the contents of an lgroup
-
Determining whether the snapshot is current
1.3.1. Using lgrp_init
The lgrp_init function initializes the lgroup interface and takes a snapshot of the lgroup hierarchy.
#include <sys/lgrp_user.h>
lgrp_cookie_t lgrp_init(lgrp_view_t view);
When the lgrp_init function is called with LGRP_VIEW_CALLER as the view, the function returns a snapshot that contains only the resources that are available to the calling thread. When the lgrp_init function is called with LGRP_VIEW_OS as the view, the function returns a snapshot that contains the resources that are available to the operating system. When a thread successfully calls the lgrp_init function, the function returns a cookie that is used by any function that interacts with the lgroup hierarchy. When a thread no longer needs the cookie, call the lgrp_fini function with the cookie as the argument.
The lgroup hierarchy consists of a root lgroup that contains all of the machine's CPU and memory resources. The root lgroup might contain other locality groups bounded by smaller latencies.
The lgrp_init function can fail with two errors. When the view is invalid, the function fails with EINVAL. When there is insufficient memory to allocate the snapshot of the lgroup hierarchy, the function fails with ENOMEM.
1.3.2. Using lgrp_fini
The lgrp_fini(3LGRP) function ends the usage of a given cookie and frees the corresponding lgroup hierarchy snapshot.
#include <sys/lgrp_user.h>
int lgrp_fini(lgrp_cookie_t cookie);
The lgrp_fini function takes a cookie that represents an lgroup hierarchy snapshot created by a previous call to lgrp_init. The lgrp_fini function frees the memory that is allocated to that snapshot. After the call to lgrp_fini, the cookie is invalid. Do not use that cookie again.
When the cookie passed to the lgrp_fini function is invalid, lgrp_fini fails with EINVAL.
1.4. Locality Group Hierarchy
The APIs that are described in this section enable the calling thread to navigate the lgroup hierarchy. The lgroup hierarchy is a directed acyclic graph that is similar to a tree, except that a node might have more than one parent. The root lgroup represents the whole machine and contains all of that machine's resources. The root lgroup is the lgroup with the highest latency value in the system. Each of the child lgroups contains a subset of the hardware that is in the root lgroup. Each child lgroup is bounded by a lower latency value. Locality groups that are closer to the root have more resources and a higher latency. Locality groups that are closer to the leaves have fewer resources and a lower latency. An lgroup can contain resources directly within its latency boundary. An lgroup can also contain leaf lgroups that contain their own sets of resources. The resources of leaf lgroups are available to the lgroup that encapsulates those leaf lgroups.
1.4.1. Using lgrp_cookie_stale
The lgrp_cookie_stale(3LGRP) function determines whether the snapshot of the lgroup hierarchy represented by the given cookie is current.
#include <sys/lgrp_user.h>
int lgrp_cookie_stale(lgrp_cookie_t cookie);
The cookie returned by the lgrp_init function can become stale for several reasons that depend on the view that the snapshot represents. A cookie that is returned by calling the lgrp_init function with the view set to LGRP_VIEW_OS can become stale due to changes in the lgroup hierarchy such as dynamic reconfiguration or a change in a CPU's online status. A cookie that is returned by calling the lgrp_init function with the view set to LGRP_VIEW_CALLER can become stale due to changes in the calling thread's processor set or changes in the lgroup hierarchy. A stale cookie is refreshed by calling the lgrp_fini function with the old cookie, followed by calling lgrp_init to generate a new cookie.
The lgrp_cookie_stale function fails with EINVAL when the given cookie is invalid.
1.4.2. Using lgrp_view
The lgrp_view(3LGRP) function determines the view with which a given lgroup hierarchy snapshot was taken.
#include <sys/lgrp_user.h>
lgrp_view_t lgrp_view(lgrp_cookie_t cookie);
The lgrp_view function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the snapshot's view of the lgroup hierarchy. Snapshots that are taken with the view LGRP_VIEW_CALLER contain only the resources that are available to the calling thread. Snapshots that are taken with the view LGRP_VIEW_OS contain all the resources that are available to the operating system.
The lgrp_view function fails with EINVAL when the given cookie is invalid.
1.4.3. Using lgrp_nlgrps
The lgrp_nlgrps(3LGRP) function returns the number of locality groups in the system. If a system has only one locality group, memory placement optimizations have no effect.
#include <sys/lgrp_user.h>
int lgrp_nlgrps(lgrp_cookie_t cookie);
The lgrp_nlgrps function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the number of lgroups available in the hierarchy.
The lgrp_nlgrps function fails with EINVAL when the cookie is invalid.
1.4.4. Using lgrp_root
The lgrp_root(3LGRP) function returns the root lgroup ID.
#include <sys/lgrp_user.h>
lgrp_id_t lgrp_root(lgrp_cookie_t cookie);
The lgrp_root function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the root lgroup ID.
1.4.5. Using lgrp_parents
The lgrp_parents(3LGRP) function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the number of parent lgroups for the specified lgroup.
#include <sys/lgrp_user.h>
int lgrp_parents(lgrp_cookie_t cookie, lgrp_id_t child,
lgrp_id_t *lgrp_array, uint_t lgrp_array_size);
If lgrp_array is not NULL and the value of lgrp_array_size is not zero, the lgrp_parents function fills the array with parent lgroup IDs until the array is full or all parent lgroup IDs are in the array. The root lgroup has zero parents. When the lgrp_parents function is called for the root lgroup, lgrp_array is not filled in.
The lgrp_parents function fails with EINVAL when the cookie is invalid. The lgrp_parents function fails with ESRCH when the specified lgroup ID is not found.
1.4.6. Using lgrp_children
The lgrp_children(3LGRP) function takes a cookie that represents the calling thread's snapshot of the lgroup hierarchy and returns the number of child lgroups for the specified lgroup.
#include <sys/lgrp_user.h>
int lgrp_children(lgrp_cookie_t cookie, lgrp_id_t parent,
lgrp_id_t *lgrp_array, uint_t lgrp_array_size);
If lgrp_array is not NULL and the value of lgrp_array_size is not zero, the lgrp_children function fills the array with child lgroup IDs until the array is full or all child lgroup IDs are in the array.
The lgrp_children function fails with EINVAL when the cookie is invalid. The lgrp_children function fails with ESRCH when the specified lgroup ID is not found.
1.5. Locality Group Contents
The following APIs retrieve information about the contents of a given lgroup.
The lgroup hierarchy organizes the domain's resources to simplify the process of locating the nearest resource. Leaf lgroups are defined with resources that have the least latency. Each of the successive ancestor lgroups of a given leaf lgroup contains the next nearest resources to its child lgroup. The root lgroup contains all of the resources that are in the domain.
The resources of a given lgroup are contained directly within that lgroup or indirectly within the leaf lgroups that the given lgroup encapsulates. Leaf lgroups directly contain their resources and do not encapsulate any other lgroups.
1.5.1. Using lgrp_resources
The lgrp_resources function returns the number of resources contained in a specified lgroup.
#include <sys/lgrp_user.h>
int lgrp_resources(lgrp_cookie_t cookie, lgrp_id_t lgrp, lgrp_id_t *lgrpids,
uint_t count, lgrp_rsrc_t type);
The lgrp_resources function takes a cookie that represents a snapshot of the lgroup hierarchy. That cookie is obtained from the lgrp_init function. The lgrp_resources function returns the number of resources that are in the lgroup with the ID that is specified by the value of the lgrp argument. The lgrp_resources function represents the resources with a set of lgroups that directly contain CPU or memory resources. The lgrp_rsrc_t argument can have the following two values:
LGRP_RSRC_CPU
-
The lgrp_resources function returns the number of CPU resources.
LGRP_RSRC_MEM
-
The lgrp_resources function returns the number of memory resources.
When the value passed in the lgrpids[] argument is not null and the count argument is not zero, the lgrp_resources function stores lgroup IDs in the lgrpids[] array. The number of lgroup IDs stored in the array can be up to the value of the count argument.
The lgrp_resources function fails with EINVAL when the specified cookie, lgroup ID, or type is not valid. The lgrp_resources function fails with ESRCH when the function does not find the specified lgroup ID.
1.5.2. Using lgrp_cpus
The lgrp_cpus(3LGRP) function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the number of CPUs in a given lgroup.
#include <sys/lgrp_user.h>
int lgrp_cpus(lgrp_cookie_t cookie, lgrp_id_t lgrp, processorid_t *cpuids,
uint_t count, int content);
If the cpuids[] argument is not NULL and the count argument is not zero, the lgrp_cpus function fills the array with CPU IDs until the array is full or all the CPU IDs are in the array.
The content argument can have the following two values:
LGRP_CONTENT_ALL
-
The lgrp_cpus function returns IDs for the CPUs in this lgroup and this lgroup's descendants.
LGRP_CONTENT_DIRECT
-
The lgrp_cpus function returns IDs for the CPUs in this lgroup only.
The lgrp_cpus function fails with EINVAL when the cookie, lgroup ID, or one of the flags is not valid. The lgrp_cpus function fails with ESRCH when the specified lgroup ID is not found.
1.5.3. Using lgrp_mem_size
The lgrp_mem_size(3LGRP) function takes a cookie that represents a snapshot of the lgroup hierarchy and returns the size of installed or free memory in the given lgroup. The lgrp_mem_size function reports memory sizes in bytes.
#include <sys/lgrp_user.h>
lgrp_mem_size_t lgrp_mem_size(lgrp_cookie_t cookie, lgrp_id_t lgrp,
int type, int content);
The type argument can have the following two values:
LGRP_MEM_SZ_FREE
-
The lgrp_mem_size function returns the amount of free memory in bytes.
LGRP_MEM_SZ_INSTALLED
-
The lgrp_mem_size function returns the amount of installed memory in bytes.
The content argument can have the following two values:
LGRP_CONTENT_ALL
-
The lgrp_mem_size function returns the amount of memory in this lgroup and this lgroup's descendants.
LGRP_CONTENT_DIRECT
-
The lgrp_mem_size function returns the amount of memory in this lgroup only.
The lgrp_mem_size function fails with EINVAL when the cookie, lgroup ID, or one of the flags is not valid. The lgrp_mem_size function fails with ESRCH when the specified lgroup ID is not found.
1.6. Locality Group Characteristics
The following API retrieves information about the characteristics of a given lgroup.
1.6.1. Using lgrp_latency_cookie
The lgrp_latency_cookie function, documented in lgrp_latency(3LGRP), returns the latency between a CPU in one lgroup and the memory in another lgroup.
#include <sys/lgrp_user.h>
int lgrp_latency_cookie(lgrp_cookie_t cookie, lgrp_id_t from, lgrp_id_t to,
lgrp_lat_between_t between);
The lgrp_latency_cookie function takes a cookie that represents a snapshot of the lgroup hierarchy. The lgrp_init function creates this cookie. The lgrp_latency_cookie function returns a value that represents the latency between a hardware resource in the lgroup given by the value of the from argument and a hardware resource in the lgroup given by the value of the to argument. If both arguments name the same lgroup, the lgrp_latency_cookie function returns the latency value within that lgroup.
The latency value returned by the lgrp_latency_cookie function is defined by the operating system and is platform-specific. This value does not necessarily represent the actual latency between hardware devices. Use this value only for comparison within one domain.
When the value of the between argument is LGRP_LAT_CPU_TO_MEM, the lgrp_latency_cookie function measures the latency from a CPU resource to a memory resource.
The lgrp_latency_cookie function fails with EINVAL when the lgroup ID is not valid. When the lgrp_latency_cookie function does not find the specified lgroup ID, the “from” lgroup does not contain any CPUs, or the “to” lgroup does not have any memory, the lgrp_latency_cookie function fails with ESRCH.
1.7. Locality Groups and Thread and Memory Placement
This section discusses the APIs used to discover and affect thread and memory placement with respect to lgroups.
-
The lgrp_home(3LGRP) function is used to discover thread placement.
-
The meminfo(2) system call is used to discover memory placement.
-
The MADV_ACCESS flags to the madvise(3C) function are used to affect memory allocation among lgroups.
-
The lgrp_affinity_set(3LGRP) function can affect thread and memory placement by setting a thread's affinity for a given lgroup.
-
A thread's lgroup affinities can specify an order of preference for the lgroups from which to allocate resources.
-
The kernel needs information about the likely pattern of an application's memory use in order to allocate memory resources efficiently.
-
The madvise function and its shared object analogue madv.so.1 provide this information to the kernel.
-
A running process can gather memory usage information about itself by using the meminfo system call.
1.7.1. Using lgrp_home
The lgrp_home function returns the home lgroup for the specified process or thread.
#include <sys/lgrp_user.h>
lgrp_id_t lgrp_home(idtype_t idtype, id_t id);
The lgrp_home function fails with EINVAL when the ID type is not valid. The lgrp_home function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the threads. The lgrp_home function fails with ESRCH when the specified process or thread is not found.
1.7.2. Using madvise
The madvise function advises the kernel that a region of user virtual memory, starting at the address specified in addr and with length equal to the value of the len parameter, is expected to follow a particular pattern of use. The kernel uses this information to optimize the procedure for manipulating and maintaining the resources associated with the specified range. Programs that have specific knowledge of their memory access patterns can use the madvise function to increase system performance.
#include <sys/types.h>
#include <sys/mman.h>
int madvise(caddr_t addr, size_t len, int advice);
The madvise function provides the following flags to affect how a thread's memory is allocated among lgroups:
MADV_ACCESS_DEFAULT
-
This flag resets the kernel's expected access pattern for the specified range to the default.
MADV_ACCESS_LWP
-
This flag advises the kernel that the next LWP to touch the specified address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.
MADV_ACCESS_MANY
-
This flag advises the kernel that many processes or LWPs will access the specified address range randomly across the system. The kernel allocates the memory and other resources for this range accordingly.
The madvise function can fail with the following errors:
EAGAIN
-
Some or all of the mappings in the specified address range, from addr to addr+len, are locked for I/O.
EINVAL
-
The value of the addr parameter is not a multiple of the page size as returned by sysconf(3C), the length of the specified address range is less than or equal to zero, or the advice is invalid.
EIO
-
An I/O error occurs while reading from or writing to the file system.
ENOMEM
-
Addresses in the specified address range are outside the valid range for the address space of a process or the addresses in the specified address range specify one or more pages that are not mapped.
ESTALE
-
The NFS file handle is stale.
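The EINVAL case for addr suggests aligning a range to page boundaries before passing it to madvise. The following is a minimal sketch of that arithmetic. page_range_t, page_align_range, and system_page_size are hypothetical helper names introduced here for illustration, and the code assumes that the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical result type: a page-aligned range covering the request. */
typedef struct page_range {
    uintptr_t start;    /* rounded down to a page boundary */
    size_t length;      /* a multiple of the page size */
} page_range_t;

/* Widen [addr, addr + len) so that both ends fall on page boundaries. */
page_range_t
page_align_range(uintptr_t addr, size_t len, size_t pagesize)
{
    page_range_t r;
    uintptr_t mask, end;

    /* Masking only rounds correctly for power-of-two page sizes. */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    mask = ~((uintptr_t)pagesize - 1);
    r.start = addr & mask;
    end = (addr + len + pagesize - 1) & mask;
    r.length = (size_t)(end - r.start);
    return (r);
}

/* Page size as reported by sysconf(3C). */
size_t
system_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0);
    return ((size_t)sz);
}
```

A caller would pass r.start and r.length to madvise. Widening the range means the advice also covers the neighboring bytes that share the first and last pages, which may or may not be acceptable for a given program.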
1.7.3. Using madv.so.1
The madv.so.1 shared object enables the selective configuration of virtual memory advice for launched processes and their descendants. To use the shared object, the following string must be present in the environment:
LD_PRELOAD=$LD_PRELOAD:madv.so.1
The madv.so.1 shared object applies memory advice as specified by the value of the MADV environment variable. The MADV environment variable specifies the virtual memory advice to use for all heap, shared memory, and mmap regions in the process address space. This advice is applied to all created processes. The following values of the MADV environment variable affect resource allocation among lgroups:
access_default
-
This value resets the kernel's expected access pattern to the default.
access_lwp
-
This value advises the kernel that the next LWP to touch an address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.
access_many
-
This value advises the kernel that many processes or LWPs will access memory randomly across the system. The kernel allocates the memory and other resources accordingly.
The value of the MADVCFGFILE environment variable is the name of a text file that contains one or more memory advice configuration entries in the form exec-name:advice-opts.
The value of exec-name is the name of an application or executable. The value of exec-name can be a full pathname, a base name, or a pattern string.
The value of advice-opts is of the form region=advice. The values of advice are the same as the values for the MADV environment variable. Replace region with any of the following legal values:
madv
-
Advice applies to all heap, shared memory, and mmap(2) regions in the process address space.
heap
-
The heap is defined to be the brk(2) area. Advice applies to the existing heap and to any additional heap memory allocated in the future.
shm
-
Advice applies to shared memory segments. See shmat(2) for more information on shared memory operations.
ism
-
Advice applies to shared memory segments that are using the SHM_SHARE_MMU flag. The ism option takes precedence over shm.
dsm
-
Advice applies to shared memory segments that are using the SHM_PAGEABLE flag. The dsm option takes precedence over shm.
mapshared
-
Advice applies to mappings established by the mmap system call using the MAP_SHARED flag.
mapprivate
-
Advice applies to mappings established by the mmap system call using the MAP_PRIVATE flag.
mapanon
-
Advice applies to mappings established by the mmap system call using the MAP_ANON flag. The mapanon option takes precedence when multiple options apply.
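The entry format described above can be illustrated with a small parser. This is a sketch only, not the way madv.so.1 itself reads its configuration file: madv_entry_t, madv_opt_t, is_known_region, and parse_madv_entry are hypothetical names, the buffer sizes are arbitrary, and the comma-separated list of region=advice pairs is an assumption based on the form of the usage examples in this section. The advice values are stored but not validated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 8

typedef struct madv_opt {
    char region[16];
    char advice[32];
} madv_opt_t;

typedef struct madv_entry {
    char exec_name[128];
    madv_opt_t opts[MAX_OPTS];
    int nopts;
} madv_entry_t;

/* Region keywords taken from the list in this section. */
static const char *const known_regions[] = {
    "madv", "heap", "shm", "ism", "dsm",
    "mapshared", "mapprivate", "mapanon"
};

int
is_known_region(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof (known_regions) / sizeof (known_regions[0]); i++) {
        if (strcmp(name, known_regions[i]) == 0)
            return (1);
    }
    return (0);
}

/*
 * Parse "exec-name:region=advice[,region=advice...]".
 * Returns 0 on success and -1 on a malformed entry.
 */
int
parse_madv_entry(const char *line, madv_entry_t *out)
{
    const char *colon, *p;
    size_t n;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof (*out));      /* also NUL-terminates every field */

    colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return (-1);
    n = (size_t)(colon - line);
    if (n >= sizeof (out->exec_name))
        return (-1);
    memcpy(out->exec_name, line, n);

    p = colon + 1;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        const char *eq;
        size_t rlen, alen;
        madv_opt_t *o;

        if (end == NULL)
            end = p + strlen(p);
        eq = memchr(p, '=', (size_t)(end - p));
        if (eq == NULL || out->nopts == MAX_OPTS)
            return (-1);
        rlen = (size_t)(eq - p);
        alen = (size_t)(end - eq - 1);
        o = &out->opts[out->nopts];
        if (rlen == 0 || rlen >= sizeof (o->region) ||
            alen == 0 || alen >= sizeof (o->advice))
            return (-1);
        memcpy(o->region, p, rlen);
        memcpy(o->advice, eq + 1, alen);
        if (!is_known_region(o->region))
            return (-1);
        out->nopts++;
        p = (*end == ',') ? end + 1 : end;
    }
    return (0);
}
```

An entry with nothing after the colon, such as ls:, parses successfully with zero options, which matches its use in the examples to exempt an executable from advice.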
The value of the MADVERRFILE environment variable is the name of the path where error messages are logged. In the absence of a MADVERRFILE location, the madv.so.1 shared object logs errors by using syslog(3C) with LOG_ERR as the severity level and LOG_USER as the facility descriptor.
Memory advice is inherited. A child process has the same advice as its parent. The advice is set back to the system default advice after a call to exec(2) unless a different level of advice is configured using the madv.so.1 shared object. Advice is only applied to mmap regions explicitly created by the user program. Regions established by the run-time linker or by system libraries that make direct system calls are not affected.
madv.so.1 Usage Examples
The following examples illustrate specific aspects of the madv.so.1 shared object.
This configuration applies advice to all ISM segments for applications with exec names that begin with foo.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
foo*:ism=access_lwp
This configuration sets advice for all applications with the exception of ls.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADV=access_many
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADV MADVCFGFILE
$ cat $MADVCFGFILE
ls:
Because the configuration specified in MADVCFGFILE takes precedence over the value set in MADV, specifying * as the exec-name of the last configuration entry is equivalent to setting MADV. This example is equivalent to the previous example.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
ls:
*:madv=access_many
This configuration applies one type of advice for mmap regions and different advice for heap and shared memory regions for applications whose exec names begin with foo.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
foo*:madv=access_many,heap=sequential,shm=access_lwp
1.7.4. Using meminfo
The meminfo function gives the calling process information about the virtual memory and physical memory that the system has allocated to that process.
#include <sys/types.h>
#include <sys/mman.h>
int meminfo(const uint64_t inaddr[], int addr_count,
const uint_t info_req[], int info_count, uint64_t outdata[],
uint_t validity[]);
The meminfo function can return the following types of information:
MEMINFO_VPHYSICAL
-
The physical memory address corresponding to the given virtual address
MEMINFO_VLGRP
-
The lgroup to which the physical page corresponding to the given virtual address belongs
MEMINFO_VPAGESIZE
-
The size of the physical page corresponding to the given virtual address
MEMINFO_VREPLCNT
-
The number of replicated physical pages that correspond to the given virtual address
MEMINFO_VREPL|n
-
The nth physical replica of the given virtual address
MEMINFO_VREPL_LGRP|n
-
The lgroup to which the nth physical replica of the given virtual address belongs
MEMINFO_PLGRP
-
The lgroup to which the given physical address belongs
The meminfo function takes the following parameters:
inaddr
-
An array of input addresses.
addr_count
-
The number of addresses that are passed to meminfo.
info_req
-
An array that lists the types of information that are being requested.
info_count
-
The number of pieces of information that are requested for each address in the inaddr array.
outdata
-
An array where the meminfo function places the results. The array's size is equal to the product of the values of the info_count and addr_count parameters.
validity
-
An array of size equal to the value of the addr_count parameter. The validity array contains bitwise result codes. The 0th bit of the result code evaluates the validity of the corresponding input address. Each successive bit in the result code evaluates the validity of the response to the members of the info_req array in turn.
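The layout of the outdata and validity arrays can be made concrete with a few accessor helpers. This is a sketch under stated assumptions: the function names are hypothetical and not part of any system header, and the results for address i are assumed to occupy info_count consecutive slots of outdata, which is the indexing that the print_info example in this section uses. The helpers take unsigned int so that they compile without illumos headers; on illumos the validity array has type uint_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of a validity word: the input address itself was valid. */
int
meminfo_addr_valid(const unsigned int *validity, int addr_idx)
{
    return ((validity[addr_idx] & 1u) != 0);
}

/* Bit (req_idx + 1): the answer to info_req[req_idx] is valid. */
int
meminfo_item_valid(const unsigned int *validity, int addr_idx, int req_idx)
{
    /* info_count is at most 31, so the shift stays within 32 bits. */
    assert(req_idx >= 0 && req_idx < 31);
    return ((validity[addr_idx] & (1u << (req_idx + 1))) != 0);
}

/* Result slot for request req_idx of input address addr_idx. */
uint64_t
meminfo_item(const uint64_t *outdata, int info_count, int addr_idx,
    int req_idx)
{
    assert(req_idx >= 0 && req_idx < info_count);
    return (outdata[(size_t)addr_idx * (size_t)info_count + (size_t)req_idx]);
}
```

Checking meminfo_item_valid before reading the matching slot avoids interpreting an unset result as data.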
The meminfo function fails with EFAULT when the area of memory to which the outdata or validity arrays point cannot be written to. The meminfo function fails with EFAULT when the area of memory to which the info_req or inaddr arrays point cannot be read from. The meminfo function fails with EINVAL when the value of info_count exceeds 31 or is less than 1. The meminfo function fails with EINVAL when the value of addr_count is less than zero.
#include <alloca.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>

void
print_info(void **addrvec, int how_many)
{
    /* Request two items per address: physical address and page size. */
    static const uint_t info[] = {
        MEMINFO_VPHYSICAL,
        MEMINFO_VPAGESIZE
    };
    uint64_t *inaddr = alloca(sizeof (uint64_t) * how_many);
    uint64_t *outdata = alloca(sizeof (uint64_t) * how_many * 2);
    uint_t *validity = alloca(sizeof (uint_t) * how_many);
    int i;

    for (i = 0; i < how_many; i++)
        inaddr[i] = (uint64_t)(uintptr_t)addrvec[i];

    if (meminfo(inaddr, how_many, info,
        sizeof (info) / sizeof (info[0]),
        outdata, validity) < 0) {
        perror("meminfo");
        return;
    }

    for (i = 0; i < how_many; i++) {
        /* Bit 0: address valid. Bits 1 and 2: info[0] and info[1] valid. */
        if ((validity[i] & 1) == 0)
            printf("address 0x%llx not part of address space\n",
                (unsigned long long)inaddr[i]);
        else if ((validity[i] & 2) == 0)
            printf("address 0x%llx has no physical page "
                "associated with it\n",
                (unsigned long long)inaddr[i]);
        else {
            char buff[80];
            if ((validity[i] & 4) == 0)
                strcpy(buff, "<Unknown>");
            else
                snprintf(buff, sizeof (buff), "%llu",
                    (unsigned long long)outdata[i * 2 + 1]);
            printf("address 0x%llx is backed by physical "
                "page 0x%llx of size %s\n",
                (unsigned long long)inaddr[i],
                (unsigned long long)outdata[i * 2], buff);
        }
    }
}
1.7.5. Locality Group Affinity
The kernel assigns a thread to a locality group when the lightweight process (LWP) for that thread is created. That lgroup is called the thread's home lgroup. The kernel runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible. If resources from the home lgroup are unavailable, the kernel allocates resources from other lgroups. When a thread has affinity for more than one lgroup, the operating system allocates resources from lgroups chosen in order of affinity strength. Lgroups can have one of three distinct affinity levels:
-
LGRP_AFF_STRONG indicates strong affinity. If this lgroup is the thread's home lgroup, the operating system avoids rehoming the thread to another lgroup if possible. Events such as dynamic reconfiguration, processor offlining, processor binding, and processor set binding and manipulation might still result in thread rehoming.
-
LGRP_AFF_WEAK indicates weak affinity. If this lgroup is the thread's home lgroup, the operating system rehomes the thread if necessary for load balancing purposes.
-
LGRP_AFF_NONE indicates no affinity. If a thread has no affinity to any lgroup, the operating system assigns a home lgroup to the thread.
The operating system uses lgroup affinities as advice when allocating resources for a given thread. The advice is factored in with the other system constraints. Processor binding and processor sets do not change lgroup affinities, but might restrict the lgroups on which a thread can run.
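The idea of choosing lgroups in order of affinity strength can be sketched as a sort over candidate lgroups. Everything in this block is a toy: toy_affinity_t and its values are stand-ins that do not match the numeric values of the LGRP_AFF_* constants, and toy_order_by_affinity is a hypothetical helper. The kernel treats affinities as advice and combines them with other constraints, which this sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy affinity levels, ordered from weakest to strongest. */
typedef enum toy_affinity {
    TOY_AFF_NONE,
    TOY_AFF_WEAK,
    TOY_AFF_STRONG
} toy_affinity_t;

typedef struct toy_candidate {
    int lgrp_id;
    toy_affinity_t affinity;
} toy_candidate_t;

/* Stronger affinity sorts first; ties are broken by lower lgroup ID. */
static int
toy_cmp_affinity(const void *a, const void *b)
{
    const toy_candidate_t *x = a;
    const toy_candidate_t *y = b;

    if (x->affinity != y->affinity)
        return ((x->affinity > y->affinity) ? -1 : 1);
    if (x->lgrp_id != y->lgrp_id)
        return ((x->lgrp_id < y->lgrp_id) ? -1 : 1);
    return (0);
}

/* Reorder the candidates so that the preferred lgroups come first. */
void
toy_order_by_affinity(toy_candidate_t *cands, size_t n)
{
    assert(cands != NULL || n == 0);
    if (n > 1)
        qsort(cands, n, sizeof (*cands), toy_cmp_affinity);
}
```

After sorting, an allocator in this toy model would try the candidates from the front of the array and fall back to later entries when earlier ones have no resources available.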
Using lgrp_affinity_get
The lgrp_affinity_get(3LGRP) function returns the affinity that an LWP has for a given lgroup.
#include <sys/lgrp_user.h>
lgrp_affinity_t lgrp_affinity_get(idtype_t idtype, id_t id, lgrp_id_t lgrp);
The idtype and id arguments specify the LWP that the lgrp_affinity_get function examines. If the value of idtype is P_PID, the lgrp_affinity_get function gets the lgroup affinity for one of the LWPs in the process whose process ID matches the value of the id argument. If the value of idtype is P_LWPID, the lgrp_affinity_get function gets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument. If the value of id is P_MYID, the lgrp_affinity_get function gets the lgroup affinity for the current LWP or process, depending on idtype.
On failure, the lgrp_affinity_get function returns -1 and sets errno. The function fails with EINVAL when the given lgroup or ID type is not valid. The function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. The function fails with ESRCH when a given lgroup or LWP is not found.
Using lgrp_affinity_set
The lgrp_affinity_set(3LGRP) function sets the affinity that an LWP or set of LWPs has for a given lgroup.
#include <sys/lgrp_user.h>
int lgrp_affinity_set(idtype_t idtype, id_t id, lgrp_id_t lgrp,
lgrp_affinity_t affinity);
The idtype and id arguments specify the LWP or set of LWPs that the lgrp_affinity_set function affects. If the value of idtype is P_PID, the lgrp_affinity_set function sets the lgroup affinity for all of the LWPs in the process whose process ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_LWPID, the lgrp_affinity_set function sets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of id is P_MYID, the lgrp_affinity_set function sets the lgroup affinity for the current LWP or process, depending on idtype, to the affinity level specified in the affinity argument.
On failure, the lgrp_affinity_set function returns -1 and sets errno. The function fails with EINVAL when the given lgroup, affinity, or ID type is not valid. The function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. The function fails with ESRCH when a given lgroup or LWP is not found.
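The sample code in this chapter checks these functions for a return value of -1 and then calls perror, so the error codes reach the caller through errno. The following sketch shows one way a caller might turn the three documented codes into more specific diagnostics. The helper name affinity_errno_hint is hypothetical and is not part of liblgrp. The sketch assumes that it is called only after lgrp_affinity_get or lgrp_affinity_set has failed.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value left by a failed
 * lgrp_affinity_get() or lgrp_affinity_set() call to a short hint.
 * Codes that are not documented for these calls fall back to strerror().
 */
const char *
affinity_errno_hint(int err)
{
    /* Only meaningful after a failure, when errno is nonzero. */
    assert(err != 0);

    switch (err) {
    case EINVAL:
        return ("invalid lgroup, affinity, or ID type");
    case EPERM:
        return ("caller lacks permission for the target LWP");
    case ESRCH:
        return ("lgroup or LWP not found");
    default:
        return (strerror(err));
    }
}
```

A caller would typically save errno immediately after the failing call and pass the saved value to the helper, because later library calls can overwrite errno.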
1.8. Examples of API Usage
This section contains code for example tasks that use the APIs that are described in this chapter.
The following code sample moves the memory in the address range between addr and addr+len near the next thread to touch that range.
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Move memory to thread
 */
void
mem_to_thread(caddr_t addr, size_t len)
{
    if (madvise(addr, len, MADV_ACCESS_LWP) < 0)
        perror("madvise");
}
This sample code uses the meminfo function to determine the lgroup of the physical memory backing the virtual page at the given address. The sample code then sets a strong affinity for that lgroup in an attempt to move the current thread near that memory.
#include <stdio.h>
#include <sys/lgrp_user.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Move a thread to memory
 */
int
thread_to_memory(caddr_t va)
{
    uint64_t addr;
    int count;
    lgrp_id_t home;
    uint64_t lgrp;
    uint_t request;
    uint_t valid;

    /* Ask for the lgroup that backs this one virtual address */
    addr = (uint64_t)(uintptr_t)va;
    count = 1;
    request = MEMINFO_VLGRP;
    if (meminfo(&addr, count, &request, 1, &lgrp, &valid) != 0) {
        perror("meminfo");
        return (1);
    }

    /* Set a strong affinity for that lgroup on the current LWP */
    if (lgrp_affinity_set(P_LWPID, P_MYID, (lgrp_id_t)lgrp,
        LGRP_AFF_STRONG) != 0) {
        perror("lgrp_affinity_set");
        return (2);
    }

    home = lgrp_home(P_LWPID, P_MYID);
    if (home == -1) {
        perror("lgrp_home");
        return (3);
    }

    /* The affinity was set, but the thread was not rehomed */
    if (home != (lgrp_id_t)lgrp)
        return (-1);

    return (0);
}
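The sample above passes a validity word to meminfo but does not inspect it. In the meminfo interface, bit 0 of each validity entry reports whether the corresponding input address was valid, and each following bit reports whether the answer to the corresponding info_req entry is valid. The following sketch shows one way to test those bits before trusting an output value. The helper name meminfo_answer_ok is hypothetical, and the sketch uses plain unsigned int in place of uint_t so that it stays portable.

```c
#include <assert.h>

/*
 * Hypothetical helper: return nonzero if a meminfo validity word says
 * that both the input address (bit 0) and the answer to request number
 * req_index (bit req_index + 1) are valid.
 */
int
meminfo_answer_ok(unsigned int validity, unsigned int req_index)
{
    unsigned int need;

    /* Keep the shift within the width of a 32-bit word. */
    assert(req_index < 31);

    need = 1u | (1u << (req_index + 1));
    return ((validity & need) == need);
}
```

With a single request, as in thread_to_memory, the check reduces to testing that the low two bits of the validity word are both set.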
The following sample code walks through and prints out the lgroup hierarchy.
#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
#include <sys/types.h>

/*
 * Walk and print lgroup hierarchy from given lgroup
 * through all its descendants
 */
int
lgrp_walk(lgrp_cookie_t cookie, lgrp_id_t lgrp, lgrp_content_t content)
{
    lgrp_affinity_t aff;
    lgrp_id_t *children;
    processorid_t *cpuids;
    int i;
    int ncpus;
    int nchildren;
    int nparents;
    lgrp_id_t *parents;
    lgrp_mem_size_t size;

    /*
     * Print given lgroup, caller's affinity for lgroup,
     * and desired content specified
     */
    printf("LGROUP #%d:\n", lgrp);
    aff = lgrp_affinity_get(P_LWPID, P_MYID, lgrp);
    if (aff == -1)
        perror("lgrp_affinity_get");
    printf("\tAFFINITY: %d\n", aff);
    printf("CONTENT %d:\n", content);

    /*
     * Get CPUs
     */
    ncpus = lgrp_cpus(cookie, lgrp, NULL, 0, content);
    printf("\t%d CPUS: ", ncpus);
    if (ncpus == -1) {
        perror("lgrp_cpus");
        return (-1);
    } else if (ncpus > 0) {
        cpuids = malloc(ncpus * sizeof (processorid_t));
        ncpus = lgrp_cpus(cookie, lgrp, cpuids, ncpus, content);
        if (ncpus == -1) {
            free(cpuids);
            perror("lgrp_cpus");
            return (-1);
        }
        for (i = 0; i < ncpus; i++)
            printf("%d ", cpuids[i]);
        free(cpuids);
    }
    printf("\n");

    /*
     * Get memory size
     */
    printf("\tMEMORY: ");
    size = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_INSTALLED, content);
    if (size == -1) {
        perror("lgrp_mem_size");
        return (-1);
    }
    printf("installed bytes 0x%llx, ", size);
    size = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_FREE, content);
    if (size == -1) {
        perror("lgrp_mem_size");
        return (-1);
    }
    printf("free bytes 0x%llx\n", size);

    /*
     * Get parents
     */
    nparents = lgrp_parents(cookie, lgrp, NULL, 0);
    printf("\t%d PARENTS: ", nparents);
    if (nparents == -1) {
        perror("lgrp_parents");
        return (-1);
    } else if (nparents > 0) {
        parents = malloc(nparents * sizeof (lgrp_id_t));
        nparents = lgrp_parents(cookie, lgrp, parents, nparents);
        if (nparents == -1) {
            free(parents);
            perror("lgrp_parents");
            return (-1);
        }
        for (i = 0; i < nparents; i++)
            printf("%d ", parents[i]);
        free(parents);
    }
    printf("\n");

    /*
     * Get children
     */
    nchildren = lgrp_children(cookie, lgrp, NULL, 0);
    printf("\t%d CHILDREN: ", nchildren);
    if (nchildren == -1) {
        perror("lgrp_children");
        return (-1);
    } else if (nchildren > 0) {
        children = malloc(nchildren * sizeof (lgrp_id_t));
        nchildren = lgrp_children(cookie, lgrp, children, nchildren);
        if (nchildren == -1) {
            free(children);
            perror("lgrp_children");
            return (-1);
        }
        printf("Children: ");
        for (i = 0; i < nchildren; i++)
            printf("%d ", children[i]);
        printf("\n");
        for (i = 0; i < nchildren; i++)
            lgrp_walk(cookie, children[i], content);
        free(children);
    }
    printf("\n");
    return (0);
}
The following sample code finds the next closest lgroup with available memory outside a given lgroup.
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
#include <sys/types.h>

/*
 * Find next closest lgroup outside given one with available memory
 */
lgrp_id_t
lgrp_next_nearest(lgrp_cookie_t cookie, lgrp_id_t from)
{
    lgrp_id_t closest;
    int i;
    int latency;
    int lowest;
    int nparents;
    lgrp_id_t *parents;
    lgrp_mem_size_t size;

    /*
     * Get number of parents
     */
    nparents = lgrp_parents(cookie, from, NULL, 0);
    if (nparents == -1) {
        perror("lgrp_parents");
        return (LGRP_NONE);
    }

    /*
     * No parents, so current lgroup is next nearest
     */
    if (nparents == 0) {
        return (from);
    }

    /*
     * Get parents
     */
    parents = malloc(nparents * sizeof (lgrp_id_t));
    if (parents == NULL) {
        perror("malloc");
        return (LGRP_NONE);
    }
    nparents = lgrp_parents(cookie, from, parents, nparents);
    if (nparents == -1) {
        perror("lgrp_parents");
        free(parents);
        return (LGRP_NONE);
    }

    /*
     * Find closest parent (i.e. the one with lowest latency)
     */
    closest = LGRP_NONE;
    lowest = INT_MAX;
    for (i = 0; i < nparents; i++) {
        lgrp_id_t lgrp;

        /*
         * See whether parent has any free memory
         */
        size = lgrp_mem_size(cookie, parents[i], LGRP_MEM_SZ_FREE,
            LGRP_CONTENT_ALL);
        if (size > 0)
            lgrp = parents[i];
        else {
            if (size == -1)
                perror("lgrp_mem_size");

            /*
             * Find nearest ancestor if parent doesn't
             * have any memory
             */
            lgrp = lgrp_next_nearest(cookie, parents[i]);
            if (lgrp == LGRP_NONE)
                continue;
        }

        /*
         * Get latency within parent lgroup
         */
        latency = lgrp_latency_cookie(cookie, lgrp, lgrp,
            LGRP_LAT_CPU_TO_MEM);
        if (latency == -1) {
            perror("lgrp_latency_cookie");
            continue;
        }

        /*
         * Remember lgroup with lowest latency
         */
        if (latency < lowest) {
            closest = lgrp;
            lowest = latency;
        }
    }

    free(parents);
    return (closest);
}
This example code finds the lgroup with free memory that is nearest to the home lgroup of the current thread.
/*
 * Find lgroup with memory nearest home lgroup of current thread
 */
lgrp_id_t
lgrp_nearest(lgrp_cookie_t cookie)
{
    lgrp_id_t home;
    lgrp_mem_size_t size;

    /*
     * Get home lgroup
     */
    home = lgrp_home(P_LWPID, P_MYID);
    if (home == -1) {
        perror("lgrp_home");
        return (LGRP_NONE);
    }

    /*
     * See whether home lgroup has any memory available in its hierarchy
     */
    size = lgrp_mem_size(cookie, home, LGRP_MEM_SZ_FREE,
        LGRP_CONTENT_ALL);
    if (size == -1)
        perror("lgrp_mem_size");

    /*
     * It does, so return the home lgroup.
     */
    if (size > 0)
        return (home);

    /*
     * Otherwise, find next nearest lgroup outside of the home.
     */
    return (lgrp_next_nearest(cookie, home));
}