What is alignment?
Let's say we have two sticks, one with tick marks every 4mm, and one with tick marks every 8mm. When we put these two sticks next to each-other we find all the places they match up.
1 2 3
4 5
6 7
8
012345678901234567890123456789012345678901234567890123456789012345678901234567890
: | : | : | : | : | : | :
| : | : | : | :
: : : : :
: : : :
: :
Our two stick match at every 8mm. That's pretty simple and strait
forward, so let's try something evil. Again with two sticks, one with marks every 3mm, and one with marks every 7mm.
1 2 3
4 5
6 7
8
012345678901234567890123456789012345678901234567890123456789012345678901234567890
: | | | | | | : | | | | | | : |
| | | | | : | | | | |
: | | :
| | : |
| : | |
Our sticks match at every 21mm, or align at 21mm, or have an alignment of
21mm. Now we know what alignment is: the interval at which two or more intervals meet. Which also happens to be mathematically
identical to the Least
Common Multiple.How do you find a common alignment?
In our first example above, it was easy. The second interval was a factor of the first interval, therefore the alignment was the same as the second interval. In the second example we need to find the least common multiple of 3 and 7. Here's how:
constexpr std::size_t
greatest_common_factor( std::size_t a ) noexcept
{
return a;
}
constexpr std::size_t
greatest_common_factor( std::size_t a, std::size_t b ) noexcept
{
return b ? greatest_common_factor( b, a % b ) : a;
}
template< typename ...Args >
constexpr std::size_t
greatest_common_factor( std::size_t a, std::size_t b,
std::size_t c, Args ... args) noexcept
{
return greatest_common_factor( greatest_common_factor( a, b ), c, args... );
}
constexpr std::size_t
least_common_multiple( std::size_t a ) noexcept
{
return a;
}
constexpr std::size_t
least_common_multiple( std::size_t a, std::size_t b ) noexcept
{
return a / greatest_common_factor( a, b ) * b;
}
template< typename ...Args >
constexpr std::size_t
least_common_multiple( std::size_t a, std::size_t b,
std::size_t c, Args ... args) noexcept
{
return least_common_multiple( least_common_multiple( a, b ), c, args... );
}
Where least_common_multiple( 3, 7 ) == 21.
So now that we know how to do arbitrary alignment, let's forget all about it.What is memory alignment?
Memory alignment is in reference to a type of data, an integer, a float, etc., that has an interval at which it should be placed in memory. An integer with an alignment of 2, placed at position 5, is unaligned, its position doesn't match its alignment interval. If it was placed at position 6, or 8, or any multiple of its alignment, it would be aligned.Typically data types, in a system, are all powers of 2. This makes it so all types with weaker (smaller interval) alignment can be positioned near types with stricter (larger interval) alignment easily.
struct stuff
{
std::int32_t a;
std::int16_t b;
std::int16_t c;
};
If every type in stuff has an alignment
equal to its size, then how could stuff
be laid out in memory? What's stuff
alignment?The alignment of1 2 3 4 5 6 7 8
012345678901234567890123456789012345678901234567890123456789012345678901234567890
aaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbcc
aaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbccaaaabbcc
stuff is 4. If std::int32_t had an alignment of 7, and std::int16_t had an alignment of 3, what
would the alignment of stuff be?
1 2 3
4 5
6 7
8
012345678901234567890123456789012345678901234567890123456789012345678901234567890
aaaaaaa bbbccc aaaaaaa bbbccc aaaaaaa
bbbccc aaaaaaa bbbccc
aaaaaaa bbbccc aaaaaaa
bbbccc aaaaaaa bbbccc
aaaaaaabbbccc
aaaaaaabbbccc aaaaaaabbbccc
Depending on the padding between a and b, the
origin, for where the alignment starts, changes. The aligment of stuff would be 21, for each version with
different padding, though. Aren't powers of 2 much nicer?How does memory alignment affect you?
As it turns out, keeping you data aligned is important. SSE/AVX data needs to be properly aligned in memory to get the most benefit out of them. Some times unaligned memory access is not allowed; ARMv4, ARMv5, and ARM device memory. I'm not picking on ARM, they're just a more recent example. In general you want all your data to be properly aligned all the time, and to copy any unaligned data into an aligned space before using it.Thankfully the compiler is very good at keeping everything aligned by default. Aggregate types all have corrected offsets for each member to maintain it's alignment, and padding at the end of the aggregate to keep all its members aligned when there is an array of the aggregate. Problems creep in when we use non-basic types that have extended alignment requirements (SSE/AVX), or access memory directly. For instance:
ifstream file( "example.bin", ios::in |
ios::binary );
auto bytes = new char[ 6 ];
file.read( bytes, 6 );
auto small = *reinterpret_cast< std::int16_t* >( bytes );
auto large = *reinterpret_cast< std::int32_t* >( bytes + 2 );
This is rather simple. We have a file that stores two values, one
16bit and one 32bit. Setting small
should always work, because new will
ways return memory aligned for at least all the basic types, or more depending on implementation. The problem comes with trying to set large. This leads to a few cases: - It's aligned, if your system's base alignment is 16bits. (most likely not)
- It's unaligned, and your system's allows unaligned 32bit reads (impacting performance)
- It's unaligned, and your compiler generates appropriate code/instructions (impacting performance)
- It's unaligned, and your program crashes. (severely impacting performance)
template< typename T, std::size_t S =
sizeof( T ) >
void safe_read( T* dst, void* src )
{
static_assert( !std::is_const< T >::value, "T cannot be constant" );
static_assert( std::is_trivial< T >::value, "T must be trivial" );
typedef std::array< char, S > raw_type;
*reinterpret_cast< raw_type* >( dst ) = *reinterpret_cast< raw_type* >( src );
}
template< typename T, std::size_t S = sizeof( T ) >
T safe_read( void* src )
{
T nrvo;
safe_read< T, S >( &nrvo, src );
return nrvo;
}
template< typename T, std::size_t S = sizeof( T ) >
T safe_read( T* src )
{
T nrvo;
safe_read< T, S >( &nrvo, src );
return nrvo;
}
template< typename T, std::size_t S = sizeof( T ) >
void safe_write( T const & src, void* dst )
{
static_assert( std::is_trivial< T >::value, "T must be trivial" );
typedef std::array< char, S > raw_type;
auto& s = *reinterpret_cast< raw_type const* >( &src );
auto& d = *reinterpret_cast< raw_type* >( dst );
d = s;
}
Then just change the assignment of large
to:
auto large = safe_read< std::int32_t >(
bytes + 2 );
I allowed overriding the size, for times when I know the real size of a
padded type. You should always read only the bytes you need, because reading into protected memory will cause a crash. This is only good
for trivial types. If you have a anything other than a trivial type unaligned in memory, your run-time/standard library is broken, or you used
placement new without taking into
account the alignment for the type. In case 3, from above, the compiler is most likely generating code identical to safe_read,
unless your system has special instructions for unaligned reads. If your system does have unaligned reads, the optimizer will replace safe_read
with those instructions, win, win!How do you get aligned memory?
Remember what I said before aboutnew,
here it is again from the standard
(§3.7.4.1-2): The pointer returned shall be suitably aligned so that
it can be converted to a pointer of any complete object type with a fundamental alignment requirement(3.11). (§3.11): A
fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all
contexts, which is equal to alignof( std::max_align_t ).
If your compiler, like mine, doesn't define std::max_align_t
this works:
#include <type_traits>
namespace std { typedef aligned_storage< 1 >::type max_align_t; }
This tells us something important about memory allocations, they are
always done in chunks of alignof( std::max_align_t ). Therefore any allocation smaller
than the chunk size is wasting memory. This is what makes memory pools so effective, all types of the same size are grouped close together
with no wasted space between them. Less wasted space, means less cache misses, mean better performance, means happy users. Let's see
what a simple solution looks like:
template< typename T >
T* align_pointer_up( T* ptr, std::size_t alignment ) noexcept
{
auto bytes = reinterpret_cast< std::uintptr_t >( ptr );
return reinterpret_cast< T* >(
( bytes + alignment - 1 ) / alignment * alignment );
}
void* get_aligned_memory( std::size_t size, std::size_t alignment )
{
size += sizeof( void* ); // include the origin
auto start = reinterpret_cast< char* >(
::operator new( size + alignment - 1 ) );
auto user = align_pointer_up( start + sizeof( void* ), alignment );
safe_write( start, user - sizeof( void* ) );
return user;
}
void return_aligned_memory( void* ptr ) noexcept
{
::operator delete( safe_read( reinterpret_cast< void** >( ptr ) - 1 ) );
}
This is a good start, and you'll see versions of it a lot on the
web. It over allocates memory with enough space to place the return pointer in the requested alignment, plus space for a pointer back to the
original pointer. This works great, but it could be better.If we know the alignment in both the
get
and return functions, we can avoid
saving the original pointer whenever: alignment ≤ chunk size. This works, because for each chunk, our alignment will appear at least
once. So we just need to move the pointer back to the beginning of the chuck to get the original pointer.
template< typename T >
T* align_pointer_down( T* ptr, std::size_t alignment ) noexcept
{
auto bytes = reinterpret_cast< std::uintptr_t >( ptr );
return reinterpret_cast< T* >( bytes / alignment * alignment );
}
constexpr auto max_align = alignof( std::max_align_t );
template< typename T >
T* get_aligned_buffer( std::size_t count )
{
// include the origin if needed
auto extra_bytes = alignof( T ) <= max_align ? 0 : sizeof( void* );
auto size = sizeof( T ) * count + extra_bytes;
auto start = reinterpret_cast< char* >(
::operator new( size + alignof( T ) - 1 ) );
auto user = align_pointer_up( start + extra_bytes, alignof( T ) );
if(extra_bytes) {
safe_write( start, user - extra_bytes ); }
return reinterpret_cast< T* >( user );
}
template< typename T >
void return_aligned_buffer( T* ptr ) noexcept
{
if ( alignof( T ) <= max_align ) {
::operator delete( align_pointer_down( ptr, max_align ) );
} else {
::operator delete( safe_read( reinterpret_cast< void** >( ptr ) - 1 ) );
}
}
No comments:
Post a Comment