After native UTF-8 support was finally added to the Windows API a few years back, it's way overdue for the API to be extended to support non-NUL-terminated strings. It's all converted to UNICODE_STRING inside anyway.
- Modern C++ and other languages provide extensive facilities to work efficiently with strings, and especially substrings, that aren't NUL-terminated.
- The vast majority of Windows API functions, e.g. CreateFile, take their string argument(s) as NUL-terminated strings, forcing the caller to allocate a NUL-terminated copy of the string.
- These API functions convert those parameters into UNICODE_STRINGs before passing them to the appropriate NT API(s). UNICODE_STRING is a potentially-owning, limited-size wstring_view.
  This is unnecessarily complex, a performance pessimization, and an effectively redundant allocation and data copy.
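To make the cost concrete, here is a minimal portable sketch of what callers have to do today. `LegacyOpen` is a hypothetical stand-in for any API function that accepts only a NUL-terminated string (it is not a real Windows function), and `OpenFromView` is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <string_view>

// Hypothetical stand-in for an API function that accepts only a
// NUL-terminated string; it has to scan for the terminator itself.
std::size_t LegacyOpen (const wchar_t * path) {
    return std::wcslen (path);
}

// What a caller holding only a view has to do today.
std::size_t OpenFromView (std::wstring_view path) {
    std::wstring copy (path);           // extra copy (usually a heap allocation) just to append L'\0'
    return LegacyOpen (copy.c_str ());  // the callee then measures the length again
}
```

Passing `view.data ()` directly is not an option for a substring, because the callee would read past the end of the view up to the terminator of the underlying buffer.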
As various sets of API functions eventually call one common function to initialize UNICODE_STRING, e.g. RtlInitUnicodeString, or RtlDosPathNameToNtPathName_U that's used by file APIs, it makes sense to propagate a tagged pointer down to those routines and unpack the wstring_view directly into UNICODE_STRING.
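    #include <windows.h>
    #include <string>
    #include <string_view>

    #ifdef _WIN64
    // Tag the address of the view itself; the callee recovers data and length from it.
    static inline LPCWSTR PassStringView (const std::wstring_view & s) {
        return (LPCWSTR) (0x4000000000000000uLL | ((DWORD_PTR) &s));
    }
    #else
    // 32-bit fallback: a NUL-terminated temporary that lives until the end of the full expression.
    #define PassStringView(s) std::wstring((s)).c_str()
    #endif

    std::wstring_view s (...);
    CreateFile (PassStringView (s), ...);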
The extension of the conversion routines is obvious:
    VOID RtlInitUnicodeString (PUNICODE_STRING DestinationString, PCWSTR SourceString) {
        if (0x4000000000000000uLL & (DWORD_PTR) SourceString) {
            // Tagged: strip the top byte to recover the pointer to the caller's view.
            auto view = (const std::wstring_view *) ((DWORD_PTR) SourceString & 0x00FFFFFFFFFFFFFFuLL);
            /* TBD: check that sizeof (WCHAR) * view->size () fits in USHORT */
            DestinationString->Buffer = (PWSTR) view->data (); // Buffer is not const-qualified
            DestinationString->Length = (USHORT) (sizeof (WCHAR) * view->size ());
            DestinationString->MaximumLength = (USHORT) (sizeof (WCHAR) * view->size ());
        }
        else {
            /* original code */
        }
    }
    NTSTATUS WINAPI RtlInitUnicodeStringEx (...); // extended in the same way
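The round trip can be tried out without the Windows headers. The following is only a sketch under stated assumptions: `UnicodeStringSketch` is a stand-in that mirrors the shape of UNICODE_STRING, `TagView` and `InitUnicodeStringSketch` are hypothetical names, pointers are 64-bit, and user-mode addresses never have bit 62 set. It also fills in the range check left as TBD above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

static_assert (sizeof (void *) == 8, "sketch assumes 64-bit pointers");

// Stand-in mirroring the shape of UNICODE_STRING (assumption, not the real type).
struct UnicodeStringSketch {
    std::uint16_t   Length;         // in bytes
    std::uint16_t   MaximumLength;  // in bytes
    const wchar_t * Buffer;
};

constexpr std::uintptr_t ViewTag = 0x4000000000000000uLL;

// Caller side: tag the address of the view. The result must never be dereferenced.
inline const wchar_t * TagView (const std::wstring_view & s) {
    return reinterpret_cast <const wchar_t *> (reinterpret_cast <std::uintptr_t> (&s) | ViewTag);
}

// Callee side: accepts a tagged view or a plain NUL-terminated string.
// Returns false if the byte length does not fit into 16 bits.
inline bool InitUnicodeStringSketch (UnicodeStringSketch & dst, const wchar_t * src) {
    const auto bits = reinterpret_cast <std::uintptr_t> (src);
    std::wstring_view text;
    if (bits & ViewTag) {
        // Tagged: clear the tag and read data and size straight from the caller's view.
        text = *reinterpret_cast <const std::wstring_view *> (bits & ~ViewTag);
    } else {
        // Untagged: fall back to scanning for the NUL terminator.
        text = std::wstring_view (src);
    }
    constexpr std::size_t MaxChars = 0xFFFF / sizeof (wchar_t);
    if (text.size () > MaxChars)
        return false;

    dst.Buffer = text.data ();
    dst.Length = static_cast <std::uint16_t> (text.size () * sizeof (wchar_t));
    dst.MaximumLength = dst.Length; // simplified; the sketch does not account for a terminator
    return true;
}
```

The view must outlive the call, since only its address travels through the string parameter.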
- This approach would cement the layout and binary representation of std::wstring_view into both the MSVC and Windows ABI.
- 32-bit - Yes, I'm ignoring 32-bit code. There won't be any new 32-bit Windows, as Microsoft's people have openly acknowledged on social networks that 32-bit code is being removed from the Windows kernel.
- LA57 - Yes, on the current generation of x86-64 CPUs, it's possible to encode USHORT Length into the upper 2 bytes of the pointer, thereby skipping one indirection. But as of 2023, new server CPUs featuring 5-level paging are expected to appear on the market. Those support 57-bit pointers, leaving only 7 bits (8 in user mode) for pointer tagging, which is clearly not enough.
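The bit budget behind the LA57 remark can be checked with a few lines of arithmetic. This is an illustrative sketch only: all names are hypothetical, and the packing assumes a user-mode (lower-half) address under 4-level paging, so the address can be recovered by plain masking.

```cpp
#include <cstdint>

constexpr unsigned PointerBits    = 64;
constexpr unsigned Level4AddrBits = 48; // 4-level paging
constexpr unsigned Level5AddrBits = 57; // 5-level paging (LA57)

// Bits of a 64-bit pointer that are not part of the virtual address.
constexpr unsigned SpareBits (unsigned addressBits) {
    return PointerBits - addressBits;
}

static_assert (SpareBits (Level4AddrBits) == 16, "enough room for a 16-bit length");
static_assert (SpareBits (Level5AddrBits) == 7,  "not enough room for a 16-bit length");

constexpr std::uint64_t AddressMask = (1uLL << Level4AddrBits) - 1;

// Store a 16-bit length in the top 16 bits of a 48-bit user-mode address.
constexpr std::uint64_t PackLength (std::uint64_t address, std::uint16_t length) {
    return (std::uint64_t (length) << Level4AddrBits) | (address & AddressMask);
}
constexpr std::uint16_t UnpackLength (std::uint64_t packed) {
    return std::uint16_t (packed >> Level4AddrBits);
}
constexpr std::uint64_t UnpackAddress (std::uint64_t packed) {
    return packed & AddressMask;
}
```

With 57-bit addresses the same scheme would have to squeeze the length into 7 bits, which is why the proposal tags a pointer to the view instead.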
    HANDLE WINAPI CreateFile3 (SIZE_T len, LPCWSTR ptr, ...);
Obviously not feasible, but I wrote a sample implementation of a few functions in the /tringi/win32-wstring_view repository.
Remark: With CreateFile(2), additional approaches are actually possible, but we'll ignore them, as they don't apply to other APIs.