UT library Documentation

0.1.0.0

Introduction

The UT library provides an alternative to MFC for object-oriented development on Windows, and a cross-platform solution with liberal licensing under the GNU General Public License 2.0, a copy of which is included in the UT library distribution.

The UT library provides many interfaces which are cleaner and require less code to make use of than the C standard library, POSIX, and platform-specific equivalents. The class-based API includes strings, lists, B-trees, threads, dynamically self-resizing buffers, a threading and message passing architecture, mutexes, events, timers, streams, files, paths, pool allocation, resources, and more. That functionality is documented in the Classes tab above, or by clicking here: Classes.

The core functionality provided by including UT.h is documented in the following locations. None of these headers should ever be included directly. Instead, include UT.h, which automatically includes these subheaders.

UTMisc.h: Basic types and services

UTDebug.h: Debugging facilities

UTStatus.h: Integrated status value space for success or failure from platform-specific, POSIX, and UT APIs with an efficient provision for attaching extra explanatory text for possible use by the caller

UTErrorCodes.h: UT-specific error codes within the integrated value space from UTStatus.h

Although they aren't automatically included like UTMisc.h or UTDebug.h, the following headers provide more useful C-based API functionality:

UTBufferValues.h: A variety of functions for doing buffer value manipulation: packing values into and unpacking values from a buffer to account for endianness and alignment constraints, and converting strings to integers

UTRegExp.h: A regular expression parser, see Regular Expressions
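
As an illustration of the endianness and alignment problem that UTBufferValues.h addresses, here is a minimal sketch in standard C++. The names pack_u32_le and unpack_u32_le are hypothetical and are not the UT API; they only show the byte-by-byte technique, which gives the same buffer contents regardless of the host byte order and works at any buffer offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the UT API): store a 32-bit value into a byte
// buffer in little-endian order. Writing one byte at a time avoids both
// host-endianness and alignment problems.
inline void pack_u32_le(unsigned char* buf, std::uint32_t value)
{
    assert(buf != nullptr);
    for (std::size_t i = 0; i < 4; ++i)
        buf[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xffu);
}

// Hypothetical helper (not the UT API): read the value back from the buffer.
inline std::uint32_t unpack_u32_le(const unsigned char* buf)
{
    assert(buf != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(buf[i]) << (8 * i);
    return value;
}
```

Because no pointer is ever cast to a wider integer type, the buffer address may be odd or otherwise unaligned.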

Why yet another toolkit?

Because I'm not satisfied with what's out there. I did force myself to revisit that decision when, well into the UT library project and starting to implement the UTgui library, I stumbled upon the wxWidgets project. It is established, popular, and feature complete. When I learned of it, I considered politely asking the authors whether they would be open to accepting a new contributor, and suggesting inclusion of whatever parts of the UT functionality I'd already implemented that happened to be lacking in wxWidgets. But first I decided to evaluate wxWidgets for myself, the first criterion being efficiency. The first thing I did was port the String_t class performance test to use wxString. For a million iterations, UT took 2.7 seconds. For the same test, wxWidgets took 30.9 seconds. Back to work...

I may never catch up in terms of feature completeness, but this is my hobby and it's fun.

Setup

For information on Windows environment setup, see Windows Environment Setup
For information on MacOS environment setup, see MacOS Environment Setup
Linux environment setup is trivial: just have the binutils-devel, libunwind-devel, and gtk2-devel packages installed.
For information on new project setup, see New Project Setup
To build the library, go into UT/bld, then the appropriate platform subdirectory and either open the project file or, for platforms with no canonical IDE, open a terminal and simply type 'make'.

Extended UT Functionality

The UT library provides only core services, but the overall UT system provides several additional libraries for much-needed cross-platform functionality which could not be described as being core services. The libraries are partitioned to differentiate between the service needed by command line applications, command line applications operating on graphical content, basic GUI applications, and media or gaming applications. The libraries are organized as follows:
UT: core services
UTgfx: graphical manipulation and rendering
UTgui: basic GUI support systems. See MessageLoop_t and Task_t for information on the messaging architecture, then GUI Design for a primer, then Classes.

Entry Point

In Windows applications, the entry point is not main for GUI applications, but is main for command line applications. The UT library fixes that. Use main as the entry point for any application when using this library. A preprocessor macro will cause the application's main to be named app_main after compilation. The real main, WinMain, AfxWinMain, etc on Windows are implemented in the library to set up extra debugging facilities before calling the application's main, AKA app_main. Under Linux, main is the entry point as it should be, but like the Windows UT library, the application's main will actually be called app_main, with the UT library providing a main function which sets up some necessary debugging facilities.

The library-provided entry point from the system sets up several debugging facilities and corrections for functional deficiencies in the system before calling main. These facilities are:

Enabling leak detection if available
Setting the locale to "" such that case-insensitive string comparisons work with non-ASCII letters (typically accented vowels; note that String_t provides case-insensitive comparison functions and use of that class for the generalized string type is encouraged)
If available, providing a hardware exception handler to provide useful diagnostics (function, offset, file, line, call stack) of where any exception, typically a crash, occurred
Enforcement that when main exits, all threads must have already quit
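
The wrapping order described above can be sketched in standard C++. This is only an illustration under stated assumptions: library_entry is a hypothetical stand-in for the library-provided main or WinMain, the application function is written directly as app_main instead of being renamed by a preprocessor macro, and only the locale step is shown because the other facilities are platform-specific.

```cpp
#include <cassert>
#include <clocale>

// Stand-in for the application's own main. In UT a preprocessor macro
// performs this renaming; here the function is simply written as app_main.
int app_main(int argc, char** argv)
{
    (void)argv;
    return argc - 1;   // placeholder logic: number of extra arguments
}

// Hypothetical stand-in for the library-provided entry point. It prepares
// the environment first and only then hands control to the application.
int library_entry(int argc, char** argv)
{
    assert(argc >= 0);
    // Select the user's locale so that case-insensitive comparisons can
    // handle non-ASCII letters.
    std::setlocale(LC_ALL, "");
    // Leak detection, the hardware exception handler, and the thread check
    // at exit are platform-specific and omitted from this sketch.
    return app_main(argc, argv);
}
```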

Basic Types

The UT library provides typedefs for basic types with clean, sensible names. These types are uint8, int8, uint16, int16, uint32, int32, uint64, int64, and byte. The byte type is synonymous with uint8. C purists may disagree, but the author prefers uint64 over unsigned long long int.

Use of int is appropriate for signed integers or uint for unsigned integers which need only be 16 bits in size, but which can be longer if that would be more efficient on the target CPU. The uint type is provided by the UT library.

These typedefs are documented in the Typedefs section here: UTMisc.h
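
For readers without the headers at hand, typedefs of this kind could look like the following sketch. It is an assumption built on the standard <cstdint> types, not a copy of UTMisc.h, and the sketch namespace exists only so the names cannot clash with typedefs that some system headers already provide.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

namespace sketch {   // namespace only avoids clashes with system typedefs

typedef std::uint8_t   uint8;
typedef std::int8_t    int8;
typedef std::uint16_t  uint16;
typedef std::int16_t   int16;
typedef std::uint32_t  uint32;
typedef std::int32_t   int32;
typedef std::uint64_t  uint64;
typedef std::int64_t   int64;
typedef uint8          byte;   // byte is synonymous with uint8
typedef unsigned int   uint;   // at least 16 bits, possibly wider

}  // namespace sketch

// Compile-time checks of the properties stated in the text.
static_assert(std::is_same<sketch::byte, sketch::uint8>::value,
              "byte must be the same type as uint8");
static_assert(sizeof(sketch::uint) * CHAR_BIT >= 16,
              "uint must hold at least 16 bits");
```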

Strings

The ANSI C specification did not evolve to handle extended character sets very well. UTF8 encoding is ideal (in the author's opinion) in that it achieves the ease and efficiency of coding with the assumption of ASCII, yet supports the full Unicode character set. The wchar_t data type is often defined as a 16-bit integer, yet that doesn't cover the full 21 bits of the Unicode value space. But defining wchar_t to be a 32-bit integer would be very inefficient in virtually all practical use cases. Therefore, the UT library uses UTF8 internally for strings. Where UT-based code interacts with the system, strings are converted locally to or from wchar_t as needed.

Explicit string literals with non-ASCII contents are generally unnecessary in code, but the technique is still worth explaining, if nothing else, for the author to refer back to. "Hi..." with the Unicode ellipsis character (code point 0x2026) instead of "..." can be expressed in UTF8 as "Hi\xe2\x80\xa6". The conversion is as follows:

	0x2026             = 0010 0000 0010 0110
	UTF8 prefixes      = 1110     10       10
	0x2026 distributed =     0010   000000   100110
	UTF8 binary        = 11100010 10000000 10100110
	UTF8 bytes         = 0xe2     0x80     0xa6

The UTF8 prefix referred to above is based on the following table.

	Char. number range | UTF-8 octet sequence
	(hexadecimal)      | (binary)
	-------------------+------------------------------------
	00000000-0000007F  | 0xxxxxxx
	00000080-000007FF  | 110xxxxx 10xxxxxx
	00000800-0000FFFF  | 1110xxxx 10xxxxxx 10xxxxxx
	00010000-0010FFFF  | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
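
The bit distribution shown above can be sketched as a small function. encode_utf8 is a hypothetical helper written in standard C++, not part of the UT library; it follows the table row by row and returns an empty string for values that are not valid Unicode characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not the UT API): encode one Unicode code point as
// UTF-8. Each branch corresponds to one row of the table above.
inline std::string encode_utf8(std::uint32_t cp)
{
    std::string bytes;
    if (cp <= 0x7F) {
        bytes += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        bytes += static_cast<char>(0xC0 | (cp >> 6));
        bytes += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return bytes;   // surrogate values are not characters
        bytes += static_cast<char>(0xE0 | (cp >> 12));
        bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        bytes += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        bytes += static_cast<char>(0xF0 | (cp >> 18));
        bytes += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        bytes += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return bytes;   // empty if cp is above 0x10FFFF
}
```

Calling encode_utf8(0x2026) yields the three bytes 0xe2 0x80 0xa6 derived by hand above.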

Unicode characters can also be rendered into a string using the String_t class, as in string.Printf( "Hi%lc", 0x2026 );

UTF8 bytes are best represented as an unsigned integer, but the basic char type of C is a signed character. Therefore, string literals cannot be directly passed to a function which accepts a pointer to UTF8 characters without reinterpret_cast < const utf8* >. This actually turns out to be beneficial, because the signed/unsigned distinction can be used with function overloading to facilitate differential treatment of parameters which are string literals and therefore are guaranteed to continue to exist as compared with an arbitrary UTF8 string which may have been dynamically allocated or be on the stack, and is not guaranteed to exist beyond the call to the function which uses it. This distinction can therefore be exploited to avoid unnecessary copying when the argument is a string literal. This optimization is used extensively in the String_t and Path_t classes. Use of String_t or classes derived from it achieves the benefit of that efficiency "for free" when string literals are used. Therefore, use of const String_t& as the generalized string parameter instead of const char* or worse, char*, is encouraged.
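
The overloading idea can be sketched as follows. This is a simplified illustration, not the String_t implementation: StringRef_t and utf8_sketch are hypothetical names, and the sketch relies on the convention that a const char* argument is always a string literal.

```cpp
#include <cassert>
#include <string>

typedef unsigned char utf8_sketch;   // hypothetical stand-in for UT's utf8 type

// Hypothetical class (not UT's String_t) showing overload selection by the
// signedness of the character type.
class StringRef_t
{
public:
    // Chosen for string literals: the literal exists for the life of the
    // program, so only the pointer is stored and nothing is copied.
    explicit StringRef_t(const char* literal)
        : m_literal(literal), m_copied(false) {}

    // Chosen for arbitrary UTF8 data: its lifetime is unknown, so the
    // contents are copied.
    explicit StringRef_t(const utf8_sketch* text)
        : m_literal(nullptr), m_copied(true)
    {
        assert(text != nullptr);
        m_copy = reinterpret_cast<const char*>(text);
    }

    bool copied() const { return m_copied; }
    const char* c_str() const { return m_copied ? m_copy.c_str() : m_literal; }

private:
    const char* m_literal;   // used when constructed from a literal
    std::string m_copy;      // used when constructed from UTF8 data
    bool        m_copied;
};
```

A caller writing StringRef_t("literal") gets the no-copy path automatically, while a buffer of utf8_sketch selects the copying path, so later changes to that buffer do not affect the stored string.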

Use of the String_t class over use of the standard C string functions (strcmp, strcat, etc) for general string manipulation is encouraged. Likewise, to avoid buffer overruns with unexpectedly large inputs, use of the String_t functions Printf, PrintfAppend, VPrintf, and VPrintfAppend is encouraged over use of sprintf or vsprintf. String_t provides many other functions for common operations involving strings.

The UT library basic types associated with strings are ascii7, utf8, wchar_t, and stringliteral. The latter type allows code to, through function overloading, optimize cases where a parameter points to a string literal (the continued life of which is guaranteed) over cases where the contents of the parameter could be invalidated after the function returns, as described above.

These typedefs are documented in the Typedefs section here: UTMisc.h
The String_t class is documented here: String_t

Creating a string literal of wchar_t is rarely necessary, but when it is, the syntax is fairly obscure. The syntax is as follows:

	const wchar_t* foo = L"foo";

wchar_t should generally be 32-bit, but on platforms for which it is 16-bit and does not cover the entire Unicode value space, the following flag will be defined for use in #if or #if !:

WCHAR_IS_16_BIT

Regular Expressions

It is probably a universal experience among software developers that they have written very similar code countless times to extract information from strings, for example, stripping off a common prefix, then extracting an ASCII integer for conversion to an actual integer, then parsing a suffix of extra information in a flexible format. To avoid that tedium, a regular expression parser is provided and documented here: UTRegExp.h. It is similar, but not identical, to existing regular expression formats. It differs from those existing conventions in order to make regular expressions easier to write. For example, / is used to begin an escape sequence instead of \ because the \ character would itself have to be escaped to satisfy C/C++ string syntax, making the writing of regular expression format strings error prone.

Resources

A common situation is the need to store resources (version information, icons, bitmaps, dialog layout information, or other "globs" of arbitrary data) as either a part of the executable or in a file accompanying the application. Some systems like Windows and MacOS support including those resources as a part of the executable which is loaded on demand from a resource section. Other systems like Linux do not. The UT library provides a resource editor which generates resources in the format understood by the Microsoft Visual C++ resource compiler and a resource converter/compiler for all other target platforms. These resources can be loaded using the Resource_t class. On platforms other than Windows, the generic makefile driver recognizes the TARGET_RES option to indicate that a resource set should be converted to the format used by the target platform and built into the application or shared library.

All applications and libraries must include resources. As a bare minimum, this contains version information. Static libraries do not support resources. Therefore the UT resource mechanism supports building the resources of static libraries into an executable. That way, any project can be built in static library form, which is required to access the memory debugging facilities described below. Even if a trivial application doesn't need version information or resources, it still needs resources. If nothing else, this is to provide a context for importing the static UT library version resource.

The UT resource mechanism also supports localization of strings to the current locale if the application's resources include string translation to the current locale. If the current locale is not supported by the application, the translation defaults to the first locale in the application's supported locales list. The UTgui library provides a dialog box implementation to allow the user to select a supported locale in the event that the current locale is not supported by the application using the LocaleSelector_t class.

The cross-platform resource files generated by the UT resource editor are compatible with the Microsoft resource compiler, but do not match the specific format generated by the Visual C++ resource editor which is included in professional editions, nor do they adhere to Microsoft's methodology for defining and using resources. That being the case, DO NOT edit UT library resources with the Visual C++ resource editor. The UT resource editor should be used exclusively.

Stack Buffers (size and security implications)

Stack buffers are useful for their efficiency, but vulnerable to overflow when the inputs to how the stack buffer is used exceed the anticipated requirements. Moreover, declaring and using a stack buffer of a specific size which may seem reasonable on one platform may cause a stack overflow on other platforms, particularly embedded systems. When a stack buffer is used in the receipt of data from an unknown source, this can also represent a security vulnerability, opening the door to installing malicious code into the buffer, overflowing the buffer to corrupt the return address, and thereby causing the CPU to "return" to the malicious code. Therefore, the UT library provides constants for the sizes of typical use cases, which can be tailored to the target platform. The UT library also has several classes which can use a stack buffer for efficiency, but if that stack buffer proves not to be large enough for what is demanded of it, the classes will revert to dynamic memory allocation from the heap.

The constants for reasonable stack buffer sizes are:

c_small_string_buf: a small string buffer. On a normal PC operating system, this value would typically be about 400 bytes.

c_max_stack_buf: a maximum sized stack buffer, with the size being in bytes. On a normal PC operating system, this value would typically be about 8 kilobytes. Even though on some systems a stack buffer of substantially larger size could be allocated, this value seems reasonable. If the buffer were to contain much more data, the time needed to allocate from the heap would typically be minimal compared to the time spent operating on the buffer.

c_max_file_input_line_buf: the normally expected maximum size of a line from a text file.

None of those constants should ever be used without a fallback mechanism for reverting to heap allocation unless there is an iron-clad guarantee that the size will NEVER be exceeded. The classes which support stack buffers, reverting to heap allocation when necessary are Buffer_t, String_t, SimpleTypedBuffer_t<T>, and Array_t.
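
The fallback mechanism can be sketched as a small class. FallbackBuffer_t is a hypothetical illustration in standard C++, not the implementation of Buffer_t or any other UT class: it holds N bytes inline (on the stack when the object is a local variable) and allocates from the heap only when more than N bytes are requested.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical sketch (not a UT class): inline storage of N bytes with an
// automatic fallback to heap allocation for larger requests.
template <std::size_t N>
class FallbackBuffer_t
{
    static_assert(N > 0, "inline capacity must be nonzero");

public:
    explicit FallbackBuffer_t(std::size_t size)
        : m_heap(size > N ? new unsigned char[size] : nullptr),
          m_size(size) {}

    // Pointer to usable storage of at least size() bytes.
    unsigned char* data() { return m_heap ? m_heap.get() : m_inline; }
    std::size_t size() const { return m_size; }
    bool on_heap() const { return m_heap != nullptr; }

private:
    unsigned char                    m_inline[N];   // used when size <= N
    std::unique_ptr<unsigned char[]> m_heap;        // used when size > N
    std::size_t                      m_size;
};
```

With this shape the caller never writes past the inline array: a request that fits uses the inline storage with no allocation, and an oversized request costs one heap allocation that is released automatically when the object goes out of scope.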

Data Ownership and Memory Leak Avoidance

The vast majority of memory leaks can be avoided by creating a design with clearly defined data ownership: whatever object owns the data is responsible for deleting it. To facilitate self-documenting code, the UT library provides three pseudo-qualifiers, to be used in the same manner as type qualifiers. These qualifiers are takes, gives, and out.

The "takes" pseudo-qualifier is applied to a pointer argument to a function to indicate that the function takes possession of and responsibility for deleting the object or array pointed to by that argument.

The "gives" pseudo-qualifier is applied to a pointer return value from a function to indicate that the caller takes possession of and responsibility for deleting the object or array pointed to by the returned pointer.

The "out" pseudo-qualifier designates a pointer argument to a function which belongs to the caller, but is to receive an output from the function.
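
A short sketch of how the three pseudo-qualifiers read in function signatures follows. It assumes they are macros that expand to nothing, which this text does not confirm, and Widget_t, create_widget, and consume_widget are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Assumption: the pseudo-qualifiers expand to nothing and exist purely as
// documentation for the reader.
#define takes
#define gives
#define out

struct Widget_t { int value; };

// "gives": the caller owns the returned object and must delete it.
gives Widget_t* create_widget(int value)
{
    return new Widget_t{value};
}

// "takes": this function owns widget from now on and deletes it.
// "out": value_out belongs to the caller and receives a result.
void consume_widget(takes Widget_t* widget, out int* value_out)
{
    assert(widget != nullptr && value_out != nullptr);
    *value_out = widget->value;
    delete widget;
}

// This sketch removes the macros again so that they cannot affect unrelated
// code that follows, for example std::ios::out.
#undef takes
#undef gives
#undef out
```

Reading the signatures alone, a caller can see that the result of create_widget must eventually be deleted or handed to a function that takes it, and that nothing passed to consume_widget as widget may be used afterwards.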

While the implementation of clearly defined data ownership and use of these pseudo-qualifiers as aids in designing with data ownership in mind can avoid almost all memory leaks, the reality is that memory leaks do occur. Therefore, as mentioned above, the UT library enables leak detection and leak origin determination. If a memory leak occurs, when the application exits some very nonspecific debug information will be output, but it will merely indicate that a leak occurred; no useful diagnostics about the origin of the memory leak will be generated. The UT library provides a facility to generate useful diagnostics about memory leaks.

Memory Leak or Corruption Diagnostics

If the system detects a memory leak (only Windows does this), or if crashes are occurring which might be caused by memory leak accumulation or memory corruption, to determine the point at which the leaked memory or corrupted memory was allocated, try using the UT_MEMDEBUG version of UT. This is a separate library build from the normal static and shared versions. This type of build will detect damage before or after allocated blocks, and writes to blocks after they have been freed. To take advantage of this facility, simply switch to the DLLDebugMem or LibDebugMem configuration in Windows, dbg_so_memory or dbg_lib_memory BLD type in Linux, or the DebugMem configuration in MacOS.

On some platforms, valgrind seems tempting, but for now, according to the valgrind manual, atomic instruction sequences are not properly supported, in the sense that their atomicity is not preserved. This will affect any use of synchronization via memory shared between [threads]. They will appear to work, but fail sporadically. Valgrind may still be useful to track down bugs that are proving to be impossible to find.

Because valgrind checks for leaks at a deeper level than the UT library's memory debugging, it can be used as a "sanity check" to make sure the UT library or non-UT-based code whose allocations aren't intercepted by UT isn't leaking memory. This set of parameters works well:

valgrind --num-callers=20 --leak-check=full --leak-resolution=high --show-reachable=yes --gen-suppressions=yes --suppressions=UT_valgrind.supp bld_dbg_so/EXECUTABLE

The UT_valgrind.supp suppressions file is in UT's bld/Linux directory. This suppresses leaks reported from dlopen and bfd_malloc (both of which are used by DebugSymbolInfo_t). The suppressions file was generated using:

valgrind --num-callers=20 --leak-check=full --leak-resolution=high --show-reachable=yes --gen-suppressions=yes bld_dbg_so/testlib

Extended Debugging Facilities

The extended debugging facilities are declared in UTDebug.h.

debug_printf

For printf-style debug output, the debug_printf function can be used. The output will be generated in debug builds, but not in release builds. In release builds, the debug_printf will be removed by the preprocessor. On Windows, output is directed to OutputDebugString. On Linux, output is directed to stderr. Because debug_printf is actually a preprocessor macro, debug_printf in its generic form, as in debug_printf("i=%d",i), will result in a compilation error in release builds. debug_printf with additional parameters can be used temporarily in debug builds. However, to perform release builds without removing debug_printfs, uses of debug_printf with parameters must be replaced with debug_printf1 for one additional parameter, debug_printf2 for two additional parameters, and so on.

rel_debug_printf

For printf-style debug output by a mechanism identical to debug_printf but which is output even in release builds, the rel_debug_printf function can be used.

enable_debug_print_to_file

If the generation of debug_printf or rel_debug_printf output is slowing down an application and thereby interfering with testing, it is possible to redirect that output to a file using the enable_debug_print_to_file function.

enable_debug_print_to_stream

If the generation of debug_printf or rel_debug_printf output is slowing down an application and thereby interfering with testing, it is possible to redirect that output to a stream using the enable_debug_print_to_stream function. There are several other debug output routing functions documented in UTDebug.h.

debug_error

debug_error, when executed, acts as if an assert failed, but prints a message and is excluded in release builds.

rel_error

rel_error, when executed, acts as if an assert failed, but prints a message and is included in release builds.

debugger

The debugger function drops into the debugger unconditionally and immediately.

running_in_debugger

The running_in_debugger function returns true if the application is running in a debugger.

suicide

If an error condition which cannot be recovered from is detected and the user should not be allowed to ignore an assert, the suicide function can be used to exit the process without performing any sort of cleanup or memory checking. This should only be done if some sort of feedback had been provided to the user previously as to the nature of the error, unless providing that feedback is impossible in the current context.

Augmented Assert Implementation

assert

On most systems, the diagnostic feedback when an assert fires is limited to an expression, source file name, and line number. This is frequently inadequate, particularly so when an assertion fails in a general utility class or function. Therefore, the UT library overrides the implementation of assert to provide a full call stack from the point of the error, and to mirror that information to an assert log file. The assert log file is particularly useful for being able to provide to an end user a debug build, or even use a debug build as an alpha or beta release, and obtain diagnostic feedback which is actually useful. The replacement implementation also ensures that the thread which asserted is blocked until the assertion failure dialog is dismissed, preventing the assert implementation from, on Windows at least, continuing to "pump" the main event loop and allowing assertions to fire down the call stack with the effect of assert dialogs piling up until the thread runs out of stack space and blows up.

rel_assert

The rel_assert macro can be used to provide an assert which is included even in release builds.

Profiling

On platforms like Linux on which gprof is available, simply build the rel_lib_profile or rel_so_profile target and run the program as usual:

	make BLD=rel_so_profile
	bld_rel_so_profile/testlib

When the program exits successfully (returning from main), it will create a file called 'gmon.out' which contains the profile information. That information can be interpreted by gprof.

	gprof bld_rel_so_profile/testlib gmon.out >profile.txt

Microsoft's C++ development suite does support profiling, but not in the free Visual C++ Express 2008 version, only in the Professional and Enterprise editions. To profile an application under Windows, if you have one of the supported editions refer to the Visual C++ documentation.

On MacOS, use the Shark application on one of the release builds. Saturn doesn't work at all, at least not the PPC version on MacOS 10.4. gprof doesn't work either, showing all times as zero.

Error or Success Status Encapsulation

The UT library provides the Status_t class which encompasses a success or failure status, with zero or positive values indicating success, and negative values indicating failure. The class also allows extra explanatory text to be attached to the status object and passed to or returned from functions by value with very high efficiency. In failure cases, the Status_t class facilitates an integrated value space encompassing system errors, POSIX errors, or user-defined errors. The contents of a Status_t, including the textual meaning of an error and attached explanatory text, can be rendered into a String_t object using the String_t::Printf function with the "%S" format (a nonstandard but reasonable extension to the normal C standard format strings). When using the "%S" format, a pointer to the Status_t should be passed in the varargs list.
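
The sign convention and the attached text can be illustrated with a miniature class. MiniStatus_t and checked_divide are hypothetical and written in standard C++; they are not the Status_t interface and do not reproduce its efficiency or its integrated error value space.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the idea (not UT's Status_t): zero or positive
// codes mean success, negative codes mean failure, and explanatory text
// travels with the code.
class MiniStatus_t
{
public:
    explicit MiniStatus_t(int code = 0, const std::string& text = std::string())
        : m_code(code), m_text(text) {}

    bool Succeeded() const { return m_code >= 0; }
    int Code() const { return m_code; }
    const std::string& Text() const { return m_text; }

private:
    int         m_code;
    std::string m_text;
};

// Example function returning a status by value. The error code -1 is
// arbitrary and chosen only for this sketch.
inline MiniStatus_t checked_divide(int numerator, int denominator, int* quotient)
{
    assert(quotient != nullptr);
    if (denominator == 0)
        return MiniStatus_t(-1, "denominator was zero");
    *quotient = numerator / denominator;
    return MiniStatus_t();
}
```

The caller tests Succeeded() and, on failure, can log or display Text() without needing a separate lookup of what went wrong.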

User-defined errors can be declared in a header using a series of:

	#define eERR_sensible_name_for_the_error                         ERROR_CODE(eERR_user_base + N)
	DEFINE_USER_ERROR( eERR_sensible_name_for_the_error, "Sensible description of the error" )

where there is no overlap in N between the user errors which are defined. In order to register the error strings, somewhere in the project the following code should be created:

	#include "UTDefineUserErrorsPre.h"
	#include (project error codes header)
	#include "UTDefineUserErrorsPost.h"

That would typically be done in a source file dedicated to registering the user-defined errors.

Library authors should not use this facility. If the basic UT error codes are found to be lacking, new error codes can be added to the UT library. There is plenty of room for growth: 65535 error codes are reserved before the user base. Multiple, relatively independent error code sets can be defined in closely related and well integrated bodies of functionality, but there still can be no overlap in the error codes.

System Time

The UT library provides a pair of functions to obtain the system time in microseconds or milliseconds, with the option to obtain a simultaneous snapshot of the local time breakdown (hour, minute, second) through a caller-provided DateTime_t structure. These functions are system_time_us and system_time_ms, respectively.

Import and Export of Symbols

Different platforms have different mechanisms for defining what symbols should be exported or imported. Exporting a symbol could mean exporting a function from a shared library for import by an application using the library, or exporting a function from an application for use by a plug-in which links against the application and uses some of its facilities. To make these mechanisms available to all platforms, the UT library defines the preprocessor macros EXPORT and IMPORT. That is only the beginning though... The header files defining an interface will be used not only by "third parties" linking against the library defining the interface, but also when building the library which should export symbols. Therefore, IMPORT and EXPORT cannot be used directly. Any library which exports symbols must implement a mechanism to allow the compiler to know whether a symbol is to be imported or exported.

The UT library uses the convention that unless otherwise stated, its symbols are to be exported. That way, if an application is linked with static libraries, the symbols will be exported from the application and available to plug-ins which might link against the application. The only time symbols are imported is when an application or library is being built which links against the shared version of the UT library, in which case import is required. It is recommended that all libraries adhere to this convention. This is implemented as follows (from the target header UTLinuxTarget, UTWinTarget.h, etc):

	#if UT_IMPORT
		#define UT_EXPORT	IMPORT
	#else
		#define UT_EXPORT	EXPORT
	#endif

The exact same syntax can be used for any library, defining for example LIB_EXPORT. The catch is that the place where this occurs must be included from ALL of the library's header files in which LIB_EXPORT is used, but that really isn't a very big catch.

The "hard part" is in using a library and setting the LIB_IMPORT when needed. To reiterate, by default symbols are exported. The only time LIB_IMPORT needs to be defined is when an application, library, or plug-in is linking against a shared library, or when a plug-in is linked against an application. In those cases, LIB_IMPORT should be defined to trigger import of that library or application's symbols. For linux makefiles, it is not necessary to explicitly define -DLIB_IMPORT when using a library.

LIB_EXPORT will always be used in header files. The "flagging" of a function or class for import or export when the header file is processed is carried forward to when the implementation is encountered by the compiler. Therefore, there is a limited set of syntactic constructs for its use. Those constructs are:

	class LIB_EXPORT Foo_t                 (defining a class to be exported)
	struct LIB_EXPORT Foo_t                (defining a struct to be exported)
	extern LIB_EXPORT void FooFunction();  (defining a function to be exported)

If a class is exported or imported, all member functions have the same export or import property.
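
Putting the pieces together for a hypothetical library named LIB gives the following sketch. The per-platform definitions of EXPORT and IMPORT shown here are an assumption about one plausible implementation (the compiler attributes themselves are real), not a copy of the UT target headers; Foo_t::Twice and the body of FooFunction are placeholders.

```cpp
#include <cassert>

// Assumption: one plausible per-platform definition of EXPORT and IMPORT.
#if defined(_WIN32)
    #define EXPORT __declspec(dllexport)
    #define IMPORT __declspec(dllimport)
#elif defined(__GNUC__)
    #define EXPORT __attribute__((visibility("default")))
    #define IMPORT
#else
    #define EXPORT
    #define IMPORT
#endif

// Same convention as UT_EXPORT: export unless LIB_IMPORT is set by the
// project that links against the shared library.
#if LIB_IMPORT
    #define LIB_EXPORT IMPORT
#else
    #define LIB_EXPORT EXPORT
#endif

// Header part: the class and function are flagged for export or import.
class LIB_EXPORT Foo_t
{
public:
    int Twice(int x) const { return 2 * x; }
};

extern LIB_EXPORT int FooFunction();

// Implementation part: no flag is repeated here, because the flag from the
// declaration carries forward to the definition.
int FooFunction()
{
    return 42;
}
```

A client of the shared library would compile the same header with LIB_IMPORT defined to a nonzero value, which switches every LIB_EXPORT in the header to IMPORT without any change to the header itself.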

Conventions

`basictype`	Basic type derived from a built-in C basic type, e.g. uint16, uint64, uint, utf8, stringliteral.
`simple_type_t`	Typedef derived from a basic type (not a struct or class), but not itself so fundamental as to deserve the basictype convention, or enum.
`ePFX_some_enum`	Enumerated value, where PFX is a prefix common to all enum members and which is a truncated form of the enum name. Also used for const variables which act as a group and effectively, if not actually, an enum.
`c_some_constant`	Constant (const keyword, or define).
`global_function()`	Global function (not a class member).
`g_variable_or_object`	Global variable or class or structure instance.
`StructName_t`	Structure type name.
`ClassName_t`	Class type name.
`ClassMemberFunction()`	Class member function.
`m_class_member`	Private class member variable.
`ms_static_member`	Private, static class member variable.
`class_member`	Public class member variable.
`FLAG`	Flag or block delimiter with a major functional impact, e.g. EXPORT, EXTERN_C_BEGIN, or one intended to be used as an #if condition, e.g. POINTER_IS_32_BIT.