C string types are arrays with shape { c-string encoding }, where encoding is an encoding descriptor. The type c-string is an alias for { c-string utf8 }. See Encoding descriptors for information about encoding descriptors. In TYPEDEF:, FUNCTION:, CALLBACK:, and STRUCT: definitions, the shorthand syntax c-string[encoding] can be used to specify the string encoding.
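As a rough sketch of the shorthand, the following binds the C library's getenv with both the argument and the return value declared as ASCII strings. The Factor-side name c-getenv is made up for illustration. The sketch assumes a recent Factor release (where FUNCTION: declarations take no terminating semicolon), that the libc library is registered as in a stock image, and that the vocabularies listed in USING: are where these words live in your version.

```factor
USING: alien.c-types alien.syntax io.encodings.ascii ;
IN: scratchpad

LIBRARY: libc

! c-getenv is an illustrative name; c-string[ascii] is shorthand
! for { c-string ascii }.
FUNCTION-ALIAS: c-getenv c-string[ascii] getenv ( c-string[ascii] name )

! The argument is encoded to ASCII on the way in and the result is
! decoded on the way out. A NULL result comes back as f.
"PATH" c-getenv
```

Because getenv returns a pointer into memory the caller does not own, declaring the return type as a C string type is safe here: nothing has to be freed afterward.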
Using C string types triggers automatic conversions:
• Passing a Factor string to a C function expecting a c-string allocates a byte-array in the Factor heap; the string is then encoded to the requested encoding and a raw pointer is passed to the function. Passing an already encoded byte-array also works and performs no conversion.
• Returning a C string from a C function allocates a Factor string in the Factor heap; the memory pointed to by the returned pointer is then decoded with the requested encoding into the Factor string.
• Reading c-string slots of STRUCT: or UNION-STRUCT: returns Factor strings.
Care must be taken if the C function expects a pointer to a string with its length represented by another parameter rather than a null terminator. Passing the result of calling length on the string object will not suffice. This is because a Factor string of n characters will not necessarily encode to n bytes. The correct idiom for C functions which take a string with a length is to first encode the string using encode, and then pass the resulting byte array together with the length of this byte array.
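The difference between character count and byte count can be seen with a short string containing one non-ASCII character. The helper word string>bytes+length is a hypothetical name introduced here, not part of Factor; it is a minimal sketch of the idiom, assuming UTF-8 is the encoding the C function expects.

```factor
USING: io.encodings.string io.encodings.utf8 kernel prettyprint
sequences ;
IN: scratchpad

! Hypothetical helper: leaves the encoded bytes and their count on
! the stack, in the order a C function taking ( buf, len ) expects.
: string>bytes+length ( string -- byte-array n )
    utf8 encode dup length ;

! "naïve" has 5 characters, but ï occupies two bytes in UTF-8.
"naïve" length .                      ! 5
"naïve" string>bytes+length nip .     ! 6
```

Note that encode does not append a null terminator, which is what a length-delimited C interface wants.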
Sometimes a C function has a parameter type of void*, and various data types, among them strings, can be passed in. In this case, strings are not automatically converted to aliens, and instead you must call one of these words:
string>alien ( string encoding -- byte-array )
malloc-string ( string encoding -- alien )
The first allocates a byte-array in the Factor heap; the second allocates manually managed memory, which is not moved by the garbage collector and has to be explicitly freed by calling free. See Byte arrays and the garbage collector for a discussion of the two approaches.
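The two conversions can be compared side by side. This is a sketch only; the vocabularies in USING: (in particular alien.data for malloc-string) are assumed from a recent Factor release and may differ in yours.

```factor
USING: alien.data alien.strings io.encodings.utf8 kernel libc ;

! Byte array in the Factor heap; the garbage collector reclaims it.
"hello" utf8 string>alien

! Manually managed copy: decode it back, then release it with free.
"hello" utf8 malloc-string
[ utf8 alien>string ] [ free ] bi
```

After running this, the stack holds the byte array from the first conversion and the string "hello" read back from the second.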
The C type char* represents a generic pointer to char; parameters and return values of this type are passed as aliens, with no implicit string conversion.
A word to read strings from arbitrary addresses:
alien>string ( c-ptr encoding -- string/f )
For example, if a C function returns a c-string but stipulates that the caller must deallocate the memory afterward, you must define the function as returning char*, decode the result with alien>string, and call (free) on the pointer yourself.
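A minimal sketch of this pattern, using the C library's strdup as a stand-in for a function whose result the caller must release. The names c-strdup and strdup>string are hypothetical, and the sketch assumes a recent Factor release with the libc library registered and UTF-8 as the string encoding.

```factor
USING: alien.c-types alien.strings alien.syntax io.encodings.utf8
kernel libc ;
IN: scratchpad

LIBRARY: libc

! c-strdup is an illustrative name. The return type is char*, so
! the result arrives as an alien rather than a decoded string.
FUNCTION-ALIAS: c-strdup char* strdup ( c-string s )

! Hypothetical wrapper: decode the returned memory first, then
! hand the same pointer to the C library's free via (free).
: strdup>string ( string -- string' )
    c-strdup [ utf8 alien>string ] [ (free) ] bi ;

"hello" strdup>string
```

The order matters: the string is decoded before the pointer is released, so the Factor string never refers to freed memory.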