Integrating UspLib
It’s finally here! - a new and improved Neatpad which demonstrates the rendering capabilities of UspLib. The purpose of this tutorial is firstly to document the UspLib API, and secondly to mention a few details about how UspLib was integrated into Neatpad’s code. I very much hope that the design of UspLib is good enough that it will allow others to import it into their own editors and get instant styled-text support!
The image above shows Neatpad’s new Unicode text-rendering engine in action. Five different scripts are being displayed - Devanagari, Tamil, Thai, Arabic and of course Latin. Font-fallback is not currently supported in Neatpad, so to display all of these different scripts a suitable font must be selected. In the example above I used the “Arial Unicode MS” font which weighs in at a hefty 22Mb!
Now, don’t get too excited about this latest version. On the surface it is no different than before - it is only when you load a Unicode file containing lots of complex scripts that you will see where all the work has gone.
The UspLib API has been documented below. Please let me know if you were successful in integrating this API into your own projects!
To use UspLib, include the single header-file usplib.h, and link against usplib.lib. The library has no dependencies other than the Uniscribe Script Processor DLL (usp10.dll), which will be present on Windows 2000 and above.
UspAllocate
USPDATA * UspAllocate();
UspAllocate initializes and returns a new USPDATA object, which must be used for all subsequent UspLib operations.
UspAnalyze
BOOL UspAnalyze (
    USPDATA * uspData,
    HDC hdc,
    WCHAR * wstr,
    int wlen,
    ATTR * attrRunList, // optional
    UINT flags,
    USPFONT * uspFont, // optional
    SCRIPT_CONTROL * scriptControl,
    SCRIPT_STATE * scriptState,
    SCRIPT_TABDEF * scriptTabdef // optional
);
UspAnalyze takes as input a single USPDATA object and analyses the specified paragraph of UTF-16 text, saving the results back in uspData.
- uspData points to a single USPDATA object which will hold the results of the analysis. This object can be reused from a previous call to UspAnalyze, which reduces memory-allocation overhead.
- hdc is a handle to a device-context.
- wstr and wlen together identify the wide-character string to be analyzed.
- attrRunList points to an optional array of ATTR structures. The size of this array is not directly specified in the call to UspAnalyze. Instead, the ATTR::len fields of the array elements are assumed to add up to wlen - the same length as the string being analyzed. If attrRunList is NULL, the string is initialized with a single default attribute spanning the entire range of text, using the default system colours and the font specified by uspFont.
- flags is a single DWORD variable which should be set to zero.
- uspFont points to an optional array of USPFONT structures. Each array element must have been initialized using UspInitFont beforehand. If uspFont is NULL, the font currently selected into hdc is used instead. This same font must be re-selected into the target device-context when calling UspTextOut.
- scriptControl points to an optional SCRIPT_CONTROL structure. See MSDN for details.
- scriptState points to an optional SCRIPT_STATE structure. See MSDN for details.
- scriptTabdef points to an optional SCRIPT_TABDEF structure, which defines the tab-stop positions to be used when performing tab-expansion. See MSDN for details.
UspAnalyze must be used to analyze an entire paragraph of text. The resulting USPDATA object can be used in subsequent calls to UspTextOut and UspSnapXToOffset.
struct ATTR
{
    COLORREF fg; // foreground text colour
    COLORREF bg; // background text colour
    int len : 16; // length of this run (in WCHARs)
    int font : 7; // font-index into the USPFONT table
    int sel : 1; // selection flag (yes/no)
    int ctrl : 1; // show as an isolated control-character
    int eol : 1; // only valid for last character in line, prevents mouse selection
    int reserved : 6; // unused
};
All fields of the ATTR structure must be initialized before use. Any field that is not required should be set to zero. The ATTR::font field is used as an index into the USPFONT table. Any font referenced by ATTR::font must have been initialized using UspInitFont.
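As a rough illustration of the rules above, here is a minimal sketch of building an attribute-run list whose lengths add up to the length of the string. It does not use the real UspLib header: Attr and ColorRef are stand-ins I have made up so the example compiles on its own, and makeRuns and totalLength are hypothetical helpers. The flag fields are declared unsigned here so that a value of 1 reads back as 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using ColorRef = std::uint32_t;   // stand-in for the Win32 COLORREF type

// Stand-in that mirrors the fields of ATTR (not the real UspLib definition).
struct Attr
{
    ColorRef fg;
    ColorRef bg;
    int      len      : 16;
    int      font     : 7;
    unsigned sel      : 1;
    unsigned ctrl     : 1;
    unsigned eol      : 1;
    unsigned reserved : 6;
};

// Hypothetical helper: describe a string of wlen characters as up to three
// runs (before, inside and after a selection), all using font index 0.
std::vector<Attr> makeRuns(int wlen, int selStart, int selEnd, ColorRef fg, ColorRef bg)
{
    assert(wlen >= 0 && wlen <= 32767);          // must fit the 16-bit signed len field
    selStart = std::max(0, std::min(selStart, wlen));
    selEnd   = std::max(selStart, std::min(selEnd, wlen));

    std::vector<Attr> runs;
    auto push = [&](int len, bool selected)
    {
        if (len <= 0)
            return;                              // skip empty runs
        Attr a = {};                             // zero every field first
        a.fg  = fg;
        a.bg  = bg;
        a.len = len;
        a.sel = selected ? 1u : 0u;
        runs.push_back(a);
    };

    push(selStart, false);
    push(selEnd - selStart, true);
    push(wlen - selEnd, false);
    return runs;
}

// The run lengths must add up to the length of the analyzed string.
int totalLength(const std::vector<Attr> &runs)
{
    int total = 0;
    for (const Attr &a : runs)
        total += a.len;
    return total;
}
```

With the real library, an array built along these lines would be passed as attrRunList, with wlen equal to the total of the run lengths.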
UspInitFont
void UspInitFont (
    USPFONT * uspFont,
    HDC hdc,
    HFONT hFont
);
UspInitFont must be called once for each font referenced by UspAnalyze in the attrRunList array. Several font-related resources are managed by the USPFONT object, including the Uniscribe SCRIPT_CACHE object, and the text-metrics for the font.
- uspFont points to a single USPFONT structure.
- hdc is a handle to a device-context.
- hFont is a handle to the font resource.
The USPFONT structure is defined below:
struct USPFONT
{
    HFONT hFont;
    SCRIPT_CACHE scriptCache;
    TEXTMETRIC tm;
    int yoffset; // height-adjustment when drawing font (set to zero)
};
The yoffset field is user-defined and specifies the vertical adjustment to be applied to all text using this font. UspInitFont initially sets this value to zero, but it can be modified after this call. All other structure members are managed by UspInitFont and should not be modified by the caller.
UspFreeFont
void UspFreeFont (
    USPFONT * uspFont
);
UspFreeFont must be called when the specified USPFONT resource is no longer required. The font-handle specified in the call to UspInitFont is released, as well as the SCRIPT_CACHE object held internally to the structure.
UspApplyAttributes
void UspApplyAttributes (
    USPDATA * uspData,
    ATTR * attrRunList
);
UspApplyAttributes can be called at any time to re-apply the style-run attributes for the specified USPDATA object. Only the colour and selection information is used - all other fields of the attribute-runs (including the font) are ignored.
- attrRunList specifies a new list of style-runs for the text.
The attribute-run list must reference a range of text the same length as the string that was previously analyzed by UspAnalyze.
UspApplySelection
void UspApplySelection (
    USPDATA * uspData,
    int selStart,
    int selEnd
);
UspApplySelection performs a similar task to UspApplyAttributes. However this time only the selection-flags are modified in the USPDATA object.
- selStart is the starting position in the string where the selection-highlight should begin.
- selEnd is the ending position of the selection-highlight.
UspSetSelColor
void UspSetSelColor (
    USPDATA * uspData,
    COLORREF fg,
    COLORREF bg
);
UspSetSelColor controls the selection-highlight colour to be used when calling UspTextOut. Any character marked with the ATTR::sel attribute, or any range of text identified by UspApplySelection, will be drawn using this colour. Note that by default, the Windows selection-highlight colours will be used.
- fg is the COLORREF value of the selection foreground (text) colour.
- bg is the COLORREF value of the selection background colour.
UspTextOut
int UspTextOut (
    USPDATA * uspData,
    HDC hdc,
    int xpos,
    int ypos,
    int lineHeight,
    int lineOffsetY,
    RECT * rect
);
UspTextOut is the counterpart to ScriptStringOut. It takes as input the USPDATA object which was previously analyzed, and draws the text to the specified location. Any fonts, colours and selection-highlights are applied to the text as it is drawn.
- hdc is a handle to a device-context.
- xpos is the x-coordinate where text-output should begin.
- ypos is the y-coordinate where the text-output should begin.
- lineHeight specifies the total height, in pixels, that will be occupied by each line. The text background will be filled to this extent. Applications will usually set this value to be the same as (rect.bottom - rect.top).
- lineOffsetY specifies the vertical distance in pixels - relative to ypos - from which the text will be offset. This value is in addition to any y-adjustment specified by the USPFONT structures. Can be zero.
- rect is the bounding rectangle beyond which clipping will occur. This parameter must be specified and at a minimum should identify the client-area rectangle of the device-context.
It is recommended to “double-buffer” the output of this function, as the multi-pass rendering will otherwise result in flickering. The alignment-mode, background-mode and colours of the device-context are unspecified on this function’s return.
UspTextOut will change in the future to support word-wrapping.
UspSnapXToOffset
BOOL UspSnapXToOffset (
    USPDATA * uspData,
    int xpos,
    int * snappedX, // out, optional
    int * charPos, // out
    BOOL * fRTL // out, optional
);
UspSnapXToOffset converts an x-coordinate to the nearest character-offset. In addition it returns the x-coordinate of the selected character.
- xpos specifies the x-coordinate.
- snappedX points to an integer that receives the adjusted x-coordinate.
- charPos points to an integer that receives the character position corresponding to xpos.
- fRTL points to a BOOL that receives the direction of the item-run corresponding to xpos. If TRUE it indicates a right-to-left run, if FALSE it indicates a left-to-right run.
The fRTL parameter is useful when the text-caret’s shape is modified to reflect the reading-direction of the run of text that corresponds to xpos.
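To give a feel for what “snapping” means, here is a much simplified sketch. It is not UspLib’s implementation: it assumes a purely left-to-right line with one advance width per character, whereas the real function works on Uniscribe’s glyph data and handles right-to-left runs. SnapResult and snapXToNearestBoundary are names I have invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SnapResult
{
    int snappedX;   // x-coordinate of the chosen character boundary
    int charPos;    // character offset at that boundary
};

// Simplified, left-to-right only: widths[i] is the advance width of character i.
SnapResult snapXToNearestBoundary(const std::vector<int> &widths, int xpos)
{
    int x = 0;      // left edge of the current character
    for (std::size_t i = 0; i < widths.size(); ++i)
    {
        const int w = widths[i];
        assert(w >= 0);
        if (xpos < x + w)
        {
            // xpos falls inside character i (or left of the line start):
            // pick whichever edge is nearer, ties go to the trailing edge
            if ((xpos - x) * 2 >= w)
                return { x + w, static_cast<int>(i) + 1 };
            return { x, static_cast<int>(i) };
        }
        x += w;
    }
    // beyond the end of the line: snap to the end
    return { x, static_cast<int>(widths.size()) };
}
```

A mouse click at xpos would place the caret at snappedX, with charPos recording which character boundary that is.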
UspXToOffset
BOOL UspXToOffset (
    USPDATA * uspData,
    int xpos,
    int * charPos, // out
    BOOL * trailing, // out
    BOOL * fRTL // out, optional
);
UspXToOffset converts an x-coordinate to a character position.
- xpos specifies the x-coordinate.
- charPos points to a variable that receives the character position corresponding to xpos.
- trailing points to a variable that receives an indicator whether the position is the leading or trailing edge of the character.
- fRTL points to a variable that receives the direction of the item-run corresponding to xpos. If TRUE it indicates a right-to-left run, if FALSE it indicates a left-to-right run.
UspOffsetToX
BOOL UspOffsetToX (
    USPDATA * uspData,
    int offset,
    BOOL trailing,
    int * xpos // out
);
UspOffsetToX returns the x-coordinate for the leading or trailing edge of a character position.
- offset specifies the character position in the string.
- trailing indicates the edge of the character that corresponds to the x-coordinate. If TRUE it indicates the trailing edge, if FALSE it indicates the leading edge.
- xpos points to a variable that receives the corresponding x-coordinate for the character-offset.
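The leading/trailing distinction is easiest to see with a small model. The sketch below is an illustration only, not UspLib code: it assumes a left-to-right line described by one advance width per character, and offsetToX is a hypothetical stand-in for the idea behind UspOffsetToX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, left-to-right only: widths[i] is the advance width of character i.
// Returns the x-coordinate of the leading or trailing edge of character `offset`.
int offsetToX(const std::vector<int> &widths, int offset, bool trailing)
{
    assert(offset >= 0 && offset < static_cast<int>(widths.size()));

    int x = 0;
    for (int i = 0; i < offset; ++i)
        x += widths[static_cast<std::size_t>(i)];   // sum of everything before `offset`

    // the trailing edge sits one character-width further along
    return trailing ? x + widths[static_cast<std::size_t>(offset)] : x;
}
```

In this left-to-right model the trailing edge of one character is the same x-coordinate as the leading edge of the next. In a right-to-left run the leading edge would be on the right, which is why the flag has to be passed explicitly.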
UspFree
void UspFree(USPDATA * uspData);
UspFree should be called when the specified USPDATA object is no longer required.
Changes to Neatpad
It was very straight-forward to import UspLib into Neatpad’s existing codebase. However there were several changes made to key aspects of the TextView library which made this possible. These changes are briefly mentioned below.
- References to NeatTextOut and NeatTextWidth have been removed, as have the functions themselves. This means that all previous tutorials that discussed drawing and painting (prior to UspLib) are effectively obsolete. Although the ideas they presented were good, the method in which styled text was drawn (successive calls to ExtTextOut) has been superseded by the UspLib library.
- The existing mouse-handling code has been substantially reduced in complexity. The caret hit-testing ideas presented in previous tutorials have again been superseded by the functionality provided by UspLib.
- Font-handling has been moved in part to UspLib.
- Control-character rendering is fully handled by UspLib so all of the related code has been removed from the TextView.
Whilst a large amount of code has been removed from the TextView, in reality these areas of functionality have been transferred to UspLib which now handles all aspects of drawing, fonts and mouse hit-testing.
UspLib was designed primarily for use with Neatpad. However this does not mean that it cannot be used for other text-editor projects, or in fact any application that requires the use of complex, styled text. Remember, UspLib is Freeware and can be used in any project!
Problems with paragraphs
With Uniscribe (and therefore UspLib), the basic unit of text is the paragraph. For text-editors such as Neatpad, an entire line can be treated as a paragraph. This concept is important however, as it imposes a restriction on how UspLib should be used. Because whole lines must be analyzed, an entire line of text must be in memory at one time. The consequence is that we must impose a “line length” limit on text files that we load. In Neatpad, any line of text beyond a certain length will be truncated.
I’m quite pleased about this limitation actually as I wasn’t looking forward to handling arbitrarily long lines. These are just a few of the issues that long lines present:
- How to apply the Unicode bi-directional algorithm with ScriptItemize when the whole line must be in memory?
- What would the maximum line-length be anyway? 2Gb? 4Gb?
- How would we represent x-coordinates on a line this long? The x-coordinate would overflow the limits of a 32-bit integer.
- How would we count the characters on a line that contained tabs? Tab-expansion could potentially push the line-length beyond 4Gb.
I don’t have any good answers to these questions so I’m happy for the moment to have a simple restriction of something like 65Kb per line. I’d like to hear any thoughts in this area though!
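To put some numbers on the x-coordinate problem, here is a small back-of-the-envelope sketch. The average character width is purely an assumed figure for illustration, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest number of characters whose summed advance widths still fit in a
// signed 32-bit x-coordinate, for a given average width in pixels.
constexpr std::int64_t maxCharsBeforeOverflow(int avgCharWidthPx)
{
    return std::numeric_limits<std::int32_t>::max() / avgCharWidthPx;
}

// True if a line of `chars` characters can be measured with 32-bit coordinates.
constexpr bool fitsIn32BitX(std::int64_t chars, int avgCharWidthPx)
{
    return chars * avgCharWidthPx <= std::numeric_limits<std::int32_t>::max();
}
```

Assuming an average of 8 pixels per character, the coordinate overflows after roughly 268 million characters, far short of a 2Gb line. A line of 65,536 characters, on the other hand, is only 524,288 pixels wide and fits comfortably.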
Caching with GetUspData
The big issue with Uniscribe is all the memory that must be allocated in order to display just a single line of text. UspLib hides this complexity behind the USPDATA object. However the memory overhead that each USPDATA imposes is quite significant:
- 16 bytes per glyph.
- 14 bytes per wide-character.
- 32 bytes for each item-run.
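The figures above can be turned into a quick estimate. This is only a sketch of the arithmetic: estimateUspDataBytes is a hypothetical helper, and the glyph and item-run counts for a real line depend entirely on the script and font.

```cpp
#include <cassert>
#include <cstddef>

// Per-unit overheads quoted in the list above.
constexpr std::size_t kBytesPerGlyph   = 16;
constexpr std::size_t kBytesPerWchar   = 14;
constexpr std::size_t kBytesPerItemRun = 32;

// Rough memory estimate for one analyzed line.
constexpr std::size_t estimateUspDataBytes(std::size_t glyphs,
                                           std::size_t wchars,
                                           std::size_t itemRuns)
{
    return glyphs   * kBytesPerGlyph
         + wchars   * kBytesPerWchar
         + itemRuns * kBytesPerItemRun;
}
```

For example, a 100-character line that produces one glyph per character in a single item-run (an assumption that holds only for simple scripts) comes to 3032 bytes, against 200 bytes for the UTF-16 text itself.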
For a typical string of UTF-16 text we are looking at an increase of many times that of the original string length. Obviously this is far too much to be creating USPDATA objects for every line of text in a file. To solve this problem a new TextView member function was written, which manages USPDATA objects from an internal cache.
struct USP_CACHE
{
    USPDATA * uspData; // the UspLib data for this line
    ULONG lineno; // which line this refers to
    ULONG usage; // usage count for caching purposes
};
class TextView
{
    ...
    // keep an internal cache of USPDATA objects
    USP_CACHE m_uspCache[USP_CACHE_SIZE];
};
Whenever a line of text is required by the TextView (for drawing or mouse hit-testing), the GetUspData function is called. The drawing and mouse-related routines no longer directly access the underlying TextDocument. All data-access is now through this single function.
USPDATA *TextView::GetUspData(HDC hdc, ULONG nLineNo)
{
    TCHAR buff[TEXTBUFSIZE];
    ATTR attr[TEXTBUFSIZE];
    int len;
    USPDATA * uspData = << find a cached object >>
    // if found a match (an already analyzed line) then return it here!!
    if(....)
        return uspData;
    // otherwise we need to style + analyze a new line
    len = m_pTextDoc->getline(nLineNo, buff, TEXTBUFSIZE, &off_chars);
    len = ApplyTextAttributes(nLineNo, off_chars, colno, buff, len, attr);
    // setup the tabs
    int tablist[] = { m_nTabWidthChars };
    SCRIPT_TABDEF tabdef = { 1, 0, tablist, 0 };
    SCRIPT_CONTROL scriptControl = { 0 };
    SCRIPT_STATE scriptState = { 0 };
    // generate glyphs etc
    UspAnalyze(uspData, hdcTemp, buff, len, attr, 0, m_uspFontList,
        &scriptControl, &scriptState, &tabdef);
    return uspData;
}
The sample-code above gives the general idea for how GetUspData works. The caching details are rather boring so I’ve omitted them here (just look in the real sources). The idea behind this method though, is that any time we want a USPDATA object, GetUspData will return one ready-analyzed. Most of the time this object will be from the cache, and only occasionally will a line need to be fetched from the TextDocument and analyzed with UspAnalyze.
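For anyone who wants a feel for the omitted part without opening the sources, here is one way such a cache could work. This is a sketch under my own assumptions, not Neatpad’s actual code: FakeUspData stands in for USPDATA, the analyze step merely records the line number, and the usage field is treated as a “last used” stamp so that the least recently used entry is the one replaced.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for USPDATA: only remembers which line was "analyzed".
struct FakeUspData
{
    unsigned long analyzedLine = 0;
};

struct CacheEntry
{
    FakeUspData   data;
    unsigned long lineno = 0;
    unsigned long usage  = 0;   // last-used stamp; 0 means the slot is empty
};

constexpr std::size_t kCacheSize = 4;   // assumed size, purely for illustration

class LineCache
{
public:
    // Return analyzed data for lineNo, re-analyzing only on a cache miss.
    FakeUspData *get(unsigned long lineNo)
    {
        CacheEntry *victim = &m_entries[0];
        for (CacheEntry &e : m_entries)
        {
            if (e.usage != 0 && e.lineno == lineNo)
            {
                e.usage = ++m_clock;        // cache hit: refresh the stamp
                return &e.data;
            }
            if (e.usage < victim->usage)
                victim = &e;                // remember the stalest slot so far
        }

        // cache miss: reuse the least recently used slot
        victim->lineno = lineNo;
        victim->usage  = ++m_clock;
        analyze(victim->data, lineNo);
        return &victim->data;
    }

    unsigned long analyzeCount() const { return m_analyzeCount; }

private:
    // Placeholder for the expensive getline + UspAnalyze step.
    void analyze(FakeUspData &d, unsigned long lineNo)
    {
        d.analyzedLine = lineNo;
        ++m_analyzeCount;
    }

    CacheEntry    m_entries[kCacheSize];
    unsigned long m_clock        = 0;
    unsigned long m_analyzeCount = 0;
};
```

The point of the design is that repeated requests for the lines currently on screen cost a short linear search rather than a full re-analysis. A small fixed-size array keeps that search cheap.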
Conclusions
The move to Uniscribe defines a turning-point in Neatpad’s development. It has taken a lot of effort to get here but the future now looks a lot clearer. In many ways I wish I had started this project with Uniscribe right from the beginning - it would have saved a lot of work. However Unicode is quite complicated and I think the beginning tutorials would have suffered from this extra complexity. Besides, I think it is good to see the evolution that has occurred since the start of this project, and also the mistakes that I have made along the way.
Overall I’ve found working with Uniscribe to be a very rewarding experience. The API itself is rather complicated but it is very well designed. The main difficulty is coming to terms with the concept of glyph-based rendering. However I do find the MSDN documentation for Uniscribe to be rather inadequate in places. As someone who had no prior experience in displaying Unicode text I struggled for quite some time before finally completing this phase of the project.
As a comparison, take a look at the Apple documentation for ATSUI (an equivalent API to Uniscribe but higher-level). The documentation is much clearer in my opinion - it doesn’t just document the ATSUI API but gives guidelines on how it should be used on the Apple system.
Coming up in Part 16
There are still some minor “todo’s” with UspLib which I haven’t quite managed to finish. The issue of CRLF sequences at the end of a line of text needs addressing for bi-directional texts. Sometimes the CRLF will not be on the far-right of a line - for RTL texts it can be on the left-side, or even in the middle of the line! The other issue is properly displaying the file with full right-to-left alignment, with the scrollbar positioned on the left.
The next tutorial will look at adding keyboard support to the TextView. We will focus only on caret-movement with the keyboard, as actual text-entry must wait until the TextDocument can actually edit text. The caret-movement code will be using Uniscribe’s ScriptBreak routine, which will probably result in a couple more UspLib functions being added.
Beyond this I will probably tackle syntax highlighting, and once the GUI is completely finished I will finally move onto file-editing. The end is getting a lot closer now I feel!