
Design and Implementation of a Win32 Text Editor

Part 15 - Integrating UspLib

Download full source and demo (part 15, 155Kb)


Introduction

It's finally here! A new and improved Neatpad which demonstrates the rendering capabilities of UspLib. The purpose of this tutorial is, firstly, to document the UspLib API, and secondly to mention a few details about how UspLib was integrated into Neatpad's code. I very much hope that the design of UspLib is good enough that it will allow others to import it into their own editors and get instant styled-text support!

The image above shows Neatpad's new Unicode text-rendering engine in action. Five different scripts are being displayed - Devanagari, Tamil, Thai, Arabic and of course Latin. Font-fallback is not currently supported in Neatpad, so to display all of these different scripts a suitable font must be selected. In the example above I used the "Arial Unicode MS" font which weighs in at a hefty 22Mb!

Now, don't get too excited about this latest version. On the surface it is no different than before - it is only when you load a Unicode file containing lots of complex scripts that you will see where all the work has gone.

The UspLib API has been documented below. Please let me know if you were successful in integrating this API into your own projects!


To use UspLib, include the single header file usplib.h and link against usplib.lib. The library has no dependencies other than the Uniscribe Script Processor DLL (usp10.dll), which is present on Windows 2000 and above.

UspAllocate

USPDATA * UspAllocate();

UspAllocate initializes and returns a new USPDATA object, which must be used for all subsequent UspLib operations.

UspAnalyze

BOOL UspAnalyze (
  USPDATA         * uspData,
  HDC               hdc,
  WCHAR           * wstr,
  int               wlen,
  ATTR            * attrRunList,    // optional
  UINT              flags,
  USPFONT         * uspFont,        // optional
  SCRIPT_CONTROL  * scriptControl,
  SCRIPT_STATE    * scriptState,
  SCRIPT_TABDEF   * scriptTabdef    // optional
);

UspAnalyze takes as input a single USPDATA object and analyzes the specified paragraph of UTF-16 text, saving the results back in uspData.

UspAnalyze must be used to analyze an entire paragraph of text. The resulting USPDATA object can be used in subsequent calls to UspTextOut and UspSnapXToOffset.

struct ATTR
{
   COLORREF   fg;         // foreground text colour
   COLORREF   bg;         // background text colour

   int  len      : 16;    // length of this run (in WCHARs)
   int  font     : 7;     // font-index into the USPFONT table
   int  sel      : 1;     // selection flag (yes/no)
   int  ctrl     : 1;     // show as an isolated control-character
   int  eol      : 1;     // only valid for last character in line, prevents mouse selection
   int  reserved : 6;     // unused
};

All fields of the ATTR structure must be initialized before use, and any unused field should be set to zero. The ATTR::font field is used as an index into the USPFONT table; any font referenced by ATTR::font must have been initialized using UspInitFont.
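As a rough illustration (not code from UspLib or Neatpad), the sketch below builds an attribute-run list for one line in which a single range is drawn in a different colour. It mirrors the ATTR layout shown above and stubs COLORREF plus a hypothetical MakeRGB helper (standing in for the Win32 RGB macro) so that it compiles without <windows.h>. Each ATTR is zero-initialized with `{}`, which takes care of every field the caller does not set, and the run lengths add up to the line length.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins so the sketch compiles without <windows.h> (assumptions, not UspLib).
// COLORREF uses the Win32 0x00BBGGRR layout; MakeRGB mirrors the RGB macro.
typedef std::uint32_t COLORREF;

constexpr COLORREF MakeRGB(unsigned r, unsigned g, unsigned b)
{
    return static_cast<COLORREF>(r | (g << 8) | (b << 16));
}

// Mirror of the ATTR layout documented above.
struct ATTR
{
    COLORREF fg;
    COLORREF bg;
    int len      : 16;
    int font     : 7;
    int sel      : 1;
    int ctrl     : 1;
    int eol      : 1;
    int reserved : 6;
};

// Build runs for a line of lineLen WCHARs, drawing [hlStart, hlStart + hlLen)
// in hlFg and everything else in fg. All runs use font index 0.
std::vector<ATTR> BuildRuns(int lineLen, int hlStart, int hlLen,
                            COLORREF fg, COLORREF bg, COLORREF hlFg)
{
    assert(hlStart >= 0 && hlLen >= 0 && hlStart + hlLen <= lineLen);

    std::vector<ATTR> runs;
    auto addRun = [&](int len, COLORREF colour) {
        if (len <= 0)
            return;              // never emit zero-length runs
        ATTR a = {};             // zero every field, bit-fields included
        a.fg  = colour;
        a.bg  = bg;
        a.len = len;
        runs.push_back(a);
    };

    addRun(hlStart, fg);                    // text before the highlight
    addRun(hlLen, hlFg);                    // the highlighted range
    addRun(lineLen - hlStart - hlLen, fg);  // text after the highlight
    return runs;
}
```

Note that the flag fields are one-bit plain `int` bit-fields, so on common compilers a set flag reads back as -1 rather than 1; comparing them against zero is the safer test.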

UspInitFont

void UspInitFont (
  USPFONT  * uspFont,
  HDC        hdc,
  HFONT      hFont
);

UspInitFont must be called once for each font referenced by UspAnalyze in the attrRunList array. Several font-related resources are managed by the USPFONT object, including the Uniscribe SCRIPT_CACHE object, and the text-metrics for the font.

The USPFONT structure is defined below:

struct USPFONT
{
  HFONT         hFont;        
  SCRIPT_CACHE  scriptCache;  
  TEXTMETRIC    tm;           
  int           yoffset;      // height-adjustment when drawing font (set to zero)
};

The yoffset field is user-defined and specifies the vertical adjustment to be applied to all text using this font. UspInitFont initially sets this value to zero, however it can be modified after this call. All other structure members are managed by UspInitFont and should not be modified by the caller.

UspFreeFont

void UspFreeFont (
  USPFONT  * uspFont
);

UspFreeFont must be called when the specified USPFONT resource is no longer required. The font handle passed to UspInitFont is released, along with the SCRIPT_CACHE object held inside the structure.

UspApplyAttributes

void UspApplyAttributes (
  USPDATA  * uspData,
  ATTR     * attrRunList
);

UspApplyAttributes can be called at any time to re-apply the style-run attributes for the specified USPDATA object. Only the colour and selection information is used - all other fields of the attribute-runs (including the font) are ignored.

The attribute-run list must reference a range of text the same length as the string that was previously analyzed by UspAnalyze.

UspApplySelection

void UspApplySelection (
  USPDATA  * uspData,
  int        selStart,
  int        selEnd
);

UspApplySelection performs a similar task to UspApplyAttributes, except that only the selection flags in the USPDATA object are modified.
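UspLib's internal representation isn't shown in this tutorial, so the following is only a hypothetical model of what applying a selection involves: one flag per WCHAR of the analyzed line, set inside the selection and cleared everywhere else. It assumes selStart and selEnd are WCHAR offsets describing a half-open range that may be supplied in either order; the name ApplySelectionFlags is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model of a selection update: one flag per WCHAR of the
// analyzed line. Assumes [selStart, selEnd) is a half-open WCHAR range whose
// ends may arrive in either order (e.g. after dragging the mouse leftwards).
void ApplySelectionFlags(std::vector<bool>& selFlags, int selStart, int selEnd)
{
    const int len = static_cast<int>(selFlags.size());

    if (selStart > selEnd)
        std::swap(selStart, selEnd);

    // Clamp both ends to the line so out-of-range offsets are harmless.
    selStart = std::min(std::max(selStart, 0), len);
    selEnd   = std::min(std::max(selEnd, 0), len);

    // Every flag is rewritten, so any previous selection is cleared.
    for (int i = 0; i < len; ++i)
        selFlags[i] = (i >= selStart && i < selEnd);
}
```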

UspSetSelColor

void UspSetSelColor (
  USPDATA   * uspData,
  COLORREF    fg,
  COLORREF    bg
);

UspSetSelColor controls the selection-highlight colour to be used when calling UspTextOut. Any character marked with the ATTR::sel attribute, or any range of text identified by UspApplySelection will be drawn using this colour. Note that by default, the Windows selection-highlight colours will be used.

UspTextOut

int UspTextOut (
  USPDATA  *  uspData,
  HDC         hdc,
  int         xpos,
  int         ypos,
  int         lineHeight,
  int         lineOffsetY,
  RECT     *  rect
);

UspTextOut is the counterpart to ScriptStringOut. It takes as input the USPDATA object which was previously analyzed, and draws the text to the specified location. Any fonts, colours and selection-highlights are applied to the text as it is drawn.

It is recommended to "double-buffer" the output of this function, as the multi-pass rendering will otherwise result in flickering. The text-alignment mode, background mode and text and background colours of the device context are unspecified when this function returns.

UspTextOut will change in the future to support word-wrapping.

UspSnapXToOffset

BOOL UspSnapXToOffset (
  USPDATA    * uspData,
  int          xpos,
  int        * snappedX,    // out, optional
  int        * charPos,     // out
  BOOL       * fRTL         // out, optional
);

UspSnapXToOffset converts an x-coordinate to the nearest character offset. In addition, it returns (through snappedX) the x-coordinate of the selected character position.

The fRTL parameter is useful when the text caret's shape is modified to reflect the reading direction of the run of text at xpos.
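The snapping behaviour can be illustrated with a simplified, left-to-right-only model in which the line is described by the pixel advance of each character rather than by Uniscribe's glyph data. SnapXToOffsetLtr below is a hypothetical helper, not part of UspLib: it returns the caret position closest to xpos and reports that position's x-coordinate through snappedX. Handling RTL runs (and filling in fRTL) is exactly the part that needs Uniscribe, and it is left out here.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Simplified LTR-only model: widths[i] is the pixel advance of character i.
// Returns the caret position (0..widths.size()) closest to xpos and, if
// snappedX is non-null, the x-coordinate of that position. On an exact tie
// the earlier (left-hand) position wins.
int SnapXToOffsetLtr(const std::vector<int>& widths, int xpos, int* snappedX)
{
    int bestPos = 0, bestX = 0;
    int x = 0;                                  // x-coordinate of caret position i

    for (std::size_t i = 0; i <= widths.size(); ++i)
    {
        if (std::abs(xpos - x) < std::abs(xpos - bestX))
        {
            bestPos = static_cast<int>(i);
            bestX   = x;
        }
        if (i < widths.size())
            x += widths[i];                     // advance to the next boundary
    }

    if (snappedX)
        *snappedX = bestX;
    return bestPos;
}
```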

UspXToOffset

BOOL UspXToOffset (
  USPDATA    * uspData,
  int          xpos,
  int        * charPos,     // out
  BOOL       * trailing,    // out
  BOOL       * fRTL         // out, optional
);

UspXToOffset converts an x-coordinate to a character position, and reports through trailing whether xpos lies on the leading or trailing edge of that character.

UspOffsetToX

BOOL UspOffsetToX (
  USPDATA    * uspData,
  int          offset,
  BOOL         trailing,
  int        * xpos         // out
);

UspOffsetToX returns the x-coordinate for the leading or trailing edge of a character position.
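The conversions performed by UspXToOffset and UspOffsetToX can be sketched under simplifying assumptions: left-to-right text only, with the line described by one pixel advance width per character. XToOffsetLtr and OffsetToXLtr are hypothetical names; the real functions work from Uniscribe's glyph data and must cope with reordered RTL runs, which this model ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified LTR-only model: widths[i] is the pixel advance of character i.

// Find the character under xpos and whether xpos is on its trailing
// (right-hand) half. Returns false if xpos lies outside the line.
bool XToOffsetLtr(const std::vector<int>& widths, int xpos,
                  int* charPos, bool* trailing)
{
    int left = 0;
    for (std::size_t i = 0; i < widths.size(); ++i)
    {
        const int right = left + widths[i];
        if (xpos >= left && xpos < right)
        {
            *charPos  = static_cast<int>(i);
            *trailing = (xpos - left) * 2 >= widths[i];
            return true;
        }
        left = right;
    }
    return false;
}

// x-coordinate of the leading (left) or trailing (right) edge of character
// `offset`. Returns false if offset is not a valid character index.
bool OffsetToXLtr(const std::vector<int>& widths, int offset,
                  bool trailing, int* xpos)
{
    if (offset < 0 || offset >= static_cast<int>(widths.size()))
        return false;

    int x = 0;
    for (int i = 0; i < offset; ++i)
        x += widths[i];                 // sum the advances before `offset`

    *xpos = trailing ? x + widths[offset] : x;
    return true;
}
```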

UspFree

void UspFree(USPDATA * uspData);

UspFree should be called when the specified USPDATA object is no longer required.


Changes to Neatpad

It was very straightforward to import UspLib into Neatpad's existing codebase. However, several changes were made to key aspects of the TextView library to make this possible. These changes are briefly mentioned below.

Whilst a large amount of code has been removed from the TextView, in reality this functionality has been transferred to UspLib, which now handles all aspects of drawing, fonts and mouse hit-testing.

UspLib was designed primarily for use with Neatpad. However this does not mean that it cannot be used for other text-editor projects, or in fact any application that requires the use of complex, styled text. Remember, UspLib is Freeware and can be used in any project!

Problems with paragraphs

With Uniscribe (and therefore UspLib), the basic unit of text is the paragraph. For text editors such as Neatpad, an entire line can be treated as a paragraph. This concept is important, however, as it imposes a restriction on how UspLib should be used. Because whole lines must be analyzed, an entire line of text must be in memory at one time. As a consequence, we must impose a "line length" limit on the text files that we load. In Neatpad, any line of text beyond a certain length will be truncated.

I'm quite pleased about this limitation actually, as I wasn't looking forward to handling arbitrarily long lines. These are just a few of the issues that long lines present:

I don't have any good answers to these questions so I'm happy for the moment to have a simple restriction of something like 65Kb per line. I'd like to hear any thoughts in this area though!
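One detail worth getting right when enforcing a per-line limit on UTF-16 text is not to cut a surrogate pair in half. The helper below is a hypothetical sketch, not Neatpad's truncation code: it uses char16_t in place of WCHAR so that it compiles on any platform, and returns how many code units of the line to keep.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: clamp a UTF-16 line of len code units to at most
// maxLen units without leaving a dangling high (leading) surrogate.
std::size_t TruncateUtf16Line(const char16_t* text, std::size_t len, std::size_t maxLen)
{
    if (len <= maxLen)
        return len;                     // line already fits

    std::size_t cut = maxLen;

    // If the last kept unit is a high surrogate, its low surrogate would be
    // cut off, so drop the high surrogate as well.
    if (cut > 0 && text[cut - 1] >= 0xD800 && text[cut - 1] <= 0xDBFF)
        --cut;

    return cut;
}
```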

Caching with GetUspData

The big issue with Uniscribe is all the memory that must be allocated in order to display just a single line of text. UspLib hides this complexity behind the USPDATA object. However the memory overhead that each USPDATA imposes is quite significant:

16 bytes per glyph.
14 bytes per wide-character.
32 bytes for each item-run.
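To put these figures in perspective, the small helper below (a hypothetical name, using only the per-item costs listed above) estimates the overhead of one analyzed line. It assumes the caller knows the glyph, character and item counts; for simple Latin text one glyph per character is a fair approximation.

```cpp
#include <cassert>
#include <cstddef>

// Approximate USPDATA overhead from the per-item costs quoted in the text:
// 16 bytes per glyph, 14 bytes per WCHAR and 32 bytes per item-run.
std::size_t EstimateUspBytes(std::size_t glyphs, std::size_t wchars, std::size_t items)
{
    return glyphs * 16 + wchars * 14 + items * 32;
}
```

For an 80-character Latin line (80 glyphs, one item) this gives 2,432 bytes, compared with 160 bytes for the UTF-16 text itself; keeping such an object for every line of a 10,000-line file would need over 24 million bytes.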

For a typical string of Latin text, where each character maps to a single glyph, this works out at around 30 bytes per character: roughly fifteen times the 2 bytes per character of the original UTF-16 string. Obviously this is far too much memory to be creating USPDATA objects for every line of text in a file. To solve this problem a new TextView member function, GetUspData, was written, which manages USPDATA objects from an internal cache.

struct USP_CACHE
{
   USPDATA * uspData;      // the UspLib data for this line
   ULONG     lineno;       // which line this refers to
   ULONG     usage;        // usage count for caching purposes
};

class TextView
{
   ...

   // keep an internal cache of USPDATA objects
   USP_CACHE m_uspCache[USP_CACHE_SIZE];
};

Whenever a line of text is required by the TextView (for drawing or mouse hit-testing), the GetUspData function is called. The drawing and mouse-related routines no longer directly access the underlying TextDocument. All data-access is now through this single function.

USPDATA *TextView::GetUspData(HDC hdc, ULONG nLineNo)
{
    WCHAR     buff[TEXTBUFSIZE];
    ATTR      attr[TEXTBUFSIZE];
    int       len;
    ULONG     off_chars;     // character offset of the line, returned by getline
    ULONG     colno = 0;     // starting column for ApplyTextAttributes (assumed 0 here)

    USPDATA * uspData = << find a cached object >>

    // if found a match (an already analyzed line) then return it here!!
    if(....)
        return uspData;

    // otherwise we need to style + analyze a new line
    len = m_pTextDoc->getline(nLineNo, buff, TEXTBUFSIZE, &off_chars);
    len = ApplyTextAttributes(nLineNo, off_chars, colno, buff, len, attr);

    // setup the tabs: a single entry makes every tab stop the same width, and
    // iScale = 0 means the width is a multiple of the font's average character width
    int            tablist[]     = { m_nTabWidthChars };
    SCRIPT_TABDEF  tabdef        = { 1, 0, tablist, 0 };
    SCRIPT_CONTROL scriptControl = { 0 };
    SCRIPT_STATE   scriptState   = { 0 };

    // generate glyphs etc
    UspAnalyze(uspData, hdc, buff, len, attr, 0, m_uspFontList,
               &scriptControl, &scriptState, &tabdef);

    return uspData;
}

The sample code above gives the general idea of how GetUspData works. The caching details are rather boring so I've omitted them here (just look in the real sources). The idea behind this method, though, is that any time we want a USPDATA object, GetUspData will return one ready-analyzed. Most of the time this object will come from the cache, and only occasionally will a line need to be fetched from the TextDocument and analyzed with UspAnalyze.
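Since the caching code is omitted here, the following is one possible way to implement it, offered as a hypothetical sketch rather than a copy of Neatpad's sources. Payload stands in for USPDATA, the static Analyze function stands in for fetching the line and calling UspAnalyze, and the replacement policy (reuse an empty slot if there is one, otherwise the slot with the lowest usage count) is an assumption based on the usage field of USP_CACHE.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins: Payload plays the role of USPDATA, and
// LineCache::Analyze plays the role of getline + UspAnalyze.
struct Payload
{
    unsigned long lineno;
};

const std::size_t kCacheSize = 4;   // assumed size; Neatpad uses USP_CACHE_SIZE

struct CacheEntry
{
    std::unique_ptr<Payload> data;  // analyzed data for this line (null = empty)
    unsigned long lineno = 0;       // which line this refers to
    unsigned long usage  = 0;       // hit count used to pick a victim
};

class LineCache
{
public:
    Payload* Get(unsigned long lineno)
    {
        // 1. Hit: bump the usage count and return the already-analyzed data.
        for (CacheEntry& e : m_slots)
            if (e.data && e.lineno == lineno)
            {
                ++e.usage;
                return e.data.get();
            }

        // 2. Miss: prefer an empty slot, otherwise the least-used one.
        CacheEntry* victim = &m_slots[0];
        for (CacheEntry& e : m_slots)
        {
            if (!e.data)
            {
                victim = &e;
                break;
            }
            if (e.usage < victim->usage)
                victim = &e;
        }

        // 3. Re-analyze into the chosen slot, discarding what was there.
        victim->data.reset(Analyze(lineno));
        victim->lineno = lineno;
        victim->usage  = 1;
        ++m_misses;
        return victim->data.get();
    }

    int Misses() const { return m_misses; }

private:
    static Payload* Analyze(unsigned long lineno) { return new Payload{lineno}; }

    CacheEntry m_slots[kCacheSize];
    int        m_misses = 0;
};
```

A pure hit count has one weakness: lines that were heavily used once can stay in the cache long after they have scrolled out of view. Storing a "last used" tick in the usage field instead would turn this into a least-recently-used cache.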

Conclusions

The move to Uniscribe marks a turning point in Neatpad's development. It has taken a lot of effort to get here, but the future now looks a lot clearer. In many ways I wish I had started this project with Uniscribe right from the beginning, as it would have saved a lot of work. However, Unicode is quite complicated and I think the early tutorials would have suffered from this extra complexity. Besides, I think it is good to see the evolution that has occurred since the start of this project, and also the mistakes that I have made along the way.

Overall I've found working with Uniscribe to be a very rewarding experience. The API itself is rather complicated, but it is very well designed. The main difficulty is coming to terms with the concept of glyph-based rendering. However, I do find the MSDN documentation for Uniscribe rather inadequate in places. Having no prior experience in displaying Unicode text, I struggled for quite some time before finally completing this phase of the project.

As a comparison, take a look at the Apple documentation for ATSUI (an equivalent API to Uniscribe but higher-level). The documentation is much clearer in my opinion - it doesn't just document the ATSUI API but gives guidelines on how it should be used on the Apple system.

Coming up in Part 16

There are still some minor to-dos with UspLib which I haven't quite managed to finish. The handling of CRLF sequences at the end of a line needs addressing for bidirectional text: the CRLF will not always be on the far right of a line, and for RTL text it can be on the left side, or even in the middle of the line! The other issue is properly displaying the file with full right-to-left alignment, with the scrollbar positioned on the left.

The next tutorial will look at adding keyboard support to the TextView. We will focus only on caret movement with the keyboard, as text entry must wait until the TextDocument can actually edit text. The caret-movement code will use Uniscribe's ScriptBreak routine, which will probably result in a couple more UspLib functions being added.

Beyond this I will probably tackle syntax highlighting, and once the GUI is completely finished I will finally move on to file editing. The end is getting a lot closer now, I feel!



Last modified: 01 August 2008 12:48:46