The source zip file contains the needed classes and files:
- RTF2HTMLConverter.h/cpp, which contain the main converter class CRTF_HTMLConverter
- RTF2HTMLTree.h/cpp, which contain Alexander Kovachev's template tree class
- Util.h/cpp, some very simple helper routines
- RTF2HTML.h/cpp, a console-based converter demo app
The converter class itself does no reading from or writing to files or RichEdit controls; this has to be done outside. (For example, to learn how to stream the complete RTF content from/into a RichEdit control, just look here in the same section.) The class is derived only from CObject and works with the CString >>/<< streaming functions. The data is converted when it is streamed in.
Note: Only the RTF->HTML direction is supported at the moment. Also, only a very small subset of RTF is supported at this time:
- Bold, Italic, Underline
- Font Size, Color, and Face
- Paragraph alignment
- Special characters, such as encoded German Umlauts
I hope the class is easy to extend (for new tags, mostly ::R2H_InterpretTag has to be modified). Any suggestions or extensions are very welcome; I'll post them here. But please don't give me "My RTF file isn't correctly exported" comments; as mentioned, it is only a demo and only a few tags are currently supported. I made my RTF file using the WordPad editor shipped with Windows; MS Word builds a more complex RTF structure. For complete RTF documentation, see MSDN ("RTF Specification").
An RTF file stores text data in a structured way, together with formatting tags (slightly similar to HTML). Consider the following example:
--
TEST BIG SMALL AGAIN
BROWN
BLUE
AND
ANOTHER fONT.
This is a left-aligned paragraph
right-aligned one
centered one
--
It is represented in RTF as follows:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}
{\f1\fmodern\fprq1\fcharset0 Courier New;}
{\f2\fswiss\fprq2\fcharset0 Arial;}
{\f3\fnil\fcharset2 Symbol;}}
{\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
\viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN\b
\cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
\par
\par
\f2 This is a left-aligned paragraph\par
\pard\qr right-aligned one\par
\pard\qc centered one\par
}
ConvertRTF2HTML is the main conversion procedure. It performs the following steps:
- R2H_BuildTree
As you can see, RTF has a nested structure, where each section is enclosed in braces {}. So, the first step is to build a tree structure:
+RTF1
  +COLORTBL
  +FONTTBL
    +F0
    +F1
    +F2
    +F3
Here, I've just noted each section's first attribute (the section name). Each section then contains more code: both plain text (RTF1 is the main section with the main text) and attributes.
- R2H_SetMetaData
Sub-items such as colortbl and fonttbl are helper tables, and the RTF tags in the main text contain references to them, so these global attributes have to be scanned and stored.
- R2H_CreateHTMLElements
Loop through the RTF1 main text and add HTML elements. Each element is either:
  - Plain text: added as it is
  - RTF tags starting with a \: these have to be converted to the corresponding HTML tags with R2H_InterpretTag. Sometimes, there must be look-ups in global tables (e.g. the color or font table), or previously inserted elements must be scanned or modified.
- R2H_GetHTMLHeader: Write the HTML header to the target HTML
- R2H_GetHTMLElements: Dump the added HTML elements to the target HTML
- R2H_GetHTMLFooter: Write the HTML footer to the target HTML
Ready!