|
|
The first step in parsing an HTML file is to lex it. The lexer splits the file into tokens.
In HTML this is simple, because there are only two kinds of token: a tag, or not a tag.
The Leo<HTML> lexer is a little more complex, however. The compiler
tries to avoid reformatting your code.
That means your coding style is left untouched wherever possible.
Have a look at some of the generated output files and you will see this.
The lexer therefore stores additional information in its output file.
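The tag-or-not-a-tag idea, together with the position bookkeeping, can be sketched in a few lines of Python. This is only an illustration and not the Leo<HTML> lexer itself: the function name lex_tags, the regular expression and the tuple layout are assumptions made for the example, and tags that span several lines are not handled. Concatenating the raw tokens of a line gives back the line unchanged, which is the property that lets a compiler leave your formatting alone.

```python
import re
from typing import Iterator, Tuple

# Assumption for this sketch: a tag is everything from "<" up to the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def lex_tags(text: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, tag_name, raw) for every token in text.

    Lines are counted from 1 and columns from 0. tag_name is "" for
    plain text between tags.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        pos = 0
        for match in TAG_RE.finditer(line):
            if match.start() > pos:
                # Plain text in front of the tag.
                yield line_no, pos, "", line[pos:match.start()]
            inner = match.group()[1:-1].strip()
            name = inner.split()[0].lower() if inner else ""
            yield line_no, match.start(), name, match.group()
            pos = match.end()
        if pos < len(line):
            # Plain text after the last tag of the line.
            yield line_no, pos, "", line[pos:]


for token in lex_tags("<p>\n<b>static</b> pages\n</p>"):
    print(token)
```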
Original Code:
<p>
<leohtml> is a HTML precompiler for <b>static</b> home page generation.
It detects special HTML tags in your page and generates HTML code out of it.
</p>
|
Example:
dev2:index.lhtml:9:0::
dev2:index.lhtml:10:0:p:<p>
dev2:index.lhtml:11:0::
dev2:index.lhtml:11:1:leohtml:<leohtml>
dev2:index.lhtml:11:10:: is a HTML precompiler for
dev2:index.lhtml:11:37:b:<b>
dev2:index.lhtml:11:40::static
dev2:index.lhtml:11:46:/b:</b>
dev2:index.lhtml:11:50:: home page generation.
dev2:index.lhtml:12:0:: It detects special HTML tags in your page and generates HTML code out of it.
dev2:index.lhtml:13:0:/p:</p>
dev2:index.lhtml:14:0::
|
The first field is the directory where the file was found. This field can be empty.
The second field is the name of the file that was originally included.
The third field is the line number where the tag was found.
The fourth field is the column number where the tag was located within that line.
The fifth field is the name of the tag that was found. If there was no tag, this field is empty.
The last field is the string that was found. This can be a tag too. If this field
is empty, it can be a line break.
You can now write your own parser for these lines, or you can
use the Line class, which does this for you. A Line
is in turn accepted by the Tag class. All these useful classes are designed to work together.
Your life will be easier if you use them.
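If you do want to parse the lexer output yourself, a minimal Python sketch might look like this. It assumes that the six fields are separated by colons exactly as in the listing above, that directory and file names contain no colons themselves, and that only the last field may contain further colons. The names LexedLine and parse_lexed_line are made up for this example and are not part of Leo<HTML>; inside Leo<HTML> scripts the Line class is the intended tool.

```python
from typing import NamedTuple, Optional


class LexedLine(NamedTuple):
    """One record of lexer output (hypothetical helper type)."""
    directory: str   # may be empty
    filename: str
    line: int
    column: int
    tag_type: str    # empty if the token is not a tag
    text: str        # the string that was found; may be empty


def parse_lexed_line(raw: str) -> Optional[LexedLine]:
    """Split one output line into its six fields, or return None if malformed."""
    # Split at most five times so colons inside the last field survive.
    parts = raw.rstrip("\n").split(":", 5)
    if len(parts) != 6:
        return None
    directory, filename, line, column, tag_type, text = parts
    try:
        return LexedLine(directory, filename, int(line), int(column), tag_type, text)
    except ValueError:
        # Line or column number was not an integer.
        return None


record = parse_lexed_line("dev2:index.lhtml:11:37:b:<b>")
if record is not None and record.tag_type != "":
    print("tag", record.tag_type, "at line", record.line, "column", record.column)
```

Limiting the split to five separators is the one design choice worth noting: text tokens can themselves contain colons, and an unlimited split would cut them into extra fields.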
| |
|
Definition:
- Class: Lexer
- Constructor:
  Lexer( File in, File out, current_dir = // );
- Functions:
  Bool is_valid();
  String get_error();
  Null lex();
|
The lexer takes an input file and produces an output file.
If you know the current directory, you can optionally pass it as well.
After constructing the Lexer, check whether it is valid.
If it is not valid, you can get the error message via the get_error() function.
The third step is to lex the file. This is done by the lex() function.
After lexing the file, check again whether the lexer is valid, because
an error may have occurred while lexing.
| |
|
Although this is a reference and I did not want to add larger
examples here, I think one is worthwhile. This example
shows how the various classes work together.
It is a simple filter that searches for all tags in an
HTML file and removes them.
Example:
function html2text( infile_name ) {
    // try to open the file
    var infile = new File( infile_name, "r", true, true );
    if( !infile.is_valid() ) {
        message( "ERROR: cannot open file", infile.get_name() );
        return;
    }

    // lex the file into a temporary file
    var tmp = new File( get_tmp_file(), "rwt", false, false );
    var lexer = new Lexer( infile, tmp );
    if( ! lexer.is_valid() ) {
        message( "ERROR: cannot lex file:", lexer.get_error() );
        return;
    }
    lexer.lex();
    // check again: an error may have occurred while lexing
    if( ! lexer.is_valid() ) {
        message( "ERROR: cannot lex file:", lexer.get_error() );
        return;
    }

    tmp.clear();
    tmp.seekg(0); // rewind to the beginning

    while( ! tmp.eof() ) {
        var line = new Line( tmp.getline() );
        // ignore invalid lines
        if( ! line.is_valid() )
            continue;
        if( line.tag_type != "" ) {
            // here you can inspect the tag;
            // this is not necessary, but it makes
            // a good example
            var tag = new Tag( line );
            if( tag.is_valid() ) {
                var m = "Tag: " + tag.get_tag_type(); // the message string
                for( var i = 0 ; i < tag.get_number_of_options() ; i++ ) {
                    m = m + " | Option: " + tag.get_option_name( i )
                          + " Value: " + tag.get_option_value( i );
                }
                message( m );
            }
        } else {
            // the line is not a tag, so print its text
            print( line.tag );
        }
    }
}
|
| |
|
This page was created by King Leo. Page generator was Leo<HTML> version 0.99.0.
|