lex

The software tool lex(CP) lets you quickly generate programs that perform lexical analysis, that is, the recognition of strings of characters that match specified patterns. This makes it useful for a wide class of problems drawn from text processing, code enciphering, compiler writing, and other areas. For example:

in text processing, checking the spelling of words for errors
in code enciphering, translating certain patterns of characters into others
in compiler writing, determining what the tokens (smallest meaningful sequences of characters) are in the program to be compiled

It is not essential to use lex to handle problems of this kind: you could write programs in a standard language like C to handle them. What lex does is generate such C programs, based on a set of specifications that you give it. These lex specifications name and describe the classes of strings that you wish to recognize, and often give actions to be carried out when a particular kind of string is found. lex is referred to as a ``program generator'': more specifically, it is a ``lexical analyzer generator''. It offers a faster, easier way to create programs to perform lexical analysis. Its weakness is that it often produces C programs that are longer and execute more slowly than hand-coded programs that do the same task. In many applications size and speed are minor considerations, and the advantages of using lex considerably outweigh these disadvantages.
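To make the comparison concrete, here is a minimal sketch of the kind of hand-coded C scanner that lex is meant to save you from writing. It is not output produced by lex, and the names `token_kind`, `token`, and `next_token` are hypothetical, chosen only for this illustration. The sketch assumes three simple token classes: identifiers, unsigned decimal numbers, and any other single character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token classes for this sketch; not defined by lex. */
enum token_kind { TOK_END, TOK_IDENT, TOK_NUMBER, TOK_OTHER };

struct token {
    enum token_kind kind;
    const char *start;  /* first character of the token */
    size_t len;         /* number of characters in the token */
};

/*
 * Skip leading white space, recognize one token at *cursor,
 * and advance *cursor past it.
 */
struct token next_token(const char **cursor)
{
    const char *s = *cursor;
    struct token tok;

    while (isspace((unsigned char)*s))
        s++;
    tok.start = s;

    if (*s == '\0') {
        tok.kind = TOK_END;                    /* end of input */
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;                               /* letter, then letters/digits */
        tok.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;                               /* run of decimal digits */
        tok.kind = TOK_NUMBER;
    } else {
        s++;                                   /* any other single character */
        tok.kind = TOK_OTHER;
    }

    tok.len = (size_t)(s - tok.start);
    *cursor = s;
    return tok;
}
```

A caller would set a `const char *` to the start of the input and call `next_token` repeatedly until it returns `TOK_END`. Each new class of string means another hand-written branch in this function, whereas in a lex specification it would be one more pattern with its action.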

To understand what lex does, refer to ``Creation and use of a lexical analyzer with lex''. The process begins with a lex specification, sometimes referred to as a lex source program. The source is read by the lex program generator, whose output is a C program. That C program must, in turn, be compiled to produce an executable program that performs the lexical analysis. The resulting lexical analyzer accepts any source file as input and produces the specified output, such as altered text or a list of tokens.

Creation and use of a lexical analyzer with lex

Programs generated by lex can also be used to collect statistical data on features of the input, such as character count, word length, and the number of occurrences of particular words. In later sections of this chapter, you will see how to:

write lex specifications to perform some of these tasks
translate lex specifications into C
compile, link, and execute the lexical analyzer in C
run the lexical analyzer program
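As a rough picture of the statistics mentioned above (character count, word count, and word length), the following hand-written C sketch gathers them from a string. It is an illustration only, not code generated by lex; the names `text_stats` and `collect_stats` are hypothetical, and a "word" is assumed here to be any run of characters that are not white space.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct text_stats {
    size_t chars;         /* total characters seen */
    size_t words;         /* runs of non-white-space characters */
    size_t longest_word;  /* length of the longest word */
};

/* Walk the text once, updating the counters character by character. */
struct text_stats collect_stats(const char *text)
{
    struct text_stats st = {0, 0, 0};
    size_t current = 0;   /* length of the word in progress */

    for (; *text != '\0'; text++) {
        st.chars++;
        if (isspace((unsigned char)*text)) {
            current = 0;                  /* white space ends the word */
        } else {
            if (current == 0)
                st.words++;               /* first character of a new word */
            current++;
            if (current > st.longest_word)
                st.longest_word = current;
        }
    }
    return st;
}
```

With lex, the same counters would typically be updated in the actions attached to patterns, rather than in an explicit character loop like this one.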

This chapter is organized as follows: