In computing, a regular expression is a concise and precise way of describing the presence of particular characters, words, phrases, and other textual patterns [wikipedia.org: "regular expression"].
Various symbols found on US-English keyboards are used to describe the presence or absence of certain characters in a given text string. The following are examples of these symbols and what they mean (in the examples, the square brackets merely delimit the expression and are not part of it):
vertical bar (or "pipe"), | : means the items on either side are alternatives, e.g., [flavor|flavour] means that EITHER "flavor" or "flavour" occurs in the subject text string
parentheses, ( ) : can be used to specify the alternatives more precisely by grouping the part that changes, e.g., [flav(o|ou)r] represents "flavor" or "flavour"
question mark, ? : indicates zero or one occurrence of the preceding element, e.g., [flavou?r], again, represents "flavor" or "flavour"
asterisk, * : indicates zero or more occurrences of the preceding element, e.g., [flavou*r] represents "flavor", "flavour", "flavouur", "flavouuur", etc. - note that this particular construction also matches nonsensical words
plus sign, + : indicates one or more occurrences of the preceding element, e.g., [flavou+r] represents "flavour", "flavouur", "flavouuur", etc. - again, this particular construction also matches nonsensical words.
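The five symbols listed above can be tried out directly. The following is a minimal sketch using Python's standard re module; the helper name matching_candidates and the list of candidate words are made up for illustration. It uses re.fullmatch so that a word counts only when the whole word fits the pattern.

```python
import re

# Patterns from the list above, written without the delimiting brackets.
PATTERNS = {
    "alternation": r"flavor|flavour",
    "grouping": r"flav(o|ou)r",
    "optional": r"flavou?r",
    "zero_or_more": r"flavou*r",
    "one_or_more": r"flavou+r",
}

# Illustrative candidate words (an assumption, not taken from any corpus).
CANDIDATES = ["flavor", "flavour", "flavouur", "flavr"]


def matching_candidates(pattern, candidates=CANDIDATES):
    """Return the candidates that the pattern matches in full."""
    return [word for word in candidates if re.fullmatch(pattern, word)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name:13} {pattern:18} {matching_candidates(pattern)}")
```

The first three patterns accept exactly the same two spellings, while the asterisk and plus sign versions also accept "flavouur", which is the nonsensical-word effect noted in the list.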
These constructions (see regular expression books and literature) are closely tied to the mathematical field of set theory: a regular expression describes a set of strings. For example, instead of using a vertical bar for alternation, the intended items can be listed as a set of quoted strings separated by commas, e.g., {"flavor", "flavour"} represents "flavor" or "flavour". Alternatively, the mathematical symbol for a union of sets can be used, e.g., {"flavor"} ∪ {"flavour"} represents the same expression.
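The set view can be checked with a short sketch, again in Python. The names as_set and agrees_with_set are hypothetical, and the word list is only an illustration. The idea is that a word matches the alternation exactly when it is a member of the union of the two one-element sets.

```python
import re

# The alternation written as a regular expression ...
alternation = re.compile(r"flavor|flavour")

# ... and the same language written as a union of two sets of strings.
as_set = {"flavor"} | {"flavour"}


def agrees_with_set(words):
    """True if regex matching and set membership agree on every word."""
    return all(
        (alternation.fullmatch(word) is not None) == (word in as_set)
        for word in words
    )


if __name__ == "__main__":
    print(agrees_with_set(["flavor", "flavour", "flavr", "flavouur", ""]))
```

This only works as a direct comparison because the alternation describes a finite set; patterns using the asterisk or plus sign describe infinite sets that cannot be listed in full.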
Thanks to the early work of Stephen Cole Kleene in the 1950s and Ken Thompson in the 1960s, various computer languages were developed to handle pattern matching and searching a given text for particular strings, e.g., SNOBOL. Various text editors and Unix utilities gained built-in support for searching with regular expressions, e.g., QED, grep, expr, AWK, Emacs, vi, and lex. Computer languages such as Perl and Tcl used regular expression code derived from a library written by Henry Spencer. More recently, PHP and Apache HTTP Server have offered regular expression functionality, which is especially useful in handling database queries, the primary software engine behind patent and other online searches.
Even more recently, DTD syntax and XML have adopted regular-expression-style notation for consistency and for specifying and locating data.
In place of regular expressions, nondeterministic finite automata (NFAs) can be used to describe the same patterns, but this format does not appear to have the same general support in existing software systems.
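To give a sense of what the NFA form looks like, here is a minimal sketch of a hand-built automaton for the pattern flavou?r, simulated in Python. The state numbering, the table names (TRANSITIONS, EPSILON, ACCEPTING), and the function names are all assumptions made for this illustration, not a standard format.

```python
# Hypothetical hand-built NFA for the pattern flavou?r.
# State 0 is the start state; state 7 is the only accepting state.
TRANSITIONS = {
    (0, "f"): {1},
    (1, "l"): {2},
    (2, "a"): {3},
    (3, "v"): {4},
    (4, "o"): {5},
    (5, "u"): {6},
    (6, "r"): {7},
}
# Epsilon move: from state 5 the automaton may skip the optional "u".
EPSILON = {5: {6}}
ACCEPTING = {7}


def epsilon_closure(states):
    """Return all states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in EPSILON.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_accepts(text):
    """Simulate the NFA on `text`, tracking every state it could be in."""
    current = epsilon_closure({0})
    for ch in text:
        moved = set()
        for state in current:
            moved |= TRANSITIONS.get((state, ch), set())
        current = epsilon_closure(moved)
        if not current:
            return False  # no state can consume this character
    return bool(current & ACCEPTING)


if __name__ == "__main__":
    for word in ["flavor", "flavour", "flavouur", "flavr"]:
        print(word, nfa_accepts(word))
```

The tables take far more room than the seven-character expression they encode, which may be one reason the regular expression notation is the one most software exposes to users.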
Finally, POSIX Basic Regular Expressions (or "BRE") and Extended Regular Expressions (or "ERE") are defined as standards by the IEEE.
Francis "Fran" Lorin
siberkhem.com
20080207