Tcl Tutorial Part III

String Manipulation

Tcl is 8-bit clean (not limited to the 7-bit ASCII subset). Tcl does not apply any interpretation to characters outside of the ASCII subset. Before Tcl 8.0, Tcl stored strings using a null (zero) character for termination, so it was not possible to store zero characters in a string. To represent binary data in those versions, convert it to a form that includes no zero characters, for example, by translating bytes to their corresponding hexadecimal values.

Note: As of Tcl 8.0, binary strings are fully supported; the restriction above applies only to versions before Tcl 8.0.
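
As a small illustration of the note above, the following sketch assumes Tcl 8.0 or later. It stores a zero byte inside a string and then converts the bytes to hexadecimal with binary scan. The variable names data and hex are chosen only for this example.

```tcl
# A three-character string whose middle character is a zero byte.
# \000 is an octal escape, so the following "z" is not absorbed into it.
set data "a\000z"
puts [string length $data]   ;# 3

# Convert each byte to two hexadecimal digits.
binary scan $data H* hex
puts $hex                    ;# 61007a
```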

Glob-style pattern matching

Glob-style matching is the simplest form of Tcl pattern matching.

Syntax:

string match pattern string

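Returns 1 if the string matches the pattern and 0 if it does not. The pattern must match the entire string, not just a part of it. The following special characters are recognized in the pattern: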

*	Matches any sequence of zero or more characters.
?	Matches any single character.
[chars]	Matches any single character in chars. If chars contains a sequence of the form a-b, any character between a and b inclusive will match.
\x	Matches the single character x. This provides a way to avoid special interpretation for any of the characters *?[]\ in the pattern.
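
The special characters above can be tried out with a few calls such as the following. This is only a sketch; the patterns and input strings are invented for illustration. The patterns are written in braces so that Tcl itself does not interpret [ ] or \ before string match sees them.

```tcl
puts [string match {*.tcl} "hello.tcl"]     ;# 1: * matches "hello"
puts [string match {a?c} "abc"]             ;# 1: ? matches "b"
puts [string match {a?c} "abbc"]            ;# 0: ? matches exactly one character
puts [string match {[0-9]*} "42 apples"]    ;# 1: a digit, then anything
puts [string match {\*} "*"]                ;# 1: \* matches a literal *
puts [string match {abc} "xabcy"]           ;# 0: the whole string must match
```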

Pattern matching with regular expressions

Regular expression patterns can have several layers of structure. The basic building blocks are called atoms, and the simplest form of regular expression consists of one or more atoms. For a regular expression to match an input string, there must be a substring of the input where each of the regular expression's atoms (or other components) matches the corresponding part of the substring. For example, the regular expression abc matches any string containing abc, such as abcdef or xabcy.

For example, the following pattern matches any string that is either a hexadecimal number or a decimal number.

^((0x)?[0-9a-fA-F]+|[0-9]+)$

Syntax:

regexp ?-nocase? ?-indices? {pattern} input_string ?variable ...?

regexp returns 1 if there is a match and 0 if there is no match.

Note: the pattern should be enclosed in braces so that the characters $, [, ], and \ are passed through to the regexp command instead of triggering variable, command, or backslash substitution.
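
The hexadecimal-or-decimal pattern shown earlier can be checked against a few inputs as follows. This is a sketch; the variable names pattern and candidate and the test inputs are assumptions made for the example.

```tcl
# The pattern is stored in braces so that $ and [ ] reach regexp untouched.
set pattern {^((0x)?[0-9a-fA-F]+|[0-9]+)$}

foreach candidate {0x1F 255 beef 12z} {
    puts "$candidate -> [regexp $pattern $candidate]"
}
# 0x1F -> 1
# 255 -> 1
# beef -> 1   (hexadecimal digits, the 0x prefix is optional)
# 12z -> 0
```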

If regexp is invoked with arguments after the input string, each argument is treated as the name of a variable. The first variable is filled in with the substring that matched the entire regular expression. The second variable is filled in with the portion of the substring that matched the leftmost parenthesized subexpression within the pattern; the third variable is filled in with the match for the next parenthesized subexpression, and so on. If there are more variable names than parenthesized subexpressions, the extra variables are set to empty strings.

Example:

regexp {([0-9]+) *([a-z]+)} "Walk 10 km" a b c

variable a will have the value "10 km", b will have "10" and c will have "km".

The switch -nocase specifies to match without case sensitivity. The switch -indices specifies that the additional variables should not be filled in with the values of the matching substrings, but with a list giving the first and last indices of the substring's range within the input string.

Example:

regexp -indices {([0-9]+) *([a-z]+)} "Walk 10 km" a b c

variable a will have the value "5 9", b will have "5 6" and c will have "8 9".
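
The index pairs returned by -indices can be fed to string range to recover the matched text, and -nocase can be checked in the same way. The following is a sketch; the variable names input, whole, number, unit, and matched are chosen for the example only.

```tcl
set input "Walk 10 km"
regexp -indices {([0-9]+) *([a-z]+)} $input whole number unit
puts $whole     ;# 5 9
puts $number    ;# 5 6
puts $unit      ;# 8 9

# Each variable holds a two-element list: first index, last index.
set digits [string range $input [lindex $number 0] [lindex $number 1]]
puts $digits    ;# 10

# With -nocase, a lowercase pattern matches uppercase input.
regexp -nocase {walk} "WALK 10 KM" matched
puts $matched   ;# WALK
```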

Characters	Meaning
.	Matches any single character
^	Matches the null string at the start of the input string.
$	Matches the null string at the end of the input string.
\x	Matches the character x.
[chars]	Matches any single character from chars. If the first character of chars is ^, the pattern matches any single character not in the remainder of chars. A sequence in the form of a-b in chars is treated as shorthand for all of the ASCII characters between a and b inclusive. If the first character in chars (possibly following a ^) is ], it is treated literally (as part of chars instead of a terminator). If a - appears first or last in chars, it is treated literally.
(regexp)	Matches anything that matches the regular expression regexp. Used for grouping and for identifying pieces of the matching substring.
*	Matches a sequence of 0 or more matches of the preceding atom.
+	Matches a sequence of 1 or more matches of the preceding atom.
?	Matches either a null string or a match of the preceding atom.
regexp1 | regexp2	Matches anything that matches either regexp1 or regexp2.

Syntax:

regsub ?-nocase? ?-all? pattern input_string replacement_value new_string

The first argument to regsub is the regular expression pattern. regsub returns the number of substitutions made: 0 if no match is found in the input string, 1 if a match is found and replaced, or the number of matches replaced when -all is given. If the pattern matches, the matching substring of the input string is replaced by the third argument, and the new string is stored in the variable named by the fourth argument. If no match is found, that variable receives the original input string. Two switches can be used: -nocase is equivalent to the -nocase switch of the regexp command; -all causes every matching substring in the input string to be replaced.
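
A short sketch of regsub with and without -all follows. The input text and the variable names input, once, every, and unchanged are invented for this example.

```tcl
set input "the cat sat on the mat"

# Without -all, only the first match is replaced.
set count [regsub {the} $input "a" once]
puts "$count: $once"            ;# 1: a cat sat on the mat

# With -all, every match is replaced and the count reflects that.
set count_all [regsub -all {the} $input "a" every]
puts "$count_all: $every"       ;# 2: a cat sat on a mat

# No match: the result variable receives the original string.
set miss [regsub {dog} $input "a" unchanged]
puts "$miss: $unchanged"        ;# 0: the cat sat on the mat
```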

Formatted output

The format command provides facilities like sprintf in ANSI C.

Example:

format "The square root of 10 is %.3f" [expr {sqrt(10)}]

=> The square root of 10 is 3.162

Other format specifiers:

Format	Meaning
%s	String
%d	Decimal integer
%f	Real number
%e	Real number in mantissa-exponent form
%x	Hexadecimal
%c	Character

The format command can also be used to change the representation of a value. For example, formatting an integer with %c generates the ASCII character represented by the integer.
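
A few of the specifiers from the table can be exercised as follows. This is only a sketch; the values are arbitrary and the variable names are chosen for the example.

```tcl
set letter [format "%c" 65]        ;# integer 65 to its ASCII character
puts $letter                       ;# A

set hex [format "%x" 255]          ;# decimal 255 in hexadecimal
puts $hex                          ;# ff

set rounded [format "%.2f" 3.14159]
puts $rounded                      ;# 3.14

# A field width pads the value; a leading - left-justifies it.
set padded [format "%5d|%-5s|" 42 ab]
puts $padded                       ;# "   42|ab   |" without the quotes
```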

Parsing strings with scan

Syntax:

scan parse_string format_string ?variable ...?

Example:

scan "16 units, 24.2 margin" "%d units, %f" a b

=> 2
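
The value returned by scan is the number of conversions performed; the converted values themselves land in the named variables. The sketch below repeats the example above and prints the variables. The name count is an assumption added for the example.

```tcl
set count [scan "16 units, 24.2 margin" "%d units, %f" a b]
puts $count   ;# 2: two fields were converted
puts $a       ;# 16
puts $b       ;# 24.2
```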

Character functions

String manipulation commands are options of the string command.

string index "See Spot run." 5

=> p

string range "See Spot run." 4 7

=> Spot

string range "See Spot run." 4 end

=> Spot run.

Searching and comparison

Searching for a substring with first or last returns the position of the first character of the substring (starting at 0 for the first character in the input string). Returns -1 if no match was found.

string first th "The trains were thirty minutes late this past week"

=> 16

string last th "The trains were thirty minutes late this past week"

=> 36
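
The sketch below shows the no-match case and the fact that the search is case sensitive, and uses the returned position with string range. The variable names text, pos, and tail are chosen for the example.

```tcl
set text "The trains were thirty minutes late this past week"

puts [string first xyz $text]   ;# -1: no match
puts [string first Th $text]    ;# 0: "Th" differs from "th"

# The returned position can be passed to string range.
set pos [string first th $text]
set tail [string range $text $pos end]
puts $tail                      ;# thirty minutes late this past week
```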

string compare returns 0 if the strings match, -1 if the first string sorts before the second, and 1 if the first string sorts after the second.

string compare twelve thirteen

=> 1

string compare twelve twelve

=> 0
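
The three possible results of string compare can be handled as in the sketch below. The procedure describe_order is a hypothetical helper written for this example; it is not a Tcl built-in.

```tcl
# Hypothetical helper: turn the result of string compare into a sentence.
proc describe_order {a b} {
    set result [string compare $a $b]
    if {$result < 0} {
        return "$a sorts before $b"
    } elseif {$result > 0} {
        return "$a sorts after $b"
    } else {
        return "$a equals $b"
    }
}

puts [describe_order thirteen twelve]   ;# thirteen sorts before twelve
puts [describe_order twelve thirteen]   ;# twelve sorts after thirteen
puts [describe_order twelve twelve]     ;# twelve equals twelve
```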

Length, case conversion, and trimming

string length "not too long"

=> 12

string toupper "Hello, World!"

=> HELLO, WORLD!

string tolower "You are lucky winner 13!"

=> you are lucky winner 13!

string trim abracadabra abr

=> cad

string trim takes a string to trim and an optional set of trim characters. It removes all instances of the trim characters from both the beginning and the end of its argument string and returns the trimmed string as its result. The trimleft and trimright options work in the same way, except that they remove the trim characters only from the beginning or only from the end of the string, respectively. The trim commands are most commonly used to remove excess white space; if no trim characters are specified, they default to the white space characters (space, tab, newline, carriage return, and form feed).
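
The one-sided variants and the default white space behaviour can be sketched as follows. The variable names left, right, padded, and clean are chosen for the example.

```tcl
# Only leading a, b, and r characters are removed.
set left [string trimleft abracadabra abr]
puts $left     ;# cadabra

# Only trailing a, b, and r characters are removed.
set right [string trimright abracadabra abr]
puts $right    ;# abracad

# With no trim characters given, white space is removed from both ends.
set padded "  \t padded text \n"
set clean [string trim $padded]
puts "<$clean>"   ;# <padded text>
```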
