Perl Notes

This is just a quick reference for some of the commonly used features of Perl, and contains a number of oversimplifications. (It probably also contains a number of errors - holler if you find some!)

For a more thorough and precise description check out the Perl website or one of the numerous Perl reference books.

Table of contents

Intro / use
Identifiers and data types (scalars, strings, lists, arrays, hashes, references, special variables, and scoping rules)
Control structures (if/elsif/else, unless, until, while, for, foreach, goto, next, last, redo, continue)
Operators and expressions (plus precedence/associativity rules)
Subroutines
Regular expressions
Files and I/O
die and warn
Built-in functions
CGI and Cookies
Perl with MySQL

Intro / use

Perl is a powerful scripting language with a great many handy features and interfaces, but it also allows the generation of extremely compact (read cryptic) code.

You can either run short commands directly through the Perl interpreter (/usr/bin/perl), e.g. perl -e 'print "hello\n";', or put your Perl commands in a script file that begins with
#!/usr/bin/perl
(that's version 5.8 at the time of this writing) or (for version 5.004)
#!/usr/local/bin/perl

You can also determine which version you're working with using perl -v

Lots of information, and the latest downloads, are available through perl.com.

General language features:

All statements end with a ; except for compound statements ending with a }
Comments on a line are everything to the right of a #
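Multiline comments (strictly, POD documentation blocks) can be begun by placing = followed by a word, such as =pod, at the start of a line; all lines are ignored up to and including one which begins with =cut
Variables that have not been assigned a value hold the special value undef, which acts as 0 or the empty string as appropriate; if warnings are enabled (use warnings, or perl -w), Perl does warn when an uninitialized value is used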
Whitespace between tokens is required only where the two tokens could be confused as a single term

Identifiers and data types

Expressions are evaluated in either scalar context or list context, and many operators and functions return different results depending on the context they are called in.

As an example,

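if the localtime function is called in a scalar context it returns a single string describing the local date and time, such as "Thu Sep  9 21:46:40 1999" (it is the time function that returns the number of seconds since January 1, 1970)
```
$datestring = localtime();
```
if the same function is called in a list context it returns a nine-element list whose elements are the seconds, minutes, hours, day of the month, month (0-11), year (minus 1900), day of the week, day of the year, and daylight-saving flag.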
```
@datearray = localtime();
```
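A minimal sketch of scalar versus list context. It uses gmtime rather than localtime so the result does not depend on the local time zone; the variable names are illustrative.

```perl
use strict;
use warnings;

# Scalar context: gmtime returns a human-readable date string
my $datestring = gmtime(0);
print "$datestring\n";             # Thu Jan  1 00:00:00 1970

# List context: the same call returns nine numeric fields
# ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst)
my @fields = gmtime(0);
print "year field: $fields[5]\n";  # 70 (years since 1900)

# An array evaluated in scalar context yields its element count
my @colors = ('red', 'green', 'blue');
my $count  = @colors;
print "count: $count\n";           # 3

# time() is the function that returns seconds since the epoch
my $numsecs = time();
```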

Scalars

Scalars include the following:

signed integers
double-precision floating point numbers
strings
references

Identifiers for scalar variables begin with a $ symbol.

Strings:

Strings may be enclosed either in single quotes or double quotes.
Backslash escapes and variable interpolation take place in double-quoted strings, for example in the string
```
"variable foo has value \n \t $foo"
```
\n and \t are interpreted as newline and tab characters, and the value of variable foo would be substituted for the $foo token
No variable interpolation takes place in single-quoted strings; the only escapes recognized there are \' and \\, which are used to include ' and \ in a string
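The difference between the two quoting styles can be checked directly; this is a small sketch with an illustrative variable $foo.

```perl
use strict;
use warnings;

my $foo = 42;

# Double quotes: escapes and variables are interpolated
my $double = "foo is $foo\n";

# Single quotes: everything is literal except \' and \\
my $single = 'foo is $foo\n';
my $quote  = 'it\'s a backslash: \\';

print $double;          # foo is 42 (followed by a newline)
print $single, "\n";    # foo is $foo\n
print $quote, "\n";     # it's a backslash: \
```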

Lists

A list is an ordered group of comma-separated scalar values enclosed in parentheses, such as:

('a','blah','foo')
(2,4,5,9)
etc

You can create a list of strings from whitespace-separated words using the qw// operator:

qw/a b c/   # creates list ('a','b','c')

Arrays

Identifiers for array variables begin with the @ symbol.

The values of an array can be set from a list, e.g.

@myarray = (1, 7, 19, 3);
@otherarray = ("foo","blah","etc");

Accessing individual array elements is carried out through the [] subscript delimiters, e.g.

print "$myarray[0]";

(Note that the array element is a scalar, so we're using $myarray[0] not @myarray[0].)

The array elements are subscripted starting from 0, and if a negative subscript is used it references backwards from the end of the array.

print "$myarray[-1]"; # prints the last element of the array

If you assign to an element beyond the current end of the array, the array is dynamically resized to hold it (any skipped elements are undef); reading beyond the end simply returns undef without resizing. Either way there is no range checking.
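The sketch below illustrates qw//, negative subscripts, and the difference between reading and assigning past the end of an array; $#letters is the index of the last element.

```perl
use strict;
use warnings;

my @letters = qw/a b c/;          # same as ('a', 'b', 'c')

print "$letters[0]\n";            # a
print "$letters[-1]\n";           # c (the last element)

# Reading past the end returns undef but does not grow the array
my $missing = $letters[10];
my $size_after_read = scalar(@letters);    # still 3

# Assigning past the end grows the array; the gap is filled with undef
$letters[5] = 'f';
my $size_after_write = scalar(@letters);   # now 6
my $last_index = $#letters;                # 5, the highest subscript
```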

Hashes

Identifiers for hash variables begin with the % symbol.

Hash variables consist of a set of key/value pairs; the key, rather than a numeric subscript, is used to access elements.

Hash values can be set using a list of pairs of values, e.g.:

%colorvalues = (
       'red', 1,
       'green', 2,
       'blue', 3
     );

print "$colorvalues{'red'}";  # prints value 1

An alternative notation that has the same effect but might be clearer from a readability viewpoint is using the => operator:

%colorvalues = (
       'red' => 1,
       'green' => 2,
       'blue' => 3
     );

print "$colorvalues{'red'}";  # prints value 1

As with arrays, note that accessing a hash element uses the $ prefix, not the %, since we're accessing a single scalar value within the hash.
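A short sketch of common hash operations: adding and deleting pairs, testing for a key with exists, and looping over keys. Since keys come back in no particular order, they are sorted here for predictable output.

```perl
use strict;
use warnings;

my %colorvalues = (
    'red'   => 1,
    'green' => 2,
    'blue'  => 3,
);

$colorvalues{'yellow'} = 4;      # add a new key/value pair
delete $colorvalues{'green'};    # remove a pair

foreach my $color (sort keys %colorvalues) {
    print "$color => $colorvalues{$color}\n";   # blue, red, yellow
}

my $has_red   = exists $colorvalues{'red'};     # true
my $has_green = exists $colorvalues{'green'};   # false (deleted)
my $num_keys  = scalar(keys %colorvalues);      # 3
```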

References

A reference operates like a pointer to another location (data or code).

A reference is a scalar variable (hence prefixed with $), and to create a reference use the \ operator before the target location.

E.g. $myptr = \$x; sets up myptr as a reference to variable x

To dereference a variable such as myptr, precede it with the prefix for the desired data type, e.g. use $$myptr, @$myptr, or %$myptr as appropriate depending on whether myptr references a scalar, an array, or a hash.

The ref function can be used to determine which kind of data is being referenced; it returns one of the following strings:

SCALAR  ARRAY  HASH  CODE  GLOB  REF

If its argument is not a reference, ref returns the empty string (a false value); for a blessed object it returns the name of the object's class.
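A minimal sketch of creating, dereferencing, and inspecting references. It also uses the arrow form ($aref->[1], $href->{alice}), which is equivalent to the $$aref[1] style above; the variable names are illustrative.

```perl
use strict;
use warnings;

my $x    = 10;
my @list = (1, 2, 3);
my %ages = (alice => 30);

my $sref = \$x;                        # reference to a scalar
my $aref = \@list;                     # reference to an array
my $href = \%ages;                     # reference to a hash
my $cref = sub { return $_[0] * 2 };   # reference to an anonymous sub

$$sref = 20;                   # changes $x through the reference
my $second = $$aref[1];        # 2; $aref->[1] is equivalent
my $age    = $href->{alice};   # 30
my $double = $cref->(21);      # 42

print ref($sref), ' ', ref($aref), ' ', ref($href), ' ', ref($cref), "\n";
# SCALAR ARRAY HASH CODE
```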

Scoping rules

Variables are implicitly created when they are first used in a program, holding undef (which acts as 0 or the empty string) until a value is assigned. Under use strict, however, variables must be declared before they are used.

You may explicitly declare a variable with lexical scope (a my variable), or give a global variable a temporary value with dynamic scope (using local).

Lexical scope means the variable is truly local to the block within which it is declared.

Dynamic scope means the temporary value is also visible to any function called from the block in which it is set (and then to any function called by that function, etc.), and the previous value is restored when the block exits.
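The difference between my and local shows up when another subroutine reads the same variable name. In this sketch (subroutine names are hypothetical) show_level always reads the package variable $level.

```perl
use strict;
use warnings;

our $level = 'global';          # a package (global) variable

sub show_level { return $level }

sub with_local {
    local $level = 'local';     # temporarily replaces the global value
    return show_level();        # the called sub sees 'local'
}

sub with_my {
    my $level = 'lexical';      # a new variable visible only in this block
    return show_level();        # the called sub still sees the global
}

print with_local(), "\n";       # local
print with_my(), "\n";          # global
print $level, "\n";             # global (the old value was restored)
```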

Existing "special" variables

There are quite a number of variables with special meaning; here is a small sampling (check a reference manual for many more). The long names such as $PID and $PROGRAM_NAME are only available after use English; the short punctuation names always work:

$_ or $ARG : the default variable, the assumed source or destination of data in many places if you don't otherwise specify one
$$ or $PID : the current process id
$< or $UID : the real user id of this process
$^T or $BASETIME : the time the program began running (seconds since Jan 1, 1970)
$0 or $PROGRAM_NAME : the name of this program
$^D or $DEBUGGING : the current debugging flags
$^V or $PERL_VERSION : the current version of Perl ($], also called $OLD_PERL_VERSION, gives it as a number such as 5.008008)
@ARGV : the array of command-line arguments, e.g. $ARGV[$i] is the i-th command-line argument, counting from 0 (the program name is in $0, not in @ARGV)
%ENV : the current environment settings
$& or $MATCH : the string matched by the last successful pattern match
$. or $INPUT_LINE_NUMBER or $NR : the current line number of the last filehandle read
$/ or $RS or $INPUT_RECORD_SEPARATOR : the input record separator (newline by default; if set to the empty string, one or more blank lines act as the separator)
ARGV, STDERR, STDIN, STDOUT : special filehandles (ARGV reads through the files named on the command line)
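A small sketch of a few special variables in use: $_ as the implicit loop variable and default argument, and the English long names (the -no_match_vars option leaves out $MATCH and its relatives).

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # enables $PROGRAM_NAME, $PID, etc.

# Many built-ins use $_ when no argument is given
my @upper;
foreach (qw(apple banana)) {      # with no loop variable, $_ is used
    push @upper, uc;              # uc with no argument reads $_
}

print "program name: $PROGRAM_NAME\n";    # same value as $0
print "process id: $PID\n";               # same value as $$
print "argument count: ", scalar(@ARGV), "\n";
print "HOME is ", (defined $ENV{HOME} ? $ENV{HOME} : '(unset)'), "\n";
```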

Control structures

The if statement

The if statement can include any number of optional elsif (else-if) clauses and an optional else clause. The braces are required, even around a single statement:

if (expression) {
   ...
} elsif (expression) {
   ...
} elsif (expression) {
   ...
} else {
   ...
}

The unless statement

The unless statement operates as the reverse of the if: its block runs when the expression is false:

unless (expression) {
   ...
} else {
   ...
}
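A brief sketch of both statements inside two hypothetical helper subroutines, classify and describe_count.

```perl
use strict;
use warnings;

sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return 'negative';
    } elsif ($n == 0) {
        return 'zero';
    } else {
        return 'positive';
    }
}

# unless (cond) behaves like if (!cond)
sub describe_count {
    my ($count) = @_;
    unless ($count == 1) {
        return "$count items";
    } else {
        return "1 item";
    }
}

print classify(-5), ' ', classify(0), ' ', classify(7), "\n";  # negative zero positive
print describe_count(1), ', ', describe_count(3), "\n";        # 1 item, 3 items
```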

Labeled blocks and gotos

Any statement can be labeled, and an unconditional branch to that statement can be performed using goto:

JUMPHERE:  ...
           ...
           goto JUMPHERE;

Exception: goto cannot be used to jump into a construct that requires some form of initialization, such as a foreach loop or a subroutine.

The while loop

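There are two versions of the while loop, one pretest and one posttest. (Strictly, the posttest do { } while form is a do block with a statement modifier rather than a true loop, so next and last do not work inside it.)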

while (expression) {
   ...
}

do {
   ...
} while (expression);

The until loop

There are also pretest and posttest versions of the until loop:

until (expression) {
   ...
}

do {
   ...
} until (expression);
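A sketch of the pretest and posttest forms; note that the do block runs once even though its condition is false from the start.

```perl
use strict;
use warnings;

# Pretest while: the body may run zero times
my $n = 1;
my @powers;
while ($n < 100) {
    push @powers, $n;
    $n *= 2;
}
# @powers is now (1, 2, 4, 8, 16, 32, 64)

# Pretest until: loops while the condition is false
my $countdown = 3;
my @ticks;
until ($countdown == 0) {
    push @ticks, $countdown;
    $countdown--;
}
# @ticks is now (3, 2, 1)

# Posttest: the body always runs at least once
my $tries = 0;
do {
    $tries++;
} while ($tries < 0);
# $tries is 1
```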

The for loop

The for loop structure is very similar to that of C/C++/Java, with an initialization expression, a continuation expression, and a step expression:

for ($index = 0; $index < 10; $index++) {
   ...
}

The foreach loop

The foreach loop executes once on each element of a list, setting the control variable to the current list element on each pass:

# print out a list of colors
foreach $current ("blue", "red", "green") {
   print "$current ";
}
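A sketch of both loop styles. One detail worth knowing: the foreach control variable is an alias to each element, so changing it changes the array itself.

```perl
use strict;
use warnings;

# C-style for loop summing 0..9
my $total = 0;
for (my $index = 0; $index < 10; $index++) {
    $total += $index;
}
# $total is 45

# foreach aliases $price to each element in turn
my @prices = (10, 20, 30);
foreach my $price (@prices) {
    $price *= 2;
}
# @prices is now (20, 40, 60)

# foreach over a range
my @squares;
foreach my $i (1 .. 4) {
    push @squares, $i * $i;
}
# @squares is (1, 4, 9, 16)
```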

Next, last, and redo statements

You can exit a loop immediately using the last statement.

You can skip to the next iteration of a loop using the next statement.

You can restart execution of the loop block without evaluating the conditional again by using the redo statement.

If the loop structure is labeled, you can supply the label as an argument to the next, last, or redo statements - allowing you to jump across more than one level of loop.

MYLOOP: while (expression) {
            ...
            while (expression2) {
               ...
               if (expression3) {
                  next MYLOOP;
               }
               ...
            }
            ...
        }
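A runnable sketch of labeled next and last on a small, made-up grid: rows flagged 'skip' are passed over, and the search stops at the first negative value by leaving both loops at once.

```perl
use strict;
use warnings;

my @grid = (
    [ 'skip', -1, -2 ],
    [ 3, 4, 5 ],
    [ 6, -7, 8 ],
);

my ($found_row, $found_col);
ROW: for my $r (0 .. $#grid) {
    next ROW if $grid[$r][0] eq 'skip';    # go on to the next row
    for my $c (0 .. $#{ $grid[$r] }) {
        if ($grid[$r][$c] < 0) {
            ($found_row, $found_col) = ($r, $c);
            last ROW;                      # exit the outer loop too
        }
    }
}
# ($found_row, $found_col) is (2, 1)
```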

Continue blocks

A while, until, or foreach loop can be followed by an optional "continue" block, which is executed after each pass through the main loop body (the C-style three-part for loop cannot take one).

The continue block is executed after a next statement in the main loop body, but is not executed following a last statement in the main loop body.
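A small sketch showing that the continue block runs after next but not after last; here it holds the loop increment.

```perl
use strict;
use warnings;

my @visited;
my $i = 0;
while ($i < 6) {
    next if $i % 2;       # skip odd numbers; continue still runs
    last if $i == 4;      # leave the loop; continue does not run
    push @visited, $i;
} continue {
    $i++;                 # runs after every pass, including after next
}
# @visited is (0, 2) and $i is still 4
```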

Operators and expressions

The operators are generally similar to C's, with the following additions and notable points:

, => comma/arrow-comma (=> also quotes a bareword on its left)
\ reference operator
** exponentiation
<=> numeric three-way comparison (returns -1, 0, or 1)
and or not xor low-precedence logical operators
~ bitwise not (as in C)
.. ... range
. string concatenation
lt, gt, le, ge, eq, ne, cmp string comparison operators
=~ pattern match
!~ pattern non-match
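A sketch of several of these operators. The comparison lines show why numeric and string operators must not be mixed up: the same two values compare differently.

```perl
use strict;
use warnings;

my $power  = 2 ** 10;            # 1024
my $joined = 'foo' . 'bar';      # 'foobar'

my $num_cmp = 10 <=> 9;          # 1, since 10 > 9 numerically
my $str_cmp = '10' cmp '9';      # -1, since '1' sorts before '9'

my @range = (1 .. 5);            # (1, 2, 3, 4, 5)

my $text      = 'hello world';
my $has_world = ($text =~ /world/) ? 1 : 0;   # 1
my $no_digits = ($text !~ /\d/)    ? 1 : 0;   # 1

my $value = 0 || 'default';      # 'default', since 0 is false
```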

Precedence and associativity

Operators are listed from highest to lowest precedence, each with its associativity:

-> : left
++ -- : nonassociative
** : right
! ~ \ unary+ unary- : right
=~ !~ : left
* / % : left
+ - . : left
<< >> : left
< > <= >= lt gt le ge : nonassociative
== != <=> eq ne cmp : nonassociative
& : left
| ^ : left
&& : left
|| : left
.. ... : nonassociative
?: : right
= += -= *= **= .= /= %= &= |= ^= <<= >>= &&= ||= : right
, => : left
not : right
and : left
or xor : left

Subroutines

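Subroutines can be called using their name followed by the list of arguments in parentheses, e.g. foo($x, $y, $z)

They can also be called without the parentheses, e.g. foo $x, $y, $z; but only if the subroutine has been declared or defined before the call.

In the callee, the arguments are accessible through the array named @_ (its elements are aliases to the caller's variables).

You can also call subroutines using the ampersand. Written as &foo; with no parentheses, this has the effect of passing the caller's current @_ along to the callee; either form of & call also bypasses any prototype.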

When passing arrays or hashes as parameters you'll usually want to pass references to the item, which is done by using the \ operator before the array/hash name, e.g. foo(\@myarray, \@anotherarray);

When declaring subroutines, the format is

sub routinename {
   ...
}

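As an option, when declaring the subroutine you can give it a prototype indicating the kinds of arguments it expects, using the symbols $ @ % & * for scalar, list, hash, subroutine, and typeglob respectively. An unbackslashed @ or % swallows all remaining arguments, so to accept an actual array followed by more arguments, backslash it (\@): the caller writes the array normally and the callee receives a reference to it.

For example, if you expect a scalar, an array, and another scalar, then the prototype might look like:

sub routinename ($\@$) {
   ...
}

Prototypes only affect calls compiled after the declaration, and are ignored for & calls and method calls.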
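A sketch of a $\@$ prototype in action, using a hypothetical subroutine wrap_list. Because the \@ makes Perl pass the array as one reference, the trailing scalar is not swallowed into the list. The subroutine is defined before the call so that the prototype is in effect.

```perl
use strict;
use warnings;

sub wrap_list ($\@$) {
    my ($prefix, $list_ref, $suffix) = @_;   # $list_ref is an array reference
    return join(',', $prefix, @$list_ref, $suffix);
}

my @middle = (2, 3);
my $result = wrap_list(1, @middle, 4);      # no \ needed at the call site
print "$result\n";                          # 1,2,3,4
```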

To create local variables that are lexically scoped (not visible externally) precede their declaration with my, and for dynamically scoped variables precede their declaration with local.

The return statement allows the return of scalars or lists to the calling routine.
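Putting the pieces together: a sketch of a hypothetical subroutine that copies its arguments out of @_, receives two arrays by reference, and returns a list.

```perl
use strict;
use warnings;

sub sum_two_lists {
    my ($first_ref, $second_ref) = @_;       # lexical copies of the arguments
    my $sum = 0;
    $sum += $_ for @$first_ref, @$second_ref;
    my $count = @$first_ref + @$second_ref;  # arrays in numeric context: counts
    return ($sum, $count);                   # return a list
}

my @myarray      = (1, 2, 3);
my @anotherarray = (10, 20);
my ($sum, $count) = sum_two_lists(\@myarray, \@anotherarray);
print "sum=$sum count=$count\n";             # sum=36 count=5
```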

Regular expressions

Perl has good support for string handling through a wide range of pattern matching and string manipulation operators and functions.

Pattern matching: ($mystring =~ /blah/) is true if the mystring variable contains "blah"
Character replacement: $mystring =~ tr/list1/list2/ goes through mystring and replaces each character found in list1 with the corresponding character from list2 (these are character lists, not patterns), returning the number of characters matched
Pattern replacement: $mystring =~ s/oldpattern/replacement/ replaces the first match of oldpattern in mystring (if found) with the replacement text; add the g modifier (s/old/new/g) to replace every match
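A sketch of the three operators on one string. The (my $copy = $mystring) =~ ... idiom modifies a copy so the original stays unchanged; tr with an empty replacement list counts characters without changing them.

```perl
use strict;
use warnings;

my $mystring = 'hello world, hello perl';

my $matched = ($mystring =~ /world/) ? 1 : 0;   # 1

# tr/// maps characters one for one
(my $upper = $mystring) =~ tr/a-z/A-Z/;         # 'HELLO WORLD, HELLO PERL'
my $vowels = ($mystring =~ tr/aeiou//);         # counts vowels: 6

# s/// replaces only the first match unless /g is given
(my $once = $mystring) =~ s/hello/goodbye/;     # 'goodbye world, hello perl'
(my $all  = $mystring) =~ s/hello/goodbye/g;    # 'goodbye world, goodbye perl'
```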

Regular expression syntax

The simplest regular expression in Perl is simply a literal string, e.g. /blah/.

More complex RE's can be built up based on simpler ones, some of the operators or metacharacters available are:

| for the "or" of two expressions
( ) for grouping expressions, e.g. /(blah|foo)/ matches either "blah" or "foo" (spaces and quotes inside a pattern are matched literally, so they should not be added)
. is a wildcard matching any single character

\ escapes a metacharacter so that it matches literally (e.g. \. matches a real dot) and introduces special characters, e.g.:

 \a  alarm (bell)      \f formfeed
 \n  newline           \e escape
 \r  carriage return   \x7f (any hex value for 7f) ascii value
 \t  tab               \cx  control-x (any char for x)

^ requires a match at the beginning of the string
$ requires a match at the end of the string
* repeat the previous element 0 or more times
+ repeat the previous element 1 or more times
? repeat the previous element 0 or 1 times
[ ... ] match any element enclosed, can use dashes to indicate a range, e.g. [a-zA-Z0-9] for alphanumeric
Special note: if ^ is placed within the [ ]'s it negates the character class following, e.g. [^0-9] means anything except a digit
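A sketch exercising these metacharacters against a made-up list of words, using grep to keep the matching ones.

```perl
use strict;
use warnings;

my @words = ('blah', 'foo', 'bar', 'file.txt', 'file1txt', '2024');

# Alternation with grouping, anchored at both ends
my @alt = grep { /^(blah|foo)$/ } @words;       # ('blah', 'foo')

# \. matches a literal dot, whereas . matches any character
my @dotted = grep { /^file\.txt$/ } @words;     # ('file.txt')
my @any    = grep { /^file.txt$/ } @words;      # ('file.txt', 'file1txt')

# Character class with a range, plus the + quantifier
my @numeric = grep { /^[0-9]+$/ } @words;       # ('2024')

# Negated character class: contains at least one non-letter
my @nonalpha = grep { /[^a-zA-Z]/ } @words;     # ('file.txt', 'file1txt', '2024')
```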

Files and I/O

By default, input and output work with STDIN and STDOUT, with STDERR for error output.

To open a file for input, you create a filehandle with the open function. To read one line at a time until end-of-file, put the filehandle inside < > angle brackets; in a while loop each line (including its trailing newline) is placed in the $_ variable, e.g.:

open(INFILE, "filename.txt");
while (<INFILE>) {
   print $_;     # $_ already ends with a newline
}
close(INFILE);

If we wanted to combine that with some of the regular expression tests, we could use something like this (prints all lines containing the text "foo"):

open(INFILE, "filename.txt");
while (<INFILE>) {
   print $_ if /foo/;    # /foo/ matches against $_ by default
}
close(INFILE);

To open a file for output, prefix the file name with a > symbol (this empties the file if it already exists), e.g.:

open(OUTFILE, ">filename.txt");
print OUTFILE "blah blah blah";

To append to an existing file, use two >> symbols, e.g.:

open(OUTFILE, ">>filename.txt");
print OUTFILE "blah blah blah";

The open function returns true if the open was successful and false otherwise, in which case the variable $! holds the reason for the failure.
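The same operations can also be written with the three-argument form of open and lexical filehandles (available since Perl 5.6), which keeps the mode separate from the file name. This sketch assumes the core File::Temp module for a scratch file so that it does not touch filename.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit

# '>' creates or empties the file for writing
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close($out);

# '>>' appends to the end of the existing file
open($out, '>>', $path) or die "Cannot append to $path: $!";
print $out "second line with foo\n";
close($out);

# '<' opens for reading; each <$in> returns one line with its newline
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @foo_lines;
while (my $line = <$in>) {
    chomp $line;                             # strip the trailing newline
    push @foo_lines, $line if $line =~ /foo/;
}
close($in);
```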

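Formatting of print output can be done using format templates and field holders. A format definition ends with a line containing a single period, and its output is produced with write:

format myreport =
The account holder is @<<<<<<<<<< and the account balance is @####.##
$name, $cash
.

$~ = "myreport";   # select this format for the current output handle (STDOUT)
write;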

The < symbols are place holders for a left-justified field, and the # symbols are for a fixed-precision numeric field.

For right-justified use > symbols, and for centered fields use | symbols.

die and warn

die is used to report an error and terminate the program with both an error message and a nonzero exit status.

die "blah blah blah"; prints "blah blah blah" to STDERR and terminates the program; if the message does not end with a newline, Perl appends the script name and line number. The exit status is the current value of $! (usually set by a preceding failed system call) if that is nonzero, otherwise ($? >> 8) if that is nonzero, otherwise 255.

For instance, combined with the file opening code from above (including $! in the message shows why the open failed):

open(OUTFILE, ">>filename.txt")
   or die "could not open filename.txt: $!";
print OUTFILE "blah blah blah";

warn can similarly be used to print a message to STDERR without terminating the program, e.g.

open(OUTFILE, ">>filename.txt")
   or warn "could not open filename.txt: $!";
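A die does not have to end the program: wrapping code in eval { } catches it and leaves the message in $@. This sketch also shows the location suffix Perl adds when the message lacks a newline, and captures a warn through a local $SIG{__WARN__} handler instead of letting it reach STDERR.

```perl
use strict;
use warnings;

# eval { } catches a die; the message ends up in $@
my $result = eval {
    die "something went wrong\n";   # trailing \n: no " at FILE line N."
    1;
};
my $error = $@;

# Without the trailing newline, Perl appends the location
eval { die "no newline" };
my $with_location = $@;             # e.g. "no newline at script.pl line 12.\n"

# warn continues execution; a __WARN__ handler can intercept it
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    warn "disk almost full\n";
}
print "still running after the warning\n";
```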

Built-in functions

There are many built-in functions and modules of functions available for Perl. Here is a quick list of a tiny subset:

arithmetic functions: abs, cos, sin, exp, log, rand, sqrt
string functions: chomp, length, reverse, substr, tr
list functions: grep, join, reverse, sort
process functions: sleep, wait, kill, fork, exec, times
file functions: chdir, chmod, mkdir, open, rename
time functions: gmtime, localtime, time, times
array functions: pop, push, splice, shift, unshift
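A quick sketch using a handful of the functions listed above on made-up data.

```perl
use strict;
use warnings;

my $line = "banana,apple,cherry\n";
chomp $line;                                         # remove the trailing newline
my @fruits = sort { $a cmp $b } split(/,/, $line);   # ('apple', 'banana', 'cherry')
my $joined = join('-', @fruits);                     # 'apple-banana-cherry'
my @long   = grep { length($_) > 5 } @fruits;        # ('banana', 'cherry')
my $first3 = substr($fruits[0], 0, 3);               # 'app'
my $backwards = reverse('abc');                      # 'cba' (scalar context)

my @stack = (1, 2, 3);
push @stack, 4;                    # (1, 2, 3, 4)
my $top = pop @stack;              # 4; @stack is (1, 2, 3)
unshift @stack, 0;                 # (0, 1, 2, 3)
my $bottom = shift @stack;         # 0; @stack is (1, 2, 3)
splice(@stack, 1, 1, 'x', 'y');    # replace 1 element at index 1: (1, 'x', 'y', 3)

my $root = sqrt(16);               # 4
my $dist = abs(-7);                # 7
```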

CGI and Cookies in Perl

CGI scripts with Perl

As with Python, the script output in Perl is typically generated with print statements, e.g.:

print "Content-type: text/html\n\n",
      "<html><body>\n",
      "Hi!\n",
      "</body></html>\n";

We can determine what kind of method was used to submit the data (e.g. GET or POST), and then read the query string into a variable for later parsing:

# find out the method used (it is stored as an environment variable)
$method = $ENV{'REQUEST_METHOD'};

# get the query string from an environment variable if 
#     the GET method was used
if ($method eq "GET") {
  $querystring = $ENV{'QUERY_STRING'};
}

# read the query string from the request body if
#      the POST method was used
elsif ($method eq "POST") {
  read(STDIN, $querystring, $ENV{'CONTENT_LENGTH'});
}

# if any other method was used then this 
#    script is the wrong place to be!
else {
   print "Content-type: text/plain\n\n";
   print "ERROR - illegal method used to call script\n";
   exit(1);
}

Having extracted the query string, we can split it up into a collection of name/value pairs, then process them one at a time:

@pairs = split(/&/, $querystring);
foreach $pair (@pairs) {
   ($name, $value) = split(/=/, $pair);
   # now do something with the name and value
}
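Form values arrive URL-encoded: a + stands for a space and %XX for a byte given in hex. A minimal decoding sketch follows, with a hard-coded sample query string standing in for the one read above; it ignores repeated names and character encodings, which the CGI module's param function handles for you.

```perl
use strict;
use warnings;

my $querystring = 'name=Jane+Doe&city=San%20Jose&note=a%3Db';

my %form;
foreach my $pair (split(/&/, $querystring)) {
    # limit 2: split on the first '=' only, so values may contain '='
    my ($name, $value) = split(/=/, $pair, 2);
    foreach ($name, $value) {            # $_ aliases each variable in turn
        next unless defined;
        tr/+/ /;                             # '+' encodes a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge; # %XX encodes a hex byte
    }
    $form{$name} = $value;
}
# %form is (name => 'Jane Doe', city => 'San Jose', note => 'a=b')
```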

Again, this is only the tip of the iceberg, but it provides a starting point for capturing and using form data.

Using cookies from Perl

Here is a short example illustrating the use of cookies in Perl with the CGI module; I'll fill in more details as time permits.

#! /usr/bin/perl

use CGI qw(:standard);
use CGI::Cookie;

#  we can look up the existing cookies, e.g.:
#  $name = cookie("username");
#  if ($name) {
#     ... the cookie has been set previously ...
#  } else {
#     ... no cookie named "username" has been set yet ...
#  }

# here we set up a form to grab a new username,
#    then we submit the form (to this same script)
#    for processing
unless (param()) {
   # this is the first time we've hit the form,
   #    so generate the form to get the cookie values
   print
      header(),
      start_html("User login form"),
      h1("User login form"),
      start_form(),
      p("Enter your username", textfield("NAME")),
      submit(),
      end_form(),
      end_html();
} else {
   # the form has been submitted,
   #     so process it and generate the cookie

   # first look up the submitted username
   $name = param("NAME");

   # next set up the new cookie values
   $values = CGI::Cookie->new(
         -name    => "username",         # the cookie name
         -value   => $name,              # the cookie value (the user's name)
         -expires => "+30s",             # it expires 30 seconds from now
         -path    => "/home/someplace",  # the URL path the cookie applies to
      );

   # send the cookie in the header and echo the information to the user
   print
      header(-cookie => $values),
      start_html("ThanksScreen"),
      h3("Thanks for submitting the User login data"),
      p("Your username is ", b($name),
        " and is valid for the next 30 seconds"),
      end_html();
}

Perl with MySQL

Method 1: (Hack)

If a server doesn't have the Perl DBI module installed then the typical Perl+MySQL options aren't available, and one has to work through operating system calls to the mysql client, as shown below.

In this case, we will create a command string representing the call to the mysql client, and follow the standard Perl procedure for opening a pipe so that the command's output is directed to a filehandle in our Perl script. We can then process the results of the call one line at a time.

#! /usr/bin/perl
#

# set up the connection and query information
$query = "\"SHOW DATABASES;\"";
$user = "whoever";
$pwd = "somethingclever";
$executable = "/usr/local/mysql/bin/mysql";

# build the mysql client command
$cmd = "$executable -u$user -p$pwd -e$query";

# run the command and pipe the input to our handle,
#     quit with an error message if we couldn't connect
if (!open(CMDPIPE, "$cmd |")) {
   die "Could not connect to the server\n";
} 

# run through the command output, one line at a time
while (<CMDPIPE>) {
   print "$_";
}

# close the command's pipe
close(CMDPIPE);

Method 2: (Preferred)

If a server has the Perl DBI module installed then this can significantly improve access to a MySQL database.

Below is a short example of connecting to a MySQL database and running a query.

#! /usr/bin/perl5.8.4
#
use CGI;
use DBI;

# declare the variables to hold the necessary
#    user and host information to establish a connection
# here we've hard-coded them in the script itself,
#    which is actually undesirable from a security standpoint

my $host = 'localhost';
my $db = 'some_database_name';
my $user = 'whoever';
my $pwd = 'somethingclever';
my $sock = '/tmp/mysql.sock';

# establish the connection
my $dbhandle = DBI->connect("dbi:mysql:dbname=$db;host=$host;mysql_socket=$sock",
                            $user, $pwd)
   or die "Could not connect: $DBI::errstr\n";

# prepare a "SHOW TABLES;" query
my $query = $dbhandle->prepare("SHOW TABLES;");

# run the query
$query->execute();

# grab rows of results from the query output
#      until you run out of them
# (note here we just print the first element of
#  each nextrow array, since we know the "show tables" 
#  query just produces one column per row)
while (my @nextrow = $query->fetchrow_array()) {
    printf("%s\n", $nextrow[0]);
}

# finish the query
$query->finish();

# disconnect from the server/database
$dbhandle->disconnect();

If we want to do updates, inserts, or replaces from a Perl script (rather than a simple query) then we use the do method, e.g.

# do takes as parameters the SQL command,
#                        the processing attributes (undef)
#                    and the values to bind to the ? placeholders
#                        ($valueA, $valueB, $valueC stand for your own data)
# it returns the number of rows affected
my $numrows = $dbhandle->do("INSERT INTO tablename (colA, colB, colC)
                             VALUES (?, ?, ?)", undef, $valueA, $valueB, $valueC);

If we want to do transaction handling from a Perl script, we need to remember to turn autocommit off, and either commit or rollback after the do statement, e.g.

# turn autocommit off
$dbhandle->{AutoCommit}=0;

# do the transaction
  .....

# commit or rollback
$dbhandle->commit();    # or $dbhandle->rollback(); to undo the changes