I mostly use File::Find for file searching and mangling. My scripts usually share the same simple structure:
$| = 1; # autoflush
find( \&directory_scanner, ( $starting_directory ) );
$| = 0; # non autoflush
# and the scanner is something like
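sub directory_scanner {
    chomp;
    # skip the starting directory itself and anything that is not a plain file
    return if ( $_ eq $starting_directory || ! -f $_ );
    # skip files whose directory does not end with $re_dir plus an NNNN-NNNNNN suffix
    return if ( $File::Find::dir !~ /$re_dir(\d{4}-\d{6})$/ );
    ...
}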
As you can see, the event handler invoked by File::Find does two things. It prints a progress report (the $counter) to tell me the script is still alive (I process 200k+ files in one run) and, most notably, it applies a regexp to the current directory in order to skip staging, backup and similar directories that look like the ones I'm interested in but that I don't want the script to visit.
A few times I've tried to convert my File::Find based scripts to File::Find::Rule, just to get more used to its interface, but I didn't know how to apply a regular expression to the path being traversed. Reading the documentation a little more deeply, I found the exec method, which lets me specify a handler (i.e., a subroutine) that returns true or false depending on whether I want to keep the file I'm visiting. Converting my scripts therefore becomes as easy as follows:
$| = 1;
my $engine = File::Find::Rule->new();
my @files = $engine->file()
    ->exec( sub {
        my ( $shortname, $path, $fullname ) = @_;
        # keep only files whose directory matches, as in the File::Find version
        return $path =~ /$re_dir(\d{4}-\d{6})$/;
    } )
    ->exec( sub {
        my ( $shortname, $path, $fullname ) = @_;
        $counter++;
        return $shortname =~ /KCL/;
    } )
    ->exec( sub {
        my ( $shortname, $path, $fullname ) = @_;
        print "." if ( $counter % 100 == 0 );
        print "$counter\n" if ( $counter % 1000 == 0 );
        return 1; # do not forget !
    } )
    ->in( $starting_directory );
$| = 0;
I've kept three separate handlers for readability's sake but, as you can imagine, they can be collapsed into a single one. The nice part here is that I can check the path against a regexp again. The drawback is that a handler used only for output reporting must always return a true value, otherwise the file is filtered out.
In case you are wondering, autoflush is enabled simply so that the dots are displayed while the program is running.
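Just to give an idea, here is a minimal sketch of the single-handler variant. It is not taken from my real scripts: the names keep_file and find_kcl_files are made up for the example, and the value of $re_dir is a hypothetical stand-in. It assumes File::Find::Rule is installed from CPAN, and that only files in directories matching the regexp should be kept.

```perl
use strict;
use warnings;
use File::Find::Rule;    # CPAN module, not part of core Perl

# Hypothetical stand-in for the $re_dir used above.
my $re_dir  = qr{/archive/};
my $counter = 0;

# One handler doing the work of the three: directory filter, counting,
# name filter and progress report, in the same order as before.
sub keep_file {
    my ( $shortname, $path, $fullname ) = @_;

    # keep only directories ending in NNNN-NNNNNN right after $re_dir
    return 0 if $path !~ /$re_dir(\d{4}-\d{6})$/;

    $counter++;
    return 0 if $shortname !~ /KCL/;

    print "."          if $counter % 100 == 0;
    print "$counter\n" if $counter % 1000 == 0;
    return 1;    # reporting is done, the file is kept
}

# Hypothetical wrapper: returns the files kept under the given directory.
sub find_kcl_files {
    my ($starting_directory) = @_;
    local $| = 1;    # autoflush only while scanning
    return File::Find::Rule->new()->file()->exec( \&keep_file )
        ->in($starting_directory);
}

# Scan the directory given on the command line, if any.
if (@ARGV) {
    my @files = find_kcl_files( $ARGV[0] );
    print "\n", scalar(@files), " file(s) kept\n";
}
```

With a single handler the final return 1 is reached only by the files I actually want, so there is no longer a reporting-only handler that could drop files by forgetting its return value. Using local on $| also restores the previous buffering mode automatically when the sub returns.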