how to curb this quantifier's greediness

They have: 46 posts

Joined: May 2002

Hi - I'm trying to get a Perl script to parse an HTML file and replace an image name like "submit.gif" with the full URL of that image, like "http://www.mydomain.com/submit.gif"

It's working on regular image tags, but it's failing on a graphical submit button (see below). What's wrong with the following regular expression in Perl:

$html_line "";

$image_location = "http://www.domain.com/";

$html_line =~ s/(['"])(.+\.gif)?['"]/$1$image_location$2$1/g;

The result is:

It should be:

TIA...

He has: 296 posts

Joined: May 2002

Maybe it's mistaking TYPE for SRC.
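That guess can be checked with a quick sketch. The original example line did not survive in the post above, so this assumes it looked like the <input> tag from the input.html test file later in the thread. It runs the original pattern and a lazy variant (the .+? version is my own addition, not something anyone posted) and prints what each produces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: the question's own sample line is missing, so this reuses
# the <input> tag from the input.html test file in this thread.
my $image_location = "http://www.domain.com/";
my $line = '<input type="image" src="sample2.gif">';

# Original pattern: the greedy .+ runs from the first quote (the one after
# type=) all the way to the last ".gif" on the line.
(my $greedy = $line) =~ s/(['"])(.+\.gif)?['"]/$1$image_location$2$1/g;

# Lazy variant: .+? still starts at the first quote, so it has to grow
# until it reaches ".gif" and ends up capturing the same text.
(my $lazy = $line) =~ s/(['"])(.+?\.gif)['"]/$1$image_location$2$1/g;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
# Both print: <input type="http://www.domain.com/image" src="sample2.gif">
```

So the prefix lands on TYPE because the match begins at the first quote on the line. Making the quantifier lazy does not change where the match starts; the pattern has to be stopped from crossing a quote.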

Mark Hensler

He has: 4,044 posts

Joined: Aug 2000

Woohoo, I love regex!

input.html:

this is a line
<img src="sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="sample2.gif">
line 5

test.pl:

#!/usr/bin/perl

$input_file = "input.html";
$image_location = "http://www.domain.com/";

open (FILE, $input_file) || die ("Couldn't open $input_file: $!");
@input = <FILE>;
close (FILE);

foreach $html_line (@input) {
    $html_line =~ s/((['])?(["])?)((?(2)([^']+?)|([^"]+?))\.gif)(?(2)'|")/$1$image_location$4$1/g;
    print $html_line;
}

output:

this is a line
<img src="http://www.domain.com/sample.gif" height="1" width="1">
<b>this is line 3</b>
<input type="image" src="http://www.domain.com/sample2.gif">
line 5

I haven't used Perl much lately. I can't remember how to say "I want a single or double quote, then a string, then another quote like the one I got before", so I did it the roundabout way. It's not bulletproof, but it passed my simple input.html test. I could write a better PHP script because I know how to do the quote matching there.
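For the record, the "same quote as before" part can be written in Perl with a backreference: \1 inside the pattern matches whatever group 1 captured. Below is a minimal sketch, not a tested replacement for the script above. The add_image_prefix helper is a made-up name for illustration, and the negated character class is my own choice for keeping the match inside one attribute value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $image_location = "http://www.domain.com/";

# Hypothetical helper (not from the thread): prefix every quoted .gif name.
sub add_image_prefix {
    my ($line) = @_;
    # (['"])         opening quote, captured as group 1
    # ([^'"]+\.gif)  file name; [^'"] keeps it from running across a quote
    # \1             backreference: the same quote character that opened it
    $line =~ s/(['"])([^'"]+\.gif)\1/$1$image_location$2$1/g;
    return $line;
}

print add_image_prefix(q{<input type="image" src="sample2.gif">}), "\n";
# prints: <input type="image" src="http://www.domain.com/sample2.gif">
print add_image_prefix(q{<img src='sample.gif' height='1'>}), "\n";
# prints: <img src='http://www.domain.com/sample.gif' height='1'>
```

It shares one weakness with the longer pattern: any quoted string ending in .gif gets the prefix, including one that is already a full URL.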

Mark Hensler ["Max Albert"]
If there is no answer on Google, then there is no question.

Renegade
Moderator

He has: 2,944 posts

Joined: Oct 2002

Is it really a good idea to have the full path? It will make the site take longer to load...

They have: 46 posts

Joined: May 2002

Mark - thank you - it definitely works - someday I'll get around to trying to understand *how* it works :)

Regarding the full path to images - I've got a .cgi script in the cgi-bin dynamically creating a web page - without the full path the images will be looked for in the cgi-bin and will not display.

They have: 601 posts

Joined: Nov 2001

Why not use HTML::TokeParser?

Mark Hensler

He has: 4,044 posts

Joined: Aug 2000

I never got that deep into Perl. Can you show me how HTML::TokeParser would work?

They have: 601 posts

Joined: Nov 2001

Hi Mark

I'd use something like this to grab all image links and prepend a URL (untested code):

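#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser;

my $input_file     = "input.html";
my $image_location = "http://www.domain.com/";

# new() returns undef if the file cannot be opened
my $p = HTML::TokeParser->new($input_file)
    or die "Couldn't open $input_file: $!";

# get_tag() returns [$tag, \%attr, \@attrseq, $text] for each matching
# start tag; "input" is listed so graphical submit buttons are found too
while (my $token = $p->get_tag("img", "input")) {
    my $src = $token->[1]{src};
    next unless defined $src;    # skip tags without a src attribute
    print $image_location . $src, "\n";
}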

- wil