wc: Word Count

The goal for this Programming Praxis was to implement the Unix wc utility. This one took me a couple of days to complete (I haven’t had a lot of time recently), but I finally finished it and I think it works pretty well. It’s not a drop-in replacement for wc, but it works – a programmer’s solution: by programmers, for programmers.

use Switch;

#print "arg num ". $#ARGV;
if ($#ARGV == -1) {
	usage();
	#nothing to count without arguments, so stop here
	exit 1;
}

my @option;
my @input;

#collect option letters from every argument that starts with "-"
foreach $arg (@ARGV){
	if($arg =~ m/^-/){
		#drop the leading "-" and keep one letter per element
		push(@option, split(//, substr($arg, 1)));
		#print "!".$arg."\n";
	}
	else{
		push(@input, $arg);
	}
}

foreach $input (@input){
	
	if(-e $input){
	#file exists, continue
		local( *FH ) ;
		open( FH, '<', $input ) or die "sudden flaming death: $!\n";
		$file = do { local( $/ ) ; <FH> };
		$fileSize = (-s $input);
		close(FH);
	}else{
		die("No such file or directory");
	}
	print "$input ";
	foreach $flag (@option){
		switch ($flag){
			
			case "c" {print "$fileSize "}
			case "m" {print length($file) . " "}
			case "l" {print $file =~ s/(\n)/$1/g . " "}
			case "L" {print maxLineLength($input)." "}
			case "w" {print $file =~ s/((^|\s)\S)/$1/g ." "}
			else {die ("invalid option -- $flag")}
		}
	}
	print "\n";
}

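sub maxLineLength {
	my($thisInput) = @_;
	open(my $fh, '<', $thisInput) or die "cannot open $thisInput: $!\n";
	my $maxLine = 0;
	while (my $line = <$fh>){
		#don't count the trailing newline as part of the line
		chomp($line);
		if(length($line) > $maxLine){
			$maxLine = length($line);
		}
	}
	close($fh);
	return $maxLine;
}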

sub usage{

	print "Usage: wc [OPTION]... [FILE]...\n";
	print "  or:  wc [OPTION]... --files0-from=F\n";
	print "Print newline, word, and byte counts for each FILE, and a total line if\n";
	print "more than one FILE is specified.  With no FILE, or when FILE is -,\n";
	print "read standard input.\n";
	print "  -c            print the byte counts\n";
	print "  -m            print the character counts\n";
	print "  -l            print the newline counts\n";
	print "  -L, --max-line-length  print the length of the longest line\n";
	print "  -w, --words            print the word counts\n";
}
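
The Switch module was removed from the Perl core in 5.14, so on newer installs it has to come from CPAN. As a rough sketch of one alternative (not the approach used above), the counting can be driven by a hash of code references. The names %counter and count_for are made up for illustration, and the byte count assumes the text has not been decoded:

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one code reference per option letter.
my %counter = (
    c => sub { length($_[0]) },                      # bytes, assuming undecoded text
    l => sub { ($_[0] =~ tr/\n//) },                 # newline characters
    w => sub { scalar(() = $_[0] =~ /\S+/g) },       # runs of non-whitespace
    L => sub {                                       # longest line, newline excluded
        my $max = 0;
        for my $line (split /\n/, $_[0]) {
            $max = length($line) if length($line) > $max;
        }
        return $max;
    },
);

# Hypothetical helper: look up the counter for a flag and apply it to the text.
sub count_for {
    my ($flag, $text) = @_;
    my $code = $counter{$flag} or die "invalid option -- $flag\n";
    return $code->($text);
}

my $sample = "one two\nthree\n";
print join(" ", map { count_for($_, $sample) } qw(l w c L)), "\n";   # prints "2 3 14 7"
```

Unlike the s///g trick, tr and the list-assignment idiom return 0 rather than an empty string when nothing matches.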

Happy Unix Day

In many computer systems, time is tracked as the number of seconds since midnight UTC on January 1, 1970 (also known as the epoch). This time format originated with the Unix system, which is why it’s often referred to as Unix time. At this very instant, the number of seconds since the epoch is exactly 1,234,567,890. Happy Unix Day.
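
If you want to check that number yourself, here is a small sketch that turns a count of seconds since the epoch into a readable UTC date. The helper name utc_string is made up for this example:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format seconds-since-epoch as a UTC timestamp.
sub utc_string {
    my ($seconds) = @_;
    return strftime("%Y-%m-%d %H:%M:%S UTC", gmtime($seconds));
}

print utc_string(0), "\n";               # prints "1970-01-01 00:00:00 UTC"
print utc_string(1_234_567_890), "\n";   # prints "2009-02-13 23:31:30 UTC"
```

Swapping gmtime for localtime would show the same instant in your own time zone.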

I think the next time we’ll probably celebrate is Monday, January 18th, 2038 at 20:14:07 hours. This is when we run into the real millennium bug. Why? Unix time has traditionally been stored as a 32-bit signed integer (it’s probably signed so that dates before 1 Jan 1970 can be expressed). A signed 32-bit number has a range of -2^31 to +2^31-1, which is -2,147,483,648 to 2,147,483,647 for everyone who doesn’t want to break out their calculators.
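
For anyone who would rather let the computer be the calculator, this sketch works out the range of a signed integer of a given width. The helper name signed_range is an assumption for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: smallest and largest values of a signed integer with $bits bits.
sub signed_range {
    my ($bits) = @_;
    my $min = -(2 ** ($bits - 1));     # one bit is spent on the sign
    my $max = 2 ** ($bits - 1) - 1;    # zero takes one slot on the positive side
    return ($min, $max);
}

my ($min, $max) = signed_range(32);
print "$min to $max\n";   # prints "-2147483648 to 2147483647"
```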

2,147,483,647 turns out to be Monday, January 18th, 2038 at 20:14:07 hours (mountain time). What happens after this? The computer will think it’s Friday, December 13th, 1901 at 13:45:52 hours (mountain time, which is 20:45:52 UTC). Why does it think this? Because computers still count in binary. The important thing to know about a signed binary number is that the sign bit (i.e. how the computer knows if a number is negative or positive, 0 = positive and 1 = negative) is the most significant bit (i.e. the left-most bit). So, if we have a 32-bit signed number, that really means 31 bits’ worth of numbers, plus one bit for the sign. Thus, the largest positive number we can have is:
0111 1111 1111 1111 1111 1111 1111 1111

…which is 2,147,483,647 in decimal (bonus points if you figure out the hex value of that by converting in your head).
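
Here is a quick sketch that checks the bit pattern above and shows the date it maps to (spoiler: it also prints the hex value). The helper name from_binary is made up for this example, and the date is printed in UTC rather than mountain time:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: turn a string of 0s and 1s (spaces allowed) into a number.
sub from_binary {
    my ($bits) = @_;
    $bits =~ s/\s+//g;           # drop the spaces between nibbles
    return oct("0b$bits");       # oct() understands the 0b prefix
}

my $max = from_binary("0111 1111 1111 1111 1111 1111 1111 1111");
printf "%d\n", $max;             # prints "2147483647"
printf "%X\n", $max;             # prints "7FFFFFFF"
print strftime("%Y-%m-%d %H:%M:%S UTC", gmtime($max)), "\n";   # prints "2038-01-19 03:14:07 UTC"
```

03:14:07 UTC on January 19th is 20:14:07 on January 18th in mountain standard time (UTC-7).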

If you add “1” to that binary number, it rolls over to:
1000 0000 0000 0000 0000 0000 0000 0000

…which you might think is negative zero. Of course, you would then ask yourself what’s the difference between negative zero and positive zero. And the answer is: there is none (at least none that most computer programmers care about). So instead of having two versions of zero, some genius decided to have only one version of zero (he kept the positive version), and to extend the negative range of numbers by one. Thus, two’s complement was invented, and nearly every signed-integer format in use today is two’s complement.

To figure out the actual value of a negative number (1000 0000 0000 0000 0000 0000 0000 0000, in our case), you just subtract one, flip all the bits (all the 1’s become 0’s and all the 0’s become 1’s), read the result as an unsigned value, and put a minus sign in front of it.

So,
Start with: 1000 0000 0000 0000 0000 0000 0000 0000
Subtract 1 to get: 0111 1111 1111 1111 1111 1111 1111 1111
Flip all the bits to get: 1000 0000 0000 0000 0000 0000 0000 0000
Which is: 2,147,483,648 in decimal
But remember we started out with a negative number, so we have to add the negative sign back in to get: -2,147,483,648

And -2,147,483,648 is Friday, December 13th, 1901 at 13:45:52 hours (mountain time, which is 20:45:52 UTC) in Unix time.
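
The subtract-one-and-flip steps above can be sketched in a few lines. The helper name signed32 is made up for illustration, and the date line assumes Perl 5.12 or later, where gmtime accepts negative values:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: read a 32-bit pattern (0 .. 2**32-1) as a signed value.
sub signed32 {
    my ($bits) = @_;
    return $bits unless $bits & 0x8000_0000;          # sign bit clear: already positive
    my $magnitude = (~($bits - 1)) & 0xFFFF_FFFF;     # subtract one, flip, keep 32 bits
    return -$magnitude;                               # add the negative sign back in
}

my $wrapped = (0x7FFF_FFFF + 1) & 0xFFFF_FFFF;        # the counter after it rolls over
my $value   = signed32($wrapped);
print "$value\n";                                     # prints "-2147483648"
print strftime("%Y-%m-%d %H:%M:%S UTC", gmtime($value)), "\n";   # prints "1901-12-13 20:45:52 UTC"
```

The mask with 0xFFFFFFFF is only there because Perl integers are usually wider than 32 bits; a real 32-bit counter gets that truncation for free.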

For what it’s worth, I was really only planning on saying “Happy Unix Day” but I got carried away. Sorry about that.

1FFFFFFFFFFFFFFF