Once upon a time I used to write about programming I had done for web pages and other stuff, but it all became woefully out of date and I no longer intend to say anything more about it. Rather than simply delete the section, I jammed all the pages into this one. Some links may be broken. If you get anything useful out of this, then great; otherwise, go in search of *real* programming info elsewhere! - JAW Oct 2006
Over the years I've written plenty of software in many different languages. Some of it was for work, from ladder logic and function block programming, through VB in Access land, up to stuff like Clipper (dBase) and Cicode. Some stuff I do "to keep myself informed", purely for interest's sake.
If you want to know about programming that I do as part of my work then open your cheque book and email me, but if you want to know about HTML, Perl or JavaScript then have a look at some of these things here - they are free ;)
Message Board
This Perl script was quite an undertaking that I did for a chap I met on the net. As we were blasting each other's heads off playing Unreal Tournament online, I mentioned to him that he should have a message board on his web page (he has quite a number of UT-playing visitors). I mentioned that I could write it. Oops ;) I wanted the concept for the message board to be really simple. Someone who wants to post simply types in a name, a password and a message, and that message goes straight to the top of the list. I didn't want new posters to go through some sort of "new user" shenanigans: whatever password you give the first time is your password from then on, and you are in and posting straight away. The password protection stops people from masquerading as other people, and it gave me a mechanism to let users edit or delete their own posts. It all came together in, well, quite a few files.
Post.cgi
This is the heart of the system.
Let's discuss the broad aspects of this script; I'll leave it up to you to get the details from the code. As usual it is pretty well commented (so I can understand what I did the next day...). Now, one of the tricks of this message board is that all data is passed via standard HTML forms, and that is, umm, "interesting". HTML forms "do stuff" to a user's input before passing it as data to a script. For example, a carriage return plus line feed is coded as %0D%0A. In fact, spaces arrive as + and almost every character other than a letter or digit arrives as %xx, where xx is the character's hex ASCII code. A section in the script decodes the message into healthy and happy HTML (eg %0D%0A becomes <br>), and also strips out the user's name and password and whether this is a new post, an edit or a delete.
If the message board doesn't already exist, build it from the hard coded template. Easy. Okay, the guy I did this for has different tastes in look and feel to me, but hey, we are talking tech only here ;)
Is this a current user? If so, check the password and proceed. If not, create them as a user and pop the name and password into a flat text file. Okay, slack I know, fancy keeping a person's important password in a flat text file. If it is any consolation, it is not world readable, so only *I* know who is what... and I'm not tellin'.
Right, now the fun part: turn the input data into a message for insertion into the message board! The construction of the message is easy, but how do you insert it? I used HTML comments throughout the message board source as markers. For instance, in the message board everything above the line <!-- MARKER --> is the automatically generated header, and everything below it is user messages. <!-- postid=x --> marks the start of a user message with id=x (each post gets the next sequential id), and finally <!-- message begins --> and <!-- message ends --> within each post mark the real words the user typed in. With these, it becomes easy to find the right place to stick the message.
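The real post.cgi is Perl and isn't reproduced on this page, so here is a rough JavaScript sketch (an illustration under assumptions, not the actual script) of the decoding and user-check steps just described. parseForm, decodeFormValue, messageToHtml and checkOrRegisterUser are hypothetical names, a Map stands in for the flat user file, and the HTML escaping in messageToHtml is an extra safety step the text above doesn't mention.

```javascript
// Sketch only: a JavaScript stand-in for the Perl decoding in post.cgi.

// Decode one application/x-www-form-urlencoded value:
// "+" means a space, and %XX is a byte given in hex (e.g. %0D%0A = CR LF).
function decodeFormValue(raw) {
  return decodeURIComponent(raw.replace(/\+/g, " "));
}

// Split a form body like "name=JAW&password=x&message=Hi" into an object.
function parseForm(body) {
  const fields = {};
  for (const pair of body.split("&")) {
    const [key, value = ""] = pair.split("=");
    fields[decodeFormValue(key)] = decodeFormValue(value);
  }
  return fields;
}

// Turn the decoded message into HTML: escape &, < and > (an added safety
// step), then turn each CR LF line break into <br>.
function messageToHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\r\n/g, "<br>");
}

// "First password wins": an unknown name is registered with whatever password
// it supplied; a known name must supply the same password again.
// `users` (a Map of name -> password) stands in for the flat user file.
function checkOrRegisterUser(users, name, password) {
  if (!users.has(name)) {
    users.set(name, password);
    return true;
  }
  return users.get(name) === password;
}

// Example:
// messageToHtml(parseForm("message=Hi%0D%0Athere").message) returns "Hi<br>there"
```

Note that decodeURIComponent throws a URIError on a malformed sequence such as %ZZ, so a real script would want to catch that rather than fall over.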
If it is a new message, put it directly after <!-- MARKER -->, leaving the rest of the file alone. Ta da! How do you do this? Because <!-- MARKER --> is unique in the HTML file, I can use the Perl split function to split the file right there. The new file is therefore the first part + "<!-- MARKER -->" + new message + the second part. The "<!-- MARKER -->" has to be put back in because split removes it! Edit and delete work similarly but use a slightly more complex split, based on the <!-- postid=x --> marker and the edit/delete id parameter passed to the post script.
New/Edit/Delete
The New form is an HTML page that asks for name, password and message. Pressing submit sends the data to post.cgi. Edit, on the other hand, is a script. Why? Part of the cunningness of post.cgi is an embedded link to the edit script that tells it who owns the message (ie name) and the message id, so that when you see the edit or delete page it has your name and your old message already there! So when a user decides to edit a message and clicks on "Edit", the postedit.cgi script pops up a form with the username hard coded in and the existing message (fetched from the message board itself using the <!-- postid=x -->, <!-- message begins --> and <!-- message ends --> markers). post.cgi is then called from the form that postedit.cgi generated, with the new data (and password!) and a flag to say "edit post id = x". Given the right password (only the user who posted can edit), the old message is ditched in favour of the new. Deletion is similar, except rather than slipping in a new message, the old one is removed.
The Secret Password...
Maybe people are posting naughty stuff. As a moderator, there is a secret password that allows you to edit or delete anyone's post. This is easily implemented by inserting a record administration=x into the user database and checking whether the supplied password is x. If you enter that password, regardless of your name, you are the man.
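To make the split trick concrete, here is a small JavaScript sketch (again an illustration, not the Perl original). It assumes the markers are written as real HTML comments (<!-- MARKER -->, <!-- postid=x -->, <!-- message begins --> and <!-- message ends -->), and the post layout built by the hypothetical makePost is invented for the example. insertNewPost does first part + marker + new post + second part; editPost swaps only the words between the begins/ends markers of post x.

```javascript
// Sketch only: marker-based editing of the board HTML, in JavaScript.
const MARKER = "<!-- MARKER -->";
const BEGIN = "<!-- message begins -->";
const END = "<!-- message ends -->";

// Hypothetical post layout, just for this example.
function makePost(id, name, text) {
  return `<!-- postid=${id} --><p><b>${name}</b>: ${BEGIN}${text}${END}</p>\n`;
}

// New post: split at the unique MARKER, then glue the pieces back together
// with the marker restored (split removes it) and the new post right after it.
function insertNewPost(board, postHtml) {
  const parts = board.split(MARKER);
  if (parts.length !== 2) {
    throw new Error("MARKER must appear exactly once in the board");
  }
  return parts[0] + MARKER + postHtml + parts[1];
}

// Edit: find post `id`, then replace only the words between its
// message-begins and message-ends markers. Returns the board unchanged
// if the post (or its markers) can't be found.
function editPost(board, id, newText) {
  const start = board.indexOf(`<!-- postid=${id} -->`);
  if (start === -1) return board;
  const begin = board.indexOf(BEGIN, start);
  if (begin === -1) return board;
  const end = board.indexOf(END, begin);
  if (end === -1) return board;
  return board.slice(0, begin + BEGIN.length) + newText + board.slice(end);
}
```

Inserting post 2 into a board that already holds post 1 puts post 2 above it, which gives the newest-at-the-top order. Deletion would follow the same pattern, but it needs to know where a post ends, and the text doesn't say how that is marked, so it is left out of the sketch.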
The Board Itself
Well, I probably should have shown you this up front, but if you are really keen you'll look at it and reread everything I've explained ;)
Conclusion
As with my other Perl applications the code is fairly simple, but it's not the code that is important, it's the design. As you can see, the whole effort hinges on the message board HTML file, which consists of formatted HTML, marker blocks to identify messages, and further script calls with the necessary data attached.
Site Log
Page hits are interesting to know: you've made a page, so how many people are visiting it? But then you realise a Perl CGI script can get you more information about a user without his knowledge ;) The IP, the host name, the browser and the proxy server (via) are interesting "free" info. Why, with that info you could generate a complete surfing profile of users!
First trick: all my pages are HTML, not SHTML or XML etc. I wanted to collect information stealthily, so how do you call a script without a post or get action? I took a leaf out of the page counters' book: I call this script as an image, and it returns... an image. It does other stuff of course, but for all intents and purposes it is a script that returns a GIF. That is why it has the GIF data in it for the little house you see on all my pages. Yep, that's right, the innocent little house is the sneaky low-down data collector ;)
Grab the available environment variables and stamp them to a file. Uh oh! The file is huge... So I broke the log up into 3 relational files (all text, yeah I know, not very efficient. Too bad - this ain't no 100,000 hits per day site you know...).
File 1 is the users file. When a "user" comes along, the script checks to see if that user has been here before. The users file is simply a userid (ie a number) and the user information. If he's there already, grab the id. If not, add him with a new id.
File 2 is the datfile. When the little house script is called, a "dat" term is passed to the script.
(This dat term is also my normal hit counter name, and is currently the filename of the page with each "/" replaced by "_".) The main page dat term, for example, is jaycole_jaw_intro. Otherwise the same rules as File 1 apply: if it's there already, grab the id; if not, add it with a new id.
File 3 is the log file itself. When the log script is called, it stamps in the date (in a very compact text form), the userid and the fileid. Simple!
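The three-file scheme boils down to a "grab the id, or add it with a new id" lookup used twice, plus one line appended to the log. Here is a JavaScript sketch of that, with arrays standing in for the text files (the array index is the id, starting at 0). getOrAddId, datTerm, stamp and logHit are hypothetical names, and the yymmddhhmm stamp and the space-separated log line are guesses, since the real file layout isn't shown.

```javascript
// Sketch only: arrays stand in for the users file, the datfile and the log file.
// In each lookup table, the array index is the id.

// Return the id of `value`, adding it with the next free id if it's new.
function getOrAddId(table, value) {
  let id = table.indexOf(value);
  if (id === -1) {
    table.push(value);
    id = table.length - 1;
  }
  return id;
}

// Dat term: the page path with each "/" replaced by "_".
function datTerm(path) {
  return path.replace(/\//g, "_");
}

// Compact date stamp, yymmddhhmm in local time (an assumed format).
function stamp(date) {
  const two = (n) => String(n).padStart(2, "0");
  return (
    two(date.getFullYear() % 100) +
    two(date.getMonth() + 1) +
    two(date.getDate()) +
    two(date.getHours()) +
    two(date.getMinutes())
  );
}

// Record one hit: look up (or add) the user and the page, then append
// "stamp userid fileid" to the log.
function logHit(files, userInfo, pagePath, when) {
  const userId = getOrAddId(files.users, userInfo);
  const fileId = getOrAddId(files.dats, datTerm(pagePath));
  const line = `${stamp(when)} ${userId} ${fileId}`;
  files.log.push(line);
  return line;
}
```

For example, starting from empty tables, logging a first visit to a (hypothetical) path jaycole/jaw/intro at 14:03 on 5 October 2006 returns "0610051403 0 0" and adds jaycole_jaw_intro to the dat table. Scanning with indexOf is slow in the same way a flat text file is, which is fine at this site's traffic.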
So there you have it. User JAW went to the opening page, the 120y page, then the pergola page. Later a cygnus user went to the opening page and the 120y page too. The log is small enough for my purpose, but there are better ways of doing this. Of course, now what are you going to do with this log?
Logview
So much data, so many ways to analyse it... I started the logview by trawling the log files to find the top x page hits. The code is a little messy, I know, but... it basically sorts the 'log' file on page hits, counts duplicates into a new table, reverse sorts that numerically and grabs the top x records. Which, of course, are now the top x page counts. Hmm. You'll have to think about that one yourself ;) Because the log is all ID based, the next trick is to look up the 'dat' term of the file. Now, my search engine has a nice title file. With a little bit of pokery it also contains the dat term, so the title can be cross referenced. Output the data in a nice table, done. The top x users are handled in much the same way. There is plenty more room for other bits and pieces in the log analysis, and I develop new stuff as I think of it. Have a look at the output - much better than a boring old counter hey? :)
Crawler - Part 1 of my search engine
It's August 2001 and I've decided I need to code some more Perl. A search engine seemed pretty cool, and Perl is just the ticket. Okay, so I'm reinventing the wheel (there are plenty of free search engines out there), but it's all in the name of self improvement :) Half the battle is deciding how you are going to implement the search engine rather than the actual code itself. I came up with a methodology loosely based on an "inverted index" method I read about on the 'net.
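Stepping back to Logview for a moment, the top x page count goes like this: sort the page ids so duplicates sit next to each other, count each run into a new table, reverse sort that table numerically on the count, and keep the first x rows. A JavaScript sketch of those steps follows; topPages is a hypothetical name, and it assumes each log line is "stamp userid fileid" separated by spaces, which is a guess at the real format.

```javascript
// Sketch only: top-x pages from log lines of the (assumed) form
// "stamp userid fileid".
function topPages(logLines, x) {
  // 1. Pull out the file ids and sort them so duplicates sit together.
  const ids = logLines.map((line) => line.split(" ")[2]).sort();
  // 2. Count each run of duplicates into a new table of [id, count].
  const counts = [];
  for (const id of ids) {
    const last = counts[counts.length - 1];
    if (last && last[0] === id) {
      last[1] += 1;
    } else {
      counts.push([id, 1]);
    }
  }
  // 3. Reverse sort numerically on the count and keep the top x rows.
  counts.sort((a, b) => b[1] - a[1]);
  return counts.slice(0, x);
}
```

A Map counting ids in a single pass would skip the first sort; the sort-then-count version is shown because it mirrors the steps described. Turning the ids into titles is then a lookup through the title file cross reference, which isn't modelled here.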
I'm going to have to give you a quick rundown of the index, which consists of two files: a title file and a word file.
Okay, I'll write slowly for you... The crawler generates both of these files. Firstly, the title file lists all the HTML files the crawler found on my website: it gives each one an ID, stamps in the URL and puts in a description. (It gets this from the meta description field, which I use in all my documents; it is the standard HTML way of telling any search engine the description of a document.) It then loads up every single HTML file and decides what words are in it. It puts an entry in the word file for each word it found, followed by the ID of the file the word came from. If a word appears more than once in a file, it doesn't matter. If the same word appears in another document, then that document's file ID is added to the word entry. For example, the word 'beta' appears in files 2 and 3. And that is all the crawler does. Everything else is done by the search.cgi script discussed next. Follow the code if you are keen; all my code is well commented ;) because I may one day need to understand what it does... Note that this is not the only way to implement a search engine; in fact it is kinda clunky and slow. My word file is over 100k long. Bottom line is that it is simple and I understand it :)
Search - Part 2 of my search engine
Now that we have the index structure in place, the search is easy enough. Take the search criteria from the user and go through the word file. If they didn't pass any criteria, tell them to. If there were no hits, tell them there were no hits. If there was a hit, grab all the file IDs the hits were in and cross reference them with the title file. Build a nice output to be returned to the browser. Get it? A little twist is when more than one word is passed. I decided that multiple words would automatically be ANDed. The method I used was to search for each word individually and then keep only the files that turned up in every one of those searches, ie a logical AND.
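Here is a compact JavaScript sketch of the two-file index and the ANDed search (an illustration under assumptions, not search.cgi itself). buildIndex, wordsOf, idsFor and search are hypothetical names; Maps stand in for the title and word files; file IDs start at 1 to match the "beta in files 2 and 3" example; and idsFor already includes the "is test in testing?" substring match described in the next paragraph.

```javascript
// Sketch only: an in-memory inverted index.
// titles: id -> { url, description }   words: word -> Set of file ids

// Split text into lowercase words (this word pattern is an assumption).
function wordsOf(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function buildIndex(pages) {
  const titles = new Map();
  const words = new Map();
  pages.forEach((page, i) => {
    const id = i + 1; // file IDs start at 1
    titles.set(id, { url: page.url, description: page.description });
    for (const w of wordsOf(page.text)) {
      if (!words.has(w)) words.set(w, new Set());
      words.get(w).add(id); // a repeated word in one file is harmless
    }
  });
  return { titles, words };
}

// Ids of files containing any indexed word that has `term` inside it,
// so "test" also matches "testing" and "tested".
function idsFor(index, term) {
  const hits = new Set();
  for (const [word, ids] of index.words) {
    if (word.includes(term)) ids.forEach((id) => hits.add(id));
  }
  return hits;
}

// Multi-word queries are ANDed: keep only files found for every term.
function search(index, query) {
  const terms = wordsOf(query);
  if (terms.length === 0) return [];
  let result = idsFor(index, terms[0]);
  for (const term of terms.slice(1)) {
    const next = idsFor(index, term);
    result = new Set([...result].filter((id) => next.has(id)));
  }
  return [...result].sort((a, b) => a - b).map((id) => index.titles.get(id));
}
```

The substring test is exactly as primitive as the text admits: a very short term such as "a" will match almost every word in the index.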
A nice feature I put in with some sweat was the ability to recognise that "test", "testing" and "tested" should all be considered the same thing. It does a primitive version of this by asking "is the word I'm searching for inside the word I'm checking?" ie, is test in test? Is test in testing? When it is, those file IDs are also considered a hit. Nifty huh?
Showscript - it, um, shows scripts...
This is a cute little example of Perl, written once I was getting a good feel for the language. In fact, you have been running it to view the source of all my scripts. It takes as an argument the name of a script file on my server. It then opens that file, sucks it into a giant string, replaces any naughty characters that HTML doesn't like with ones HTML does like, and outputs it back to the browser. So yep, what you are seeing is the actual source for all my Perl, not some fancy tweaked-up text copy of it ;)
Guest Book
I put this together in June '97. I didn't have much of a clue, and hacked most of the code from bits and pieces of other scripts I found on the web. This is of course the *best* way of coding, well, most anything. Why reinvent the wheel? Nothing special: basically there is a form on my main intro page which asks for your name or email (it's just some text as far as the script is concerned), a "thoughts" option box and any further comments. You fill them out and hit submit, and the data gets passed to the script. The HTML formwork was half the battle in itself... The code returns some words to say thanks for submitting, appends the data to a big file and sends me some email. That's it! In fact, looking at the code now, it is ugly and clunky, but we all had to start somewhere huh.
JavaScript Programming
A nifty little language, although I don't see all that much use for it. I've written a few little apps, but nothing to write home about.
Oz Tax Calculator
I wrote this because my company pays dollars into my account and then I wonder how much tax needs to be paid on that.
With this simple bit of maths trickery you can find out.
Interest Calculator
Even simpler than the Oz tax calculator, this works out compound interest.
Payment Calculator
Don't earn interest, pay interest.
Frameset Fixer
This is a great concept I nabbed from www.echoecho.com, who just quietly have a lot of good information. "Frames suck", they say. Well, I quite like frames for navigation; the only issue I have with them is that you can "break" a frame and be stuck in a child frame. With no navigation. At the time I stumbled on the sweet concept below, my user logs were telling me that 80% of all visits to my pages were "one hit wonders": the visitors probably arrived from a search engine directly at a child frame and then left; there was nothing else to see, was there? Because the navigation was not displayed, the search engine link had "broken" the frameset. Now I don't specifically care, I'm not making any money here, but the technical challenge was now there and waiting to be conquered ;) Here's how it works. Firstly, all my pages and directories have a structure. For instance, in each subdirectory for my different areas I have a frameset HTML file called "welcome.html". In that subdirectory there is a navigation HTML file called "index_(blah).html" and an introduction page called "intro.html". In the head section of every welcome.html appears this code:
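<script>
// If welcome.html was called with a query string (eg welcome.html?page.html),
// show that page in the right-hand frame; otherwise show intro.html.
var subpage;
if (location.search) {
  subpage = location.search.substring(1); // drop the leading "?"
} else {
  subpage = 'intro.html';
}
</script>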
The if statement asks "was I passed any data when this page was called?" If so, subpage is set to the data passed (location.search includes the leading "?", which substring(1) strips off); if not, it defaults to intro.html, my standard introduction page. The body of the document is then a simple bunch of writes:
<script>
// Left frame: the navigation page. Right frame: the page chosen above.
document.write('<frameset cols="156, *">');
document.write('<frame src="index_(blah).html" name="index">');
document.write('<frame src="' + subpage + '" name="textarea">');
document.write('</frameset>');
</script>
That is: set up the frameset, load index_(blah).html into the left navigation frame and load the page named in the subpage variable into the right frame. Stay with me here... In all my stories and stuff is the following bit of script:
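<script>
// Meant to be run when the page loads (eg from the body's onload attribute).
function detectframeset()
{
  // Split this page's URL at each "/"; the last piece is the file name.
  var pagearray = window.location.href.split("/");
  // Same location as the parent means this page is not inside a frame.
  if (parent.location.href == window.location.href)
  {
    parent.location.href = "welcome.html?" + pagearray[pagearray.length - 1];
  }
}
</script>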
This little function, run on load, says "make me an array by splitting the href of this page at each /." Don't understand? This demo will help you:
href = "test/this/href.html";
// when split at "/" into pagearray:
pagearray[0] = "test";
pagearray[1] = "this";
pagearray[2] = "href.html";
The next thing that happens is that the location of the parent window is compared with the location of this window. If they are different, this must be a framed child window. Fair enough, the frameset _must_ exist, so do nothing. If, on the other hand, the locations are equal, then the page we are looking at right now must not be in a frame - it is the parent! This means that the page was called directly and not from my nice navigation frame. In that case, set the parent's location to welcome.html and pass as data the last pagearray element grabbed moments ago. In the demo above that would be "href.html". So the window changes to welcome.html, which checks to see if anything was passed and uses that as the right frame, so now the user is looking at the same page, except with the navigation correctly shown. Neat!
Recap: if welcome.html is called without data, the usual intro.html is displayed. If a page is displayed and it is already in a frame, do nothing. If a page is called and it is not in a frame, then display welcome.html and tell it to redisplay me in the frame.
LATEST: the search engines took one look at my nice JavaScript and said "you are trying to spam our index with redirects", and promptly dropped all links to any of my pages. Well, that's a shame. I've now reverted to single HTML pages with a tabled menu that is "synchronised" by a bit of Perl script, so that there is just a single point of maintenance for menus.
HTML Programming
I've never used a new-fangled "web publisher" and with any luck will never need to. Have you seen the disgusting HTML source they produce? It's downright ee-vil!
I bet the designers of the HyperText Markup Language would like to hunt down and destroy all software that abuses what HTML is all about. I'm no angel, but I respect good HTML source: indented, simple, using style sheets, with <h1> etc used throughout. Now I'm not going to harp on (yet), but take a look at different HTML source from time to time. Then have a look at mine. You'll see what I mean. All the source on this site was hand created with a text editor (mostly Notepad). I keep my source clean using HTMLTidy (from www.w3c.org), which does the formatting for me when I forget or can't be bothered. I'm not saying this is the right way to do it, but hey, it works for me.
JAW's Programming.