Chapter Seventeen
Databases: Fielded Versus Amorphous


What is a Database, Exactly?

Simply put, a database is just a file containing a bunch of data. This data may be in the form of strings or numerics or a combination of the two.

The laziest way to build a database is to use fielded data. Let's say you want to build a data file of the employees at your company. You would field the information, perhaps like this:

Name - allow 40 characters
Soc Sec Nr - allow 11 characters
Date Hired - allow 8 characters

In your fielded database, a name like Joe T. Hu would still require 40 bytes of storage space, despite being only 9 characters long. When written to your data file, Joe's name would be padded with 31 spaces, so that it would be the full 40 characters long.

The advantage of fielded databases is that you can parse it by character count. In the example above, you would know that if all the files are packed one after another in the data file, every 59 characters equals one employee record. So you could simply "break" the file every 59 bytes, print the data to screen and you would have a neat report:

Joe T. Hu                   555-55-5555   12/12/85

Mary E. Richards 444-44-4444 09/01/92
etc.

The problem with fielded databases is that they are disk hogs. In a typical database, easily 60% of the storage space on disk is filled with spaces!

Disks have really gotten cheap in the past few years. Even a 4 gigabyte disk doesn't cost that much anymore. So lazy programmers are to be excused if they say, "So what if it's a bit inefficient? It's no big deal!"

And it is hard to argue against the logic for local systems.

But when you are delivering that database to a visitor to your site through a dial-up modem that may only be running at 14.4 kbaud (1800 bytes per second) -- sending 2 of every 3 bytes of a fielded database as spaces is a real waste of the visitor's time, and is increasing your server load and bandwidth consumption unnecessarily. If you are billed for long distance telephone time, you will find a strong economic reason to converve bandwidth.

You may think that bandwidth consumption is no big deal to you personally. If you have a site that has low numbers of visitors and server hits, you'd be right.

But if your site begins getting the number of hits it really deserves, once the world discovers just how neat it is, you will be in for a very unpleasant surprise.

Most access providers cut you a bit of slack. They probably "give" you a gig of download each month from your site at no extra charge.

But if your site becomes a really popular destination, you're going to need lots more bandwidth than that -- more on the order of 5 to 10 gigabytes per day -- 150 to 300 gigs per month. Wait until you get the server bill for that "overage"! All this stuff about conserving bandwidth will suddenly become very important to you -- personally.

The Solution: The Amorphous Database

As we saw with the fielded database, 60% or so of the space occupied by the data is wasted. That's where the amorphous database shines.

An amorphous database is identical to the fielded database from the perspective that it stores the information you want, ready for retrieval.

The difference is simple: it doesn't pad the data. In fact, it is far more flexible, because you don't have to worry about a name being longer than the number of characters you allocated for that field.

You can have an empty field which will occupy no space and in the next record, an entry that takes up 122 characters.

You simply pick a character that is not being used in your database, like a tilde (~) and specify it as a field separator.

In the example above for a fielded database, we set up three fields - one for name, one for social security number and one for hire date.

So, for an amorphous database, each record will contain three tildes -- one to serve as a field terminator for each field.

Joe T. Hu~555-55-5555~12/12/85~Mary E. Richards~444-44-4444~09/01/92~

Despite appearing on two lines in this text (if your browser type size is set to 12 pt or greater) the line above is just one long line.

Parsing this database is actually just as simple as parsing a fielded one. You simply look for the tildes until you are done.

Because the database is now one long string, (without delving into the mysteries of contiguous - random - and non-contiguous - sequential -files) you can now pull any specific record very quickly with a simple .indexOf("name"); call.

From time to time, depending upon what you want to do, you may wish to include a record terminator in your amorphous database as well. It will be useful if you wish to load the records into an array without parsing the string all the way down to its individual field components. We might use an asterisk (*) for example, which would change our file like this:

Joe T. Hu~555-55-5555~12/12/85~*Mary E. Richards~444-44-4444~09/01/92~*

Delivering Databases with Your Own Pages

The fastest download of database info is as straight text. The most practical way to store the database is in a "hidden" form element.

Why not just define it as a string in the first place? Because, although that would work, the present limitation on JavaScript line lengths means that you would have to step through the whole thing, putting quotes and plus signs throughout to dis-concatenate the database into line lengths that would be acceptable to the compiler.

A "hidden" form element, on the other hand is not subject to these limitations. You can store a 30k or more database on just a single text line in your "hidden" form element as a VALUE="database here"

Then, as we saw in the last chapter, you may simply call it into a string with this call:

str=document.form.element.value;

Then you can parse that string into records and/or fields, read it into an array if it is desirable to do so, or simply pick pieces of it out to display. This is particularly useful for site searches.

This example may help you with some of the "tricks" to formulate your own site searches:

<HTML><HEAD>
<TITLE>Searching Databases</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!-- Hide from JS-Impaired Browsers
// First, get the correct newline character
rr=""+(navigator.appVersion.indexOf("Win")>-1?
"\r\n":"\n");
db=new Array();
dblc=new Array();
var arrayflg=0;
 
function showIt(){
 if (arrayflg==0){
  loadIt();
  }
 fsw=document.ug.wrd.value;
 sw=fsw.toLowerCase();
 ls="";
 fflg=0;
 for (var i=0;i<ctr;i++){
  /* We folded the search word to
     lower case, and we look in the
     lowercase array element for a
     match. */
  if (dblc[i].indexOf(sw)>-1){
   /* Since we found a match, set
      the found flag and add the
      unfolded (original) array
      element to the long string. */
   fflg=1;
   ls+='Match to "'+fsw
   +'" in record '+(i+1)+':'+rr+db[i]+rr+rr;
   }
  }
 if (fflg==0){
  ls='Sorry. I found no matches on "'
  +fsw+'"...'+rr
  +'Please try another word. Hint: try "no"'
  }
 /* Display the long string to the
    user. Also, select and focus the
    text element to make it easier to
    type another search word. */
 document.ug.ta.value=ls;
 document.ug.wrd.select();
 document.ug.wrd.focus();
 }
 
function loadIt(){
 /* This separate function simply
    saves parsing the database for
    each search. The arrayflg is
    used to flag that it has been
    done. */
 arrayflg=1;
 /* First, read in the string from
    the hidden form element. */
 ls=document.ug.ly.value;
 ctr=0;
 /* Then parse the long string and
    load it into the array. */
 while (ls.indexOf("*")>-1){
  pos=ls.indexOf("*");
  db[ctr]=ls.substring(0,pos);
  /* Similarly, no need to repeatedly
     convert the strings in the array
     to lower case. Once is enough. */
  dblc[ctr]=db[ctr].toLowerCase();
  ls=ls.substring(pos+1,ls.length);
  ctr++;
  }
 }
// End Hiding -->
</SCRIPT>
<BODY BGCOLOR="white"><CENTER>
<P><B>JavaScript for Searching
 Databases</B>
<BR>(Searching a modestly-sized 1152 character database)
<FORM NAME="ug">
<INPUT TYPE="hidden" NAME="ly"
 VALUE="The quick brown fox jumped over the lazy dogs
and ran into the noonday sun.*Now is the time for all good
men to come to the aid of their party.*All that glisters is
not gold, gilded tombs do worms enfold.*Fools rush in where
angels fear to tread.*I wept because I only had an Apple
Macintosh computer until I met a man who had a PC.*If Murphy's
law is true, there is a missspelling in this sentence*If
computer hardware is but a boat anchor, what is software -
the anchor chain?*A lot of people search for the word 'test'
or the word 'sex'.*Accountants will be pleased to learn that
Ira Magaziner is the new Internet Tax Czar, implementing taxes
on all World Wide Web communications.*Some people say man is
made out of mud.*But a poor man is nothing but muscle and blood.
*When, in the course of human events, it becomes... Well, you
know.*Silly is as silly does. I know, because I work in the
IS department.*People who live in cubicles shouldn't throw
stones at trailer trash.*Searching a site database is really
pretty easy, isn't it?*I am learning ugly JavaScript from
someone who knows ugly.*Stone walls do not a prison make,
nor a cubicle a cage.* ">
<P><B>Type a search word and click the Search Button</B>
<BR><INPUT TYPE="text" NAME="wrd"
 VALUE="">
<BR><INPUT TYPE="button" NAME="but"
 VALUE=" Search " onClick="showIt()">
<P><TEXTAREA NAME="ta" ROWS=12 COLS=75 WRAP=VIRTUAL>
</TEXTAREA>
</FORM>
</BODY>
</HTML>

Click Here to See This Script.



© Copyright 1997, John H. Keyes john.keyes@intellink.net