Reading
Here is some discussion of hash tables:
https://cs.csub.edu/~msarr/cmps2020/lab/lab27/lec/hashtables.pdf
A hash table is a specialized container that aims for an insert or search with a big O of 1.
Depending on the data and the effectiveness of the hashing function it can get fast results; HOWEVER,
it generally stops working well as it becomes full.
For these examples we will add values until the hash table is a certain percent full.
If an insert or search has 0 collisions, the operation will be big O of 1.
As you can see below, above 60% full there is a dramatic decrease in efficiency:
Hash insert 203 to Hash of size 2039 ( 10%): time 255 : average time 1 : longest 3 : colisions 0
Hash insert 407 to Hash of size 2039 ( 20%): time 597 : average time 1 : longest 80 : colisions 0
Hash insert 611 to Hash of size 2039 ( 30%): time 826 : average time 1 : longest 59 : colisions 0
Hash insert 815 to Hash of size 2039 ( 40%): time 1168 : average time 1 : longest 72 : colisions 175
Hash insert 1019 to Hash of size 2039 ( 50%): time 1443 : average time 1 : longest 76 : colisions 379
Hash insert 1223 to Hash of size 2039 ( 60%): time 1729 : average time 1 : longest 75 : colisions 583
Hash insert 1427 to Hash of size 2039 ( 70%): time 2000 : average time 1 : longest 69 : colisions 787
Hash insert 1631 to Hash of size 2039 ( 80%): time 3858 : average time 2 : longest 69 : colisions 6144
Hash insert 1835 to Hash of size 2039 ( 90%): time 17338 : average time 9 : longest 87 : colisions 49840
Hash insert 2039 to Hash of size 2039 (100%): time 36020 : average time 17 : longest 225 : colisions 110591
Takeaway: the larger the hash table is, the faster it will be, but that can mean a lot of unused space.
If memory is a concern that matters; if not, make it big. As you can see, making the table small relative to how many values
you plan to insert can really hurt performance.
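The timings above come from inserting until a target load factor is reached. The source doesn't show its collision handling, so here is a minimal sketch that assumes open addressing with linear probing; the function name hashInsert and the use of -1 as an empty-slot marker are my own choices.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>
using namespace std;

#define HASHSIZE 2039   // prime table size, matching the timings above

// Sketch of an open-addressing insert (linear probing is an assumption;
// the collision scheme behind the timings above isn't shown).
// Returns how many occupied slots were probed, i.e. the collisions
// this single insert caused.
int hashInsert(vector<int>& table, int value)
{
    int index = value % HASHSIZE;       // home slot
    int collisions = 0;
    while (table[index] != -1)          // -1 marks an empty slot
    {
        collisions++;
        index = (index + 1) % HASHSIZE; // step to the next slot, wrapping around
    }
    table[index] = value;
    return collisions;
}
```

Filling a `vector<int> table(HASHSIZE, -1)` to 90% with random values and summing the returned counts gives roughly the shape of the table above: collisions stay near zero at low load and explode somewhere past 60-70% full.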
Another key aspect of a hashing approach is to create a hash key that spreads our potential values out evenly.
Here are a few examples creating different hash keys for a string input value.
We are using names, and our hash table will have a size of 2047 (note: 2047 = 23 × 89, so despite looking prime it isn't; a true prime such as 2039 is usually preferred).
/*
   pass in a string, add up the ASCII values of all the individual characters,
   mod it with the size of the hashtable we will insert into, and return the value
*/
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i];
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 1044
Hash "Rob Boberts " hash1 = 1044
Hash "Eduardo Lopez " hash1 = 1262
Hash "Batman " hash1 = 595
Hash "Bartman " hash1 = 709
There are two problems here: we have two similar names hashing to the same value,
and our values seem to cluster in the 600-1200 range when we want something like 0-2046 (a sum of a dozen or so ASCII codes simply can't reach much higher).
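A quick check confirms the first problem: the additive hash ignores character order, so any rearrangement of the same letters collides (this is the hashFunction above, renamed additiveHash here only to keep the sketch self-contained).

```cpp
#include <cassert>
#include <string>
using namespace std;

#define HASHSIZE 2047

// The additive hash from above: since addition is order-independent,
// rearranging the characters of a key cannot change its hash.
int additiveHash(string key)
{
    int hashCode = 0;
    for (int i = 0; i < (int)key.length(); i++)
    {
        hashCode += key[i];
    }
    return hashCode % HASHSIZE;
}
```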
So let's try again:
/*
   let's multiply the ASCII value of each character in the string by a constant
   to increase the spread of the values of the keys
*/
#define PRIME_CONST 31
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * PRIME_CONST; // make the key larger
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 1659
Hash "Rob Boberts " hash1 = 1659
Hash "Eduardo Lopez " hash1 = 229
Hash "Batman " hash1 = 22
Hash "Bartman " hash1 = 1509
That's a bit better; now we are covering the range 22-1659,
so our keys are more spread out in our hash.
But Bob and Rob still have the same hash key value, because their names contain the same characters.
So let's try again:
/*
   let's factor in each character's position in the string as well,
   so "Bob Roberts" and "Rob Boberts" don't get the same key even though
   they contain the exact same characters
*/
#define PRIME_CONST 31
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * a prime constant
        // (note: at i = 0 the first character contributes nothing)
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 255
Hash "Rob Boberts " hash1 = 318
Hash "Eduardo Lopez " hash1 = 1464
Hash "Batman " hash1 = 286
Hash "Bartman " hash1 = 276
Now Bob and Rob are different, but the values still seem to be clustering in the 200-400 range.
Let's try changing the prime number we multiply by to 67:
#define PRIME_CONST 67
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * a prime constant
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 287
Hash "Rob Boberts " hash1 = 93
Hash "Eduardo Lopez " hash1 = 853
Hash "Batman " hash1 = 354
Hash "Bartman " hash1 = 1587
OK, now that's a pretty good spread; we can probably insert names into the hash and minimize collisions.
Let's try a few more, I'm curious:
Hash "Bob Roberts " hash1 = 287
Hash "Rob Boberts " hash1 = 93
Hash "Eduardo Lopez " hash1 = 853
Hash "Batman " hash1 = 354
Hash "Millhouse " hash1 = 1361
Hash "Lisa " hash1 = 1002
Hash "Homer J. Simpson " hash1 = 313
Hash "Marge Simpson " hash1 = 1867
Hash "Maggie Simpson " hash1 = 1385
Hash "Krusty The Clown " hash1 = 1981
Hash "Sideshow Bob " hash1 = 975
Hash "Mayor Quimby " hash1 = 235
Hash "Edna Krabappel " hash1 = 1875
looks good.
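To close the loop, here is a sketch of pairing the final hash function with an actual table. Linear probing, the empty-string-as-empty-slot convention, and the hashInsert/hashSearch names are assumptions, not from the source; the hash values themselves match the results above.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

#define HASHSIZE 2047
#define PRIME_CONST 67

// the final hash function from above
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < (int)key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST;
    }
    return hashCode % HASHSIZE;
}

// insert with linear probing (assumed collision scheme)
void hashInsert(vector<string>& table, const string& key)
{
    int index = hashFunction(key);
    while (!table[index].empty())           // "" marks an empty slot
    {
        index = (index + 1) % HASHSIZE;
    }
    table[index] = key;
}

// search: probe from the home slot until the key or an empty slot is found
int hashSearch(const vector<string>& table, const string& key)
{
    int index = hashFunction(key);
    while (!table[index].empty())
    {
        if (table[index] == key)
            return index;                   // found: return the slot
        index = (index + 1) % HASHSIZE;
    }
    return -1;                              // hit an empty slot: not present
}
```

With no collisions, search lands on the home slot in one probe, which is the big O of 1 behavior the first section was measuring.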