Reading
Here is some discussion of hash tables:
https://cs.csub.edu/~msarr/cmps2020/lab/lab27/lec/hashtables.pdf
A hash table is a specialized container that aims for an insert or search with a big O of 1.
Depending on the data and the effectiveness of the hashing function it can get fast results; HOWEVER,
it generally stops working well as it becomes full.
For these examples we will add values until the hash table is a certain percent full.
If an insert or search has 0 collisions, the operation will be big O of 1.
As you can see below, above 60% full there is a dramatic decrease in efficiency:
Hash insert 203 to Hash of size 2039 ( 10%): time 255 : average time 1 : longest 3 : colisions 0
Hash insert 407 to Hash of size 2039 ( 20%): time 597 : average time 1 : longest 80 : colisions 0
Hash insert 611 to Hash of size 2039 ( 30%): time 826 : average time 1 : longest 59 : colisions 0
Hash insert 815 to Hash of size 2039 ( 40%): time 1168 : average time 1 : longest 72 : colisions 175
Hash insert 1019 to Hash of size 2039 ( 50%): time 1443 : average time 1 : longest 76 : colisions 379
Hash insert 1223 to Hash of size 2039 ( 60%): time 1729 : average time 1 : longest 75 : colisions 583
Hash insert 1427 to Hash of size 2039 ( 70%): time 2000 : average time 1 : longest 69 : colisions 787
Hash insert 1631 to Hash of size 2039 ( 80%): time 3858 : average time 2 : longest 69 : colisions 6144
Hash insert 1835 to Hash of size 2039 ( 90%): time 17338 : average time 9 : longest 87 : colisions 49840
Hash insert 2039 to Hash of size 2039 (100%): time 36020 : average time 17 : longest 225 : colisions 110591
Takeaway: the larger the hash table is, the faster it will be, but that can mean a lot of unused space.
If memory is a concern that matters; if not, make it big. As you can see, making the table small relative to how many values
you plan to insert can really hurt performance.
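The timings above come from inserting until a target load factor is reached. The source doesn't show its collision handling, so here is a minimal sketch that assumes open addressing with linear probing; the function name hashInsert and the use of -1 as an empty-slot marker are my own choices.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>
using namespace std;

#define HASHSIZE 2039   // prime table size, matching the timings above

// Sketch of an open-addressing insert (linear probing is an assumption;
// the collision scheme behind the timings above isn't shown).
// Returns how many occupied slots were probed, i.e. the collisions
// this single insert caused.
int hashInsert(vector<int>& table, int value)
{
    int index = value % HASHSIZE;       // home slot
    int collisions = 0;
    while (table[index] != -1)          // -1 marks an empty slot
    {
        collisions++;
        index = (index + 1) % HASHSIZE; // step to the next slot, wrapping around
    }
    table[index] = value;
    return collisions;
}
```

Filling a `vector<int> table(HASHSIZE, -1)` to 90% with random values and summing the returned counts gives roughly the shape of the table above: collisions stay near zero at low load and explode somewhere past 60-70% full.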
Another key aspect of a hashing approach is to create a hash key that spreads our potential values out evenly.
Here are a few examples creating different hash keys for a string input value.
We are using names, and our hash table will have a size of 2047 (note: 2047 = 23 × 89, so despite looking prime it isn't; a true prime such as 2039 is usually preferred).
/*
   pass in a string, add up the ASCII values of all the individual characters,
   mod it with the size of the hashtable we will insert into, and return the value
*/
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i];
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 1044
Hash "Rob Boberts " hash1 = 1044
Hash "Eduardo Lopez " hash1 = 1262
Hash "Batman " hash1 = 595
Hash "Bartman " hash1 = 709
There are two problems here: we have two similar names hashing to the same value,
and our values seem to cluster in the 600-1200 range when we want something like 0-2046 (a sum of a dozen or so ASCII codes simply can't reach much higher).
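A quick check confirms the first problem: the additive hash ignores character order, so any rearrangement of the same letters collides (this is the hashFunction above, renamed additiveHash here only to keep the sketch self-contained).

```cpp
#include <cassert>
#include <string>
using namespace std;

#define HASHSIZE 2047

// The additive hash from above: since addition is order-independent,
// rearranging the characters of a key cannot change its hash.
int additiveHash(string key)
{
    int hashCode = 0;
    for (int i = 0; i < (int)key.length(); i++)
    {
        hashCode += key[i];
    }
    return hashCode % HASHSIZE;
}
```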
So let's try again:
/*
   let's multiply the ASCII value of each character in the string by a constant
   to increase the spread of the values of the keys
*/
#define PRIME_CONST 31
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * PRIME_CONST; // make the key larger
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 1659
Hash "Rob Boberts " hash1 = 1659
Hash "Eduardo Lopez " hash1 = 229
Hash "Batman " hash1 = 22
Hash "Bartman " hash1 = 1509
That's a bit better; now we are covering the range 22-1659,
so our keys are more spread out in our hash.
But Bob and Rob still have the same hash key value, because their names contain the same characters.
So let's try again:
/*
   let's factor in each character's position in the string as well,
   so "Bob Roberts" and "Rob Boberts" don't get the same key even though
   they contain the exact same characters
*/
#define PRIME_CONST 31
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * a prime constant
        // (note: at i = 0 the first character contributes nothing)
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 255
Hash "Rob Boberts " hash1 = 318
Hash "Eduardo Lopez " hash1 = 1464
Hash "Batman " hash1 = 286
Hash "Bartman " hash1 = 276
Now Bob and Rob are different, but the values still seem to be clustering in the 200-400 range.
Let's try changing the prime number we multiply by to 67:
#define PRIME_CONST 67
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * a prime constant
    }
    return hashCode % HASHSIZE;
}
results
Hash "Bob Roberts " hash1 = 287
Hash "Rob Boberts " hash1 = 93
Hash "Eduardo Lopez " hash1 = 853
Hash "Batman " hash1 = 354
Hash "Bartman " hash1 = 1587
OK, now that's a pretty good spread; we can probably insert names into the hash and minimize collisions.
Let's try a few more, I'm curious:
Hash "Bob Roberts " hash1 = 287
Hash "Rob Boberts " hash1 = 93
Hash "Eduardo Lopez " hash1 = 853
Hash "Batman " hash1 = 354
Hash "Millhouse " hash1 = 1361
Hash "Lisa " hash1 = 1002
Hash "Homer J. Simpson " hash1 = 313
Hash "Marge Simpson " hash1 = 1867
Hash "Maggie Simpson " hash1 = 1385
Hash "Krusty The Clown " hash1 = 1981
Hash "Sideshow Bob " hash1 = 975
Hash "Mayor Quimby " hash1 = 235
Hash "Edna Krabappel " hash1 = 1875
looks good.
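To close the loop, here is a sketch of pairing the final hash function with an actual table. Linear probing, the empty-string-as-empty-slot convention, and the hashInsert/hashSearch names are assumptions, not from the source; the hash values themselves match the results above.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

#define HASHSIZE 2047
#define PRIME_CONST 67

// the final hash function from above
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < (int)key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST;
    }
    return hashCode % HASHSIZE;
}

// insert with linear probing (assumed collision scheme)
void hashInsert(vector<string>& table, const string& key)
{
    int index = hashFunction(key);
    while (!table[index].empty())           // "" marks an empty slot
    {
        index = (index + 1) % HASHSIZE;
    }
    table[index] = key;
}

// search: probe from the home slot until the key or an empty slot is found
int hashSearch(const vector<string>& table, const string& key)
{
    int index = hashFunction(key);
    while (!table[index].empty())
    {
        if (table[index] == key)
            return index;                   // found: return the slot
        index = (index + 1) % HASHSIZE;
    }
    return -1;                              // hit an empty slot: not present
}
```

With no collisions, search lands on the home slot in one probe, which is the big O of 1 behavior the first section was measuring.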