PHP Unique Hash

I want a short alphanumeric hash that’s unique and who’s sequence is difficult to deduce. I could run it out to md5 and trim the first n chars but that’s not going to be very unique. Storing a truncated checksum in a unique field means that the frequency of collisions will increase geometrically as the number of unique keys for a base 62 encoded integer approaches 62^n. I’d rather do it right than code myself a timebomb. So I came up with this.

I’ll walk us through the finer points below.

class PseudoCrypt {
 
	/* Next prime greater than 62 ^ n / 1.618033988749894848 */
	private static $golden_primes = array(
		1,41,2377,147299,9132313,566201239,35104476161,2176477521929
	);
 
	/* Ascii :                    0  9,         A  Z,         a  z     */
	/* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
	private static $chars = array(
		0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
		11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
		21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
		31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
		41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
		50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
		59=>120,60=>121,61=>122
	);
 
	public static function base62($int) {
		$key = "";
		while($int > 0) {
			$mod = $int-(floor($int/62)*62);
			$key .= chr(self::$chars[$mod]);
			$int = floor($int/62);
		}
		return strrev($key);
	}
 
	public static function udihash($num, $len = 5) {
		$ceil = pow(62, $len);
		$prime = self::$golden_primes[$len];
		$dec = ($num * $prime)-floor($num * $prime/$ceil)*$ceil;
		$hash = self::base62($dec);
		return str_pad($hash, $len, "0", STR_PAD_LEFT);
	}
 
}

Here’s a use case

$ids = range(1,10);
foreach($ids as $id) {
  echo PseudoCrypt::udihash($id) . "\n";
}
cJio3
EdRc6
qxAQ9
TGtEC
5ac2F
huKqI
KE3eL
wXmSO
YrVGR
BBE4U

Pretty random-looking, huh?

Base 62

Hashes are base 62 encoded base 10 integers. 1=1, 10=a, 36=Z, 61=z, 62=10, 72=1a, etc.

62, 3844, 238328, 14776336, 916132832, 56800235584, 3521614606208

1 character = 62 permutations, 2 characters = 3844 permutations, etc.

41, 2377, 147299, 9132313, 566201239, 35104476161, 2176477521929

41 = next highest prime from golden mean of 62.
2377 = next highest prime from golden mean of 3844.

Uniqueness

I chose to use primes to ensure hash uniqueness. Any prime greater than one half 62^n will do, but if you use a prime near 62^n or 62^n/2 or 2*62^n/3 etc, you will detect a linearity in the sequence at certain points in the ring.

Appearance of randomness

I chose primes near the golden ratio to maximize the appearance of randomness. Given a small set of hashes (even with the associated id) it would be difficult for anyone to guess the next hash.

This is a minimum security technique.

Keep your primes a secret and limit the number of hashes a user can get his hands on to make it harder for script kiddies to reverse engineer your algo. This is a thin rotation and base re-encoding obfuscation algorithm, not an encryption algorithm. Don’t use this to crypt sensitive info. Use it to obfuscate integer IDs.

Tools Used

Golden Ratio Calculator
Prime Numbers Generator

That’s it.

Any of you CS majors out there care to tell me what the technical term is for this type of hash?

12 Comments so far

  1. German on February 9th, 2010

    Nice! this is what I was looking for. But it’d be nicer if you could code a decoder to get the original number back. Is there any chance I could get that? I don’t think ‘m capable to code it myself. Thanks for your time anyway!

  2. Pablo on February 11th, 2010

    Hey Kev, that’s very useful!
    Obviously not if you need something secure, but to obfuscate ids it’s quite simple and powerful (compared with some things going around in the web).

    Have you ever needed to unhash its results?
    it may be of some use if you are hashing ids…

  3. Tim on April 1st, 2010

    Wonderful function — but the output diverges above 58 on two systems I’m testing on. I suspect it’s because one’s a 32bit machine, and the other 64.

    Any thoughts on a fix for that?

    If I find one, I’ll come back and post.

  4. Morteza on July 6th, 2010

    Hey, that’s very useful!! I wanted to use unique IDs to generate a code as password for my users so they can come back later on the site, and enter this code, so they can edit what they entered. your algorithm is useful in my case.

    thank you.

    @Tim: I think, not needing the inverse function, it is not important to have different values for one number.

  5. Warren Wilkinson on March 17th, 2011

    Hey German, if you want decodable ones — would printing random numbers in base 64 work? Most languages let you print in base 16 (hex) and usually (with some sleuthing) have ways to read/write base 64.

  6. Druczki Pocztowe on March 28th, 2011

    Try rining the script with: PseudoCrypt::udihash($id, 10)

    All keys wil have values:
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000

    How to solve it?

  7. KevBurnsJr on March 28th, 2011

    10 is probably too large. I suggest you run it w/ 5 or 6 and append some random data.

    $hash = PseudoCrypt::udihash($id, 5);
    $rand = PseudoCrypt::base62(rand(0,pow(62,5)));
    $rand = str_pad($rand, 5, “0″, STR_PAD_LEFT);
    $key = $hash.$rand;

  8. John on March 28th, 2011

    Hi, I implemented this in Factor, if you’re curious:

    http://re-factor.blogspot.com/2011/03/unique-hash.html

  9. Jenny on May 6th, 2011

    How many total “unique” possibilities are we talking about here?

  10. Jenny on May 6th, 2011

    Me again :)
    I wanted to know if you think it would be safe to run this command instead of your look.

    echo PseudoCrypt::udihash(time());

    I’m trying to create unique HASH to be inserted as KEY on database, but generating your way the hash always stays the same.

    Please let me know if there is any draw back to this solution.

    Thank you!

    Jenny

  11. KevBurnsJr on May 6th, 2011

    Using time() would probably not be a good idea, just because there’s a good chance that two records may be created within the same second.

    What I usually do instead for keys is

    echo time().’-’.base_convert(rand(1,getrandmax()), 10, 36);

    which yields a key like 1304134276-udxxi7

  12. Alec Gorge on September 19th, 2011

    Kevin,

    Great algorithm! I wanted to use it in my upcoming Node.js project, so I made a CommonJS/Node.js module available here: https://github.com/alecgorge/node-psuedohash

    Thanks again for putting this together!

Leave a reply