Archive for September, 2009

95% Done

I used to be this guy.

PHP Unique Hash

I want a unique alphanumeric hash who’s sequence is difficult to deduce. I could run it out to md5 and trim the first n chars but that’s not going to be very unique. Storing a truncated checksum in a unique field means that the frequency of collisions will increase geometrically as the number of unique keys for a lowercase alphanumeric hash approaches 36^n. I’d rather do it right than code myself a timebomb. So I came up with this.

It’s dense so don’t bother wading through it, I’ll walk through the finer points below.

class PseudoCrypt {
 
  private static $golden_primes = array(
    36 => array(1,23,809,28837,1038073,37370257 /*,1345328833*/)
  );
 
  public static function udihash($num, $len = 5, $base = 36) {
    $gp = self::$golden_primes[$base];
    $maxlen = count($gp);
    $len = $len > ($maxlen-1) ? ($maxlen-1) : $len;
    while($len < $maxlen && pow($base,$len) < $num) $len++; 
    if($len >= $maxlen) throw new Exception($num." out of range (max ".pow($base,$maxlen-1).")");
    $ceil = pow($base,$len);
    $prime = $gp[$len];
    $dechash = ($num * $prime) % $ceil;
    $hash = base_convert($dechash, 10, $base);
    return str_pad($hash, $len, "0", STR_PAD_LEFT);
  }
 
}

Here’s a use case

$ids = range(1,10);
foreach($ids as $id) {
  echo PseudoCrypt::udihash($id) . "\n";
}
m8z2p
8hy5e
uqx83
gzwas
38vdh
phug6
bqtiv
xzslk
k8ro9
6hqqy

Pretty random-looking, huh?

Alphanum = 36

Lowercase alphanumeric is a base 36 encoded base 10 integer. 1=1, 10=a, 36=z, 37=10, 48=1a, 1296=zz, 1297=100, etc.

36, 1296, 46656, 1679616, 60466176

1 character = 36 permutations, 2 characters = 1296 permutations, etc.

23, 809, 28837, 1038073, 37370257

23 = next highest prime from golden mean. 36-22-23, 1296-801-809, etc.

Uniqueness

I chose to use primes to ensure hash uniqueness.

Appearance of randomness

I chose primes near the golden ratio to maximize the appearance of randomness. Given a small set of hashes (even with the associated id) it would be difficult for anyone to guess the next hash.

Hash Length Overrun

Give it a low hash length and uniqueness will be preserved as it oversteps its range.

$ids = range(807,812);
foreach($ids as $id) {
  echo PseudoCrypt::udihash($id, 2) . "\n";
}
r3
dk
01
n4i
9dj
vmk

32-bit woe

I capped this example at a len of 5 because the modulus operator on any 32 bit machine (like this WinXP terdbox) chokes on len >= 6. I don’t anticipate a need for > 60.4 million of these suckers, so it’s a tradeoff I’m willing to make. I’d feel more comfortable releasing this as a default len of 6. Len of 6 (on a 64-bit system) will get you to 2.1 billion. I could probably make this work if I got into bit shifting, but… I don’t reeeeaaallly care. 60.4 mil is probably more than I will ever need.

This is a minimum security technique.

Keep your primes a secret and limit the number of hashes a user can get his hands on to make it harder for script kiddies to reverse engineer your algo. This is a thin rotation and base re-encoding obfuscation algorithm, not an encryption algorithm. Don’t use this to crypt sensitive info. Use it to obfuscate integer IDs.

Tools Used

Golden Ratio Calculator
Prime Numbers Generator

That’s it.

Any of you CS majors out there care to tell me what the technical term is for this type of hash?

MySQL: Timestamp vs Datetime

Rails uses timestamp, so I will too. Here’s a nice little matrix I found.


http://jmmolina.free.fr/t_49552/MySQL%20Date%20and%20Time%20Types.pdf

PHP 5.3 Install “configure: error: xml2-config not found.”

Solution:

sudo apt-get install libxml2-dev

Politically Incorrect Data Models

While this example may be a good illustration of a one-to-many relationship, it’s also a good way to get punched in the face.

For example if You have Husband and Wife model. Husband has one Wife and Wife belongs to Husband. In that case Wife must contain husband_id column because she belongs to Husband. From other hand Husband don’t have to contain any foreign key, because he is a base model and his Wife knows that she belongs to him.

A description language for RESTful web services

For any SOAP web service, you are very likely to find a WSDL which describes the service.

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information.

However, WSDL 1.1 is really not designed to describe RESTful web services. WSDL 2.0 includes language for better describing RESTful web services, but is still only a recommendation.

So suppose I wanted to automatically generate a JavaScript api for interacting with an arbitrary RESTful web service. I should be able to find an appropriate WSDL, parse the described resource types along with their properties, relationships and urls, create some internal model of the web service based on the properties of those resources and generate some JavaScript classes for interacting with the web service.

This is a rather simplistic example, but it shows what a partial service description might look like …

<service name="Twitter">
  <base href="http://twitter.com"/>
  <resources>
 
    <resource class="User">
      <base href="/users">
      <properties>
        <property name="id" type="int" />
        <property name="screen_name" type="string" />
      </properties>    
      <relationships>
        <relationship type="HasMany" target="Status">
      </relationships>
      <method name="GET"></method> #get user info
      <method name="POST">         #create new user
        <requires type="property" name="screen_name" />
      </method>
      <method name="DELETE">       #delete user
        <requires type="auth" name="owner" />
      </method>
    </resource>
 
    <resource class="Status">
      #...
    </resource>
 
    #...
 
  </resources>
</service>

After parsing this document in Javascript, I should be able to create a set of JavaScript classes that behaves somewhat like this …

Resource.load('http://twitter.com/service-description.wsdl', 'Twitter')
Resource.model('Twitter', 'User');
 
// GET a user's details
var user = Twitter.User.find('123');
var screen_name = user.screen_name;
 
// GET a user's most recent status update
Resource.model('Twitter', 'Status');
var most_recent_update = user.Status.last();
 
// POST a new status update on a user's behalf
var new_status = user.Status.new();
new_status.status = "Look ma, no library!";
user.Status.post(new_status);

Seems pretty powerful, huh?

But if you go to Twitter’s API documentation, you will not find any such machine-readable service description document. Nor does the markup for the plain-text description of the service contain any semantic markup that could be used to generate such a description document.

So why doesn’t Twitter post a machine-readable description of their web service? Are they just dumb?

Well, probably not.

I did about 2 days of digging around and finally came across the following article which leads to many additional articles on the subject (thanks to Bradley Holt in #REST on irc.freenode.net) …

Debate: Does REST Need a Description Language

… and this very succinct conclusion from Mark Baker

REST, WADL, forest, trees

So if you’re writing (or generating) contract/interface-level code which can’t late-bind to all resources, everywhere, you’re not doing REST [...]

Cut the cord already! RPC is dead. You’re not in Kansas anymore.

So if we’re not in Kansas, then where does that put us?