ORMs are CRUD

This diatribe is aimed at the simplification of the venerable Structured Query Language and its bastard step-children.

Industry Standard ORMs

ORMs like Doctrine and its brethren are heralded as industry-accepted data access libraries, while the truth of their existence is rooted in misunderstanding and oversimplification of the principles informing the development of the Structured Query Language. If all you want to do is read some shit and write some shit then your todo app is probably going to be a smashing success in the Nobody Gives a Fuck market. But if you have a meaningful, well-normalized data set then it’s only a matter of time before these tools let you down in matters of flexibility and auditability.

DQL, PhQL, WhateverQL

Making up your own Object Oriented Query Language to layer on top of SQL is not going to help you solve your problems. Query Builders are the tools of the devil and anyone who uses them will only more quickly infuriate the well-educated, scientifically validated (and possibly not-yet-existent) Data Services department.

In Conclusion

Stop fucking around and learn to use SQL like an adult.

Symfony DIC is fucking slow.

If you’re thinking of using the Symfony Dependency Injection Container for your cat blog, then you go right ahead. You go inject the shit out of that motherfucker. You make that shit pure as fuck. 100% code cover all that shit. You go unit test the shit out of those fucking cats. Mock those fucking test suites all the way out the ass.

Just know that with every service you add, your site will get slower. Pages that don’t even use the CutePicService will instantiate it along with all its dependencies and all of their dependencies on every request. In fact, you’ll be instantiating every class in your entire fucking service graph on every request ’cause that’s just how baller you are.

Three problems and maybe a solution:

  1. Waking the beast
  2. The wrath of the graph
  3. Unforeseen Consequences

  Solution: Laziness

1. Waking the beast

Once your application grows to an appreciable size, building your Dependency Injection Definition itself may require an object graph with as many as 200 – 300 classes with obvious names like :

\Symfony\Component\DependencyInjection\Reference
\Symfony\Component\DependencyInjection\Parameter
\Symfony\Component\DependencyInjection\Variable
\Symfony\Component\DependencyInjection\Alias
\Symfony\Component\DependencyInjection\ParameterBag\ParameterBag

The overhead of parsing a YAML file on each request can be mitigated using APC caching, but waking a large object graph from APC will invariably cost you about 1 millisecond per 100 objects. There is no way to discard this bathwater without the baby.

2. The wrath of the graph

Worse is the cost incurred to actually instantiate a service. Depending on the complexity of your service graph, it’s very likely that instantiating a single service will cause the container to instantiate every service it can think of. Service instantiation times of greater than 20 ms are easily possible.

Here’s why:

<?php
 
class ServiceA {
    private $serviceB;
    private $serviceC;
    public function __construct($serviceB, $serviceC) {
        $this->serviceB = $serviceB;
        $this->serviceC = $serviceC;
    }
    public function doubleSuperRandomNumber() {
        return $this->serviceB->superRandomNumber() 
             + $this->serviceB->superRandomNumber();
    }
    public function doSomeShit() {
        return $this->serviceC->omgWtf() + 8;
    }
    public function justReturnTwelve() {
        return 12;
    }
}
class ServiceB {
    public function superRandomNumber() {
        return rand() * rand();
    }
}
class ServiceC {
    private $serviceD;
    public function __construct($serviceD) {
        $this->serviceD = $serviceD;
    }
    public function omgWtf() {
        return $this->serviceD->srslyDude() + 11;
    }
}
class ServiceD {
    public function srslyDude() {
        return 42;
    }
}

Now when we instantiate the controller, we must also instantiate ServiceA, ServiceB, ServiceC and ServiceD, even if all we want to do is print the number 12.

<?php
 
class JustafuckingController {
    private $serviceA;
    public function __construct($serviceA) {
        $this->serviceA = $serviceA;
    }
    public function numberAction() {
        echo $this->serviceA->justReturnTwelve();
    }
}

This is a contrived example but you see the problem. Complex service graphs must be instantiated in whole regardless of the code path. This can take a lot of time if your graph is large given all the hidden logic that goes into conjuring this mess.

3. Unforeseen Consequences

The other problem with instantiating large object graphs in whole is that once your dependency graph grows sufficiently complex, making changes to a service at one end of the graph may have unintended consequences at the other end of the graph. If your Dependency Injection Definition is shared by multiple projects with separate configs, a new config param required at one end of the graph that’s missing from another project may break a distant service even if the project doesn’t use the service that requires the new config param.

Solution: Laziness

Lazy service instantiation is the only solution I see to the wrath. Unfortunately, the only way I know to do that with the Symfony DIC is Container Injection, which is purported to be filthy and whorish.
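To pin down what I mean by laziness, here is a minimal sketch, written in JavaScript rather than PHP just to keep it short. None of this is Symfony API: `lazy`, `log` and the `getA`/`getC`/`getD` accessors are hypothetical names, and the services are stripped-down versions of ServiceA, ServiceC and ServiceD from the example above. Each dependency is handed over as a function that builds the real object on first use, so a code path that never touches a dependency never pays for it.

```javascript
// Hypothetical helper: wraps a factory so the object is built once, on first call.
function lazy(factory) {
    var instance, built = false;
    return function () {
        if (!built) { instance = factory(); built = true; }
        return instance;
    };
}

var log = []; // records which services actually get constructed

function ServiceD() { log.push('D'); }
ServiceD.prototype.srslyDude = function () { return 42; };

function ServiceC(getD) { log.push('C'); this.getD = getD; }
ServiceC.prototype.omgWtf = function () { return this.getD().srslyDude() + 11; };

function ServiceA(getC) { log.push('A'); this.getC = getC; }
ServiceA.prototype.doSomeShit = function () { return this.getC().omgWtf() + 8; };
ServiceA.prototype.justReturnTwelve = function () { return 12; };

// Wiring: nothing is constructed here, only factories are registered.
var getD = lazy(function () { return new ServiceD(); });
var getC = lazy(function () { return new ServiceC(getD); });
var getA = lazy(function () { return new ServiceA(getC); });

var twelve = getA().justReturnTwelve(); // constructs ServiceA only
var builtForTwelve = log.slice();       // snapshot of what has been built so far
var sixtyOne = getA().doSomeShit();     // first use of C and D constructs them
```

The catch is that every service now has to know it is holding an accessor rather than the real thing, which is roughly the same smell as injecting the container.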

What the fuck am I supposed to do about this?

– – Update
Suggested solution to problem 2 from merk in #symfony using proxy objects
https://github.com/symfony/symfony/issues/5012

Tagged Cache Invalidation

Linked Cache Invalidation could be said to be an instance of a more general solution type one might refer to as Cache Object Dependency Invalidation. This blog post assumes some familiarity with LCI and is meant to explain the impetus behind the creation of a similar but alternate solution of the same type which I refer to as Tagged Cache Invalidation.

Background and History

In 2010, Mark Nottingham’s 2006 Internet Draft http-link-header was accepted as RFC 5988 after 4 years and 10 revisions.

Around the same time, another brilliant engineer named Mike Kelly from the UK submitted a paper to WS-REST for a presentation detailing a use for Mark’s Link header in a cache invalidation technique he named Link Header Cache Invalidation.

Unbeknownst to Mike, Mark was already working on an implementation of a virtually identical concept.

Mike and Mark soon joined forces to co-author an Internet Draft formally specifying the new cache invalidation technique they named Linked Cache Invalidation, which is presently on its 4th draft.

Mark’s implementation of this technique — squid-lci — has at least one very large scale production installation but exhibits a number of qualities which may make it unsuitable for widespread adoption :

  1. squid-lci requires squid 2.8 for which a stable version has still not been released and thus is not recommended for use in production.
  2. squid-lci is not forward compatible with squid 3 making it a poor candidate for adoption for users who wish to retain the option to upgrade squid in the future.
  3. squid-lci makes inappropriate use of squid’s log daemon.

Putting squid-lci through its paces in a side project I hacked together in 2011 between jobs convinced me that LCI was indeed a viable technique.

Houston, we have a problem.

At Beatport we make heavy use of HTTP caching for published content, and after a year as an application developer I received a user story for which LCI seemed like it might provide a good solution:

As a DJ, I can edit the title, description, event association, and track times of my mix after it has been published so I can fix errors in my published content.

It so happens that the day after scheduling this user story for the coming sprint, I boarded a plane for Greenville, South Carolina to attend RestFest, a barcampy unconference meeting of the minds for all things REST. The timing would turn out to be highly serendipitous.

RestFest

After a daylong hackathon, a bit of listening to presentations, a bit of presenting, a great deal of excellent southern cooking and more than a few beers with familiar like-minded folks, we were all in the zone.

During the course of a pleasant side conversation with Leonard Richardson, I remembered the problem and presented it for inspection to get a second opinion on whether LCI might be a valid solution. Leonard indicated to me that he was familiar with LCI and that it did indeed seem as though it might present a good solution to the problem.

Having no more slides to prepare as I had already given my lightning talk, I set out immediately to create a viable, production-ready technical implementation of Linked Cache Invalidation in Varnish (Beatport’s reverse proxy cache of choice).

I scoured the net researching VCL to determine all available cache invalidation mechanisms in Varnish 3. Purge and Ban seemed like they might do the trick.

The day was soon over so we all went out to dinner and crashed a local incubator drinkup.

Brick Walls

The next day I returned to the problem and quickly ran into a number of challenges:

  1. Accessing values for repeated headers in Varnish does not work well, if at all. VCL appears to assume the value of the first instance of each header and ignore the rest.
  2. Varnish comes pre-configured with a maximum header count limit of 64, which may be insufficient depending on whether the invalidation links are distributed across many headers. Not a show stopper but it could present problems.
  3. Varnish comes pre-configured with a maximum header length of 4096 characters. This could become a problem depending on the size and number of urls by which the application’s resources must be invalidated. At this point I am operating on the assumption that to function correctly for our application, the implementation must comfortably support at least 200 dependencies per object without significant modification to the default varnish settings.
  4. Finding and iterating over URLs in invalidates headers is non-trivial since VCL is more of a configuration DSL than a real programming language. Implementing support for LCI’s rel="invalidates" link headers will definitely require some programming in C, a language with which I am not terribly familiar.
  5. Invalidating resources across subdomains would require absolute URIs which could add significantly to the length of the required Link headers.

Given these constraints, I came to the following conclusions :

  1. I would need to cut a few corners in order to create a viable solution to the problem at hand
  2. LCI’s link header format is too verbose to comfortably fit 200 dependencies into 4096 characters
  3. I would need 2 different headers, one for dependencies and one for invalidations
  4. The invalidates header value would have to be a regex to avoid coding in C

Success

After a few hours of coding I had a concise, viable solution.

tci.vcl
sub vcl_fetch {
 
    if (beresp.status >= 200 && beresp.status < 400
    && (req.request == "PUT" || req.request == "POST" || req.request == "DELETE")) {
        ban("obj.http.x-invalidated-by ~ " + beresp.http.x-invalidates);
    }
 
}

The test app worked correctly and seemed quite efficient.

a.php
<?php
 
header('x-invalidated-by: mix-a,track-b', false);
header('cache-control: s-maxage=86400');
 
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    header('x-invalidates: mix-a', false);
}
 
?>
<h1>A is invalidated by B</h1>

b.php
<?php
 
header('x-invalidated-by: track-b', false);
header('cache-control: s-maxage=86400');
 
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    header('x-invalidates: track-b', false);
}
 
?>
<h1>Changes to B invalidate both B and A</h1>
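To spell out what the VCL and the two test pages are doing together, here is a small in-memory simulation in JavaScript. This is not Varnish code: `cache`, `store` and `ban` are hypothetical names. Each cached object remembers its x-invalidated-by value, and a successful write evicts every object whose value matches the x-invalidates regex. Real Varnish adds the expression to a ban list and tests objects against it later instead of deleting them on the spot; the sketch evicts eagerly to keep things simple.

```javascript
var cache = {}; // url -> { tags: 'mix-a,track-b', body: '...' }

// Cache a response together with its x-invalidated-by header value.
function store(url, invalidatedBy, body) {
    cache[url] = { tags: invalidatedBy, body: body };
}

// Mimics ban("obj.http.x-invalidated-by ~ " + x-invalidates).
// Returns the sorted list of URLs that were evicted.
function ban(invalidates) {
    var re = new RegExp(invalidates);
    var evicted = [];
    Object.keys(cache).forEach(function (url) {
        if (re.test(cache[url].tags)) {
            delete cache[url];
            evicted.push(url);
        }
    });
    return evicted.sort();
}

store('/a.php', 'mix-a,track-b', 'A');
store('/b.php', 'track-b', 'B');

var afterPostToA = ban('mix-a');        // POST to a.php sends x-invalidates: mix-a
store('/a.php', 'mix-a,track-b', 'A');  // a.php gets fetched and cached again
var afterPostToB = ban('track-b');      // POST to b.php sends x-invalidates: track-b
```

A POST to a.php only knocks out a.php, while a POST to b.php knocks out both pages. Because the match is an unanchored regex, a tag like track-b would also match track-b2, so tag names need to be chosen with that in mind.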

Advantages

  1. It effectively solves the problem at hand
  2. It’s fairly efficient
  3. It’s extremely concise
  4. It handles cross sub-domain cache object invalidation with no additional overhead

Drawbacks

  1. It uses custom http headers
  2. It’s only suitable for use in reverse proxy caches (equally true of LCI)
  3. It doesn’t solve the problem of out of band changes (equally true of LCI)
  4. Ban propagation for peered cache instances will require an out of band component with knowledge of all instances

Conclusion

Thank you for taking the time to read this post.

I’m hoping to deploy Tagged Cache Invalidation to production at Beatport in the next 3-6 weeks.

If you see any problems with this technique, please speak up. I want to hear what you think.

If this technique interests you, please pass this link to co-workers and acquaintances for comment.
I’m trying to gather as much feedback as possible concerning the viability and utility of this technique.

Asynchronous RSA key generation in Javascript

Source on GitHub – https://github.com/KevBurnsJr/rsasync

Generating RSA keys is a characteristically CPU-intensive operation. This presents problems when operating on weak devices (such as the iPhone 3G) in environments under strict computation restrictions (such as Mobile Safari’s 10 second JavaScript execution timeout).

What is needed is an RSA key generation library that operates asynchronously in order to chug through the 2+ minutes of computation time required to generate a 512-bit RSA key on a weak device without bumping against the computation restrictions enforced by the Mobile Safari execution environment.

Some nearly suitable libraries do exist, but all of them fall short in some fashion.

  • Probably the closest is this asynchronous keygen from Atsushi Oka
    http://ats.oka.nu/titaniumcore/js/crypto/readme.txt
    However, the interface for Ats Oka’s library is not simple and the architecture of the code leaves something to be desired.

  • Cryptico is another library featuring RSA key generation which is also touted as a sort of all-in-one solution
    http://code.google.com/p/cryptico/
    However, this library really just glues together a bunch of already-available libraries and packages them as a unit.

  • jsbn is the underlying RSA key generator packaged with Cryptico and is available on its own
    http://www-cs-students.stanford.edu/~tjw/jsbn/
    This library has a fairly simple interface and is relatively fast and compact meeting most of my requirements.
    However, this library doesn’t do asynchronous key generation.

    But we can fix that.

jsbn RSA keygen times out after 11 seconds on the iPhone 3G for even a 256-bit key, but with a little finagling and a lot of setTimeouts, we can get it to handle a key of virtually any size for which the user has the patience to wait.

Here’s an example of the new async interface:

key = new RSAKey();
key.generateAsync(512, "03", function(){
    var pubKey = hex2b64(key.n.toString(16));
    alert(pubKey);
});

This was a great exercise in how to turn synchronous JavaScript into asynchronous JavaScript. Taking procedural code and breaking those for loops into recursive functions was a mind bender, but once I figured out how it generally ought to work, each function became easier to port.
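As a general illustration of that transformation (not lifted from rsasync itself), here is a synchronous loop next to an asynchronous equivalent. `isPrime`, `findNextPrimeSync` and `findNextPrimeAsync` are hypothetical names made up for this sketch. Each pass through the loop becomes one call of a step function, which either hands the result to a callback or reschedules itself with setTimeout, giving control back to the browser in between.

```javascript
// Naive trial-division primality check, just to give the loop something to do.
function isPrime(n) {
    if (n < 2) return false;
    for (var i = 2; i * i <= n; i++) {
        if (n % i === 0) return false;
    }
    return true;
}

// Synchronous version: blocks until a prime is found.
function findNextPrimeSync(start) {
    for (var n = start; ; n++) {
        if (isPrime(n)) return n;
    }
}

// Asynchronous version: tests one candidate per timer tick.
function findNextPrimeAsync(start, callback) {
    var n = start;
    var step = function () {
        if (isPrime(n)) {
            callback(n);          // escape
        } else {
            n++;
            setTimeout(step, 0);  // yield, then try the next candidate
        }
    };
    setTimeout(step, 0);
}
```

The loop variable moves into a closure and the return becomes a callback. The async version takes longer in wall-clock time since every tick waits on the timer, but no single slice of work runs long enough to trip the execution timeout.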

Originally I did it all inline, but later I ripped it all out into a separate file which extends Tom Wu’s jsbn.

// Copyright (c) 2011  Kevin M Burns Jr.
// All Rights Reserved.
// See "LICENSE" for details.
//
// Extension to jsbn which adds facilities for asynchronous RSA key generation
// Primarily created to avoid execution timeout on mobile devices
//
// http://www-cs-students.stanford.edu/~tjw/jsbn/
//
// ---
 
(function(){
 
// Generate a new random private key B bits long, using public expt E
var RSAGenerateAsync = function (B, E, callback) {
    //var rng = new SeededRandom();
    var rng = new SecureRandom();
    var qs = B >> 1;
    this.e = parseInt(E, 16);
    var ee = new BigInteger(E, 16);
    var rsa = this;
    // These functions have non-descript names because they were originally for(;;) loops.
    // I don't know enough about cryptography to give them better names than loop1-4.
    var loop1 = function() {
        var loop4 = function() {
            if (rsa.p.compareTo(rsa.q) <= 0) {
                var t = rsa.p;
                rsa.p = rsa.q;
                rsa.q = t;
            }
            var p1 = rsa.p.subtract(BigInteger.ONE);
            var q1 = rsa.q.subtract(BigInteger.ONE);
            var phi = p1.multiply(q1);
            if (phi.gcd(ee).compareTo(BigInteger.ONE) == 0) {
                rsa.n = rsa.p.multiply(rsa.q);
                rsa.d = ee.modInverse(phi);
                rsa.dmp1 = rsa.d.mod(p1);
                rsa.dmq1 = rsa.d.mod(q1);
                rsa.coeff = rsa.q.modInverse(rsa.p);
                setTimeout(function(){callback()},0); // escape
            } else {
                setTimeout(loop1,0);
            }
        };
        var loop3 = function() {
            rsa.q = nbi();
            rsa.q.fromNumberAsync(qs, 1, rng, function(){
                rsa.q.subtract(BigInteger.ONE).gcda(ee, function(r){
                    if (r.compareTo(BigInteger.ONE) == 0 && rsa.q.isProbablePrime(10)) {
                        setTimeout(loop4,0);
                    } else {
                        setTimeout(loop3,0);
                    }
                });
            });
        };
        var loop2 = function() {
            rsa.p = nbi();
            rsa.p.fromNumberAsync(B - qs, 1, rng, function(){
                rsa.p.subtract(BigInteger.ONE).gcda(ee, function(r){
                    if (r.compareTo(BigInteger.ONE) == 0 && rsa.p.isProbablePrime(10)) {
                        setTimeout(loop3,0);
                    } else {
                        setTimeout(loop2,0);
                    }
                });
            });
        };
        setTimeout(loop2,0);
    };
    setTimeout(loop1,0);
};
RSAKey.prototype.generateAsync = RSAGenerateAsync;
 
// Public API method
var bnGCDAsync = function (a, callback) {
    var x = (this.s < 0) ? this.negate() : this.clone();
    var y = (a.s < 0) ? a.negate() : a.clone();
    if (x.compareTo(y) < 0) {
        var t = x;
        x = y;
        y = t;
    }
    var i = x.getLowestSetBit(),
        g = y.getLowestSetBit();
    if (g < 0) {
        callback(x);
        return;
    }
    if (i < g) g = i;
    if (g > 0) {
        x.rShiftTo(g, x);
        y.rShiftTo(g, y);
    }
    // Workhorse of the algorithm, gets called 200 - 800 times per 512 bit keygen.
    var gcda1 = function() {
        if ((i = x.getLowestSetBit()) > 0){ x.rShiftTo(i, x); }
        if ((i = y.getLowestSetBit()) > 0){ y.rShiftTo(i, y); }
        if (x.compareTo(y) >= 0) {
            x.subTo(y, x);
            x.rShiftTo(1, x);
        } else {
            y.subTo(x, y);
            y.rShiftTo(1, y);
        }
        if(!(x.signum() > 0)) {
            if (g > 0) y.lShiftTo(g, y);
            setTimeout(function(){callback(y)},0); // escape
        } else {
            setTimeout(gcda1,0);
        }
    };
    setTimeout(gcda1,10);
};
BigInteger.prototype.gcda = bnGCDAsync;
 
// (protected) alternate constructor
var bnpFromNumberAsync = function (a,b,c,callback) {
  if("number" == typeof b) {
    if(a < 2) {
        this.fromInt(1);
    } else {
      this.fromNumber(a,c);
      if(!this.testBit(a-1)){
        this.bitwiseTo(BigInteger.ONE.shiftLeft(a-1),op_or,this);
      }
      if(this.isEven()) {
        this.dAddOffset(1,0);
      }
      var bnp = this;
      var bnpfn1 = function(){
        bnp.dAddOffset(2,0);
        if(bnp.bitLength() > a) bnp.subTo(BigInteger.ONE.shiftLeft(a-1),bnp);
        if(bnp.isProbablePrime(b)) {
            setTimeout(function(){callback()},0); // escape
        } else {
            setTimeout(bnpfn1,0);
        }
      };
      setTimeout(bnpfn1,0);
    }
  } else {
    var x = new Array(), t = a&7;
    x.length = (a>>3)+1;
    b.nextBytes(x);
    if(t > 0) x[0] &= ((1<<t)-1); else x[0] = 0;
    this.fromString(x,256);
  }
};
BigInteger.prototype.fromNumberAsync = bnpFromNumberAsync;
 
})();

I’m also adding a few functions for sleeping a public or private RSA key object to a simple JSON Transport Object (for storage) and waking it again.

(function(){
 
// Cast to private Transport Object
var RSAPrivTPO = function () {
    return {
        'n'     : hex2b64(this.n.toString(16)),
        'e'     : hex2b64(this.e.toString(16)),
        'd'     : hex2b64(this.d.toString(16)),
        'p'     : hex2b64(this.p.toString(16)),
        'q'     : hex2b64(this.q.toString(16)),
        'dmp1'  : hex2b64(this.dmp1.toString(16)),
        'dmq1'  : hex2b64(this.dmq1.toString(16)),
        'coeff' : hex2b64(this.coeff.toString(16))
    }
}
RSAKey.prototype.privTPO = RSAPrivTPO;
 
// Hydrate from private Transport Object
var RSAFromPrivTPO = function (tpo) {
    this.setPrivateEx(
        b64tohex(tpo.n), 
        b64tohex(tpo.e), 
        b64tohex(tpo.d), 
        b64tohex(tpo.p), 
        b64tohex(tpo.q), 
        b64tohex(tpo.dmp1), 
        b64tohex(tpo.dmq1), 
        b64tohex(tpo.coeff)
    );
    return this;
};
RSAKey.prototype.fromPrivTPO = RSAFromPrivTPO;
 
// Cast to public Transport Object
var RSAPubTPO = function () {
    return {
        'n' : hex2b64(this.n.toString(16)),
        'e' : hex2b64(this.e.toString(16))
    }
};
RSAKey.prototype.pubTPO = RSAPubTPO;
 
// Hydrate from public Transport Object
var RSAFromPubTPO = function (tpo) {
    this.setPublic(
        b64tohex(tpo.n), 
        b64tohex(tpo.e)
    );
    return this;
};
RSAKey.prototype.fromPubTPO = RSAFromPubTPO;
 
})();

Source on GitHub – https://github.com/KevBurnsJr/rsasync

Domain specific media types and REST

Below is a thread posted to rest-discuss. I’ve reposted my response here for easy linking and reading.
http://tech.groups.yahoo.com/group/rest-discuss/message/17650

Daniel Roussel posts:

Hi,

I’ve been reading a lot about how to do “proper” REST this week and the more I read, the more I’m lost, especially the HATEOAS part I fear.

First, to give some context, the company I work for develops mobile applications for clients. Most of the time, they want to get an iPhone native application, an Android application and a traditional Web based Application to cover the other mobile phones out there.

The way we are currently doing things is the good old (bad?) RPC over HTTP way. We define a bunch of URI which are coded inside the different apps, we exchange data as JSON, etc. This week, trying to do things in a better way, I’ve begin a more serious study of REST and how to do it properly.

What I really can’t wrap my head around is how, technically, have HATEOAS in a native application? I mean, when building a native application, I have tables to display lists, buttons to do some things, etc. My understanding is that all those should be displayed based on the data (hypermedia) received from the server. Is that right?

A concrete example would be a hotel room rental service. The person would open the application and have fields to enter the from/to dates. It would then tap a “Get Available Rooms”. The app would call the server and get back a list of rooms along with prices and other details. From there the person could select one room and rent it.

The RPC way of coding this is obvious to me but I have no idea how I’d do that in a proper REST way! What bugs me is that every way I look at it, the client application would still be tightly coupled to the service. I understand how I would only need to GET the http://rent-a-room.com URI hardcoded and then in the response I would have the http://rent-a-room.com/available-rooms URI given. But… My application would expect each “call” to return some pre-defined data and “rel”, those can’t appear out of the blue?!

I guess what I’m trying to say is that both the business process and the data exchanged must be known to my client application at the moment of coding it, and those can’t change without breaking existing clients. But reading about REST, every is talking about loose coupling and not breaking clients… I just don’t see it.

What am I missing?

Thanks a lot and sorry if it is a stupid question!

Daniel, here are 3 ways you might use self-descriptive messages in your API:

1) Create many domain specific media types (one for each view)

Content-Type: application/rent-a-room+xml

2) Create one domain specific media type

Content-Type: application/vnd.hotels.com+xml

3) Create zero domain specific media types

Content-Type: application/json
Link: </schema/rent-a-room>; rel="describedBy"

All three of these approaches could be seen as satisfying the self-descriptive messages constraint.

If you create many DSMs (domain specific media types), your application might bind the media type to the view class via some sort of client-side configuration.

"application/rent-a-room+xml"   =>   RentARoomView

If you create one DSM, your media type might specify the semantics by which a representation specifies details about itself which could be used in rendering the representation in a GUI.

{"_type": "rent-a-room", ... }

… which you might then bind to a view …

"rent-a-room"   =>   RentARoomView

If you create zero DSMs, your application might bind the value of the describedBy link header to a view in the gui.

"/schema/rent-a-room"   =>   RentARoomView

An alternative approach would be to create one DSM with a richer semantics which would effectively allow you to compose the interface from the server side using code-on-demand and/or more granular views

{"_links":[
    {"rel":"view","type":"text/javascript","href":"/views/RentARoomView.js"},
    {"rel":"commentable","type":"text/javascript","href":"/attributes/commentable.js"}
]}

This Code-on-demand approach would take greatest advantage of the constraints of REST to create a highly evolvable service by never binding anything directly to a view class within the application. Instead, your application would become a user agent, parsing representations and fetching additional computational resources as necessary to render the view.

Code-on-demand may be significantly less feasible if your client is written in Objective-C, but perhaps it’s something to think about. The embedded links might not be JavaScript or CSS, but perhaps some other language used for GUI composition, such as XUL or a simple DSL.

Finally I’m sure it goes without saying that whatever way you wind up rendering a representation for a view, the UI would contain links which you would click to navigate to new screens which are built using the data and metadata from the representation of the resource identified by the link.

And there you have a few takes on creating an engine of application state with self-descriptive messages and code-on-demand.

Three Line Ternaries

These little buggers are so much more readable than their single-line brethren:

$yourmom['fat_joke'] = $yourmom['dinner'] instanceof Elephant
	? "so fat she ate Dumbo" 
	: "so skinny Africa sends her food";

Anonymous Functions in Mustache

When using Anonymous Functions in Mustache, it’s important to note this line from the spec:
https://github.com/mustache/spec/blob/master/specs/~lambdas.yml#L82

Lambdas used for sections should receive the raw section string.

This means that the inputs to your lambda will not be pre-rendered. The output of the lambda will be rendered.

So I had a lambda that looked like this…

function($timestamp) {
	return str_replace("T", " ", $timestamp);
}

But it was giving gibberish.
bobthecow (author of Mustache.php) helped me realize that I needed to render the inputs myself…

function($timestamp) use($view) {
	return str_replace("T", " ", $view->render($timestamp));
}

The lesson is this:
The next time I run into a problem, I’ll be sure to read the tests.

Thanks, Justin!

PHP Fatal Error 500

How to force PHP to return an empty 500 response when it encounters a fatal error:
Add this to the very beginning and very end of your bootstrap file:

register_shutdown_function(function(){	
	if(!defined('REQUEST_SUCCEEDED')) {
		header("HTTP/1.1 500 Internal Server Error");
		if(getenv('APP_ENV') != 'dev') {
			ob_clean();
		}
	}
});
 
// ...
// ... your whole application ...
// ... from top to bottom ...
// ...
 
define('REQUEST_SUCCEEDED', true);

How to prevent a leak like Tumblr’s

Tumblr took a bit of a tumble today.
http://news.ycombinator.com/item?id=2343330

One of Tumblr’s engineers (presumably) deployed a file to production which contained a critical flaw.
The first character of the file was replaced with an `i` instead of a `<`.

i?php
    require_once('chorus/Utils.php');
    require_once('chorus/Kestrel.php');
    require_once('chorus/DataService.php');
    require_once('chorus/Shard.php');
 
    Database::set_defaults(array(
        'user'     => 'tumblr3',
        'password' => 'm3MpH1C0Koh39AQD83TFhsBPlOM1Rx9eW55Z8YWStbgTmcgQWJvFt4',
        'database' => 'tumblr3',
    //    'write_lock_tables' => '*',
        'extended_log' => (idate('G') == 17 && intval(idate('i')) == 56 && trim(`hostname`) == 'web10.tumblr.com')
    ));
 
    if (__FILE__ == '/var/www/apps/tumblr/config/config.php' || __FILE__ == '/data/tumblr/config/config.php') {
        define('ENVIRONMENT',      'production');
        if (! defined('DEFAULT_DATABASE')) define('DEFAULT_DATABASE', 'primary');
        define('S3_BUCKET',        'data.tumblr.com');
        define('ENABLE_PANTHER',   true);
        define('ENABLE_MEDIA_CDN', true);
        define('ASSETS_URL',       (ENABLE_MEDIA_CDN && ! (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS']) ? 'http://assets.tumblr.com' : ''));
        define('MEMCACHE_HOST',         '10.252.0.68');
        define('MEMCACHE_VERSION_HOST', '10.252.0.67');
        define('VALIDATION_FAILURE_LOG', BASE_PATH . '/validate.log');
        # <snip>

Yes, that is tumblr’s production database password.

Full source here

Once this error was introduced to production, people viewing any page would see a dump of the first 749 lines of the config file along with some PHP errors. THEN, GoogleBot came along and indexed the whole mess, which is why you can still see it in Google’s search results.

Learning from the mistakes of others

How can we keep this from happening to us?

First of all, go to any project on your local dev environment, replace the first `<` with `i` in any included file in your project (hint: config.php) and see what happens. Chances are you’ll see exactly the same thing that happened to Tumblr.

The only solution I see to this is pre-commit syntax checking for committed PHP files.

Here’s a tutorial for PHP syntax checking in SVN and here’s a pre-commit hook script for PHP syntax checking in Git.

Basically the way it works is that if you ever commit a PHP file which contains a syntax error, your commit will be blocked and you will have to amend it before you are allowed to commit it to the repository. If Tumblr had done this, they might never have leaked their config file.

[UPDATE] Apparently both PHP’s built-in syntax checking and the PEAR package PHP_CodeSniffer won’t pick up on errors such as this. Searching for a valid solution now…

[UPDATE] The best solution I’ve seen so far is to always return 500 responses to clients with a generic error message in production, thus preventing errors like this from bubbling up at the Apache level.

[UPDATE] Turns out PHP returns 200 when it encounters a fatal error. Inconceivable.

[UPDATE] Here’s how to force PHP to return a 500 when it encounters a fatal error. Add this to a prepend file or make it the first 5 lines of your bootstrap file.

function die_with_honor() {
	header("HTTP/1.1 500 Internal Server Error");
	ob_clean();
}
register_shutdown_function('die_with_honor');

Mixing it up

I have 7 books on my nightstand, each on a different topic:

Lean Thinking: process engineering
Metaphors We Live By: role of language in psychology
Just Enough Software Architecture: software architecture
Drive: psychology of motivation
Web Operations: operations
Leading Geeks: management
Programming Erlang: programming

This gives me the opportunity to jump around to follow whatever impulse is fueling me at any given time. If I’m feeling burned after working around some bogus proprietary API, I will jump into a book on Software Architecture to remind me that there is still a path toward sanity. If I’ve been teching out for days and feel like I’ve lost the forest for trees, I’ll pick up a book on psychology to remind me that software does still have the potential to assist human beings in a large number of not-necessarily-obvious ways. If I’ve spent all day writing emails and haven’t had time to touch a piece of code, I’ll jump into a book on process or management to give me new ideas on how to talk less and do more. If I’m feeling adventurous, I’ll pick up a book on a new language and type along for a few hours to advance my understanding of the universe.

And I do the same thing with projects:

Teambo: a generic ticket tracking application
ToroPHP: a PHP framework
ripple-php: a Riak ODM for PHP (ported from Ruby)
riak-php-client: Riak’s PHP client, which I’m extending to include support for Protocol Buffers

If I’m burnt on churning out HTML and CSS for days, I’ll step back and jump into something esoteric like adding a new transport protocol to an open source database client. If I’ve been crunching on a hard CS problem and start lacking steam, I’ll jump over to an application I’ve got on the back burner and knock out a few quick features to get the gears turning again. If I’m losing sleep over a cool new use case I’ve been turning over, I’ll flip on the lights and crank out a few test cases to put my ideas on paper as fodder for tomorrow’s session.

This sort of autonomy is crucial to remaining happy and productive.

Plant a thousand projects and let them flourish as they may.

Pretty JSON : Pipe to pj



    google image search for
    pipe and pjs

If you wind up doing a lot of curl from the command line (like you will using Riak), add this line to the bottom of ~/.bashrc

# ...
alias pj='python -mjson.tool'

Now when you’re curling, you can just pipe curl output to pj:

$ curl -s http://localhost:8098/riak/stats | pj

And this …

{"props":{"name":"stats","n_val":3,"allow_mult":false,
"last_write_wins":false,"precommit":[],"postcommit":[]
,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_s
td_keyfun"},"linkfun":{"mod":"riak_kv_wm_link_walker",
"fun":"mapreduce_linkfun"},"old_vclock":86400,"young_v
clock":20,"big_vclock":50,"small_vclock":10,"r":"quoru
m","w":"quorum","dw":"quorum","rw":"quorum"}}

Becomes this …

{
    "props": {
        "allow_mult": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "dw": "quorum",
        "last_write_wins": false,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "name": "stats",
        "old_vclock": 86400,
        "postcommit": [],
        "precommit": [],
        "r": "quorum",
        "rw": "quorum",
        "small_vclock": 10,
        "w": "quorum",
        "young_vclock": 20
    }
}
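For the curious, the alias boils down to roughly the following. This is a minimal sketch, not the actual `json.tool` source, and `pretty` is a name made up for the example. The sorted keys in the output above come from the older `json.tool` behaviour; on Python 3.5 and later the tool keeps the input order unless you pass `--sort-keys`.

```python
import json


def pretty(raw: str) -> str:
    """Re-serialize a JSON string with 4-space indentation and sorted keys."""
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


if __name__ == "__main__":
    # A trimmed-down version of the bucket properties shown above.
    print(pretty('{"props":{"name":"stats","n_val":3,"allow_mult":false}}'))
```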

Self-descriptive hypermedia in Riak

So… tell me about yourself.

GET /riak/people/old-macdonald
Accept: application/json

Not much to say, really. My name is Old MacDonald. I have a farm.

HTTP/1.1 200 Okay
Content-Type: application/json
Link:  </riak/people>; rel="up", \
    </riak/schema/person>; riaktag="describedby", \
    </riak/animals/trixie>; riaktag="pet", \
    </riak/locations/macdonalds-farm>; riaktag="farm"

{
    "name": "Old MacDonald"
}
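Before a client can follow any of those links, it has to pull them out of the Link header. Here is a rough Python sketch of that step. `parse_link_header` and `follow` are illustrative names, not part of any Riak client, and the regex only copes with quoted parameters like the ones in the response above.

```python
import re

# One link: <target> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value: str):
    """Return a list of (target, params) tuples from a Link header value."""
    # Undo the backslash line continuations used for display.
    value = value.replace("\\\n", " ")
    return [
        (target, dict(PARAM_RE.findall(raw_params)))
        for target, raw_params in LINK_RE.findall(value)
    ]


def follow(links, tag: str):
    """Return the targets whose rel or riaktag equals tag."""
    return [
        target
        for target, params in links
        if tag in (params.get("rel"), params.get("riaktag"))
    ]
```

With the header above, `follow(links, "describedby")` would hand back `/riak/schema/person`, which is exactly the next request in the conversation.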

Ya, I’ve heard the song. But let’s pretend like I’m an alien.


WHAT IS A PERSON? BEEP BOOP

GET /riak/schema/person
Accept: application/json

A person is an animal of the species Homo sapiens.

HTTP/1.1 200 Okay
Content-Type: application/schema+json
Link:  </riak/schema>; rel="up"

{
    "id": "/riak/schema/person",
    "extends": "/riak/schema/animal",
    "type": "Person",
    "description": "A Homo sapiens",
    "properties": {
        "species": "Homo sapiens",
        "arms": { "type": "number", "default": 2 },
        "legs": { "type": "number", "default": 2 }
    }
}

What is an animal?

GET /riak/schema/animal
Accept: application/json

An animal is a member of the animal kingdom.

HTTP/1.1 200 Okay
Content-Type: application/schema+json
Link:  </riak/schema>; rel="up"

{
    "id": "/riak/schema/animal",
    "type": "Animal",
    "description": "A member of the animal kingdom.",
    "properties": {
        "species": { "type": "string" },
        "name": { "type": "string" }
    }
}

Okay, I get it. So tell me more about that `farm` you mentioned.

GET /riak/locations/macdonalds-farm
Accept: application/json

Dude. My farm is insane. You would not believe how many animals I have.

HTTP/1.1 200 Okay
Content-Type: application/json
Link:  </riak/locations>; rel="up", \
    </riak/schema/location>; riaktag="describedby", \
    </riak/animals/wilbur>; riaktag="animal", \
    </riak/animals/bessy>; riaktag="animal", \
    </riak/animals/mr-ed>; riaktag="animal", \
    </riak/animals/donald>; riaktag="animal", \
    </riak/mortgages/macdonalds-farm>; riaktag="mortgage", \
    </riak/mortgages/macdonalds-farm-2>; riaktag="mortgage"

{
    "name": "Old MacDonalds Farm",
    "geo": { "lat": "30.000635", "lng": "-95.225313" }
}

What is a location?

GET /riak/schema/location
Accept: application/json

LOL wut? A location is a location. Are you high right now?

HTTP/1.1 200 Okay
Content-Type: application/schema+json
Link:  </riak/schema>; rel="up"

{
    "id": "/riak/schema/location",
    "type": "Location",
    "description": "A location",
    "properties": {
        "name": { "type": "string" },
        "geo": {
            "type": "object",
            "properties": {
                "lat": { "type": "number" },
                "lng": { "type": "number" }
            }
        }
    },
    "links": [
        {"map": "http://maps.google.com/maps?ll={geo.lat},{geo.lng}&z=18"}
    ]
}
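That `links` entry only pays off if the client fills in the placeholders from the document it is describing. Here is a small Python sketch of one way to do that. The `{geo.lat}` dotted-path syntax is this schema's own convention rather than standard URI Templates, and `expand` is a made-up helper name.

```python
import re


def expand(template: str, document: dict) -> str:
    """Replace each {dotted.path} placeholder with the value from document."""

    def lookup(match):
        value = document
        # Walk the nested dictionaries one path segment at a time.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{([^{}]+)\}", lookup, template)


farm = {
    "name": "Old MacDonalds Farm",
    "geo": {"lat": "30.000635", "lng": "-95.225313"},
}
map_url = expand("http://maps.google.com/maps?ll={geo.lat},{geo.lng}&z=18", farm)
```

A missing key raises `KeyError` here, which seems like the honest outcome for a document that does not match its schema.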

No dude, I’m an alien. Remember? BEEP BOOP BEEP

Tell me about this animal on your farm.

GET /riak/animals/bessy
Accept: application/json

Bessy is a dairy cow. She goes `moo`.

HTTP/1.1 200 Okay
Content-Type: application/json
Link:  </riak/animals>; rel="up", \
    </riak/schema/cow>; riaktag="describedby"

{
    "name": "Bessy the bovine",
    "type": "dairy"
}

What is a `cow`?

GET /riak/schema/cow
Accept: application/json

A cow is a producer of butter or a precursor to steak or an object of worship.

HTTP/1.1 200 Okay
Content-Type: application/schema+json
Link:  </riak/schema>; rel="up"

{
    "id": "/riak/schema/cow",
    "extends": "/riak/schema/animal",
    "type": "Cow",
    "description": "A Cow is a bovine animal.",
    "properties": {
        "type": { "type": "string", "enum": [
            "steer",
            "dairy",
            "sacred"
        ]},
        "legs": { "type": "number", "default": 4 }
    }
}

Tell me about your pet.


I still don’t know that she’s a dog. Or an animal for that matter.

I am programmed to treat URLs as opaque.

GET /riak/animals/trixie
Accept: application/json

Trixie’s a 3-legged Australian Shepherd. She got hit by a car when she was
young and lost the use of her front left leg.


Wait, how did you know she is a dog if you didn’t know she is a dog?

HTTP/1.1 200 Okay
Content-Type: application/json
Link:  </riak/animals>; rel="up", \
    </riak/schema/dog>; riaktag="describedby"

{
    "name": "Trixie",
    "breed": "Australian Shepherd",
    "legs": 3
}

What are you talking about? I don’t even know what a dog is.

I’m an alien, remember? BEEP BOOP BOP BEEP

GET /riak/schema/dog
Accept: application/json

Whatever, dude. This whole routine is gettin’ kinda played out.

HTTP/1.1 200 Okay
Content-Type: application/schema+json
Link:  </riak/schema>; rel="up"

{
    "id": "/riak/schema/dog",
    "extends": "/riak/schema/animal",
    "type": "Dog",
    "description": "A Dog is a Canis lupus familiaris.",
    "properties": {
        "species": "Canis lupus familiaris",
        "breed": { "type": "string" },
        "legs": { "type": "number", "default": 4 }
    }
}
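This is how the alien could work out that dogs normally have four legs and Trixie has three: merge the properties up the `extends` chain, then fill in defaults for anything the document leaves out. The Python sketch below does that against an in-memory copy of the two schemas. The function names are made up, "Rex" is a hypothetical second dog, and there is no protection against a schema that extends itself.

```python
SCHEMAS = {
    "/riak/schema/animal": {
        "properties": {
            "species": {"type": "string"},
            "name": {"type": "string"},
        },
    },
    "/riak/schema/dog": {
        "extends": "/riak/schema/animal",
        "properties": {
            "species": "Canis lupus familiaris",
            "breed": {"type": "string"},
            "legs": {"type": "number", "default": 4},
        },
    },
}


def resolve_properties(schema_id, fetch=SCHEMAS.get):
    """Merge properties up the extends chain; children override parents."""
    schema = fetch(schema_id)
    parent = schema.get("extends")
    merged = dict(resolve_properties(parent, fetch)) if parent else {}
    merged.update(schema.get("properties", {}))
    return merged


def with_defaults(document, properties):
    """Return a copy of document with missing defaulted properties filled in."""
    result = dict(document)
    for name, spec in properties.items():
        if isinstance(spec, dict) and "default" in spec and name not in result:
            result[name] = spec["default"]
    return result


dog = resolve_properties("/riak/schema/dog")
trixie = with_defaults({"name": "Trixie", "breed": "Australian Shepherd", "legs": 3}, dog)
rex = with_defaults({"name": "Rex"}, dog)  # hypothetical dog with no "legs" stored
```

Trixie keeps her explicit `legs` value of 3, while Rex picks up the schema default of 4. In a real client `fetch` would be an HTTP GET against the `describedby` link instead of a dictionary lookup.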

Ya, you’re right. Lemme just try 1 more thing.

DELETE /riak/animals/bessy

Dude.

HTTP/1.1 204 No Content

Hi, I’m a PHP Developer.

This year in #rest on Freenode



Full Size (1920×1100)
pdf also available

The more treacherous paths are yet unsolved

PBF Cube

Illustration used without permission from
The Perry Bible Fellowship Almanack
http://www.pbfcomics.com

We tend to see success as a constant. As though the puzzle of our existence were a common mystery. In our understanding we misunderstand that a success for some may easily be viewed as a failure by others who see the world in different terms. Living in a world such as ours in the top 1% of the planet’s wealthiest, it is difficult for us not to lead a life which has a carbon footprint one thousand times the average.

The chaos of a disenchanted existence while inglorious holds at least a candle to the dark reality fueling our unsustainable existence. This is an illustration of the indetermination expressed by those who accept their block with two yellows as an absolution of the responsibility they have to their contribution to this world. We all have double yellows at some places and levels of our being. So this is also an illustration of the ignorance expressed by those who feel shame in seeing that which makes them unique. Hoping nobody will see that their whites are touching their yellows.

A dark portrait of what our society has become.

That’s what I call rap

munin-node on EC2

If you’re trying to set up multiple munin nodes in EC2, you need to use the internal hostname to reference slave nodes.

munin.conf

[web01]
    address domU-22-31-38-12-25-04.compute-1.internal
    use_node_name yes

munin-node.conf

host_name web01
allow ^.*$

Voila!

Also be sure to open 4949 if you’ve got a security group set up.
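If the master still shows nothing, a quick reachability check from the master box tells you whether the security group is the culprit. This is a minimal Python sketch, with `port_open` being a made-up helper name. It only tests that a TCP connection to the port succeeds and says nothing about whether munin-node's `allow` rule accepts you.

```python
import socket


def port_open(host: str, port: int = 4949, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


# Example, using the internal hostname from munin.conf above:
# port_open("domU-22-31-38-12-25-04.compute-1.internal")
```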

Talent Acquisition


compile php-cgi

If you’re compiling PHP and you can’t find the php-cgi binary, remove --with-apxs2 from the configure params and you should be good to go.

php ubuntu error: xml2-config not found.

You need to install libxml2-dev

sudo apt-get install libxml2-dev
