Archive for January 2009

Reading STDIN from Java

Micro-blog here, just to note this for my benefit and others. It started as a joke tweet:

I was going to tweet the coolest thing I learned about Java, but the line was too long.

Turns out, the minimized line would have fit in Twitter's 140-character textbox, but wouldn't have allowed any context. It was actually a rather useful few lines of code, and I couldn't find the info in my own Googling, so I'll share it in full.

The goal: read optional data from a pipe in Java/Rhino. There was a code snippet on the Rhino Wikipedia page doing just this, but it was a generic "read line by line" input prompt. It worked when you piped data to it, but would halt execution waiting for an EOF if nothing was in the pipe. I wanted this to run like any standard Unix shell program where pipe data is optional.

After a little doc searching and head scratching, I found the .ready() method of BufferedReader. Initially, I thought it was just a property ... print(!!stream.ready) said true, both when pipe data was available and without. Head-slapping moment: typeof ready == function. stream.ready() is false if no stream data is present:

 
importPackage(java.io);
importPackage(java.lang);
(function(args){
	// setup the input buffer and output buffer
	var stdin = new BufferedReader(new InputStreamReader(System['in'])),
	    lines = [];
 
	// read lines while data is ready on stdin (skips entirely if nothing was piped)
	while(stdin.ready()){
		lines.push(stdin.readLine());
	}
 
	if(lines.length){ /* handle buffer data */ }
 
})(arguments);
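
The property-versus-method mix-up is easy to reproduce outside Rhino. A minimal sketch in plain JavaScript, where `fakeStream` is a hypothetical stand-in for the reader (not a real BufferedReader):

```javascript
// Hypothetical stand-in for a reader that has no data waiting.
var fakeStream = {
	ready: function(){ return false; }
};

// A reference to a function is always truthy, whatever it would return:
var looksReady = !!fakeStream.ready;   // true
var kind = typeof fakeStream.ready;    // "function"

// Only calling the method gives the real answer:
var isReady = fakeStream.ready();      // false
```

So the double-bang test only proves the method exists, not that any data is available.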
 

Then, we make a little shell script to run this via Java / Rhino. Call it stdin.sh:

 
#!/bin/sh
java -jar js.jar stdin.js "$@"
 

Now, we can pipe data to the script and process it in JavaScript later:

 
cat file.txt | ./stdin.sh
./stdin.sh < file.txt
find *.html | ./stdin.sh --filelist
 

In the last example, we passed an argument to the stdin.sh script, which is passed along to the arguments of the bootstrap .js file (stdin.js) ... Checking for that is trivial, as Rhino implements a sane modern version of JavaScript (the one with Array.forEach and Array.indexOf):

 
(function(args){
     var useFileList = args.indexOf("--filelist") > -1;
})(arguments);
 

Then we'd just make whatever code uses this treat the stdin data as a list of files rather than a raw dump of data. Magic.
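
One way that last step might look, as a rough sketch. The `interpretInput` helper is hypothetical (it is not part of Rhino), and it assumes the `lines` array and `args` from the snippets above:

```javascript
// Hypothetical helper: decide how to treat the lines read from stdin.
function interpretInput(lines, args){
	var useFileList = args.indexOf("--filelist") > -1;
	if(useFileList){
		// treat each non-empty line as a file name
		var files = [];
		for(var i = 0; i < lines.length; i++){
			// String() coerces in case the host returns a string-like object
			var name = String(lines[i]);
			if(name.length > 0){ files.push(name); }
		}
		return { type: "files", files: files };
	}
	// otherwise hand back the piped text as one raw blob
	return { type: "raw", data: lines.join("\n") };
}

var asFiles = interpretInput(["a.html", "", "b.html"], ["--filelist"]);
// asFiles.files is ["a.html", "b.html"]
var asRaw = interpretInput(["hello", "world"], []);
// asRaw.data is "hello\nworld"
```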

[more] JavaScript tips

A few weeks ago, Thomas Fuchs (author of script.aculo.us) posted a few "JavaScript tips" that sparked my interest ...

The first was a post on credit card validation and the importance of client-side validation, given the potential costs and overhead associated with invalid CC requests to a provider. It immediately reminded me of dojox.validate, Dojo's own in-house validation extension. I was certain there were existing functions to do all the things mentioned in the blog, and it turns out the dojox.validate.creditCard module I ported from Dojo 0.4 is still valid and kicking. It provides methods for:

  • Determining CC type by #
  • Determining CC number by type
  • Simple validation of the CCV #
  • Determining validity by length and LUHN of CC #

... and a few other methods. I recently updated it to expose the map of CC types to the regular expressions that match them so users could extend the validation to check their own custom Gift Card / Gift Certificate numbers. For instance, creating a custom type 'my' that starts with 7, made of only numbers, and is 15 characters long:

 
dojo.require("dojox.validate.creditCard");
dojo.mixin(dojox.validate._cardInfo, {
	"my":"7[0-9]{14}"
});
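
To show how a type-to-pattern map like this can drive type detection, here is a stand-alone sketch in plain JavaScript. The `cardPatterns` object and `detectCardType` function are hypothetical stand-ins, not Dojo's internals, and the "vi" pattern is only illustrative. The "my" pattern is the custom one from above.

```javascript
// Hypothetical map of card type to pattern (not Dojo's actual table).
var cardPatterns = {
	"vi": "4[0-9]{12}(?:[0-9]{3})?",  // illustrative Visa-style pattern
	"my": "7[0-9]{14}"                // the custom gift card type
};

// Return the first type whose pattern matches the whole number, or null.
function detectCardType(number){
	// drop spaces and dashes before matching
	var digits = String(number).replace(/[\s-]/g, "");
	for(var type in cardPatterns){
		if(cardPatterns.hasOwnProperty(type)){
			if(new RegExp("^" + cardPatterns[type] + "$").test(digits)){
				return type;
			}
		}
	}
	return null;
}

detectCardType("7123 4567 8901 234"); // "my"
detectCardType("4111111111111111");   // "vi"
detectCardType("1234");               // null
```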
 

The dojox.validate package also has out-of-the-box email, URL, and number validation, as well as locale-specific validation for edge cases like validating a Canadian Social Insurance Number. This new extension point for CCs will be available in Dojo 1.3, as it was a private variable previously. I also cleaned up the inline documentation and unit tests for the project, though the modules seem to have been very stable for a long time now.

I set up an incredibly crude test on my Dojo sandbox using the Google CDN and Dojo 1.2.3. It simply prevents a form from submitting if an invalid credit card is detected, styling the input red. There is no endpoint, so no CC#-stealing is taking place ... I promise.
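
For reference, the Luhn check mentioned in the list above is short enough to sketch in plain JavaScript. This is the general checksum algorithm under a hypothetical name (`passesLuhn`), not a copy of Dojo's code:

```javascript
// Returns true if the digits in `number` pass the Luhn checksum.
function passesLuhn(number){
	var digits = String(number).replace(/\D/g, "");
	if(!digits.length){ return false; }
	var sum = 0, doubleIt = false;
	// walk from the rightmost digit, doubling every second one
	for(var i = digits.length - 1; i >= 0; i--){
		var d = parseInt(digits.charAt(i), 10);
		if(doubleIt){
			d *= 2;
			if(d > 9){ d -= 9; }  // same as adding the two digits of d
		}
		sum += d;
		doubleIt = !doubleIt;
	}
	return sum % 10 === 0;
}

passesLuhn("4111 1111 1111 1111"); // true
passesLuhn("4111 1111 1111 1112"); // false
```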

The other JavaScript Tips post was providing a Prototype-based DRY hint. It was a great tip. The example was a "page turner", using the same function for two actions, and currying nextPage and prevPage functions around that. The technique is sound. It made me think of dojo.hitch's lesser-used cousin dojo.partial. To summarize:

 
var turnPage = function(direction){
      // do something if direction is -1 or 1
}
var nextPage = dojo.partial(turnPage, 1);
var prevPage = dojo.partial(turnPage, -1);
 
// goto next page.
nextPage(); // calls turnPage(1);
 
// setup the nav:
dojo.query("a.nextPageLink").onclick(nextPage);
dojo.query("a.prevPageLink").onclick(prevPage);
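
To try the idea without Dojo loaded, here is a rough stand-alone version. The `partial` helper is a hypothetical stand-in for dojo.partial, and the `currentPage` / `lastPage` state is invented purely for illustration:

```javascript
// Hypothetical stand-in for dojo.partial: pre-fill the leading arguments.
function partial(fn){
	var preset = Array.prototype.slice.call(arguments, 1);
	return function(){
		var rest = Array.prototype.slice.call(arguments);
		return fn.apply(this, preset.concat(rest));
	};
}

var currentPage = 1, lastPage = 5; // invented page state

var turnPage = function(direction){
	// only move for -1 or 1, and stay within 1..lastPage
	if(direction !== 1 && direction !== -1){ return currentPage; }
	currentPage = Math.min(lastPage, Math.max(1, currentPage + direction));
	return currentPage;
};

var nextPage = partial(turnPage, 1);
var prevPage = partial(turnPage, -1);

var afterFirst = nextPage();  // 2
var afterSecond = nextPage(); // 3
var afterBack = prevPage();   // 2
```

When used as click handlers, the event object simply arrives as an extra trailing argument, which `turnPage` ignores.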
 

This is all well and good. With dojo.partial I was able to simulate Thomas's example without issue, but it got me thinking. I noticed that in the blog example the 'curry' method was being called directly on the function, e.g.:

 
var turnPage = function(dir){ /* doit */ },
    nextPage = turnPage.curry(1),
    prevPage = turnPage.curry(-1);
 

Which means it exists on the native prototype ... Though it is only an opinion, one typically should avoid extending native objects (in this case Function) with methods ... I would go so far as to say it is a bad practice overall. That fact, however, didn't squelch my curiosity as to how this was implemented, so I set out to write a curry implementation in plain JavaScript that worked like dojo.partial except transparently on Function.prototype.

Initially, I tried using dojo.partial directly, but couldn't quite get the arg-shifting and scopes right ... I turned to my friend and colleague Eugene Lazutkin to help me write a function that behaved similarly to dojo.partial in that the 'hitched'/'curried' function would be called with the passed args first, followed by any arguments passed to the returned function, e.g.:

 
var foo = function(a,b,c){ /* code */ };
var bar = dojo.partial(foo, "one");
bar(); // foo("one")
bar("baz"); // foo("one","baz");
 

So we came up with a little function to do just this. Just a few lines of code, actually:

 
// a generic simple currying function. (thanks uhop!)
var curryFunc = function(){
	var me = this, a = Array.prototype.concat.apply([], arguments);
	return function(){
		return me.apply(this, a.concat.apply(a, arguments));
	}
};
 

So we have this function which refers to 'this', generates an Array (via .concat magic) from the passed 'arguments', and returns a function that will call (.apply) the original function (me) with those arguments mixed in with any others. We can see this function working without extending any native objects by just referencing it on an instance:

 
// setup the test:
var bar = function(a, b, c){ console.log('bar', a, b, c); }
bar.partial = curryFunc;
 
// curry bar
var foo = bar.partial(1, 2);
foo(); // echoes 'bar', 1, 2, undefined
foo(3); // echoes 'bar', 1, 2, 3
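
One thing to watch with the concat approach: `concat` flattens any array it is handed, so an array passed as a single argument gets spread into separate arguments. A hedged sketch of a variant that copies `arguments` with `slice` instead (the name `partialFunc` is hypothetical), shown next to the original for comparison:

```javascript
// the original, repeated so this snippet runs on its own
var curryFunc = function(){
	var me = this, a = Array.prototype.concat.apply([], arguments);
	return function(){
		return me.apply(this, a.concat.apply(a, arguments));
	};
};

// hypothetical variant: slice copies arguments without flattening arrays
var partialFunc = function(){
	var me = this, a = Array.prototype.slice.call(arguments);
	return function(){
		return me.apply(this, a.concat(Array.prototype.slice.call(arguments)));
	};
};

// a test function that just reports how many arguments it received
var count = function(){ return arguments.length; };
count.curry = curryFunc;
count.partial = partialFunc;

count.curry([1, 2])();     // 2, the array was flattened into two arguments
count.partial([1, 2])();   // 1, the array arrives intact
count.curry(1)([2, 3]);    // 3
count.partial(1)([2, 3]);  // 2
```

For functions that never take arrays as arguments, the two behave the same.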
 

But adding the 'partial' function to every function we define seems like a lot of extra work. While I'm entirely happy calling dojo.partial(someFunction, "curried","args"), this experiment was to implement this function for all Functions. Easy enough:

 
// don't do this, please.
Function.prototype.curry = function(){
	var me = this, a = Array.prototype.concat.apply([], arguments);
	return function(){
		return me.apply(this, a.concat.apply(a, arguments));
	}
};
 

Which allows you (without using Dojo or Prototype for that matter) to write code like:

 
// create the function and the curried function:
var foo = function(a,b){ console.log(a,b); }
var bar = foo.curry("first");
 
// test:
bar(); // "first", undefined
bar("baz"); // "first", "baz"
 
// create a curried function from a curried function:
var baz = bar.curry("second");
baz(); // "first", "second"
 

Though, upon seeing my test page, Alex immediately scolded me for touching Function.prototype at all, and suggested I mark the method non-enumerable so it doesn't show up in for( in ) loops. I didn't get that far. I like using dojo.partial, and don't recommend doing this at all.
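
For the curious, engines that support ECMAScript 5 can mark a property non-enumerable with `Object.defineProperty`. A minimal sketch, assuming an ES5-capable engine and using a plain object (the hypothetical `holder`) rather than a native prototype:

```javascript
var holder = {};

// plain assignment: the property shows up in for...in
holder.visible = function(){};

// defineProperty leaves `enumerable` false unless told otherwise
Object.defineProperty(holder, "hidden", {
	value: function(){ return "still callable"; },
	writable: true,
	configurable: true
});

var seen = [];
for(var key in holder){ seen.push(key); }
// seen is ["visible"], yet holder.hidden() still works
```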

The article made me think about how Dojo recently re-factored dojo.trim to defer to a native String.trim() function if available ... It seems a great idea, but I was interested to see if anyone had provided String.prototype with this function themselves, like:

 
// don't do this, please. Give all strings a .trim() method.
String.prototype.trim = function(){ return this.replace(/^\s\s*/, '').replace(/\s\s*$/, ''); }
 
// safer, recommended. Make a function to trim a string.
my.trim = function(str){ return str.replace(/^\s\s*/, '').replace(/\s\s*$/, ''); }
 

Google says quite a few in fact, or at least a lot of people talk about it. So Dojo will defer to their implementation in edge cases (if it exists prior to dojo.js being on a page). The API for trim() is simple and fairly universal, so it shouldn't be too much of an issue, but it illustrates the complications that extending native objects introduces. I'm not sure my Function.prototype.curry() method is 100% API compatible with prototypejs's implementation, so in creating it I may also be creating potential interoperability issues. I'm fairly certain I am not: short of looking at the code, .curry is to dojo.partial as .bind is to dojo.hitch, so the implementations should behave well together, but I'd like to avoid it nonetheless.
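
The "defer to whatever trim is already there" idea can be sketched roughly like this. It is a stand-alone illustration with hypothetical names (`regexTrim`, `trim`), not Dojo's actual source:

```javascript
// fallback: the same regex approach as the snippets above
function regexTrim(str){
	return str.replace(/^\s\s*/, "").replace(/\s\s*$/, "");
}

// decide once, at load time, which implementation to use
var trim = (typeof String.prototype.trim === "function")
	? function(str){ return String.prototype.trim.call(str); } // native or page-supplied
	: regexTrim;

trim("  padded\t\n");      // "padded"
regexTrim("  padded\t\n"); // "padded"
```

Because the check runs once, any String.prototype.trim installed before this code loads, native or otherwise, is the one that gets used.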