Saturday, February 19, 2005

Thinking: Runtime Geography of a Java VM

Alex Miller read about my notion of runtime geography and posted some excellent analysis. As I've thought about it more, I agree that the notion of runtime geography is more intuitive and feels more general than the notion of an ontology, so let's run with it and see where we end up.

If we think about locating things on a map, there are numerous types of things, and their locations can be specified in a variety of ways. For example, we have towns and cities, which have a name (not always unique) and occupy a particular area centered on a point that we can specify with a longitude and latitude. We can also talk about things in a relative manner ("Chicago is about 350 miles north-northeast of St. Louis"). We sometimes want to find things based upon nearness ("Where's the nearest pizza place that delivers?"). We can also talk about roads, rivers, rights-of-way, etc., all of which follow a route (which is usually specified by a width and a series of points specifying its path).

So, if we're going to have a runtime geography for the objects in the VM, we have to solve several problems and answer a number of questions:

  • What are the various dimensions used to create the space? Anything that might be a dimension must be something for which we can directly assign a value to an object, or one for which we can derive a value for an object based upon some affiliated/associated object.

  • How do we make the geography available to Java code? Once we've come up with some definition of the runtime location of objects, we need to be able to assign a location to an object and we then need to be able to specify a location or region to locate one or more objects.

  • How do we avoid conflicts with other ways of organizing objects in the VM? There are at least two ways to define a 'location' for an object at runtime today: the ClassLoader that loaded its class and the package its class is defined in. We might be tempted to include the Thread(Group) which is executing code of this object, but this has the problem that (a) it is transient and (b) it can result in multiple locations for the same object. We should leverage these existing mechanisms if possible, or at least not clash with them.

  • How does our solution fit into a distributed system? There are many situations where multiple Java VMs communicate with each other. Should the runtime geography be a purely internal thing which is never exposed outside a VM, or should we expose it and use it across VMs?
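
To make the two existing 'locations' mentioned above concrete, here is a minimal sketch that reads them off an arbitrary object. The class and method names (ExistingLocations, packageOf, loaderOf) are illustrative assumptions, not part of any proposal; the only library calls used are Object.getClass(), Class.getName() and Class.getClassLoader().

```java
// Sketch only: the class and method names are illustrative assumptions.
final class ExistingLocations {

    // Package of the object's class, derived from the fully qualified
    // class name ("" for the default package). Intended for ordinary
    // (non-array) classes.
    static String packageOf(Object o) {
        String name = o.getClass().getName();
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? "" : name.substring(0, lastDot);
    }

    // Label for the ClassLoader that loaded the object's class.
    // Class.getClassLoader() may return null for classes loaded by the
    // bootstrap loader, so that case is given an explicit label.
    static String loaderOf(Object o) {
        ClassLoader loader = o.getClass().getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        Object[] samples = {
            "a string", new java.util.ArrayList<String>(), new ExistingLocations()
        };
        for (Object o : samples) {
            System.out.println(o.getClass().getSimpleName()
                    + " -> package=" + packageOf(o)
                    + ", loader=" + loaderOf(o));
        }
    }
}
```

Both values are fixed once the class is loaded, which is why they behave like locations, whereas the executing thread changes from call to call.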


To make the whole discussion concrete, I'm going to focus on two particular uses of a runtime geography that are fairly different from each other. While I hope a well-conceived runtime geography will be applied to many problems, these two seem like a good start:

  • Dependency injection at object construction time - finding objects required by the constructed object based upon where in the program the object is created

  • Location-dependent logging - specifying different logging levels for the same kind of object used in different locations in the program. For example, assume StringTokenizer produced logging messages for your favorite logging API. If a JDBC driver and some user-written piece of code in the same program use a StringTokenizer and you want to crank up the logging messages for StringTokenizers 'inside' the JDBC driver but not 'inside' the rest of the program, how can we use runtime geography to do this? What if we want to crank up all logging inside the JDBC driver?
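
As a rough illustration of the logging use case, the sketch below treats a location as a dotted prefix on a java.util.logging logger name, so that setting a level on a region applies to everything 'inside' it. This is only one possible approximation and is an assumption on my part: the region names ("jdbc.driver", "app"), the loggerAt helper and the idea of naming loggers by location rather than by class are all hypothetical, and the real StringTokenizer does no logging. It also sidesteps the hard part, since the caller has to state the location explicitly.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: region names and the loggerAt helper are hypothetical.
final class LocationLogging {

    // Builds a logger whose name is "<location>.<kind>", so the logger
    // inherits its effective level from the region it is located in.
    static Logger loggerAt(String location, String kind) {
        return Logger.getLogger(location + "." + kind);
    }

    public static void main(String[] args) {
        // Crank up everything inside the (hypothetical) JDBC driver region.
        Logger jdbcRegion = Logger.getLogger("jdbc.driver");
        jdbcRegion.setLevel(Level.FINEST);

        // Keep the rest of the program quiet.
        Logger appRegion = Logger.getLogger("app");
        appRegion.setLevel(Level.WARNING);

        // Same kind of object, two different locations.
        Logger inDriver = loggerAt("jdbc.driver", "StringTokenizer");
        Logger inApp = loggerAt("app", "StringTokenizer");

        System.out.println("driver FINE enabled: " + inDriver.isLoggable(Level.FINE));
        System.out.println("app FINE enabled:    " + inApp.isLoggable(Level.FINE));
    }
}
```

A dotted name is a single dimension, so this captures only a hierarchy of regions. It says nothing about nearness or about how an object would discover which region it was created in, which is what a real runtime geography would need to answer.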


In addition, we should meet some design goals:

  • Make things as simple to use as possible.

  • Make the description and use of geography easy to understand - in particular, draw upon the analogy of real-world geography as much as possible.

  • Put information in as few places as possible - avoid multiple configuration files that all depend upon each other and which depend upon magic strings or magic numbers. Of course, this can compromise the utility of things if we overly restrict the facility, so we need to 'right-size' it.

  • Geographic locations should be defined at compile-time, just as locations in real-world geography are fixed in advance. This is essentially saying that objects won't change their location during runtime. This might conflict with use in a distributed manner, but seems like a good simplifying assumption, at least for now.

  • It should be possible to specify the 'query' used to navigate the geography both at compile-time and at runtime.

  • Don't conflict with other ways of defining object location.


This feels like enough to start mashing ideas together. Now it's time to do the serious thinking... :-)
