# DATABASE ALGORITHMS

Todd L. Veldhuizen. Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm. LogicBlox Technical Report LB1201, December 2012.

Recent years have seen exciting developments in join algorithms. In 2008, Atserias, Grohe and Marx (henceforth AGM) proved a tight bound on the maximum result size of a full conjunctive query, given constraints on the input relation sizes. In 2012, Ngo, Porat, Ré and Rudra (henceforth NPRR) devised a groundbreaking join algorithm with worst-case running time proportional to the AGM bound. Our commercial Datalog system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the NPRR algorithm in preliminary benchmarks. This spurred us to investigate its complexity. In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR. Moreover, leapfrog triejoin achieves worst-case optimality for finer-grained classes, for example, those defined by constraints on the size of projections of the input relations. We show that leapfrog triejoin is asymptotically faster than NPRR for some such classes. On a practical note, leapfrog triejoin can be implemented using conventional data structures such as B-trees, and extends naturally to existential queries. We believe our algorithm offers a useful addition to the existing toolbox of join algorithms, being easy to absorb, simple to implement, and having a concise optimality proof.
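The heart of leapfrog triejoin is a "leapfrog join" on sorted key streams, in which each iterator repeatedly seeks past the current maximum key seen by the others. A minimal single-variable sketch in C++, using sorted vectors and binary search in place of the paper's trie iterators (the names and data layout here are illustrative, not the paper's interface), might look like:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Leapfrog intersection of k sorted vectors. Each iterator seeks to the
// first key >= the current maximum ("leapfrogging" over gaps); when all
// iterators agree on a key, that key is in the join result.
std::vector<int> leapfrog_join(std::vector<std::vector<int>> rels) {
    std::vector<int> out;
    if (rels.empty()) return out;
    for (auto& r : rels) if (r.empty()) return out;
    std::vector<std::size_t> pos(rels.size(), 0);
    std::size_t p = 0;        // index of the iterator holding the max key
    int hi = rels[0][0];      // current max key
    for (;;) {
        bool all_equal = true;
        for (std::size_t i = 0; i < rels.size(); ++i) {
            if (i == p) continue;
            // seek: advance iterator i to the first key >= hi
            auto it = std::lower_bound(rels[i].begin() + pos[i],
                                       rels[i].end(), hi);
            if (it == rels[i].end()) return out;  // an iterator is exhausted
            pos[i] = it - rels[i].begin();
            if (*it > hi) { hi = *it; p = i; all_equal = false; }
        }
        if (all_equal) {
            out.push_back(hi);    // every iterator agrees: emit the key
            if (++pos[p] == rels[p].size()) return out;
            hi = rels[p][pos[p]];
        }
    }
}
```

In the full algorithm this unary join is run at each level of a trie, one per join variable, which is what yields the worst-case optimality result; the sketch above shows only the per-level leapfrog step.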

Todd L. Veldhuizen. Incremental Maintenance for Leapfrog Triejoin. arXiv:1303.5313, March 2013.

We present an incremental maintenance algorithm for leapfrog triejoin. The algorithm maintains rules in time proportional (modulo log factors) to the edit distance between leapfrog triejoin traces.

# GRAPH VISUALIZATION

Todd L. Veldhuizen. Dynamic Multilevel Graph Visualization. Eprint arXiv:0712.1549 [cs.GR], December 2007.

We adapt multilevel, force-directed graph layout techniques to visualizing dynamic graphs in which vertices and edges are added and removed in an online fashion (i.e., unpredictably). We maintain multiple levels of coarseness using a dynamic, randomized coarsening algorithm. To ensure the vertices follow smooth trajectories, we employ dynamics simulation techniques, treating the vertices as point particles. We simulate fine and coarse levels of the graph simultaneously, coupling the dynamics of adjacent levels. Projection from coarser to finer levels is adaptive, with the projection determined by an affine transformation that evolves alongside the graph layouts. The result is a dynamic graph visualizer that quickly and smoothly adapts to changes in a graph.
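The dynamics-simulation core (treating vertices as point particles with spring forces along edges and pairwise repulsion) can be sketched at a single level as follows. This is not the paper's multilevel, coupled scheme, just the damped force-directed step that such a scheme builds on; all names and constants here are illustrative:

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { double x = 0, y = 0; };
struct Edge { int u, v; };

// One damped dynamics step: attractive springs along edges, pairwise
// repulsion between all vertices, explicit Euler integration with
// velocity damping so vertices follow smooth trajectories rather than
// jumping straight to a new equilibrium layout.
void layout_step(std::vector<Vec2>& pos, std::vector<Vec2>& vel,
                 const std::vector<Edge>& edges,
                 double k = 1.0, double rep = 0.1,
                 double dt = 0.05, double damp = 0.9) {
    std::vector<Vec2> f(pos.size());
    for (const Edge& e : edges) {                    // spring forces
        double dx = pos[e.v].x - pos[e.u].x, dy = pos[e.v].y - pos[e.u].y;
        f[e.u].x += k * dx;  f[e.u].y += k * dy;
        f[e.v].x -= k * dx;  f[e.v].y -= k * dy;
    }
    for (std::size_t i = 0; i < pos.size(); ++i)     // pairwise repulsion
        for (std::size_t j = i + 1; j < pos.size(); ++j) {
            double dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y;
            double d2 = dx * dx + dy * dy + 1e-9;    // avoid divide-by-zero
            f[i].x += rep * dx / d2;  f[i].y += rep * dy / d2;
            f[j].x -= rep * dx / d2;  f[j].y -= rep * dy / d2;
        }
    for (std::size_t i = 0; i < pos.size(); ++i) {   // damped Euler update
        vel[i].x = damp * (vel[i].x + dt * f[i].x);
        vel[i].y = damp * (vel[i].y + dt * f[i].y);
        pos[i].x += dt * vel[i].x;
        pos[i].y += dt * vel[i].y;
    }
}
```

The paper's contribution sits on top of a step like this: coarse and fine copies of the graph are simulated simultaneously, with an adaptively fitted affine projection coupling adjacent levels.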

# SOFTWARE ENGINEERING

Todd L. Veldhuizen. Parsimony Principles for Software Components and Metalanguages. In Generative Programming and Component Engineering, 6th International Conference, GPCE 2007, Salzburg, Austria.

Software is a communication system. The usual topic of communication is program behavior, as encoded by programs. Domain-specific libraries are codebooks, domain-specific languages are coding schemes, and so forth. To turn metaphor into method, we adapt tools from information theory -- the study of efficient communication -- to probe the efficiency with which languages and libraries let us communicate programs. In previous work we developed an information-theoretic analysis of software reuse in problem domains. This new paper uses information theory to analyze tradeoffs in the design of components, generators, and metalanguages. We seek answers to two questions: (1) How can we judge whether a component is over- or under-generalized? Drawing on minimum description length principles, we propose that the best component yields the most succinct representation of the use cases. (2) If we view a programming language as an assemblage of metalanguages, each providing a complementary style of abstraction, how can these metalanguages aid or hinder us in efficiently describing software? We describe a complex triangle of interactions between the power of an abstraction mechanism, the amount of reuse it enables, and the cognitive difficulty of its use.

Todd L. Veldhuizen. Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law. In OOPSLA 2005 Workshop on Library-Centric Software Design (LCSD'05), San Diego, USA, 16 October 2005.

We analyze software reuse from the perspective of information theory and Kolmogorov complexity, assessing our ability to “compress” programs by expressing them in terms of software components reused from libraries. A common theme in the software reuse literature is that if we can only get the right environment in place (the right tools, the right generalizations, economic incentives, a “culture of reuse”), then reuse of software will soar, with consequent improvements in productivity and software quality. The analysis developed in this paper paints a different picture: the extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse. We define an entropy parameter $$H \in [0,1]$$ of problem domains that measures program diversity, and deduce from this upper bounds on code reuse and the scale of components with which we may work. For “low entropy” domains with H near 0, programs are highly similar to one another and the domain is amenable to the Component-Based Software Engineering (CBSE) dream of programming by composing large-scale components. For problem domains with H near 1, programs require substantial quantities of new code, with only a modest proportion of an application comprised of reused, small-scale components. Preliminary empirical results from Unix platforms support some of the predictions of our model.
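For intuition about a diversity parameter normalized to $$[0,1]$$, consider the normalized Shannon entropy of a frequency distribution. This is only an illustration of the normalization, not the paper's definition of H (which is defined over distributions of programs in a problem domain):

```cpp
#include <cmath>
#include <vector>

// Normalized Shannon entropy of a count distribution, in [0,1]:
// 0 when a single outcome dominates entirely, 1 when all outcomes
// are equally likely. Illustrative only -- the paper's H is defined
// over program distributions, not raw usage counts.
double normalized_entropy(const std::vector<double>& counts) {
    double total = 0;
    for (double c : counts) total += c;
    double h = 0;
    for (double c : counts)
        if (c > 0) {
            double p = c / total;
            h -= p * std::log2(p);     // Shannon entropy in bits
        }
    // divide by log2(n), the maximum possible entropy for n outcomes
    return counts.size() > 1 ? h / std::log2((double)counts.size()) : 0.0;
}
```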

# PROGRAMMING LANGUAGES AND COMPILERS

Todd L. Veldhuizen. Active Libraries and Universal Languages. Doctoral Dissertation, Indiana University Computer Science, May 2004.

Universal programming languages are an old dream. There is the computability sense of Turing-universal; Landin and others have advocated syntactically universal languages, a path leading to extensible syntax, e.g., macros. A stronger kind of universality would reduce the need for domain-specific languages: they could be replaced by ‘active libraries’ providing not just the abstractions for a problem domain but also domain-specific optimizations and safety requirements. Experience suggests that much domain-specific optimization can be realized by staging, i.e., doing computations at compile time to produce an efficient run-time. Rudimentary computability arguments show that languages with a ‘Turing-complete kernel’ can be both stage-universal and safety-universal. But making this approach practical requires compilers that find optimal programs, and this is a hard problem. Guaranteed Optimization is a proof technique for constructing compilers that find optimal programs within a decidable approximation of program equivalence. This gives us compilers whose kernels possess intuitive closure properties akin to, but stronger than, languages with explicit staging, and can meet the ‘Turing-complete kernel’ requirement to be stage- and safety-universal. To show this technique is practical we demonstrate a prototype compiler that finds optimal programs in the presence of heap operations; the proof of this is tedious but automated. The proof ensures that any code ‘lying in the kernel’ is evaluated and erased at compile-time. This opens several interesting directions for active libraries. One is staging: we can synthesize fast implementation code at compile-time by putting code-generators in the kernel. To achieve domain-specific safety checking we propose ‘proof embeddings’ in which proofs are intermingled with code and the optimizer does double-duty as a theorem prover. Proofs lying in the kernel are checked and erased at compile-time, yielding code that is both fast and safe.

Todd L. Veldhuizen. Tradeoffs in Metaprogramming. In 2006 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM'06). Charleston, South Carolina, January 9-10, 2006.

The design of metaprogramming languages requires appreciation of the tradeoffs that exist between important language characteristics such as safety properties, expressive power, and succinctness. Unfortunately, such tradeoffs are little understood, a situation we try to correct by embarking on a study of metaprogramming language tradeoffs using tools from computability theory. Safety properties of metaprograms are in general undecidable; for example, the property that a metaprogram always halts and produces a type-correct instance is $$\Pi^0_2$$-complete. Although such safety properties are undecidable, they may sometimes be captured by a restricted language, a notion we adapt from complexity theory. We give some sufficient conditions and negative results on when languages capturing properties can exist: there can be no languages capturing total correctness for metaprograms, and no ‘functional’ safety properties above $$\Sigma^0_3$$ can be captured. We prove that translating a metaprogram from a general-purpose to a restricted metaprogramming language capturing a property is tantamount to proving that property for the metaprogram. Surprisingly, when one shifts perspective from programming to metaprogramming, the corresponding safety questions do not become substantially harder -- there is no ‘jump’ of Turing degree for typical safety properties.

Todd L. Veldhuizen. Language embeddings that preserve staging and safety. Nordic Journal of Computing 12(2) 189--198, April 2005.

We study embeddings of programming languages into one another that preserve what reductions take place at compile-time, i.e., staging. A certain condition -- what we call a ‘Turing-complete kernel’ -- is sufficient for a language to be stage-universal in the sense that any language may be embedded in it while preserving staging. A similar line of reasoning yields the notion of safety-preserving embeddings, and a useful characterization of safety-universality. Languages universal with respect to staging and safety are good candidates for realizing domain-specific embedded languages (DSELs) and ‘active libraries’ that provide domain-specific optimizations and safety checks.

Todd L. Veldhuizen. Guaranteed Optimization for Domain-Specific Programming. In Christian Lengauer and Don Batory, editors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science, pages 306-324, 2004.

For software engineering reasons, it is often best to provide domain-specific programming environments in the context of a general-purpose language. In our view general-purpose languages are not yet general-purpose enough, and progress needs to be made before we can provide domain-specific languages that are both fast and safe. We outline some goals in this regard, and describe a possible implementation technology: guaranteed optimization, a technique for building compilers that provide proven guarantees of what optimizations they perform. Such optimizers can provide capabilities similar to staged languages, and thus provide the relevant performance improvements. They can also function as decision procedures, suggesting an approach of ‘optimizers as theorem provers’, in which optimizing compilers can be used to check domain-specific safety properties and check proofs embedded in programs.

Todd L. Veldhuizen and Andrew Lumsdaine. Guaranteed Optimization: Proving Nullspace Properties of Compilers. In Proceedings of the 2002 Static Analysis Symposium (SAS'02), volume 2477 of Lecture Notes in Computer Science, pages 263--277, 2002.

Writing performance-critical programs can be frustrating because optimizing compilers for imperative languages tend to be unpredictable. For a subset of optimizations – those that simplify rather than reorder code – it would be useful to prove that a compiler reliably performs optimizations. We show that adopting a “superanalysis” approach to optimization enables such a proof. By analogy with linear algebra, we define the nullspace of an optimizer as those programs it reduces to the empty program. To span the nullspace, we define rewrite rules that de-optimize programs by introducing abstraction. For a model compiler we prove that any sequence of de-optimizing rewrite rule applications is undone by the optimizer. Thus, we are able to give programmers a clear mental model of what simplifications the compiler is guaranteed to perform, and make progress on the problem of “abstraction penalty” in imperative languages.

Todd L. Veldhuizen and Dennis Gannon. Active Libraries: Rethinking the roles of compilers and libraries. In SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing (OO'98), IBM TJ Watson Research Center, Yorktown Heights, New York, 1998.

We describe Active Libraries, which take an active role in compilation. Unlike traditional libraries which are passive collections of functions and objects, Active Libraries may generate components, specialize algorithms, optimize code, configure and tune themselves for a target machine, and describe themselves to tools (such as profilers and debuggers) in an intelligible way. Several such libraries are described, as are implementation technologies.

Todd L. Veldhuizen. C++ Templates are Turing-Complete. Unpublished note.

We sketch a proof of a well-known folk theorem that C++ templates are Turing complete. The absence of a formal semantics for C++ template instantiation makes a rigorous proof unlikely.
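The mechanism behind the folk theorem is that template instantiation performs unbounded recursion, with a specialization serving as the base case. The canonical small example is a compile-time factorial, computed entirely by the compiler at instantiation time:

```cpp
// Compile-time factorial via recursive template instantiation.
// Instantiating Factorial<5> forces Factorial<4>, Factorial<3>, ...
// down to the explicit specialization Factorial<0>, which halts the
// recursion -- the "program" runs inside the compiler.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static const unsigned long value = 1;  // base case: specialization
};
```

Pattern-matching on specializations plus recursive instantiation gives conditionals and loops, which is the essence of the Turing-completeness argument (up to compiler-imposed instantiation-depth limits).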

# SCIENTIFIC COMPUTING AND C++

Todd L. Veldhuizen. Techniques for Scientific C++. Indiana University Computer Science Technical Report #542, August 2000.

This report summarizes useful techniques for implementing scientific programs in C++, with an emphasis on using templates to improve performance.

Todd L. Veldhuizen. Arrays in Blitz++. In Proceedings of the 2nd International Scientific Computing in Object-Oriented Parallel Environments (ISCOPE'98), Santa Fe, New Mexico, December 8-11, 1998.

Numeric arrays in Blitz++ rival the efficiency of Fortran, but without any extensions to the C++ language. Blitz++ has features unavailable in Fortran 90/95, such as arbitrary transpose operations, array renaming, tensor notation, partial reductions, multicomponent arrays and stencil operators. The library handles parsing and analysis of array expressions on its own using the expression templates technique, and performs optimizations (such as loop transformations) which have until now been the responsibility of compilers.

Todd L. Veldhuizen. Expression Templates. C++ Report, Vol. 7 No. 5, June 1995, pp. 26--31.

Expression Templates is a C++ technique for passing expressions as function arguments. The expression can be inlined into the function body, which results in faster and more convenient code than C-style callback functions. This technique can also be used to evaluate vector and matrix expressions in a single pass without temporaries. In preliminary benchmark results, one compiler evaluates vector expressions at 95-99.5% efficiency of hand-coded C using this technique (for long vectors). The speed is 2-15 times that of a conventional C++ vector class.
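The single-pass, no-temporaries evaluation works by having `operator+` return a lightweight expression object instead of a computed vector; the loop happens only once, in assignment. A minimal modern-C++ sketch of the idea (class names are illustrative; the operator template is left unconstrained for brevity, where production code would restrict it to expression types):

```cpp
#include <cstddef>
#include <vector>

struct Vec;

// Expression node: records references to its operands instead of
// computing a result. Element i is computed on demand in operator[].
template <class L, class R>
struct Add {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0) : data(n, v) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // The ONLY loop: a + b + c builds Add<Add<Vec,Vec>,Vec> and is
    // evaluated here element-by-element, with no temporary vectors.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Build an expression node rather than performing the addition.
template <class L, class R>
Add<L, R> operator+(const L& l, const R& r) { return {l, r}; }
```

A sum like `d = a + b + c` therefore compiles to a single loop over `a[i] + b[i] + c[i]`, which is where the near-hand-coded-C efficiency cited above comes from.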

Todd L. Veldhuizen. Using C++ Template Metaprograms. C++ Report, Vol. 7 No. 4, pp. 36--43.

# IMAGE PROCESSING

Todd L. Veldhuizen. Grid Filters for Local Nonlinear Image Restoration. M. A. Sc. Thesis, Department of Systems Design Engineering, University of Waterloo, Canada, 1998.

A new approach to local nonlinear image restoration is described, based on approximating functions using a regular grid of points in a many-dimensional space. Symmetry reductions and compression of the sparse grid make it feasible to work with twelve-dimensional grids as large as $$22^{12}$$. Unlike polynomials and neural networks whose filtering complexity per pixel is linear in the number of filter coefficients, grid filters have $$O(1)$$ complexity per pixel. Grid filters require only a single presentation of the training samples, are numerically stable, leave unusual image features unchanged, and are a superset of order statistic filters. Results are presented for additive noise, blurring, and superresolution.
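The $$O(1)$$-per-pixel property comes from the grid structure itself: a query lands in one grid cell and interpolates between its corner coefficients, independent of how many coefficients the filter has. A one-dimensional illustration (the thesis works with up to twelve dimensions and sparse, symmetry-reduced grids; training is omitted here, and all names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Regular-grid function approximator: values stored at evenly spaced
// grid points over [lo, hi], queries answered by linear interpolation
// between the two neighbouring points. Cost per query is O(1)
// regardless of the number of grid coefficients.
struct GridFilter1D {
    double lo, hi;
    std::vector<double> values;   // one coefficient per grid point
    GridFilter1D(double lo_, double hi_, std::size_t n)
        : lo(lo_), hi(hi_), values(n, 0.0) {}
    double operator()(double x) const {
        // map x into grid coordinates and clamp to the valid range
        double t = (x - lo) / (hi - lo) * (values.size() - 1);
        t = std::max(0.0, std::min(t, double(values.size() - 1)));
        std::size_t i = std::min((std::size_t)t, values.size() - 2);
        double frac = t - i;      // position within cell [i, i+1]
        return (1 - frac) * values[i] + frac * values[i + 1];
    }
};
```

In d dimensions the same lookup touches the $$2^d$$ corners of one cell, which is still constant per pixel for fixed d.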