Various cow drawings from the Brueghel corpus

Eric E. Monson is a Research Scientist at Duke University’s Visualization & Interactive Systems Group. Elizabeth Honig is an Associate Professor in UC Berkeley’s History of Art Department. Her project, janbrueghel.net, serves as a test corpus for NSF-funded research in computer vision and machine learning. In this post, Eric discusses current work using dictionary learning to analyze similarities between paintings in the Brueghel corpus.

One of the things I find most fascinating about working at the interface between the humanities and the sciences is the way fields can influence each other across the perceived divide. Digital humanities is often focused on finding ways for technology and scientific methods to help humanists explore new avenues in their scholarship. But when researchers in the sciences listen to the challenges faced during humanistic inquiry, they encounter a view of the world that forces them to rethink both software systems and algorithms. For example, how do you handle uncertain dates, or multiple, conflicting viewpoints, in databases and visualizations? How can you find links between paintings and drawings based on their image features, and perhaps uncover the templates used in their production? Asking this latter question brings us to a cutting edge in computer vision and machine learning research.

Let’s begin with the art historical questions

The art historians started out with questions about workshop processes in 17th-century Antwerp. The sons of the famous Renaissance artist Pieter Bruegel were prolific producers of works that were either close copies of his compositions (in the case of Pieter the Younger) or original works that nevertheless drew upon his ideas and compositions (in the case of the younger son, Jan Brueghel). Moreover, both brothers seem to have run large studios in which multiple copies and variants of each work they invented could be created; they also shared ideas and, evidently, drawings with artist friends, with whom they collaborated in the production of more paintings. Jan’s son, Jan the Younger, continued this family tradition of copying and varying compositions until the mid-17th century, nearly a century after his grandfather’s work. This presents a tangled network of well over a thousand interrelated paintings, drawings, and prints, produced by three generations of a family with the help of studio assistants and colleagues.

The challenge then was to define degrees and manners of relationship between these works. Two works may seem to be direct copies of one another, but how exact is this correspondence? More difficult still is tracing the use of “pattern drawings,” drawings of individual motifs that were apparently kept in the studio to be used in creating new compositions. These could depict a group of figures, a horse and cart, a pair of exotic leopards, a cluster of moored boats, a castle on a rock, a windmill, or a garland of flowers. Motifs from prints, like the mutant creatures in Pieter Bruegel’s Vices series, were also used in this way. The output of the younger Brueghels’ workshops was highly diverse and their compositions often very complicated, which was evidently enabled by the inventive reuse of patterns that could be flipped or rescaled as needed. Similar details are often embedded in quite different compositions: a horse from a religious history painting trots along the road pulling a peasant cart in another work, for instance, or a monster from Pieter’s Lust reappears in his son’s Christ in Limbo. We wanted to begin tracking these repeated details so that we could identify often-used patterns and compare their use within the various family workshops.

Successful intersections between art and machine learning

Machine learning is the process of using sets of rules and calculations (algorithms) to teach a computer how to accomplish a task automatically. One such task is judging the level of similarity between digital objects like documents, images, or sounds, or determining what category they fall into. Applications can be things like “Are these two photos of the same person?”, “Are these customer reviews of our product positive or negative?”, or “Which of these bank transactions are fraudulent?” If we can train a computer to do these tasks reliably and quickly, we can process huge amounts of data that would have been too time-consuming or expensive to handle with expert human judgments.

There are some successful and high-profile examples of applying machine learning to the world of art. Babak Saleh and his collaborators at Rutgers University recently used computer algorithms to classify features of digitized paintings in an attempt to automatically discover artistic influence (story, paper). One of our collaborators, Ingrid Daubechies, has used image analysis and machine learning to judge whether paintings are forgeries.

Dictionary learning can be used to detect characteristics of images

In this collaboration between Duke Math and UC Berkeley History of Art, we’re using a method called “dictionary learning” to detect similarity between images and image features. Here, I’ll present a conceptual overview of dictionary learning and how it differs between text analysis and image analysis.

When used in text analysis, a dictionary of words is a list of characteristic and common features of a written language, and any text in the same language can be represented, to some level of approximation, by the entries in that dictionary. Comparing how two texts map onto that dictionary (which of their words, or features, can be found in it, and how many times those features appear) gives us a measure of similarity between those texts. The same principle can be applied at other scales within written language: at the level of typical pairs of words, or of phrases, or even at the much larger scale of narrative structure. It is straightforward to compile a dictionary of single words, but much more complex to devise rule sets (algorithms) that can detect and represent those coarser-scale dictionaries of language, along with methods for mapping different texts onto them. Surprisingly, it’s often more helpful to have a smaller dictionary than a larger one. If we’re trying to understand the differences between story genres, it will help with interpretation if we can refine our dictionary to include only the words, phrases, or structures that are most characteristic of each genre. A smaller set is easier to visualize and understand, giving us more insight into the similarities and differences between written styles. Pruning the dictionary down to the essential traits in an automatic and accurate way is a major challenge in developing dictionary learning algorithms.
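To make the word-level version of this idea concrete, here is a minimal sketch in Python. The toy texts, the five-entry dictionary, and the choice of cosine similarity are all illustrative assumptions, not the method we actually use:

```python
# A minimal sketch: represent two texts by counts over a shared dictionary,
# then compare the count vectors. Everything here is a toy example.
from collections import Counter
import math

def to_counts(text, dictionary):
    """Map a text onto a fixed dictionary: how often does each entry appear?"""
    words = Counter(text.lower().split())
    return [words[entry] for entry in dictionary]

def cosine_similarity(a, b):
    """1.0 means the same word proportions; 0.0 means no dictionary overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

dictionary = ["horse", "cart", "village", "flowers", "monster"]
text_a = "a horse pulls a cart through the village"
text_b = "the horse and cart pass a village garden of flowers"

print(cosine_similarity(to_counts(text_a, dictionary),
                        to_counts(text_b, dictionary)))
```

The two toy texts share most of their dictionary entries, so their count vectors point in similar directions and the similarity score comes out high.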

Mathematicians and computer scientists are trying to apply dictionary learning to other digital media, such as sets of images and sounds. At the finest scale, digital images are made up of a collection of pixels with different brightness and color values. In a black and white photo, one might count how often each grayscale value occurs and build a list of typical values for the image; the typical values are simply the ones with the highest counts.
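As a quick sketch of that single-pixel idea (the random image here is just a stand-in for a real scanned painting):

```python
# Histogram the grayscale values and treat the most frequent ones as "typical".
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(200, 300))       # stand-in grayscale photo

counts = np.bincount(image.ravel(), minlength=256)  # count each value 0..255
typical = np.argsort(counts)[-5:]                   # the five most common values
print("most common grayscale values:", typical)
```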

At a slightly coarser scale, an image can also be seen as a composite of small image patches, say 3x3 pixels, or 16x16, or any other size. Learning a dictionary of image patches, though, is much more difficult than it was for single pixels. The problem is now “high-dimensional”, which just means that instead of having only one value per individual pixel in our set, we now have many values per patch. For instance, any 3x3 patch of pixels has nine brightness values arranged in a grid. We can take the nine values (in an arbitrary but consistent order) and make a list of numbers out of them. We call that list a nine-dimensional vector, or a point in a nine-dimensional space.
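For instance, here is one way to turn a grayscale image into a set of nine-dimensional patch vectors (a NumPy sketch; the random image again stands in for a real painting):

```python
# Extract every overlapping 3x3 patch and flatten each into a length-9 vector
# in a consistent order. Requires NumPy >= 1.20 for sliding_window_view.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100)).astype(float)

patches = sliding_window_view(image, (3, 3))  # shape (98, 98, 3, 3)
vectors = patches.reshape(-1, 9)              # one 9-d point per patch
print(vectors.shape)                          # (9604, 9)
```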

You’re probably used to seeing a collection of two-dimensional points plotted in a scatter plot, where the two dimensions are measured out on the X and Y axes and a mark is made at the correct spot for each 2D point. When all of the marks are made for our set, we can look for clumps or clusters of points and say that the centers of those clusters are characteristic 2D values for our set. There are also algorithms (again, sets of rules and calculations) that people have developed over the years to determine clusters and find their centers, so we can do this in a consistent and automatic way.
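One standard algorithm of this kind is k-means clustering. This sketch (synthetic 2D points, scikit-learn’s implementation, three clusters chosen by hand) is purely illustrative:

```python
# Find cluster centers in a set of 2D points with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic clumps of 2D points
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(100, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(100, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)   # characteristic 2D values for the set
```

Notice that nothing in the `fit` call depends on the points being two-dimensional; the exact same code finds cluster centers for the nine-dimensional patch vectors from the previous sketch.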

Three dimensions works the same way, but now with three values we have an X, Y, and Z, and each point in our set is a mark in 3D space. We could spot the clusters by rotating the 3D plot on a computer and looking for clumps, but again, we would typically have the computer find them for us, which gives a more objective measure. To mathematicians, though, it’s exactly the same concept and process in a nine-dimensional or even 2000-dimensional space as it is in 2D or 3D!

As long as we’re taking our image patches from real photos or paintings, the points won’t just form a random cloud in our high-dimensional space; they will be related to each other in ways that create clusters or cause the points to lie on lower-dimensional surfaces. If algorithms can detect that lower-dimensional structure, classification and dictionary learning can be done more easily, in a way that’s both efficient and interpretable. Efficiency is important because a huge collection of large images will include millions and millions of patches to process and compare. Interpretability is important because we’re doing all of this work to help humans make sense of our image collections.
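Principal component analysis (PCA) gives a simple flavor of what “lower-dimensional structure” means, though the project’s actual methods are far richer. In this hedged sketch, a thousand synthetic nine-dimensional “patches” secretly depend on only two hidden factors, and PCA discovers that:

```python
# If most of the variance lives in a few components, the points lie near
# a low-dimensional subspace of the 9-d space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 2))        # two hidden factors
mixing = rng.normal(size=(2, 9))           # how they spread into 9 dimensions
vectors = hidden @ mixing + 0.01 * rng.normal(size=(1000, 9))

pca = PCA(n_components=9).fit(vectors)
print(np.round(pca.explained_variance_ratio_, 3))  # nearly all variance in 2
```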

Geometric Multi-Resolution Analysis with invariances for image matching

This collaboration is funded through the National Science Foundation Computational Mathematics Program and is called Structured Dictionary Models and Learning for High Resolution Images. Mauro Maggioni is one of the Principal Investigators, and he’s using this application to art-historical image sets to extend his dictionary learning algorithm, Geometric Multi-Resolution Analysis, in new directions by adding something called “invariances”.

Mauro and his colleagues have already developed methods for calculating image characteristics at a variety of scales. This multi-scale approach is critical because most images have important features all the way from very fine scales, which may be mostly textures like brush strokes, to very coarse scales, like objects, scenes, or perhaps even moods. We would also like to recognize objects as similar even if they appear at different scales (sizes). Ideally, a tiny horse in a complex village scene should be recognized as similar to a large horse that fills most of the painting in a picture of a farmer with his cart if their forms and perspective are similar. When this is accomplished, the feature matching is said to be “scale invariant”, because the answer doesn’t vary with the image scale. If two horses are matched even though one is close to the center of an image patch, and the other is right up against the edge of a patch, this matching is said to be “translation invariant”. The same goes for rotation and reflection.
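To give a feel for what scale invariance means in practice, here is an illustrative sketch (synthetic “motifs”, a simple normalized correlation, and a brute-force search over scale factors; this is not Mauro’s algorithm). A small motif is matched against a twice-enlarged copy of itself by trying several candidate scales and keeping the best score:

```python
# Match a motif to a larger copy of itself by searching over scale factors.
import numpy as np
from scipy.ndimage import zoom

def normalized_correlation(a, b):
    """Correlation that ignores overall brightness and contrast."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

rng = np.random.default_rng(0)
motif = rng.normal(size=(16, 16))   # stands in for a small pattern drawing
bigger = zoom(motif, 2.0)           # the same motif at twice the size

scores = {}
for s in (1.0, 1.5, 2.0):
    candidate = zoom(bigger, 1.0 / s)              # undo a guessed scale factor
    n = min(candidate.shape[0], motif.shape[0])
    scores[s] = normalized_correlation(candidate[:n, :n], motif[:n, :n])
print(scores)
```

The best score appears at the true scale factor of 2.0: the match succeeds even though the two motifs differ in size.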

Moving forward

We are still in the middle of our algorithm development and testing, but it is exciting to be working on a project where the needs of mathematicians and art historians match up so well! Mauro is busy developing invariant measures and writing efficient code. One of Ingrid Daubechies’ graduate students, Rachel Yin, is testing out alternative methods for image matching using deep neural networks and the Caffe framework (also from Berkeley). Elizabeth Honig’s students are manually extracting and tagging image features from Brueghel’s paintings and drawings for use in algorithm testing. Finally, I’ve been acting as a bridge between the collaborators: collecting, processing, organizing, and visualizing data on the Duke side, and communicating with our Berkeley colleagues to understand their needs and guide the manual extraction and tagging efforts. We’ll post again in the future to report on our progress.