Matt Boutell's Selected Research Publications

Research Summary

My primary research interest is understanding digital home photograph collections using semantic scene classification. Most of my research uses various types of context, such as that provided by camera settings, other photos in a collection, or spatial relationships between objects in a scene, to improve classification accuracy. I have also worked on determining image orientation, and on general issues involved in learning by example.

I am also active in robotics and computing education research. With colleagues, I researched, developed, and published an educational robotics curriculum; with another colleague, I developed and published a pedagogical technique for challenging advanced first-year computer science students.

Publications are listed here in reverse chronological order. The versions posted here may differ from the published versions in formatting and in minor edits; please cite the published versions.

Journal Papers

A multidisciplinary robotics minor

Matthew Boutell, Carlotta Berry, David Fisher, and Steve Chenoweth. ASEE Computers in Education Journal, Special Issue on Novel Approaches to Robotics Education, 1(3), pp. 102-111, July 2010.

Abstract: Rose-Hulman Institute of Technology recently created a multidisciplinary robotics minor. In this program, students first gain depth in their discipline, majoring in Computer Science or Electrical, Computer, Mechanical or Software Engineering. They then gain focused expertise in robotics by completing a track of robotics-related courses, including embedded systems and robotics programming. They gain multidisciplinary teamwork skills by working on a robotics senior design project with students from other majors. In this paper, we present details of the curriculum, assessment, student interest, and profiles of our first graduates.

Scene parsing using region-based generative models

Matthew Boutell, Jiebo Luo, and Christopher Brown. IEEE Transactions on Multimedia, 9(1), pp. 136-146, January 2007.

Full text: boutell07tmm.pdf

Abstract: Semantic scene classification is a challenging problem in computer vision. In contrast to the common approach of using low-level features computed from the whole scene, we propose “scene parsing” utilizing semantic object detectors (e.g., sky, foliage, and pavement) and region-based scene-configuration models. Because semantic detectors are faulty in practice, it is critical to develop a region-based generative model of outdoor scenes based on characteristic objects in the scene and spatial relationships between them. Since a fully connected scene configuration model is intractable, we chose to model pairwise relationships between regions and estimate scene probabilities using loopy belief propagation on a factor graph. We demonstrate the promise of this approach on a set of over 2000 outdoor photographs, comparing it with existing discriminative approaches and those using low-level features.
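To make the pairwise inference in this abstract concrete, the sketch below runs sum-product loopy belief propagation over pairwise label compatibilities. It is a minimal illustrative stand-in, not the paper's implementation: the edge set, potentials, and flooding message schedule are all hypothetical.

```python
import numpy as np

def loopy_bp(unary, edges, pairwise, iters=20):
    """Sum-product loopy belief propagation on a pairwise model.

    unary:    (N, K) label potentials per region (e.g. faulty detector outputs).
    edges:    list of (i, j) pairs of spatially related regions.
    pairwise: dict (i, j) -> (K, K) compatibility matrix psi[label_i, label_j].
    Returns (N, K) normalized beliefs.
    """
    N, K = unary.shape
    msgs = {(i, j): np.ones(K) for a, b in edges for i, j in [(a, b), (b, a)]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            # Product of i's unary term and all incoming messages except j's
            prod = unary[i].copy()
            for a, b in msgs:
                if b == i and a != j:
                    prod *= msgs[(a, b)]
            psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            m = prod @ psi              # marginalize out region i's label
            new[(i, j)] = m / m.sum()   # normalize for numerical stability
        msgs = new
    beliefs = unary.copy()
    for a, b in msgs:
        beliefs[b] *= msgs[(a, b)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

With a "sticky" compatibility matrix, a confident detection on one region pulls an ambiguous neighboring region toward a compatible label, which is the intuition behind using spatial relationships to correct faulty detectors.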

Pictures are not taken in a vacuum: An overview of exploiting context for semantic scene content understanding

Jiebo Luo, Matthew Boutell, and Christopher Brown. IEEE Signal Processing Magazine, 23(2), pp. 101-114, March 2006.

Abstract: Considerable research has been devoted to the problem of multimedia indexing and retrieval in the past decade. However, limited by the state of the art in image understanding, the majority of the existing content-based image retrieval (CBIR) systems have taken a relatively low-level approach and fallen short of higher-level interpretation and knowledge. Recent research has begun to focus on bridging the semantic and conceptual gap that exists between man and computer by integrating knowledge-based techniques, human perception, scene content understanding, psychology, and linguistics. In this article, we provide an overview of exploiting context for semantic scene content understanding.

A generalized temporal context model for classifying image collections

Matthew Boutell, Jiebo Luo, and Christopher Brown. ACM Multimedia Systems, 11(1), pp. 82-92, November 2005.

Full text: boutell05mm.pdf

Abstract: Semantic scene classification is an open problem in computer vision, especially when information from only a single image is employed. In applications involving image collections, however, images are clustered sequentially, allowing surrounding images to be used as temporal context. We present a general probabilistic temporal context model in which the first-order Markov property is used to integrate content-based and temporal context cues. The model uses elapsed time-dependent transition probabilities between images to enforce the fact that images captured within a shorter period of time are more likely to be related. This model is generalized in that it allows arbitrary elapsed time between images, making it suitable for classifying image collections. In addition, we derived a variant of this model to use in ordered image collections for which no timestamp information is available, such as film scans. We applied the proposed context models to two problems, achieving significant gains in accuracy in both cases. The two algorithms used to implement inference within the context model, Viterbi and belief propagation, yielded similar results with a slight edge to belief propagation.
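The elapsed-time-dependent Viterbi inference described above can be sketched as follows. This is an illustrative toy, not the paper's code; the transition function, class count, and probabilities are hypothetical.

```python
import numpy as np

def viterbi_temporal(content_probs, elapsed, transition_fn):
    """Most-likely class sequence for a sequentially captured photo collection.

    content_probs: (T, K) per-image class probabilities from a content-based classifier.
    elapsed:       length T-1 elapsed times between consecutive images.
    transition_fn: maps an elapsed time to a (K, K) transition matrix; short
                   gaps should return matrices close to identity ("sticky").
    """
    T, K = content_probs.shape
    log_delta = np.log(content_probs[0])       # scores for the first image
    back = np.zeros((T, K), dtype=int)         # backpointers
    for t in range(1, T):
        log_A = np.log(transition_fn(elapsed[t - 1]))   # rows: previous class
        scores = log_delta[:, None] + log_A
        back[t] = np.argmax(scores, axis=0)
        log_delta = scores[back[t], np.arange(K)] + np.log(content_probs[t])
    # Backtrack the best path
    path = [int(np.argmax(log_delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With a sticky transition matrix for short elapsed times, a weakly classified middle image is pulled toward the class of its confidently classified neighbors; with a uniform matrix (long gaps), each image is classified on content alone.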

Natural scene classification using overcomplete ICA

Jiebo Luo and Matthew Boutell. Pattern Recognition, 38(10), pp. 1507-1519, October 2005.

Full text: boutell05pr.pdf

Abstract: Principal component analysis (PCA) has been widely used to extract features for pattern recognition problems such as object recognition. In natural scene classification, Oliva and Torralba presented such an algorithm for representing images by their "spatial envelope" properties, including naturalness, openness, and roughness. Our implementation closely matched the original algorithm in accuracy for naturalness classification (or "manmade-natural" classification) on a similar (Corel) dataset. However, we found that consumer photos, which are far more unconstrained in content and imaging conditions, present a greater challenge for the algorithm (as they typically do for image understanding algorithms). In this paper, we present an alternative approach to more robust naturalness classification, using overcomplete independent components analysis (ICA) directly on the Fourier-transformed image to derive sparse representations as more effective features for classification. Using both heuristic and support vector machine classifiers, we demonstrated that our ICA-based features are superior to the PCA-based features used in Oliva and Torralba. In addition, we augment ICA-based features with camera metadata related to image capture conditions to further improve the performance of our algorithm.

Image transform bootstrapping and its applications to semantic scene classification

Jiebo Luo, Matthew Boutell, Robert T. Gray, and Christopher Brown. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(3), June 2005.

Abstract: The performance of an exemplar-based scene classification system depends largely on the size and quality of its set of training exemplars, which can be limited in practice. In addition, in nontrivial data sets, variations in scene content as well as distracting regions may exist in many testing images to prohibit good matches with the exemplars. Various boosting schemes have been proposed in machine learning, focusing on the feature space. We introduce the novel concept of image-transform bootstrapping using transforms in the image space to address such issues. In particular, three major schemes are described for exploiting this concept to augment training, testing, and both. We have successfully applied it to three applications of increasing difficulty: sunset detection, outdoor scene classification, and automatic image orientation detection. It is shown that appropriate transforms and meta-classification methods can be selected to boost performance according to the domain of the problem and the features/classifier used.

Beyond pixels: Exploiting camera metadata for photo classification

Matthew Boutell and Jiebo Luo. Pattern Recognition, Special Issue on Image Understanding for Digital Photos, 38(6), June 2005.

Full text: boutellIUPR.pdf

Abstract: Semantic scene classification based only on low-level vision cues has had limited success on unconstrained image sets. On the other hand, camera metadata related to capture conditions provides cues independent of the captured scene content that can be used to improve classification performance. We consider three problems, indoor-outdoor classification, sunset detection, and manmade-natural classification. Analysis of camera metadata statistics for images of each class revealed that metadata fields, such as exposure time, flash fired, and subject distance, are most discriminative for each problem. A Bayesian network is employed to fuse content-based and metadata cues in the probability domain and degrades gracefully even when specific metadata inputs are missing (a practical concern). Finally, we provide extensive experimental results on the three problems using content-based and metadata cues to demonstrate the efficacy of the proposed integrated scene classification scheme.
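To make the fusion idea concrete: the toy below combines a content-based probability with whatever metadata cues happen to be present, skipping missing fields so the estimate degrades gracefully. It is a much-simplified naive-Bayes-style stand-in for the paper's Bayesian network, and the field names and likelihood values are invented.

```python
import math

def fuse_cues(content_prob, metadata, likelihoods):
    """Fuse a content-based class probability with discrete metadata cues.

    content_prob: P(class | pixels) from a low-level classifier.
    metadata:     dict of Exif-like fields, e.g. {"flash_fired": True};
                  missing fields are skipped, so fusion degrades gracefully.
    likelihoods:  per-field dict value -> (P(value | class), P(value | not class)).
    """
    # Express the content cue as log-odds, then add each metadata cue's
    # log-likelihood ratio; absent fields contribute no evidence either way.
    log_odds = math.log(content_prob / (1 - content_prob))
    for field, table in likelihoods.items():
        value = metadata.get(field)     # None when the camera omitted the field
        if value is None or value not in table:
            continue
        p_pos, p_neg = table[value]
        log_odds += math.log(p_pos / p_neg)
    return 1 / (1 + math.exp(-log_odds))
```

For example, if "no flash" is four times as likely outdoors as indoors, an ambiguous content score of 0.5 is pushed to 0.8 when the flash-fired field is present and false, and stays at 0.5 when the field is missing.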

Automatic image orientation detection via confidence-based integration of low-level and semantic cues

Jiebo Luo and Matthew Boutell. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), pp. 715-726, May 2005.

Abstract: Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture. Discrepant detection rates have been reported for these methods in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, impressive given the findings of a psychophysical study conducted recently. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.

Learning multi-label semantic scene classification

Matthew Boutell, Xipeng Shen, Jiebo Luo, and Christopher Brown. Pattern Recognition, 37(9), pp. 1757-1771, September 2004.

Abstract: In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in semantic scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.
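The evaluation metrics named in this abstract can be sketched as follows. This is an illustrative implementation, assuming label sets per example; "exact match" is used here as one reasonable definition of overall accuracy, not necessarily the paper's.

```python
def multilabel_metrics(true_labels, pred_labels, classes):
    """Per-class recall/precision and exact-match accuracy for multi-label data.

    true_labels, pred_labels: lists of label sets, one set per example
                              (e.g. {"field", "mountain"} for one photo).
    """
    n = len(true_labels)
    # An example counts as correct only if its full label set is predicted.
    exact = sum(t == p for t, p in zip(true_labels, pred_labels)) / n
    per_class = {}
    for c in classes:
        tp = sum(c in t and c in p for t, p in zip(true_labels, pred_labels))
        fn = sum(c in t and c not in p for t, p in zip(true_labels, pred_labels))
        fp = sum(c not in t and c in p for t, p in zip(true_labels, pred_labels))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        per_class[c] = (recall, precision)
    return exact, per_class
```

For a field-with-mountain photo predicted as only "field", the field class still scores perfect recall and precision while the mountain class and the exact-match accuracy are penalized, which is why per-class and per-example metrics are reported separately.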

Psychophysical study of image orientation perception

Jiebo Luo, David Crandall, Amit Singhal, Matthew Boutell, and Robert T. Gray. Spatial Vision, 16(5), pp. 429-457, December 2003.

Full text: luo03spatial.pdf

Abstract: The experiment reported here investigates the perception of orientation of color photographic images. A collection of 1000 images (mix of professional photos and consumer snapshots) was used in this study. Each image was examined by at least five observers and shown at varying resolutions. At each resolution, observers were asked to indicate the image orientation, the level of confidence, and the cues they used to make the decision. The results show that for typical images, accuracy is close to 98% when using all available semantic cues from high-resolution images, and 84% when using only low-level vision features and coarse semantics from thumbnails. The accuracy by human observers suggests an upper bound for the performance of an automatic system. In addition, the use of a large, carefully chosen image set that spans the "photo space" (in terms of occasions and subject matter) and extensive interaction with the human observers reveals cues used by humans at various image resolutions: sky and people are the most useful and reliable among a number of important semantic cues.

Conference and Workshop Papers

SPLICE: Self-Paced Learning in an Inverted Classroom Environment

Matt Boutell and Curt Clifton. Poster to be presented at the SIGCSE 2011 Technical Symposium on Computer Science Education, Dallas, TX, March 11, 2011.

Full text: boutellclifton11sigcse.pdf

Abstract: Learning to program is hard for many students. Practice with an expert coach is key to overcoming this challenge. However, finding time for this is an issue because presenting concepts, showing examples, and modeling problem solving reduce the time available for mentored practice. Pace is also an issue because some students arrive with confidence and prior experience and are thus bored, while other students labor and become overwhelmed.

To address these problems in CS1, we created on-line videos for a C programming unit to present concepts, show examples, and model problem solving. As a result, our students spend every class session entirely in active learning activities with expert coaching, receive more individual attention, and set their own pace.

PyTetris

Delvin Defoe, Matthew Boutell, and Curtis Clifton. CCSC: Midwest 2010 Nifty Tools and Assignments, Franklin, IN, Sept 25, 2010.

Project page: PyTetris

A Simulator for Teaching Robotics Programming Using the iRobot Create

Andrew Hettlinger and Matthew Boutell. Poster presented at the AAAI 2010 Symposium on Educational Advances in Artificial Intelligence, Atlanta, GA, June 2010.

Poster: hettlingerBoutelleaai10Poster.pdf

Full text: hettlingerBoutelleaai10.pdf

Abstract: Past educational robotics research has indicated that the use of simulators can increase students’ performance in introductory robotics programming courses. In this paper, we introduce a simulator for the iRobot Create that works on Windows PCs. It was developed to work with a Python robotics library and includes an Eclipse plugin, but can simulate any library that uses the serial Open Interface on the Create. The platform, library, and simulator are all easy to use and have been well-received initially by students.

MERI: Multidisciplinary Educational Robotics Initiative

Carlotta Berry, Matthew Boutell, Steve Chenoweth, and David Fisher. Proceedings of the American Society for Engineering Education, Austin, TX, June 2009.

Abstract: This paper will describe the implementation of an innovative multidisciplinary robotics certificate program at a small teaching institution in the Midwestern United States. The Multidisciplinary Educational Robotics Initiative (MERI) is a product of a collaborative effort between faculty in Computer Science and Software Engineering (CSSE), Electrical and Computer Engineering (ECE), and Mechanical Engineering (ME). At this institution, a certificate is defined as a minor across multiple disciplines, e.g. CSSE, ECE, and ME. This is a groundbreaking program with the certificate curriculum approved in fall 2008. This paper will present the motivation for the certificate program, expected outcomes, details of the program and curriculum, select courses in the program, first graduates of the program, assessment and future work.

Comparison of two methods in detecting late-night talk shows using pattern recognition

Joshua Burbrink, Justin Miller, and Matthew R. Boutell. American Society for Engineering Education IL/IN Section Conference, Terre Haute, IN, April 3-5, 2008.

Full text: burbrink2008asee.pdf

Abstract: This paper presents two alternative approaches to detecting images taken from videos of Leno talk-shows: a Support Vector Machine (SVM) and an Eigen-classifier based on principal components analysis. On a testing set of 2952 images collected from 88 videos, the SVM approach produced an experimentally calculated 90.41% accuracy using color features. On the same set, the Eigen-classifier produced 97.37% accuracy employing thresholds derived from Eigen images. The paper describes strengths and weaknesses of both methods, as well as their potential use on the difficult problem of video copyright violation detection.
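The eigen-classifier idea, classifying by distance to a PCA subspace learned from one class's training images, can be sketched as below. This is a generic reconstruction-error classifier, not the authors' implementation; images would be flattened into the rows of X, and the threshold on the error is tuned on held-out data.

```python
import numpy as np

def fit_eigenspace(X, k):
    """Learn PCA 'eigen-images': the mean and top-k principal directions.

    X: (n_images, n_pixels) training images of one class, one flattened per row.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]          # components are rows of Vt

def reconstruction_error(x, mean, components):
    """Distance from x to the eigenspace; a small error suggests membership."""
    centered = x - mean
    projected = components.T @ (components @ centered)
    return np.linalg.norm(centered - projected)
```

A test image close to the subspace spanned by the training class reconstructs almost perfectly, while an off-subspace image leaves a large residual, so thresholding the error yields a one-class detector.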

Challenging the advanced first-year student’s learning process through student presentations

Lisa C. Kaczmarczyk, Matthew R. Boutell, and Mary Z. Last. The Third International Computing Education Research Workshop, Atlanta, GA, September 15-16, 2007.

Full text: last07icer.pdf

Abstract: The decline in computing enrollments is a global concern that necessitates that every potentially successful computing student be targeted for support and development. The needs of technically experienced, highly capable first-term college students are unique, and no less challenging than the needs of their lesser-prepared peers. In order to attract advanced first-year students to further computing studies, we need to understand better how instruction can meet their needs. This paper reports the results of a study in which advanced first-term computing students were challenged to become in-depth researcher-learners and to teach the content they acquired to their peers. The results demonstrate that students and instructors alike perceive that the students made significant improvements in communication, presentation, and teaming skills and acquired deep content knowledge from their experience in the course. The data also show that students were extremely uncomfortable with the paradigm shift in their learning environment. These results suggest that anxiety-reducing changes are needed for the course, but that overall, the teacher-researcher-learner concept is very beneficial for increasing the understanding and learning of advanced first year computing students.

Home interior classification using SIFT keypoint histograms

Brian Ayers and Matthew Boutell. International Workshop on Semantic Learning Applications in Multimedia (in conjunction with CVPR2007), Minneapolis, MN, June 2007.

Full text: ayers07slam.pdf

Abstract: Semantic scene classification, the process of categorizing photographs into a discrete set of classes using pattern recognition techniques, is a useful ability for image annotation, organization and retrieval. The literature has focused on classifying outdoor scenes such as beaches and sunsets. Here, we focus on a much more difficult problem, that of differentiating between typical rooms in home interiors, such as bedrooms or kitchens. This requires robust image feature extraction and classification techniques, such as SIFT (Scale-Invariant Feature Transform) features and Adaboost classifiers. To this end, we derived SIFT keypoint histograms, an efficient image representation that utilizes variance information from linear discriminant analysis. We compare SIFT keypoint histograms with other features such as spatial color moments and compare Adaboost with Support Vector Machine classifiers. We outline the various techniques used, show their advantages, disadvantages, and actual performance, and determine the most effective algorithm of those tested for home interior classification. Furthermore, we present results of pairwise classification of 7 rooms typically found in homes.

Factor-graphs for region-based whole-scene classification

Matthew Boutell, Jiebo Luo, and Christopher Brown. International Workshop on Semantic Learning Applications in Multimedia (in conjunction with CVPR2006), New York, NY, June 2006.

Full text: boutell06slam.pdf

Abstract: Semantic scene classification is still a challenging problem in computer vision. In contrast to the common approach of using low-level features computed from the scene, our approach uses explicit semantic object detectors and scene configuration models. To overcome faulty semantic detectors, it is critical to develop a region-based, generative model of outdoor scenes based on characteristic objects in the scene and spatial relationships between them. Since a fully connected scene configuration model is intractable, we chose to model pairwise relationships between regions and estimate scene probabilities using loopy belief propagation on a factor graph. We demonstrate the promise of this approach on a set of over 2000 outdoor photographs, comparing it with existing discriminative approaches and those using low-level features.

Using semantic features for scene classification: How good do they need to be?

Matthew Boutell, Anustup Choudhury, Jiebo Luo, and Christopher Brown. IEEE International Conference on Multimedia and Expo, Toronto, July 2006.

Full text: boutell06icme.pdf

Abstract: Semantic scene classification is a useful, yet challenging problem in image understanding. Most existing systems are based on low-level features, such as color or texture, and succeed to some extent. Intuitively, semantic features, such as sky, water, or foliage, which can be detected automatically, should help close the so-called semantic gap and lead to higher scene classification accuracy. To answer the question of how accurate the detectors themselves need to be, we adopt a generally applicable scene classification scheme that combines semantic features and their spatial layout as encoded implicitly using a block-based method. Our scene classification results show that although our current detectors collectively are still inadequate to outperform low-level features under the same scheme, semantic features hold promise as simulated detectors can achieve superior classification accuracy once their own accuracies reach above a nontrivial 90%.

Overcomplete ICA-based manmade scene classification

Matthew Boutell and Jiebo Luo. IEEE International Conference on Multimedia and Expo, Amsterdam, NL, July 2005.

Abstract: Principal Component Analysis (PCA) has been widely used to extract features for pattern recognition problems such as object recognition. Oliva and Torralba used "spatial envelope" properties derived from PCA to classify images as manmade or natural. While our implementation closely matched theirs in accuracy on a similar (Corel) dataset, we found that consumer photos, which are far less constrained in content and imaging conditions, present a greater challenge for the algorithm (as is typical in image understanding). We present an alternative approach to more robust naturalness classification, using overcomplete Independent Components Analysis (ICA) directly on the Fourier-transformed image to derive sparse representations as more effective features for classification. We demonstrated that our ICA-based features are superior to the PCA-based features on a large set of consumer photographs.

Improved semantic region labeling based on scene context

Matthew Boutell, Jiebo Luo, and Christopher Brown. IEEE International Conference on Multimedia and Expo, Amsterdam, NL, July 2005.

Abstract: Semantic region labeling in outdoor scenes, e.g., identifying sky, grass, foliage, water, and snow, facilitates content-based image retrieval, organization, and enhancement. A major limitation of current object detectors is the significant number of misclassifications due to the similarities in color and texture characteristics of various object types and lack of context information. Building on previous work of spatial context-aware object detection, we have developed a further improved system by modeling and enforcing spatial context constraints specific to individual scene type. In particular, the scene context, in the form of factor graphs, is obtained by learning and subsequently used via MAP estimation to reduce misclassification by constraining the object detection beliefs to conform to the spatial context models. Experimental results show that the richer spatial context models improve the accuracy of object detection over the individual object detectors and the general outdoor scene model.

Photo classification by integrating image content and camera metadata

Matthew Boutell and Jiebo Luo. International Conference on Pattern Recognition, Cambridge, UK, August 2004.

Abstract: Despite years of research, semantic classification of unconstrained photos is still an open problem. Existing systems have only used features derived from the image content. However, Exif metadata recorded by the camera provides cues independent of the scene content that can be exploited to improve classification accuracy. Using the problem of indoor-outdoor classification as an example, analysis of metadata statistics for each class revealed that exposure time, flash use, and subject distance are salient cues. We use a Bayesian network to integrate heterogeneous (content-based and metadata) cues in a robust fashion. Based on extensive experimental results, we make two observations: (1) adding metadata to content-based cues gives highest accuracies; and (2) metadata cues alone can outperform content-based cues alone for certain applications, leading to a system with high performance, yet requiring very little computational overhead. The benefit of incorporating metadata cues can be expected to generalize to other scene classification problems.

Incorporating temporal context with content for classifying image collections

Matthew Boutell and Jiebo Luo. International Conference on Pattern Recognition, Cambridge, UK, August 2004.

Full text: boutell04icpr_temporal.pdf

Abstract: Semantic scene classification is an open problem in image understanding, especially when information purely from image content (i.e., pixels) is employed. However, in applications involving image collections, surrounding images give each image a temporal context. We present a probabilistic approach to scene classification, capable of integrating both image content and temporal context. Elapsed time between images can be derived from the timestamps recorded by digital cameras. Our temporal context model is trained to exploit the stronger dependence between images captured within a short period of time, indicated by the elapsed time. We demonstrate the efficacy of our approach by applying it to the problem of indoor-outdoor scene classification and achieving significant gains in accuracy. The probabilistic temporal context model can be applied to other scene classification problems.

Learning spatial configuration models using modified Dirichlet priors

Matthew Boutell, Jiebo Luo, and Christopher Brown. Workshop on Statistical Relational Learning (in conjunction with ICML2004), Banff, Alberta, July 2004.

Full text: boutell04srl.pdf

Abstract: Semantic scene classification is a challenging problem in computer vision. Special-purpose semantic object and material (e.g., sky and grass) detectors help, but are faulty in practice. In this paper, we propose a generative model of outdoor scenes based on spatial configurations of objects in the scene. Because the number of semantically-meaningful regions (for classification purposes) in the image is expected to be small, we infer exact probabilities by utilizing a brute-force approach. However, it is impractical to obtain enough training data to learn the joint distribution of the configuration space. To help overcome this problem, we propose a smoothing technique that modifies the naive uniform (Dirichlet) prior by using model-based graph-matching techniques to populate the configuration space. The proposed technique is inspired by the backoff technique from statistical language models. We compare scene classification performance using our method with two baselines: no smoothing and smoothing with a uniform prior. Initial results on a small set of natural images show the potential of the method. Detailed exploration of the behavior of the method on this set may lead to future improvements.

Bayesian fusion of camera metadata cues in semantic scene classification

Matthew Boutell and Jiebo Luo. IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, June 2004.

Full text: boutell04cvpr.pdf

Abstract: Semantic scene classification based only on low-level vision cues has had limited success on unconstrained image sets. On the other hand, camera metadata related to capture conditions provides cues independent of the captured scene content that can be used to improve classification performance. We consider two problems: indoor-outdoor classification and sunset detection. Analysis of camera metadata statistics for images of each class revealed that metadata fields, such as exposure time, flash fired, and subject distance, are most discriminative for both indoor-outdoor and sunset classification. A Bayesian network is employed to fuse content-based and metadata cues in the probability domain and degrades gracefully, even when specific metadata inputs are missing (a practical concern). Finally, we provide extensive experimental results on the two problems, using content-based and metadata cues to demonstrate the efficacy of the proposed integrated scene classification scheme.

A generalized temporal context model for semantic scene classification

Matthew Boutell and Jiebo Luo. IEEE Workshop on Learning in Computer Vision and Pattern Recognition (in conjunction with CVPR2004), Washington, DC, June 2004.

Full text: boutell04lcvpr.pdf

Abstract: Semantic scene classification is an open problem in computer vision, especially when information from only a single image is employed. In applications involving image collections, however, images are clustered sequentially, allowing surrounding images to be used as temporal context. We present a general probabilistic temporal context model in which the first-order Markov property is used to integrate content-based and temporal context cues. The model uses elapsed time-dependent transition probabilities between images to enforce the fact that images captured within a shorter period of time are more likely to be related. This model is generalized in that it allows arbitrary elapsed time between images, making it suitable for classifying image collections. We also derived a variant of this model to use in image collections for which no timestamp information is available, such as film scans. We applied the context models to two problems, achieving significant gains in accuracy in both cases. The two algorithms used to implement inference within the context model, Viterbi and belief propagation, yielded similar results.

A probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues

Jiebo Luo and Matthew Boutell. 4th International Workshop on Multimedia Data and Document Engineering (in conjunction with CVPR2004), Washington, DC, July 2004.

Full text: luo04mdde.pdf

Abstract: Automatic image orientation detection for natural images is a useful, yet challenging research area. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture. In addition, discrepant detection rates have been reported. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is approaching 90% for unconstrained consumer photos, impressive given the findings of a psychophysical study conducted recently. The proposed framework is an attempt to bridge the gap between computer and human vision systems, and is applicable to other problems involving semantic scene content understanding.

Multi-label machine learning and its application to semantic scene classification

Xipeng Shen, Matthew Boutell, Jiebo Luo, and Christopher Brown. International Symposium on Electronic Imaging, San Jose, CA, January 2004.

Abstract: In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.
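The example-level and per-class metrics mentioned in the abstract can be illustrated generically. The definitions below (Jaccard overlap for per-example scoring, set-based per-class recall and precision) are common multi-label choices shown as a sketch, not necessarily the exact metrics the paper introduces:

```python
def example_accuracy(true_sets, pred_sets):
    """Example-based score: Jaccard overlap between the true and
    predicted label sets, averaged over examples. An example with no
    true or predicted labels scores 1.0 by convention."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

def class_recall_precision(true_sets, pred_sets, label):
    """Recall and precision for a single label, counting each example's
    label set independently (a label can co-occur with others)."""
    tp = sum(1 for t, p in zip(true_sets, pred_sets)
             if label in t and label in p)
    fn = sum(1 for t, p in zip(true_sets, pred_sets)
             if label in t and label not in p)
    fp = sum(1 for t, p in zip(true_sets, pred_sets)
             if label not in t and label in p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

For instance, a field scene with a mountain in the background carries the label set `{"field", "mountain"}`; predicting only `{"field"}` earns partial credit (0.5) under the Jaccard score rather than being counted as a plain error.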

Using image transform-based bootstrapping to improve scene classification

Jiebo Luo, Matthew Boutell, Robert T. Gray, and Christopher Brown. 2004 International Symposium on Electronic Imaging, San Jose, CA, January 2004.

Abstract: The performance of an exemplar-based scene classification system depends largely on the size and quality of its set of training exemplars, which can be limited in practice. In addition, in non-trivial data sets, variations in scene content, as well as distracting regions, in many test images may prevent good matches with the exemplars. We introduce the concept of image-transform bootstrapping, in which image transforms are used to address such issues. In particular, we describe three major schemes that exploit this concept to augment training, testing, or both. We have successfully applied it to three applications of increasing difficulty: sunset detection, outdoor scene classification, and automatic image orientation detection. We show that appropriate transforms and meta-classification methods can be selected to boost performance according to the problem domain and the features and classifier used.
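The first two bootstrapping schemes can be sketched generically: transform the exemplars to enlarge the training set, and score every transform of a test image before combining the results. The particular transforms (mirror, rotation), the `max` combination rule, and the function names below are illustrative assumptions, not the paper's actual choices:

```python
import numpy as np

def transforms(img):
    """Illustrative transform set: identity, horizontal mirror, and a
    90-degree rotation. The useful transforms are domain-dependent."""
    yield img
    yield np.fliplr(img)
    yield np.rot90(img)

def bootstrap_training(exemplars, labels):
    """Scheme 1: augment the exemplar set with transformed copies,
    each keeping its source image's label."""
    aug_x, aug_y = [], []
    for img, y in zip(exemplars, labels):
        for t in transforms(img):
            aug_x.append(t)
            aug_y.append(y)
    return aug_x, aug_y

def bootstrap_testing(img, score_fn):
    """Scheme 2: score every transform of a test image and combine the
    scores; taking the max is one simple meta-classification rule."""
    return max(score_fn(t) for t in transforms(img))
```

Test-time bootstrapping helps when a distracting region or an unusual composition spoils the match for the original image but not for one of its transformed variants.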

Sunset scene classification using simulated image recomposition

Matthew Boutell, Jiebo Luo, and Robert T. Gray. IEEE International Conference on Multimedia and Expo, Baltimore, MD, July 2003.

Full text: boutell03icme.pdf

Abstract: Knowledge of the semantic classification of an image can be used to improve the accuracy of queries in content-based image organization and retrieval and to provide customized image enhancement. We developed an exemplar-based system for classifying sunset scenes. However, the performance of such a system depends largely on the size and quality of the set of training exemplars, which can be limited in practice. In addition, variations in scene content, as well as distracting regions, in many test images may prevent good matches with the exemplars. We propose using simulated spatial and temporal image recomposition to address such issues. The recomposition schemes boost the recall of sunset images from a reasonably large data set by 10% while holding the false-positive rate constant.