Stevens REU 2007
Category: EE & Hardware (back to list)
Project Page: Stevens REU 2007 project page
Summary: This is the project log for my 2007 Research Experience for Undergraduates project at Stevens Institute of Technology. Keep watching this space for more information. Advisor: Professor Yu-Dong Yao
Begin: May 20, 2007
Completed: July 28, 2007
Week 1 (5/21/07-5/25/07)
Summary | In the first week of my project, I established my webpage and began preliminary work on an object-recognition algorithm implemented in two Matlab programs. These programs check whether an object exists in an image; I tested them using headshots from the Yale Face Database. I also researched previous work on this topic but found little information. |
Full Description |
The first group meeting of all team members took place on Monday, May 21, 2007 at 10am. After a ninety-minute orientation, we broke into small groups; Professor Yao is my advisor. In our first small-group meeting, I presented my idea for the project: facial recognition and eigenobjects as subjects within the category of image processing. Professor Yao suggested applying this idea to an unfinished project a graduate student had begun: writing a program to algorithmically distinguish cancerous cells from non-cancerous cells in cellular imagery. After lunch, I met with Prof. Yao again, and he showed me the work that had been done on the project so far. It struck me that eigenobjects, in this case eigennuclei, could be used to count the relative density of nuclei in various sections of a cellular image. This would provide a strong indicator of the presence of cancer, since high nuclear density (cancerous cells) can be distinguished from normal or low nuclear density (healthy cells). In the second small-group meeting on Tuesday, we agreed that I would be working on this project.

On Wednesday, I created a private area of the Projects section on my website (http://beta.cemetech.net) to house my research website, and updated it with my progress to date. I am currently brainstorming software methods to solve this problem; I am leaning towards a Matlab-based approach. Above is an example of an area of cancerous cells. Note the high ratio of nuclei (dark brown circles) to the area of the image as a whole. Part of the challenge of this project will be determining a base threshold between the densities that define normal cells and the densities that indicate cancerous areas. One idea would be to compute a probability that the image contains cancer cells based on the various densities within the image and some pre-defined (or dynamically determined) threshold.
So far, I have found several technical papers published about facial recognition using eigenfaces that I believe will be relevant to the methods I will be using in this project: » Face Recognition Using Eigenfaces (M. A. Turk and A. P. Pentland) » Eigenfaces and Beyond (M. A. Turk) » Eigenfaces Group - Algorithmics I also began to mock up various output diagrams, all based on a heatmap-type view showing nuclear density at varied block sizes. Very small blocks show the most detail but are very CPU-intensive; larger blocks process much faster but at the cost of a lower level of detail.

On Thursday, May 24th, I began to work with Matlab and developed an initial method of extracting eigencells to define a training set. First, I used The GIMP, an image editor, to crop and scale 8 sample cells from an image: I next used algorithmic information from the Eigenfaces Group - Algorithmics site I found yesterday (listed above) to design a 74-line Matlab program (http://www.cemetech.net/projects/cdp/eigenv1.m) to extract eight eigencells. This code generated the following (normalized) eigencells:

On Friday, I created a second version of Thursday's program, extending its capabilities. It now prompts the user for an image, converts it to object space and back, and computes the Mean-Squared Error (MSE) between the original and final images to classify the image as a face or not. This method did not work well on the cell images in my training set; I believe the reason is the whitespace around all the images. When I instead used eight of the headshots from the Yale facial-recognition training set and made a ninth face the test image, the program was more accurate. The next step towards cellular density detection will be to implement the more memory-efficient training-set processing algorithm presented in the site above so I can process larger images.
I will also save the eigenobjects calculated from the training-set data to a file so that the program does not have to recalculate them on every run, as this is a slow and memory-intensive task. The ninth face before and after transformation to and from face-space is below: Here are the programs that produced the two images above: » http://www.cemetech.net/projects/cdp/eigenv2.m (Main program) » http://www.cemetech.net/projects/cdp/inner.m (Inner product script) |
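The actual Week 1 Matlab sources are linked above. For readers without them, here is a minimal sketch of the underlying Turk–Pentland technique as I understand it, written in Python/NumPy rather than Matlab (all function names here are my own, not the project's): build eigenobjects from a training set using the memory-efficient small-matrix trick, then classify a new image by the MSE between it and its reconstruction from object space.

```python
import numpy as np

def train_eigenobjects(images):
    """images: list of equal-sized 2-D grayscale arrays.
    Returns (mean vector, eigenobject matrix) using the small-matrix
    trick: eigenvectors of the N x N matrix A'A instead of the huge
    pixels x pixels covariance matrix."""
    A = np.stack([im.ravel().astype(float) for im in images], axis=1)  # pixels x N
    mean = A.mean(axis=1)
    A -= mean[:, None]
    small = A.T @ A                       # N x N, several magnitudes smaller
    vals, vecs = np.linalg.eigh(small)
    keep = vals > 1e-10 * vals.max()      # drop numerically-zero modes
    U = A @ vecs[:, keep]                 # map back to pixel space
    U /= np.linalg.norm(U, axis=0)        # normalize each eigenobject
    return mean, U

def reconstruction_mse(image, mean, U):
    """Project into object space, reconstruct, and return the MSE;
    a small MSE suggests the image belongs to the trained class."""
    x = image.ravel().astype(float) - mean
    recon = U @ (U.T @ x)
    return np.mean((x - recon) ** 2)
```

A training image reconstructs almost exactly (near-zero MSE), while an unrelated image leaves a large residual, which is the face/non-face distinction described above.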
Week 2 (5/29/07-6/1/07)
Summary | In the second week of the project, I further developed my ideas about a preliminary method of identifying cancer cells, expanding my Matlab programs to recognize faces within an image. At the beginning of the week, I modularized my program from the previous week and redesigned parts of it with a faster, more efficient algorithm. In the latter half of the week, I did additional research and began writing code to find specific objects within an image. For the third week, I plan to improve the algorithm to identify the specific coordinates where it believes an object lies, building on last Friday's program and a threshold; in the second half of that week I will research scaling and rotation in preparation for the following week's work on identifying rotated and scaled objects. |
Full Description |
On Tuesday and Wednesday, I once again rewrote my program, this time separating it into distinct training and checking programs. The training program generates a set of eigenobjects (eigenfaces or eigencells) that it then stores to a file, since training is generally more time-consuming than checking. I also rewrote the algorithm used in the training program to use much smaller matrices (several orders of magnitude smaller in bytes), which will allow more, higher-resolution images to be used to train the program. The checking program now lets the user input the name of a file to check against the training set, then transforms it to facespace (or cellspace) and back to imagespace. Initially, the MSEs were generally smaller for real faces than for non-faces, yet there was still insufficient distinction between the two. Once I realized I had forgotten to normalize the newly generated images before calculating the MSE, I obtained properly scaled MSEs that distinguish faces from non-faces. While the right-hand images below look identical, mathematical analysis reveals that they differ significantly.
Training set of images: Eigenface matches generated by transforming to face space, then back to image space (original is on the left): Mean squared error: 4.3429e+007 Mean squared error: 3.0242e+007 Mean squared error: 2.9503e+007 Mean squared error: 5.7416e+007

On Thursday, I did further research on how to identify an instance of an object within a larger image. At Professor Yao's prompting, I dug deeper into the IEEE Xplore database and found several relevant resources, one of which led me to an excellent article from MIT that helped me on Friday. The three most relevant of the resources I read, and from which I wrote sample code to test theories: » Principal Component Analysis and Neural Network Based Face Recognition » Example-Based Learning for View-Based Human Face Detection » Human Face Detection in Visual Scenes I also realized that after I implement complete object detection within images using faces as an example, I will need to deal with rotated and scaled objects within the image. When I ran the program I created on Friday, it functioned properly. The three program modules used can be found here: » http://www.cemetech.net/projects/cdp/eigenv3training.m (Creates eigenobjects from training set) » http://www.cemetech.net/projects/cdp/eigenv3check.m (Produces odds an image is one of a class of objects) » http://www.cemetech.net/projects/cdp/eigenv3detect.m (Creates grayscale heatmap of objects for an image) When I ran my first test image (640x480) through the detector, it took 379 seconds to process and produced a correct heatmap for the image. Lighter means a lower probability that an object is centered at that location, while darker means a higher probability. Notice that the non-faces in the heatmap image are a lighter gray than the faces. |
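eigenv3detect.m itself is linked above; the sliding-window idea behind it can be sketched as follows. This is an illustrative Python/NumPy version with invented names, not the original Matlab: slide a fixed-size window across the image, compute the reconstruction MSE at each position against a trained eigenbasis, and collect the results into a heatmap (low MSE means object-like).

```python
import numpy as np

def detection_heatmap(image, mean, U, win, step=4):
    """Slide a win x win window across a grayscale image and record the
    reconstruction MSE at each position (low MSE = likely object).
    `mean` (flattened mean patch) and `U` (orthonormal eigenobject
    columns) come from a prior training step on win x win samples."""
    h, w = image.shape
    ys = list(range(0, h - win + 1, step))
    xs = list(range(0, w - win + 1, step))
    heat = np.empty((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            patch = image[y:y + win, x:x + win].ravel().astype(float) - mean
            recon = U @ (U.T @ patch)        # to object space and back
            heat[i, j] = np.mean((patch - recon) ** 2)
    return heat
```

The `step` parameter trades detail for speed, mirroring the block-size tradeoff discussed in Week 1; a step of 1 checks every pixel position and is correspondingly slow, which matches the 379-second runtime observed above.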
Week 3 (6/4/07-6/8/07)
Summary | This week I began by implementing exact location pinpointing in my image-recognition algorithm, which I improved until it reached about 75% accuracy. I then began to research the problem of scaling and rotation. My computer suffered a fatal hard-drive crash that destroyed its contents, but luckily I had most of my data backed up and only lost a few hours of work. In the latter half of the week, I found a specific mathematical solution to the problem of scaling and rotation and began to design an algorithm to implement it. This week, Week 4, I will begin programming my solution and testing it against both facial and cellular images. |
Full Description |
On Monday and Tuesday, I wrote Eigenv3Rectangle.m, which identifies objects, prints out the coordinates of each, and boxes each object based on the training set and a pre-determined threshold. One of the latest outputs: (413,408) (479,133) (45,127) (61,100) (90,168) (438,320) (317,274) (552,204) (258,165) (321,393) (465,46) (72,75) (209,291) (294,56) (40,152) (395,105) As shown in the image above, the program correctly identified nine of fifteen faces, missed six of fifteen faces, and produced three single false positives and one group of four false positives.

On Wednesday, I began to test the robustness of the eigenobject-based recognition algorithm. I tried scaling and rotating the faces in my test image, and found that my original algorithm failed at an unacceptable rate on these new faces. As the rotation angle increased past about 5%, the recognition rate went down dramatically. A similar result occurred for scaling, although the error margin was somewhat higher; after scaling up or down by about 10%, the recognition rate decreased sharply. To solve this problem, I planned to spend Thursday and Friday researching previous solutions and testing them in Matlab. On Thursday morning, unfortunately, my laptop's hard drive crashed fatally, making it unreadable and effectively deleting its entire contents. Luckily I had backed up everything for my REU project through 2pm Wednesday on my remote FTP server, so no significant progress was lost; I also had most of my personal projects backed up on a flash drive. I spent the last two days of the week buying a new hard drive, reinstalling Windows, and reinstalling all my programs, but also used Google and IEEE Xplore to find quite a few relevant papers on the topic, including algorithmic solutions for handling scaling and rotation of the eigenobjects.
The most helpful of the papers I found: » Experiments on Eigenfaces Robustness (Lemiux, Alexandre and Marc Parizeau) - This document was superb introductory material: instead of jumping into solutions, it carefully analyzed the problem. It determined that the eigenobject method is relatively reliable, but depends on properly centered, scaled, and rotated input in order to function. It framed the solution as expanding the algorithm that finds objects within an image and passes those subimages to the identifier, so that it takes scaling and rotation into account. This paper confirmed my general conclusions: they found the error margins for pure eigenobject recognition to be +/-5% scaling and +/-5% rotation. They also tested the effects of downsampling, morphing, and translation (my algorithm already accounts for translation, so this is not an issue for me). » Face Recognition: Eigenface and Fisherface Performance Across Pose (Brooks, Alan, Li Gao and Ying Wu) - This graduate-student paper on computer vision introduced me to several new subsets of image-recognition approaches, including fisherobjects (usually used for facial recognition as fisherfaces). They described the preprocessing methods they used to normalize for lighting, scaling, and 2-axis rotation, but they were dealing with images containing only a single object, which significantly simplifies the procedure. Indeed, I began to think that no satisfactory solution has yet been devised for the problem of scaling and rotation. » Real Time Face Recognition using Eigenfaces (Cendrillon, R) - This paper from 1999 again outlined the problems of rotation and scaling in eigenobject recognition, particularly in the context of realtime facial recognition. The author deduces error ranges of +/-12% for scaling and +/-10% for rotation, higher than some of the other figures I found, but underscores that for robust recognition, algorithmic attention must be paid to scaling and rotation.
» Robustness and Specificity in Object Detection (Eriksson, Anders P. and Kalle Astrom) - This was the most helpful paper so far, introducing specific mathematical solutions to the problems of scaling and rotation that I will be attempting to apply in my program. I will spend Week 4 using this and similar algorithmic solutions to build on last week's positive recognition results. |
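The coordinate extraction that Eigenv3Rectangle.m performs, turning a detection heatmap plus a threshold into a list of object positions, can be sketched roughly as below. This is a Python/NumPy illustration with names and a greedy suppression scheme of my own; the original Matlab may differ.

```python
import numpy as np

def object_coordinates(heat, threshold, min_sep=8):
    """Given an MSE heatmap (low value = object-like), return (row, col)
    positions of detections below `threshold`.  After accepting a hit,
    suppress everything within `min_sep` cells of it so that one object
    yields one coordinate instead of a cluster of near-duplicates."""
    h = heat.astype(float).copy()
    coords = []
    while True:
        idx = np.unravel_index(np.argmin(h), h.shape)
        if h[idx] >= threshold:
            break                          # nothing object-like remains
        coords.append(idx)
        y, x = idx
        h[max(0, y - min_sep):y + min_sep + 1,
          max(0, x - min_sep):x + min_sep + 1] = np.inf
    return coords
```

Without some form of suppression, a single face would be reported at many adjacent window positions, which is one plausible source of the clustered false positives noted above.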
Week 4 (6/11/07-6/15/07)
Week 5 (6/18/07-6/24/07)
Week 6 (6/25/07-7/1/07)
Summary | The majority of this week was spent finding new images of cancerous and non-cancerous cell structures upon which to test my program suite, then tweaking the threshold to account for any inconsistencies. I found my single set of 16 training cells worked extremely well even given disparate images, a result that greatly surprised me. Thursday I wrote and debugged the script to generate pre-colored, full-sized heatmaps that I originally was creating by hand, and Friday I began to brainstorm and experiment with cancer-presence decision routines. For next week, I hope to develop each of my possibilities so I can decide on a final method, finish implementation early in Week 8, and begin writing up my results in a final paper. |
Full Description |
After the large and small group meetings Monday morning and early Monday afternoon, I resumed work on testing my program under less-controlled conditions. Up through the end of Week 5, I had only been running my training program on three cellular images, one of which I found on the internet, and the other two of which Professor Yao had procured for me. All three were similar in color, density, and cell size, so he and I decided a logical next step would be to find other examples of what I discovered is called microphotography. I searched for specific cellular-imaging databases, and finding nothing easily accessible, instead combed general internet image databases with keywords such as "cancer cells" and "cells microscope". I weeded my finds down to ten images I feel are a good representative cross-section of my total results, two of which are a paired comparison of cancerous and non-cancerous regions. Most passed my program with flying colors, while only a few tripped it up or produced inconclusive results. First, the ten images that I selected. The first two on the left are the matching pair. For all of these images, I found that I had to switch from the strict 1.5*10^7 threshold I was able to use for my three initial images to a more liberal 2.0*10^7 threshold, which allows more non-cell areas to be selected but overall produces better precision on the uncontrolled test images. Four performed poorly or only moderately well, while six processed very well and revealed coherent search and density maps. The first two significant images were the pair labeled 01a and 01b, respectively cancerous and non-cancerous cell structures. As you can see, the cancerous image produced many more positive detections and thus has a noticeably more chaotic heatmap. The next image, this one of a cancerous region, shows the same high nuclear density with a correspondingly varied heatmap, similar to the first of the two images in the pair above.
Again cancerous, the next significant image displays the same traits as the first several, including wide density variations and several areas of unusually high density. The next image that stands out contains fewer total nuclei and more intracellular tissue, yet the program correctly maintains the same regionalized behavior, even though it missed several of the cells that differ too far from its training set. The final image is visually quite confusing and busy, appearing to be almost uniformly filled with cells and nuclei. After processing the image and comparing the results to a closer examination of the original, it becomes evident that nuclear concentration is actually shifted towards the top of the image.

I returned to wrestling with Matlab code on Thursday, when I wrote a section at the end of eigenv6density.m to produce the actual image shown at the far right of each of the rows above, instead of merely producing an array of blocks that I had to manually manipulate into a viewable image in the open-source image editor, The GIMP 2.2 (http://www.gimp.org). This posed no significant problems or bugs, and I was able to test it by verifying that its results matched the images I had previously been rendering by hand. Finally, I spent Friday brainstorming and ended up with three equally viable, possibly combinable algorithms for recognizing cancer in a microphotograph of cellular structures. The first is the simplest, querying the user for a raw threshold of nuclei per pixel, above which a region is considered cancerous. Its advantage is the highest overall accuracy, but only if its major disadvantage is ignored: the results depend entirely on the user supplying an appropriate threshold, and an invalid or incorrect threshold makes the results inaccurate at best. The second possibility uses a similar method, but operates somewhat more autonomously.
Instead of querying the user for a nuclei-per-pixel threshold, it requests a percentage, and flags as suspect any regions that differ from their nearest neighbor by at least that percentage. The third and final solution appears most elegant at this point, utilizing accepted statistical-analysis methods. Instead of hand-rolling what I realized was essentially a deviation measure, I could use established formulas to calculate the mean and standard deviation of the image, and then flag areas that differ by a significant amount as determined by the standard deviation. More on this next week. For next week, Week 7, I hope to fully explore these and any other possible algorithmic density-detection solutions and decide which I will implement. I will then proceed to implementation and a written summary over the following week and a half. |
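As a concrete illustration of the second candidate method, here is a small sketch in Python/NumPy (the project's code is Matlab; the function name and the choice to compare against the image-wide average rather than the nearest neighbor are mine, anticipating the refinement described in Week 7). It assumes a per-region nucleus count has already been computed by the density step.

```python
import numpy as np

def flag_by_percentage(region_counts, pct):
    """Flag regions whose nucleus count differs from the image-wide
    average by more than `pct` percent.  `region_counts` is a flat or
    2-D array of nuclei-per-region values; returns a boolean mask of
    the same shape marking possibly-cancerous regions."""
    counts = np.asarray(region_counts, dtype=float)
    avg = counts.mean()
    return np.abs(counts - avg) > (pct / 100.0) * avg
```

The weakness noted below still applies: the user must supply a sensible percentage, and the result depends on the block size used to compute the counts.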
Week 7 (7/2/07-7/6/07)
Summary | I spent Monday, Tuesday, and Thursday of this week exploring respectively my three possible solutions to the problem of deciding whether an image represents cancer tissue or not. On Wednesday, I went to see the fireworks from Castle Point; on Friday, I chose a final method and began to implement it into the full program. For the following week, Week 8 of the REU program, I plan to complete my program and begin the writeup for possible eventual publication. |
Full Description |
I spent the first day of the week exploring my simplest possibility: prompting the user for either a nuclei-per-pixel or a pixels-per-nucleus value and processing each region of the image based on that number. Testing this on several of my images, though, I found that it was difficult to judge the proper threshold accurately without substantial time spent fine-tuning for that precise cell size and distribution. I ended up discarding this possibility as a viable solution on its own due to the extensive supervision it requires, but I am still considering parts of it as possible components of the final algorithm, particularly if I can devise an automated, machine-driven method of determining the proper NPP or PPN threshold. On Tuesday I explored the compromise solution, a method that retains the user prompt of the first solution but works with a more flexible value: the maximum allowed deviation from the average. Under the final form of this plan, as designed last Friday, the program would calculate the average nuclei per region for the entire image, then compare the individual value for each region to that average. If the number of nuclei in a region was more than the given percentage above or below the average, the region would be flagged as possibly cancerous. I was more satisfied with this method after writing up a rough test and executing it on several images, but still saw an unacceptable dependence on a constant block size for the regions: decreasing the block size caused a noticeable drop in accuracy. Still, I was on the right track. I explored my third and final idea on Thursday: using accepted methods of statistical analysis to determine the deviation threshold automatically, based on the mean and standard deviation, quite similar to my second option but with less user input.
I experimented with retaining an optional prompt where the user can fine-tune the percentage of the deviation regarded as acceptable, which I decided was important enough to include in my final plan. In preliminary tests using rough approximations of the deviation function, this method outperformed both other solutions. On Friday I spent the morning researching how Matlab handles statistical functions, particularly standard deviation, to determine whether I could use the built-in functionality or needed to write my own. I decided that though std() operates on one-dimensional arrays rather than the two-dimensional arrays I need, I would be able to adapt Matlab's std() function for my use. In the afternoon, I completed the section of the program that determines the standard deviation of the density heatmap. Next week, the eighth week of this program, I will spend the first two or three days completing and testing my solution using the standard-deviation function. Unless other problems in my program or its design arise, I plan to spend the latter half of that week outlining and beginning to draft the paper about my project and its outcome. |
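The statistical method chosen above can be sketched as follows; again this is an illustrative Python/NumPy version with an invented name, not the final Matlab. The 2-D density map is treated as one population (the same flattening adjustment described above for Matlab's std(); NumPy's std already flattens by default), and the user-tunable sensitivity appears as the multiplier k.

```python
import numpy as np

def flag_by_stddev(density_map, k=2.0):
    """Flag blocks of a 2-D density heatmap whose value deviates from
    the image-wide mean by more than k standard deviations.  `k` plays
    the role of the optional user fine-tuning prompt: smaller k flags
    more regions as possibly cancerous.  Returns a boolean mask."""
    d = np.asarray(density_map, dtype=float)
    mu, sigma = d.mean(), d.std()      # .mean()/.std() flatten the 2-D map
    return np.abs(d - mu) > k * sigma
```

Unlike the fixed nuclei-per-pixel threshold, this adapts automatically to each image's overall density, which is why it needs so much less supervision.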
Week 8 (7/9/07-7/13/07)
Week 9 (7/16/07-7/20/07)
Summary | This week consisted of tidying up loose ends in the CDP project, commenting code, and putting together final documentation. In the first half of the week I completed my Matlab code, added comments, and cleaned it up a bit, then started my presentation for next Wednesday. In the latter half of the week I completed my research poster, executive summary, and final report including my Weekly Report collation. |
Full Description |
On Monday of this week I worked on cleaning up my Matlab code. There were several sections where pieces had been added and other pieces commented out during development; I left some of the debug sections in but added notes indicating their purpose, and removed the unneeded parts. I finished a few final optimizations, then added all the remaining comments I felt were necessary. I ended up with two final pieces for the CDP program, the training program and the evaluation program (see below). On Tuesday of Week 9, I began to work on my presentation on prostate cancer for next Wednesday. I spoke to Professor Yao to confirm details about the presentation, read over the article I was assigned, did some additional research on fluorescent microscopy, and then began to make my presentation. I planned out the first fifteen slides and wrote my presentation notes to go along with them. Wednesday we had the seminar on Harbor Security and went to Madame Tussaud's in Manhattan, so the only thing I was able to achieve on my project was a few additional slides for my presentation. As of Wednesday I have around ten minutes of the required thirty minutes complete. Thursday I looked over the requirements for the final report and made both my coversheet and my Executive Summary. I started with the executive summary, writing an outline and then a complete page on my project. I began with my introduction, presenting the goals of the CDP project and its basic functionality. I then wrote a few paragraphs detailing its exact functionality, and concluded with plans and possible improvements for the future. For the coversheet, I took Professor Yao's examples as a starting point, then incorporated a highly abbreviated form of my Executive Summary and included examples of the program's output. I needed to be at Cooper Union, so when I completed those two items, I printed a poster-sized color plot of my coversheet.
On Friday I put together the remainder of my final report, including the title page, table of contents, and most importantly, the collation of all of my Weekly Summaries. I ran into a few problems when exporting to PDF, as the number of images the report contains would crash most computers, but I was eventually able to make it work. Below you will find a listing of all the final components and documentation for the CDP project. The following are all available from http://www.cemetech.net/projects/cdp/documentation. For print media, PLEASE use the PDFs instead of the Word documents, as the PDFs print much better.
» Coversheet.doc - My mini (8.5" x 11") research poster for the final report in MS Word format.
» Coversheet.pdf - My mini (8.5" x 11") research poster for the final report in PDF format.
» FinalReport.doc - My full final report in MS Word format.
» FinalReport.pdf - My full final report in PDF format.
» CDP_train.m - The final training set processor program in Matlab format.
» CDP_evaluate.m - The final main program in Matlab format. |
Week 10 (7/23/07-7/27/07)
Summary | As this was the final week of the program, most of the time was spent wrapping up final loose ends. Monday was the final group presentation, I gave my Prostate Cancer-related presentation on Wednesday, and final awards were presented on Friday. |
Full Description |
The following are all available from http://www.cemetech.net/projects/cdp/documentation. For print media, PLEASE use the PDFs instead of the Word documents, as the PDFs print much better.
» FinalPres.pdf - Final, somewhat-related presentation on Fluorescent vs. Photographic Microscopy and the CDP.
» Coversheet.doc - My mini (8.5" x 11") research poster for the final report in MS Word format.
» Coversheet.pdf - My mini (8.5" x 11") research poster for the final report in PDF format.
» FinalReport.doc - My full final report in MS Word format.
» FinalReport.pdf - My full final report in PDF format.
» CDP_train.m - The final training set processor program in Matlab format.
» CDP_evaluate.m - The final main program in Matlab format.
» finalpres.ppt - My final presentation regarding prostate cancer in Powerpoint format.
» finalpres.odp - My final presentation regarding prostate cancer in Open Office presentation format (superior quality). |
My name is Christopher Mitchell; I am currently entering my junior year as an Electrical Engineering undergraduate at Cooper Union in New York, NY. My interests within the field include web programming and programming TI graphing calculators in z80 assembly. I am the webmaster of a 600-member technology site, www.cemetech.net. I enjoy reading, drawing, movies, listening to and creating music, and long walks.
Week 1 |
» Eigenv1.m - Extract eigenMatrix from 8 training images
» Eigenv2.m - Extract eigenMatrix from n training images
» Inner.m - Take inner product of two vectors |
Week 2 |
» Eigenv3Training.m - Extract eigenMatrix from n training images and store to a file.
» Eigenv3Check.m - Compare an image to the eigenMatrix and produce an index of how likely it is to be an object.
» Eigenv3Detect.m - Given an image, produce heatmap of probability of objects within the image. |
Week 3 |
» Eigenv3Rectangle.m - Find coordinates of assumed object given a threshold. |
Week 4 |
» Eigenv4Training.m - (Lighting correction for improved accuracy) Extract eigenMatrix from n training images and store to a file.
» Eigenv4Check.m - (Lighting correction for improved accuracy) Compare an image to the eigenMatrix and produce an index of how likely it is to be an object.
» Eigenv4Detect.m - (Lighting correction for improved accuracy) Given an image, produce heatmap of probability of objects within the image.
» Eigenv4Rectangle.m - (Lighting correction for improved accuracy) Find coordinates of assumed object given a threshold.
» Eigenv4Density.m - (Lighting correction for improved accuracy) Find coordinates of assumed object given a threshold, then create a densitymap / heatmap of the data given a defined block size. |
Week 5 |
» Eigenv5Training.m - Extract eigenMatrix from n training images and store to a file for any dimensions and number of training images.
» Eigenv5Check.m - Compare an image to the eigenMatrix and produce an index of how likely it is to be an object with corrected and normalized contrast.
» Eigenv5Detect.m - Given an image, produce heatmap of probability of objects within the image.
» Eigenv5Rectangle.m - Find coordinates of assumed object given a threshold.
» Eigenv5Density.m - Find coordinates of assumed object given a threshold, then create a densitymap / heatmap of the data given a defined block size. Includes all week 5 bugfixes and tweaks. |
Weeks 6&7 |
» Eigenv6Training.m - Extract eigenMatrix from n training images and store to a file for any dimensions and number of training images.
» Eigenv6Check.m - Compare an image to the eigenMatrix and produce an index of how likely it is to be an object with corrected and normalized contrast.
» Eigenv6Detect.m - Given an image, produce heatmap of probability of objects within the image, now with correctly-normalized lighting.
» Eigenv6Rectangle.m - Find coordinates of assumed object given a threshold.
» Eigenv6Density.m - Find coordinates of assumed object given a threshold, then create a densitymap / heatmap of the data given a defined block size. Includes all week 5 bugfixes and tweaks. Now also includes full scaling and rendering of the heatmap into a writeable image. |
Week 8 |
» Eigenv6Evaluate.m - Given an image, generate a hitmap, density heatmap, and cancer-presence evaluation |
Week 9 |
» Coversheet.doc - My mini (8.5" x 11") research poster for the final report in MS Word format.
» Coversheet.pdf - My mini (8.5" x 11") research poster for the final report in PDF format.
» FinalReport.doc - My full final report in MS Word format.
» FinalReport.pdf - My full final report in PDF format.
» CDP_train.m - The final training set processor program in Matlab format.
» CDP_evaluate.m - The final main program in Matlab format.
» finalpres.ppt - My final presentation regarding prostate cancer in Powerpoint format.
» finalpres.odp - My final presentation regarding prostate cancer in Open Office presentation format (superior quality). |