Tomographica is an artificial intelligence tool for radiologists. It automatically detects lung nodules in thoracic CT scans, with higher sensitivity and fewer false positives than any offering currently on the market.
Using the latest advances in deep learning technology, Tomographica can accurately and consistently detect 86% of suspicious lung nodules in thoracic CT scans, with fewer than three false positives per case.
In the developing world, where resources for interpreting scans are particularly limited, we believe Tomographica has the potential to save lives, by ensuring that patients in likely need of urgent intervention receive radiologist attention immediately.
In countries like the US, we think it can dramatically reduce read times for radiologists, especially the time involved in the peer-review process. By doing so it can improve practice margins, help radiologists meet the growing demand for thoracic CT screening, and free up more time for quality patient care.
In The Developing World
One of the greatest sources of healthcare disparity between developing countries and developed countries is access to state-of-the-art medical imaging.
In the past decade equipment manufacturers have invested heavily in the developing world by subsidizing the purchase of x-ray, CT, and ultrasound machines, so these countries are now beginning to have access to state-of-the-art equipment. There remains, however, a lack of trained radiologists to read the images that this equipment produces.
This gap is currently being filled by charitable organizations such as Doctors Without Borders. Radiologists within the US can volunteer to read scans from other countries in their spare time. This fills a need, but there is still a gap of 30 days or more between a scan being acquired and being read. In most cases that is adequate, but many patients need immediate intervention, and a 30-day waiting period is too long.
Tomographica could save lives in the developing world by automatically detecting large lung nodules that need urgent intervention. Organizations like Doctors Without Borders could use Tomographica to ensure that these patients get feedback from a US radiologist immediately instead of waiting 30 or more days.
In The Developed World
With thoracic CT scans now approved by the Centers for Medicare and Medicaid Services for lung cancer screening of at-risk patients, demand for thoracic CT scan interpretation is growing faster than the resources available for interpretation. A single scan consists of around 200 cross-sectional images, each of which must be reviewed to identify nodules that are sometimes barely perceptible.
Furthermore, review by a single radiologist is generally insufficient. To maintain accreditation with the American College of Radiology, radiology departments must have a peer-review process in place to assess the diagnostic accuracy of their radiologists. Today that peer review consists of another full manual read by a second radiologist.
Reimbursement for the time-consuming work of reading scans is generally on a fixed, fee-for-service basis. As a result, any improvement in radiologist efficiency translates directly into large financial gains for the practice.
While it may take time and further development to clinically validate a machine learning tool like Tomographica for use in a primary radiologist read, starting immediately there are major gains to be achieved by dramatically reducing the time for peer review. A sufficiently sensitive machine learning tool for automatic detection of nodules and abnormalities could transform the peer review process, enabling the peer-reviewer to quickly zero in on two or three known areas of focus in each scan, rather than needing to examine each of the 200 images in the same minute detail as the primary radiologist.
In developing Tomographica, training, validation, and testing have been performed using the 1018 cases of the Lung Image Database Consortium image collection, each of which includes an annotation file, detailing the results of a two-phase image annotation process performed by four experienced thoracic radiologists.
Data Preprocessing and Feature Engineering
The first step in our process is to transform 3-dimensional DICOM files into a data structure that is more amenable to machine learning. Each grayscale slice from a 3-dimensional case (up to 200 of them in a single case) is first sliced into 64x64 pixel patches with a 32x32 pixel stride. We pursue this strategy to evaluate cases at a local level, thus eliminating the need for an architecture that handles both detection and localization.
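The tiling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_patches`, the NumPy array representation, and the 512x512 slice size are assumptions, while the 64-pixel patch size and 32-pixel stride come from the text.

```python
import numpy as np

def extract_patches(slice_2d, patch=64, stride=32):
    """Tile one 2D grayscale slice into overlapping square patches.

    Returns (patches, origins). Each origin is the (row, col) of a patch's
    top-left corner, so patch-level predictions can be mapped back later.
    """
    rows, cols = slice_2d.shape
    patches, origins = [], []
    for r in range(0, rows - patch + 1, stride):
        for c in range(0, cols - patch + 1, stride):
            patches.append(slice_2d[r:r + patch, c:c + patch])
            origins.append((r, c))
    return np.stack(patches), origins

# Assumed 512x512 slice (a common CT matrix size, not stated in the text).
demo_slice = np.zeros((512, 512), dtype=np.int16)
patches, origins = extract_patches(demo_slice)
```

With a stride of half the patch width, every interior pixel falls inside four different patches, which is what later allows overlapping predictions to be cross-checked.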
The grayscale patches are then transformed into 2-dimensional slices with RGB channels. This is done to encode more information in every 2-dimensional slice. The R channel represents the original patch, the G channel represents the maximum intensity projection of 5 slices centered on the patch of interest, and the B channel represents the minimum intensity projection of 3 slices centered on the patch of interest.
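A hedged sketch of this channel encoding is shown below. The function name `encode_rgb`, the (slice, row, col) volume layout, and the choice to clip the slab at the first and last slices of the volume are assumptions made for illustration; the 5-slice maximum and 3-slice minimum projections follow the description above.

```python
import numpy as np

def encode_rgb(volume, z, r0, c0, patch=64, mip_slices=5, minip_slices=3):
    """Build a (patch, patch, 3) array for the patch at slice z, corner (r0, c0).

    volume is indexed as (slice, row, col).
    """
    def slab(n_slices):
        # Slices centered on z; clipped at the volume boundary (an assumption).
        half = n_slices // 2
        lo = max(z - half, 0)
        hi = min(z + half + 1, volume.shape[0])
        return volume[lo:hi, r0:r0 + patch, c0:c0 + patch]

    red = volume[z, r0:r0 + patch, c0:c0 + patch]   # original patch
    green = slab(mip_slices).max(axis=0)            # maximum intensity projection
    blue = slab(minip_slices).min(axis=0)           # minimum intensity projection
    return np.stack([red, green, blue], axis=-1)

# Toy volume in which every voxel of slice k has the value k.
demo_volume = np.arange(7, dtype=np.float32)[:, None, None] * np.ones((7, 64, 64), dtype=np.float32)
rgb = encode_rgb(demo_volume, z=3, r0=0, c0=0)
```

In the toy volume, the red channel holds the value of slice 3, the green channel the largest value among slices 1 to 5, and the blue channel the smallest value among slices 2 to 4.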
The effect of using the RGB channels to encode three dimensional information can be seen by comparing images of normal anatomy and a lung nodule (below). The images on the left contain a lung nodule (red circle) while the images on the right contain a normal vessel (blue circle). The top row shows that in a single slice image a vessel and a nodule can appear very similar.
However, when the color channels are used to combine the single slice with maximum and minimum projections, it is clear that normal vessels are tubular (green tips extend out from the white center) whereas the nodule is spherical.
Data Imbalance and Sampling
If each slice is broken down to patches and then naively used in any machine learning model, the model will only learn to predict the normal class. This is due to a highly imbalanced dataset in which about 99.99% of the patches across our cases are normal and the remainder is either a nodule or abnormal.
To mitigate this problem, we proceed by downsampling the normal cases: we only create patches from within the lung, avoiding patches that represent non-lung matter.
We also upsample the abnormal and nodule patches: after creating tiled patches with fixed strides, we randomly sample patches that contain a nodule or some abnormal growth. These patches do not coincide with the tiled patches, thereby artificially boosting the number of patches a network can learn from. The final dataset that was used in training consisted of 80% normal patches, 10% abnormal patches, and 10% nodule patches.
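One way the random upsampling of nodule patches might look is sketched below. The helper `sample_nodule_origins`, the use of a single annotated center point per nodule, and the rule of skipping corners that lie on the tiling grid are assumptions for illustration; the source does not specify how the extra patches were positioned.

```python
import random

def sample_nodule_origins(center, shape, n, patch=64, stride=32, seed=0):
    """Pick up to n random patch corners whose patch contains `center`.

    Corners that lie on the tiling grid are skipped, so the sampled patches
    never duplicate a patch produced by the fixed-stride tiling.
    """
    rng = random.Random(seed)
    center_r, center_c = center
    rows, cols = shape
    # A patch with corner r covers rows r .. r + patch - 1.
    r_lo, r_hi = max(center_r - patch + 1, 0), min(center_r, rows - patch)
    c_lo, c_hi = max(center_c - patch + 1, 0), min(center_c, cols - patch)
    candidates = [
        (r, c)
        for r in range(r_lo, r_hi + 1)
        for c in range(c_lo, c_hi + 1)
        if not (r % stride == 0 and c % stride == 0)
    ]
    return rng.sample(candidates, min(n, len(candidates)))

# Hypothetical nodule center at row 200, column 300 of a 512x512 slice.
origins = sample_nodule_origins((200, 300), (512, 512), n=10)
```

Each sampled corner yields a patch that shows the same nodule at a different offset, which gives the network more positive examples without repeating any tiled patch exactly.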
Training a Convolutional Neural Network
After experimenting with a subset of Convolutional Neural Network (CNN) architectures, we decided to use the GoogLeNet Inception architecture. Given that nodules vary in size while the patch size is fixed, we needed an architecture that can combine many receptive fields in order to account for different nodule sizes.
The GoogLeNet model consists of several convolutional networks stacked on top of and alongside each other. The outputs of every layer of convolutional networks are concatenated to form the input to the next stack of convolutional networks. The weights of all the convolutional layers were initialized from a GoogLeNet pretrained on the ImageNet dataset. The fully connected layers were initialized randomly and learned through training. Stochastic gradient descent was used for training, with a learning rate scheduled to decrease over half epochs. The model was trained with early stopping on a validation set of cases that was prepared in the same way described above.
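The training schedule is only described in outline, so the following is a sketch under stated assumptions rather than the actual configuration: it reads "over half epochs" as a step decay applied every half epoch, and the decay factor, the patience value, and both function names are hypothetical.

```python
def step_decay(base_lr, step, steps_per_epoch, factor=0.5):
    """Learning rate after `step` updates, cut by `factor` every half epoch.

    The factor of 0.5 is an assumed value, not taken from the source.
    """
    half_epoch = max(steps_per_epoch // 2, 1)
    return base_lr * factor ** (step // half_epoch)

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights early stopping would keep.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (patience=3 is an assumed value).
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up validation losses, used only to exercise the stopping rule.
kept_epoch = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6])
```

In this made-up run the rule keeps epoch 2: the loss fails to improve for three epochs in a row afterwards, so training stops before the later, lower value is ever reached.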
To understand how the CNN model learns, it is useful to visualize the convolutions of a single patch, beginning with the original patch itself (below left).
The first convolutional layer of the model (shown above right) starts to distinguish broad features within the patches; edges in particular become visible.
In the second convolutional layer, the model begins to differentiate physical features at a more granular level, while in the final convolutional layer, heatmaps and hot zones appear, depicting areas that the model predicts are likely to represent nodules:
Postprocessing and Final Predictions
After the CNN makes predictions on a patch-by-patch basis, the predictions are reassembled into a cohesive 3D volume with labeled regions of predicted nodules. The reassembly process begins by forming a 3D nodule probability map from the nodule probabilities generated by the CNN. A threshold is then chosen to turn the probability map into a binary image of potential nodules. A morphological filter removes evident false positives, using the fact that any true nodule must be overlapped by four adjacent patches. Finally, a region-growing algorithm finds the disconnected regions of predicted nodules.
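The thresholding and region-growing steps can be illustrated with the sketch below. It assumes a voxel-level probability map, a threshold of 0.5, and 6-connectivity, none of which are specified in the text, and it omits the morphological filter. The function name `label_regions` is hypothetical.

```python
from collections import deque

import numpy as np

def label_regions(prob_map, threshold=0.5):
    """Threshold a 3D probability map and label its connected regions.

    Regions are grown from seed voxels with a breadth-first flood fill over
    the six face-adjacent neighbours. Returns (labels, region_count).
    """
    mask = prob_map >= threshold
    labels = np.zeros(mask.shape, dtype=np.int32)
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to an earlier region
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in neighbours:
                nxt = (z + dz, y + dy, x + dx)
                inside = all(0 <= i < size for i, size in zip(nxt, mask.shape))
                if inside and mask[nxt] and not labels[nxt]:
                    labels[nxt] = count
                    queue.append(nxt)
    return labels, count

# Toy probability map with two separate candidate regions and one weak voxel.
demo_prob = np.zeros((4, 8, 8), dtype=np.float32)
demo_prob[1, 1:3, 1:3] = 0.9
demo_prob[3, 6, 6] = 0.8
demo_prob[0, 0, 7] = 0.3
labels, region_count = label_regions(demo_prob)
```

Each nonzero label marks one disconnected candidate region, which is the unit that would be reported as a predicted nodule.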
In the viewer below, the four quadrants (from top left, clockwise) represent:
- A 3D view
- The axial plane
- The coronal plane
- The sagittal plane
To move through a given axis, mouse over the relevant planar view and use the mousewheel or trackpad to scroll, or arrow keys for fine-grained control. Scrolling over the axial plane will move through the Z axis, while the coronal plane view moves through the Y axis and the sagittal plane the X axis.
Holding ‘Shift’ while mousing over a planar view will synchronize movements in the other planar views with your mouse movements. Holding ‘Shift’ while dragging will pan the 2D view.
Click and drag the 3D view to change the viewpoint. Click and drag any of the planar views to alter the black/white point levels of the images. To reset the 3D view, mouse over it and press ‘R’. Mousing over any other view and pressing ‘R’ will reset black/white point levels.
Opening the control panel provides sliders for fine control of planar movement, black/white levels, and visibility and opacity of labels.
Regions marked ‘annotated’ are those confirmed by a consensus of four reviewing radiologists to contain a nodule. Regions marked ‘predicted’ are patches predicted by Tomographica to be likely to contain a nodule. Predicted regions that do not overlap with annotations are false positives in some cases, and in other cases may simply represent small nodules that not all four of the reviewing radiologists identified.
The best-performing product currently available on the market for these purposes is MeVis; its adoption has been limited by its sensitivity and by the number of false positives it produces. A study recently presented at the 12th International Symposium on Biomedical Imaging benchmarked MeVis’s sensitivity at 71% at a false positive rate of two per case, 74% at a false positive rate of four per case, and 76% at a false positive rate of 256 per case.
Even at this early stage of development, Tomographica significantly outperforms these benchmarks. In testing to date we have achieved 86% sensitivity with a false positive rate of 3 per case. This is clearly a level of performance that will enable Tomographica to play a key role in dramatically improving the efficiency of the radiological peer review process, and in prioritizing highest-need cases for radiologist attention in the developing world.
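For readers unfamiliar with the two figures quoted above, the sketch below shows how sensitivity and false positives per case are conventionally computed from detection counts. The function name is hypothetical, and the counts in the example are illustrative values chosen to reproduce the quoted rates; they are not the actual test counts, which the source does not give.

```python
def detection_metrics(true_positives, false_negatives, false_positives, n_cases):
    """Return (sensitivity, false positives per case) from raw detection counts."""
    sensitivity = true_positives / (true_positives + false_negatives)
    fp_per_case = false_positives / n_cases
    return sensitivity, fp_per_case

# Illustrative counts only: 86 of 100 nodules found, 150 false alarms in 50 cases.
sensitivity, fp_per_case = detection_metrics(
    true_positives=86, false_negatives=14, false_positives=150, n_cases=50
)
```

Both numbers depend on the probability threshold used to call a detection, which is why a system's sensitivity is reported at a stated false positive rate.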
[Figure; axis label: False positives per case]
Our current prototype is performing better than any product currently available for sale, and it is already a compelling tool for peer review of thoracic CT scans and for prioritizing scan interpretation in places where wait times for radiologist attention are long. Ultimately, however, we believe that, with additional data and effort, Tomographica’s false positive rate and sensitivity could be improved to the point where it can serve as a clinically validated primary read for lung nodules as part of a lung cancer screening program.
We believe the prediction infrastructure can be improved by gathering more training data through the National Lung Screening Trial (20,000 subjects), and by converting our 2D artificial neural network architecture to a true 3D architecture.