If you are here, you have probably scanned a couple of documents or pages of a book and noticed one thing: the annoying gray background! Most of us do not realize that we are just a few steps away from a clean document without it.
How to remove the gray background from a scanned image or PDF?
There are several ways to do this. The basic idea is to convert the PDF to images, convert those images to black and white, and save them. This solves the problem most of the time.
Let us look at this in detail and go through the ways to remove the gray background. For PDFs there is one additional step: the pages first have to be converted to images.
Why do we get that annoying gray background when we scan?
There are two reasons behind this gray background:
- When we scan a page with the color mode set to grayscale, every pixel is stored as a shade somewhere between black and white. The paper is rarely captured as pure white, so the background ends up as a light shade of gray. Read more about how we scan books without damaging them here.
- Contrast and brightness are part of image correction. The settings of these two variables affect the scanned image and can leave a gray cast on it.
Knowing these two reasons matters, because reversing them is how we get images free of the gray background.
Let us now go through the different ways of removing the gray background.
Remove Gray Background while Scanning
Many scanners nowadays come with options that let us remove the gray background at scan time. As discussed above, the pages get this background because of the grayscale mode.
So, there are two possible fixes while we scan the pages.
Black and White Mode
Scanning the pages directly in black and white mode removes the in-between shades. Every pixel becomes either black or white, so the light gray background turns white and the dark text turns black.
However, this is not suitable when the pages contain color or grayscale pictures.
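To make the idea concrete, here is a minimal Python sketch of what a black and white conversion does to grayscale pixels. It is only an illustration, not what any particular scanner runs: the page is assumed to be a list of rows of values from 0 (black) to 255 (white), and the function name and the cutoff of 128 are made up for this example.

```python
# Grayscale values run from 0 (black) to 255 (white).
def to_black_and_white(gray_rows, cutoff=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if value >= cutoff else 0 for value in row] for row in gray_rows]


# Tiny made-up "scan": light gray paper (around 200) with dark text (around 40).
scan = [
    [200, 198, 45, 201],
    [199, 40, 38, 202],
]
clean = to_black_and_white(scan)
print(clean)  # [[255, 255, 0, 255], [255, 0, 0, 255]]
```

The gray paper values all land on 255 and the text on 0, which is also why pictures suffer: their in-between shades are forced to one side or the other.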
Increase Contrast and Brightness
Certain scanners offer an option, right out of the box, to increase the contrast and brightness. Combining the two will remove the gray background. Often, increasing the contrast by a certain amount is enough to remove the grayish regions.
While doing this, we have to watch out for color pages, as the contrast of the colors will also be increased.
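As a rough illustration of why more contrast removes the gray, the sketch below applies a simple linear adjustment to grayscale values (0 is black, 255 is white). The formula, the function name and the parameters are assumptions chosen for this example; scanners and editors may use different curves.

```python
def adjust(gray_rows, contrast=1.0, brightness=0):
    """Stretch values away from mid-gray (128), shift them, and clip to 0..255."""
    def fix(value):
        stretched = (value - 128) * contrast + 128 + brightness
        return max(0, min(255, round(stretched)))
    return [[fix(value) for value in row] for row in gray_rows]


# Made-up pixels: light gray paper, a mid-gray smudge, dark text.
scan = [[200, 150, 40]]
print(adjust(scan, contrast=2.0))  # [[255, 172, 0]]
```

The paper (200) is pushed to pure white and the text (40) to pure black, while the smudge (150) only gets lighter. That is why the amount of contrast needed depends on how gray the scan is.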
Remove Gray Background after Scanning
If you have a basic scanner that does not come with the features mentioned in the previous section, here is what you need to do: install one of the tools mentioned below and edit the images once they are scanned. These tools are either available for free or come with software that you may already have installed.
There are about nine ways that we have learned to scan books. If you are interested, you can read them in this article.
GIMP
The above two ways of removing gray backgrounds are the simplest ones. Many tools can do these jobs; we recommend the open-source image editor GIMP (GNU Image Manipulation Program). You can download it from this link.
Most of what you need to change the page colors is available in GIMP's menus. In recent GIMP versions the relevant tools sit under the Colors menu (older versions and guides may show them elsewhere). From there, based on your expertise, you can make use of them. The simplest way is to go to Colors -> Brightness-Contrast and adjust the two sliders.
The other way to convert the image to black and white is to go to Colors -> Threshold and adjust the threshold until you get a clean black and white image.
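If you wonder where a good threshold lies, one common way to pick it automatically is Otsu's method, which looks for the cutoff that best separates the dark pixels from the light ones. The sketch below is a plain Python illustration of that technique on a list of grayscale rows (0 is black, 255 is white). It is not a description of how GIMP chooses its value, and the function name is made up for this example.

```python
def otsu_threshold(gray_rows):
    """Return the level that best separates dark and light pixels (Otsu's method)."""
    histogram = [0] * 256
    for row in gray_rows:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_cutoff, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        # Pixels at or below `level` form the dark class, the rest the light class.
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        # Between-class variance (up to a constant factor).
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_cutoff, best_variance = level, variance
    return best_cutoff


# Made-up scan: light gray paper with a few dark text pixels.
scan = [[200, 198, 45, 201], [199, 40, 38, 202]]
cutoff = otsu_threshold(scan)
clean = [[0 if value <= cutoff else 255 for value in row] for row in scan]
print(clean)  # [[255, 255, 0, 255], [255, 0, 0, 255]]
```

On a real page you would still check the result by eye, just as you would when dragging the threshold slider.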
Microsoft Picture Manager
If you find GIMP slow and you have MS Office installed, a photo editor may already be installed on your PC along with it. You can use it to open a photo and then move on to the next picture from within the software. GIMP does not offer this kind of navigation: each new photo has to be opened either by dragging and dropping it into GIMP or by opening it from the file dialog and providing the path.
The tool is called Microsoft Picture Manager, and you can use it by following the procedure written in this article.
Bulk Gray Background Removal
There are times when we want to apply the above fixes to a large number of pages. For a book of 300 pages, opening each page and enhancing it by hand is a herculean task and, moreover, a waste of time.
To overcome this, we can again use an open-source tool. Kudos to its creators: it is easy to use and helps us batch process an entire book within minutes without much human intervention.
The tool that makes this possible is ScanTailor. Let us see in this section how to use ScanTailor to convert raw scanned images with a gray background into an enhanced set of pages without it.
Download the software by following this link.
Once you are done downloading, install the software by running the downloaded file and following the instructions.
When you open it after installing, the software looks like this.
The workflow is divided into six steps, which you can see at the top left. Before going through the steps, we have to import the images into the tool. To do this, enter the path of the folder that contains your scans as the input directory. After giving the input folder, the software looks something like this:
As the above image shows, once you enter the folder path, all the images in the folder are displayed in the left section. Select the relevant images, or all of them, and transfer them to the File in Project list using the double right arrows. The images then appear in the right section, which will look something like this.
Now, all the images in the right section are ready to be imported into the software for further editing. You can also check the check box below that says “Fix DPIs, even if they look OK” to review and correct the DPI values recorded in the scans. This does not add detail to the images; it only makes sure the software works with the right resolution.
Once done, click on OK to import all the images into the software, which then looks like the image below.
Now let us go through the steps one by one. The steps start with Fix Orientation.
If the images are not in the right orientation and they are all turned the same way, select the angle from the top left and apply it to all pages (don’t forget to do this in every step if you want the changes applied to all the images). If they are not all turned the same way, we have to select them one by one and turn them right or left.
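Fixing the orientation comes down to turning every page by the same number of quarter turns. Here is a small Python sketch of that operation, assuming each page is a list of pixel rows. The function names are made up for this example, and this is not ScanTailor's own code.

```python
def rotate_clockwise(rows):
    """Rotate a page (list of pixel rows) 90 degrees clockwise."""
    return [list(column) for column in zip(*rows[::-1])]


def rotate(rows, quarter_turns):
    """Apply the given number of clockwise quarter turns."""
    for _ in range(quarter_turns % 4):
        rows = rotate_clockwise(rows)
    return rows


page = [[1, 2, 3],
        [4, 5, 6]]
print(rotate(page, 1))  # [[4, 1], [5, 2], [6, 3]]

# "Apply to all": the same turn for every page in a list of pages.
pages = [page, page]
fixed = [rotate(p, 1) for p in pages]
```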
Once all of them are in the right orientation, go to the next step, Split Pages.
If the book was scanned two pages at a time, the scans need to be split in this step. The software does that automatically for you. However, a thorough check of all the pages is still necessary, as the automatic split can get some pages wrong. You can select a page by clicking on its thumbnail on the right side of the software.
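In its simplest form, splitting a two-page scan means cutting the image at the gutter between the pages. The Python sketch below assumes the gutter is exactly in the middle of the image, which is a simplification made for this example; it is exactly the kind of guess that can go wrong and needs a manual check.

```python
def split_spread(rows):
    """Cut a two-page scan (list of pixel rows) down the middle column."""
    middle = len(rows[0]) // 2
    left = [row[:middle] for row in rows]
    right = [row[middle:] for row in rows]
    return left, right


spread = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
left_page, right_page = split_spread(spread)
print(left_page)   # [[1, 2], [5, 6]]
print(right_page)  # [[3, 4], [7, 8]]
```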
Now that the pages are split, chances are that they were not scanned perfectly straight, so some of the images will be skewed. To de-skew them, we follow the next step.
In this step, we select the pages that are skewed and rotate them slightly with the help of the guidelines until they are straight.
Once they are de-skewed, the next important step is to select only the content. The edges are not required, as they might contain irrelevant marks like scratches, tears, etc.
This also requires a little manual intervention, because sometimes the content is not completely selected or scratches at the edge are included in the selection.
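A simple way to picture content selection is as a box drawn around all the black pixels of a page. The Python sketch below does exactly that on an already black and white page (0 is black, 255 is white). The function names are made up for this example, and a real tool uses more careful detection than this.

```python
def content_box(bw_rows):
    """Return (top, left, bottom, right) around the black pixels, or None if blank."""
    dark_rows = [y for y, row in enumerate(bw_rows) if 0 in row]
    if not dark_rows:
        return None
    width = len(bw_rows[0])
    dark_cols = [x for x in range(width) if any(row[x] == 0 for row in bw_rows)]
    return dark_rows[0], dark_cols[0], dark_rows[-1], dark_cols[-1]


def crop(bw_rows, box):
    """Keep only the pixels inside the box (edges included)."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in bw_rows[top:bottom + 1]]


W = 255  # white
page = [
    [W, W, W, W, W],
    [W, 0, 0, W, W],
    [W, W, 0, W, W],
    [W, W, W, W, W],
]
box = content_box(page)
print(box)              # (1, 1, 2, 2)
print(crop(page, box))  # [[0, 0], [255, 0]]
```

A single black speck near the edge would stretch this box all the way out to it, which is why scratches at the edge sometimes end up in the selection.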
Next, the margins need to be aligned so that the book looks clean, with the same layout on every page. This is done in the next step. We can position the content the way we want by specifying the side, top, and bottom margins.
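Adding margins can be thought of as surrounding the content with white pixels. The following Python sketch shows the idea on a tiny block of content (0 is black, 255 is white). The function name and parameters are assumptions for this example, with margins given in pixels rather than the units a real tool would offer.

```python
def add_margins(content_rows, top, bottom, side, white=255):
    """Surround the content with white pixels on all four sides."""
    width = len(content_rows[0]) + 2 * side
    padded = [[white] * side + list(row) + [white] * side for row in content_rows]
    above = [[white] * width for _ in range(top)]
    below = [[white] * width for _ in range(bottom)]
    return above + padded + below


content = [[0, 0],
           [255, 0]]
page = add_margins(content, top=1, bottom=2, side=1)
for row in page:
    print(row)
# [255, 255, 255, 255]
# [255, 0, 0, 255]
# [255, 255, 0, 255]
# [255, 255, 255, 255]
# [255, 255, 255, 255]
```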
Once this is done, only the last step, Output, is left.
Here we select the output resolution of the page, which can be either 300 or 600 DPI based on your choice.
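To get a feel for the difference between the two settings, the short Python calculation below converts a page size in inches to pixels. The 8.5 x 11 inch page is an assumption for this example; use the size of your own book.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size in inches at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (300, 600):
    print(dpi, pixel_size(8.5, 11, dpi))
# 300 (2550, 3300)
# 600 (5100, 6600)
```

Doubling the DPI doubles both dimensions, so a 600 DPI page holds four times as many pixels as a 300 DPI page.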
The mode is also selected in this step, and the most frequently used mode is black and white. Once you choose it, there you go: your grayscale page no longer has the grayish tinge. It has a white background.
If the pages contain pictures, you can select the mode accordingly.
The above-mentioned method is one of the best when it comes to removing the gray background from a set of pages. Imagine having to open the pages one by one in GIMP or any other software and increase the brightness and contrast each time.
This way we reduce the work and still get the job done, free of cost. That is the very reason I call this great software. Kudos to the team that released it as open source.
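For readers who like scripting, the same "do it for every page" idea can be sketched in a few lines of Python. This is only an illustration and not how ScanTailor works: it uses plain-text PGM files because they can be read with the standard library alone, it assumes values from 0 to 255, and the function names and the cutoff of 128 are made up for this example. Real scans are usually JPEG, PNG or TIFF files, which would need an image library.

```python
from pathlib import Path
import tempfile


def read_pgm(path):
    """Read a plain-text (P2) PGM file into a list of pixel rows."""
    tokens = [token
              for line in Path(path).read_text().splitlines()
              for token in line.split("#")[0].split()]  # drop comments
    assert tokens[0] == "P2", "only plain PGM is handled in this sketch"
    width, height = int(tokens[1]), int(tokens[2])
    pixels = [int(token) for token in tokens[4:]]  # tokens[3] is the max value
    return [pixels[y * width:(y + 1) * width] for y in range(height)]


def write_pgm(path, rows):
    """Write pixel rows (values 0..255) as a plain-text PGM file."""
    lines = ["P2", f"{len(rows[0])} {len(rows)}", "255"]
    lines += [" ".join(str(value) for value in row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n")


def clean_folder(input_dir, output_dir, cutoff=128):
    """Threshold every .pgm page in input_dir and save it under output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for page in sorted(Path(input_dir).glob("*.pgm")):
        rows = read_pgm(page)
        clean = [[255 if value >= cutoff else 0 for value in row] for row in rows]
        write_pgm(output_dir / page.name, clean)
        done.append(page.name)
    return done


# Demo with two made-up pages in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    scans = Path(tmp) / "scans"
    scans.mkdir()
    write_pgm(scans / "page1.pgm", [[200, 40], [198, 201]])
    write_pgm(scans / "page2.pgm", [[60, 210], [205, 55]])
    print(clean_folder(scans, Path(tmp) / "clean"))    # ['page1.pgm', 'page2.pgm']
    print(read_pgm(Path(tmp) / "clean" / "page1.pgm"))  # [[255, 0], [255, 255]]
```

The originals are left untouched and the cleaned pages go to a separate folder, so a bad cutoff can always be retried.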
In addition to all this, ScanTailor has a command-line interface, so the same job can be done from the command prompt. We will cover this in upcoming articles.
Once the scans are cleaned up, it is easier for OCR engines to convert them into editable text, such as an MS Word document. Please read more about this here, or you can convert them to a PDF that keeps the pages as images, like here.