Did you know that electronic printers can have unique fingerprints, just like we do?
Researchers at IIT Gandhinagar have developed a novel technique* that can identify the source printer of a particular text document. This capability could help authenticate printed information and establish copyright ownership, and could in future also be used in investigations of fraud.
Dr. Nitin Khanna, a researcher in Electrical Engineering and a professor at the Indian Institute of Technology Gandhinagar, has developed, along with his PhD student Sharad Joshi, a new Printer Specific Local Texture Descriptor (PSLTD). This is the first step towards overcoming a major limitation of current state-of-the-art systems: they require that the fonts used in a test document of unknown origin also appear in the documents used to train the detection algorithm. An algorithm, in simple terms, is a step-by-step procedure designed to solve a given logical problem.
“The primary objective is the creation of fingerprinting for sensors which are generating data for us. Earlier, ID cards were a method of identification — they were extrinsic to a person. Users were given a new (additional) characteristic feature in the form of IDs and hence it was called passive identification. Now, these cards are being rapidly replaced by biometric scanning such as fingerprinting, where instead of an external trait, people have started using something which is internal (inherent) as well as unique to every individual. We plan to do the same with printers, aiming for accurate identification of authenticity of documents,” said Prof. Khanna.
A technique in widespread use today is watermarking: a faint design made in paper during its manufacture (often visible when the paper is held against the light) that helps establish its authenticity.
Watermarking distinguishes between documents generated from different sources and can be thought of as an ID card for printers, since an extrinsic characteristic is embedded into the paper.
“I want to keep the unique trait intrinsic, like the biometric scanning. I want to utilize the inherent characteristic of printing mechanism, the way printers are working themselves. Using this technique, we can discriminate between visually same or similar content coming from two printer models of different as well as same brands. That’s the idea,” he explained.
How does it work? The printed documents are scanned and the individual characters are extracted. From each letter, certain statistical features based on the local texture of the printed content are computed, giving a descriptor for every letter in the document. The next step is average pooling, which combines these per-letter descriptors into a single descriptor for an individual printer. This serves as a unique signature that aids the identification process.
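The pipeline described above (per-letter texture features, then average pooling, then matching) can be illustrated with a short Python sketch. It rests on loudly stated assumptions: characters are assumed to be already segmented from the scan into small grayscale arrays; a standard 8-neighbour local binary pattern (LBP) histogram stands in for the researchers' PSLTD, which it is not; and a simple nearest-match rule stands in for whatever classifier the published system actually uses. All function names here (lbp_histogram, average_pool, document_signature, identify_printer) are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical input format: one grayscale character crop, rows of 0-255 values.
Image = Sequence[Sequence[int]]

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(char_img: Image) -> List[float]:
    """Stand-in local texture descriptor: a normalised 256-bin LBP histogram.

    NOT the published PSLTD; it only shows how local texture statistics
    of one printed character can be summarised as a fixed-length vector.
    """
    h, w = len(char_img), len(char_img[0])
    hist = [0.0] * 256
    count = 0
    for y in range(1, h - 1):          # skip the border so all 8 neighbours exist
        for x in range(1, w - 1):
            centre = char_img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                if char_img[y + dy][x + dx] >= centre:
                    code |= 1 << bit   # set this bit when the neighbour is not darker
            hist[code] += 1
            count += 1
    if count:
        hist = [v / count for v in hist]  # normalise so crop size does not matter
    return hist


def average_pool(descriptors: List[List[float]]) -> List[float]:
    """Element-wise mean of per-character descriptors (average pooling)."""
    if not descriptors:
        raise ValueError("need at least one character descriptor")
    n = len(descriptors)
    return [sum(col) / n for col in zip(*descriptors)]


def document_signature(chars: List[Image]) -> List[float]:
    """One pooled signature for all characters extracted from a document."""
    return average_pool([lbp_histogram(c) for c in chars])


def identify_printer(signature: List[float], enrolled: Dict[str, List[float]]) -> str:
    """Hypothetical nearest-match rule: pick the enrolled printer whose
    signature has the smallest squared Euclidean distance to the query."""
    def dist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(enrolled, key=lambda name: dist(signature, enrolled[name]))
```

Averaging means that documents with different numbers of characters all map to a vector of the same length, which is what makes a single per-printer signature possible. Whether such a pooled descriptor is genuinely independent of the letters and fonts used depends on the descriptor itself; that is what the researchers' design targets, and this toy LBP version offers no such guarantee.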
“The biggest progress is that we have made this approach independent of the characters and font types used. Also, it does not require any changes to be made in printers which utilize it. It does not need any sophisticated machinery to operate and can be easily used in common places,” added his PhD student, Sharad Joshi.
The approach has several important applications in criminal investigations. In future, it could help resolve legal issues such as the leakage or misuse of sensitive documents like leases and wills, by verifying where a document was printed and whether that source is reliable. In the words of Prof. Khanna, “Such data, when not from an authentic source, undergoes certain changes during its printing process. The main concern is traceability of such content and this is what this study aims to address.”
Testing this mechanism on a large scale (say, 1,000 or so printers) is an essential requirement. Another challenge for the near future is extending the technique so that it can adapt to a wide range of languages; currently, it is programmed to recognize only the letters of the English alphabet. The ability to distinguish between font sizes is another area for improvement. Prof. Nitin Khanna’s lab is also working on replacing scanners with smartphones in this technique, making it easier to use.
“Biometrics is certainly the most secure form of authentication, it is the hardest to imitate and duplicate.” — Avivah Litan, Vice President and Distinguished Analyst in Gartner Research
— — — — — — — — — — — — — — — — — — — — — — — — — —— — — — — — — — — — — — — — — — — — — — — — — — — —— — — — —
* The results of this research have been published in “Source printer classification using printer specific local texture descriptor”, IEEE Transactions on Information Forensics and Security (TIFS), DOI: 10.1109/TIFS.2019.2919869, vol. 45, no. 6, pp. 2700–2708, May 2019.
** This story has also been published on Medium.
APEKSHA SRIVASTAVA
Senior Project Associate