
Optical Hydrocarbons’ chemical formula recognition
Tamara Stanković
tjstankovic@gmail.com
Regional Centre for Talented Youth Nis




1 Introduction
Chemical graph theory is a branch of mathematics that models molecules as graphs in order to gain better insight into the physical, chemical and biological properties of compounds and to approximate those properties better. Digitalization is the translation of an analog signal into digital data. To make digitalization faster and more precise, many optical character recognition algorithms have been developed, and one of them is used in this paper. The purpose of this paper is to find and describe an algorithm that recognizes the chemical formula of a hydrocarbon in a given photo. This is therefore an optical graph recognition problem.

2 Methods
The algorithm for optical recognition of a hydrocarbon's chemical formula in a given photo has several stages.

Editing the photo.
To be represented on a computer, the photo has to be digitalized. First, the photo, which is in the RGB model, is converted to grayscale. For each pixel, the values of the red, green and blue channels are determined, and the simple formula Gray = 0.21∙R + 0.71∙G + 0.07∙B is used to calculate the grayscale value of that pixel. Then the photo is converted into a binary image using the Otsu method.

Extracting and editing characters and lines.
The goal is to find all connected components in the pixel matrix of the photo, each of which represents a character or a line. The DFS (Depth-First Search) graph search algorithm is used for this. Each component is then reduced to a size of 20x20.

Character and line recognition.
A neural network, a machine learning algorithm, is used for character recognition. For each component, it is determined whether it is a character (C, H or a digit 0-9) or a line. The neural network is implemented with the Neuroph software. For each component, the network determines which character best matches it, whereas lines are the components that are not letters or digits.

Determining character connection.
It must be determined which letters are connected, and how. Two letters are connected by the line that is closest to them. So, for each line, the two letters it connects can be found, and the adjacency matrix of the molecule graph can then be constructed.

Determining the molecule.
Using the adjacency matrix of the molecule graph, it must be concluded which graph is in the photo. The properties of chemical compounds and the value of the Wiener index provide the needed information. The Wiener index can be calculated using the Floyd-Warshall algorithm and compared with the known constants for each molecule. In this way, the molecule in the photo is recognized.

3 Results
For the purpose of this paper, the application HemijskeFormule was written in the Java programming language, and all results were obtained with it. The application was tested on 150 photos. Precision was analyzed with respect to parameters such as the type of chemical formula, the number of C-atoms in the molecule, the type of chemical bond, etc. Recognition has a precision of 93% for molecular formulas and 71% for structural formulas. Precision also depends on the number of C-atoms in the molecule and on the type of chemical bond. The total precision of the application is 80%.

4 Conclusion
The obtained results are satisfactory and represent progress in optical graph recognition.

5 References
Dejan Živković. Osnove dizajna i analize algoritama. Računarski fakultet Beograd i CET, Beograd, 2007.
Andrew Ng. Machine Learning. Stanford University, Coursera, 2013.
Nobuyuki Otsu. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979.
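
The first two stages of the method ("Editing the photo" and "Extracting and editing characters and lines") can be illustrated with a minimal Java sketch. It is not the code of HemijskeFormule: the class and method names are hypothetical, the grayscale weights are the ones quoted in the text, and it assumes dark ink on a light background, 4-connected components, and an explicit stack for the depth-first search instead of recursion.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper; names are illustrative and not taken from HemijskeFormule. */
final class PhotoPreprocessing {

    /** Grayscale value of one pixel, using the channel weights quoted in the text. */
    static int toGray(int r, int g, int b) {
        return (int) Math.round(0.21 * r + 0.71 * g + 0.07 * b);
    }

    /** Otsu's method: picks the threshold that maximizes the between-class variance. */
    static int otsuThreshold(int[][] gray) {
        int[] hist = new int[256];
        long total = 0, sumAll = 0;
        for (int[] row : gray) {
            for (int v : row) {
                hist[v]++;
                total++;
                sumAll += v;
            }
        }
        long weightLow = 0, sumLow = 0;
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            weightLow += hist[t];
            if (weightLow == 0) continue;      // no pixels at or below t yet
            long weightHigh = total - weightLow;
            if (weightHigh == 0) break;        // every pixel is at or below t
            sumLow += (long) t * hist[t];
            double meanLow = (double) sumLow / weightLow;
            double meanHigh = (double) (sumAll - sumLow) / weightHigh;
            double diff = meanLow - meanHigh;
            double variance = (double) weightLow * weightHigh * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Binary image: true marks ink. Assumes dark ink on a light background. */
    static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] ink = new boolean[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++) {
            for (int x = 0; x < gray[0].length; x++) {
                ink[y][x] = gray[y][x] <= threshold;
            }
        }
        return ink;
    }

    /** Labels 4-connected ink components (1, 2, ...) by iterative DFS; returns their count. */
    static int labelComponents(boolean[][] ink, int[][] labels) {
        int h = ink.length, w = ink[0].length, count = 0;
        int[] dy = {1, -1, 0, 0};
        int[] dx = {0, 0, 1, -1};
        Deque<int[]> stack = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || labels[y][x] != 0) continue;
                count++;                       // unvisited ink pixel starts a new component
                labels[y][x] = count;
                stack.push(new int[] {y, x});
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    for (int d = 0; d < 4; d++) {
                        int ny = p[0] + dy[d], nx = p[1] + dx[d];
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (!ink[ny][nx] || labels[ny][nx] != 0) continue;
                        labels[ny][nx] = count;
                        stack.push(new int[] {ny, nx});
                    }
                }
            }
        }
        return count;
    }
}
```

Each labelled component would then be cropped and rescaled to 20x20 before being passed to the classifier. Whether 4- or 8-connectivity suits a given photo better is a design choice the text does not specify.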
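
The last two stages of the method ("Determining character connection" and "Determining the molecule") can be sketched in the same way. The sketch below is an illustration under stated assumptions, not the application's actual code: atoms are represented by their centre points, each line by its two endpoints, and an endpoint is attached to the nearest atom centre (the text only says that a line connects the two letters closest to it). The Wiener index is then the sum of shortest-path distances over all unordered vertex pairs, with the distances computed by the Floyd-Warshall algorithm. All names are hypothetical.

```java
/** Hypothetical helper; names are illustrative and not taken from HemijskeFormule. */
final class MoleculeGraph {

    /**
     * Builds the adjacency matrix of the molecule graph.
     * atoms[i] = {x, y} is the centre of atom i; each line = {x1, y1, x2, y2}.
     * Assumption: a line joins the atoms nearest to its two endpoints.
     */
    static boolean[][] adjacency(double[][] atoms, double[][] lines) {
        int n = atoms.length;
        boolean[][] adj = new boolean[n][n];
        for (double[] line : lines) {
            int a = nearestAtom(atoms, line[0], line[1]);
            int b = nearestAtom(atoms, line[2], line[3]);
            if (a != b) {                      // parallel lines (multiple bonds) collapse to one edge
                adj[a][b] = true;
                adj[b][a] = true;
            }
        }
        return adj;
    }

    /** Index of the atom whose centre is closest to the point (x, y). */
    static int nearestAtom(double[][] atoms, double x, double y) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < atoms.length; i++) {
            double dx = atoms[i][0] - x, dy = atoms[i][1] - y;
            double dist = dx * dx + dy * dy;   // squared distance is enough for comparison
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Wiener index via Floyd-Warshall; returns -1 if the graph is disconnected. */
    static long wienerIndex(boolean[][] adj) {
        int n = adj.length;
        final int INF = Integer.MAX_VALUE / 2; // INF + INF still fits in an int
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = (i == j) ? 0 : (adj[i][j] ? 1 : INF);
            }
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                    }
                }
            }
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (dist[i][j] >= INF) return -1;
                sum += dist[i][j];
            }
        }
        return sum;
    }
}
```

For example, on the carbon skeleton of n-butane (a path of four vertices) the pairwise distances are 1, 1, 1, 2, 2 and 3, so the index is 10, while the star-shaped skeleton of isobutane gives 9. Values like these are what the computed index would be compared against.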