Enhancement Spatial Transformer Networks for Text Classification

Ka Hou Chan, Sio Kei Im, Vai Kei Ian, Ka Man Chan, Wei Ke

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)


This paper introduces a framework based on 2D transformations for detecting arbitrarily oriented text in natural scene images. We present localization networks within Spatial Transformer Networks (STN) that generate proposals carrying the affine information of the text orientation, namely translation, scaling and rotation. This information is then used as learnable parameters so that the proposals fit the regular form of the text, in terms of orientation, more accurately. The localization network is designed to project arbitrarily oriented proposals onto a feature map for a text-region classifier. Compared with previous text detection systems, this work maintains the relationships among the learning parameters, which leads to a better approximation of the orientation. As a result, the new layer greatly improves training accuracy. Moreover, the design can easily be deployed in existing systems built on standard CNN architectures.
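The idea of turning translation, scaling and rotation into learning parameters can be illustrated with a small sketch. It assumes the localization network regresses five numbers (tx, ty, sx, sy, angle) that are composed into one homogeneous matrix M = T·R·S, instead of the six unconstrained entries of a standard STN. This parameterisation, the function names, and the normalised [-1, 1] coordinate convention (taken from the original STN formulation) are assumptions for illustration, not the authors' implementation. The localization CNN is omitted; only the matrix composition, the grid generator and the bilinear sampler are shown.

```python
import math

# Hypothetical parameterisation (assumption): the localization network outputs
# (tx, ty, sx, sy, angle) rather than six free affine entries.
def affine_from_params(tx, ty, sx, sy, angle):
    """Compose the 3x3 homogeneous matrix M = T @ R @ S (scale, then rotate, then translate)."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c * sx, -s * sy, tx],
        [s * sx, c * sy, ty],
        [0.0, 0.0, 1.0],
    ]

def sample_bilinear(img, x, y):
    """Bilinearly sample img (list of rows) at continuous pixel coords; zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                # Weight falls off linearly with distance to each neighbouring pixel
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                total += weight * img[yi][xi]
    return total

def spatial_transform(img, theta, out_h, out_w):
    """Warp img with the top two rows of theta, using normalised [-1, 1] coordinates."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Grid generator: source coords = theta @ [xt, yt, 1]
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Map normalised source coords back to pixel indices (corner-aligned)
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            row.append(sample_bilinear(img, px, py))
        out.append(row)
    return out
```

For example, `spatial_transform(img, affine_from_params(0.0, 0.0, 1.0, 1.0, 0.0), h, w)` reproduces `img` up to floating-point rounding, while a non-zero `angle` resamples a rotated view of it. Building the matrix from five coupled parameters rules out shear, which is one plausible way to keep the parameters related to each other; whether the paper constrains them in exactly this way is not stated here.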

Original language: English
Title of host publication: ICGSP 2020 - Proceedings of the 4th International Conference on Graphics and Signal Processing
Publisher: Association for Computing Machinery
Number of pages: 6
ISBN (Electronic): 9781450377812
Publication status: Published - 26 Jun 2020
Event: 4th International Conference on Graphics and Signal Processing, ICGSP 2020 - Nagoya, Virtual, Japan
Duration: 26 Jun 2020 to 28 Jun 2020

Publication series

Name: ACM International Conference Proceeding Series


Conference: 4th International Conference on Graphics and Signal Processing, ICGSP 2020
City: Nagoya, Virtual


  • Affine Transformation
  • Homogeneous Matrix
  • Learning Parameters
  • Spatial Transformer Networks

