Automated Web Development
Theme Detection and Code Generation using Mix-NLP
Nandini Sethi
Computer Science and Engineering
Lovely Professional University
Phagwara, Punjab
nandinisethi2104@gmail.com
Abhishek Kumar
Computer Science and Engineering
Lovely Professional University
Phagwara, Punjab
kumarabhishek.kumar@gmail.com
Rohit Swami
Computer Science and Engineering
Lovely Professional University
Phagwara, Punjab
rowhitswami1@gmail.com
ABSTRACT
A website helps a business grow by supporting different marketing strategies. This paper describes a novel approach to developing a website from either a text description of the site or an image given as input. For text input, the system identifies the theme of the site from the description and suggests matching templates (screenshots). A selected template is then converted into code that the user can customize further. Building even the basic structure of a website can take a web developer more than 15 days; our work addresses this by generating the complete code of the web page or website in far less time. The system tokenizes each word of the input, finds its synonyms, and maps them to root words for theme identification, and it uses a deep learning model to convert templates into code.
KEYWORDS
Root words, Theme, Synonyms, Code
1. Introduction
To survive in the digital world, a website has become a basic requirement for every business, big or small, to represent itself.
A business without a website misses out on a great number of opportunities. A website helps a business grow by supporting different marketing strategies, so deploying a polished, user-interactive website matters and should be done in as little time as possible.
What makes a website?
A website is built from HTML code, which acts as its skeleton. Then come CSS (Cascading Style Sheets) and JS (JavaScript), which style the website and make it interactive.
The user input can be a sentence of any structure. Tokens are extracted from it and their synonyms are mapped to root words to identify the theme. After the theme is identified, template suggestions are shown; the selected suggestion is fed into a convolutional neural network (CNN), the extracted features are passed to a long short-term memory (LSTM) network that generates DSL tokens, and those tokens are compiled to generate code.
Paragraph Segmentation
The first step is to break the input into segments of words, called tokens in NLP, using the word_tokenize function from NLTK. Each token is an English word extracted from the input sentence.
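For illustration, a minimal sketch of this step with NLTK; the variable names are ours, not the paper's, and the nltk.download call is assumed to be needed only once:

# Tokenize a free-text website description into word tokens using NLTK.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")  # tokenizer model, downloaded once

description = "Need a website with red color navigation panel and black background"
tokens = word_tokenize(description)
print(tokens)
# ['Need', 'a', 'website', 'with', 'red', 'color', 'navigation', 'panel', 'and', 'black', 'background']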
Word Analysis
After the sentence has been converted into tokens, part-of-speech tagging is done for each word, and only the nouns and adjectives are extracted from the tokenized data. Nouns and adjectives alone are used to detect the theme because:
Nouns name objects, actions, qualities and states of existence.
Adjectives qualify the nouns, which is the main purpose of word analysis.
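A hedged sketch of the tagging and filtering step; the helper name extract_nouns_adjectives and the use of the Penn Treebank tag prefixes NN and JJ are our assumptions, not details given in the paper:

# Part-of-speech tagging, then keeping only nouns (NN*) and adjectives (JJ*).
import nltk

nltk.download("averaged_perceptron_tagger")  # POS tagger model, downloaded once

def extract_nouns_adjectives(tokens):
    tagged = nltk.pos_tag(tokens)                     # e.g. [('red', 'JJ'), ('panel', 'NN'), ...]
    return [word for word, tag in tagged
            if tag.startswith("NN") or tag.startswith("JJ")]

words = ['Need', 'website', 'with', 'red', 'colour', 'navigation', 'panel', 'black', 'background']
print(extract_nouns_adjectives(words))
# typically keeps tokens such as 'website', 'red', 'navigation', 'panel', 'black', 'background'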
Word Mapping and Template suggestion
The next step is to map each word onto one of the root words already present in a predefined list. To map a word we find its synonyms using WordNet, a lexical database for the English language, through its synset lookup (wordnet.synsets()). Once the synonyms are found, the words are mapped to the root words to identify the theme. Each template is already classified under a theme, so, on the basis of the identified theme, matching templates are suggested to the user.
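The mapping can be sketched as follows; the paper does not publish its root-word list, so ROOT_WORDS below is purely illustrative, as are the function names:

# Map extracted words to predefined root words (themes) via WordNet synonyms.
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet")  # lexical database, downloaded once

# Illustrative root-word dictionary; the paper's actual theme vocabulary is not given.
ROOT_WORDS = {"shop": "e-commerce", "blog": "blog", "portfolio": "portfolio", "sport": "sports"}

def synonyms(word):
    names = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            names.add(lemma.name().lower().replace("_", " "))
    return names

def detect_theme(words):
    for word in words:
        candidates = {word.lower()} | synonyms(word)
        for candidate in candidates:
            if candidate in ROOT_WORDS:
                return ROOT_WORDS[candidate]
    return None

print(detect_theme(["sport", "website"]))  # 'sports'

A word is assigned to the first theme whose root word appears among its WordNet synonyms; resolving ties or unmatched words would require the paper's full root-word vocabulary.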
CNN
In deep learning, the most common class of neural networks used to analyse visual images is the convolutional neural network (CNN, or ConvNet). A CNN requires minimal preprocessing because it is designed as a variation of the multilayer perceptron [1]. Owing to their shared-weights architecture and translation-invariance characteristics, CNNs are also known as shift-invariant or space-invariant artificial neural networks (SIANN). Inspired by biological processes [4], a CNN arranges its neurons with a connectivity pattern that resembles the organisation of the animal visual cortex: each cortical neuron responds to stimuli only within a restricted region of the visual field, its receptive field, and the receptive fields of different neurons partially overlap so that together they cover the entire visual field.
LSTM
Long short-term memory (LSTM) units are the building blocks of a class of recurrent neural networks (RNN); an LSTM network is a recurrent neural network composed of such units. An LSTM unit consists of a cell, a forget gate, an input gate and an output gate. The three gates control the flow of information into and out of the cell, and the cell stores values over arbitrary intervals of time. LSTM networks are the networks most commonly used to classify, process and make predictions from time-series data, because important events in a time series can be separated by lags of uneven or unknown duration. LSTMs were developed to handle the vanishing and exploding gradient problems that arise when training traditional recurrent neural networks, and in many applications their relative insensitivity to gap length gives them an advantage over plain recurrent neural networks, hidden Markov models and other sequence-learning methods.
2. Our Approach
The proposed system works in several steps: text segmentation, tokenizing, part-of-speech tagging, word mapping and theme detection, template suggestion, CNN, LSTM and decoder, as shown in Figure 1.
Figure 1: Steps of the approach
2.1 Input Image and Text
In this system the user can provide input in two ways:
The user can upload an image of a UI, which is converted directly into code.
The user can type a text description, which is used to generate suggestions; the user then selects the type of UI they want from the suggestions, and code is generated for it.
Figure 2: User Input
2.2 Feature Extraction
Both input methods have feature-extraction algorithms that extract characteristics from the input; each method uses its own algorithm.
2.2.1 Image input uses a CNN as the feature-extraction algorithm to obtain the characteristics of the input. CNNs are widely used in computer vision problems because their topology allows them to extract fine details from the input. We use a convolutional neural network that acts as an encoder, performing unsupervised learning by mapping an input image to a fixed-length vector that the system has already learned.
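As the paper does not list the encoder's exact layers, the following Keras sketch only illustrates the idea of a CNN that maps a screenshot to a fixed-length vector; every layer size here is an assumption:

# CNN encoder: maps a GUI screenshot to a fixed-length feature vector p.
# Layer sizes are illustrative; the paper does not specify its exact architecture.
from tensorflow.keras import layers, models

def build_vision_encoder(input_shape=(256, 256, 3), vector_size=1024):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(vector_size, activation="relu"),  # fixed-length vector p
    ])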
2.2.2 Text input uses the NLTK Python library to first tokenize the input, after which the nouns and adjectives are extracted from the tokenized data.
EXAMPLE:
INPUT:
"Need a website with red color navigation panel and black background"
OUTPUT:
['Need', 'website', 'with', 'red', 'colour', 'navigation', 'panel', 'black', 'background']
2.3 Theme Detection
After tokenizing, this phase uses named-entity recognition or chunking to extract the nouns and adjectives from the tokenized data; the result is called chunked data. From the chunked data, synonyms are extracted and mapped to root words.
Figure 3: Theme Detection from Text Input
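One plausible way to realise this chunking step is an NLTK regular-expression chunk grammar over the POS tags; the grammar and the THEME label below are illustrative assumptions rather than the paper's implementation:

# Grammar-based chunking: group adjective + noun sequences (e.g. "red color navigation panel")
# before mapping them to root words. Requires the 'punkt' and POS-tagger models downloaded earlier.
import nltk

grammar = "THEME: {<JJ>*<NN.*>+}"   # zero or more adjectives followed by one or more nouns
chunker = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize(
    "Need a website with red color navigation panel and black background"))
tree = chunker.parse(tagged)

chunks = [" ".join(word for word, _ in subtree.leaves())
          for subtree in tree.subtrees() if subtree.label() == "THEME"]
print(chunks)   # prints noun phrases such as 'red color navigation panel'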
2.4 Image Processing and DSL Token
generation
After the features have been extracted from the image using the CNN, we use DSL tokens to describe the UI components. DSL token generation identifies the different graphical components and the relations between them. The DSL keeps the search space small because its vocabulary contains only a small number of tokens. Since the input is discrete, our language model performs modelling at the token level using one-hot encoded vectors, eliminating the need for word-embedding techniques such as word2vec.
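A small sketch of the token-level one-hot encoding; the DSL vocabulary shown is illustrative, since the paper does not list its token set:

# One-hot encoding of DSL tokens: the small DSL vocabulary makes word embeddings
# such as word2vec unnecessary. The token list here is illustrative.
import numpy as np

dsl_vocabulary = ["<START>", "<END>", "header", "row", "single", "double", "btn-active", "text"]
token_to_index = {token: i for i, token in enumerate(dsl_vocabulary)}

def one_hot(token):
    vector = np.zeros(len(dsl_vocabulary), dtype=np.float32)
    vector[token_to_index[token]] = 1.0
    return vector

print(one_hot("header"))   # [0. 0. 1. 0. 0. 0. 0. 0.]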
2.5 Decoder
The model is trained with supervised learning: it takes as inputs an image I and a contextual sequence X of T tokens xt, t ∈ {0, ..., T − 1}, and the token xT is taken as the target label. The input image I is encoded into a vector representation p by the CNN-based vision model. An LSTM model encodes each input token xt into an intermediate representation qt, which allows the model to concentrate more on certain types of tokens and focus less on others.
Each LSTM layer of this first language model consists of 128 cells, and the language model is made up of two such layers. A single feature vector rt is formed by concatenating p (the vision-encoded vector) and qt (the language-encoded vector); it is then given as input to a second, LSTM-based model that decodes the representations learned by the vision and language models. The decoder thereby learns to map the relations between the objects identified in the input GUI image and the tokens present in the DSL code. In our decoder each LSTM layer consists of 512 cells, and the decoder is implemented as a stack of two LSTM layers.
The architecture discussed above can be represented in mathematical form as:
p = CNN(I)
qt = LSTM(xt)
rt = (qt, p)
yt = softmax(LSTM′(rt))
xt+1 = yt
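These equations can be read as the following Keras sketch, using the layer sizes stated above (two 128-cell LSTM layers for the language model, two 512-cell layers for the decoder); the vocabulary size, context length, image size and vision-encoder layers are assumptions, and this is an illustration rather than the authors' released code:

# Encoder-decoder sketch matching the equations above:
#   p = CNN(I), qt = LSTM(xt), rt = (qt, p), yt = softmax(LSTM'(rt)), xt+1 = yt
from tensorflow.keras import layers, models

VOCAB_SIZE = 20        # DSL vocabulary size (illustrative)
CONTEXT_LENGTH = 48    # number of context tokens T (illustrative)

def build_model():
    # p = CNN(I): a small vision encoder; layer sizes are illustrative.
    image_input = layers.Input(shape=(256, 256, 3))
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(image_input)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    p = layers.Dense(1024, activation="relu")(x)

    # qt = LSTM(xt): language model, two 128-cell LSTM layers over one-hot tokens.
    token_input = layers.Input(shape=(CONTEXT_LENGTH, VOCAB_SIZE))
    q = layers.LSTM(128, return_sequences=True)(token_input)
    q = layers.LSTM(128, return_sequences=True)(q)

    # rt = (qt, p): repeat p for every time step and concatenate it with qt.
    p_seq = layers.RepeatVector(CONTEXT_LENGTH)(p)
    r = layers.concatenate([q, p_seq])

    # yt = softmax(LSTM'(rt)): decoder, two 512-cell LSTM layers and a softmax over the vocabulary.
    d = layers.LSTM(512, return_sequences=True)(r)
    d = layers.LSTM(512)(d)
    y = layers.Dense(VOCAB_SIZE, activation="softmax")(d)

    return models.Model(inputs=[image_input, token_input], outputs=y)

At inference time the predicted token yt is appended to the context and fed back as xt+1, matching the last equation, until an end token is produced.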
3. Result
This work will generate code for UI input provided by the
user. Table 1 shows input provided by user as an image
and code is the output generated by the system. Table 2
shows input provided by user as text and themes are
generated by the system as output.
Figure 4:Input 1
Figure 5: Output-1
Figure 6: Input-2
<body>
<div class="wrapper row2">
<!-- main content -->
<div id="homepage">
<!-- Services -->
<section id="services" class="clear">
<article class="one_third">
<figure><img src="images/demo/290x180.gif"
width="290" height="180" alt="">
<figcaption>
<h2>Indonectetus facilis</h2>
<p>Nullamlacus dui ipsum conseque loborttis
non euisque morbi penas dapibulum orna.</p>
<footer class="more"><a href="#">Read More
&raquo;</a></footer>
</figcaption>
</figure>
</article>
<article class="one_third">
<figure><img src="images/demo/290x180.gif"
width="290" height="180" alt="">
<figcaption>
<h2>Indonectetus facilis</h2>
<p>Nullamlacus dui ipsum conseque loborttis
non euisque morbi penas dapibulum orna.</p>
<footer class="more"><a href="#">Read More
&raquo;</a></footer>
</figcaption>
</figure>
</article>
</section>
</div>
</div>
</body>
</html>
Figure 7: Output-2
TABLE 1: Image Input and Output
S.no    INPUT       OUTPUT
1       Input-1     Output-1
2       Input-2     Output-2
TABLE 2: Text Input and Output
INPUT: I need a responsive website. I would like it designed and built. Social network platform for sport with added dare component.
OUTPUT:
4. Conclusion
This paper has discussed a theme-detection technique, the suggestion of themes to the user, and the generation of code from the selected template. A web developer can take more than 15 days just to build the basic structure of a website; our work addresses this by generating the complete code of the web page or website in far less time. The system can be used by anyone to build a website from nothing more than a text description or a screenshot of a website. The same approach could also be used to generate fully functional code for Android or iOS applications from their GUI screenshots. Our model is trained on a very small dataset, and increasing the size of the dataset can lead to more accurate results. Our work is concerned with the generation of static website code; it can be further extended to generating code for dynamic as well as static websites.
REFERENCES
[1] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S.
Venugopalan, K. Saenko, and T. Darrell. 2015. Long-term recurrent
convolutional networks for visual recognition and description. In
Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 2625–2634.
[2] F. A. Gers, J. Schmidhuber, and F. Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S.
Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. In
Advances in neural information processing systems, pages 2672–2680.
[4] A. Karpathy and L. Fei-Fei. 2015. Deep visual-semantic
alignments for generating image descriptions. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition,
pages 3128–3137.
[5] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013.
Distributed representations of words and phrases and their
compositionality. In Advances in neural information processing systems,
pages 3111–3119.
[6] W. Zaremba, I. Sutskever, and O. Vinyals. 2014. Recurrent neural
network regularization. arXiv preprint arXiv:1409.2329.
[7] Gurpreet Kaur, Prateek Agrawal. 2016. “Optimisation of Image Fusion
using Feature Matching Based on SIFT and RANSAC”, Indian Journal of
Science and Technology, 9(47), pp 1-7.
[8] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas. 2016. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. Imagenet
classification with deep convolutional neural networks. In Advances in
neural information processing systems, pages 1097–1105.
[10] D. Bahdanau, K. Cho, and Y. Bengio. 2014. Neural machine translation
by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[11] A. L. Gaunt, M. Brockschmidt, R. Singh, N. Kushman, P. Kohli, J. Taylor, and D. Tarlow. 2016. TerpreT: A probabilistic programming language for program induction. arXiv preprint arXiv:1608.04428.
[12] W. Ling, E. Grefenstette, K. M. Hermann, T. Kočiský, A. Senior, F. Wang, and P. Blunsom. 2016. Latent predictor networks for code generation. arXiv preprint arXiv:1603.06744.
[13] L. Yu, W. Zhang, J. Wang, and Y. Yu. 2016. SeqGAN: Sequence generative adversarial nets with policy gradient. arXiv preprint arXiv:1609.05473.
[14] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. 2016.
Generative adversarial text to image synthesis. In Proceedings of The
33rd International Conference on Machine Learning, volume 3.
[15] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML, volume 14, pages 77–81.