DataHour: Distinguishing Bot Text From Human Text Corpus

Online 10-12-2022 01:00 PM to 10-12-2022 02:00 PM

  • Knowledge and Learning


DataHour Recording

Find the resources used in the DataHour HERE.

About the DataHour:

In this DataHour, Sumeet will give a practical walkthrough of collecting a bilingual (English and Hindi) human text corpus and applying preprocessing techniques to clean it.

He will be covering the following topics in detail:

  • Generating word vectors using TF-IDF, and using n-gram language modeling techniques to obtain paragraph vectors of the text.
  • Generating bilingual (English and Hindi) bot text by passing a subsample of the human text corpus to an LSTM automatic text generation neural network (encoder-decoder architecture).
  • Generating vectors for the bot text using a preprocessing pipeline similar to the one used for the human text.
  • Clustering human and bot text using density-based clustering.
  • Computing heuristics for the generated clusters and comparing them with non-parametric hypothesis tests to check whether the difference between the two clusters is statistically significant.
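
To make the first and last topics more concrete, here is a minimal standard-library sketch, not the speaker's actual code. It builds TF-IDF weights over word unigrams and bigrams, then compares a per-document heuristic between two groups with a Mann-Whitney U test (one common non-parametric test). Everything in it is an assumption made for illustration: the function names, the smoothed IDF formula, the "mean TF-IDF weight" heuristic, and the toy sentences. The LSTM generation and density-based clustering steps are skipped, so the human/bot grouping is simply taken as given.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Word n-grams (1..n_max) from whitespace-tokenised text.
    lower() is a no-op for Devanagari, so Hindi text passes through unchanged."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n_max=2):
    """Return one {term: weight} dict per document, using a smoothed IDF (an assumed variant)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)  # documents containing each term
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({t: (f / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                        for t, f in c.items()})
    return vectors

def mean_weight(vector):
    """Hypothetical per-document heuristic: average TF-IDF weight."""
    return sum(vector.values()) / len(vector) if vector else 0.0

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.
    No tie or continuity correction is applied, so p-values are rough for small samples."""
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # tied values share the average rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Toy placeholder sentences, invented for this sketch (not data from the DataHour).
human_docs = ["the weather is nice today", "आज मौसम अच्छा है", "i like reading books"]
bot_docs = ["the the weather weather is", "आज आज मौसम मौसम", "books books i like like"]

vectors = tfidf_vectors(human_docs + bot_docs)
human_scores = [mean_weight(v) for v in vectors[:len(human_docs)]]
bot_scores = [mean_weight(v) for v in vectors[len(human_docs):]]
u_stat, p_value = mann_whitney_u(human_scores, bot_scores)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

With real corpora one would more likely use library implementations such as scikit-learn's TfidfVectorizer and DBSCAN and SciPy's mannwhitneyu. The hand-written versions above are only meant to show what each step computes.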

Prerequisites:

An interest in emerging and trending technologies, and a basic understanding of NLP, neural networks, Python, statistical hypothesis testing and clustering.

Who is this DataHour for?

  • Students & Freshers who want to build a career in the Data-tech domain.
  • Working professionals who want to transition to the Data-tech domain.
  • Data science professionals who want to accelerate their career growth.


Sumeet Lalla

Data Scientist at Cognizant

Sumeet Lalla completed his Master's in Data Science at the Higher School of Economics, Moscow, and his Bachelor of Engineering in Computer Engineering at Thapar University. With 5.5 years of experience in data science and software engineering, he currently works as a Data Scientist at Cognizant and has previously worked as a Software Developer at Siemens Technology and Services and as a Technology Analyst at Deloitte Consulting and Pvt Ltd.

Connect with Sumeet on LinkedIn

