
Project Narutopedia Webscraper

Hello Guys,

It's been a while since I have posted something, isn't it? 

So, what's up, you ask? Well, it happened on Monday, 22nd February, 2021. I was just passing my time, watching a YouTube video on a virtual assistant project. The video was quite impressive, to be honest. They even used a package to extract data from Wikipedia. After watching that video, I thought to myself: is there any such package for Narutopedia?


Now, those of you who know me personally know how big a Naruto and Boruto fan I am, so I started looking for one. I couldn't find any. I am not saying that there is no Python package for Narutopedia; there could be one, I just couldn't find it. So I thought, "Why not make a simple app that can extract data from Narutopedia for anything we search?" The only problem back then was that I had no idea how to do that. I started to research, and within a few minutes I realized that there is something called "Web Scraping".

I had heard a lot about how web scraping is something that every developer should try and add to their resume, but I had no idea what it was. I had a feeling that it could help me out in this project and guess what, I was right. So I started learning web scraping by applying the concept in a dummy app, and it took an hour or two at most to complete my first dummy project that used web scraping. Then I decided to start working on "Project Narutopedia Webscraper".

To be honest, it's a very basic app that extracts data related to whatever search key we enter. It took around 30-40 minutes at most to complete. It took that long because I had to do a little research on the structure of a Narutopedia page. The page didn't use any id, class, or even a parent tag for each section. Also, a lot of the paragraph tags were there just for blank lines.

First of all, I made the app extract all the paragraph tags. That worked pretty easily. For some search keys like "Himawari Uzumaki", the results were instant, however for search keys like "Naruto" or "Sasuke", the output had a lot of blank lines at the beginning. The app was made in a way that it would pause after every paragraph, so getting to the actual information for search keys like "Naruto" or "Sasuke" felt like a pain. The next task was to remove these blank paragraphs. The ones containing only " " were easy to find and remove, however there were some paragraphs containing only a line break too. It took some time to figure out, but checking for "\n" detected them, and that let me complete the app.
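The blank-paragraph filtering described above can be sketched roughly as follows. This is only an illustration and not the app's actual code: it uses Python's standard-library `html.parser` instead of whatever scraping package the app used, it works on a small hand-written HTML string rather than a real Narutopedia page, and the names `ParagraphCollector`, `extract_paragraphs`, and `SAMPLE_HTML` are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text found inside every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # raw text of each paragraph, blanks included
        self._in_p = False
        self._buffer = []

    def _flush(self):
        # Store the paragraph collected so far and reset the state.
        if self._in_p:
            self.paragraphs.append("".join(self._buffer))
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # a new <p> implicitly closes an open one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return paragraph texts, dropping paragraphs that are only whitespace."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    # strip() removes spaces, "\n", and non-breaking spaces, so a paragraph
    # that held only those becomes an empty string and is filtered out.
    return [text.strip() for text in parser.paragraphs if text.strip()]


# Hand-written sample input (an assumption, not real Narutopedia markup).
SAMPLE_HTML = """
<p>&nbsp;</p>
<p>
</p>
<p>First real paragraph.</p>
<p> </p>
<p>Second real paragraph.</p>
"""

if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

Using `strip()` handles the " " case and the "\n" case in one check, so the two kinds of blank paragraph don't need separate handling.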

Another thing that I wanted to try was "git". I already had a GitHub account, however I didn't know how to use git or how to push the app from git to GitHub. I had wanted to learn it for quite some time, so I decided to take this project as an opportunity to learn a bit of git. I learned git on Tuesday, but I was a bit busy that day, so I could not actually use it in this project. Then on Wednesday, I found some time and pushed the app to GitHub.


I actually recorded the complete process, from starting work on this project to pushing the app to GitHub, however it is taking some time to edit the video as my video editor keeps freezing.


Not to worry about that, though. It will take some time, but I am trying my best to make sure that the finished video gets to the Planet of Codes YouTube channel.
