Hello Guys,
It's been a while since I last posted something, hasn't it?
So, what's up, you ask? Well, it happened on Monday, 22nd February, 2021. I was just passing my time, watching a YouTube video on a virtual assistant project. The video was quite impressive, to be honest. They even used a package to extract data from Wikipedia. After watching that video, I thought to myself: is there any such package for Narutopedia?
Now, those of you who know me personally know how big a Naruto and Boruto fan I am, so I started to look for one. However, I couldn't find any. I am not saying that there is no Python package for Narutopedia. There could be one; it's just that I couldn't find it. So I thought, "Why not make a simple app that can extract data from Narutopedia for anything we search?" The only problem back then was that I had no idea how to do that. I then started to research, and it took only a few minutes until I realized that there is something called "Web Scraping".
I had heard a lot about how web scraping is something that every developer should try and add to their resume, but I had no idea what it was. I had a feeling that it was something that could help me out in this project and, guess what, I was right. So I started to learn web scraping by trying out the concept in a dummy app, and it took around an hour or two at most to complete my first dummy project that used web scraping. Then I decided to start working on "Project Narutopedia Webscraper".
To be honest, it's a very basic app that extracts data related to whatever search key we enter. It took around 30-40 minutes at most to complete the app. It took that long because I had to do a little research on the structure of a Narutopedia page. The page didn't use any id or class, or even a parent tag, for each section. Also, a lot of the paragraph tags were there just for blank lines.
First of all, I made the app extract all the paragraph tags. That worked pretty easily. For some search keys like "Himawari Uzumaki", the results were instant. However, for search keys like "Naruto" or "Sasuke", the output had a lot of blank lines at the beginning. The app was made in a way that it would pause after every paragraph, so getting to the actual information for search keys like "Naruto" or "Sasuke" felt like a pain. The next task was to remove these blank paragraphs. The " " ones were easy to find and remove, but there were some paragraphs that contained only blank lines too. Those took some time to figure out; in the end, checking for "\n" detected them, and that let me complete the app.
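For anyone curious, here is a minimal sketch of that blank-paragraph filtering. It is not the app's actual code: it assumes Python's built-in html.parser module instead of whatever scraping package the app used, and the names ParagraphCollector and extract_paragraphs, as well as the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text inside every <p> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def _flush(self):
        # Store the text gathered for the paragraph that just ended.
        self.paragraphs.append("".join(self._chunks))
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:
                # A new <p> implicitly closes an unclosed one.
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the paragraph.
        if self._in_p:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the non-blank paragraphs of an HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    # strip() removes plain spaces, "\n" and non-breaking spaces,
    # so paragraphs made only of those end up empty and are dropped.
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Made-up sample: three blank paragraphs and one with real text.
sample = (
    "<div><p>\n</p><p>&nbsp;</p>"
    "<p>Himawari <b>Uzumaki</b> sample paragraph.</p><p> </p></div>"
)
print(extract_paragraphs(sample))
```

Stripping each paragraph before checking it handles the " " case and the "\n" case in one go, so there is no need for two separate checks.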
Another thing that I wanted to try was "git". I already had an account on GitHub, but I didn't know how to use git or how to push the app from git to GitHub. I had wanted to learn it for quite some time, so I decided to take this project as an opportunity to learn a bit of git. I learned git on Tuesday, but I was a bit busy that day, so I could not actually use it in this project. Then on Wednesday, I found some time and pushed the app to GitHub.
I actually recorded the complete process, from starting work on this project to pushing the app to GitHub. However, it is taking some time to edit the video, as my video editor keeps freezing.
Not to worry about that. It will take some time, but I am trying my best to make sure that the finished video gets to the Planet of Codes YouTube channel.