AngularJS ⋅ PHP ⋅ Firebase
Completed - Ongoing
I wanted to learn more about AngularJS applications and web scraping, so I decided to combine the two and build a Chrome extension to display information from my medical school's newsfeed.
To use the Chrome extension, users would have to sign up via our website, which is stored in Firebase. An email verification system was put in place to ensure that only Sheffield medical students would have access to the newsfeed.
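As a rough illustration of the sign-up restriction, here is a minimal Python sketch of an email domain check that could run before a verification email is sent. The domain `sheffield.ac.uk` and the helper name `is_allowed_email` are my assumptions for illustration; the actual check used on the site is not described here, and a domain check on its own cannot tell medical students apart from other university accounts.

```python
# Assumption: the allowed domain is illustrative, not taken from the real site.
ALLOWED_DOMAIN = "sheffield.ac.uk"


def is_allowed_email(address):
    """Return True if the address has a local part and the exact allowed domain."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.strip().rpartition("@")
    return (
        bool(sep)                 # there must be an "@" at all
        and bool(local)           # the local part must not be empty
        and "@" not in local      # reject addresses with more than one "@"
        and domain.lower() == ALLOWED_DOMAIN  # exact match, not a suffix match
    )
```

Matching the whole domain rather than just its ending avoids accepting look-alike domains such as `evil-sheffield.ac.uk`. The verification email would still be needed to prove that the person signing up actually owns the address.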
I used a cron job on DigitalOcean to automate logging in (via cURL) and retrieving information from Minerva, the online portal set up by the University of Sheffield Medical School.
Extracting information from the newsfeed was particularly tricky, as Minerva lacked a consistent approach to displaying news items. Here are a few of the underlying problems that I discovered:
The date that an article was published was not displayed on the same page as the article itself.
Different pages on Minerva used different cookies, so logins would often have to be repeated.
There were random features such as upvoting and downvoting posts that were in the code, but not actually shown to users.
The articles switched between <br/>, <p></p>, and <td></td> tags for creating new lines.
Images and file icons on Minerva were often linked incorrectly, returning errors even when shown on the original site. As a result, I spent a while trying to track down the source of these errors, thinking they came from my end. 🙄
Nested tables were often used to display information.
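To give a feel for the line-break problem in the list above, here is a minimal Python sketch that treats `<br/>`, `<p>`, and `<td>` as equivalent line separators and returns clean lines of text. It uses only the standard library's `html.parser`. The class and function names are hypothetical, and this is not the scraper's actual code, which ran on a PHP backend.

```python
from html.parser import HTMLParser


class LineBreakNormaliser(HTMLParser):
    """Collect text, inserting a newline wherever a break-like tag appears."""

    BREAK_TAGS = {"br", "p", "td"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Opening <p>/<td> and any <br> start a new line.
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # Closing </p>/</td> also end the current line.
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_lines(html):
    """Return the non-empty, stripped text lines found in an HTML fragment."""
    parser = LineBreakNormaliser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Duplicate newlines from adjacent tags collapse here, as blank lines are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Because every cell boundary becomes a newline, nested tables flatten into the same list of lines as paragraphs and `<br/>` runs, at the cost of losing the table's row and column structure.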
Once the information is extracted, it can be accessed using an API key that is sent to users when they sign up. This key also allows them to use a private, invite-only API I built for extracting data from Minerva.
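For readers curious about how per-user API keys of this kind can work, here is a minimal Python sketch of one common approach: generate a random key at sign-up, store only its hash, and compare hashes in constant time on each request. How the real keys were generated and stored is not covered in this write-up, so the function names and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets


def issue_api_key():
    """Create a new key; return (key_to_send_to_user, digest_to_store)."""
    key = secrets.token_urlsafe(32)  # random, URL-safe string
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, digest


def is_valid_key(presented_key, stored_digest):
    """Check a presented key against the stored digest in constant time."""
    presented_digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_digest, stored_digest)
```

Storing the digest rather than the key itself means a leaked database does not directly expose working keys.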
This project is still ongoing. It has been distributed to over 50 users and was accessed over 40,000 times between April and September 2016.