Python Examples
Web scraping with Python
What is web scraping, you ask? Simply put, web scraping is an efficient way to extract data from websites. The real beauty of web scraping, however, is that you can use or manipulate the extracted data in whatever way is useful to you. It makes sense, then, that the technique has found a lot of recent success in monitoring applications such as price comparison, weather data monitoring, and stock market analysis; that is, any application that benefits from a constantly updated dataset.
Requirements:
- Python v3.6.x: your programming language
- BeautifulSoup: HTML parsing module for Python (the web scraping workhorse)
- Requests: Python module for sending HTTP requests to ('pinging') websites
- Pandas: dataframe/table module for Python
- NumPy: array and numerical computing module, basically Python's answer to MATLAB
- Matplotlib: so you can make those pretty graphs
- Jupyter Notebook: technically not required, but highly useful
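As a rough illustration of how the first few modules fit together, here is a minimal sketch (not the code from the notebook). The table id `scores`, the column names, the team names, and the helper names `fetch_html` and `parse_scores` are all hypothetical, and the sample HTML is invented so the example runs without touching a real website.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Invented markup standing in for a downloaded page (not real scores).
SAMPLE_HTML = """
<table id="scores">
  <tr><th>Away</th><th>Home</th><th>Away Runs</th><th>Home Runs</th></tr>
  <tr><td>Team A</td><td>Team B</td><td>5</td><td>3</td></tr>
  <tr><td>Team C</td><td>Team D</td><td>2</td><td>4</td></tr>
</table>
"""


def fetch_html(url, timeout=10):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_scores(html):
    """Parse the hypothetical 'scores' table into a pandas DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="scores")
    rows = table.find_all("tr")

    # The first row holds the column names, the rest hold the data.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]

    df = pd.DataFrame(records, columns=header)
    # Scraped values arrive as strings, so convert the run columns to integers.
    run_columns = ["Away Runs", "Home Runs"]
    df[run_columns] = df[run_columns].astype(int)
    return df


# Parse the sample markup; for a live page you would pass fetch_html(url) instead.
scores = parse_scores(SAMPLE_HTML)
print(scores)
```

Keeping the download and the parsing in separate functions means the parsing logic can be tried out on a saved or hand-written page before pointing it at a real site.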
Mindset
Full disclosure: I love Major League Baseball. Therefore, the following web scraping example is going to revolve around one of my favorite morning activities: checking the box scores! There is a real satisfaction in writing a piece of code that lets you fire up your laptop, grab a cup of coffee, type two words into your terminal window, and be greeted by a visual representation of what happened around the league the day before. Bliss.
Below is a snapshot of what my web scraping code produces after being run. The runtime varies, but is typically under 2 seconds. You can find a notebook that contains a detailed account of my code on my GitHub account: Webscraping Notebook. I hope there are enough details there to lead you along; however, please do not hesitate to email me (jolyphant@email.arizona.edu) with questions!
