Web (frameworks, scraping, automation, and more)
Table of contents
- The Flask framework
- The celery module
- The webbrowser module
- The requests module
- The BeautifulSoup module
- The googlesearch module
- The scrapy module
- The selenium module
- The socket module
The Flask framework
Flask is a lightweight Python web framework used for creating websites, APIs, dashboards, and web applications. It provides routing, request handling, template rendering, and many extensions for authentication, databases, and forms.
A Flask application consists of routes. A route defines what happens when a user visits a given URL.
# Creating a basic Flask application
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello World!"

app.run()  # starting the web server
The render_template() function allows us to display HTML templates stored in the templates directory. Templates are regular HTML files that define the structure and appearance of web pages.
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def home():
    return render_template("index.html")

app.run()
Flask can capture parts of the URL as parameters and pass them to the view function as arguments.
from flask import Flask

app = Flask(__name__)

@app.route("/user/<name>")
def user(name):
    return f"Hello {name}!"

app.run()
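Route parameters can also carry a type. The following is a minimal sketch, assuming a hypothetical /post/<int:post_id> route that is not part of the examples above. It uses Flask's built-in test client so the route can be exercised without starting a server.

```python
from flask import Flask

app = Flask(__name__)


# Hypothetical route: the int converter only matches URLs made of digits
@app.route("/post/<int:post_id>")
def show_post(post_id):
    # post_id arrives as an int, not a str
    return f"Post number {post_id}"


# The test client sends requests to the app in-process, no server needed
client = app.test_client()
ok = client.get("/post/42")
missing = client.get("/post/abc")  # does not match the int converter

print(ok.status_code, ok.get_data(as_text=True))
print(missing.status_code)
```

A URL that does not match the converter is treated like any other unknown URL, so the second request gets a 404 response instead of reaching the view function.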
Flask also supports handling form data sent using POST requests.
from flask import Flask, request

app = Flask(__name__)

@app.route("/login", methods=["POST"])
def login():
    username = request.form["username"]  # getting form data
    password = request.form["password"]
    return f"Logged in as {username}"

app.run()
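Indexing request.form with a missing key makes Flask answer with a generic 400 error. One possible variation, sketched here under the assumption that a custom error message is wanted, reads the fields with .get() and checks them explicitly. The test client and the sample credentials are illustrative only.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/login", methods=["POST"])
def login():
    # .get() returns None instead of raising when a field is absent
    username = request.form.get("username")
    password = request.form.get("password")
    if not username or not password:
        return "Missing username or password", 400
    # A real application would verify the credentials here
    return f"Logged in as {username}"


# Exercise the route in-process with made-up form data
client = app.test_client()
good = client.post("/login", data={"username": "tom", "password": "secret"})
bad = client.post("/login", data={"username": "tom"})

print(good.status_code, good.get_data(as_text=True))
print(bad.status_code, bad.get_data(as_text=True))
```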
The celery module
The celery module is used for running background tasks asynchronously. It works with a message broker (such as Redis) to queue tasks and send them to worker processes that execute them outside the main program. It is commonly used to send emails, process files, handle queues, and run scheduled tasks without blocking the application.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y

result = add.delay(5, 3)  # running the task asynchronously (arguments passed to the task function)
The webbrowser module
The webbrowser module allows us to open websites directly in the default web browser, which is useful for quickly launching web pages, automating simple navigation tasks, and integrating Python scripts with online resources.
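import webbrowser
from urllib.parse import quote_plus

webbrowser.open("https://google.com")  # opening a website

query = "Python tutorials"
# quote_plus escapes the space so the query forms a valid URL
webbrowser.open(f"https://google.com/search?q={quote_plus(query)}")  # performing a Google search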
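When a search URL needs several parameters or contains special characters, the query string can be assembled with urllib.parse.urlencode. The sketch below is one way to do it; the helper names build_search_url and open_search are hypothetical, and the hl (interface language) parameter is included only as an illustration.

```python
import webbrowser
from urllib.parse import urlencode


def build_search_url(query, language="en"):
    # urlencode escapes spaces and special characters in each value
    params = {"q": query, "hl": language}
    return "https://google.com/search?" + urlencode(params)


def open_search(query):
    # open_new_tab asks the default browser for a new tab;
    # it returns True if a browser could be launched
    return webbrowser.open_new_tab(build_search_url(query))


url = build_search_url("Python tutorials & tips")
print(url)
```

Calling open_search("Python tutorials & tips") would then open the encoded URL in the default browser.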
The requests module
The requests module is used for sending HTTP requests. It allows us to communicate with websites and APIs.
import requests
# Sending a GET request
response = requests.get("https://api.github.com")
print(response.status_code)
print(response.text)
# Sending a POST request
data = {"name": "Tom", "age": 25}
response = requests.post("https://httpbin.org/post", json=data)
print(response.json())
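Requests to real services can fail or hang, so it often helps to set a timeout and check the status code. The sketch below shows one possible pattern. The fetch_json helper is hypothetical, and the GitHub search URL is used only to show how query parameters are encoded; the prepared request is built without being sent.

```python
import requests

# Build the request without sending it, to inspect the final URL
request = requests.Request(
    "GET",
    "https://api.github.com/search/repositories",
    params={"q": "flask", "per_page": 5},
)
prepared = request.prepare()
print(prepared.url)


def fetch_json(url, **params):
    """Hypothetical helper: GET a URL and return parsed JSON, or None on failure."""
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        return response.json()
    except (requests.RequestException, ValueError) as error:
        # ValueError covers a body that is not valid JSON
        print(f"Request failed: {error}")
        return None
```

A call such as fetch_json("https://api.github.com/search/repositories", q="flask") would return the decoded JSON body, or None if the request fails.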
The BeautifulSoup module
The BeautifulSoup module (installed as the beautifulsoup4 package and imported from bs4) is used for parsing HTML and XML documents. It is commonly used in web scraping.
Each example below assumes the statement from bs4 import BeautifulSoup at the top of the file.
# Parsing HTML content
html = "<h1>Hello</h1>"
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.text)
# Extracting links from HTML
html = "<a href='https://google.com'>Google</a>"
soup = BeautifulSoup(html, "html.parser")
link = soup.a["href"] # getting the href attribute
print(link)
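Real pages usually contain many links, not one. As a minimal sketch, assuming a small made-up HTML snippet, find_all can collect every anchor that has an href attribute:

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
html = """
<ul>
  <li><a href="https://python.org">Python</a></li>
  <li><a href="https://flask.palletsprojects.com">Flask</a></li>
  <li><a>No link here</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute
links = [(a.get_text(), a["href"]) for a in soup.find_all("a", href=True)]
print(links)
```

Filtering with href=True avoids a KeyError on anchors that have no href attribute.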
The googlesearch module
The googlesearch module allows us to perform Google searches and extract search results.
from googlesearch import search

for result in search("Python programming", num_results=5):
    print(result)
The scrapy module
The scrapy module is a powerful web scraping framework used for extracting data from websites.
# Creating a Scrapy spider that extracts the page title from a website
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {
            "title": response.css("title::text").get()
        }
The selenium module
The selenium module is used for browser automation and testing web applications. It allows us to open websites, click buttons, fill forms, and interact with webpage elements automatically. Selenium requires a browser driver such as ChromeDriver for Google Chrome; Selenium 4.6 and later can download a matching driver automatically through Selenium Manager.
from selenium import webdriver
driver = webdriver.Chrome() # opening the Chrome browser
driver.get("https://google.com") # opening a webpage
print(driver.title) # displaying the page title
driver.quit() # closing the browser
# Filling a text field and clicking a button
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://google.com")
search = driver.find_element(By.NAME, "q") # finding the search field
search.send_keys("Python Selenium") # typing text into the field
button = driver.find_element(By.NAME, "btnK") # finding the search button
button.click() # clicking the button
driver.quit()
The socket module
The socket module allows us to create network connections and communicate over TCP or UDP protocols.
import socket
# Creating a client socket
client = socket.socket()
client.connect(("google.com", 80)) # connecting to a server
client.send(b"Hello") # sending bytes
client.close() # closing the connection
# Receiving data from a server
client = socket.socket()
client.connect(("example.com", 80))
client.send(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n") # sending an HTTP request
data = client.recv(1024) # receiving up to 1024 bytes
print(data.decode()) # converting bytes to text
client.close()
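A single recv(1024) call returns at most 1024 bytes, so a longer response arrives in pieces. The sketch below shows one way to read until the peer closes the connection. To stay self-contained it talks to a small stand-in server on localhost instead of a real website; the helper names recv_all and serve_once are hypothetical.

```python
import socket
import threading


def recv_all(sock, chunk_size=1024):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty bytes means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(server, payload):
    """Accept one client, send payload, then close (stand-in for a real server)."""
    conn, _addr = server.accept()
    with conn:
        conn.sendall(payload)


# Local stand-in server on a free port chosen by the operating system
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

payload = b"x" * 5000  # larger than one 1024-byte read
thread = threading.Thread(target=serve_once, args=(server, payload))
thread.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    data = recv_all(client)

thread.join()
server.close()
print(len(data))
```

The loop relies on the server closing the connection when it is done. With HTTP/1.1 keep-alive connections, a client would instead need to use the Content-Length header or send Connection: close.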
The socket type selects the protocol. TCP uses SOCK_STREAM and provides a reliable, connection-oriented byte stream: socket.socket(socket.AF_INET, socket.SOCK_STREAM). UDP uses SOCK_DGRAM and provides fast, connectionless datagrams with no delivery guarantee: socket.socket(socket.AF_INET, socket.SOCK_DGRAM). AF_INET selects the IPv4 address family used for internet communication. Calling socket.socket() with no arguments, as in the examples above, creates an IPv4 TCP socket.
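To make the UDP case concrete, here is a minimal sketch that sends one datagram between two sockets on localhost. The port is chosen by the operating system and the message content is arbitrary.

```python
import socket

# UDP receiver bound to a free port on localhost
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)  # avoid blocking forever if the datagram is lost
address = receiver.getsockname()

# UDP sender: no connect() needed, each datagram carries its destination
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", address)

# recvfrom returns the data together with the sender's address
message, source = receiver.recvfrom(1024)
print(message)

sender.close()
receiver.close()
```

Unlike the TCP examples, there is no connect() or accept() step: the sender simply addresses each datagram, and the receiver learns the source address from recvfrom.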