Crawling Web Page Content with GET and POST Requests – Homework

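Assignment requirements:
1. Write a custom servlet, MyServlet, that handles GET and POST requests separately.
2. Create two crawler scripts that request MyServlet with GET and with POST, respectively.
So the servlet is built in IntelliJ IDEA (Java) and the crawlers are written in Python. In the code below the servlet class is named TestServlet.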
Admin

Servlet

package Servlet;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;

// Mapped to /TestServlet; with the app deployed at the root context the URL is
// http://localhost:8080/TestServlet
@WebServlet("/TestServlet")
public class TestServlet extends HttpServlet {
    // The container calls doPost for POST requests
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        request.setCharacterEncoding("utf-8");               // decode request parameters as UTF-8
        response.setContentType("text/html;charset=utf-8");  // must be set before getWriter()
        PrintWriter out = response.getWriter();
        out.write("使用Post方式才能请求到我!");                 // "Only a POST request can reach me!"
    }

    // The container calls doGet for GET requests
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        request.setCharacterEncoding("utf-8");
        response.setContentType("text/html;charset=utf-8");
        PrintWriter out = response.getWriter();
        out.write("使用Get方式才能请求到我!");                  // "Only a GET request can reach me!"
    }
}
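If Tomcat is not at hand, the crawlers can still be tried against a small local stand-in. The sketch below is a hypothetical Python equivalent of TestServlet built on the standard library's http.server, not the author's setup. BaseHTTPRequestHandler dispatches on the HTTP method to do_GET or do_POST, much as HttpServlet dispatches to doGet or doPost. The names TestHandler and start_server are assumptions for illustration, and unlike the servlet this handler answers on every path, not only /TestServlet.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

GET_REPLY = "使用Get方式才能请求到我!"    # "Only a GET request can reach me!"
POST_REPLY = "使用Post方式才能请求到我!"  # "Only a POST request can reach me!"


class TestHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for TestServlet; it answers on every path."""

    def _reply(self, text: str) -> None:
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html;charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:   # counterpart of doGet
        self._reply(GET_REPLY)

    def do_POST(self) -> None:  # counterpart of doPost
        # Read (and here discard) the form body the client sent.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._reply(POST_REPLY)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet


def start_server(port: int = 0) -> HTTPServer:
    """Serve in a background daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("localhost", port), TestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To serve on the port the crawlers expect, run HTTPServer(("localhost", 8080), TestHandler).serve_forever() in a script of its own. Get.py and Post.py should then print the two different messages.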

Get.py

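import urllib.request
# Import the package

# Request headers: identify the crawler as a browser
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50'}
# Wrap the URL in a Request so the headers are actually sent; urlopen on a bare URL string ignores them.
# No data argument (data=None) means no request body, so urllib sends a GET request (read only, nothing submitted).
request = urllib.request.Request("http://localhost:8080/TestServlet", headers=header)
response = urllib.request.urlopen(request)
html = response.read().decode("utf-8")
# Read the response body, decode it as UTF-8, and store it in html
print(html)
# Print html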
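TestServlet ignores query parameters, but a GET request can still carry data: the parameters go into the URL query string rather than into a body. Here is a minimal sketch, assuming the same URL as Get.py. The helper name build_get_request and the shortened User-Agent are hypothetical. It only builds the request, so it runs without the server. Pass the result to urllib.request.urlopen to send it. On the Java side, request.getParameter("hello") would read such a value.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/TestServlet"  # URL used in Get.py
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # shortened, hypothetical


def build_get_request(base_url: str, params: dict) -> urllib.request.Request:
    """Build a GET request whose parameters travel in the URL query string."""
    query = urllib.parse.urlencode(params)  # percent-encodes values as UTF-8
    url = f"{base_url}?{query}" if query else base_url
    # No data argument, so urllib treats this as a GET request.
    return urllib.request.Request(url, headers=HEADERS)


req = build_get_request(BASE_URL, {"hello": "你好!"})
print(req.get_method())  # → GET
print(req.full_url)      # → http://localhost:8080/TestServlet?hello=%E4%BD%A0%E5%A5%BD%21
```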

Post.py

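import urllib.request
import urllib.parse
# Import the packages

# 1. URL-encode the form data ("hello" is the key, "你好!" is the value) and encode the result as UTF-8 bytes;
#    urlopen requires the POST body to be bytes, and encode() already returns bytes.
# 2. Store the encoded body in Postdata.
Postdata = urllib.parse.urlencode({"hello": "你好!"}).encode("utf-8")
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50'}
# Open the URL with Postdata (defined above) as the request body, sending the headers too, and store the reply in response.
# Because data is not None, urllib sends a POST request; without data it would send GET.
# In plain terms: a GET request only retrieves a page (any parameters go in the URL query string),
# while a POST request submits data to the server in the request body, as a login form does.
request = urllib.request.Request("http://localhost:8080/TestServlet", data=Postdata, headers=header)
response = urllib.request.urlopen(request)
html = response.read().decode("utf-8")
# Read the response and decode it as UTF-8
print(html)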
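Post.py relies on one urllib rule: a Request whose data is not None is sent as POST, otherwise as GET. The sketch below makes that rule visible and shows the round trip of the form body. urlencode percent-encodes "你好!" as UTF-8, and parse_qs decodes it back, standing in for the decoding a servlet container performs before request.getParameter returns. The helper names build_post_request and decode_form_body are hypothetical, and nothing is sent over the network.

```python
import urllib.parse
import urllib.request

URL = "http://localhost:8080/TestServlet"  # URL used in Post.py


def build_post_request(url: str, form: dict) -> urllib.request.Request:
    """Build a form POST: parameters go in the body, not in the URL."""
    body = urllib.parse.urlencode(form).encode("utf-8")  # already bytes
    # urlopen would add this Content-Type by default; it is set here to make it explicit.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def decode_form_body(body: bytes) -> dict:
    """Percent-decode a form body back into key/value pairs (first value per key)."""
    return {k: v[0] for k, v in urllib.parse.parse_qs(body.decode("ascii")).items()}


req = build_post_request(URL, {"hello": "你好!"})
print(req.get_method())            # → POST
print(req.data)                    # → b'hello=%E4%BD%A0%E5%A5%BD%21'
print(decode_form_body(req.data))  # → {'hello': '你好!'}
```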
