Saturday, 19 August 2023

How to bypass the agreement checkbox with Python Requests?

I'm trying to get past the agreement section before scraping this page. My idea was to inject the PHP session cookie (PHPSESSID) into the request to open the page, but I later realized that I have to tick the agreement checkbox manually in a browser before running the code in order to reach the expected page.

This script is what I have so far:

import re
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Load the landing page first so the server issues a PHPSESSID cookie
session.get("https://aviation.bmkg.go.id/web/")

# Flatten the cookie dict into a "name=value" string for the Cookie header
c = str(session.cookies.get_dict())
c = re.sub("{|}|'", "", c).replace(": ", "=")

newHeaders = {
    "Content-Type": "application/json",
    # The header name is "User-Agent" (hyphenated), not "User Agent"
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    "Cookie": c
}

# Request the target page with the captured session cookie
page = session.post("https://aviation.bmkg.go.id/web/metar_speci.php?", headers=newHeaders)

print("Status code:", page.status_code)

print(page.content)

After inspecting the page elements, I suspected that I need to do a session.get on another page first, but I can't get past the 400 status code with this route either. Thanks in advance.
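
One possible direction, sketched here under assumptions that have not been checked against the site: if the agreement is an ordinary HTML form on the landing page, the same session could load that page, tick the checkbox by including it in the form data, and submit the form before requesting metar_speci.php. The names FormFieldParser and accept_agreement are hypothetical helpers, and the form's real action and field names are not known from the post, which is why the sketch reads them from the HTML instead of hard-coding them. Named submit buttons are not sent, which some servers may require.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldParser(HTMLParser):
    """Collect the action, method, and input fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self.in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind == "checkbox":
                # Simulate a ticked box: browsers send its value, or "on" if none is set
                self.fields[attrs["name"]] = attrs.get("value") or "on"
            elif kind not in ("submit", "button", "reset", "image", "file"):
                # Hidden and text inputs are sent with their default values
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True


def accept_agreement(session, page_url):
    """Load page_url, tick every checkbox in its first form, and submit that form.

    `session` is expected to behave like requests.Session, so the cookies set
    by the server (such as PHPSESSID) are carried over to later requests.
    """
    response = session.get(page_url)
    parser = FormFieldParser()
    parser.feed(response.text)
    target = urljoin(page_url, parser.action)
    if parser.method == "post":
        return session.post(target, data=parser.fields)
    return session.get(target, params=parser.fields)
```

With a requests.Session() object, calling accept_agreement(session, "https://aviation.bmkg.go.id/web/") and then session.get("https://aviation.bmkg.go.id/web/metar_speci.php") would reuse the stored cookies automatically, so no manual Cookie header is needed. If the checkbox is handled by JavaScript instead of a plain form, this will not work as written, and the actual request would have to be copied from the browser's network tab.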



