I'm trying to get past the agreement section before scraping this page. My idea was to inject the PHP session (PHPSESSID) to open the page, but I later realized that I have to tick the agreement checkbox manually before running the code in order to get access to the expected page.
The script below is what I have so far:
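import requests
from bs4 import BeautifulSoup  # for the parsing step later; not used yet

session = requests.Session()

# Capture PHPSESSID: the first GET stores the cookie in session.cookies
session.get("https://aviation.bmkg.go.id/web/")

# Serialize the captured cookies as a Cookie header value ("name=value; name2=value2")
c = "; ".join(f"{name}={value}" for name, value in session.cookies.get_dict().items())

newHeaders = {
    "Content-Type": "application/json",
    # The header name is "User-Agent" (hyphenated), not "User Agent"
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    "Cookie": c,
}

# Request the target page with the captured session cookie
page = session.post("https://aviation.bmkg.go.id/web/metar_speci.php?", headers=newHeaders)
print("Status code: ", page.status_code)
print(page.content)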
After inspecting the page elements, I suspected that I might need to do a session.get on another page first, but I can't seem to get past the 400 status code with this route. TIA
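For reference, here is a minimal sketch of the cookie-capture idea that lets `requests.Session` carry PHPSESSID by itself instead of building the `Cookie` header by hand. The helper names (`make_session`, `preview_request`, `fetch_target`) are hypothetical, and using a plain GET for the target page is an assumption. It does not tick the agreement checkbox, and whether the site needs an extra form submission for that is not known from this post.

```python
import requests

BASE_URL = "https://aviation.bmkg.go.id/web/"
TARGET_URL = BASE_URL + "metar_speci.php"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
)


def make_session() -> requests.Session:
    """Return a session that sends a browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    return session


def preview_request(session: requests.Session, method: str, url: str) -> requests.PreparedRequest:
    """Build the request without sending it, so its headers can be inspected."""
    return session.prepare_request(requests.Request(method, url))


def fetch_target(session: requests.Session) -> requests.Response:
    """Hypothetical flow (assumption): visit the landing page, then GET the target page."""
    # Any Set-Cookie from the landing page (e.g. PHPSESSID) is stored in session.cookies.
    session.get(BASE_URL, timeout=30)
    # The stored cookies are attached automatically; no manual Cookie header is needed.
    return session.get(TARGET_URL, timeout=30)
```

Calling `fetch_target(make_session())` performs the two requests. `preview_request` can be used beforehand to check which headers would actually go out, for example that the `Cookie` and `User-Agent` headers look as expected, without touching the network.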