[Tut] How to Get an HTML Page from a URL in Python?

This tutorial shows you how to perform a simple HTTP GET request to retrieve the HTML page at a given URL in Python!

Problem Formulation


Given a URL as a string, how do you fetch the HTML at that URL and store the result in a Python string variable?

Example: Say, you want to accomplish the following:

url = 'https://google.com'
# ... Code to extract HTML page here ...
print(result)
# ... Google HTML file:
'''
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>... '''

Let’s study the four most important methods to access a website in your Python script!

Method 1: requests.get(url)


The simplest solution is the following:

import requests
print(requests.get(url = 'https://google.com').text)

Here’s how this code works:

  • Import the requests library, which handles the details of sending the HTTP request and returns the server's response in an easy-to-process format.
  • Call the requests.get(...) function and pass the URL 'https://google.com' as an argument so that the function knows which location to access.
  • Read the body of the response through its .text attribute (the return value is a Response object that also contains useful meta information such as the status code and the content type).
  • Print the result to the shell.
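
The steps above can also be wrapped in a small reusable function. This is only a minimal sketch, assuming the requests library is installed; the name fetch_html and the 10-second default timeout are illustrative choices and do not come from the original tutorial. Compared with the bare call, it adds a timeout and raises an exception on HTTP error codes rather than returning the text of an error page.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as a str (hypothetical helper, not from the tutorial)."""
    # Without a timeout, requests.get() can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    # .text is the body decoded to str; .content would give the raw bytes.
    return response.text


def describe_response(url, timeout=10):
    """Return some of the meta information carried by the Response object."""
    response = requests.get(url, timeout=timeout)
    return {
        "status_code": response.status_code,                   # e.g. 200
        "content_type": response.headers.get("Content-Type"),  # e.g. 'text/html; charset=UTF-8'
        "encoding": response.encoding,                         # encoding used for .text
    }
```

Calling print(fetch_html('https://google.com')) should then print the same HTML as the two-line version.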

The output is the desired Google website:

'''
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>... '''

Note that you may have to install the requests library with the following command in your operating system terminal:

$ pip install requests

Method 2: One-Liner with requests.get()


Sometimes you don’t want to open an interactive Python session to access the URL. No problem, you can make the previous solution a one-liner and run it from your operating system command line or terminal.

The semicolon separates the two statements so that the previously discussed method fits on a single line. Pass that line to the Python interpreter with the -c flag, which executes the string that follows it as Python code:

python -c "import requests; print(requests.get(url = 'https://google.com').text)"

The output, again, is the desired Google HTML page:

'''
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>... '''

Method 3: urllib.request


A recommended way to fetch web resources without installing anything is the urlopen() function from the urllib.request module in Python's standard library. The following code accesses the Google website in Python 3 as before:

import urllib.request as r
page = r.urlopen('https://google.com')
print(page.read())

Here, urlopen() returns a response object (an http.client.HTTPResponse for HTTP and HTTPS URLs) whose read() method returns the body of the server’s response.
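
Besides the body, the object returned by urlopen() also exposes the status code and the response headers. The following is a minimal sketch of how they could be read; the helper name describe_page and the timeout value are assumptions made for illustration.

```python
import urllib.request


def describe_page(url, timeout=10):
    """Return (status, content_type, body_bytes) for `url` (hypothetical helper)."""
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url, timeout=timeout) as page:
        status = page.status                            # HTTP status code, e.g. 200
        content_type = page.headers.get_content_type()  # e.g. 'text/html'
        body = page.read()                              # raw bytes, not str
    return status, content_type, body
```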

Note that this reads the file as a byte string. If you want to read the HTML file as a string, you need to convert the result using Python’s decode() method:

import urllib.request as r
page = r.urlopen('https://google.com')
print(page.read().decode('utf8'))
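
Hard-coding 'utf8' works for pages that are actually UTF-8 encoded, but other pages may declare a different charset in their Content-Type header. The sketch below reads the declared charset and falls back to UTF-8 when none is given; the helper name fetch_text and the choice to replace undecodable bytes instead of raising an error are assumptions, not part of the original tutorial.

```python
import urllib.request


def fetch_text(url, timeout=10, fallback="utf-8"):
    """Return the page at `url` as a str, decoded with the server-declared charset."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        # get_content_charset() returns None if the header names no charset.
        charset = page.headers.get_content_charset() or fallback
        # errors="replace" substitutes undecodable bytes instead of raising.
        return page.read().decode(charset, errors="replace")
```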

Here’s the output of this code snippet with most of the HTML content omitted for brevity.

<!doctype html>...</html> 

Method 4: One-Liner with urllib.request


You can also cram everything into a single line so that you can run it from your OS’s terminal:

python -c "import urllib.request as r; print(r.urlopen('https://google.com').read())"

The post How to Get an HTML Page from a URL in Python? first appeared on Finxter.



https://www.sickgaming.net/blog/2021/03/...in-python/