How To Web Scrape Dynamic Website And JSON Decode With Detailed Examples

This article uses Flutter / Dart to demonstrate how to scrape a dynamic website and then JSON-decode the result into a Map or List. The general idea also applies to Java and Python.


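When we talk about web scraping, Beautiful Soup comes to mind immediately. Unfortunately, many websites nowadays are dynamic: they load their data with JavaScript after the page arrives. Beautiful Soup only parses the HTML it is given, so on its own it only works for static content.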

Today I'll explain how to scrape a dynamic website. I'll use TheWeatherNetwork.com as an example. The Weather Network is my favorite weather forecast website.

1. Determine whether the website is dynamic

Take a look at the screen capture of The Weather Network website showing the current weather for Vancouver, BC, Canada.

[Screenshot: The Weather Network page for Vancouver, with three values circled in red]

This demo will grab the 3 pieces of information in the red circles: the current condition description, the current temperature and the current feels-like temperature.

Beautiful Soup can only analyze the page source content. Let's see if we can find the current condition "A few clouds" in the page source.

Here we use Google Chrome. Firefox works similarly.

Right-click any blank area of the page and select "View page source".

A new tab opens. A quick Ctrl+F search for "A few clouds" returns 0/0 results.

This means TheWeatherNetwork is a dynamic website. "A few clouds" is not hard-coded in the page's HTML, so it must be loaded from somewhere else after the page arrives. We are going to find that "somewhere else".
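
The same check can be expressed in code. The sketch below assumes the raw HTML has already been downloaded into a string. The helper name appearsInSource and both sample pages are made up for illustration; they are not taken from TheWeatherNetwork.

```dart
/// Returns true when [needle] occurs in the raw HTML the server sent.
/// Hypothetical helper, not part of any library.
bool appearsInSource(String rawHtml, String needle) {
  // Compare case-insensitively, like a Ctrl+F search in the browser.
  return rawHtml.toLowerCase().contains(needle.toLowerCase());
}

void main() {
  // Made-up stand-ins: one page with the text in its HTML, and one page
  // that only ships an empty container plus a script to fill it later.
  const staticPage = '<html><body><p>A few clouds</p></body></html>';
  const dynamicPage =
      '<html><body><div id="app"></div><script src="app.js"></script></body></html>';

  print(appearsInSource(staticPage, 'A few clouds')); // true
  print(appearsInSource(dynamicPage, 'A few clouds')); // false
}
```

A false result for text that is visible in the browser is a strong hint that the data arrives through a separate request.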

Now go back to TheWeatherNetwork.com and right-click a blank area. Select "Inspect".

If you don't see "Network" on the top menu, click ">>" to select "Network".

On this "Network" page, you should see almost nothing, because the page has finished loading and there is no network activity right now. If some items do show up, click the "Clear" button to clear them. Then press Ctrl+R to reload the page, and click "XHR" to filter the list down to XMLHttpRequest items only.

Now we get a bunch of XHR items in the left column. Click each item, then click "Preview", and take a quick look at the content. One of the items, for example, is apparently an Amazon ad request.

A quick review shows that most are ad-related requests. Only the first 2 or 3 items contain weather data, and the one named "cabc0308" is exactly what we want. Note that "cabc0308" is the place code for Vancouver; other cities have different codes.

Now click "Headers". Highlight the request URL, right-click it, and choose to open that URL.

A new tab appears with all the data we want. After a little reformatting, we get this:
{
  "observation":
   {
     "time":{"local":"2020-08-28T17:45","utc":"2020-08-29T00:45"},
     "weatherCode":
      {"value":"SCT","icon":2,"text":"A few clouds","bgimage":"clearday","overlay":"sunny"},
     "temperature":21,"dewPoint":15,"feelsLike":23,
     "wind":{"direction":"W","speed":11,"gust":17},
     "relativeHumidity":69,"pressure":{"value":101.5,"trendKey":1},
     "visibility":32,"ceiling":10000
   },

  "display":
   {
     "imageUrl":"//s1.twnmm.com/images/en_ca/",
     "unit":
      {"temperature":"C","dewPoint":"C","wind":"km/h","relativeHumidity":"%",
       "pressure":"kPa","visibility":"km","ceiling":"m"
      }
   }

}       
 

It starts with a "{", so it's a Map with 2 top-level keys: "observation" and "display". The text "A few clouds" is under

  "observation" - "weatherCode" - "text" - "A few clouds"

The current temperature is under 

  "observation" - "temperature" - 21, which is an integer.

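The "key - key - key" paths above can be followed mechanically. Here is a minimal sketch that uses a hypothetical helper, valueAt, on a trimmed copy of the response shown above. It is one way to walk the structure, not the only one.

```dart
import 'dart:convert';

/// Follows [path] key by key through nested Maps and returns what it finds,
/// or null as soon as a key is missing. Hypothetical helper for illustration.
dynamic valueAt(dynamic node, List<String> path) {
  for (final key in path) {
    if (node is Map && node.containsKey(key)) {
      node = node[key];
    } else {
      return null;
    }
  }
  return node;
}

// Trimmed copy of the response shown above.
const sampleBody = '''
{"observation": {"weatherCode": {"text": "A few clouds"},
                 "temperature": 21, "feelsLike": 23},
 "display": {"unit": {"temperature": "C"}}}
''';

void main() {
  final data = json.decode(sampleBody);
  print(valueAt(data, ['observation', 'weatherCode', 'text'])); // A few clouds
  print(valueAt(data, ['observation', 'temperature'])); // 21
  print(valueAt(data, ['observation', 'noSuchKey', 'text'])); // null
}
```
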
Now we can just reproduce the request:

 https://weatherapi.pelmorex.com/api/v1/observation/placecode/cabc0308

with a few lines of code, and we'll get all the data.

import 'package:http/http.dart' as http;
...
var placeCode = "cabc0308"; // Vancouver
var _searchURL =
    'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' +
        placeCode;
// send a GET request; the JSON text arrives in response.body
var response = await http.Client().get(Uri.parse(_searchURL));


2. Flutter: JSON Decode The Simple Way

You can find the official JSON decoding documentation here: JSON and serialization

The method used in this article is the simple manual approach. It's good for most small projects.

First, convert the above "response" into a Dart Map.

import 'dart:convert';
...
Map<String, dynamic> mResponse = json.decode(response.body);
        

To get the value "A few clouds" in the above example, we can code in several different ways:

print(mResponse['observation']['weatherCode']['text']);
//or
// fall back to an empty Map (not ''), because Map.from() needs a Map
var observation = Map<String, dynamic>.from(mResponse['observation'] ?? {});
print(observation['weatherCode']['text']);
//or
var weatherCode = Map<String, dynamic>.from(observation['weatherCode'] ?? {});
print(weatherCode['text']);


Similarly, we can grab current temperature and current feels-like temperature.
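
For example, all three values can be collected into one small object. This is a sketch only: the CurrentWeather class and its field names are made up here, and it assumes "temperature" and "feelsLike" are always present and numeric, as they are in the sample response shown earlier.

```dart
import 'dart:convert';

/// Holds the three values this demo displays. The class and its field names
/// are made up for this sketch; they are not part of any library.
class CurrentWeather {
  final String condition;
  final num temperature;
  final num feelsLike;
  final String unit;

  CurrentWeather(this.condition, this.temperature, this.feelsLike, this.unit);

  /// Builds the object from the decoded response Map.
  factory CurrentWeather.fromJson(Map<String, dynamic> root) {
    final observation =
        Map<String, dynamic>.from(root['observation'] ?? {});
    final weatherCode =
        Map<String, dynamic>.from(observation['weatherCode'] ?? {});
    return CurrentWeather(
      (weatherCode['text'] ?? '').toString(),
      observation['temperature'] as num, // assumption: always a number
      observation['feelsLike'] as num, // assumption: always a number
      root['display']['unit']['temperature'].toString(),
    );
  }

  @override
  String toString() =>
      '$condition, $temperature°$unit (feels like $feelsLike°$unit)';
}

// Trimmed copy of the response shown earlier.
const sampleBody = '''
{"observation": {"weatherCode": {"text": "A few clouds"},
                 "temperature": 21, "feelsLike": 23},
 "display": {"unit": {"temperature": "C"}}}
''';

void main() {
  final weather = CurrentWeather.fromJson(
      json.decode(sampleBody) as Map<String, dynamic>);
  print(weather); // A few clouds, 21°C (feels like 23°C)
}
```

The fields are typed as num rather than int because the sample only proves that this one reading was a whole number.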

There's one problem here. In the above request URL

 https://weatherapi.pelmorex.com/api/v1/observation/placecode/cabc0308

we used the place code "cabc0308" for Vancouver. What if we don't know the city beforehand? How can we get any city's place code programmatically?

Let's go back to TheWeatherNetwork page. There's a location search bar near the top. Type in a city name without hitting Enter, and location suggestions show up immediately.

Now open the "Network" tab and type "toronto" in the search bar again. An XHR item shows up. Click "Headers" and we can get the request URL right away.

The original URL is

https://www.theweathernetwork.com/ca/api/location/search?searchText=toronto&lat=49.1566&long=-123.0996 

We can simplify it by dropping the "/ca" prefix and leaving lat and long empty:

https://www.theweathernetwork.com/api/location/search?searchText=toronto&lat=&long=
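
When the city name comes from user input, it is safer to URL-encode it before placing it in this URL. Below is a minimal sketch with a hypothetical helper, locationSearchUrl. Whether the server accepts "+" in place of a space is an assumption that has not been verified here.

```dart
/// Builds the simplified search URL for [cityName].
/// Hypothetical helper; the URL pattern is the one shown above.
String locationSearchUrl(String cityName) {
  // encodeQueryComponent escapes characters such as "&" and turns spaces
  // into "+", so the input cannot break the query string.
  final encoded = Uri.encodeQueryComponent(cityName.trim());
  return 'https://www.theweathernetwork.com/api/location/search'
      '?searchText=$encoded&lat=&long=';
}

void main() {
  print(locationSearchUrl('toronto'));
  // https://www.theweathernetwork.com/api/location/search?searchText=toronto&lat=&long=
  print(locationSearchUrl(' new york '));
  // https://www.theweathernetwork.com/api/location/search?searchText=new+york&lat=&long=
}
```

Plain string building is used on purpose so that the empty "lat=" and "long=" parameters keep exactly the form shown above.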

Copy and paste the URL into the browser to see the raw data.

It starts with a "[", so it's a List of Maps. Here is the code to JSON-decode the raw data:

// URL-encode the user's input so spaces, "&", etc. cannot break the query
var _searchURL =
    'https://www.theweathernetwork.com/api/location/search?searchText=' +
        Uri.encodeQueryComponent(_cityInputValue) +
        '&lat=&long=';
final response = await http.Client().get(Uri.parse(_searchURL));
if (response.statusCode == 200) { // connection successful
  setState(() {
    _saving = false;
  });

  cityList = <Map>[];
  var jList = json.decode(response.body) as List;
  jList.forEach((element) {
    // fall back to an empty Map (not ''), because Map.from() needs a Map
    var mElement = Map<String, dynamic>.from(element ?? {});
    if (mElement['type'] == 'city') { // keep cities only
      cityList.add(mElement);
    }
  });
}

We can find Toronto, ON, Canada by filtering on

"type" == "city" and "province" == "Ontario".

Then we get Toronto's place code 

"code":"caon0696".

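That filter can be sketched as follows. The sample list is invented to match the shape described above (a List of Maps with "type", "province" and "code" keys); only the Ontario entry's code, "caon0696", comes from this article. The helper name cityCodes is also hypothetical.

```dart
import 'dart:convert';

// Made-up sample data. Only "caon0696" is a real value from the article;
// the other entries and their codes are placeholders.
const sampleBody = '''
[
  {"type": "city", "name": "Toronto", "province": "Ontario",
   "provcode": "ON", "country": "Canada", "code": "caon0696"},
  {"type": "city", "name": "Toronto", "province": "Ohio",
   "provcode": "OH", "country": "United States", "code": "placeholder-1"},
  {"type": "airport", "name": "Example Airport", "province": "Ontario",
   "provcode": "ON", "country": "Canada", "code": "placeholder-2"}
]
''';

/// Returns the place codes of all entries whose "type" is "city" and whose
/// "province" equals [province]. Hypothetical helper for illustration.
List<String> cityCodes(String body, String province) {
  final results = json.decode(body) as List;
  final codes = <String>[];
  for (final item in results) {
    final entry = Map<String, dynamic>.from(item ?? {});
    if (entry['type'] == 'city' && entry['province'] == province) {
      codes.add(entry['code'].toString());
    }
  }
  return codes;
}

void main() {
  print(cityCodes(sampleBody, 'Ontario')); // [caon0696]
  print(cityCodes(sampleBody, 'Quebec')); // []
}
```
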
Below are the complete code for this demo and a GIF of the app running.

 

 File pubspec.yaml

name: flutter_web_scraping_dynamic
description: A new Flutter application.

publish_to: 'none' # Remove this line if you wish to publish to pub.dev

version: 1.0.0+1

environment:
  sdk: ">=2.7.0 <3.0.0"

dependencies:
  flutter:
    sdk: flutter

  cupertino_icons: ^0.1.3
  # TODO: add these dependencies
  modal_progress_hud: ^0.1.3
  http: ^0.12.2

dev_dependencies:
  flutter_test:
    sdk: flutter

flutter:

  uses-material-design: true


File main.dart

import 'package:flutter/material.dart';
import 'package:modal_progress_hud/modal_progress_hud.dart';
// when loading, display a circle progress indicator
import 'package:http/http.dart' as http;
import 'dart:convert';

List<Map> cityList;
var cityIndex;

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Web Scraping Dynamic Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
        visualDensity: VisualDensity.adaptivePlatformDensity,
      ),
      home: MyHomePage(title: 'Flutter Web Scraping Dynamic Demo'),
    );
  }
}

class MyHomePage extends StatefulWidget {
  MyHomePage({Key key, this.title}) : super(key: key);
  final String title;

  @override
  _MyHomePageState createState() => _MyHomePageState();
}

class _MyHomePageState extends State<MyHomePage> {
  String _cityInputValue;
  String _strSearchTips = '';
  bool _saving = false; // for modal_progress_hud

  ListView _searchPage() {
    return ListView(
      padding: const EdgeInsets.all(8),
      children: <Widget>[
        ListTile(
          title: Text('Enter your city name:'),
          subtitle: Text('(e.g. Vancouver)'),
        ),
        TextField(
          onChanged: (value) {
            _cityInputValue = value;
          },
          // add a decorating border
          decoration: InputDecoration(
              contentPadding: EdgeInsets.all(10.0),
              border: OutlineInputBorder(
                borderRadius: BorderRadius.circular(15.0),
              )),
        ),
        Container(
          margin:
              const EdgeInsets.only(left: 120, right: 120, top: 30, bottom: 20),
          child: RaisedButton(
            onPressed: () {
              _locationSearch();
            },
            child: const Text(
              'SEARCH',
              style: TextStyle(fontSize: 16),
            ),
          ),
        ),
        Text(_strSearchTips, //displaying warning message
            style: TextStyle(
              color: Colors.brown,
            )),
      ],
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text(widget.title),
      ),
      body: ModalProgressHUD(child: _searchPage(), inAsyncCall: _saving),
      //while loading, display progress indicator
    );
  } //widget build end

  _locationSearch() async {
    setState(() {
      _strSearchTips = '';
      _saving = true;
    });

    if (_cityInputValue != null) {
      _cityInputValue = _cityInputValue.trim();
    }

    if (_cityInputValue == null || _cityInputValue == '') {
      FocusScope.of(context).unfocus(); //remove keyboard
      setState(() {
        _strSearchTips = '!!!Please enter a valid location name.';
        _saving = false;
      });
      return;
    }

    // URL-encode the user's input so spaces, "&", etc. cannot break the query
    var _searchURL =
        'https://www.theweathernetwork.com/api/location/search?searchText=' +
            Uri.encodeQueryComponent(_cityInputValue) +
            '&lat=&long=';
    final response = await http.Client().get(Uri.parse(_searchURL));
    if (response.statusCode == 200) { // connection successful
      setState(() {
        _saving = false;
      });

      cityList = <Map>[];
      var jList = json.decode(response.body) as List;
      jList.forEach((element) {
        // fall back to an empty Map (not ''), because Map.from() needs a Map
        var mElement = Map<String, dynamic>.from(element ?? {});
        if (mElement['type'] == 'city') {
          cityList.add(mElement);
        }
      });

      if (cityList.length == 0) {
        FocusScope.of(context).unfocus();
        setState(() {
          _strSearchTips = '!!!No matching location found. Try another name.';
        });
      } else {
        Navigator.push(
          context,
          MaterialPageRoute(builder: (context) => SelectResultRoute()),
        );
      }
    } else {
      // status code != 200
      setState(() {
        _saving = false;
      });
      throw Exception('Server busy. Try again later.');
    }
  } //_location search end

} //MyHomePage end


class SelectResultRoute extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text("Flutter Web Scraping Dynamic Demo"),
      ),
      body: Column(
        children: <Widget>[
          ListTile(
            title: Text(
              'Select your location below:',
              style: TextStyle(color: Colors.brown),
            ),
          ),
          Expanded(
            child: ListView.builder(
              padding: const EdgeInsets.all(8),
              itemCount: cityList.length,
              itemBuilder: (BuildContext context, int index) {
                var city = cityList[index]['name'] +
                    ' ' +
                    cityList[index]['provcode'] +
                    ' ' +
                    cityList[index]['country'];
                var colorIndex = ((index + 1) % 9) * 100; //background color
                return Card(
                    color: Colors.blue[(colorIndex == 0) ? 50 : colorIndex],
                    child: ListTile(
                      title: Text(city),
                      onTap: () {
                        cityIndex = index;
                        Navigator.push(
                          context,
                          MaterialPageRoute(
                              builder: (context) => CurrentWeatherRoute()),
                        );
                      },
                    ));
              }, //item builder
            ),
          )
        ],
      ),
    );
  } //widget build

}


class CurrentWeatherRoute extends StatefulWidget {
  @override
  _CurrentWeatherState createState() => _CurrentWeatherState();
}

class _CurrentWeatherState extends State<CurrentWeatherRoute> {
  var _weather1, _weather2, _weather3; //info to display
  bool _saving = true; // show the progress indicator until the data arrives

  @override
  void initState() {
    super.initState();
    _getCurrentWeather();
  }

  Future _getCurrentWeather() async {
    var placeCode = cityList[cityIndex]['code'] ?? '';
    var _searchURL =
        'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' +
            placeCode;
    var response = await http.Client().get(Uri.parse(_searchURL));
    if (response.statusCode != 200) {
      setState(() { //hide the progress indicator before reporting the error
        _saving = false;
      });
      throw Exception('Server busy. Try again later.');
    } else {
      Map<String, dynamic> mResponse = json.decode(response.body);
      // fall back to an empty Map (not ''), because Map.from() needs a Map
      var observation =
          Map<String, dynamic>.from(mResponse['observation'] ?? {});
      var weatherCode =
          Map<String, dynamic>.from(observation['weatherCode'] ?? {});
      _weather1 = weatherCode['text'];
      _weather2 = observation['temperature'].toString() +
          '°' +
          mResponse['display']['unit']['temperature'];
      _weather3 = observation['feelsLike'].toString() +
          '°' +
          mResponse['display']['unit']['temperature'];
    }

    setState(() { //after get all information, refresh page
      _saving = false;
    });
  }

  Widget _result() {
    return ListView(
      children: <Widget>[
        ListTile(
          title: Text(cityList[cityIndex]['name'] +
              ' ' +
              cityList[cityIndex]['provcode'] +
              ' ' +
              cityList[cityIndex]['country']),
        ),
        ListTile(title: Text('Current Weather:')),
        ListTile(
          title: Row(
            children: <Widget>[
              Text(_weather1 ?? ''), //_weather1 is null before we get data from website
              //so display blank meanwhile
              Text(' '),
              Text(_weather2 ?? ''),
              Text(' Feels Like '),
              Text(_weather3 ?? ''),
            ],
          ),
        ),
      ],
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text("Flutter Web Scraping Dynamic Demo"),
      ),
      body: ModalProgressHUD(child: _result(), inAsyncCall: _saving),
    );
  }
}

