How To Web Scrape A Dynamic Website And Decode JSON, With Detailed Examples
This article uses Flutter / Dart to demonstrate how to scrape a dynamic website and then decode the JSON response into a Map or List. The general idea also applies to other languages such as Java and Python.
When we talk about web scraping, Beautiful Soup usually comes to mind first. Unfortunately, many websites nowadays are dynamic: they load their data with JavaScript after the page arrives. Beautiful Soup only parses the HTML it is given, so on its own it only works for static content.
Today I'll explain how to scrape a dynamic website. I'll use TheWeatherNetwork.com as an example. The Weather Network is my favorite weather forecast website.
1. Determine whether the website is dynamic
Take a look at the screen capture of The Weather Network website showing current weather of Vancouver, BC Canada.
This demo will grab those 3 pieces of information in red circles. They're current condition description, current temperature and current feels-like temperature.
Beautiful Soup can only analyze the page source content. Let's see if we can find the current condition "A few clouds" in the page source.
Here we use Google Chrome. Firefox will also work similarly.
Right click any blank area of the page and select "View page source".
A new tab/page opens. A quick Ctrl+F search of "A few clouds" finds 0/0 result.
This means TheWeatherNetwork is a dynamic website. "A few clouds" is not hard coded in the web page. It must be coming from somewhere else. And we are going to find out that "somewhere else".
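The manual Ctrl+F check can also be expressed in a few lines of Dart. This is only a sketch: `appearsInPageSource` is a hypothetical helper name, and the page source in `main` is a made-up stand-in for whatever HTML the server actually returns.

```dart
/// Returns true when [needle] occurs in the raw HTML sent by the server,
/// which means the value is static and can be parsed from the page source.
bool appearsInPageSource(String pageSource, String needle) {
  return pageSource.toLowerCase().contains(needle.toLowerCase());
}

void main() {
  // Made-up page source: an empty placeholder that a script fills in later.
  const pageSource = '<html><body><div id="obs"></div>'
      '<script src="app.js"></script></body></html>';

  final isStatic = appearsInPageSource(pageSource, 'A few clouds');
  print(isStatic
      ? 'static: parse the HTML directly'
      : 'dynamic: look for the XHR request instead');
}
```

If the text is missing from the raw HTML, as it is here, the data has to come from a separate request.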
Now go back to TheWeatherNetwork.com and right click a blank area. Select "Inspect".
If you don't see "Network" on the top menu, click ">>" to select "Network".
On this "Network" page, you should see almost nothing, because the page has finished loading and there is no network activity right now. If some items do show up, click the "Clear" button to clear them. Then press Ctrl+R to reload the page, and click "XHR" to filter the list down to XMLHttpRequest items only.
Now we get a list of XHR items in the left column. Click each item, then click "Preview", and take a quick look at the content. The XHR item below is apparently an Amazon ad request.
A quick review shows that most of them are ad-related requests. Only the first 2 or 3 items contain weather data, and this "cabc0308" is exactly what we want. Note that "cabc0308" is the place code for Vancouver. Other cities have different codes.
Now click "Headers". Highlight the request URL, right click it, and choose the option to open that URL.
A new tab appears with all the data we want. After a little reformatting, we get this:
{
  "observation": {
    "time": {"local": "2020-08-28T17:45", "utc": "2020-08-29T00:45"},
    "weatherCode": {
      "value": "SCT", "icon": 2, "text": "A few clouds",
      "bgimage": "clearday", "overlay": "sunny"
    },
    "temperature": 21, "dewPoint": 15, "feelsLike": 23,
    "wind": {"direction": "W", "speed": 11, "gust": 17},
    "relativeHumidity": 69, "pressure": {"value": 101.5, "trendKey": 1},
    "visibility": 32, "ceiling": 10000
  },
  "display": {
    "imageUrl": "//s1.twnmm.com/images/en_ca/",
    "unit": {
      "temperature": "C", "dewPoint": "C", "wind": "km/h", "relativeHumidity": "%",
      "pressure": "kPa", "visibility": "km", "ceiling": "m"
    }
  }
}
It starts with a "{". So it's a Map with 2 major keys: "observation" and "display". The keyword "A few clouds" is under
"observation" - "weatherCode" - "text" - "A few clouds"
The current temperature is under
"observation" - "temperature" - 21, which is an integer.
Now we can just reproduce the request:
https://weatherapi.pelmorex.com/api/v1/observation/placecode/cabc0308
with a few lines of code, and we'll get all the data.
import 'package:http/http.dart' as http;
...
var placeCode = "cabc0308";
var _searchURL =
'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' +
placeCode;
var response = await http.Client().get(Uri.parse(_searchURL));
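As an alternative to string concatenation, the request URL can be assembled with Dart's `Uri.https` constructor, which keeps the host and path separate. The sketch below assumes nothing beyond the URL shown above; `observationUri` is a hypothetical helper name.

```dart
/// Builds the observation URL for a given place code.
/// The host and path are copied from the request URL found in the
/// browser's Network tab.
Uri observationUri(String placeCode) {
  return Uri.https(
      'weatherapi.pelmorex.com', '/api/v1/observation/placecode/$placeCode');
}

void main() {
  // Prints the same URL that was copied from the browser.
  print(observationUri('cabc0308'));
}
```

The resulting `Uri` can be passed to `get` in place of `Uri.parse(_searchURL)`.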
2. Flutter: JSON Decode In A Simple Way
You can find the official JSON decoding documents here: JSON and serialization
The method used in this article is the simple manual approach. It's good enough for most small projects.
First convert the above "response" into a Dart Map.
import 'dart:convert';
...
Map<String, dynamic> mResponse = json.decode(response.body);
To get the value "A few clouds" in the above example, we can code in several different ways:
print(mResponse['observation']['weatherCode']['text']);
// or, falling back to an empty map when the key is missing
var observation = Map<String, dynamic>.from(mResponse['observation'] ?? {});
print(observation['weatherCode']['text']);
// or
var weatherCode = Map<String, dynamic>.from(observation['weatherCode'] ?? {});
print(weatherCode['text']);
Similarly, we can grab current temperature and current feels-like temperature.
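To make that concrete, here is a minimal sketch that pulls out all three values plus the temperature unit. `valueAt` is a hypothetical helper, not part of any library: it walks a list of keys and returns null instead of throwing when a key is missing. The JSON string is a trimmed copy of the response shown earlier.

```dart
import 'dart:convert';

/// Walks [path] through nested maps and returns the value found there,
/// or null as soon as one of the keys is missing.
dynamic valueAt(dynamic node, List<String> path) {
  for (final key in path) {
    if (node is Map && node.containsKey(key)) {
      node = node[key];
    } else {
      return null;
    }
  }
  return node;
}

void main() {
  // Trimmed copy of the observation response.
  const raw = '{"observation":{"weatherCode":{"text":"A few clouds"},'
      '"temperature":21,"feelsLike":23},'
      '"display":{"unit":{"temperature":"C"}}}';
  final data = json.decode(raw);

  final condition = valueAt(data, ['observation', 'weatherCode', 'text']);
  final temperature = valueAt(data, ['observation', 'temperature']);
  final feelsLike = valueAt(data, ['observation', 'feelsLike']);
  final unit = valueAt(data, ['display', 'unit', 'temperature']);

  // Prints: A few clouds 21°C, feels like 23°C
  print('$condition $temperature°$unit, feels like $feelsLike°$unit');

  // A missing key yields null rather than an exception.
  print(valueAt(data, ['observation', 'noSuchKey', 'text'])); // null
}
```

Returning null for a missing key lets the caller decide what to display, which matters when a field is absent from a particular response.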
There's one problem here. In the above request URL
https://weatherapi.pelmorex.com/api/v1/observation/placecode/cabc0308
we used the place code "cabc0308" for Vancouver. What if we don't know the city beforehand? How can we get any city's place code programmatically?
Let's go back to TheWeatherNetwork page. There's a location search bar near the top. Type in a city name without hitting Enter, and location suggestions show up immediately.
Now open the "Network" page again and type "toronto" in the search bar. An XHR item shows up. Click "Headers" and we get the request URL right away.
The original request URL can be rewritten as
https://www.theweathernetwork.com/api/location/search?searchText=toronto&lat=&long=
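When the city name comes from user input, it is safer to percent-encode it before placing it in this URL, so that spaces or characters such as "&" cannot break the query string. The sketch below uses Dart's built-in `Uri.encodeComponent`; `locationSearchUri` is a hypothetical helper name, and whether the server accepts every encoded input is an assumption that has not been verified here.

```dart
/// Builds the location search URL for whatever the user typed.
/// The input is trimmed and percent-encoded before it is inserted.
Uri locationSearchUri(String cityInput) {
  final searchText = Uri.encodeComponent(cityInput.trim());
  return Uri.parse('https://www.theweathernetwork.com/api/location/search'
      '?searchText=$searchText&lat=&long=');
}

void main() {
  print(locationSearchUri('toronto'));
  // The space becomes %20 in the query string.
  print(locationSearchUri(' new westminster '));
}
```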
Copy and paste the URL into the browser to see the raw data.
It starts with a "[", so it's a List of Maps. Here is the code to JSON decode the raw data:
// percent-encode the input so spaces and symbols cannot break the query
var _searchURL =
    'https://www.theweathernetwork.com/api/location/search?searchText=' +
        Uri.encodeComponent(_cityInputValue) +
        '&lat=&long=';
final response = await http.Client().get(Uri.parse(_searchURL));
if (response.statusCode == 200) { // connection successful
  setState(() {
    _saving = false;
  });
  cityList = <Map>[];
  var jList = json.decode(response.body) as List;
  jList.forEach((element) {
    // fall back to an empty map; Map.from() would throw on a String
    var mElement = Map<String, dynamic>.from(element ?? {});
    if (mElement['type'] == 'city') {
      cityList.add(mElement);
    }
  });
}
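Once the response is decoded into a List of Maps, one specific city can be picked out by filtering on its fields. The following is a rough sketch: `placeCodes` is a hypothetical helper, and the sample JSON is hand-written for illustration. Only the field names ("type", "province", "code") and Toronto's code "caon0696" come from this article; the other two entries and their codes are placeholders.

```dart
import 'dart:convert';

/// Returns the place codes of all entries that are cities in [province].
List<String> placeCodes(String rawJson, String province) {
  final codes = <String>[];
  final results = json.decode(rawJson) as List;
  for (final item in results) {
    if (item is Map &&
        item['type'] == 'city' &&
        item['province'] == province) {
      codes.add(item['code'] as String);
    }
  }
  return codes;
}

void main() {
  // Hand-written sample; the second and third entries are placeholders.
  const raw = '['
      '{"type":"city","name":"Toronto","province":"Ontario","code":"caon0696"},'
      '{"type":"city","name":"Toronto","province":"Elsewhere","code":"xxxx0000"},'
      '{"type":"other","name":"Toronto","province":"Ontario","code":"yyyy0000"}'
      ']';
  print(placeCodes(raw, 'Ontario')); // [caon0696]
}
```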
We can find Toronto, ON, Canada by filtering on
"type" == "city" and "province" == "Ontario".
Then we get Toronto's place code
"code":"caon0696".
Below is the complete code for this demo.
File pubspec.yaml
name: flutter_web_scraping_dynamic
description: A new Flutter application.
publish_to: 'none' # Remove this line if you wish to publish to pub.dev
version: 1.0.0+1
environment:
  sdk: ">=2.7.0 <3.0.0"
dependencies:
  flutter:
    sdk: flutter
  cupertino_icons: ^0.1.3
  # TODO: add these dependencies
  modal_progress_hud: ^0.1.3
  http: ^0.12.2
dev_dependencies:
  flutter_test:
    sdk: flutter
flutter:
  uses-material-design: true
File main.dart
import 'package:flutter/material.dart';
import 'package:modal_progress_hud/modal_progress_hud.dart';
// when loading, display a circle progress indicator
import 'package:http/http.dart' as http;
import 'dart:convert';
List<Map> cityList; // search results, shared between the pages
var cityIndex; // index of the city the user tapped in cityList
void main() {
  runApp(MyApp());
}
class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Web Scraping Dynamic Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
        visualDensity: VisualDensity.adaptivePlatformDensity,
      ),
      home: MyHomePage(title: 'Flutter Web Scraping Dynamic Demo'),
    );
  }
}
class MyHomePage extends StatefulWidget {
  MyHomePage({Key key, this.title}) : super(key: key);
  final String title;
  @override
  _MyHomePageState createState() => _MyHomePageState();
}
class _MyHomePageState extends State<MyHomePage> {
  String _cityInputValue;
  String _strSearchTips = '';
  bool _saving = false; // for modal_progress_hud
  ListView _searchPage() {
    return ListView(
      padding: const EdgeInsets.all(8),
      children: <Widget>[
        ListTile(
          title: Text('Enter your city name:'),
          subtitle: Text('(e.g. Vancouver)'),
        ),
        TextField(
          onChanged: (value) {
            _cityInputValue = value;
          },
          // add a decorating border
          decoration: InputDecoration(
              contentPadding: EdgeInsets.all(10.0),
              border: OutlineInputBorder(
                borderRadius: BorderRadius.circular(15.0),
              )),
        ),
        Container(
          margin:
              const EdgeInsets.only(left: 120, right: 120, top: 30, bottom: 20),
          child: RaisedButton(
            onPressed: () {
              _locationSearch();
            },
            child: const Text(
              'SEARCH',
              style: TextStyle(fontSize: 16),
            ),
          ),
        ),
        Text(_strSearchTips, //displaying warning message
            style: TextStyle(
              color: Colors.brown,
            )),
      ],
    );
  }
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text(widget.title),
      ),
      body: ModalProgressHUD(child: _searchPage(), inAsyncCall: _saving),
      //while loading, display progress indicator
    );
  } //widget build end
  _locationSearch() async {
    setState(() {
      _strSearchTips = '';
      _saving = true;
    });
    if (_cityInputValue != null) {
      _cityInputValue = _cityInputValue.trim();
    }
    if (_cityInputValue == null || _cityInputValue == '') {
      FocusScope.of(context).unfocus(); //remove keyboard
      setState(() {
        _strSearchTips = '!!!Please enter a valid location name.';
        _saving = false;
      });
      return;
    }
    //percent-encode the input so spaces and symbols cannot break the query
    var _searchURL =
        'https://www.theweathernetwork.com/api/location/search?searchText=' +
            Uri.encodeComponent(_cityInputValue) +
            '&lat=&long=';
    final response = await http.Client().get(Uri.parse(_searchURL));
    if (response.statusCode == 200) { // connection successful
      setState(() {
        _saving = false;
      });
      cityList = <Map>[];
      var jList = json.decode(response.body) as List;
      jList.forEach((element) {
        //fall back to an empty map; Map.from() would throw on a String
        var mElement = Map<String, dynamic>.from(element ?? {});
        if (mElement['type'] == 'city') {
          cityList.add(mElement);
        }
      });
      if (cityList.length == 0) {
        FocusScope.of(context).unfocus();
        setState(() {
          _strSearchTips = '!!!No matching location found. Try another name.';
        });
      } else {
        Navigator.push(
          context,
          MaterialPageRoute(builder: (context) => SelectResultRoute()),
        );
      }
    } else {
      // status code != 200
      setState(() {
        _saving = false;
      });
      throw Exception('Server busy. Try again later.');
    }
  } //_location search end
} //MyHomePage end
class SelectResultRoute extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text("Flutter Web Scraping Dynamic Demo"),
      ),
      body: Column(
        children: <Widget>[
          ListTile(
            title: Text(
              'Select your location below:',
              style: TextStyle(color: Colors.brown),
            ),
          ),
          Expanded(
            child: ListView.builder(
              padding: const EdgeInsets.all(8),
              itemCount: cityList.length,
              itemBuilder: (BuildContext context, int index) {
                var city = cityList[index]['name'] +
                    ' ' +
                    cityList[index]['provcode'] +
                    ' ' +
                    cityList[index]['country'];
                var colorIndex = ((index + 1) % 9) * 100; //background color
                return Card(
                    color: Colors.blue[(colorIndex == 0) ? 50 : colorIndex],
                    child: ListTile(
                      title: Text(city),
                      onTap: () {
                        cityIndex = index;
                        Navigator.push(
                          context,
                          MaterialPageRoute(
                              builder: (context) => CurrentWeatherRoute()),
                        );
                      },
                    ));
              }, //item builder
            ),
          )
        ],
      ),
    );
  } //widget build
}
class CurrentWeatherRoute extends StatefulWidget {
  @override
  _CurrentWeatherState createState() => _CurrentWeatherState();
}
class _CurrentWeatherState extends State<CurrentWeatherRoute> {
  var _weather1, _weather2, _weather3; //info to display
  bool _saving = true; //show the progress indicator until the data arrives
  @override
  void initState() {
    super.initState();
    _getCurrentWeather();
  }
  Future _getCurrentWeather() async {
    var placeCode = cityList[cityIndex]['code'] ?? '';
    var _searchURL =
        'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' +
            placeCode;
    var response = await http.Client().get(Uri.parse(_searchURL));
    if (response.statusCode != 200) {
      setState(() {
        _saving = false; //hide the progress indicator before reporting
      });
      throw Exception('Server busy. Try again later.');
    } else {
      Map<String, dynamic> mResponse = json.decode(response.body);
      //fall back to an empty map; Map.from() would throw on a String
      var observation =
          Map<String, dynamic>.from(mResponse['observation'] ?? {});
      var weatherCode =
          Map<String, dynamic>.from(observation['weatherCode'] ?? {});
      _weather1 = weatherCode['text'];
      _weather2 = observation['temperature'].toString() +
          '°' +
          mResponse['display']['unit']['temperature'];
      _weather3 = observation['feelsLike'].toString() +
          '°' +
          mResponse['display']['unit']['temperature'];
    }
    setState(() { //after get all information, refresh page
      _saving = false;
    });
  }
  Widget _result() {
    return ListView(
      children: <Widget>[
        ListTile(
          title: Text(cityList[cityIndex]['name'] +
              ' ' +
              cityList[cityIndex]['provcode'] +
              ' ' +
              cityList[cityIndex]['country']),
        ),
        ListTile(title: Text('Current Weather:')),
        ListTile(
          title: Row(
            children: <Widget>[
              //_weather1 is null before we get data from the website,
              //so display blank meanwhile
              Text(_weather1 ?? ''),
              Text(' '),
              Text(_weather2 ?? ''),
              Text(' Feels Like '),
              Text(_weather3 ?? ''),
            ],
          ),
        ),
      ],
    );
  }
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text("Flutter Web Scraping Dynamic Demo"),
      ),
      body: ModalProgressHUD(child: _result(), inAsyncCall: _saving),
    );
  }
}