Web Crawler Practice @ NTNU
Course Content

Many datasets are available online, and they often contain a vast amount of data. Downloading an entire dataset quickly can be a challenge, while selecting data to download by hand risks missing important information. This course teaches the most widely used Python web crawling packages for both static and dynamic websites (BeautifulSoup and Selenium), enabling you to download the datasets you need efficiently.

Course Introduction

01 :: Course Introduction
Contents: (1) About CCH (2) Course intro (3) Grading policy (4) Why do you need to take this course? (5) What will you learn from this course? (6) Textbook

Web Crawler Ethics

02 :: Web Crawler Ethics
Contents: (1) The Definition of Web Crawler (2) Legal Issues (3) Regulations (4) Ethical Problems (5) Lawsuit Examples (6) Questions
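Respecting a site's robots.txt is one of the most common crawler courtesy rules. The sketch below is a minimal illustration, not the course's own material: it uses Python's standard `urllib.robotparser` to decide whether a URL may be fetched and how long to wait between requests. The helper names and the fallback delay `DEFAULT_DELAY_SECONDS` are hypothetical choices for this example.

```python
from urllib import robotparser

# Hypothetical fallback used when robots.txt sets no Crawl-delay.
DEFAULT_DELAY_SECONDS = 1.0


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_crawl(rules: robotparser.RobotFileParser, user_agent: str, url: str) -> tuple[bool, float]:
    """Return (allowed, seconds to wait between requests) for this URL."""
    delay = rules.crawl_delay(user_agent)
    wait = float(delay) if delay is not None else DEFAULT_DELAY_SECONDS
    return rules.can_fetch(user_agent, url), wait
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` downloads the file directly; parsing pre-fetched text, as here, keeps the rule check separate from networking. robots.txt is a convention rather than a legal permission, so it complements the legal issues listed above instead of settling them.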

Web Design I :: HTML

03 :: Web Design I – HTML
Contents: (1) What is HTML? (2) Editor: Sublime Text (3) Designing Your First Website (4) Website Architecture (5) Lorem Ipsum (6) Metadata (7) Elements (8) Template (9) GitHub (10) Assignment

Web Design II :: CSS

04 :: Web Design II – CSS
Contents: (1) What is CSS? (2) CSS Syntax (3) Comments (4) Selectors (5) Units (6) Colors (7) Boundaries (8) Texts (9) Lists (10) Positions (11) Images (12) Navigation Bar (13) Layout (14) Responsive Web Design (15) Assignment

Web Design III :: JavaScript Basic

05 :: Web Design III – JavaScript Basic
Contents: (1) Why JavaScript? (2) Syntax (3) JS in HTML (4) Output (5) Variables (6) Operators (7) Data Types (8) Objects (9) Events (10) Strings (11) Numbers (12) Arrays (13) Math (14) Type Conversion (15) Flow Control (16) Assignment

Web Design IV :: JavaScript Advanced

06 :: Web Design IV – JavaScript Advanced
Contents: (1) What’s JSON? (2) JSON I/O (3) What’s Chart.js? (4) What’s D3.js?

Web Architecture

07 :: Web Architecture
Contents: (1) Introduction (2) Equipment (3) Interconnection Model (4) Lab Practice

Static Web Crawler :: PTT Crawler

08 :: Static Web Crawler: PTT Crawler
Contents: (1) Packages (2) Elements (3) BeautifulSoup
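As a minimal sketch of the static-crawler workflow, the code below uses BeautifulSoup to pull post titles, links, push counts, authors, and dates out of a board index page. The HTML is a hand-written sample, and the class names (`r-ent`, `title`, `nrec`, `author`, `date`) are assumptions based on PTT's public web layout; check them against the live page before relying on them. `SAMPLE_HTML`, `text_of`, and `parse_index` are hypothetical names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ptt.cc"  # PTT's web front end

# Hand-written sample shaped like a PTT board index page (assumed markup).
SAMPLE_HTML = """
<div class="r-ent">
  <div class="nrec"><span class="hl f3">12</span></div>
  <div class="title"><a href="/bbs/Python/M.1700000000.A.123.html">[Question] Parsing with bs4</a></div>
  <div class="meta"><div class="author">alice</div><div class="date"> 3/14</div></div>
</div>
<div class="r-ent">
  <div class="nrec"></div>
  <div class="title">(deleted) [bob]</div>
  <div class="meta"><div class="author">-</div><div class="date"> 3/15</div></div>
</div>
"""


def text_of(parent, selector: str) -> str:
    """Stripped text of the first match, or '' if the element is missing."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else ""


def parse_index(html: str) -> list[dict]:
    """Extract one dict per live post on a board index page."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for entry in soup.select("div.r-ent"):
        link = entry.select_one("div.title a")
        if link is None:  # deleted posts keep the row but lose the link
            continue
        posts.append({
            "title": link.get_text(strip=True),
            "url": urljoin(BASE_URL, link["href"]),  # hrefs on the page are relative
            "push": text_of(entry, "div.nrec"),
            "author": text_of(entry, "div.author"),
            "date": text_of(entry, "div.date"),
        })
    return posts
```

Against the real site you would download the page first, for example with `requests.get(url, cookies={"over18": "1"})`; the `over18` cookie is an assumption about how PTT records the age confirmation on restricted boards. Keeping parsing in its own function lets you test it on saved HTML without hitting the server.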

reCAPTCHA

09 :: reCAPTCHA
Contents: (1) reCAPTCHA (2) Applying the reCAPTCHA API (3) Pytesseract
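To show what applying the reCAPTCHA API involves on the server side, here is a minimal sketch using only the standard library. It assumes Google's verification endpoint `https://www.google.com/recaptcha/api/siteverify`, which accepts `secret`, `response`, and an optional `remoteip` and returns JSON with a `success` flag (plus a `score` for reCAPTCHA v3). The 0.5 score threshold and the function names are illustrative assumptions, not values from the course.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str, remote_ip: Optional[str] = None) -> Request:
    """Form-encoded POST carrying the site secret and the token from the browser."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    return Request(VERIFY_URL, data=urlencode(fields).encode("ascii"), method="POST")


def is_human(response_body: str, min_score: float = 0.5) -> bool:
    """Interpret the JSON reply; v3 replies also carry a score in [0, 1]."""
    result = json.loads(response_body)
    if not result.get("success", False):
        return False
    score = result.get("score")  # absent for reCAPTCHA v2
    return score is None or score >= min_score


def verify(secret: str, token: str, remote_ip: Optional[str] = None, timeout: float = 10) -> bool:
    """Network call; kept separate so the two helpers above can be tested offline."""
    with urlopen(build_verify_request(secret, token, remote_ip), timeout=timeout) as resp:
        return is_human(resp.read().decode("utf-8"))
```

Splitting request building and response checking from the network call means only `verify` needs a real secret key and connectivity.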

Dynamic Web Crawler I :: Selenium & TikTok

10 :: Dynamic Web Crawler I: Selenium & TikTok
Contents: (1) Introduction (2) Selenium (3) WebDriver (4) Element Indexing (5) Assignment

Dynamic Web Crawler II :: Facebook Crawler

11 :: Dynamic Web Crawler II: Facebook Crawler
Contents: (1) Target Settings (2) Environment & Tool Preparation (3) FB Crawler Framework (4) Lab Practice

Dynamic Web Crawler III :: 591 Housing Crawler

12 :: Dynamic Web Crawler III: 591 Housing Crawler
Contents: (1) 591 Website (2) Website Framework (3) Dropdown Selection (4) Condition Filtering (5) Assignment