Logging & Monitoring
Overview
The Serverless Scrapper application uses a consistent, structured format for logging time-related information. This document details the timestamp and duration formats, log levels, and endpoint-specific examples to ensure clarity and traceability in performance monitoring and debugging. All logs include precise timing, memory usage, and contextual details to support effective analysis across the application's operations.
Time Logging Format
Timestamp Format
All timestamps are logged in UTC timezone using the ISO 8601 format:
YYYY-MM-DDTHH:MM:SSZ
Example: 2025-05-23T07:49:44Z
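As an illustration, a timestamp in this format could be produced with the standard library as sketched below. The helper name utc_timestamp is hypothetical and is not taken from the application's code.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Return a UTC timestamp formatted as YYYY-MM-DDTHH:MM:SSZ.

    `now` is an optional timezone-aware datetime (hypothetical parameter,
    mainly useful for testing). A naive datetime would be interpreted as
    local time by astimezone(), so callers should pass aware values.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalise to UTC, then format without sub-second precision.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(utc_timestamp(datetime(2025, 5, 23, 7, 49, 44, tzinfo=timezone.utc)))
# prints 2025-05-23T07:49:44Z
```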
Duration Format
Time durations are logged in seconds with two decimal places:
X.XXs
Example: 2.34s (2.34 seconds)
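A minimal sketch of how a duration might be measured and rendered in this format follows. The helper name format_duration is an assumption for illustration, and the sleep call only stands in for real work.

```python
import time


def format_duration(seconds):
    """Render a duration in seconds with two decimal places, e.g. '2.34s'."""
    return f"{seconds:.2f}s"


start = time.perf_counter()
time.sleep(0.05)  # placeholder for the operation being timed
elapsed = time.perf_counter() - start

print(f"Page loaded in: {format_duration(elapsed)}")
print(format_duration(2.34))  # prints 2.34s
```

At two decimal places, any duration shorter than 5 ms is rendered as 0.00s, which is worth keeping in mind when timing very fast steps.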
Log Levels
The application uses three log levels, each with specific time-related information:
INFO Level
Standard operational logs with timing information:
[INFO][req-xxxxxxx] Starting scrape job: url=..., user=..., project=...
[INFO][req-xxxxxxx] Page loaded in: X.XXs
[INFO][req-xxxxxxx] Total job completed in X.XXs
DEBUG Level
Detailed timing information for debugging purposes:
[DEBUG][req-xxxxxxx] Handler selection took: X.XXs
[DEBUG][req-xxxxxxx] Memory usage after scraping: X.XXMB
[DEBUG][req-xxxxxxx] Browser initialization time: X.XXs
[DEBUG][req-xxxxxxx] URL parsing completed in: X.XXs
[DEBUG][req-xxxxxxx] Connection established in: X.XXs
ERROR Level
Error logs with timing context:
[ERROR][req-xxxxxxx] Timeout after X.XXs while connecting to URL
[ERROR][req-xxxxxxx] Failed to upload to S3 after X.XXs
[ERROR][req-xxxxxxx] Script execution exceeded time limit of X.XXs
[ERROR][req-xxxxxxx] Memory exceeded limit of X.XXMB at timestamp: YYYY-MM-DDTHH:MM:SSZ
Endpoint-Specific Log Examples
Each endpoint produces a mix of DEBUG, INFO, and ERROR level logs depending on the execution flow and outcome.
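Because every line in the examples below shares the [LEVEL][req-id] prefix, a monitoring script can split a line into its parts with a small regular expression. The following is only a sketch: the function name parse_line, the field names, and the duration pattern are assumptions, not part of the application.

```python
import re

# Matches the "[LEVEL][req-id] message" prefix used throughout this document.
LOG_LINE = re.compile(
    r"^\[(?P<level>DEBUG|INFO|ERROR)\]\[(?P<request_id>req-[^\]]+)\]\s+(?P<message>.*)$"
)
# Matches the first duration written as a number followed by "s", e.g. "2.34s".
DURATION = re.compile(r"(\d+(?:\.\d+)?)s\b")


def parse_line(line):
    """Return a dict with level, request_id, message and duration_s, or None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    duration = DURATION.search(record["message"])
    # duration_s stays None for lines that carry no duration.
    record["duration_s"] = float(duration.group(1)) if duration else None
    return record


print(parse_line("[INFO][req-12345678] Page loaded in: 2.34s"))
```

Grouping the parsed records by request_id then gives the full sequence of events for a single request.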
Root Endpoint (/) Logs
Example of successful root endpoint access:
[DEBUG][req-ab12cd34] Received request at: 2025-05-23T08:01:31Z
[DEBUG][req-ab12cd34] Handler selection took: 0.002s
[DEBUG][req-ab12cd34] Memory usage at handler start: 98.3MB
[INFO][req-ab12cd34] Request received from IP: 192.168.1.1, User-Agent: Chrome/136.0.7103.113
[INFO][req-ab12cd34] API version requested: v1
[INFO][req-ab12cd34] Root endpoint accessed, returning API info
[INFO][req-ab12cd34] Response generated in: 0.023s
[DEBUG][req-ab12cd34] Memory usage at completion: 98.5MB
Example of error on root endpoint:
[DEBUG][req-ef56gh78] Received request at: 2025-05-23T08:02:45Z
[INFO][req-ef56gh78] Request received from IP: 192.168.1.1, User-Agent: Chrome/136.0.7103.113
[ERROR][req-ef56gh78] Error processing request after 0.015s: Invalid authorization header
[ERROR][req-ef56gh78] Request terminated at: 2025-05-23T08:02:45Z
/scrape Endpoint Logs
Example of successful scrape operation:
[DEBUG][req-12345678] Handler received event at: 2025-05-23T08:05:00Z
[DEBUG][req-12345678] Handler selection took: 0.003s
[INFO][req-12345678] Starting scrape job: url=https://example.com, user=ariful, project=documentation
[DEBUG][req-12345678] Browser initialization time: 1.23s
[DEBUG][req-12345678] Memory usage before page load: 245.8MB
[INFO][req-12345678] Page loaded in: 2.34s
[DEBUG][req-12345678] DOM ready in: 1.04s
[DEBUG][req-12345678] Starting HTML extraction
[INFO][req-12345678] HTML saved in: 0.12s
[INFO][req-12345678] Markdown saved in: 0.16s
[INFO][req-12345678] Screenshots captured in 1.45s
[DEBUG][req-12345678] Memory usage after scraping: 432.5MB
[DEBUG][req-12345678] Memory increased by: 186.7MB
[DEBUG][req-12345678] S3 client initialization: 0.05s
[INFO][req-12345678] Starting upload to S3 bucket: serverless-scrapper-bucket
[INFO][req-12345678] Uploads completed in 3.21s
[INFO][req-12345678] Completed uploading 4 files to S3
[INFO][req-12345678] Total job completed in 7.12s
[DEBUG][req-12345678] Final memory usage: 433.1MB
Example of error during scrape operation:
[DEBUG][req-87654321] Handler received event at: 2025-05-23T08:10:00Z
[INFO][req-87654321] Starting scrape job: url=https://problematic-site.com, user=ariful, project=documentation
[DEBUG][req-87654321] Browser initialization time: 1.45s
[DEBUG][req-87654321] Memory usage before page load: 250.3MB
[DEBUG][req-87654321] Page load started at: 2025-05-23T08:10:02Z
[ERROR][req-87654321] Timeout after 30.00s while connecting to URL
[ERROR][req-87654321] Memory at failure point: 512.3MB
[ERROR][req-87654321] Failed after running for: 31.52s
[ERROR][req-87654321] Request terminated at: 2025-05-23T08:10:33Z
[DEBUG][req-87654321] Browser cleanup time: 0.34s
[DEBUG][req-87654321] Final memory usage: 255.6MB
/retrieve Endpoint Logs
Example of successful results retrieval:
[DEBUG][req-qwerty12] Handler received event at: 2025-05-23T08:15:00Z
[DEBUG][req-qwerty12] Handler selection took: 0.002s
[INFO][req-qwerty12] Starting results retrieval: user=ariful, project=documentation, job_id=123456789012
[DEBUG][req-qwerty12] S3 client initialization: 0.04s
[INFO][req-qwerty12] Checking S3 bucket for job results
[DEBUG][req-qwerty12] S3 list operation took: 0.21s
[INFO][req-qwerty12] Found 4 files in job folder
[DEBUG][req-qwerty12] File types: html, md, png, xz
[INFO][req-qwerty12] Generating pre-signed URLs with expiry: 1 hour
[DEBUG][req-qwerty12] URL generation took: 0.11s
[INFO][req-qwerty12] Results retrieval completed in 0.56s
[DEBUG][req-qwerty12] Response size: 1.2KB
[DEBUG][req-qwerty12] Final memory usage: 102.3MB
Example of error during results retrieval:
[DEBUG][req-asdfgh56] Handler received event at: 2025-05-23T08:20:00Z
[INFO][req-asdfgh56] Starting results retrieval: user=ariful, project=documentation, job_id=000000000000
[DEBUG][req-asdfgh56] S3 client initialization: 0.05s
[INFO][req-asdfgh56] Checking S3 bucket for job results
[ERROR][req-asdfgh56] Error after 0.25s: Job results not found in S3
[ERROR][req-asdfgh56] Bucket check failed at: 2025-05-23T08:20:01Z
[DEBUG][req-asdfgh56] Error response generated in: 0.01s
[DEBUG][req-asdfgh56] Final memory usage: 101.2MB
Common Time-Related Log Entries
Job Duration
Logs the total time taken for a job to complete:
[INFO][req-xxxxxxx] Total job completed in X.XXs
Operation Timings
Logs the time taken for specific operations:
[INFO][req-xxxxxxx] Page loaded in: X.XXs
[INFO][req-xxxxxxx] HTML saved in: X.XXs
[INFO][req-xxxxxxx] Markdown saved in: X.XXs
[INFO][req-xxxxxxx] Screenshots captured in X.XXs
[INFO][req-xxxxxxx] Uploads completed in X.XXs
Memory Usage
Logs memory usage with time context:
[DEBUG][req-xxxxxxx] Memory usage after scraping: X.XXMB
Time Zone Handling
- All timestamps are logged in UTC timezone
- The 'Z' suffix indicates UTC time
- The application uses UTC consistently throughout its operations
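When analysing logs, the same format can be parsed back into a timezone-aware value so that timestamps can be compared or subtracted. A minimal sketch follows; parse_utc is a hypothetical helper, and the two timestamps are taken from the scrape error example in this document.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' into a timezone-aware UTC datetime."""
    # strptime treats the trailing 'Z' as a literal, so UTC is attached explicitly.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


started = parse_utc("2025-05-23T08:10:00Z")
ended = parse_utc("2025-05-23T08:10:33Z")
print((ended - started).total_seconds())  # prints 33.0
```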
Best Practices for Time Logging
- Always log time durations with two decimal places
- Use consistent units (seconds) for all timing measurements
- Include timestamps for key operations
- Log memory usage alongside time-sensitive operations
- Use consistent formatting across all time-related logs
- Include endpoint-specific context in log messages
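One way to apply these practices uniformly is to wrap each operation in a small timing helper, so that units, precision, and the request ID prefix are applied in one place. The sketch below is illustrative only: the timed context manager and its emit parameter are assumptions, and the error message reuses the "Failed after running for" wording from the examples in this document.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(request_id, label, emit=print):
    """Time the wrapped block and emit one log line in the documented format."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed = time.perf_counter() - start
        emit(f"[ERROR][{request_id}] Failed after running for: {elapsed:.2f}s")
        raise  # the caller still sees the original exception
    else:
        elapsed = time.perf_counter() - start
        emit(f"[INFO][{request_id}] {label}: {elapsed:.2f}s")


lines = []
with timed("req-12345678", "Page loaded in", emit=lines.append):
    time.sleep(0.01)  # placeholder for the real operation
print(lines[0])
```

Passing emit as a parameter keeps the helper independent of any particular logging backend and makes it easy to capture output in tests.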
Example Log Sequences by Level
INFO Level Example (Successful Scrape)
[INFO][req-12345678] Starting scrape job: url=..., user=..., project=...
[INFO][req-12345678] Page loaded in: 2.34s
[INFO][req-12345678] HTML saved in: 0.12s
[INFO][req-12345678] Markdown saved in: 0.16s
[INFO][req-12345678] Screenshots captured in 1.45s
[INFO][req-12345678] Starting upload to S3 bucket: serverless-scrapper-bucket
[INFO][req-12345678] Uploads completed in 3.21s
[INFO][req-12345678] Completed uploading 4 files to S3
[INFO][req-12345678] Total job completed in 7.12s
DEBUG Level Example
[DEBUG][req-12345678] Handler received event at: 2025-05-23T08:00:01Z
[DEBUG][req-12345678] Handler selection took: 0.01s
[DEBUG][req-12345678] Browser initialization time: 1.23s
[DEBUG][req-12345678] DOM ready in: 1.04s
[DEBUG][req-12345678] Request headers received in: 0.07s
[DEBUG][req-12345678] Memory usage before scraping: 245.8MB
[DEBUG][req-12345678] Memory usage after scraping: 432.5MB
[DEBUG][req-12345678] Memory increased by: 186.7MB
[DEBUG][req-12345678] CPU usage during scraping: 78.5%
[DEBUG][req-12345678] S3 client initialization: 0.05s
ERROR Level Example
[ERROR][req-12345678] Timeout after 30.00s while connecting to URL
[ERROR][req-12345678] Connection attempt made at: 2025-05-23T08:00:10Z
[ERROR][req-12345678] Memory at failure point: 512.3MB
[ERROR][req-12345678] Failed after running for: 30.12s
[ERROR][req-12345678] Request terminated at: 2025-05-23T08:00:40Z
Implementation Details
- Time measurements are captured using perf_counter() for high precision
- All time logs include a request ID [req-xxxxxxx] for traceability
- Timestamps are generated using datetime.datetime.now(datetime.timezone.utc)
- Memory usage is tracked using psutil.Process.memory_info()
- Log level is controlled by the LOG_LEVEL environment variable
- Log levels are mapped to numerical values: debug (10), info (20), error (30)
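The level filtering described above could look roughly like the following sketch. Only the LOG_LEVEL variable name and the numeric mapping come from this document; the function names, the default of info, and the fallback for unrecognised values are assumptions.

```python
import os

# Numeric mapping as documented above.
LEVELS = {"debug": 10, "info": 20, "error": 30}


def threshold(env=None):
    """Return the numeric threshold selected by LOG_LEVEL."""
    env = os.environ if env is None else env
    # Assumption: default to "info" when LOG_LEVEL is unset or unrecognised.
    name = env.get("LOG_LEVEL", "info").lower()
    return LEVELS.get(name, LEVELS["info"])


def log(level, request_id, message, env=None, emit=print):
    """Emit '[LEVEL][request_id] message' if level meets the threshold."""
    if LEVELS[level] >= threshold(env):
        emit(f"[{level.upper()}][{request_id}] {message}")
        return True
    return False


# With LOG_LEVEL=info, the debug line is dropped and the error line is printed.
settings = {"LOG_LEVEL": "info"}
log("debug", "req-12345678", "Handler selection took: 0.01s", env=settings)
log("error", "req-12345678", "Timeout after 30.00s while connecting to URL", env=settings)
```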