TSS Docs

Logging & Monitoring

Overview

The Serverless Scrapper application logs time-related information in a consistent, structured format. This document describes the timestamp and duration formats, the log levels, and endpoint-specific examples used for performance monitoring and debugging. Log entries carry a request ID along with timing, memory usage, or other contextual details, so a request can be traced across the application's operations.

Time Logging Format

Timestamp Format

All timestamps are logged in UTC using the ISO 8601 format with second precision:

YYYY-MM-DDTHH:MM:SSZ

Example: 2025-05-23T07:49:44Z
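
A minimal sketch of producing a timestamp in this format from Python's standard library. The helper name utc_timestamp is hypothetical and not taken from the application's code; it only illustrates the YYYY-MM-DDTHH:MM:SSZ layout described above.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(now=None):
    """Return a timestamp formatted as YYYY-MM-DDTHH:MM:SSZ (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert any timezone-aware datetime to UTC before formatting,
    # so the literal "Z" suffix is always truthful.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A fixed UTC instant, matching the example in this section.
fixed = datetime(2025, 5, 23, 7, 49, 44, tzinfo=timezone.utc)
print(utc_timestamp(fixed))  # 2025-05-23T07:49:44Z

# The same instant expressed at UTC+02:00 formats identically.
shifted = datetime(2025, 5, 23, 9, 49, 44, tzinfo=timezone(timedelta(hours=2)))
print(utc_timestamp(shifted))  # 2025-05-23T07:49:44Z
```

Pass only timezone-aware datetimes to a helper like this; astimezone() interprets a naive datetime as local time.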

Duration Format

Time durations are logged in seconds with two decimal places:

X.XXs

Example: 2.34s (2.34 seconds)
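
The X.XXs format corresponds to a fixed-point format specifier with two decimal places. The sketch below uses a hypothetical format_duration helper (an illustrative name, not the application's own) to show how values are rendered, including how very short durations round.

```python
def format_duration(seconds):
    """Render a duration in seconds as X.XXs (hypothetical helper)."""
    return f"{seconds:.2f}s"


print(format_duration(2.34))   # 2.34s
print(format_duration(30))     # 30.00s
print(format_duration(0.002))  # 0.00s (sub-10ms durations round to zero)
```

One consequence of the two-decimal rule is that operations faster than about 5 ms are logged as 0.00s.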

Log Levels

The application uses three log levels, each with specific time-related information:

INFO Level

Standard operational logs with timing information:

[INFO][req-xxxxxxx] Starting scrape job: url=..., user=..., project=...
[INFO][req-xxxxxxx] Page loaded in: X.XXs
[INFO][req-xxxxxxx] Total job completed in X.XXs

DEBUG Level

Detailed timing information for debugging purposes:

[DEBUG][req-xxxxxxx] Handler selection took: X.XXs
[DEBUG][req-xxxxxxx] Memory usage after scraping: X.XXMB
[DEBUG][req-xxxxxxx] Browser initialization time: X.XXs
[DEBUG][req-xxxxxxx] URL parsing completed in: X.XXs
[DEBUG][req-xxxxxxx] Connection established in: X.XXs

ERROR Level

Error logs with timing context:

[ERROR][req-xxxxxxx] Timeout after X.XXs while connecting to URL
[ERROR][req-xxxxxxx] Failed to upload to S3 after X.XXs
[ERROR][req-xxxxxxx] Script execution exceeded time limit of X.XXs
[ERROR][req-xxxxxxx] Memory exceeded limit of X.XXMB at timestamp: YYYY-MM-DDTHH:MM:SSZ
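
The [LEVEL][req-id] prefix shown in these templates can be produced in several ways. The sketch below assumes Python's standard logging module, which this document does not confirm the application uses; make_request_logger and the logger name "scraper" are hypothetical.

```python
import io
import logging

# Level and request ID in square brackets, followed by the message.
LOG_FORMAT = "[%(levelname)s][%(request_id)s] %(message)s"


def make_request_logger(request_id, stream=None):
    """Return a logger that prefixes every line with the request ID (hypothetical)."""
    logger = logging.getLogger(f"scraper.{request_id}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate output through the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # LoggerAdapter injects request_id into every record as an extra field.
    return logging.LoggerAdapter(logger, {"request_id": request_id})


buffer = io.StringIO()
log = make_request_logger("req-12345678", buffer)
log.info("Page loaded in: %.2fs", 2.34)
log.error("Timeout after %.2fs while connecting to URL", 30.0)
print(buffer.getvalue(), end="")
```

Binding the request ID once per request, rather than passing it to every call, keeps the prefix consistent across all three levels.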

Endpoint-Specific Log Examples

Each endpoint produces a mix of DEBUG, INFO, and ERROR level logs depending on the execution flow and outcome.

Root Endpoint (/) Logs

Example of successful root endpoint access:

[DEBUG][req-ab12cd34] Received request at: 2025-05-23T08:01:31Z
[DEBUG][req-ab12cd34] Handler selection took: 0.00s
[DEBUG][req-ab12cd34] Memory usage at handler start: 98.3MB
[INFO][req-ab12cd34] Request received from IP: 192.168.1.1, User-Agent: Chrome/136.0.7103.113
[INFO][req-ab12cd34] API version requested: v1
[INFO][req-ab12cd34] Root endpoint accessed, returning API info
[INFO][req-ab12cd34] Response generated in: 0.02s
[DEBUG][req-ab12cd34] Memory usage at completion: 98.5MB

Example of error on root endpoint:

[DEBUG][req-ef56gh78] Received request at: 2025-05-23T08:02:45Z
[INFO][req-ef56gh78] Request received from IP: 192.168.1.1, User-Agent: Chrome/136.0.7103.113
[ERROR][req-ef56gh78] Error processing request after 0.015s: Invalid authorization header
[ERROR][req-ef56gh78] Request terminated at: 2025-05-23T08:02:45Z

/scrape Endpoint Logs

Example of successful scrape operation:

[DEBUG][req-12345678] Handler received event at: 2025-05-23T08:05:00Z
[DEBUG][req-12345678] Handler selection took: 0.00s
[INFO][req-12345678] Starting scrape job: url=https://example.com, user=ariful, project=documentation
[DEBUG][req-12345678] Browser initialization time: 1.23s
[DEBUG][req-12345678] Memory usage before page load: 245.8MB
[INFO][req-12345678] Page loaded in: 2.34s
[DEBUG][req-12345678] DOM ready in: 1.04s
[DEBUG][req-12345678] Starting HTML extraction
[INFO][req-12345678] HTML saved in: 0.12s
[INFO][req-12345678] Markdown saved in: 0.16s
[INFO][req-12345678] Screenshots captured in 1.45s
[DEBUG][req-12345678] Memory usage after scraping: 432.5MB
[DEBUG][req-12345678] Memory increased by: 186.7MB
[DEBUG][req-12345678] S3 client initialization: 0.05s
[INFO][req-12345678] Starting upload to S3 bucket: serverless-scrapper-bucket
[INFO][req-12345678] Uploads completed in 3.21s
[INFO][req-12345678] Completed uploading 4 files to S3
[INFO][req-12345678] Total job completed in 7.12s
[DEBUG][req-12345678] Final memory usage: 433.1MB

Example of error during scrape operation:

[DEBUG][req-87654321] Handler received event at: 2025-05-23T08:10:00Z
[INFO][req-87654321] Starting scrape job: url=https://problematic-site.com, user=ariful, project=documentation
[DEBUG][req-87654321] Browser initialization time: 1.45s
[DEBUG][req-87654321] Memory usage before page load: 250.3MB
[DEBUG][req-87654321] Page load started at: 2025-05-23T08:10:02Z
[ERROR][req-87654321] Timeout after 30.00s while connecting to URL
[ERROR][req-87654321] Memory at failure point: 512.3MB
[ERROR][req-87654321] Failed after running for: 31.52s
[ERROR][req-87654321] Request terminated at: 2025-05-23T08:10:33Z
[DEBUG][req-87654321] Browser cleanup time: 0.34s
[DEBUG][req-87654321] Final memory usage: 255.6MB

/retrieve Endpoint Logs

Example of successful results retrieval:

[DEBUG][req-qwerty12] Handler received event at: 2025-05-23T08:15:00Z
[DEBUG][req-qwerty12] Handler selection took: 0.00s
[INFO][req-qwerty12] Starting results retrieval: user=ariful, project=documentation, job_id=123456789012
[DEBUG][req-qwerty12] S3 client initialization: 0.04s
[INFO][req-qwerty12] Checking S3 bucket for job results
[DEBUG][req-qwerty12] S3 list operation took: 0.21s
[INFO][req-qwerty12] Found 4 files in job folder
[DEBUG][req-qwerty12] File types: html, md, png, xz
[INFO][req-qwerty12] Generating pre-signed URLs with expiry: 1 hour
[DEBUG][req-qwerty12] URL generation took: 0.11s
[INFO][req-qwerty12] Results retrieval completed in 0.56s
[DEBUG][req-qwerty12] Response size: 1.2KB
[DEBUG][req-qwerty12] Final memory usage: 102.3MB

Example of error during results retrieval:

[DEBUG][req-asdfgh56] Handler received event at: 2025-05-23T08:20:00Z
[INFO][req-asdfgh56] Starting results retrieval: user=ariful, project=documentation, job_id=000000000000
[DEBUG][req-asdfgh56] S3 client initialization: 0.05s
[INFO][req-asdfgh56] Checking S3 bucket for job results
[ERROR][req-asdfgh56] Error after 0.25s: Job results not found in S3
[ERROR][req-asdfgh56] Bucket check failed at: 2025-05-23T08:20:01Z
[DEBUG][req-asdfgh56] Error response generated in: 0.01s
[DEBUG][req-asdfgh56] Final memory usage: 101.2MB
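
Because every line follows the same [LEVEL][req-id] layout, the logs above can be parsed mechanically for analysis. The following is a sketch only: the regular expressions and the parse_line helper are illustrative assumptions, not part of the application.

```python
import re

# Matches "[LEVEL][req-id] message" as used throughout this document.
LOG_LINE = re.compile(
    r"^\[(?P<level>DEBUG|INFO|ERROR)\]\[(?P<request_id>req-[^\]]+)\] (?P<message>.*)$"
)
# Matches the first duration such as "0.25s" inside a message.
DURATION = re.compile(r"(\d+\.\d+)s\b")


def parse_line(line):
    """Split a log line into level, request ID, message, and optional duration."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # not a line in the documented format
    record = match.groupdict()
    duration = DURATION.search(record["message"])
    record["duration_s"] = float(duration.group(1)) if duration else None
    return record


record = parse_line("[ERROR][req-asdfgh56] Error after 0.25s: Job results not found in S3")
print(record["level"], record["request_id"], record["duration_s"])
# ERROR req-asdfgh56 0.25
```

Grouping parsed records by request_id reconstructs the sequence of events for a single request, which is the traceability the request ID is meant to provide.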

Job Duration

Logs the total time taken for a job to complete:

[INFO][req-xxxxxxx] Total job completed in X.XXs

Operation Timings

Logs the time taken for specific operations:

[INFO][req-xxxxxxx] Page loaded in: X.XXs
[INFO][req-xxxxxxx] HTML saved in: X.XXs
[INFO][req-xxxxxxx] Markdown saved in: X.XXs
[INFO][req-xxxxxxx] Screenshots captured in X.XXs
[INFO][req-xxxxxxx] Uploads completed in X.XXs
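
One way to produce operation timings like these is a context manager built around time.perf_counter(). The sketch below is an assumption about how such timing could be wrapped; the timed helper and its emit parameter are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, request_id, emit=print):
    """Time the enclosed block and emit one INFO line with the duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # Two decimal places, in seconds, as required by the duration format.
        emit(f"[INFO][{request_id}] {label} {elapsed:.2f}s")


lines = []
with timed("HTML saved in:", "req-12345678", emit=lines.append):
    time.sleep(0.01)  # stand-in for the real operation

print(lines[0])
```

perf_counter() is a monotonic clock, so the measured duration is unaffected by system clock adjustments. In this sketch the line is emitted even if the block raises; a real implementation would likely log failures at ERROR level instead.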

Memory Usage

Logs memory usage with time context:

[DEBUG][req-xxxxxxx] Memory usage after scraping: X.XXMB
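
A sketch of how a memory figure could be obtained and formatted. It assumes that the reported value is the process's resident set size and that 1 MB means 1024 * 1024 bytes; neither is stated in this document. The helper names are hypothetical, and the number of decimals is left as a parameter because the template shows two decimals while the endpoint examples show one.

```python
def format_memory_mb(num_bytes, decimals=1):
    """Render a byte count as megabytes, e.g. 100.0MB (hypothetical helper)."""
    return f"{num_bytes / (1024 * 1024):.{decimals}f}MB"


def current_rss_bytes():
    """Return the current process's resident set size in bytes."""
    import psutil  # third-party; imported lazily so formatting works without it

    return psutil.Process().memory_info().rss


print(format_memory_mb(100 * 1024 * 1024))         # 100.0MB
print(format_memory_mb(1536 * 1024, decimals=2))   # 1.50MB
```

With psutil installed, format_memory_mb(current_rss_bytes()) yields a value suitable for a "Memory usage" log line.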

Time Zone Handling

  • All timestamps are logged in UTC
  • The 'Z' suffix indicates UTC time
  • The application uses UTC consistently throughout its operations, so timestamps from different log lines can be compared directly
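
Because every timestamp is in UTC with a 'Z' suffix, two logged timestamps can be parsed and subtracted without any time zone conversion. A minimal sketch, using a hypothetical parse_utc helper and two timestamps taken from the failed scrape example in this document:

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a YYYY-MM-DDTHH:MM:SSZ timestamp into a timezone-aware datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    # The literal "Z" carries no offset information for strptime,
    # so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


started = parse_utc("2025-05-23T08:10:00Z")
ended = parse_utc("2025-05-23T08:10:33Z")
print((ended - started).total_seconds())  # 33.0
```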

Best Practices for Time Logging

  1. Always log time durations with two decimal places
  2. Use consistent units (seconds) for all timing measurements
  3. Include timestamps for key operations
  4. Log memory usage alongside time-sensitive operations
  5. Use consistent formatting across all time-related logs
  6. Include endpoint-specific context in log messages

Example Log Sequences by Level

INFO Level Example (Successful Scrape)

[INFO][req-12345678] Starting scrape job: url=..., user=..., project=...
[INFO][req-12345678] Page loaded in: 2.34s
[INFO][req-12345678] HTML saved in: 0.12s
[INFO][req-12345678] Markdown saved in: 0.16s
[INFO][req-12345678] Screenshots captured in 1.45s
[INFO][req-12345678] Starting upload to S3 bucket: serverless-scrapper-bucket
[INFO][req-12345678] Uploads completed in 3.21s
[INFO][req-12345678] Completed uploading 4 files to S3
[INFO][req-12345678] Total job completed in 7.12s

DEBUG Level Example

[DEBUG][req-12345678] Handler received event at: 2025-05-23T08:00:01Z
[DEBUG][req-12345678] Handler selection took: 0.01s
[DEBUG][req-12345678] Browser initialization time: 1.23s
[DEBUG][req-12345678] DOM ready in: 1.04s
[DEBUG][req-12345678] Request headers received in: 0.07s
[DEBUG][req-12345678] Memory usage before scraping: 245.8MB
[DEBUG][req-12345678] Memory usage after scraping: 432.5MB
[DEBUG][req-12345678] Memory increased by: 186.7MB
[DEBUG][req-12345678] CPU usage during scraping: 78.5%
[DEBUG][req-12345678] S3 client initialization: 0.05s

ERROR Level Example

[ERROR][req-12345678] Timeout after 30.00s while connecting to URL
[ERROR][req-12345678] Connection attempt made at: 2025-05-23T08:00:10Z
[ERROR][req-12345678] Memory at failure point: 512.3MB
[ERROR][req-12345678] Failed after running for: 30.12s
[ERROR][req-12345678] Request terminated at: 2025-05-23T08:00:40Z

Implementation Details

  • Time measurements are captured using time.perf_counter() for high precision
  • All time logs include a request ID [req-xxxxxxx] for traceability
  • Timestamps are generated using datetime.datetime.now(datetime.timezone.utc)
  • Memory usage is tracked using psutil.Process().memory_info()
  • Log level is controlled by the LOG_LEVEL environment variable
  • Log levels are mapped to numerical values: debug (10), info (20), error (30)
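
A sketch of how the LOG_LEVEL variable and the numeric mapping above could combine into a filtering rule. The mapping values come from this document; note that they are application-specific (Python's standard logging module assigns 40 to ERROR). The function names and the fallback to "info" when LOG_LEVEL is unset or unrecognized are assumptions.

```python
import os

# Numeric values as listed in this document.
LEVELS = {"debug": 10, "info": 20, "error": 30}


def threshold(env=None):
    """Return the numeric threshold from LOG_LEVEL (assumed default: info)."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", "info").strip().lower()
    return LEVELS.get(name, LEVELS["info"])


def should_log(level, env=None):
    """A message is emitted when its level is at or above the threshold."""
    return LEVELS[level.lower()] >= threshold(env)


print(should_log("DEBUG", {"LOG_LEVEL": "info"}))   # False
print(should_log("ERROR", {"LOG_LEVEL": "info"}))   # True
print(should_log("DEBUG", {"LOG_LEVEL": "debug"}))  # True
```

Under this rule, setting LOG_LEVEL to debug emits every line shown in the endpoint examples, while info suppresses the DEBUG lines and error leaves only the ERROR lines.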