Product Tracker

11 min read
#project #python #fastapi

A backend-focused system for tracking products, inventory movements, and operational events with reliability, auditability, and scalability in mind.


Author’s note
I built MLScraper as a hands-on, practical system to track product listings and prices across multiple Mexican and international e-commerce sites. I use it to spot new listings and meaningful price drops, and to push alerts to Telegram and a small web UI. Below I explain what the project is, how it is structured, why I made the engineering choices I did, and what I would improve next.


Overview

MLScraper is a modular, asyncio-based web scraper designed to:

  • Crawl search results and listing pages from multiple online retailers (e.g., MercadoLibre, Amazon MX, Liverpool, El Palacio de Hierro).
  • Normalize results into a single Article model and keep persistent per-search history.
  • Detect new listings and price drops and deliver notifications via Telegram and a small websocket-driven frontend.
  • Persist state to JSON files in ./data/ to keep the system simple and portable.

The project is implemented in Python 3, uses aiohttp for HTTP operations, FastAPI + WebSockets for a lightweight dashboard, and simple JSON files for storage. The codebase is organized so you can add new providers (sites) by implementing a Motor subclass and adding it to the motor generator.


High-Level Goals

When I designed MLScraper I had these goals in mind:

  1. Modularity — Each provider (MercadoLibre, Amazon, Liverpool, Palacio de Hierro, etc.) should be encapsulated as a Motor so adding a new site is straightforward.
  2. Asynchronous efficiency — Use asyncio/aiohttp to keep I/O non-blocking and allow concurrent fetching/pagination.
  3. Simple persistence — Store results as JSON per-search so the system is easy to inspect, move, and debug.
  4. Actionable alerts — Notify on two important events:
    • A new item is discovered for a tracked search.
    • A significant price drop (the implementation uses a 14% threshold).
  5. Developer ergonomics — Include a small static UI served by FastAPI and a devcontainer for reproducible development.

System Architecture

Below is a simplified architecture diagram in ASCII to explain the moving parts at a glance:

+--------------------+      +-----------------+      +----------------------+
|  provider motors   | ---> |  Scraper engine | ---> |  Local JSON storage  |
| (amazon, ml, etc.) |      | (Motor orchestr)|      |  (./data/*.json)     |
+--------------------+      +-----------------+      +----------------------+
                                     |
                                     +--> Notifier (utils/telegram.py) -> Telegram Bot
                                     |
                                     +--> Web API / WebSocket (FastAPI app.py) -> Browser UI

Key components and where they live in the repo:

  • app.py — FastAPI app and WebSocket connection manager (web UI + live updates).
  • scrapper.py — Orchestrates periodic scraping cycles across motors and handles global logic for broadcasting notifications.
  • provider/* — Provider implementations:
    • provider/mercado_libre/*
    • provider/amazon/*
    • provider/liverpool/*
    • provider/palacio_de_hierro/*
    Each provider implements a scrape_page method that extracts listings and pagination links.
  • scraper/motor.py — Abstract base Motor class (core scraping loop, pagination handling, state transitions).
  • scraper/article.py — Article dataclass and ArticleHistory for price/time history.
  • scraper/stream.py — Stream container for active and finished listings.
  • utils/ — Helpers: file_manager.py, headers.py (random user agents), telegram.py, and secret.py (credentials).
  • static/ — Minimal dashboard UI served by FastAPI (HTML, CSS, JS, notification sound).
  • .devcontainer/ — Dockerfile and devcontainer config to reproduce the dev environment.

How the Project Works

In this section I walk through the flow from configuration to alert delivery.


Configuration & Entry Points

To run the project, a few configuration points must be defined.

Telegram credentials

Inside utils/secret.py the Telegram bot credentials are configured:

apiToken, chatID = "", ""

Once these values are filled, the scraper will automatically send notifications when events are detected.
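
The notifier itself lives in utils/telegram.py and is not reproduced here; the sketch below shows what a minimal Bot API call typically looks like, using only the standard library (the project itself uses requests for this). The function names and payload shape are assumptions for illustration:

```python
import json
import urllib.parse
import urllib.request

TELEGRAM_API = "https://api.telegram.org/bot{token}/sendMessage"

def build_payload(chat_id: str, text: str) -> dict:
    # Minimal sendMessage payload; the Bot API also accepts parse_mode, etc.
    return {"chat_id": chat_id, "text": text}

def send_message(api_token: str, chat_id: str, text: str) -> dict:
    # POST the payload and return the decoded Bot API response.
    data = urllib.parse.urlencode(build_payload(chat_id, text)).encode()
    req = urllib.request.Request(TELEGRAM_API.format(token=api_token), data=data)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```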

Search configuration

All searches are defined in:

provider/generator.py

This module returns the list of motors that the system will run.

Example:

from .mercado_libre.motor import MercadoLibre as ML
from .liverpool.motor import Liverpool as LV
from .amazon.motor import Amazon as AZ, Seller
from .palacio_de_hierro.motor import PalacioDeHierro as PH

def get_motors():
    return [
        ML('zelda wii'),
        LV(search_term='LV PS5', url='https://www.liverpool.com.mx/...'),
        AZ(search_term='pokemon tcg', seller=Seller.amazon_mx),
        PH(search_term='PH Electrodomesticos', url='https://www.elpalaciodehierro.com/...')
    ]

Each element defines:

  • the search label
  • the initial search URL
  • the provider motor responsible for scraping

Running the application

The project can be executed with:

python app.py

or with uvicorn:

uvicorn app:app --host 0.0.0.0 --port 8000

A development container is also included to run everything inside Docker with a reproducible environment.


The Motor Abstraction

The Motor class is the core abstraction of the scraper.

Every provider extends the base class located at:

scraper/motor.py

Each motor must implement a function with the following responsibility:

def scrape_page(self, body):

This method receives the HTTP response and must:

  1. Parse the HTML
  2. Extract the products
  3. Return structured results

The method returns:

(items, next_url)

Where:

  • items→ list of parsed products
  • next_url→ optional pagination link

This architecture allows the main scraper loop to stay generic while each provider focuses only on parsing.
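
To make the contract concrete, here is a hedged sketch of what a provider's scrape_page could look like, using only the standard-library HTML parser. The HTML structure (anchor tags with listing and next classes) and the item field names are invented for illustration; real providers parse each site's actual markup:

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Illustrative parser: collects <a class="listing"> items and an
    <a class="next"> pagination link from a search-results page."""
    def __init__(self):
        super().__init__()
        self.items, self.next_url = [], None
        self._capture = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "listing":
            self._capture = True
            self.items.append({"url": attrs.get("href"), "title": ""})
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._capture:
            self.items[-1]["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._capture = False

def scrape_page(body: str):
    # Same contract as a Motor subclass: return (items, next_url).
    parser = ListingParser()
    parser.feed(body)
    return parser.items, parser.next_url
```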


The Scraping Pipeline

The scraping process follows a consistent pipeline:

  1. Create an async HTTP session (aiohttp)
  2. Fetch the search page
  3. Parse results with the provider motor
  4. Normalize items into Article objects
  5. Compare them with previous results
  6. Detect new listings or price changes
  7. Persist results
  8. Trigger notifications

The core loop lives inside the Motor implementation.

Network requests include retry logic with exponential backoff to handle temporary failures.
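
The exact retry code is inside the Motor base class; a minimal sketch of the pattern (exponential backoff with jitter, re-raising after the last attempt) looks like this. The function signature and defaults are assumptions, not the project's actual API:

```python
import asyncio
import random

async def fetch_with_retry(fetch, url, retries=3, base_delay=0.5):
    """Call an async fetch(url), retrying on failure with exponential
    backoff plus jitter; re-raises the last error when retries run out."""
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries:
                raise
            # 0.5s, 1s, 2s, ... plus a little jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```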


Data Model

Each product is represented by an Article object.

Example JSON representation:

{
  "search_term": "AZ amazon_usa - DK books",
  "url": "https://www.amazon.com.mx/dp/1465482512/",
  "identifier": "1465482512",
  "title": "Zoology: Inside the Secret World of Animals",
  "price": 825.57,
  "datetime": "2024-11-16 07:23:23.193375",
  "status": "active",
  "history": [
    { "datetime": "2024-11-17 23:35:34.266925", "price": 374.36 },
    { "datetime": "2024-11-15 11:08:11.463581", "price": 378.75 }
  ]
}

Important characteristics:

  • Each article contains a price history.
  • Status can be active or finished.
  • The identifier allows deduplication across scrapes.
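
The actual dataclasses live in scraper/article.py; a minimal sketch of the shape, assuming a record_price helper that only appends to the history when the price actually changed (the helper name is invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class PricePoint:
    timestamp: str
    price: float

@dataclass
class Article:
    search_term: str
    url: str
    identifier: str  # enables deduplication across scrapes
    title: str
    price: float
    status: str = "active"
    history: list = field(default_factory=list)

    def record_price(self, price: float) -> bool:
        """Append a history entry when the price changed; returns True
        when an update was recorded."""
        if self.history and self.history[-1].price == price:
            return False
        self.history.append(PricePoint(str(datetime.now()), price))
        self.price = price
        return True
```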

Persistence Layer

MLScraper intentionally keeps persistence extremely simple.

All results are stored as JSON files under:

./data/

Each search generates its own file.

Advantages of this approach:

  • Human-readable
  • No database required
  • Easy debugging
  • Portable

However, it does not scale well to large datasets or concurrent writers.
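
The file layout can be sketched as a pair of save/load helpers, one JSON file per search under ./data/. The slug scheme and function names here are illustrative, not the project's actual file_manager.py:

```python
import json
from pathlib import Path

DATA_DIR = Path("./data")

def _path_for(search_term: str) -> Path:
    # One JSON file per search; slugify the term for a safe filename.
    slug = "".join(c if c.isalnum() else "_" for c in search_term.lower())
    return DATA_DIR / f"{slug}.json"

def save_results(search_term: str, articles: list) -> None:
    DATA_DIR.mkdir(exist_ok=True)
    _path_for(search_term).write_text(json.dumps(articles, indent=2))

def load_results(search_term: str) -> list:
    # Missing file means the search has never run: start from empty.
    path = _path_for(search_term)
    return json.loads(path.read_text()) if path.exists() else []
```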


Notification System

The project currently implements two notification channels.

Telegram Alerts

Telegram integration is implemented in:

utils/telegram.py

Two alert types exist.

New item detected:

send_new_to_telegram(article)

Price drop detected:

send_price_drop_to_telegram(article)

The alert includes:

  • product title
  • link
  • previous price
  • new price
  • timestamp

A price-drop notification triggers when the drop exceeds roughly 14%.
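
The threshold check reduces to a one-line relative comparison; this sketch assumes the 14% figure from the project, with the function name invented for illustration:

```python
PRICE_DROP_THRESHOLD = 0.14  # ~14%, matching the project's threshold

def is_significant_drop(old_price: float, new_price: float,
                        threshold: float = PRICE_DROP_THRESHOLD) -> bool:
    """True when the new price undercuts the old one by more than the
    threshold fraction; increases and degenerate prices never trigger."""
    if old_price <= 0 or new_price >= old_price:
        return False
    return (old_price - new_price) / old_price > threshold
```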


Web Dashboard

A small dashboard is served using FastAPI.

app.py exposes:

  • the static frontend
  • a websocket endpoint at /ws/

The UI listens to the websocket and displays real-time events.

When a new product is detected the page also plays a notification sound.


HTTP Strategy

To reduce scraping blocks the system uses randomized headers.

Located in:

utils/headers.py

It randomly selects modern browser User-Agent strings and sets additional client hints to mimic real browser traffic.

Requests also include:

  • retry logic
  • exponential backoff
  • timeout protection

This makes the scraper more resilient against temporary failures.
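
The header rotation amounts to picking a User-Agent at random and pairing it with plausible browser headers. The strings below are illustrative examples, not the actual list in utils/headers.py:

```python
import random

# A few modern desktop User-Agent strings (illustrative only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def random_headers() -> dict:
    """Pick a random User-Agent and pair it with common browser headers."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "es-MX,es;q=0.9,en;q=0.8",
    }
```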


Engineering Decisions & Design Tradeoffs

Several design decisions were made intentionally to balance simplicity and capability.

JSON vs Database

Decision: Store results in JSON files.

Pros

  • No external dependency
  • Easy inspection
  • Portable
  • Great for small projects

Cons

  • Poor scalability
  • No indexing
  • Not safe for concurrent writes

A future version will likely migrate to SQLite or Postgres.


HTML Scraping vs Official APIs

Decision: Parse HTML pages.

Pros

  • Works with any public search page
  • No API keys needed
  • More flexible

Cons

  • Fragile when page layouts change
  • Potentially against terms of service
  • Requires constant maintenance

Async Scraping with Sync Notifications

Scraping uses aiohttp while Telegram calls use requests.

This simplifies the implementation, but because requests is synchronous, each Telegram call blocks the event loop for the duration of the HTTP request; many simultaneous alerts could stall scraping.

For a personal project the tradeoff was acceptable.


Single Process Architecture

Everything runs inside one process:

  • scraper
  • API server
  • notification logic

Advantages:

  • easy debugging
  • minimal deployment complexity

Disadvantages:

  • limited scalability
  • less fault isolation

Why This Project Matters

MLScraper started as a practical tool but it also became a valuable learning project.

It touches many real-world engineering concerns:

  • asynchronous networking
  • scraping reliability
  • system architecture
  • data persistence
  • event notification
  • containerized development

Beyond the technical aspects, it solves a real problem: tracking product availability and pricing across multiple marketplaces without constantly checking websites manually.


Future Improvements

There are many improvements I would like to implement.

Database Migration

Replace JSON storage with SQLite or Postgres to enable:

  • indexing
  • better queries
  • analytics
  • multi-process safety

Async Notification Pipeline

Introduce a queue-based notification system to prevent blocking.
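
One possible shape for this, assuming an asyncio.Queue with a None sentinel for shutdown (a sketch of the idea, not a committed design):

```python
import asyncio

async def notification_worker(queue: asyncio.Queue, send):
    """Drain the queue, delivering each alert via an async send() so
    scraping never blocks on notification delivery."""
    while True:
        alert = await queue.get()
        if alert is None:  # sentinel: shut the worker down cleanly
            queue.task_done()
            break
        await send(alert)
        queue.task_done()
```

Scrapers then just put() alerts on the queue and carry on; delivery happens concurrently in the worker task.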

Proxy Support

Add rotating proxies and optional headless browser support using Playwright for sites with stronger bot protection.

Configurable Searches

Move search configuration to YAML or a database so searches can be modified without editing code.

Historical Analytics

Add charts and analytics for:

  • price trends
  • average discounts
  • best purchase windows

Automated Tests

Provider parsers should include unit tests using stored HTML snapshots to detect breakage.

Distributed Scraping

Move motors to worker processes coordinated by a scheduler.


Closing Thoughts

MLScraper is intentionally simple but surprisingly capable.

By combining a modular scraping architecture, asynchronous networking, and lightweight persistence, it provides a flexible platform for monitoring product listings across multiple online stores.

The system is easy to extend: adding a new provider typically means writing a single parser and registering it in the generator.

For me, the project strikes a nice balance between experimentation and real-world usefulness. It has already helped me discover listings and price drops that I would have otherwise missed.

If you are interested in scraping systems, event-driven architectures, or simply automating repetitive browsing tasks, building something like MLScraper is an excellent exercise.

And if you decide to extend it — adding a new provider, improving the architecture, or integrating analytics — you’ll quickly see how a small personal tool can evolve into a surprisingly sophisticated system.