Frog Wizard

Driving an API-less blog from a Python CLI

This blog is a git repo of markdown files. I edit posts locally and push them live with a 440-line Python CLI (bearblog.py, with requests as its only dependency). Bear Blog has no public API, so the CLI logs in with account credentials, keeps the session cookie, and drives the same dashboard form endpoints the web UI uses.

python bearblog.py list                 # uid + title of every post
python bearblog.py new post.md          # create & publish from a file
python bearblog.py edit <uid> post.md   # overwrite an existing post
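
For a feel of how small that command surface is, here is a minimal sketch of the three subcommands using argparse from the standard library. The real bearblog.py may parse its arguments differently; build_parser and the argument names uid and path are hypothetical names chosen for this illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the list / new / edit commands above."""
    parser = argparse.ArgumentParser(prog="bearblog.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # list: takes no arguments
    sub.add_parser("list", help="print uid and title of every post")

    # new: one markdown file to create and publish
    new = sub.add_parser("new", help="create and publish a post from a file")
    new.add_argument("path", help="markdown file to publish")

    # edit: the uid of an existing post plus the file that replaces it
    edit = sub.add_parser("edit", help="overwrite an existing post")
    edit.add_argument("uid", help="uid of the post, as printed by list")
    edit.add_argument("path", help="markdown file with the new content")
    return parser
```

Calling build_parser().parse_args(["edit", "abc123", "post.md"]) yields a namespace with command, uid, and path set, which a dispatcher can then route to the matching method.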

Each post goes out as three form fields (header_content, body_content, publish), POSTed to /<blog>/dashboard/posts/new/ (and /<uid>/ to edit). Scraping a form-driven Django app instead of calling an API turned up five quirks:

  1. No API, so isolate the surface. Every endpoint lives in one BearBlog class at the top of the file. When the dashboard HTML shifts, there's one place to patch.
  2. The login response lies. django-allauth keeps a logged-out session alive on /accounts/login/, so a 200 there means nothing. The real auth check is whether the session can reach the blog dashboard.
  3. CSRF wants three things. Django needs the token in the form body, a matching Referer, and an Origin header. Two out of three still gets you a 403.
  4. The header splits on CRLF only. Send the key: value header block with bare \n and Bear Blog parses it as a single line: the whole block lands in the title and the slug gets mangled. Normalize line endings to \r\n before posting.
  5. HTTP 200 can mean rejected. Overflow the ~200-char header limit and the save fails, but the response is still 200. The only signal is a lightsalmon <p> banner in the body. Grep for "has not been saved" and raise.
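
Quirks 4 and 5 boil down to two small helpers. This is a minimal sketch, not the code from bearblog.py: normalize_header, ensure_saved, and SaveRejected are hypothetical names, and the only thing taken from the dashboard itself is the "has not been saved" marker text. It also assumes a successful save ends in a 200 once redirects have been followed.

```python
REJECTION_MARKER = "has not been saved"  # banner text from quirk 5


class SaveRejected(RuntimeError):
    """The dashboard answered, but the post was not stored."""


def normalize_header(header: str) -> str:
    """Rewrite every line ending in the header block as CRLF (quirk 4)."""
    # Collapse CRLF and lone CR to LF first so existing CRLFs are not doubled.
    unified = header.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


def ensure_saved(status_code: int, body: str) -> None:
    """Raise unless the response looks like a real save (quirk 5)."""
    if status_code != 200:
        raise SaveRejected(f"unexpected HTTP status {status_code}")
    # A 200 alone proves nothing; the rejection banner is the only signal.
    if REJECTION_MARKER in body:
        raise SaveRejected("dashboard reported the post has not been saved")
```

In use, the header block goes through normalize_header before it is put into header_content, and every save response goes through ensure_saved(resp.status_code, resp.text) so a soft failure surfaces as an exception instead of a silent no-op.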

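For quirks 2 and 3, a rough sketch of the auth probe and the CSRF trio might look like the following. Several things here are assumptions rather than facts from bearblog.py: the host in BASE_URL, the /<blog>/dashboard/ probe URL, the function names, and the idea that a logged-out request to the dashboard is redirected rather than served. The csrfmiddlewaretoken field name is Django's standard form field. The session argument is anything with a requests.Session-style get method.

```python
from urllib.parse import urlsplit

BASE_URL = "https://bearblog.dev"  # assumption: host serving the dashboard


def dashboard_url(blog: str) -> str:
    """Assumed dashboard root for a blog, derived from the posts/new/ path."""
    return f"{BASE_URL}/{blog}/dashboard/"


def is_logged_in(session, blog: str) -> bool:
    """Quirk 2: probe the dashboard instead of trusting the login response."""
    # Assumption: a logged-out session gets a redirect here, not a 200.
    resp = session.get(dashboard_url(blog), allow_redirects=False)
    return resp.status_code == 200


def csrf_headers(page_url: str) -> dict:
    """Quirk 3: Referer and Origin headers that match the form's page."""
    parts = urlsplit(page_url)
    return {"Referer": page_url, "Origin": f"{parts.scheme}://{parts.netloc}"}


def csrf_form(token: str, fields: dict) -> dict:
    """Quirk 3: put the token in the form body next to the post fields."""
    return {"csrfmiddlewaretoken": token, **fields}
```

A POST would then pass data=csrf_form(token, fields) and headers=csrf_headers(form_url) together, so all three pieces travel in the same request.
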
This post is a markdown file in post/, published with python bearblog.py new.

#blogging #django #python #scraping