Documentation
Take back control of your site by silently trapping non-compliant crawlers and abusive bots. This PHP package integrates with the BanTheBots cloud API to:
- Drop a hidden "blackhole" link in your pages that only misbehaving bots will follow
- Record the offending IP/user‑agent with the central API
- Deny further access to flagged clients with a 403 and a friendly error page
- Keep good crawlers (that obey robots.txt) unaffected
This document explains installation, configuration, and usage.
Requirements
- PHP 5.4 or newer
- cURL extension enabled
- Web server user must have write permission to your public web root (or the directory where robots.txt lives)
Installation
Composer (recommended)
composer require youds/ban-the-bots
Manual (no Composer)
Copy src/BanTheBots.class.php into your project
Require it in your bootstrap:
require_once __DIR__ . '/src/BanTheBots.class.php';
Quick start
Create an instance with $btb = new BanTheBots() and call $btb->apply() to enforce a decision for the current request.
In your layout/footer, call $btb->outputBlackHole() once to render the hidden link that naughty bots will follow.
Example minimal integration:
Before any output, run this:
<?php
require_once __DIR__ . '/vendor/autoload.php'; // or the manual require shown above

$btb = new BanTheBots([
    'checkInterval' => 4,
    'robotsPath'    => $_SERVER['DOCUMENT_ROOT'] . '/',
]);
$btb->apply();
Then in your HTML, do this:
<!doctype html>
<html>
<head>...</head>
<body>
... your page content ...
<?php $btb->outputBlackHole(); ?>
</body>
</html>
Configuration
Create the instance with an associative array. All options are optional; sensible defaults are used when omitted.
checkInterval (int, seconds, default 4)
Minimum interval between API checks per session. The library stores a timestamp in $_SESSION['visited'] and only hits the API if the interval has elapsed or the client is currently flagged.
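As an illustration, the throttle amounts to a check along these lines (a minimal sketch, not the library's actual code; the interval value is hard-coded here for clarity):

<?php
// Illustrative per-session throttle (not the library's actual code)
if (session_status() !== PHP_SESSION_ACTIVE) {
    session_start(); // the library starts the session itself when needed
}
$checkInterval = 4; // the configured value, in seconds
if (!isset($_SESSION['visited']) || time() - $_SESSION['visited'] >= $checkInterval) {
    // ... query the BanTheBots API for this IP/user agent ...
    $_SESSION['visited'] = time(); // remember when we last checked
    // (a currently flagged client is re-checked regardless; omitted here)
}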
errorMessage (string, default "BanTheBots API error: ")
Prefix used when reporting errors from failed API calls.
robotsPath (string; defaults to the directory that contains robots.txt)
Filesystem path to the directory where robots.txt lives and where the library can drop helper files; this is usually your public web root.
Examples:
- $_SERVER['DOCUMENT_ROOT'] . '/'
- /var/www/html/
Ensure this directory is writable by your web server user.
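Putting the options together (the values shown are illustrative):

$btb = new BanTheBots([
    'checkInterval' => 10,                              // seconds between API checks per session
    'errorMessage'  => 'BanTheBots API error: ',        // the default, shown for completeness
    'robotsPath'    => $_SERVER['DOCUMENT_ROOT'] . '/', // writable directory containing robots.txt
]);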
What the library does
On construction via new BanTheBots($config), the library will:
- Start a session if not already started.
- Generate or reuse a random per‑site "blackhole" path (e.g., a1b2c3.php) and persist it in a small PHP file next to robots.txt.
- Create the blackhole endpoint file if it doesn't exist. When a client requests this URL, the library:
  - Reports the IP and base64-encoded user agent to the cloud API endpoint /v1/bad-bots/create (documented as POST, but performed here as a GET via file_get_contents()), then
  - Responds with HTTP/1.1 403 Forbidden,
  - Redirects to a human-readable page at https://api.banthebots.cloud/blackhole?returnTo=..., and
  - Outputs a blackhole challenge for the visitor to complete.
- On each request when you call apply():
  - If the session was checked recently, the API is not called again until checkInterval seconds have elapsed.
  - Otherwise it queries the API: GET https://api.banthebots.cloud/v1/bad-bots/read?ip=<ip>&userAgent=<useragent> (see the sketch after this list).
  - If the API flags the client as a bad bot, the request is terminated with the 403 and redirect described above.
- When you call outputBlackHole() in your footer/layout:
  - A hidden snippet is printed that includes the unique blackhole URL. Human visitors won't see it, but naive crawlers that follow every link will hit it and self-identify as non-compliant.
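For reference, the API calls above reduce to plain HTTP requests. Here is a minimal sketch of both (the endpoint URLs come from this document; the create call's query-parameter names and the response handling are assumptions, since neither is specified here):

<?php
$ip        = $_SERVER['REMOTE_ADDR'];
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// 1) Report a trapped client (what the blackhole endpoint does; the library
//    issues this as a GET via file_get_contents() despite the POST endpoint):
$reportUrl = 'https://api.banthebots.cloud/v1/bad-bots/create'
           . '?ip=' . urlencode($ip)
           . '&userAgent=' . urlencode(base64_encode($userAgent));
@file_get_contents($reportUrl);

// 2) Look up the current client (what apply() does once checkInterval allows):
$readUrl = 'https://api.banthebots.cloud/v1/bad-bots/read'
         . '?ip=' . urlencode($ip)
         . '&userAgent=' . urlencode($userAgent);
$ch = curl_init($readUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
// ... inspect $response to decide whether to send a 403 ...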
Security and permissions
Give your web server user write access to the directory specified by robotsPath. The library needs to create:
- A small PHP file holding the generated blackhole filename
- The blackhole endpoint script itself
- robots.txt (if it doesn't exist)

Also:
- Do not expose directory listings in your web root.
- Review server-side caching or reverse-proxy behavior to ensure the dynamic 403 responses are respected.
Troubleshooting
I get permission errors creating files
- Ensure robotsPath points to a writable directory (usually $_SERVER['DOCUMENT_ROOT'] . '/').
- Check file ownership (www-data, apache, nginx, etc.).
- The package requires PHP 5.4+; make sure you are running a compatible version.
Nothing happens when I call apply()
- Make sure sessions are available and not started too late. The library starts a session if one isn't already active, but your framework may interfere if it disables or manages sessions itself.
- Confirm the server can reach https://api.banthebots.cloud from your environment (no firewall blocks, valid SSL CA bundle for cURL); the diagnostic sketch below can help.
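To verify connectivity from PHP itself, a standalone diagnostic sketch (not part of the package; the query values are placeholders):

<?php
$ch = curl_init('https://api.banthebots.cloud/v1/bad-bots/read?ip=127.0.0.1&userAgent=test');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$body = curl_exec($ch);
if ($body === false) {
    echo 'cURL error: ' . curl_error($ch) . PHP_EOL; // e.g. firewall or CA bundle problems
} else {
    echo 'HTTP ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
}
curl_close($ch);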
Search engines are hitting my blackhole
Verify robots.txt is present at https://your-domain/robots.txt and contains the Disallow rule for the generated path.
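For example, if the generated blackhole filename were a1b2c3.php (the example filename used earlier), robots.txt would need a rule like:

User-agent: *
Disallow: /a1b2c3.php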
Example framework integration
- WordPress: instantiate in wp-config.php or a small mu-plugin before any output; call outputBlackHole() in your theme's footer.php.
- Laravel/Symfony: register a small middleware that runs early, creates the instance, and calls apply(); in your base layout (Blade/Twig), call outputBlackHole(). A middleware sketch follows below.
- Plain PHP: see the Quick start example above.
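A minimal Laravel-style middleware sketch; the class name, file location, and wiring are illustrative assumptions, and BanTheBots is assumed to live in the global namespace (as the manual install suggests):

<?php
// app/Http/Middleware/BanTheBotsMiddleware.php (hypothetical location)
namespace App\Http\Middleware;

use Closure;
use BanTheBots;

class BanTheBotsMiddleware
{
    public function handle($request, Closure $next)
    {
        $btb = new BanTheBots([
            'robotsPath' => public_path() . '/', // Laravel's public web root
        ]);
        $btb->apply(); // ends the request with a 403 if the client is flagged

        return $next($request);
    }
}

Register it in the global middleware stack so it runs on every request before any output.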
Roadmap / Contributions
- Multi-language ports are welcome; re-implement based on this PHP kit
- Improvements to configuration validation and PHP version constraints
- Tests and CI integration
- Contributions are welcome via pull requests.
License
This project is licensed under the terms of the MIT License. See LICENSE for details.
Courtesy of Youds Media Ltd · All Rights Reserved
