-
-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track amount of visits per short URL so that we don't have to do COUNT
aggregates at runtime
#2036
Comments
How to generate a random number in every DB engine:
These are not supported by DQL by default, so we may need to add a custom function that can be used with all of them https://www.doctrine-project.org/projects/doctrine-orm/en/3.0/cookbook/dql-user-defined-functions.html This library can be used for inspiration, as it implements a bunch of them: https://github.com/beberlei/DoctrineExtensions |
This should allow supporting use Doctrine\DBAL\Platforms\PostgreSQLPlatform;
use Doctrine\DBAL\Platforms\SQLitePlatform;
use Doctrine\ORM\Query\AST\Functions\FunctionNode;
use Doctrine\ORM\Query\Parser;
use Doctrine\ORM\Query\SqlWalker;
use Doctrine\ORM\Query\TokenType;
class Rand extends FunctionNode
{
public function getSql(SqlWalker $sqlWalker): string
{
return match ($sqlWalker->getConnection()->getDatabasePlatform()::class) {
PostgreSQLPlatform::class, SQLitePlatform::class => 'random()',
default => 'RAND()',
};
}
public function parse(Parser $parser): void
{
$parser->match(TokenType::T_IDENTIFIER);
$parser->match(TokenType::T_OPEN_PARENTHESIS);
$parser->match(TokenType::T_CLOSE_PARENTHESIS);
}
} Registered as $entityManager->getConfiguration()->addCustomNumericFunction('RAND', Rand::class); However, considering the insertion may happen at an entity level, it might make sense to generate the EDIT Actually, the kind of query needed for this to be fully efficient will require a native query, not DQL or entities, so the above is probably not needed. INSERT INTO short_url_visits_counts(short_url_id, potential_bot, slot_id, count)
VALUES (123, false, RAND() * 100, 1)
ON DUPLICATE KEY UPDATE count = count + 1; |
A listener to set/update counts every time a Visit is persisted: use Doctrine\Common\EventSubscriber;
use Doctrine\DBAL\Connection;
use Doctrine\DBAL\Platforms\PostgreSQLPlatform;
use Doctrine\DBAL\Platforms\SQLitePlatform;
use Doctrine\DBAL\Platforms\SQLServerPlatform;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\ORM\Event\PreFlushEventArgs;
use Shlinkio\Shlink\Core\Visit\Entity\Visit;
$em->getEventManager()->addEventSubscriber(new class implements EventSubscriber {
public function getSubscribedEvents(): array
{
return ['preFlush'];
}
public function preFlush(PreFlushEventArgs $args): void
{
$em = $args->getObjectManager();
$entitiesToBeCreated = $em->getUnitOfWork()->getScheduledEntityInsertions();
foreach ($entitiesToBeCreated as $entity) {
$this->trackVisitCount($em, $entity);
}
}
private function trackVisitCount(EntityManagerInterface $em, object $entity): void
{
// This is not a non-orphan visit
if (!$entity instanceof Visit || $entity->shortUrl === null) {
return;
}
$visit = $entity;
// The short URL is not persisted yet
$shortUrlId = $visit->shortUrl->getId();
if ($shortUrlId === null || $shortUrlId === '') {
return;
}
$isBot = $visit->potentialBot;
$conn = $em->getConnection();
$platformClass = $conn->getDatabasePlatform();
match ($platformClass::class) {
PostgreSQLPlatform::class => $this->incrementForPostgres($conn, $shortUrlId, $isBot),
SQLitePlatform::class, SQLServerPlatform::class => $this->incrementForOthers($conn, $shortUrlId, $isBot),
default => $this->incrementForMySQL($conn, $shortUrlId, $isBot),
};
}
private function incrementForMySQL(Connection $conn, string $shortUrlId, bool $potentialBot): void
{
$this->incrementWithPreparedStatement($conn, $shortUrlId, $potentialBot, <<<QUERY
INSERT INTO short_url_visits_counts (short_url_id, potential_bot, slot_id, count)
VALUES (:short_url_id, :potential_bot, RAND() * 100, 1)
ON DUPLICATE KEY UPDATE count = count + 1;
QUERY);
}
private function incrementForPostgres(Connection $conn, string $shortUrlId, bool $potentialBot): void
{
$this->incrementWithPreparedStatement($conn, $shortUrlId, $potentialBot, <<<QUERY
INSERT INTO short_url_visits_counts (short_url_id, potential_bot, slot_id, count)
VALUES (:short_url_id, :potential_bot, random() * 100, 1)
ON CONFLICT (short_url_id, potential_bot, slot_id) DO UPDATE
SET count = count + 1;
QUERY);
}
private function incrementWithPreparedStatement(
Connection $conn,
string $shortUrlId,
bool $potentialBot,
string $query,
): void {
$statement = $conn->prepare($query);
$statement->bindValue('short_url_id', $shortUrlId);
$statement->bindValue('potential_bot', $potentialBot);
$statement->executeStatement();
}
private function incrementForOthers(Connection $conn, string $shortUrlId, bool $potentialBot): void
{
$slotId = rand(1, 100);
// For engines without a specific UPSERT syntax, do a regular locked select followed by an insert or update
$qb = $conn->createQueryBuilder();
$qb->select('id')
->from('short_url_visits_counts')
->where($qb->expr()->and(
$qb->expr()->eq('short_url_id', ':short_url_id'),
$qb->expr()->eq('potential_bot', ':potential_bot'),
$qb->expr()->eq('slot_id', ':slot_id'),
))
->setParameter('short_url_id', $shortUrlId)
->setParameter('potential_bot', $potentialBot)
->setParameter('slot_id', $slotId)
->forUpdate()
->setMaxResults(1);
$resultSet = $qb->executeQuery()->fetchOne();
$writeQb = ! $resultSet
? $conn->createQueryBuilder()
->insert('short_url_visits_counts')
->values([
'short_url_id' => ':short_url_id',
'potential_bot' => ':potential_bot',
'slot_id' => ':slot_id',
])
: $conn->createQueryBuilder()
->update('short_url_visits_counts')
->set('count', 'count + 1')
->where($qb->expr()->and(
$qb->expr()->eq('short_url_id', ':short_url_id'),
$qb->expr()->eq('potential_bot', ':potential_bot'),
$qb->expr()->eq('slot_id', ':slot_id'),
));
$writeQb->setParameter('short_url_id', $shortUrlId)
->setParameter('potential_bot', $potentialBot)
->setParameter('slot_id', $slotId)
->executeStatement();
}
}); |
The short URLs list can be inefficient when ordering by amount of visits, if there are a lot of visits in the database. This is because we have to join with visits, and group the list by the
COUNT(DISTINCT visits))
aggregate.In other queries we have solved similar problems by using sub-queries, but this presents other problems and introduces a lot of complexity because of doctrine limitations.
One possible solution would be to track an incremental value per short URL that gets updated every time there's a new visit, and use that value when ordering by short URL or when trying to get the amount of visits for a short URL.
This would introduce another problem though. It would lock the short URL row for every visit, while trying to update it, causing delays and database locks for instances with a lot of traffic.
There's a pattern to work around that though, the Slotted Counter Pattern, which consists on tracking that in multiple rows from a related table, rather than the short URLs table itself.
Let's say there's 100 rows per short URL, and each one of those holds a partial amount of visits. When a visit occurs, one of those rows is picked randomly and updated, reducing the chances of concurrent visits to lock each other 100 times.
In order to get the actual amount of visits, we would have to
SUM
all partial totals for a short URL, which is way more efficient thanCOUNT
ing different visits.Considerations
This could potentially be used to improve performance when loading tags with stats, as there are also COUNT aggregates and sub-queries involved there. It would not cover the case in which there are a lot of short URLs, but I imagine the usual is to have way more visits than short URLs in a particular instance.
The text was updated successfully, but these errors were encountered: