MMCM
30th of December 2004 (Thu), 16:20
After the worm attack on my EE server, which totally messed up the photo counters, I tried to figure out a way to rebuild them.
My first attempt was to parse the access_log, count all photo accesses generated by the worm, and reduce the counters in the database accordingly. Then I realized that in some cases, where the generated URL did not contain a valid parameter for EE, the displayed photo was in no way related to the input, so those hits could not be attributed to a specific photo.
So the second attempt was to count all valid photo requests that were NOT generated by the worm. That method worked well: I generated an SQL script that corrected the counter values in the database.
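For anyone who wants to try the same thing, here is a rough sketch of that second attempt in Python. It is not the script I actually used. It assumes an Apache-style access_log, a hypothetical URL shape `/photo.php?photo=<id>`, a caller-supplied `is_worm_line` test, and made-up table and column names (`photos`, `id`, `views`); all of these would have to be adapted to the real EE installation.

```python
import re
from collections import Counter

# Assumption: a valid photo request looks like /photo.php?photo=<numeric id>
# and was answered with status 200. The parameter name is hypothetical.
PHOTO_REQUEST = re.compile(r'"GET /photo\.php\?photo=(\d+) HTTP/[\d.]+" 200 ')

def count_valid_requests(log_lines, is_worm_line):
    """Count valid photo requests per photo id, skipping worm-generated lines."""
    counts = Counter()
    for line in log_lines:
        match = PHOTO_REQUEST.search(line)
        if match and not is_worm_line(line):
            counts[int(match.group(1))] += 1
    return counts

def build_sql(counts, table="photos", id_col="id", counter_col="views"):
    """Emit one UPDATE per photo; table and column names are placeholders."""
    return [
        f"UPDATE {table} SET {counter_col} = {n} WHERE {id_col} = {photo_id};"
        for photo_id, n in sorted(counts.items())
    ]
```

Requests whose parameter is not a plain number simply do not match the pattern, so the malformed worm URLs drop out without any special handling.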
Then I wondered why there were still so many photo accesses, and I realized that accesses by robots from google, msn, ... were counted too.
Now I am thinking of a way to prevent that. If it's not possible to do, the best thing for me is to forget all about counters :-(
After a quick look at photo.php, one method could be to check the user agent of the request and, if it's a robot, simply not count the access. One would of course have to list all possible types of robots, which I think is nearly impossible.
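Just to illustrate the user agent idea, a minimal sketch in Python (in EE it would of course have to be done in PHP inside photo.php). The marker list is my own assumption and is certainly incomplete, which is exactly the weakness of this approach.

```python
# Assumption: robots are recognised by substrings in the User-Agent header.
# This list is illustrative only and will miss many robots.
ROBOT_MARKERS = ("googlebot", "msnbot", "slurp", "crawler", "spider")

def is_robot(user_agent):
    """Return True if the User-Agent string contains a known robot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def should_count(user_agent):
    """Count the access only when the request does not look like a robot."""
    return not is_robot(user_agent)
```

A robot that sends a browser-like user agent would still be counted, so this can only ever reduce the problem, not remove it.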
Then I took a look at www.photocommunity.de, where I have several accounts, to see how they manage it. Only the owner can see the click counter there, and I think it's not affected by robots. Alas, my attempt to read their robots.txt failed. In the source code, however, I discovered that they reference a small dummy GIF image (like <img src="http://www.fotocommunity.de/pc/fotocount.php?id=9999999" width=1 height=1>), and that access is obviously used to increase the counter. With my own robots.txt, it's an easy task to prevent well-behaved robots from retrieving that URL.
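To check that such a robots.txt rule would really do what I expect, here is a small sketch using Python's standard `urllib.robotparser`. The script name `/photocount.php`, the parameter names, and the host `example.org` are hypothetical; only the idea of a separate counter URL comes from the page source quoted above.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the counter lives in its own script, /photocount.php,
# and robots.txt forbids all robots from requesting it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photocount.php
"""

def robot_may_fetch(url, user_agent="*"):
    """Return True if a robot honouring ROBOTS_TXT is allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules a well-behaved robot may still index the photo pages themselves but never requests the counter image, so its visits are not counted. Robots that ignore robots.txt are of course not stopped by this.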
What do the other EE users think of that solution? Moving the counter code out of photo.php and into a new small PHP script would be an easy task.
Is anybody else interested? My concern is to avoid all unnecessary code changes to EE, because when Pekka rolls out the next (official) version, I would have to redo them all.