Hi,
I am writing a Proximity Mailer for PokerDIY – it’s basically a scheduled task that periodically grabs all the users (who have opted in via a role) and emails them a summary of players (users) in their area, as well as new players in the last x days. It sounds really simple but I am finding the SQL queries to be the hardest bit due to the de-normalised structure of the Profile table and need a little design advice… it would be easy to code this inefficiently but it needs to scale (there are almost 40 000 players on PokerDIY now so there would be a massive amount of data in these queries and a lot of emails).
I can release it as part of the UserProfile module if anyone thinks it might be useful on their site – it’ll be easy to change the email template to suit your business terminology.
So, the end result:
There will be a public role: "Subscribe to summary updates (emailed once a month) – Yes/No"
All those users who have opted in will receive a monthly summary based on their country/region/city profile settings. The email will look something like this:
“Hi Rodney,
Your location is Sydney in NSW, Australia. There are:
Poker Players in Sydney: 38 (5 new)
Total players in NSW: 300
Total players in Australia: 854
Poker Games: 10 (1 new)
Poker Leagues: 5 (1 new)
Poker Groups: 10 (0 new)
More blurb… “
Each of these will link to the respective area on PokerDIY and the Location Search.
The first version will be stupidly simple and will only do the players (users). I need to write the other parts in the league module, ventrian PA mod, groups etc. to help populate the full summary.
So… this is my proposed design in pseudo-code:
Grab all the users who have opted into the role AND have their city/region/country data populated (the first couple of versions will not be based on distance of location – it will purely do a string match on city/region and country). It could be tweaked to use only Region and Country but I think the people who will find this feature useful will have their city data populated. Sort it by Country, Region then City.
Aggregate the total users for each city (where there is at least 1 user who has opted in to the role), as well as for each area and country. So you will end up with a table of data for Countries and the total users, another table for each country’s regions and all their users and then another table for each city in each region. This will mean that a country like the US will have over 50 tables = at least 50 database hits. This is one of the biggest countries in terms of number of regions (States) and cities so there will probably less than 300 database hits (at a guess) every time the task is run (1 a month probably).
For each user,
Check their city/region/country and populate an email based on the template.
Send the email as a BCC to each group of users from the same city (the Region and Country data will be the same for every city)
The scheduled task would have to start new threads for each city
The problem I am facing is trying to write the best SQL to achieve the location data aggregation. I do not have experience with Pivots (2005 SQL feature) but I think somehow this could be the way forward. For example, to get all the Region totals for a Country I use this:
SELECT PropertyValue, COUNT(U.UserID) As TotalUsers
FROM {objectQualifier}Users U
INNER JOIN {objectQualifier}UserPortals UP ON UP.UserID = U.UserID
INNER JOIN {objectQualifier}UserProfile P ON P.UserID = U.UserID
WHERE PortalID = @PortalID
AND P.PropertyDefinitionID = @RegionPropertyID
AND P.PropertyValue <> ''
AND UP.Authorised = 1
AND U.UserID IN
(SELECT UserID from {objectQualifier}UserProfile P1 WHERE P1.PropertyDefinitionID = @CountryPropertyID AND P1.PropertyValue = @Country)
GROUP BY P.PropertyDefinitionID, P.PropertyValue
ORDER BY PropertyValue
This does it for a particular country (@Country) – the problem is that if you do it for all regions across all countries then if a country had the same named region it would total them together (the same problem exists for cities). If I could grab the Country name in a column in the same table I could do all the regions in the world in one hit instead of for each country. The same applies for the cities. The other problem is the (5 new users). Basically this is the exact same query as the other aggregate queries but it must check the DateTime stamp minus a week or so to get these totals. I can't think of a better way other than running the whole set of queries again! This will double the time/queries etc. and seems frightfully inefficient...
This is really hard to explain, but if anyone would like to help please let me know and I can show you the problem in more depth. It really requires a greater knowledge of SQL queries than I have. If you may find this feature useful then let me know if you have any design considerations too… If anyone wants a SQL challenge I can tell you the format that I need to data in (table columns) and you can try and write a query to do it ;) (it's the core user tables)