Anyways, I think my livable cities formula is broken. No matter what I do with it or add to it, the top 25 cities stay the same, just in a different order.
It's always:
St. Paul, Minneapolis
Albany, Alameda, San Francisco, Oakland, Emeryville, and Berkeley, California
Portland, Oregon
Denver and Boulder, CO
Arlington, VA
Washington, DC
Seattle, WA
Newark, Paterson, Trenton, and Elizabeth, NJ
Champaign-Urbana, Cicero, and Berwyn, IL
Baltimore
Mount Vernon, NY
and Eureka, CA
May I ask what the formula is?
It is very thrown together, but here it is:
((((((Composite of Walk, Bike and Transit Scores * 200/(((((studio rent*1.1)+(1 bedroom rent*2))/2) + (price of a loaf of bread*15)*12) / (median wage in the metro area/2.8))) * (state level queer rights protection score*1.11)) * (1.2 + % of a city with a park within a 10 minute walk from home) * (0.05*(150-AQI on June 19th)))/1.3) * (1+FEMA National Risk Index/6))/155030)
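Since the nesting is hard to follow, here is one possible reading of the formula as Python, with each group of parentheses pulled out into a named step. The function and parameter names are made up for illustration. The grouping follows the parentheses as written, so the rent average is used as-is while only the bread term is multiplied by 12.

```python
def livability_score(
    transit_composite,    # composite of Walk, Bike and Transit Scores
    studio_rent,
    one_bedroom_rent,
    bread_price,          # price of a loaf of bread
    median_wage,          # median wage in the metro area
    queer_rights_score,   # state level queer rights protection score
    park_access,          # share of the city with a park within a 10 minute walk
    aqi,                  # AQI on the chosen day
    fema_risk_index,      # FEMA National Risk Index
):
    # Weighted rent average: ((studio * 1.1) + (1 bedroom * 2)) / 2
    rent = (studio_rent * 1.1 + one_bedroom_rent * 2) / 2
    # As parenthesized in the post, only the bread term is multiplied by 12.
    cost = rent + (bread_price * 15) * 12
    # Cost relative to the median wage divided by 2.8
    cost_ratio = cost / (median_wage / 2.8)

    score = transit_composite * 200 / cost_ratio
    score *= queer_rights_score * 1.11
    score *= 1.2 + park_access
    score *= 0.05 * (150 - aqi)
    score /= 1.3
    score *= 1 + fema_risk_index / 6
    return score / 155030
```

Written this way, each line is one factor, which makes it easier to check the units and the direction of every term separately.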
Would you mind sharing the reasoning for the constants, or a link to a good source to learn more?
I'd love to learn more about this sort of thinking! Why, for example, is studio rent multiplied by 1.1? Or why is it exactly 0.05 times (150 - AQI score), and why 150 specifically?
I wanted the price of a studio to have more effect on the overall formula, to try to punish more expensive cities. The AQI is scaled down a lot because it fluctuates often, and we are in a heat wave, so it is unusually high right now. It's 150 because that gives an approximate limit on how high the AQI should be. I might change some of these numbers, though, now that I think about it.
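To see what the 0.05 and the 150 do in practice, here is a small sketch that evaluates just the air-quality factor from the formula at a few AQI values. The function name `aqi_term` and the sample AQI values are chosen only for illustration.

```python
def aqi_term(aqi):
    # The air-quality factor as written in the formula: 0.05 * (150 - AQI)
    return 0.05 * (150 - aqi)

# Sample AQI values, picked arbitrarily to show the trend.
for aqi in (50, 100, 150, 200):
    print(aqi, aqi_term(aqi))
```

The factor shrinks as the AQI rises, reaches zero at exactly 150, and goes negative above 150. Because it is multiplied into everything else, a reading above 150 would flip the sign of the whole score rather than just lower it.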
It's hard to tell because of the parentheses, but it looks like you're implicitly comparing things that are not in fact comparable (i.e. the "units" don't match). It's easier to build and understand formulas like this if you can write separate multiplicative terms that are normalized to the same range (usually something like 0 to 1), or, if you want terms weighted differently, add them rather than multiplying. Multiplication means one deficit can dominate, while adding gives you control over exactly how much influence each term can have. Each term is a different conceptual factor you want to account for. It looks like you're mostly doing this already, but I always find it helpful to have a reminder when I'm in the weeds with something.
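A minimal sketch of that idea, assuming min-max scaling as the normalization (other choices would work too). The helper names and the example term values are hypothetical, not taken from the formula above.

```python
def normalize(value, lo, hi):
    """Min-max scale value into the range 0..1, clamping anything outside."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def multiplicative(terms):
    """Product of all terms: a single low term drags the whole score down."""
    score = 1.0
    for value in terms.values():
        score *= value
    return score

def weighted_sum(terms, weights):
    """Weighted average: each term's influence is capped by its weight."""
    total = sum(weights[name] for name in terms)
    return sum(weights[name] * terms[name] for name in terms) / total

# Made-up normalized terms for one city, with one very weak factor.
terms = {"transit": 0.9, "affordability": 0.05, "parks": 0.9}
weights = {"transit": 1.0, "affordability": 1.0, "parks": 1.0}

print(multiplicative(terms))         # dominated by the 0.05 term
print(weighted_sum(terms, weights))  # the weak term costs at most its weight
```

With these numbers the product is dragged down close to the weakest term, while the weighted sum stays moderate, which is the trade-off described above.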
In this case it looks like your conceptual equation would be something like (transit)*(affordability)*(park access)*(environmental risk). I'd recommend removing bread price (too specific and variable), AQI (or at least use a summary stat like worst 8-hour AQI; I think these are available on the EPA's website), and queer rights (not really quantifiable and incommensurate with the other factors), and dividing housing price by one third of income for the affordability factor. Or better yet, use (cost of living)/(median income), since CoL or living wage is an existing robust and consistent metric for cities across the country (no need to reinvent the wheel).
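As a rough sketch of that conceptual equation: the version below assumes transit, park access, and environmental risk are already scaled to 0..1, and it inverts the housing burden so that a higher affordability value is better. Those choices, the function names, and the two placeholder cities are all assumptions for illustration, not real data.

```python
def affordability(annual_housing_cost, median_income):
    # Housing cost divided by one third of income, as suggested above.
    burden = annual_housing_cost / (median_income / 3)
    if burden <= 0:
        raise ValueError("housing cost and income must be positive")
    # Assumption: invert and cap at 1 so a higher value means more affordable.
    return min(1.0, 1.0 / burden)

def conceptual_score(transit, afford, park_access, env_risk):
    # Assumption: all inputs are on a 0..1 scale, and risk is flipped
    # so that a lower-risk city scores higher.
    return transit * afford * park_access * (1.0 - env_risk)

# Placeholder inputs for two made-up cities.
cities = {
    "City A": dict(transit=0.8, housing=24000, income=60000, parks=0.9, risk=0.2),
    "City B": dict(transit=0.6, housing=12000, income=54000, parks=0.7, risk=0.1),
}

scores = {
    name: conceptual_score(
        c["transit"],
        affordability(c["housing"], c["income"]),
        c["parks"],
        c["risk"],
    )
    for name, c in cities.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Swapping in (cost of living)/(median income) would only change the `affordability` function; the rest of the scoring stays the same.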
I have reformatted the equation based on your suggestions, and I think I like the list it gives me a lot more.