Understanding Sample Sizes in Data

Feb. 10, 2026

Jay Stadelman

On the City Health Dashboard, some metrics are based on data from national surveys like the Behavioral Risk Factor Surveillance System (BRFSS) and the American Community Survey (ACS). These surveys attempt to characterize the health and social conditions of the whole country, and of the jurisdictions within it, by asking questions of a representative sample of people. In this blog post, we explain how samples can generate accurate estimates when they are drawn from a representative sampling frame, and we explore how the size of a sample affects the quality of the data that surveys produce.

How small samples can cause big swings

Imagine you have a bag of 100 marbles: 20 red and 80 green. The red marbles represent people with a particular disease, so the true rate in the bag is 20%, or 1 in 5. If you randomly pull two marbles and happen to get one red and one green, you might conclude that 1 of every 2 people has the disease. But if you pull 50 marbles, you are much more likely to get a ratio that resembles the actual ratio in the whole bag. As sample size grows, the sample more accurately represents what's going on in the population. In other words, larger samples give more accurate and reliable results, while smaller samples are more volatile and prone to inaccuracies.
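To see this effect directly, here is a minimal Python simulation of the marble example. It repeatedly draws marbles without replacement from a bag of 20 red and 80 green and records the share of red marbles in each draw. The names BAG, sample_share_red and simulate are illustrative only, and the number of trials and the random seed are arbitrary choices.

```python
import random
import statistics

# The bag from the example: 20 red marbles (1 = has the disease) and 80 green (0).
BAG = [1] * 20 + [0] * 80


def sample_share_red(n: int, rng: random.Random) -> float:
    """Draw n marbles without replacement and return the share that are red."""
    draws = rng.sample(BAG, n)
    return sum(draws) / n


def simulate(n: int, trials: int = 10_000, seed: int = 42) -> list[float]:
    """Repeat the draw many times to see how much the estimate varies."""
    rng = random.Random(seed)
    return [sample_share_red(n, rng) for _ in range(trials)]


for n in (2, 10, 50):
    estimates = simulate(n)
    print(
        f"n={n:>2}: average estimate={statistics.mean(estimates):.3f}, "
        f"spread (std dev)={statistics.stdev(estimates):.3f}, "
        f"min={min(estimates):.2f}, max={max(estimates):.2f}"
    )
```

Across many repetitions, the average estimate stays close to the true 20% at every sample size, but the spread is much wider for samples of 2 than for samples of 50. With only two marbles, each individual estimate can only be 0%, 50% or 100%, none of which matches the true rate.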

So why don't we just survey more people? Public health surveys are resource-intensive. They require sophisticated logistics, many hours of work, and highly trained, detail-oriented staff, which makes them expensive to conduct. Sometimes we may want to draw a larger sample but cannot afford to do so.

What do data scientists do to address small sample sizes?

Public health agencies urge caution when interpreting statistics based on small samples. But it's incorrect to assume that every estimate calculated from a small sample is inaccurate. Data scientists can compensate for small sample sizes in numerous ways, using more advanced methods to smooth out some of the uncertainty.

One of these methods is making sure that the sample, regardless of its size, closely resembles the population from which it's drawn. Data scientists use sampling techniques that give everyone in the broader population a chance of being selected. This generally produces samples in which groups defined by race and ethnicity, sex, education, wealth, geography, and other characteristics appear in roughly the same proportions as in the broader population. If they are successful, the data scientists will have generated what is known as a 'representative' sample. Representative samples, even small ones, often produce accurate and reliable estimates.
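To make the idea of a representative sample concrete, here is a minimal Python sketch of one common technique, proportional stratified sampling: split the population into groups, then draw a simple random sample within each group in proportion to that group's size. The population, group labels and function name are hypothetical. Real surveys such as BRFSS and ACS use more complex designs and weighting rather than this exact procedure.

```python
import random
from collections import Counter


def proportional_stratified_sample(population, group_of, n, seed=0):
    """Draw about n people so that each group's share of the sample matches
    its share of the population (allocation is rounded per group)."""
    rng = random.Random(seed)

    # Split the population into groups (strata).
    strata = {}
    for person in population:
        strata.setdefault(group_of(person), []).append(person)

    sample = []
    total = len(population)
    for members in strata.values():
        k = round(n * len(members) / total)  # this group's share of the sample
        sample.extend(rng.sample(members, k))  # simple random sample within the group
    return sample


# Hypothetical population of 1,000 people in three groups (60% / 30% / 10%).
population = (
    [(i, "group A") for i in range(600)]
    + [(i, "group B") for i in range(600, 900)]
    + [(i, "group C") for i in range(900, 1000)]
)

sample = proportional_stratified_sample(population, lambda person: person[1], n=50)
print(Counter(group for _, group in sample))
# Counter({'group A': 30, 'group B': 15, 'group C': 5})
```

Even though the sample contains only 50 people, each group appears in the same proportion as in the full population. A simple random sample of 50 would match those proportions only approximately, and the match could vary from draw to draw.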

What can data users do?

Both producers and consumers of statistics need to exercise care when working with datasets produced from small samples. If an estimate looks wrong and you know it's based on a small sample, the sample size may be the reason.

When exploring data, especially data that may be based on small samples, here are a few tips:

- Explore all available years of data to get a fuller picture.
- Explore confidence intervals for neighborhood-level metrics. A confidence interval is a calculated range of values that likely contains the true value. A '90% confidence interval' means that if the survey were repeated many times, about 90% of the intervals calculated this way would contain the true value.
  - Find confidence intervals in the Metric Table view under 'Error Margins'.
- Compare Dashboard data against other available data sources, including qualitative data from residents and your own experience, to gain additional perspectives.
- Check out our Technical Document to learn more about our data practices and methods.
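To show how sample size affects a confidence interval, here is a minimal Python sketch that computes an approximate 90% confidence interval for a proportion using the normal (Wald) approximation. It assumes a simple random sample. It is not necessarily how the Dashboard or its data sources calculate their Error Margins, because published survey margins also account for weighting and survey design. The function name proportion_ci is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, level: float = 0.90) -> tuple[float, float]:
    """Approximate confidence interval for a proportion (normal/Wald approximation).

    Assumes a simple random sample; survey estimates that use weights and
    complex designs compute their margins differently.
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to the valid range for a proportion.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Same estimated rate (20%), three different sample sizes.
for n in (50, 500, 5000):
    low, high = proportion_ci(round(0.2 * n), n)
    print(f"n={n}: estimate 20%, 90% CI {low:.1%} to {high:.1%}")
```

The estimate is 20% in every case, but the interval narrows from roughly 10.7% to 29.3% with 50 respondents to roughly 19.1% to 20.9% with 5,000. A wide interval is a signal to interpret the estimate with extra care.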

We take data quality very seriously at the Dashboard. If we cannot provide an estimate that we’re confident in, we do not publish the data. We validate our metric estimates and apply data source-specific censoring criteria where warranted to ensure accuracy and reliability. We also do our best to communicate when our data have limitations that may affect their accuracy, including if they are based on small sample sizes (see Data Tips on each Metric Detail page).
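As an illustration of what a censoring rule can look like, here is a minimal Python sketch that withholds an estimate if it comes from too few respondents or if its relative standard error (the standard error divided by the estimate) is too large. Flagging estimates with a relative standard error above 30% is a commonly used convention, but the thresholds and the function name below are hypothetical. They are not the Dashboard's actual criteria, which vary by data source.

```python
# Hypothetical thresholds for illustration only; these are NOT the
# Dashboard's actual censoring criteria, which vary by data source.
MIN_SAMPLE_SIZE = 50
MAX_RELATIVE_STANDARD_ERROR = 0.30


def should_publish(estimate: float, standard_error: float, sample_size: int) -> bool:
    """Return True if an estimate passes both illustrative reliability checks."""
    if sample_size < MIN_SAMPLE_SIZE:
        return False  # too few respondents to trust the estimate
    if estimate == 0:
        return False  # relative standard error is undefined for a zero estimate
    rse = standard_error / abs(estimate)
    return rse <= MAX_RELATIVE_STANDARD_ERROR


print(should_publish(0.20, 0.0566, 50))  # RSE about 28%: publish
print(should_publish(0.05, 0.03, 200))   # RSE 60%: withhold
print(should_publish(0.20, 0.01, 30))    # too few respondents: withhold
```

Checking both sample size and relative error reflects the point of this post: a small sample does not automatically make an estimate unreliable, but it raises the bar the estimate must clear.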

If you ever see data on the Dashboard that you think may be incorrect, or if you want to learn more about our data and validation processes, please reach out to us. We hope this blog post has helped you better understand the challenges of working with data sampled from smaller numbers of people and has given you ways to think about and interpret what's really happening in your community.
