What each number actually measures
- Words: runs of letters and digits, segmented in a locale-aware way via Intl.Segmenter.
- Characters: grapheme clusters, so an emoji or a letter with combining marks counts as one.
- No-space characters: the character count with whitespace excluded, useful for typography and ad-copy fits.
- Sentences: split on ., !, or ? followed by whitespace.
- Paragraphs: separated by one or more blank lines.
- Bytes (UTF-8): the encoded size, which is what many API and database limits are measured in.
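
The rules above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the tool's actual code: the names countText and TextStats are hypothetical, the regular expressions are one plausible reading of the sentence and paragraph rules, and the runtime is assumed to provide Intl.Segmenter and TextEncoder (current browsers and recent Node.js versions do).

```typescript
// Hypothetical shape for the six counts; not taken from the tool itself.
interface TextStats {
  words: number;
  characters: number;
  charactersNoSpaces: number;
  sentences: number;
  paragraphs: number;
  bytes: number;
}

// Hypothetical helper illustrating the counting rules listed above.
function countText(text: string, locale: string = "en"): TextStats {
  // Words: keep only segments the segmenter flags as word-like,
  // which skips punctuation and whitespace segments.
  const wordSegmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let words = 0;
  for (const part of wordSegmenter.segment(text)) {
    if (part.isWordLike) words++;
  }

  // Characters: one per grapheme cluster, so "e" plus a combining accent is 1.
  const graphemeSegmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let characters = 0;
  let charactersNoSpaces = 0;
  for (const part of graphemeSegmenter.segment(text)) {
    characters++;
    if (!/^\s+$/u.test(part.segment)) charactersNoSpaces++;
  }

  // Sentences: break after ., !, or ? when whitespace follows; drop empty pieces.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .filter((piece) => piece.trim().length > 0).length;

  // Paragraphs: break on one or more blank lines; drop empty pieces.
  const paragraphs = text
    .split(/\n\s*\n/)
    .filter((piece) => piece.trim().length > 0).length;

  // Bytes: length of the UTF-8 encoding, not the JavaScript string length.
  const bytes = new TextEncoder().encode(text).length;

  return { words, characters, charactersNoSpaces, sentences, paragraphs, bytes };
}
```

For the input "Hello world. Bye!" this sketch reports 3 words, 17 characters, 15 no-space characters, 2 sentences, 1 paragraph, and 17 bytes. The gap between characters and bytes shows up with non-ASCII text: a thumbs-up emoji with a skin-tone modifier is 1 character but 8 bytes in UTF-8.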