Methodology
How We Score Ingredient Safety
Our methodology for the 43,000+ ingredient safety assessments powering theformulator.ai
Every formulation generated on theformulator.ai includes per-ingredient safety scores across six hazard axes. This page explains how those scores are calculated, what data sources feed them, and why we built a proprietary system instead of licensing an existing one.
Why existing safety databases fall short for formulators
The two most widely referenced ingredient safety databases — EWG Skin Deep and SkinSafe — were designed for consumers, not formulators. This creates fundamental problems when R&D teams try to use them as data inputs for formulation decisions.
EWG Skin Deep scores approximately 11,500 ingredients. Its methodology applies a data gap penalty: ingredients with limited safety studies receive elevated hazard scores. For a formulator working with novel or niche raw materials, this means an absence of negative evidence gets treated as evidence of harm. The scoring algorithm is opaque — weights and methodology are not published. Exposure context (whether an ingredient appears in a leave-on serum at 2% or a rinse-off shampoo at 0.5%) is not factored into the score. And the database is consumer-facing, with framing that tends toward alarmism rather than risk-based assessment.
SkinSafe takes a different approach, focusing on allergen and irritant avoidance based on Mayo Clinic data. While clinically grounded for contact dermatitis, it does not cover the full hazard spectrum that regulatory affairs teams need — carcinogenicity, reproductive toxicity, and endocrine disruption are outside its scope.
Neither system provides per-market regulatory status. A formulator developing for both EU and China needs to know that an ingredient permitted in Europe may be prohibited or restricted under NMPA — this cross-market view does not exist in consumer-facing databases. Neither system updates automatically from primary regulatory sources. Neither system adjusts scores based on product type or exposure duration.
We needed a scoring system that:
- Draws from the same primary sources regulatory authorities use
- Treats absence of data honestly — as a data gap, not a hazard signal
- Accounts for exposure context (leave-on vs rinse-off)
- Covers the full hazard spectrum across six independent axes
- Provides per-market regulatory intelligence alongside safety scores
- Updates from primary sources automatically, not via periodic manual review
Six axes of hazard assessment
Each ingredient is scored 0–10 on six independent hazard axes. A composite score is calculated, but all six axis scores are always visible to the formulator — because a single number hides critical information.
Carcinogenicity
What it measures
Evidence of cancer-causing potential from chronic or repeated exposure.
Primary sources
IARC Monographs (Group 1, 2A, 2B classifications), ECHA CLP Annex VI (H350, H351 hazard statements), NTP Report on Carcinogens.
Scoring approach
IARC Group 1 (known carcinogen) maps to the highest severity range. Group 2A (probable) and 2B (possible) map progressively lower. When multiple authoritative sources provide different classifications for the same ingredient, the highest score is retained — we never downgrade a carcinogenicity assessment. Any ingredient classified as a known carcinogen receives a composite score floor regardless of how clean its other five axes are.
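The "highest score is retained" rule can be sketched as a maximum over per-source scores. The numeric values assigned to each IARC group below are illustrative assumptions: this page states only the ordering (Group 1 highest, then 2A, then 2B), not the numbers, and the function name is hypothetical.

```python
# Hypothetical severity values. The methodology gives only the ordering
# (Group 1 > 2A > 2B), not the actual numbers.
IARC_SEVERITY = {"1": 10, "2A": 7, "2B": 4}


def carcinogenicity_score(source_scores: dict[str, int]) -> int:
    """Return the most conservative (highest) score across sources.

    An ingredient with no classification from any source scores 0.
    """
    return max(source_scores.values(), default=0)


scores = {
    "IARC": IARC_SEVERITY["2B"],  # "possible" classification from one source
    "NTP": 7,                     # hypothetical higher score from another source
}
print(carcinogenicity_score(scores))  # the higher of the two is retained
```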
Developmental & Reproductive Toxicity (DART)
What it measures
Risk of harm to fertility, fetal development, or lactation.
Primary sources
ECHA CLP Annex VI (H360, H361, H362 hazard statements), SCCS opinions on specific cosmetic ingredients.
Scoring approach
H360 (may damage fertility or the unborn child) maps to the highest severity. H361 (suspected) maps to moderate severity. H362 (may cause harm to breastfed children) is scored based on exposure route relevance to cosmetic use.
Sensitization
What it measures
Potential to cause allergic contact dermatitis on repeated exposure.
Primary sources
ECHA CLP (H317 — skin sensitizer classification), NACDG (North American Contact Dermatitis Group) patch test prevalence data, CIR safety assessments.
Scoring approach
The ECHA H317 classification provides a regulatory baseline. NACDG prevalence rates add clinical weight — a sensitizer that causes positive patch test reactions in 5% of patients is scored higher than one affecting 0.3%. Strong sensitizers receive a composite score floor to prevent other clean axes from masking the risk.
Systemic Toxicity
What it measures
Potential for organ damage from single or repeated exposure via dermal, oral, or inhalation routes.
Primary sources
ECHA CLP (H370, H371 for single exposure; H372, H373 for repeated exposure), SCCS safety opinions.
Scoring approach
"Causes organ damage" classifications map to high severity. "May cause organ damage" maps lower. Repeated-exposure classifications are scored based on exposure route relevance — an inhalation-only hazard is less relevant for a topical cream than for a spray product.
Irritation
What it measures
Potential to cause non-allergic skin or eye irritation on contact.
Primary sources
CIR safety assessments (clinical patch test data), Zein Number dissolution data (for surfactants specifically), ECHA CLP (H315 skin irritation, H319 eye irritation).
Scoring approach
For surfactants, the Zein Number provides a quantitative protein denaturation measure that correlates directly with skin irritation potential — this is more predictive than binary classification. CIR clinical data provides human-relevant irritation thresholds. High irritation scores receive a modest composite floor to ensure highly irritating ingredients are flagged even when all other axes are clean.
Endocrine Disruption
What it measures
Potential to interfere with hormonal systems (estrogen, androgen, thyroid, steroidogenesis pathways).
Primary sources
ECHA Endocrine Disruptor assessment list, REACH SVHC (Substances of Very High Concern) identifications with ED concern, EU Community Rolling Action Plan (CoRAP) evaluations.
Scoring approach
Confirmed endocrine disruptors on the ECHA list score highest. Ingredients under active assessment score at moderate severity. SVHC identification with endocrine disruption concern adds weight to the score.
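As a rough illustration, the H-statements named in the primary sources above can be read as a lookup from hazard statement to axis. This is a minimal sketch, not the production mapping: the axis keys and the function name are hypothetical, and endocrine disruption is absent because its sources above are list memberships rather than H-statements.

```python
# H-statements named in the "Primary sources" entries of the core axes.
H_STATEMENT_AXIS = {
    "H350": "carcinogenicity",
    "H351": "carcinogenicity",
    "H360": "dart",
    "H361": "dart",
    "H362": "dart",
    "H317": "sensitization",
    "H370": "systemic_toxicity",
    "H371": "systemic_toxicity",
    "H372": "systemic_toxicity",
    "H373": "systemic_toxicity",
    "H315": "irritation",
    "H319": "irritation",
}


def axes_for(h_statements: list[str]) -> dict[str, list[str]]:
    """Group an ingredient's H-statements by hazard axis.

    Codes that are not in the lookup are ignored in this sketch.
    """
    grouped: dict[str, list[str]] = {}
    for code in h_statements:
        axis = H_STATEMENT_AXIS.get(code)
        if axis is not None:
            grouped.setdefault(axis, []).append(code)
    return grouped


print(axes_for(["H317", "H319", "H361"]))
```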
EXTENDED SAFETY AXES
Beyond the composite — product-relevant and environmental safety data backed by REACH IUCLID study values
Supplementary, Product-Category-Specific Axes
These axes are not part of the human safety composite. They surface conditionally, only when relevant to the product being formulated. Ecotoxicity measures environmental impact rather than human hazard, so it is always reported separately from the composite.
Aquatic / Ecotoxicity
What it measures
Impact on aquatic organisms — algae, daphnia, fish. Bioaccumulation potential and biodegradation rate.
Primary sources
REACH IUCLID registered substance dossiers (26,800+ dossiers, 14.4M study endpoints): EC50 (algae growth inhibition), LC50 (daphnia and fish acute toxicity), BCF (bioconcentration factor), and biodegradation percentage.
Scoring approach
EC50 and LC50 values are mapped to GHS aquatic toxicity categories. A BCF above 500 flags a bioaccumulation concern, and biodegradation below 60% in 28 days flags environmental persistence. This axis is not part of the human safety composite; environmental impact is a separate dimension.
Shown for
Shampoos, cleansers, rinse-off products, sunscreens (reef-safe)
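The ecotoxicity thresholds above can be sketched as follows. The BCF and biodegradation cut-offs come from this page; the acute category boundaries (1, 10, and 100 mg/L) are the standard GHS acute aquatic criteria. Function and parameter names are hypothetical, and how categories translate into a displayed score is not specified here.

```python
from typing import Optional


def ghs_acute_aquatic_category(lowest_ec50_lc50_mg_l: float) -> Optional[int]:
    """Map the lowest acute EC50/LC50 (mg/L) to a GHS acute aquatic category."""
    if lowest_ec50_lc50_mg_l <= 1:
        return 1
    if lowest_ec50_lc50_mg_l <= 10:
        return 2
    if lowest_ec50_lc50_mg_l <= 100:
        return 3
    return None  # above 100 mg/L: not classified for acute aquatic toxicity


def ecotox_flags(bcf: float, biodegradation_pct_28d: float) -> list[str]:
    """Apply the two flag thresholds stated in the scoring approach."""
    flags = []
    if bcf > 500:
        flags.append("bioaccumulation")
    if biodegradation_pct_28d < 60:
        flags.append("persistence")
    return flags


print(ghs_acute_aquatic_category(5.0))                 # 2
print(ecotox_flags(bcf=600, biodegradation_pct_28d=40))
```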
Eye Irritation
What it measures
Potential to cause eye irritation or serious eye damage on direct or incidental contact.
Primary sources
REACH IUCLID Draize eye irritation studies (OECD TG 405), registered substance dossier data.
Scoring approach
Draize eye scores mapped to GHS eye damage categories. Scores reflect severity and reversibility of observed effects. Only shown when the product is applied near the eye area, where incidental contact is expected.
Shown for
Eye creams, eye serums, mascaras, eyeliners
Skin Irritation (REACH)
What it measures
Potential to cause skin irritation from prolonged contact. Distinct from the core Irritation axis, which incorporates CIR assessments and Zein Number data.
Primary sources
REACH IUCLID Draize skin irritation studies — Primary Irritation Index (PII) from OECD TG 404 studies on registered substances.
Scoring approach
Draize PII values are mapped to severity bands: below 0.5 is negligible, 0.5 to 2.3 is mild, and above 2.3 is classified as an irritant. This provides a second, independent irritation signal from in vivo regulatory studies rather than panel assessments.
Shown for
Leave-on products, baby and children formulations
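The PII severity bands above reduce to a small threshold function. This is a sketch of the banding only; the function name and band labels are illustrative.

```python
def pii_band(pii: float) -> str:
    """Map a Draize Primary Irritation Index to the bands described above."""
    if pii < 0.5:
        return "negligible"
    if pii <= 2.3:
        return "mild"
    return "irritant"


for value in (0.2, 1.4, 3.1):
    print(value, pii_band(value))
```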
Oral Toxicity
What it measures
Acute toxicity risk from ingestion — relevant for products applied to or near the mouth where incidental ingestion occurs.
Primary sources
REACH IUCLID LD50 oral values from OECD TG 401 and TG 423 acute oral toxicity studies.
Scoring approach
LD50 values mapped to GHS acute toxicity categories. Category 1 (LD50 at or below 5 mg/kg) is highest concern. Category 5 (LD50 between 2,000 and 5,000 mg/kg) is low concern. Above 5,000 mg/kg is not classified.
Shown for
Lip products (lipstick, lip balm, lip gloss), oral care (toothpaste, mouthwash)
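The LD50 mapping above can be sketched as a table of upper bounds. Categories 1 and 5 are stated on this page; the intermediate boundaries (50, 300, and 2,000 mg/kg) are the standard GHS acute oral toxicity criteria. The function name is hypothetical.

```python
from typing import Optional

# (upper bound in mg/kg body weight, GHS acute oral toxicity category)
GHS_ORAL_BOUNDS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]


def ghs_oral_category(ld50_mg_per_kg: float) -> Optional[int]:
    """Return the GHS acute oral category, or None if not classified."""
    for upper_bound, category in GHS_ORAL_BOUNDS:
        if ld50_mg_per_kg <= upper_bound:
            return category
    return None  # above 5,000 mg/kg: not classified


print(ghs_oral_category(3000))  # 5
```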
Genotoxicity
What it measures
Potential to cause DNA damage — gene mutations, chromosomal aberrations, or DNA strand breaks.
Primary sources
REACH IUCLID in vitro studies (Ames test, chromosomal aberration assay) and in vivo studies (micronucleus test, comet assay) from registered substance dossiers. Only studies with Klimisch reliability 1 or 2 are used.
Scoring approach
Scores are based on the proportion of positive results across available studies. In vivo positives carry greater weight than in vitro positives, so a positive in vivo result with a negative in vitro result is more concerning than the reverse. The positive rate must exceed 15% before the axis scores at all, which prevents isolated false positives from triggering flags.
Shown for
Baby products, children's formulations
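One way to picture the genotoxicity rule is a weighted positive rate with a 15% gate. The 15% threshold is from the text above; the 2:1 in vivo weighting and the choice to apply the gate to the weighted rate are assumptions made purely for illustration, as are all names.

```python
from dataclasses import dataclass

IN_VIVO_WEIGHT = 2.0       # hypothetical: the page says only "greater weight"
IN_VITRO_WEIGHT = 1.0
MIN_POSITIVE_RATE = 0.15   # from the scoring approach above


@dataclass
class Study:
    in_vivo: bool
    positive: bool


def weighted_positive_rate(studies: list[Study]) -> float:
    """Share of study weight that comes from positive results."""
    total = 0.0
    positive = 0.0
    for study in studies:
        weight = IN_VIVO_WEIGHT if study.in_vivo else IN_VITRO_WEIGHT
        total += weight
        if study.positive:
            positive += weight
    return positive / total if total else 0.0


def triggers_genotoxicity_score(studies: list[Study]) -> bool:
    """True only when the weighted positive rate exceeds the 15% gate."""
    return weighted_positive_rate(studies) > MIN_POSITIVE_RATE
```

Under these assumed weights, one positive in vitro result among ten in vitro studies stays below the gate (10%), while one positive in vivo result alongside nine negative in vitro studies clears it (2 of 11 weight units).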
Phototoxicity
What it measures
Potential to cause adverse skin reactions when the ingredient is exposed to UV light — relevant for UV-absorbing compounds.
Primary sources
REACH IUCLID phototoxicity studies, SCCS scientific opinions on UV filters.
Scoring approach
Based on phototoxicity study results from registered substance dossiers and SCCS evaluations. Scored when data is available for UV-absorbing ingredients.
Shown for
Sunscreens, SPF products, UV-protective formulations
Why multi-axis scoring matters
A single composite score can mask critical safety signals. Two ingredients can land at the same composite value with completely different underlying hazard profiles. Our system always shows both the composite and the individual axis breakdown — so the formulator decides what matters for their product.
Same score, different story
Benzophenone-3 (Oxybenzone)
Composite 1.50 · GREEN · HIGH confidence
Phenoxyethanol
Composite 1.50 · GREEN · HIGH confidence
Benzophenone-3 and Phenoxyethanol both carry a composite score of 1.50 — both in the GREEN band. But the underlying hazard profiles are entirely different.
Benzophenone-3 shows signals across carcinogenicity (2), developmental toxicity (3), and endocrine disruption (4) — a broad, low-level concern pattern driven primarily by its endocrine activity.
Phenoxyethanol concentrates its risk in irritation (7) and endocrine disruption (4), with a minor DART signal (2). For a product applied near the eyes or on compromised skin, this irritation spike matters.
A single composite score treats these as identical. Our 6-axis system shows they are not.
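The comparison above can be made concrete by diffing the two axis profiles. The axis values are the ones quoted in this example; treating every unlisted axis as 0 is an assumption, and the axis keys and function name are hypothetical.

```python
AXES = (
    "carcinogenicity",
    "dart",
    "sensitization",
    "systemic_toxicity",
    "irritation",
    "endocrine_disruption",
)

# Axis values quoted in the example; unlisted axes are assumed to be 0.
benzophenone_3 = {"carcinogenicity": 2, "dart": 3, "endocrine_disruption": 4}
phenoxyethanol = {"irritation": 7, "endocrine_disruption": 4, "dart": 2}


def differing_axes(a: dict[str, int], b: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return the axes on which two profiles disagree, as (a, b) pairs."""
    return {
        axis: (a.get(axis, 0), b.get(axis, 0))
        for axis in AXES
        if a.get(axis, 0) != b.get(axis, 0)
    }


for axis, pair in differing_axes(benzophenone_3, phenoxyethanol).items():
    print(axis, pair)
```

Both ingredients share the same composite, yet the profiles differ on carcinogenicity, DART, and irritation.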
Green doesn't mean zero concern
Salicylic Acid
Composite 1.90 · GREEN · HIGH confidence
Salicylic Acid scores 1.90 — comfortably in the GREEN band. But the radar chart reveals three elevated axes: irritation at 7 (classified H318 — causes serious eye damage), DART at 4 (developmental/reproductive toxicity signal), and endocrine disruption at 4.
For a general-purpose exfoliant in a rinse-off cleanser, this profile is acceptable. For a leave-on product targeted at pregnant consumers, the DART signal at 4 is information a formulator needs to see — and a composite score of 1.90 would never surface it.
We don't make the decision. We surface the data. The formulator decides.
Data-driven ingredient substitution
Propylene Glycol
Composite 0.80 · GREEN · HIGH confidence
Propanediol
Composite 0.40 · GREEN · HIGH confidence
Propylene Glycol and Propanediol are functionally interchangeable humectants and solvent carriers. Both are GREEN. But Propylene Glycol carries an irritation score of 4 — a moderate signal from ECHA CLP classification — while Propanediol shows no irritation concern.
For sensitive skin formulations, this difference matters. Our system makes it visible so the substitution decision is informed by data, not marketing.
Same ingredient, different context
Static safety databases show you the same score regardless of what you're making. A preservative scores identically whether it goes into a rinse-off shampoo or a leave-on eye cream — but the risks are not identical. Our system adapts which safety axes it surfaces based on the product being formulated.
Rinse-off Shampoo
Phenoxyethanol
1.50 · GREEN · HIGH confidence
In a rinse-off product, the ecotoxicity axis activates — Phenoxyethanol's aquatic impact data from REACH studies is surfaced so the formulator can assess environmental profile alongside human safety.
Eye Cream
Phenoxyethanol
1.50 · GREEN · HIGH confidence
For eye-area application, the eye irritation axis activates. Phenoxyethanol's Draize eye irritation score from REACH OECD TG 405 studies shows moderate irritation potential — critical information for products applied near the eye where incidental contact is expected.
Baby Body Lotion
Phenoxyethanol
1.50 · GREEN · HIGH confidence
Baby products trigger the most comprehensive safety view. Skin irritation, genotoxicity, and ecotoxicity all surface. For Phenoxyethanol, the genotoxicity score of 0 provides reassurance, while the irritation score of 7 on the core axis remains the primary concern — exactly the kind of nuance a single composite score would hide.
The composite score for Phenoxyethanol is 1.50 in all three contexts — that doesn't change. What changes is the supplementary intelligence the formulator sees. This is what context-aware safety scoring means: not a different number, but a more complete picture when the product demands it.
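Context-aware surfacing can be sketched as a lookup from supplementary axis to the product categories listed under each axis's "Shown for" entry. This is a simplified illustration: the tag vocabulary and function name are hypothetical, and the real rules may surface more than this lookup does (the baby lotion example above also lists ecotoxicity).

```python
# Supplementary axis -> product tags taken from the "Shown for" entries.
SUPPLEMENTARY_AXES = {
    "ecotoxicity": {"shampoo", "cleanser", "rinse-off", "sunscreen"},
    "eye_irritation": {"eye cream", "eye serum", "mascara", "eyeliner"},
    "skin_irritation_reach": {"leave-on", "baby", "children"},
    "oral_toxicity": {"lip product", "oral care"},
    "genotoxicity": {"baby", "children"},
    "phototoxicity": {"sunscreen", "spf", "uv-protective"},
}


def axes_to_surface(product_tags: set[str]) -> list[str]:
    """Return the supplementary axes whose tag sets overlap the product's tags."""
    return sorted(
        axis for axis, tags in SUPPLEMENTARY_AXES.items() if tags & product_tags
    )


print(axes_to_surface({"shampoo", "rinse-off"}))
print(axes_to_surface({"eye cream"}))
print(axes_to_surface({"baby", "leave-on"}))
```

The composite is deliberately not an input here: only the set of supplementary axes changes with the product.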
Scores are derived from 13 authoritative sources, including ECHA CLP Annex VI, IARC/NTP carcinogen classifications, CIR safety assessments, ESSCA clinical patch test data, EPA CompTox predictions, and REACH IUCLID registered substance dossiers (26,800+ dossiers, 14.4M study endpoints). The full methodology and source hierarchy are published on this page.
Five principles that guide our scoring
1. No data gap penalty
If an ingredient has no studies addressing a particular hazard axis, that axis scores 0 — not an elevated score. Absence of evidence is not evidence of harm. This is the single largest methodological difference between our system and EWG Skin Deep, which inflates scores for under-studied ingredients. A novel botanical extract with limited toxicology literature should not receive a higher hazard score than a well-studied petrochemical with confirmed clean safety data.
2. Exposure context matters
The same ingredient at the same concentration poses different risks in a leave-on facial serum (hours of skin contact) versus a rinse-off shampoo (30 seconds of contact). Our scoring applies an exposure modifier based on product type. This is standard practice in regulatory toxicology — SCCS and CIR both evaluate safety in the context of exposure — but is absent from consumer-facing databases.
3. Source hierarchy
Not all safety data is equal. Peer-reviewed regulatory assessments (ECHA CLP, SCCS opinions, CIR final reports) carry more weight than preliminary findings or supplier-provided data. Our system enforces a clear hierarchy: regulatory authority classifications first, then peer-reviewed clinical data, then curated literature. Scores from higher-tier sources are never downgraded by lower-tier data.
4. Confidence tiering
Every safety score carries a confidence level: high (direct ECHA, CIR, or IARC classification), medium (extracted from peer-reviewed literature or CIR group assessments), or low (baseline from COSING registration with no hazard evidence found). Only medium and high confidence scores are displayed to users. Low confidence ingredients show "No hazard data available in our sources" rather than a green score that implies safety has been confirmed.
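The display rule for confidence tiers can be sketched as below. The three tiers and the low-confidence message are taken from the paragraph above; the enum, the function name, and the two-decimal formatting are illustrative choices.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"      # direct ECHA, CIR, or IARC classification
    MEDIUM = "medium"  # peer-reviewed literature or CIR group assessments
    LOW = "low"        # baseline registration only, no hazard evidence found


NO_DATA_MESSAGE = "No hazard data available in our sources"


def display_value(score: float, confidence: Confidence) -> str:
    """Show a score only for medium or high confidence; otherwise the message."""
    if confidence is Confidence.LOW:
        return NO_DATA_MESSAGE
    return f"{score:.2f}"


print(display_value(1.5, Confidence.HIGH))  # 1.50
print(display_value(0.0, Confidence.LOW))   # No hazard data available in our sources
```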
5. Conservative on carcinogenicity
Carcinogenicity is treated asymmetrically. A known carcinogen classified by IARC or NTP cannot have its composite score pulled below a floor by clean scores on other axes. When multiple sources provide different carcinogenicity classifications, the highest (most conservative) score is retained. This reflects the irreversible nature of carcinogenic harm compared to reversible irritation.
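The composite floor described here behaves like a lower clamp. The floor value below is a placeholder, since this page does not publish the number, and the function name is hypothetical.

```python
# Hypothetical floor: the actual value is not stated in this methodology.
KNOWN_CARCINOGEN_FLOOR = 7.0


def apply_carcinogen_floor(composite: float, is_known_carcinogen: bool) -> float:
    """Keep clean axes from pulling a known carcinogen's composite below the floor."""
    if is_known_carcinogen:
        return max(composite, KNOWN_CARCINOGEN_FLOOR)
    return composite


print(apply_carcinogen_floor(1.2, is_known_carcinogen=True))   # 7.0
print(apply_carcinogen_floor(1.2, is_known_carcinogen=False))  # 1.2
```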
Where our data comes from
| Source | What we extract | Coverage |
|---|---|---|
| ECHA CLP Annex VI | H-statement hazard classifications mapped to 6 axes | 3,831 cosmetic-relevant ingredients |
| CIR (Cosmetic Ingredient Review) | Safety assessment conclusions, clinical patch test data | 2,654 assessment reports |
| IARC Monographs | Carcinogen group classifications (1, 2A, 2B, 3) | 811 substances classified |
| NTP Report on Carcinogens | Known and reasonably anticipated carcinogens | 256 substances |
| NACDG Patch Test Data | Sensitization prevalence rates (% positive reactions) | Top allergens with prevalence data |
| SCCS Scientific Opinions | EU-specific safety evaluations for cosmetic ingredients | Referenced per ingredient |
| PubMed / PMC | Hazard signals extracted from peer-reviewed literature | 19,000+ papers mined |
| REACH SVHC List | Substances of very high concern (ED, CMR, PBT flags) | 213 SVHC-flagged ingredients |
| REACH IUCLID Dossiers | LD50, EC50, LC50, Draize PII, LLNA SI, BCF, biodegradation, genotoxicity results from registered substance dossiers | 26,800+ dossiers, 14.4M study endpoints, 3,242 matched INCIs |
| ESSCA Clinical Patch Test Data | Contact allergen prevalence rates from the European Surveillance System on Contact Allergies | Top 30 allergens with clinical prevalence data |
| EPA CompTox / ToxCast | In vitro high-throughput screening predictions for reproductive toxicity, carcinogenicity, and systemic effects | 2,099 cosmetic-relevant ingredients |
| PubChem GHS Classifications | GHS hazard classifications (H-statements) mapped to 6 axes via PubChem API | ~520 ingredients enriched |
| COSING (EU) | INCI identity, Annex II–VI regulatory status | Baseline identity for all EU-listed ingredients |
All extraction pipelines run automatically. Regulatory source pages across 16 markets are monitored daily for changes. When a regulatory authority updates an ingredient classification or restriction, the change is detected via content hashing, logged as a structured event, and propagated to affected safety scores without manual intervention.
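Change detection via content hashing can be sketched with the standard library. This is a minimal illustration under stated assumptions: the whitespace normalization, the SHA-256 choice, the event fields, and the function names are all hypothetical rather than a description of the production pipeline.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_hash(page_text: str) -> str:
    """Hash page text after collapsing whitespace, so layout noise is ignored."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(
    market: str, url: str, previous_hash: Optional[str], page_text: str
) -> Optional[dict]:
    """Return a structured change event, or None when the content is unchanged."""
    new_hash = content_hash(page_text)
    if new_hash == previous_hash:
        return None
    return {
        "market": market,
        "url": url,
        "previous_hash": previous_hash,
        "new_hash": new_hash,
        "detected_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder URL and text, for illustration only.
stored = content_hash("Annex III entry: max 1.0%")
event = detect_change("EU", "https://example.invalid/annex", stored, "Annex III entry: max 0.5%")
print(event is not None)  # True: the text changed, so an event is produced
```

A returned event would then be used to select the ingredients whose scores or regulatory status need recomputing.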
Safety scores are only half the picture
An ingredient can score well on all six hazard axes and still be prohibited in your target market. Formaldehyde releasers are a clear example — permitted with concentration limits in some markets, banned outright in others. Safety scoring without market-specific regulatory status is incomplete.
Every ingredient in our system carries per-market regulatory status across 16 markets: EU, US (FDA), China (NMPA IECIC), Japan (MHLW), South Korea (MFDS), India (BIS IS 4707), Canada, Australia (TGA), Brazil (ANVISA), Thailand, Malaysia (NPRA), Singapore (HSA), Indonesia, Vietnam, Philippines, and ASEAN harmonised standards. Status categories include: permitted, restricted (with maximum concentrations, required labelling, and product-type limitations), prohibited, and not listed.
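A per-market status record might look like the following sketch. The four status categories and the restriction fields come from the paragraph above; the class names, field names, and the example values are invented for illustration and do not describe any real ingredient.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PERMITTED = "permitted"
    RESTRICTED = "restricted"
    PROHIBITED = "prohibited"
    NOT_LISTED = "not listed"


@dataclass
class MarketStatus:
    status: Status
    max_concentration_pct: Optional[float] = None
    required_labelling: Optional[str] = None
    product_type_limits: list[str] = field(default_factory=list)


def blocked_markets(statuses: dict[str, MarketStatus], targets: list[str]) -> list[str]:
    """Return the target markets where the ingredient is prohibited."""
    return [
        market
        for market in targets
        if market in statuses and statuses[market].status is Status.PROHIBITED
    ]


# Invented placeholder data for a hypothetical ingredient.
example = {
    "EU": MarketStatus(Status.RESTRICTED, max_concentration_pct=1.0),
    "China": MarketStatus(Status.PROHIBITED),
}
print(blocked_markets(example, ["EU", "China", "Japan"]))  # ['China']
```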
What we deliberately exclude
No certification claims. We do not recommend or imply COSMOS, Ecocert, Vegan, Halal, or any other certification status. Certifications are granted to finished products by certifying bodies based on the full supply chain — they cannot be determined from an ingredient's INCI name alone. Any platform that tells you "this ingredient is COSMOS-approved" based solely on identity data is misleading you.
No trade names. Every ingredient is identified by its INCI name only. We do not surface supplier trade names, branded ingredient names, or proprietary blend names in any output. This ensures supplier neutrality — our recommendations are based on chemistry, not commercial relationships.
No EWG or SkinSafe data. We do not use, reference, or derive scores from EWG Skin Deep or SkinSafe databases. Our scoring is built entirely from primary regulatory and scientific sources. This is a deliberate architectural decision, not an oversight.
No "clean beauty" judgments. We provide hazard data and regulatory status. We do not label ingredients as "clean," "toxic," "natural," or "synthetic" — these are marketing categories, not scientific ones. Formulators make their own informed decisions based on the data we provide.
This methodology is not static. As regulatory authorities update classifications, as new clinical data enters the peer-reviewed literature, and as our automated pipelines expand coverage, scores are refined. We publish this methodology because transparency is a prerequisite for trust — and trust is what separates a tool formulators rely on from one they dismiss.
Last updated: May 2026