Methodology

GoodGuide's ratings are developed using methodologies that are grounded in the sciences of informatics, health and environmental risk assessment, life cycle assessment and social impact analysis. We identify the issues that define health, environmental and social performance and select indicators that can be used to evaluate these at the product or company level. Data that we acquire for each indicator are then scored. Indicator-level scores are then rolled up into sub-scores and into GoodGuide's final summary rating.

Selecting Products and Companies to Rate

GoodGuide focuses on rating everyday household products that consumers buy from offline or online retail outlets like supermarkets or e-commerce sites. Our core product categories are personal care, household chemical and food products. We also rate pet food, paper products, lighting products, home appliances, cell phones and cars. Our goal is to rate the products that comprise the top 80% of current sales in a category, plus innovative products that are marketed as having health, environmental or social benefits. We use a variety of sources to define the catalogue of products we want to cover, and then identify the brands and companies responsible for these products. Once our universe of ratable entities is defined, we collect information about product and company attributes that we need for our ratings system.

To identify and track the relationships between products, brands, companies, and product categories, GoodGuide relies on informatics standards that are used to organize consumer product and corporate information and extends these whenever necessary to meet the special requirements of our ratings system. For example, we use standard UPC codes to identify unique products. We can then link our product records to retailer-specific product identifiers as well as respond to bar code scans from our mobile users. We supplement this with a custom classification system to organize products into categories, because there is no standardized method for grouping products into consumer-relevant product categories.
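The linkage described above can be pictured with a minimal sketch. It assumes a simple in-memory catalogue. The names `ProductRecord`, `ProductCatalog`, `lookup_scan` and `lookup_retailer` are hypothetical illustrations, not part of GoodGuide's actual system, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ProductRecord:
    upc: str        # standard UPC code, used as the unique product key
    name: str
    brand: str
    company: str
    category: str   # label from a custom, consumer-relevant classification
    retailer_ids: dict = field(default_factory=dict)  # retailer name -> retailer-specific id


class ProductCatalog:
    """Hypothetical in-memory index of product records."""

    def __init__(self):
        self._by_upc = {}
        self._by_retailer_id = {}

    def add(self, record):
        self._by_upc[record.upc] = record
        for retailer, rid in record.retailer_ids.items():
            self._by_retailer_id[(retailer, rid)] = record

    def lookup_scan(self, upc):
        # Respond to a bar code scan: the scanned UPC is the lookup key.
        return self._by_upc.get(upc)

    def lookup_retailer(self, retailer, rid):
        # Resolve a retailer-specific identifier back to the same record.
        return self._by_retailer_id.get((retailer, rid))


catalog = ProductCatalog()
catalog.add(ProductRecord(
    upc="012345678905", name="Example Shampoo", brand="ExampleBrand",
    company="Example Co.", category="Personal Care",
    retailer_ids={"example-retailer": "SKU-1"},
))
```

Keying everything on the UPC means a scan and a retailer listing both resolve to one record, while the category field carries the custom classification.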

Designing a Ratings System

GoodGuide uses “ontologies,” or structural frameworks for organizing information, to define “what matters” when assessing the health, environmental or social performance of a product or company. The major issues covered under Health, Environment and Society are summarized in our ratings overview. Our issue framework is derived from current standards of practice in the scientific domains relevant to assessing health, environmental or social impacts. Under Health, for example, we track issues that mirror the standard output of chemical risk assessments or nutritional evaluations. Under Environment, we track issues that mirror the standard output of life cycle assessments. Under Society, we track the issues that are included in standardized reporting on corporate social responsibility, as defined by the Global Reporting Initiative. Our reliance on the informatics systems that have been developed by scientific, regulatory or other authorities to address specific issues ensures that our system provides science-based ratings and can take advantage of standardized information generation.

For each issue, we then identify a set of “indicators” that provide evidence about how a product or company performs on that issue. Currently, GoodGuide utilizes over 1,000 indicators to generate its product and company ratings. There are three major classes of indicators:

  1. Performance Indicators are based on evidence related to real-world impacts and include: Quantitative Metrics (e.g., greenhouse gas emissions per dollar of revenue, occupational injuries per million hours worked), Certifications & Awards (e.g., is the company certified under the ISO 14001 environmental management standard?), Counts of Events (e.g., number of controversies or health & safety fines), and 3rd Party Ratings (e.g., what is the company's Freedom House Score on its involvement with oppressive regimes?).
  2. Policy and Program Indicators are based on evidence related to a company's public commitments or practices and include policy statements (e.g., does the company have a climate change policy) as well as programs and initiatives (e.g., does the company monitor its supply chain for compliance with its policies).
  3. Product-level Indicators are based on attributes of a product related to its potential health impacts (e.g., the level of health concern about the ingredients of a personal care product), environmental impacts (e.g., the amount of recycled content in a paper product), or social impacts (e.g., whether a product is certified fair trade).

One of the most important criteria for selecting indicators is data availability: information for an indicator needs to be publicly available for the majority of products or companies we need to rate in order to ensure comparability in our rating system. Data availability influences GoodGuide's rating system in two important ways:

  • In many cases, data availability considerations require GoodGuide to rely on “screening-level” indicators rather than “data-intensive” indicators. In a world of perfect information, for example, product health ratings would be based on detailed health risk assessments that combine information about the health hazards of ingredients with data characterizing consumer exposure to those chemicals. Unfortunately, these data are almost never made available by manufacturers, so GoodGuide utilizes more readily ascertainable hazard indicators (e.g., the number of ingredients of health concern in a product).
  • Because the pervasive lack of transparency about product attributes and company operations undermines the public's ability to evaluate performance, GoodGuide has created a number of indicators that track data availability and impact company or product scores. At the company level, Transparency indicators measure the relative amount of information available from a company for assessing its environmental or social performance. Companies exhibiting multiple data gaps are penalized in our scoring system for their lack of transparency. At the product level, Data Adequacy indicators track whether the specific data elements that are needed to assess a product's health or environmental impact are public. Personal care or household chemical products that lack complete ingredient lists are penalized in our scoring system because they lack the data needed to assess chemical safety.

At the company level, our ratings system is designed to support comparisons of companies across one standard set of issues that define environmental or social performance. This allows users to compare company x with company y on issue z, whether the companies are public or private, large or small, or operating in different industrial sectors. While there are frequently differences in the specific indicators used to assess company x on issue z due to data availability, use of one consistent issues framework maintains comparability across companies. In two product categories (apparel and cell phones), we have modified this approach to add several category-specific issues and indicators to company ratings. These supplemental issues are not generally applicable to other product categories but are important aspects of performance in a specific sector (e.g., does an electronics company exhibit extended producer responsibility, as measured by takeback recycling programs).

At the product level, our ratings system is designed to support comparisons of products within a product category. In contrast with company ratings, it does not make sense to apply a single framework to all products independent of their category. The evaluative framework used to assess personal care products contains a different set of issues and indicators than the framework used to assess paper products: the former focuses on characterizing the health impacts of ingredients, while the latter focuses on characterizing the environmental impacts of raw material sourcing and production processes.

Collecting Data

For each issue and its associated indicators, GoodGuide proceeds to acquire data. We currently use over 1,000 different sources, including scientific institutions, governmental agencies, commercial data aggregators, non-governmental organizations, media outlets and corporations. See our data page for information about our data quality procedures, update frequency and error correction policies.

  • Product-level information is typically obtained from a manufacturer's website or from product labels. GoodGuide defines the data elements required by each indicator used to rate a product in a category and uses automated scraping and information organization tools to create structured data from online data sources. Note that GoodGuide itself does not test products to generate the data we use in our ratings.
  • Company-level information is acquired from various sources. GoodGuide rates two types of companies: (1) publicly traded companies, which are legally required to disclose information on their environmental and social performance, and (2) private companies, which are typically much less transparent about their performance. Company-level information for public firms is primarily obtained from rating services that serve the socially responsible investment market (Thomson-Reuters Asset4 and IW Financial). Note that GoodGuide may rate a subsidiary of a public parent company as if it were a separate company if there is evidence that the subsidiary has independent management as well as environmental and social policies and practices that are distinct from those of the parent company. This may be the case, for example, when a previously independent company with a strong green brand has been acquired by a larger corporation. Subsidiaries whose independence has not been determined receive their parent company score.
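The subsidiary rule described above can be sketched as follows. This is only an illustration under assumptions: the `Company` class, its `independent` flag and the `effective_score` function are hypothetical names, and the scores are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    name: str
    score: Optional[float] = None
    parent: Optional["Company"] = None
    # True only when there is evidence of independent management and
    # distinct environmental and social policies and practices.
    independent: bool = False


def effective_score(company):
    """Return the score a company is rated with.

    A company with no parent, or a subsidiary judged independent, keeps
    its own score. Any other subsidiary receives its parent's score.
    """
    if company.parent is None or company.independent:
        return company.score
    return effective_score(company.parent)


parent = Company("Example Parent Corp", score=5.5)
green_brand = Company("Example Green Brand", score=8.0, parent=parent, independent=True)
other_brand = Company("Example Other Brand", score=8.0, parent=parent)
```

Here `green_brand` is rated on its own record, while `other_brand` inherits the parent's score because its independence has not been established.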

Scoring Indicators

Once we have data for an indicator, we score observed values using GoodGuide's standard scale of 0 to 10, where 0 represents the worst performance and 10 the best. Indicator scoring is a complex process because of the variety of quantitative and qualitative types of evidence we utilize. Available evidence includes absolute measures (e.g., the amount of toxic chemicals a company releases to the environment), standardized measures (e.g., the amount of greenhouse gas emissions per dollar of revenue), relative measures (e.g., the percent of energy use derived from renewable sources), counts (e.g., number of controversies), binary indicators (e.g., does the company have a climate change policy), and content variables (e.g., which certifications has a product or company received).

For company-level data, GoodGuide applies a consistent set of scoring rules to indicators based on their type. Indicators are classified as either a Performance or a Policy and Program Indicator. This classification influences the score range that will be applied to an indicator. Performance Indicators can be used to assign scores across our entire 0-10 scale and are the only evidence type used at the low (0-3) and high (9-10) ends because we believe that real-world evidence is needed to identify the worst and best performers. Policy & Program Indicators are constrained to scores in the mid-range of our scale (4-8) because this kind of qualitative evidence characterizes the relative capability of an organization to address an issue, but is usually insufficient to reliably identify the worst and best performers. Note that most GoodGuide company ratings are driven by qualitative Policy and Program indicators, because quantitative data on real-world performance are rarely made public and relatively few companies are the subject of controversies or the recipient of awards/certifications.
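One way to picture the class-based score ranges is the sketch below. It assumes that out-of-range values are simply clamped to the allowed range, which is an illustrative choice and not a documented GoodGuide rule. The names `SCORE_RANGES` and `constrain_score` are hypothetical.

```python
# Allowed score range per indicator class, taken from the text above:
# Performance Indicators may use the full 0-10 scale, while Policy and
# Program Indicators are limited to the 4-8 mid-range.
SCORE_RANGES = {
    "performance": (0.0, 10.0),
    "policy_program": (4.0, 8.0),
}


def constrain_score(raw_score, indicator_class):
    """Clamp a raw score to the range allowed for its indicator class.

    Clamping is an assumption made for this sketch; the source states
    only that the ranges differ, not how they are enforced.
    """
    low, high = SCORE_RANGES[indicator_class]
    return max(low, min(high, raw_score))
```

Under this assumption, a strong policy statement can lift a company to 8 at most, and only real-world performance evidence can place it at 9 or 10, or below 4.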

Within our broad indicator classes, additional scoring rules are required to address various differences between indicators. Indicators vary in polarity (i.e., whether an indicator provides a positive or negative signal about an issue) as well as in the distributional characteristics of observed data. For example, data values for Quantitative Metrics (like greenhouse gas emissions per dollar of revenue) are typically continuous and provide valuable evidence across the entire observed distribution. Data like this are sorted into decile bins: observations in the best-performing decile are scored 10, while observations in the worst-performing decile are scored 1. All other observations receive scores based on their assigned decile. As a result, GoodGuide's scoring scale from 1-10 can typically be utilized with continuous data. In contrast, data values for Certification and Award variables constitute a positive signal about a company and are binary. Observations indicating a company has a specific certification receive a score determined by the strength and coverage of the certification, while uncertified entities receive no score. In this case, only the positive portion of GoodGuide's scoring scale will be utilized (e.g., top-tier certifications will be awarded scores between 8 and 10). Data values for Counts of Events (like controversies) constitute a negative signal about a company and typically follow a skewed distribution because of the rarity of such events. Counts of just one controversy generally receive a score of 4, while 5 or more controversies generally receive a score of 0. In this case, only the negative portion of GoodGuide's scoring scale will be utilized. See our scoring rules page for complete details on the scoring rules applied to each indicator type, as well as examples.
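The decile and count rules above can be sketched as follows, under stated assumptions. The function names are hypothetical, ties are not given special treatment, and the scores for 2 to 4 controversies are a linear interpolation assumed for this sketch: the text fixes only the endpoints (1 controversy scores 4, 5 or more score 0).

```python
def decile_scores(values, higher_is_better=True):
    """Score each observation 1-10 by its decile within the observed data.

    The worst-performing decile scores 1 and the best-performing decile
    scores 10. Tied values may land in different bins in this sketch.
    """
    n = len(values)
    # Order observation indices from worst performer to best performer.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 10 // n + 1
    return scores


def controversy_score(count):
    """Score a count of negative events on the low end of the scale.

    Returns None when there are no events, since counts carry only a
    negative signal. Scores for 2-4 events are an assumed interpolation.
    """
    if count <= 0:
        return None
    if count >= 5:
        return 0
    return 5 - count


# Emissions per dollar of revenue: lower values are better.
emissions = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
emission_scores = decile_scores(emissions, higher_is_better=False)
```

The `higher_is_better` flag stands in for indicator polarity: the same binning serves metrics where low values are good (emissions) and metrics where high values are good (renewable energy share).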

For product-level data, GoodGuide selects indicators and utilizes scoring methods that vary by product category. Full details of the issues addressed and scoring methods used in different product categories are provided in the following pages:

  • Apparel
  • Appliances
  • Candy
  • Cars
  • Cell Phones
  • Coffee and Tea
  • Diapers
  • Drinks
  • Food
  • Lighting Products
  • Paper Products
  • Personal Care and Household Chemicals
  • Pet Food
  • Tampons

Aggregating Indicator Scores to Generate Ratings

    Once indicators are scored, we combine scores from groups of indicators or issues to assign ratings. Not all issues and indicators are equal. To generate a rating that accurately reflects the relative importance of different issues or indicators, it is necessary to apply weights to issues or utilize different aggregation algorithms. Some issues are more important than others and should therefore carry more weight when calculating a rating. Some indicators are better than others because they come from a more reliable source, or provide a stronger signal about relative performance. In assessing Occupational Safety and Health, for example, we may have a choice between indicators based on policy actions and indicators based on actual fatality or injury data. In such cases, we give extra emphasis to the score assigned to the strongest available indicator: the one based on real-world data.

    Our Health, Environment and Society rating frameworks define what is known as a “value tree” in multi-attribute utility theory - they specify sets of indicators, sub-issues or major issues that are hierarchically organized into “nodes.” For each node, we specify the weights or aggregation algorithm that is used to roll up scores from the constituents of that node.

    Weights are used sparingly when we rate companies, and are used more frequently in category-specific product ratings. The most important examples of node weighting at the company-level include:

    • Making Transparency indicators (which measure how much data a company publicly discloses) a major component of a company's summary Environmental (25%) and Social (20%) scores
    • Giving extra weight to Compliance and Controversy scores when evaluating a company's management performance on environmental or social issues.
    • Giving extra weight to Supply Chain indicators when evaluating a company's governance practices.
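As a rough illustration of node weighting, the sketch below applies the 25% Transparency weight to a company's summary Environmental score. How the remaining 75% is composed is not stated here, so it is represented by a single hypothetical "other_issues" score. The function name and the sample scores are assumptions.

```python
def weighted_score(scores, weights):
    """Weighted average of the scores named in `weights`.

    Weights are normalized by their sum, so they need not add up to 1.
    """
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Transparency carries 25% of the summary Environmental score (from the
# text above); "other_issues" is a placeholder for everything else.
environment_weights = {"transparency": 0.25, "other_issues": 0.75}
environment_scores = {"transparency": 4.0, "other_issues": 8.0}

environment_summary = weighted_score(environment_scores, environment_weights)
```

With these invented inputs, weak disclosure (4.0) pulls an otherwise strong record (8.0) down to 7.0.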

    Aggregation algorithms are used throughout our ratings system to combine sets of scores. Available methods include:

    • Maximum (select the highest score in a set). This is generally used in positive nodes that include Policy and Program indicators because it promotes the most positive signal about company practices regarding an issue, without dilution due to inaction or no data on other indicators relevant to the same issue.
    • Minimum (select the lowest score in a set). This is generally used in negative nodes that include Compliance and Controversy indicators because it promotes the most negative signal about company behavior on an issue, without dilution by positive values on other indicators relevant to the same issue.
    • Mean (calculate the average of all scores in a set). This is generally used 1) when aggregating scores from a set of positive and negative sub-nodes in order to allow real world signals (from either quantitative metrics or compliance/controversy counts) to influence a policy score in either a positive or negative direction and 2) in quantitative nodes, where it averages positive and/or negative signals from a set of performance metrics.
    • Count (assign score based on count of different score values in a set). This is used in nodes where the number and quality of different activities is important to evaluating performance. It allows greater credit to be given for multiple high impact actions compared to Maximum.
    • Preferred (select score from top available indicator in a rank ordered set of indicators). This is used in nodes where data sources or indicators have been rank ordered based on quality or relevance to an issue. It promotes the score from the best available source or indicator.
    • Matrix (apply a custom calculation to a set of indicators). This is used in product-level ratings when a set of indicators have to be combined using domain-specific rules to correctly characterize an issue. Prominent examples include the scoring rules applied to rate food products on their nutritional value and personal care products on their potential human health impact.
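A minimal sketch of how a value tree might roll up scores with some of the methods listed above follows. It covers Maximum, Minimum, Mean and Preferred only. Count and Matrix are omitted because their rules are category-specific and not given here. The `Node` class, the function names and the sample scores are hypothetical, and missing data is represented as `None`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def _present(scores):
    # Drop indicators with no data so they do not dilute the result.
    return [s for s in scores if s is not None]


def maximum(scores):
    vals = _present(scores)
    return max(vals) if vals else None


def minimum(scores):
    vals = _present(scores)
    return min(vals) if vals else None


def average(scores):
    vals = _present(scores)
    return sum(vals) / len(vals) if vals else None


def preferred(scores):
    # Scores must be listed in rank order, best source or indicator first.
    vals = _present(scores)
    return vals[0] if vals else None


@dataclass
class Node:
    name: str
    score: Optional[float] = None       # set on leaf (indicator) nodes
    aggregate: Callable = average       # method used to roll up children
    children: List["Node"] = field(default_factory=list)

    def rollup(self):
        if not self.children:
            return self.score
        return self.aggregate([child.rollup() for child in self.children])


policies = Node("policies", aggregate=maximum, children=[
    Node("climate policy", score=6.0),
    Node("supply chain monitoring", score=None),   # no data
    Node("water policy", score=5.0),
])
controversies = Node("controversies", aggregate=minimum, children=[
    Node("spill incident", score=4.0),
    Node("safety fines", score=2.0),
])
issue = Node("example issue", aggregate=average, children=[policies, controversies])
```

In this example the positive node keeps its strongest signal (6.0), the negative node keeps its most negative signal (2.0), and the parent node averages the two.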

    The Role of Value Judgments

    Value judgments are unavoidable in rating systems, and GoodGuide's is no exception. Even the most scientifically grounded assessment of environmental, health, or social performance requires value judgments about the relative importance of various issues and types of evidence, as well as the treatment of data gaps. The design of our rating system reflects the following major value judgments:

    • In order to provide our users with actionable, easy-to-understand guidance, GoodGuide provides a single summary rating for a product, derived by giving equal weight to Health, Environment and Society sub-scores. Rational people can disagree over the relative weight to give health vs. environment vs. social impacts and there is no objective, correct solution to the problem of how to aggregate such disparate concerns. GoodGuide opted for equal weighting because we believe Health, Environment and Social considerations should be integrated into all consumer product decision-making. Users with different preferences can select products based only on the sub-score they care most about.
    • In order to address the extensive lack of environmental and social data at the product level, GoodGuide generally relies on company-level environmental and social ratings to characterize the performance of a product. While Health scores can be generated at the product level because of the public availability of product ingredient lists, scoring environmental impacts at the product level typically requires detailed information about where a product is manufactured, the types of emissions or resource use associated with that specific product, etc. Companies that make a product often make numerous other products, and information about the impacts of a company's manufacturing facilities almost never has product-level resolution. If such data even exist, they are typically known only to the manufacturer and are not generally available to systems like GoodGuide. The situation is even clearer with Social scores, where it is virtually impossible to acquire information about the impacts a specific product has on social attributes such as worker rights, community engagement, etc. At the present time, company-level performance on environmental and social attributes is the most widely available and reliable proxy for product-level impacts in these areas.
    • In some product categories, we have identified methodologies that can be used to characterize product-level environmental or social performance. If product-level ratings are available, they are combined with company-level ratings to generate GoodGuide's Environment or Social sub-score for a product. Depending on the product category, company-level and product-level scores are combined using one of the following weighting rules:
      • 75% Company / 25% Product — Available product data are an insufficient proxy for the overall impacts of a product, so more weight is given to the company's Environmental or Social score to ensure full coverage of relevant attributes.
      • 50% Company / 50% Product — Product- and company-level scores contribute equally to the characterization of the overall impacts of a product.
      • 25% Company / 75% Product — Product-level scores incorporate the most significant aspects of overall impacts of a product, but company-level scores are included to address product-level data gaps.
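The weighting rules above, together with the equal-weight summary rating, can be sketched as follows. The rule labels and function names are hypothetical, and the scores are invented for illustration.

```python
# (company weight, product weight) for each rule described above.
COMPANY_PRODUCT_WEIGHTS = {
    "company_heavy": (0.75, 0.25),
    "balanced": (0.50, 0.50),
    "product_heavy": (0.25, 0.75),
}


def combined_subscore(company_score, product_score, rule):
    """Environment or Social sub-score for a product.

    When no product-level rating exists, the company-level score is
    used on its own as the proxy.
    """
    if product_score is None:
        return company_score
    company_weight, product_weight = COMPANY_PRODUCT_WEIGHTS[rule]
    return company_weight * company_score + product_weight * product_score


def summary_rating(health, environment, society):
    """Single summary rating: equal weight to the three sub-scores."""
    return (health + environment + society) / 3


environment = combined_subscore(6.0, 8.0, "company_heavy")   # 0.75*6.0 + 0.25*8.0
society = combined_subscore(4.5, None, "company_heavy")      # company score only
overall = summary_rating(7.0, environment, society)
```

With these inputs the Environment sub-score is 6.5, the Society sub-score stays at the company's 4.5, and the summary rating is 6.0.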