SEC EDGAR Risk Factor Trend Analysis

By Eldar · Published June 29, 2026

Extract and analyze risk factors from 10-K filings for AAPL, MSFT, NVDA, and TSLA using SEC APIs, NLP scoring, and financial metrics to identify emerging business risks.

  • sec-edgar
  • nlp
  • financial-analysis
  • risk-assessment
  • time-series
33 cells · 1 experiment · 28 views · 0 forks

Inside this notebook

# SEC EDGAR Risk Factor Trend Analysis: AAPL, MSFT, NVDA, TSLA

**Objective:** Extract revenue, margins, segment performance, and risk-factor language from recent 10-K filings for Apple, Microsoft, NVIDIA, and Tesla. Build an NLP-driven scoring workflow that flags which business risks are increasing over time, and deliver a consulting-style memo on the biggest changes.

**Data Sources:**
- SEC EDGAR REST APIs (free, public): company submissions, XBRL financial data, full-text filing search
- 10-K annual filings (most recent 3–5 fiscal years per company)

**Deliverables:**
- Financial trend tables (revenue, gross margin, operating margin, segment breakdown)
- Risk factor text corpus aligned across years
- Per-risk-category intensity scores with year-over-year deltas
- Consulting memo summarising top-5 rising risks and financial context
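The "per-risk-category intensity scores with year-over-year deltas" deliverable can be illustrated with a minimal sketch. It assumes a simple keyword lexicon and a score of keyword hits per 1,000 words; the category names, terms, and the helpers `risk_intensity` and `yoy_deltas` are hypothetical and may differ from what the full notebook does.

```python
import re
import pandas as pd

# Hypothetical lexicon: categories and terms are illustrative assumptions,
# not the notebook's actual risk taxonomy.
RISK_LEXICON = {
    "supply_chain": ["supplier", "shortage", "logistics"],
    "regulatory": ["regulation", "antitrust", "compliance"],
}

def risk_intensity(text: str, lexicon: dict = RISK_LEXICON) -> dict:
    """Return keyword hits per 1,000 words for each risk category."""
    words = re.findall(r"[a-z]+", text.lower())
    total = max(len(words), 1)  # avoid division by zero on empty text
    scores = {}
    for category, terms in lexicon.items():
        # Prefix match so "supplier" also counts "suppliers"
        hits = sum(1 for w in words if any(w.startswith(t) for t in terms))
        scores[category] = 1000.0 * hits / total
    return scores

def yoy_deltas(corpus: dict) -> pd.DataFrame:
    """corpus maps fiscal year -> risk-factor text; returns scores plus deltas."""
    scores = pd.DataFrame({yr: risk_intensity(txt) for yr, txt in corpus.items()}).T
    scores = scores.sort_index()  # chronological order before differencing
    deltas = scores.diff().add_suffix("_delta")
    return scores.join(deltas)
```

Normalising by document length keeps a longer risk section from looking riskier simply because it has more words; a positive `_delta` column then flags a category whose language became denser from one year to the next.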

# === Imports & Setup ===
import json
import re
import time
import warnings
from datetime import datetime, timezone
from collections import defaultdict, Counter
from typing import Optional

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# NLP
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import NMF
…
All imports loaded. Environment ready.
# === SEC EDGAR API Configuration ===

# CIK lookup table (from SEC's company_tickers.json)
CIK_MAP = {
    "AAPL": "0000320193",
    "MSFT": "0000789019",
    "NVDA": "0001045810",
    "TSLA": "0001318605",
}

# Human-readable names
COMPANY_NAMES = {
    "AAPL": "Apple Inc.",
    "MSFT": "Microsoft Corporation",
    "NVDA": "NVIDIA Corporation",
    "TSLA": "Tesla, Inc.",
}

…
SEC API configuration ready.
Companies: AAPL, MSFT, NVDA, TSLA
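The CIK table above is hard-coded. As a hedged sketch, the same mapping could be derived from the SEC's `company_tickers.json` file, whose entries carry `cik_str`, `ticker`, and `title` fields. The helper name `build_cik_map` is an assumption, and the sample below stands in for the downloaded file so that no network call is needed.

```python
def build_cik_map(company_tickers: dict, tickers: list) -> dict:
    """Map each requested ticker to its 10-digit, zero-padded CIK string."""
    wanted = {t.upper() for t in tickers}
    cik_map = {}
    for entry in company_tickers.values():
        ticker = entry["ticker"].upper()
        if ticker in wanted:
            # The file stores CIKs as integers; EDGAR data URLs need 10 digits
            cik_map[ticker] = str(entry["cik_str"]).zfill(10)
    return cik_map

# Sample in the shape of the published file (values match CIK_MAP above)
sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}
print(build_cik_map(sample, ["AAPL", "MSFT"]))
# → {'AAPL': '0000320193', 'MSFT': '0000789019'}
```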
# === Fetch Recent 10-K Filings from SEC EDGAR ===

def fetch_company_filings(cik: str, ticker: str, max_filings: int = 20) -> pd.DataFrame:
    """Fetch recent filings from SEC EDGAR submissions endpoint."""
    url = f"{BASE_URL}/submissions/CIK{cik}.json"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    
    recent = data["filings"]["recent"]
    df = pd.DataFrame({
        "form": recent["form"],
        "filing_date": recent["filingDate"],
        "report_date": recent.get("reportDate", [""]*len(recent["form"])),
        "primary_doc": recent.get("primaryDocument", [""]*len(recent["form"])),
        "accession_number": recent["accessionNumber"],
    })
    
…
Fetching filings for AAPL (Apple Inc.)...
  → 11 10-K filings, years: [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]
Fetching filings for MSFT (Microsoft Corporation)...
  → 6 10-K filings, years: [2020, 2021, 2022, 2023, 2024, 2025]
Fetching filings for NVDA (NVIDIA Corporation)...
  → 6 10-K filings, years: [2021, 2022, 2023, 2024, 2025, 2026]
Fetching filings for TSLA (Tesla, Inc.)...
  → 8 10-K filings, years: [2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]

=== Selected…
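The selection step is truncated in this preview. One way to narrow the submissions table to recent 10-K filings and derive a document URL for each is sketched below. The column names follow the DataFrame built in `fetch_company_filings`; the helper `select_10k`, the constant `ARCHIVES_ROOT`, and the default of five filings are assumptions.

```python
import pandas as pd

ARCHIVES_ROOT = "https://www.sec.gov/Archives/edgar/data"  # assumed constant name

def select_10k(filings: pd.DataFrame, cik: str, n_years: int = 5) -> pd.DataFrame:
    """Keep the n most recent 10-K filings and attach a document URL."""
    # Exact match, so amendments filed as 10-K/A are excluded
    tenk = filings[filings["form"] == "10-K"].copy()
    tenk["filing_date"] = pd.to_datetime(tenk["filing_date"])
    tenk = tenk.sort_values("filing_date", ascending=False).head(n_years)
    # Archive paths use the integer CIK and the accession number without dashes
    folder = tenk["accession_number"].str.replace("-", "", regex=False)
    tenk["doc_url"] = (
        f"{ARCHIVES_ROOT}/{int(cik)}/" + folder + "/" + tenk["primary_doc"]
    )
    return tenk.reset_index(drop=True)
```

Called as `select_10k(df, CIK_MAP["AAPL"])`, this would return at most five rows, newest first, each with a `doc_url` pointing at the primary filing document.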

## 1. Financial Data Extraction: Revenue, Margins & Segments

This section uses the SEC XBRL `companyconcept` API to pull structured financial metrics from each company's filings.

# === Extract Financial Data from SEC XBRL API ===

# XBRL taxonomy concepts we need
FINANCIAL_CONCEPTS = {
    "Revenue": "us-gaap/RevenueFromContractWithCustomerExcludingAssessedTax",
    "CostOfRevenue": "us-gaap/CostOfRevenue",
    "GrossProfit": "us-gaap/GrossProfit",
    "OperatingIncome": "us-gaap/OperatingIncomeLoss",
    "NetIncome": "us-gaap/NetIncomeLoss",
    "R&D": "us-gaap/ResearchAndDevelopmentExpense",
    "SG&A": "us-gaap/SellingGeneralAndAdministrativeExpense",
    "TotalAssets": "us-gaap/Assets",
    "TotalLiabilities": "us-gaap/Liabilities",
    # The us-gaap element name includes "UsedIn"
    "OperatingCashFlow": "us-gaap/NetCashProvidedByUsedInOperatingActivities",
}

def fetch_xbrl_concept(cik: str, concept_path: str) -> Optional[list]:
    """Fetch XBRL data for a specific concept from SEC API."""
…
=== AAPL (Apple Inc.) ===
  Extracted financial concepts for years [2022, 2023, 2024, 2025, 2026]

=== MSFT (Microsoft Corporation) ===
  Extracted financial concepts for years [2022, 2023, 2024, 2025, 2026]

=== NVDA (NVIDIA Corporation) ===
  Extracted financial concepts for years [2022, 2023, 2024, 2025, 2026]

=== TSLA (Tesla, Inc.) ===
  Extracted financial concepts for years [2022, 2023, 2024, 2025, 2026]

Total financial data points: 136
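The body of `fetch_xbrl_concept` is truncated in this preview. As a hedged sketch of the post-processing such a fetch needs, the functions below build a `companyconcept` URL and reduce a response payload to one annual value per period. They rely on the documented payload shape (`units` keyed by unit name, each fact carrying `end`, `val`, `fp`, `form`, `filed`, and `start` for duration facts). The names `concept_url` and `annual_values` and the 350 to 380 day window are assumptions.

```python
from datetime import date

def concept_url(cik: str, concept_path: str) -> str:
    """Build the companyconcept URL for a 10-digit CIK and a 'taxonomy/Tag' path."""
    return f"https://data.sec.gov/api/xbrl/companyconcept/CIK{cik}/{concept_path}.json"

def annual_values(payload: dict, unit: str = "USD") -> dict:
    """Reduce a companyconcept payload to {period-end year: value}."""
    latest = {}
    for fact in payload.get("units", {}).get(unit, []):
        if fact.get("form") != "10-K" or fact.get("fp") != "FY":
            continue
        # Duration facts carry "start"; skip quarterly slices reported in a 10-K
        if "start" in fact:
            span = date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])
            if not 350 <= span.days <= 380:
                continue
        year = int(fact["end"][:4])
        # A later filing that restates the same period wins (ISO dates sort as text)
        if year not in latest or fact["filed"] > latest[year]["filed"]:
            latest[year] = fact
    return {year: f["val"] for year, f in sorted(latest.items())}
```

The sketch keys on the period end date rather than the `fy` field because `fy` describes the filing, so prior-year comparatives in a 10-K carry the later filing's fiscal year.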
# === Financial Trend Visualization ===

# Check which metrics are available
available_metrics = fin_df["metric"].unique()
print("Available metrics:", list(available_metrics))

# Revenue and margin trends
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle("Financial Performance Trends (2022–2026)", fontsize=15, fontweight="bold", y=1.01)

companies_order = ["AAPL", "MSFT", "NVDA", "TSLA"]
colors = {"AAPL": "#555555", "MSFT": "#00A4EF", "NVDA": "#76B900", "TSLA": "#E82127"}

# 1. Revenue
ax = axes[0, 0]
for ticker in companies_order:
    df_t = fin_pivot[fin_pivot["ticker"] == ticker].sort_values("fiscal_year")
    ax.plot(df_t["fiscal_year"], df_t["Revenue"], marker="o", label=ticker, 
…
Available metrics: ['Revenue', 'GrossProfit', 'OperatingIncome', 'NetIncome', 'R&D', 'SG&A', 'TotalAssets', 'TotalLiabilities', 'CostOfRevenue']
✅ Financial trends chart saved.
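The margin panels depend on ratios derived from the extracted metrics. A minimal sketch of that derivation follows, assuming a long-format table with `ticker`, `fiscal_year`, `metric`, and `value` columns; the `value` column name and the helper `add_margins` are assumptions, since the construction of `fin_pivot` is not visible in this preview.

```python
import pandas as pd

def add_margins(fin_long: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format metrics and derive gross and operating margin (%)."""
    wide = fin_long.pivot_table(
        index=["ticker", "fiscal_year"], columns="metric",
        values="value", aggfunc="first",
    ).reset_index()
    wide.columns.name = None
    # Fall back to Revenue - CostOfRevenue when GrossProfit is not tagged
    if "GrossProfit" not in wide:
        wide["GrossProfit"] = float("nan")
    if "CostOfRevenue" in wide:
        wide["GrossProfit"] = wide["GrossProfit"].fillna(
            wide["Revenue"] - wide["CostOfRevenue"]
        )
    wide["gross_margin_pct"] = 100 * wide["GrossProfit"] / wide["Revenue"]
    wide["operating_margin_pct"] = 100 * wide["OperatingIncome"] / wide["Revenue"]
    return wide
```

The fallback matters because not every filer tags `GrossProfit` directly; where only revenue and cost of revenue are reported, the margin can still be computed.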
