DATA QUALITY

Your Screener Is Lying to You

Academic research found that 68 of 73 commonly used financial data items show significant discrepancies between what companies file with the SEC and what data providers show you. Here is exactly how it happens.

This is not a hunch. It's peer-reviewed.

In 2023, researchers from Penn State and Columbia published a study in the Journal of Accounting and Economics titled “Lost in Standardization.” They compared the raw financial data in SEC filings against the same data as reported by Compustat, the S&P Global database that powers most of the financial tools retail investors use.

Their findings were stark: 68 of 73 commonly used financial data items showed statistically significant discrepancies. Revenue alone differed for roughly 26% of the companies they sampled. Compustat wasn't the worst offender, either: FactSet data deviated even further from the original filings.

A separate study, “Using XBRL to Conduct a Large-Scale Study of Discrepancies between Compustat and SEC 10-K Filings,” published in the Journal of Information Systems, found comparable gaps: 17 of the 30 variables it analyzed differed significantly from the 10-K source, showing that data providers frequently alter reported numbers to fit uniform templates.

68 / 73 data items with significant discrepancies between Compustat and SEC filings
~26% of companies had revenue figures that didn't match their own 10-K filings
17 / 30 analyzed variables significantly differed from the 10-K source in a separate XBRL study
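
The kind of comparison behind these figures can be sketched in a few lines. This is a minimal illustration, not the researchers' method: the company names and values are hypothetical, and the 0.1% relative tolerance is an assumed threshold for what counts as a mismatch.

```python
from math import isclose

# Hypothetical revenue figures in $ millions:
# (value as filed in the 10-K, value shown by a data provider).
revenue = {
    "Company A": (1250.0, 1250.0),
    "Company B": (980.5, 1012.3),
    "Company C": (4300.0, 4300.0),
    "Company D": (210.7, 198.2),
}

def mismatches(pairs, rel_tol=1e-3):
    """Return the names whose provider value differs from the filed value
    by more than the relative tolerance (an assumed threshold)."""
    return [
        name
        for name, (filed, provider) in pairs.items()
        if not isclose(filed, provider, rel_tol=rel_tol)
    ]

flagged = mismatches(revenue)
rate = len(flagged) / len(revenue)
print(flagged, f"{rate:.0%}")
```

A real check would pull the filed values from the company's XBRL data rather than typing them in, but the comparison step itself is this simple.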

Case study: What your screener hides about NVIDIA

In NVIDIA's fiscal 2025 10-K, the company reports $130.5 billion in revenue. A typical screener shows you this as a single “Revenue” line, maybe split into two segments. But the actual filing tells a much richer story.

Typical screener shows (5 line items)
Revenue: $130.5B
Cost of Revenue: $32.6B
R&D: $12.9B
SG&A: $3.5B
Net Income: $72.9B

NVIDIA's 10-K actually reports (6+ distinct line items)
Data Center revenue: $115.2B
Gaming revenue: $11.4B
Professional Visualization: $1.9B
Automotive: $1.7B
Interest income / Other, net: $2.6B
“All Other” (unallocated SBC, infra): separate

The screener version collapses four distinct revenue streams into one number. NVIDIA's “All Other” category — where the company parks unallocated stock-based compensation, corporate infrastructure costs, and acquisition-related charges — simply vanishes. If you're trying to understand NVIDIA's actual cost structure, the screener version isn't just simplified. It's structurally misleading.
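
One quick way to see what a single revenue line conceals is to reconcile it against the itemized streams. The sketch below uses only the figures quoted in this case study; the variable names are illustrative, and the residual is simply whatever revenue the four listed streams do not account for, rounding included.

```python
# Figures in $ billions, as quoted in this case study.
total_revenue = 130.5
itemized = {
    "Data Center": 115.2,
    "Gaming": 11.4,
    "Professional Visualization": 1.9,
    "Automotive": 1.7,
}

# Round to one decimal to avoid floating-point noise in the printed figures.
itemized_sum = round(sum(itemized.values()), 1)
residual = round(total_revenue - itemized_sum, 1)

for name, value in itemized.items():
    print(f"{name}: {value / total_revenue:.1%} of revenue")
print(f"Itemized: ${itemized_sum}B, not itemized here: ${residual}B")
```

The shares make the concentration obvious (Data Center is roughly 88% of the total), which is exactly the information a one-line “Revenue” figure cannot carry.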

Case study: Palantir's $692 million hidden in plain sight

According to Palantir's 2024 10-K filing, the company reported $2.87 billion in revenue with $692 million in stock-based compensation — 24% of revenue. This is one of the most important numbers for valuing Palantir, because SBC represents real dilution to shareholders.

In Palantir's actual 10-K, stock-based compensation is broken out as a dedicated line and further detailed in the notes, showing how it's distributed across cost of revenue, sales and marketing, R&D, and G&A. This distribution matters: SBC concentrated in R&D signals investment in future product; SBC concentrated in sales signals customer acquisition cost.

A typical screener shows you a single “Stock-Based Compensation” number in the cash flow statement — if it shows it at all. Some roll it back into the operating expense categories, making it impossible to separate real cash costs from non-cash dilution. The functional breakdown that tells you where the company is spending equity disappears entirely.

Palantir FY2024 — What's at stake
Revenue: $2.87B
Stock-based comp: $692M
SBC / Revenue: 24%
GAAP operating income: $310M

Palantir's SBC is more than twice its GAAP operating income ($692 million against $310 million). A screener that buries this in generic opex categories makes it easy to miss that the company is paying for growth primarily with shareholder equity.
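
The arithmetic behind these ratios is easy to reproduce once SBC is available as its own line. A minimal sketch using the figures quoted above (variable names are illustrative):

```python
# Palantir FY2024 figures in $ millions, as quoted in this article.
revenue = 2870
sbc = 692
gaap_operating_income = 310

sbc_to_revenue = sbc / revenue
sbc_to_operating_income = sbc / gaap_operating_income

print(f"SBC / revenue: {sbc_to_revenue:.1%}")
print(f"SBC / GAAP operating income: {sbc_to_operating_income:.2f}x")
```

Neither ratio can be computed from a data source that folds SBC into the operating expense lines without reporting it separately.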

It's not just messy. It changes what you conclude.

The “Lost in Standardization” researchers tested something specific: do these data discrepancies actually change investment conclusions? They examined the accruals anomaly, one of the most well-documented patterns in finance — companies with high accruals tend to have lower future stock returns.

When they calculated accruals from the original SEC filings, the anomaly was statistically significant. It worked as a return predictor. When they calculated the same metric from Compustat data, the anomaly disappeared entirely. The standardization process had introduced enough noise to destroy a real, exploitable signal.
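
To make the mechanism concrete, here is a sketch of how a single reclassified line item moves an accruals measure. It uses one common definition (net income minus operating cash flow, scaled by average total assets), which is not necessarily the definition the researchers used, and every number in it is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Financials:
    net_income: float
    operating_cash_flow: float
    total_assets_start: float
    total_assets_end: float

def accruals_ratio(f: Financials) -> float:
    """Cash-flow-statement accruals (net income minus operating cash flow),
    scaled by average total assets."""
    average_assets = (f.total_assets_start + f.total_assets_end) / 2
    return (f.net_income - f.operating_cash_flow) / average_assets

# Hypothetical firm, $ millions. The two records differ only in operating
# cash flow, standing in for an item a provider reclassified when it
# standardized the statement.
as_filed = Financials(net_income=120.0, operating_cash_flow=90.0,
                      total_assets_start=950.0, total_assets_end=1050.0)
standardized = Financials(net_income=120.0, operating_cash_flow=110.0,
                          total_assets_start=950.0, total_assets_end=1050.0)

print(f"as filed:     {accruals_ratio(as_filed):.3f}")
print(f"standardized: {accruals_ratio(standardized):.3f}")
```

In this made-up case the same firm looks three times as accrual-heavy from the filing as from the standardized record, enough to move it between portfolios in an accruals sort.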

This isn't a theoretical concern. If you're building a DCF model, screening for quality metrics, or comparing margins across an industry — and you're using data from a typical screener — the numbers you see may not support the conclusions you draw from them.

Why data providers do this

Data providers aren't malicious. They face a genuine engineering problem: every company structures its financial statements differently. NVIDIA has an “All Other” segment. Palantir embeds SBC across every function. A regional bank has line items that don't exist for a SaaS company.

To make their databases queryable and their screeners functional, providers like Compustat and FactSet map every company's unique line items into a fixed, universal schema. When a line item doesn't fit — and the research shows this happens more often with complex, high-growth, or smaller firms — it gets merged, renamed, or dropped.
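
The merging and dropping described above can be illustrated with a toy mapping. This is a sketch of the general idea only: the company, its line items, and the schema are hypothetical, and no provider's actual schema or pipeline is being reproduced.

```python
from collections import defaultdict

# As-filed line items for a hypothetical company, in $ millions.
as_filed = {
    "Subscription revenue": 40.0,
    "Services revenue": 25.0,
    "Hardware revenue": 10.0,
    "Gain on strategic investments": 3.5,
}

# Hypothetical fixed schema: several filed labels share one standardized
# field, and a label with no entry has nowhere to go.
SCHEMA_MAP = {
    "Subscription revenue": "Revenue",
    "Services revenue": "Revenue",
    "Hardware revenue": "Revenue",
}

def standardize(items, schema_map):
    """Merge mapped items into schema fields and collect unmapped labels."""
    fields = defaultdict(float)
    dropped = []
    for label, value in items.items():
        field = schema_map.get(label)
        if field is None:
            dropped.append(label)  # no slot in the schema: the item is lost
        else:
            fields[field] += value  # distinct items merge into one number
    return dict(fields), dropped

fields, dropped = standardize(as_filed, SCHEMA_MAP)
print(fields)
print(dropped)
```

Three revenue streams come out as one number and the unmapped item disappears, yet the output is perfectly well-formed and easy to query.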

The result is a database that's easy to query but subtly unfaithful to the source. The discrepancies are largest exactly where they matter most: for firms with novel business models, industry-specific reporting, or granular sub-segment breakdowns.

What we do differently

DeepFundamental parses SEC filings directly and preserves the company's own reporting structure. We don't force NVIDIA's four revenue segments into one line. We don't collapse Palantir's SBC detail into a generic bucket. Every line item traces back to the original filing.

The goal isn't to replace screeners for quick filtering. The goal is to give you the actual data when you sit down to do real analysis, so the numbers in your model match the numbers the company reported in its SEC filings.

See what your screener leaves out

Compare NVIDIA's income statement on your current tool against the actual SEC filing. DeepFundamental shows you the difference.

Sources

  1. Du, Huddart & Jiang — "Lost in Standardization: Effects of Financial Statement Database Discrepancies on Inference" (Journal of Accounting and Economics, 2023)
  2. Chychyla & Kogan — "Using XBRL to Conduct a Large-Scale Study of Discrepancies between Compustat and SEC 10-K Filings" (Journal of Information Systems, 2015)
  3. NVIDIA Corporation 10-K Annual Report — Fiscal Year Ended January 26, 2025
  4. Palantir Technologies 10-K Annual Report — Fiscal Year Ended December 31, 2024