Detection Science

01 — DC Layer

DC Layer — Panel & String Physics

The DC side of a solar plant is where energy conversion begins, and where most faults originate. Each string is a series chain of modules, so a single point of current interruption anywhere along it silences the whole chain. Detection at this layer relies on comparing strings against each other and against an irradiance-corrected model, since a single string's behavior is only meaningful in the context of its peers.

Referenced standards: IEC 61724-1 · IEA PVPS T13 · NREL PVWatts

1.01 String Outage

One or more strings producing zero current despite available irradiance.

In a series string, a single open-circuit failure silences the entire chain. The symptom is unambiguous — but the root cause (broken connector, blown fuse, damaged module) requires secondary discrimination.

Observed via: String current = 0 · Irradiance > threshold · Peer string comparison
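
The outage rule can be sketched as below, assuming one snapshot of currents from strings with the same orientation plus a plane-of-array irradiance reading. The helper `detect_string_outages` and its 200 W/m² and 0.05 A thresholds are hypothetical placeholders; a real deployment would tune them per site and still needs the secondary discrimination of root cause described above.

```python
from statistics import median


def detect_string_outages(currents, poa_irradiance,
                          irr_threshold=200.0, zero_current=0.05):
    """Return indices of strings reading ~0 A while light and peers say they should produce.

    currents       -- string currents (A) sampled at one timestamp, same orientation
    poa_irradiance -- plane-of-array irradiance (W/m²) at the same timestamp
    Thresholds are illustrative assumptions, to be tuned per site.
    """
    if poa_irradiance <= irr_threshold:
        return []  # too dark to judge: zero current is expected
    if median(currents) <= zero_current:
        return []  # the whole group is dark: look upstream (inverter, grid, data), not at strings
    return [i for i, amps in enumerate(currents) if amps <= zero_current]


print(detect_string_outages([8.1, 7.9, 0.0, 8.0], poa_irradiance=850.0))  # -> [2]
```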

1.02 String Mismatch

Unequal current output across strings within the same inverter input — caused by aging, partial failure, or module-level degradation.

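Because parallel strings on the same MPPT input share one operating voltage, a weak string whose maximum-power voltage no longer matches its peers forces the tracker to a compromise point that is suboptimal for all of them. The yield loss compounds: it is not just the weak string underperforming, but the healthy strings being held away from their own maximum power points.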

Observed via: Current imbalance across peers · Statistical spread vs baseline
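
One way to score the spread across peers is a modified z-score built from the median and the median absolute deviation (MAD). This is a standard robust-statistics technique swapped in for illustration, not necessarily the method the engine uses; the helper names and the 3.5 cut-off (a common rule of thumb) are assumptions. Using the median keeps one badly failing string from inflating the spread and hiding itself.

```python
from statistics import median


def imbalance_scores(currents):
    """Modified z-score of each string current against the peer median (MAD-based)."""
    med = median(currents)
    mad = median(abs(c - med) for c in currents)
    if mad == 0:
        return [0.0] * len(currents)  # no spread at all: nothing to rank (possibly a data fault)
    return [0.6745 * (c - med) / mad for c in currents]


def flag_imbalanced(currents, limit=3.5):
    """Indices whose modified z-score magnitude exceeds the (assumed) cut-off."""
    return [i for i, z in enumerate(imbalance_scores(currents)) if abs(z) > limit]


print(flag_imbalanced([8.0, 8.1, 7.9, 8.0, 6.0]))  # -> [4]
```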

1.03 Soiling Loss

Particulate accumulation — dust, sand, pollen, bird droppings — on the module surface, blocking a fraction of incident irradiance.

Soiling loss is spectrally selective and non-uniform, making it hard to distinguish from real irradiance reduction. Detection requires a long-horizon Performance Ratio trend separated from temperature and degradation components.

Observed via: PR trend vs irradiance-corrected model · Pre/post-cleaning comparison

1.04 Partial Shading

Obstruction by trees, structures, or cloud edges causing sharp, spatially localized current dips across one or more strings.

Partial shade on a single cell can activate the bypass diode of its sub-string, redirecting current around the shaded sub-string. This creates a stepped I–V curve whose P–V curve has multiple local maxima, confusing the MPPT and causing yield loss larger than the shaded fraction alone.

Observed via: Fast ramp drops · Spatial pattern across strings · Irradiance correlation

1.05 Bypass Diode Failure

A bypass diode that has failed open (leaving its sub-string unprotected under shade) or failed short (permanently bypassing part of the module).

A healthy bypass diode limits shade-induced voltage reversal across sub-strings. An open failure exposes cells to reverse bias, accelerating hotspot formation. A shorted diode permanently bypasses its sub-string, creating a fixed power loss that lowers string voltage rather than current and is therefore invisible to string-level current monitoring alone.

Observed via: V×I inconsistency under shade · Module-level thermal imaging

1.06 Connector & Cable Fault

Degraded or loose MC4 connectors introducing series resistance, causing intermittent or thermally-variable power loss.

Connector contact resistance rises with temperature, while the heat dissipated scales with the square of the load current. At peak irradiance a faulty connector therefore dissipates more heat, which raises its resistance further; this feedback loop makes the fault most visible during high-production hours.

Observed via: Resistive loss signature · Intermittent current drop · Time-of-day correlation

1.07 Open Circuit

A break in the string conductor — voltage is present, current is zero.

Physically distinct from a string outage caused by fusing or a module failure: the voltage floats at Voc while current is entirely absent. The V ≠ 0, I = 0 combination is the diagnostic marker for conductor or terminal failure.

Observed via: V ≠ 0, I = 0 · Continuity check

1.08 Short Circuit

Unintended low-impedance path to ground or between conductors — Voc collapses, current behavior becomes abnormal.

Ground faults in large DC arrays can be difficult to localize because fault current can return through many parallel paths across the field. They represent both yield loss and a safety hazard under certain failure modes.

Observed via: Voc collapse · Ground fault monitor trigger · Unexpected Isc behavior

1.09 Degradation / PID

Potential-Induced Degradation — leakage current driven by high system voltage causes long-term module power loss.

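PID disproportionately affects the modules at the negative end of each string in a floating (ungrounded) system, where the potential between cells and the grounded frame is most negative. The degradation is reversible in early stages by regeneration under a reversed potential (for example, applied overnight), making early detection economically decisive.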

Observed via: Gradual PR decline · String Isc drop · Spatial position within array

1.10 LID / LETID

Light-Induced Degradation (LID) and Light- and Elevated-Temperature-Induced Degradation (LETID) — early-life power loss in the first 1–2 years of operation.

LID in Czochralski silicon is driven by boron-oxygen complex formation. LETID appears across multiple cell technologies and is not fully stabilized within the first year. Both are often mistaken for underperforming modules or commissioning errors.

Observed via: Early-life PR below model · Distinct signature from long-term PID · Year 1–2 trend

1.11 Hotspots / Cell Damage

Localized cell heating caused by current mismatch, cracks, or shading — accelerating long-term degradation and creating thermal stress.

A cracked or mismatched cell in a sub-string is driven into reverse bias when the string current exceeds what that cell can generate, dissipating power as heat instead of producing it. Thermal imaging is the gold standard; electrically, the symptom is output loss relative to peer modules.

Observed via: Output loss vs peer panels · Thermal imaging · Long-term trend

02 — Inverter

Inverter Layer — Power Conversion Analysis

The inverter is the single most critical component in a solar plant: a failure here removes the contribution of every string connected to it, which for a central inverter can be the entire DC field. Inverter faults range from hard trips (immediate, recoverable) to slow degradation (efficiency drifting below the manufacturer's curve). Telling thermal derating, MPPT failure, and hardware degradation apart requires watching the relationship between DC input and AC output across operating conditions.

Referenced standards: IEC 61683 · IEEE 1547 · IEC 62109

2.01 Inverter Trip / Shutdown

The inverter ceases AC output while DC energy is present.

The inverter's protection logic — over-temperature, over-voltage, anti-islanding, ground fault — can trigger a trip independently of any DC-side failure. Distinguishing self-protective trips from hardware failures determines whether the response is a reset, a service call, or a module-level investigation.

Observed via: AC power = 0 · DC present · Error code log

2.02 Thermal Derating

The inverter caps its AC output below rated capacity in response to high ambient or internal temperature.

Power semiconductors (IGBTs) have temperature-dependent efficiency and reliability limits. The inverter controller deliberately reduces output to protect components — visible as an AC output plateau that correlates with ambient temperature, not with DC availability.

Observed via: AC output plateau vs temperature · Efficiency ratio drop

2.03 Clipping

Available DC power exceeds the inverter's rated AC capacity; the inverter moves the array off its maximum power point, so the excess is never converted.

Clipping is a design artifact, not a failure. It results from a DC:AC ratio deliberately set above 1.0 for economic reasons: an oversized DC field raises output in the morning, late afternoon, and under low irradiance, lowering LCOE even though some midday energy is clipped. Detection matters for loss quantification and warranty reporting.

Observed via: AC output flat-top during peak irradiance · DC > AC rated · Clipping hours count

2.04 Low Efficiency

The inverter's DC-to-AC conversion efficiency (AC output ÷ DC input) falls below the manufacturer's efficiency curve for the given operating conditions.

Inverter efficiency is a function of loading level, input voltage, and temperature. Efficiency falls off sharply at light load, so a lower value at 5% load than at 50% is expected; a permanent efficiency deficit against the curve at every loading level signals component degradation.

Observed via: Efficiency ratio vs manufacturer curve · Baseline model deviation
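
A minimal sketch of the curve comparison, assuming the efficiency curve is available as (loading, efficiency) points and that AC output over rating is an acceptable proxy for loading level. The curve values and function names are hypothetical, and the sketch ignores the input-voltage and temperature dependence mentioned above; a fuller model would interpolate over those dimensions too.

```python
from bisect import bisect_left

# Hypothetical reference curve: (AC output as a fraction of rating, efficiency).
# Placeholder numbers; substitute the datasheet curve for the actual inverter model.
REFERENCE_CURVE = [(0.05, 0.900), (0.10, 0.950), (0.20, 0.973), (0.30, 0.978),
                   (0.50, 0.980), (0.75, 0.978), (1.00, 0.975)]


def expected_efficiency(load_fraction, curve=REFERENCE_CURVE):
    """Linearly interpolate the reference efficiency at a loading level (clamped at the ends)."""
    xs = [x for x, _ in curve]
    if load_fraction <= xs[0]:
        return curve[0][1]
    if load_fraction >= xs[-1]:
        return curve[-1][1]
    j = bisect_left(xs, load_fraction)
    (x0, y0), (x1, y1) = curve[j - 1], curve[j]
    return y0 + (y1 - y0) * (load_fraction - x0) / (x1 - x0)


def efficiency_deficit(p_dc_kw, p_ac_kw, rated_ac_kw, curve=REFERENCE_CURVE):
    """Measured minus expected efficiency; persistently negative values suggest degradation."""
    measured = p_ac_kw / p_dc_kw
    return measured - expected_efficiency(p_ac_kw / rated_ac_kw, curve)


print(round(efficiency_deficit(102.0, 96.0, rated_ac_kw=200.0), 3))  # -> -0.039
```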

2.05 MPPT Failure

The inverter's Maximum Power Point Tracking algorithm fails to find or hold the optimal DC operating point.

The PV I–V curve shifts with irradiance, temperature, and partial shade, sometimes from one second to the next. An MPPT algorithm that hunts too aggressively, gets trapped at a local maximum under partial shade, or freezes at a fixed operating point leaves significant yield on the table.

Observed via: Operating point below expected Vmpp × Impp · Output–irradiance correlation lost

2.06 Internal Hardware Fault

Failure of IGBT, capacitor, gate driver, or control board — producing output distortion, trips, or total loss.

Electrolytic capacitors in DC-link circuits degrade with heat and thermal cycling, losing capacitance and gaining equivalent series resistance (ESR) over time. IGBT failures are often preceded by increased switching losses detectable as efficiency drift before a hard fault event.

Observed via: Error codes · Output waveform distortion · Efficiency decline preceding trip

2.07 Start/Stop Oscillation

Rapid cycling between trip and reconnect — typically at minimum power threshold conditions (dawn/dusk).

Near startup, irradiance may be insufficient to sustain stable inverter operation. If the start threshold sits too close to the stop threshold (too little hysteresis), the inverter falls into a loop: it connects and loads the array, DC power drops below the stop threshold, it trips, the unloaded array voltage recovers, and it reconnects, cycling multiple times per minute.

Observed via: High-frequency trip/reconnect events · Time-of-day pattern (dawn/dusk)
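
The high-frequency trip/reconnect signal can be sketched as a sliding-window count of offline-to-online transitions. The helper name, the 10-minute window, and whatever alarm threshold is applied to the count are assumptions; the dawn/dusk time-of-day filter is left out for brevity.

```python
from collections import deque


def max_reconnects_in_window(samples, window_s=600.0):
    """Largest number of offline->online transitions inside any window of `window_s` seconds.

    samples -- iterable of (timestamp_s, is_online) pairs in time order
    """
    reconnect_times = []
    previous = None
    for t, online in samples:
        if previous is False and online:
            reconnect_times.append(t)  # an off->on edge is one reconnect
        previous = online

    best = 0
    window = deque()
    for t in reconnect_times:
        window.append(t)
        while t - window[0] > window_s:
            window.popleft()  # drop reconnects older than the window
        best = max(best, len(window))
    return best


cycling = [(0, False), (30, True), (60, False), (90, True), (120, False), (150, True)]
print(max_reconnects_in_window(cycling))  # -> 3
```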

2.08 Aux Power Failure

Failure of the inverter's internal control power supply, cooling fan, or communication module — taking the inverter offline without a DC-side cause.

Auxiliary power circuits are often powered from the DC bus or a separate AC supply. A fan failure can cause gradual thermal derating followed by a protective shutdown — indistinguishable at the output level from a grid fault without telemetry from the inverter's internal sensors.

Observed via: Inverter offline · No comms · No output despite DC available

2.09 Anti-Islanding Lockout

Following a grid outage, the inverter successfully trips as required — but fails to reconnect when the grid is restored.

Grid reconnection requires the inverter to confirm that voltage and frequency have stayed within limits for a defined window (typically 5 minutes under IEEE 1547). A lockout occurs when the inverter's reconnection logic fails to validate these conditions despite the grid being healthy.

Observed via: Grid voltage OK · Inverter stays offline · Reconnect timer log

03 — AC & Grid Interface

AC & Grid Interface

The AC side is where the plant meets the grid — and where inverters are most vulnerable to events they cannot control. Grid faults cause inverter trips that are often misclassified as inverter failures. Separating the two requires independent voltage, frequency, and phase monitoring at the point of common coupling (PCC), not just at the inverter output.

Referenced standards: IEEE 1547-2018 · IEC 61000-4 · EN 50160

3.01 Grid Outage

Complete loss of grid voltage — inverter trips to zero as required by anti-islanding standards.

A grid outage is not a plant fault — but without independent grid voltage measurement, it is indistinguishable in plant data from an inverter failure. Confirming via the grid voltage sensor allows correct fault attribution and rapid restart preparation.

Observed via: AC = 0 · Grid voltage sensor confirmation

3.02 Overvoltage

Grid voltage rises above the inverter's upper operating limit, causing derating or protective trip.

Overvoltage is common in distribution networks where solar generation temporarily exceeds local load. As installed solar density increases, overvoltage events are becoming a significant source of curtailment — distinct from DSO-commanded curtailment.

Observed via: Vac > upper threshold · Output curtailed without DC fault

3.03 Undervoltage

Grid voltage drops below the inverter's lower operating limit — derating or trip.

Undervoltage typically indicates upstream faults, heavy load, or transformer saturation. It presents identically to an inverter trip in plant data without voltage telemetry.

Observed via: Vac < lower threshold · No DC-side anomaly

3.04 Frequency Instability

Grid frequency deviates from the permitted operating range (e.g., 47.5–51.5 Hz in European 50 Hz grid codes), and the inverter trips.

Frequency deviations indicate grid stress events: large generator trips, load imbalances, or islanding conditions. Grid codes require inverter protection to trip when frequency moves outside these limits, after any specified ride-through time.

Observed via: Frequency deviation log · Trip event timing

3.05 Phase Imbalance

Unequal power or current distribution across phases in a three-phase system.

Phase imbalance causes additional losses in the inverter's AC output stage and can indicate asymmetric string allocation, unequal load distribution, or a partial AC-side fault. Imbalance also drives negative-sequence and neutral currents that add heating in transformers.

Observed via: Phase current/power asymmetry · Neutral current elevated
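
A simple asymmetry measure is the maximum deviation from the three-phase average as a percentage of that average. This NEMA-style definition is used here as an assumption (IEC power-quality practice typically uses the negative-sequence ratio instead); the function name is hypothetical and any alarm threshold is site-specific.

```python
def percent_unbalance(phase_values):
    """Maximum deviation from the three-phase average, as a percentage of that average."""
    mean = sum(phase_values) / len(phase_values)
    if mean == 0:
        return 0.0  # no load on any phase: unbalance is undefined, report zero
    return 100.0 * max(abs(v - mean) for v in phase_values) / mean


print(percent_unbalance([110.0, 100.0, 90.0]))  # -> 10.0
```

The same function works for phase currents or per-phase power; applying it to both helps separate a load-side asymmetry from a measurement problem on one phase.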

3.06 Transformer Loss / Fault

Failure or degradation of the step-up or distribution transformer — causing post-inverter power loss with no DC-side signature.

Transformer faults manifest as power reduction after the inverter, making them invisible to string or inverter monitoring. They are localized by comparing inverter output against the grid meter and are often first detected as unexplained PR drops.

Observed via: Power drop post-inverter · No DC-side fault · Grid meter vs inverter output delta

3.07 AC Cable Losses

Resistive losses in AC transmission cables — elevated with load, degrading over time with poor terminations or corrosion.

AC cable losses follow I²R scaling — worst at peak generation. They are distinguished from transformer faults by their load-dependent profile and localized by measurement at multiple points in the AC run.

Observed via: Measured vs expected cable losses · Load-dependent profile

3.08 Poor Power Factor

Excessive reactive power output increases apparent current, raises cable losses, and may trigger grid operator penalties.

Modern inverters can provide reactive power support under grid codes (e.g., LVRT reactive injection). A persistent low power factor without a grid code requirement indicates misconfiguration or a control loop issue.

Observed via: PF below 0.95 · Reactive power log · Apparent vs active power ratio

04 — Environmental

Environmental Conditions

Not every production shortfall is a fault. Environmental conditions (cloud transients, high temperature, seasonal irradiance shifts) produce output variations that are physically expected. Misclassifying these as faults generates noise and destroys operator trust. The challenge is not detection but discrimination: separating genuine anomalies from expected physics. This requires a continuously updated, weather-normalized model.

Referenced standards: IEC 61724-1 · pvlib-python · PVGIS · ERA5 reanalysis

4.01 Cloud Transients

Short-duration irradiance ramps caused by cloud edges moving across the array — often misclassified as inverter trips or string faults.

The ramp rate of a cloud edge crossing a large plant is a function of cloud velocity and array geometry. A 100 kW/min ramp is normal for a 5 MW plant during broken cloud cover — triggering a false alarm only if the detection model does not account for irradiance dynamics.

Observed via: Short-duration ramp · Correlated with pyranometer · No lag vs irradiance
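
The discrimination step can be sketched by comparing the fractional change in output with the fractional change in irradiance over the same interval: a cloud edge moves both together, a fault moves output alone. The helper and its 0.10 tolerance are hypothetical, and the sketch assumes the pyranometer is representative of the block being judged, which is not guaranteed for a large plant.

```python
def classify_power_drop(p_before, p_after, g_before, g_after, tolerance=0.10):
    """Label a step change in output as 'no_drop', 'cloud', or 'suspect_fault'.

    p_* -- plant (or block) AC power, kW; g_* -- co-located POA irradiance, W/m²
    tolerance -- assumed allowance for sensor placement and inverter response
    """
    if p_before <= 0 or p_after >= p_before:
        return "no_drop"
    if g_before <= 0:
        return "suspect_fault"  # output without irradiance: check data integrity first
    power_ratio = p_after / p_before
    irradiance_ratio = g_after / g_before
    # A cloud edge pulls power and irradiance down together; a fault drops power alone.
    if power_ratio >= irradiance_ratio - tolerance:
        return "cloud"
    return "suspect_fault"


print(classify_power_drop(4000, 1500, 900, 340))  # -> cloud
print(classify_power_drop(4000, 1500, 900, 880))  # -> suspect_fault
```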

4.02 Soiling (Trend)

Cumulative particulate accumulation producing a gradual PR trend that must be separated from module degradation and sensor drift.

Soiling rates vary by geography (coastal salinity, Saharan dust, agricultural pollen) and season. Quantifying soiling loss requires comparing the current PR to a clean-baseline model — not to a fixed historical average — to avoid conflating soiling with real degradation.

Observed via: PR trend declining · No degradation signature · Cleaning event correlation

4.03 Temperature Loss

High cell temperature reduces Voc and module efficiency: a physics-expected loss of approximately 0.35% of output per °C above 25 °C for crystalline silicon (a temperature coefficient of about −0.35%/°C).

Cell temperature depends on ambient temperature, irradiance, and wind speed. An output deficit on a hot afternoon is not a fault — but an output deficit that exceeds the temperature-corrected model is. The correction is the baseline.

Observed via: Output vs temperature-corrected pvlib model · Cell temperature sensor
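
A minimal temperature-corrected baseline, assuming the simple NOCT cell-temperature model and the −0.35%/°C coefficient quoted above; NOCT = 45 °C and the function names are illustrative assumptions. The NOCT model ignores wind speed, which the text notes matters; pvlib's `pvlib.temperature` module provides models that account for it.

```python
def cell_temperature_noct(t_ambient_c, poa_w_m2, noct_c=45.0):
    """Simple NOCT model: cell temperature rises linearly with irradiance (wind ignored)."""
    return t_ambient_c + (noct_c - 20.0) / 800.0 * poa_w_m2


def expected_dc_power_kw(p_stc_kw, poa_w_m2, t_ambient_c, gamma=-0.0035, noct_c=45.0):
    """Irradiance- and temperature-corrected DC power, with no other losses modelled."""
    t_cell = cell_temperature_noct(t_ambient_c, poa_w_m2, noct_c)
    return p_stc_kw * (poa_w_m2 / 1000.0) * (1.0 + gamma * (t_cell - 25.0))


# 100 kWp array at 1000 W/m² and 30 °C ambient: cells near 61 °C, about 12.7% below STC
print(round(expected_dc_power_kw(100.0, 1000.0, 30.0), 1))  # -> 87.3
```

A measured deficit is only flagged when it exceeds this corrected value, which is the sense in which the correction is the baseline.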

4.04 Seasonal Irradiance Shift

Natural irradiance reduction across winter months — expected lower generation due to lower sun angles and shorter days.

Comparing December output to June output without irradiance normalization is meaningless. PR-based comparison against a pvlib seasonal model gives a consistent basis for performance assessment across months.

Observed via: Expected vs actual via pvlib seasonal model · PR stability across seasons

4.05 Snow / Dirt Accumulation

Physical obstruction on the panel surface causing near-zero or zero output — distinct from a module fault by its geographic and seasonal pattern.

Snow shedding depends on panel tilt, snow density, and temperature. Flat-mounted panels retain snow longer; steeply tilted panels clear faster. A thermal gradient across the panel, with cells warming beneath the snow layer, can accelerate clearing.

Observed via: Near-zero output · Weather station snowfall · Satellite / visual confirmation

4.06 Tracker Stall

A single-axis tracker row stuck at the wrong angle, reducing energy yield relative to adjacent rows on the same field.

Tracker stalls typically originate from motor failures, wind-stow events that fail to release, or control communication failure. They are identified by the angular mismatch between the stalled row and its neighbors, which requires row-level angle telemetry or inference from the row's output profile versus its irradiance-expected value.

Observed via: Single row underperforming vs adjacent · Angle sensor mismatch · Output profile divergence

05 — Data & Telemetry

Data & Telemetry Integrity

A detection engine is only as trustworthy as the data it runs on. Telemetry faults — register mismatches, stale values, sensor drift, scaling errors — are among the most common root causes of false alarms in operational solar plants. Before any physical fault can be confidently attributed, the data pipeline must be validated. This layer runs ahead of all physical inference.

5.01 Register Mismatch

AC and DC registers are out of sync — the data system reports physically impossible values such as AC output exceeding DC input.

AC > DC at the inverter level violates the first law of thermodynamics. When this appears in data, it is a register timing offset, a communication protocol misalignment, or a channel labeling error — not a real measurement. Any physical inference built on mismatched registers is invalid.

Observed via: AC > DC (physically impossible) · Timestamp alignment check
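
The physical-impossibility check can be sketched as below. The relative and absolute tolerances are assumptions meant to absorb metering error and near-zero noise at night, and the function names are hypothetical. Violations that cluster around fast ramps point to a timing offset; violations that persist at steady output point to swapped or mislabeled registers.

```python
def register_mismatch(p_dc_kw, p_ac_kw, rel_tol=0.02, abs_tol_kw=0.5):
    """True when AC output exceeds DC input beyond metering tolerance (physically impossible)."""
    return p_ac_kw > p_dc_kw * (1.0 + rel_tol) + abs_tol_kw


def mismatch_rate(dc_kw, ac_kw, **tolerances):
    """Fraction of paired samples that violate AC <= DC."""
    flags = [register_mismatch(d, a, **tolerances) for d, a in zip(dc_kw, ac_kw)]
    return sum(flags) / len(flags) if flags else 0.0


print(mismatch_rate([100.0, 100.0, 50.0, 0.0], [97.0, 105.0, 48.0, 0.0]))  # -> 0.25
```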

5.02 Stale Data

One or more data channels stop updating in real time, reporting a frozen or lagged value while the plant state changes.

Stale data is particularly dangerous because it creates false confidence — the value is present but wrong. It is most common at field edges with long cable runs, noisy RS-485 buses, or intermittent cellular connectivity in remote sites.

Observed via: Timestamp delta between channels · Frozen value detection · Variance = 0 over interval
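
Frozen-value detection can be sketched as a check for exactly repeated readings (zero variance) over a window, guarded by an independent reference signal so that legitimately static periods such as night are not flagged. The helper name, the 12-sample default window, and the reference threshold are assumptions; the timestamp-delta check between channels is not shown.

```python
def frozen_windows(values, reference, window=12, ref_min_range=1.0):
    """Start indices of windows where `values` never changes while `reference` clearly moves.

    reference     -- an independent signal of plant state (e.g. POA irradiance), same sampling
    ref_min_range -- assumed minimum movement in `reference` that proves the plant changed
    """
    flagged = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        ref = reference[start:start + window]
        if len(set(chunk)) == 1 and max(ref) - min(ref) >= ref_min_range:
            flagged.append(start)  # identical readings while conditions changed
    return flagged


irr = [100, 200, 300, 400, 500, 600, 700, 800]
amps = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.1, 5.3]
print(frozen_windows(amps, irr, window=4))  # -> [0, 1, 2]
```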

5.03 Scaling Errors

Incorrect unit interpretation at the data logger or SCADA level — e.g., watts reported as kilowatts, millivolts reported as volts.

A 1000× scaling error can produce values that look physically plausible (e.g., a plant producing 500 kW reported as 500 W, which resembles ordinary dawn output) but are wrong in context. Such errors are caught by comparing readings against physical upper bounds and against the range expected for the plant's capacity.

Observed via: Values outside physically plausible range · Plant capacity cross-check
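
Two hypothetical checks are sketched below: an upper bound against plant capacity, which only catches errors that inflate readings, and a comparison of the reading with a model estimate for the same timestamp, which also catches the plausible-looking deflated case. The helper names, the 10% headroom, and the tolerance of 0.3 decades (roughly a factor of two) are assumptions.

```python
import math


def exceeds_capacity(reported_kw, capacity_kw, headroom=1.1):
    """True when a power reading is above anything the plant could physically deliver."""
    return reported_kw > capacity_kw * headroom


def suspected_scale_factor(reported, expected, factors=(1000.0, 0.001), log_tol=0.3):
    """Return the factor of 1000 that best explains reported/expected, or None.

    expected -- model estimate for the same timestamp (e.g. irradiance-based)
    log_tol  -- tolerance in decades; 0.3 accepts roughly a factor of 2 of model error
    """
    if reported <= 0 or expected <= 0:
        return None
    log_ratio = math.log10(reported / expected)
    for factor in factors:
        if abs(log_ratio - math.log10(factor)) < log_tol:
            return factor
    return None


print(suspected_scale_factor(480_000.0, 500.0))  # -> 1000.0 (W stored, read as kW)
print(suspected_scale_factor(0.49, 500.0))       # -> 0.001
```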

5.04 Missing Channels

One or more strings, meters, or sensors stop reporting — leaving blind spots in the monitoring coverage.

Missing channels do not produce wrong values — they produce no values. This makes them easy to detect but easy to ignore. A missing string channel can mask a real fault for days if the gap is not actively monitored.

Observed via: Null/gap in string-level data · Missing sensor IDs · Expected channel count check

5.05 Duplicate Signals

Multiple channels reporting statistically identical values — indicating a data collection error, shared register, or wiring mistake.

In a healthy plant, peer strings always differ slightly during production because of manufacturing tolerances, soiling, and shading variation. Perfect equality (p1 = p2 = p3 exactly) while the plant is producing is practically impossible and indicates a data fault; identical zeros at night are expected and must be excluded.

Observed via: Statistical duplicate detection across channels · Variance = 0 across peers
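
A minimal sketch of duplicate detection, assuming channels share a time base. It tests for exact equality over production-hour samples rather than running a statistical test, and it excludes night-time zeros, which are legitimately identical. The helper name and thresholds are hypothetical.

```python
from itertools import combinations


def duplicate_pairs(channels, min_value=0.1, min_points=10):
    """Pairs of channels whose daytime samples are exactly identical.

    channels   -- dict mapping channel name -> list of samples on a shared time base
    min_value  -- samples where both channels are below this are ignored (night, idle)
    min_points -- assumed minimum number of daytime samples before judging
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(channels.items(), 2):
        daytime = [(x, y) for x, y in zip(a, b) if x > min_value or y > min_value]
        if len(daytime) >= min_points and all(x == y for x, y in daytime):
            pairs.append((name_a, name_b))
    return pairs


base = [0.0, 0.0] + [8.0 + 0.1 * i for i in range(10)]
strings = {"S1": base, "S2": list(base), "S3": [0.0, 0.0] + [7.95 + 0.1 * i for i in range(10)]}
print(duplicate_pairs(strings))  # -> [('S1', 'S2')]
```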

5.06 Sensor Drift

Gradual calibration shift in a pyranometer, temperature sensor, or current transducer — introducing a slow systematic bias into all downstream calculations.

Pyranometer drift is a well-documented phenomenon: soiling, cosine-response degradation, and thermopile aging introduce 1–3% per year error in unchecked sensors. Performance Ratio calculations are directly corrupted: because irradiance sits in the denominator, a sensor that reads 5% low inflates PR by about 5.3% (1 / 0.95 ≈ 1.053), masking real losses.

Observed via: Slow divergence vs reference sensor · Satellite irradiance comparison · Cross-sensor ratio trend

5.07 Spikes / Noise

Sudden, physically implausible jumps in any signal — caused by electrical interference, ADC errors, or communication corruption.

Noise on a DC current channel can be introduced by inverter switching harmonics coupling into the measurement circuit. Spikes must be identified and removed before any statistical analysis: a single 10× spike corrupts a rolling average for the full length of its window.

Observed via: Value > physical max · Outlier detection · Rate-of-change threshold
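
Outlier detection can be sketched with a Hampel filter, a standard median/MAD technique swapped in here for illustration; it complements rather than replaces the physical-maximum and rate-of-change checks listed above. The window size, the 3-sigma limit, and the `min_scale` floor are assumptions.

```python
from statistics import median


def hampel(values, half_window=3, n_sigmas=3.0, min_scale=0.01):
    """Hampel filter: replace points far from the local median; return (cleaned, outlier indices).

    min_scale -- assumed floor (signal units) so a spike on a perfectly flat stretch is still caught
    """
    cleaned = list(values)
    outliers = []
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        med = median(window)
        # 1.4826 * MAD estimates the standard deviation for Gaussian noise
        scale = max(1.4826 * median(abs(v - med) for v in window), min_scale)
        if abs(values[i] - med) > n_sigmas * scale:
            cleaned[i] = med
            outliers.append(i)
    return cleaned, outliers


current = [8.0, 8.1, 7.9, 8.0, 80.0, 8.1, 7.9, 8.0, 8.1]  # one 10x spike
cleaned, idx = hampel(current)
print(idx, cleaned[4])  # -> [4] 8.0
```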

5.08 Timestamp Misalignment

Data from different devices is not time-synchronized — making cross-signal comparisons and event correlation unreliable.

A 15-second clock drift between an inverter and its string combiner creates a phase offset in cross-correlation analysis. In 5-minute resolution data this drift is invisible; in 1-second data it becomes a serious diagnostic error.

Observed via: Cross-signal lag · Interpolation artifacts · NTP sync check
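
Cross-signal lag can be estimated by shifting one series against the other and keeping the shift with the highest correlation. A 15-second offset at 1-second sampling shows up as a 15-sample lag, so the hypothetical default search range is ±30 samples. The helper names are illustrative, and the estimate is only meaningful over periods that contain events such as ramps; flat segments carry no timing information.

```python
def _pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def best_lag(reference, other, max_lag=30):
    """Shift (in samples) by which `other` lags `reference`, chosen by maximum correlation.

    Positive result: other[t + lag] lines up with reference[t]. Both series share one sample rate
    and length.
    """
    n = len(reference)
    max_lag = min(max_lag, n - 2)  # keep at least two overlapping samples
    best, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = reference[:n - lag], other[lag:n]
        else:
            x, y = reference[-lag:], other[:n + lag]
        r = _pearson(x, y)
        if r > best_r:
            best, best_r = lag, r
    return best


inverter = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
combiner = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]  # same event, clock 2 samples behind
print(best_lag(inverter, combiner, max_lag=4))  # -> 2
```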

5.09 CT/PT Misconfiguration

Incorrect current transformer (CT) or potential transformer (PT) ratio configured at commissioning, producing a systematic scaling error in all reported current or power values.

A CT ratio configured as 200:5 when the installed transformer is 400:5 scales every current reading by 200/400, so the logger reports half the true current: a persistent, systematic factor-of-two error that is invisible without comparison to an independent meter. This is distinct from drift: it does not change over time.

Observed via: Systematic offset vs expected · Not correlated with operating conditions · Independent meter comparison

06 — Performance

Performance & Economic Attribution

Physical faults matter because they translate into lost energy — and lost revenue. The performance layer connects fault detection to quantified impact: how many kWh were lost, to which cause, and what would have been generated under fault-free conditions. This is where physics meets the P&L.

Referenced standards: IEC 61724-1 (PR) · IEA PVPS T13-01 · NREL Loss Waterfall

6.01 Performance Ratio (PR)

The ratio of actual energy output to the energy a fault-free, temperature-normalized system of the same capacity would have produced.

PR is the universal solar plant health metric. A PR of 0.78 means the plant delivered 78% of the energy its rated capacity would have produced at the measured irradiance under reference conditions. All fault categories degrade PR; isolating which fault is responsible requires the waterfall decomposition below.

Observed via: Actual yield ÷ irradiance-normalized expected yield
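
A minimal PR calculation in the spirit of IEC 61724-1: final yield (energy per kWp) divided by reference yield (plane-of-array irradiation divided by the 1 kW/m² reference). The optional temperature correction scales each interval's expected energy by 1 + γ(T_cell − 25 °C), with γ = −0.35%/°C assumed. Function and parameter names are hypothetical.

```python
G_STC_KW_M2 = 1.0  # reference irradiance: 1000 W/m², expressed in kW/m²


def performance_ratio(e_ac_kwh, p0_kwp, poa_kwh_m2, t_cell_c=None, gamma=-0.0035):
    """PR over a set of intervals: sum(E_AC) / sum(P0 * C_k * H_k / G_STC).

    e_ac_kwh   -- AC energy per interval
    p0_kwp     -- array rated DC power at STC
    poa_kwh_m2 -- plane-of-array irradiation per interval
    t_cell_c   -- optional cell temperatures; if given, each interval's expected energy is
                  scaled by C_k = 1 + gamma * (T_cell - 25 °C), giving a temperature-corrected PR
    """
    expected_kwh = 0.0
    for k, h in enumerate(poa_kwh_m2):
        c_k = 1.0 if t_cell_c is None else 1.0 + gamma * (t_cell_c[k] - 25.0)
        expected_kwh += p0_kwp * c_k * h / G_STC_KW_M2
    return sum(e_ac_kwh) / expected_kwh


print(round(performance_ratio([1900.0, 2000.0], 1000.0, [2.5, 2.5]), 2))  # -> 0.78
```

With hot cells the temperature-corrected PR comes out higher than the uncorrected one, because part of the shortfall is attributed to expected thermal loss rather than to faults.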

6.02 Availability Loss

Energy not generated during periods when the inverter or string was offline despite available irradiance.

Availability loss is the most directly attributable category — every minute of downtime has a calculable kWh cost at the current irradiance level. It is distinguished from curtailment (active grid command) by the absence of a DSO signal.

Observed via: Uptime logs · Irradiance during downtime · Zero output window
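
Availability loss can be sketched as the modelled output summed over intervals in which the unit was offline, skipping intervals with an active DSO command so curtailment is not double-counted. The sample layout, helper name, and the 0.5 kW offline threshold are assumptions; the expected power would come from an irradiance-corrected model.

```python
def availability_loss_kwh(samples, interval_h, offline_kw=0.5):
    """Energy lost while offline with usable irradiance, excluding DSO-commanded periods.

    samples    -- (measured_kw, expected_kw, dso_command_active) per interval
    interval_h -- interval length in hours
    offline_kw -- assumed threshold below which the unit counts as offline
    """
    lost = 0.0
    for measured_kw, expected_kw, dso_command_active in samples:
        if dso_command_active:
            continue  # curtailment is booked separately
        if measured_kw <= offline_kw and expected_kw > offline_kw:
            lost += expected_kw * interval_h
    return lost


day = [(0.0, 400.0, False), (0.0, 420.0, False), (380.0, 390.0, False), (0.0, 300.0, True)]
print(availability_loss_kwh(day, interval_h=0.25))  # -> 205.0
```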

6.03 Curtailment

Grid operator-commanded output reduction — externally imposed, not caused by a plant fault.

Curtailment is increasingly common in high-solar-penetration grids. It must be separated from clipping and underperformance in loss waterfall accounting — curtailed energy is not recoverable by the operator.

Observed via: Output capped below capacity · No inverter fault present · DSO signal or grid event log

6.04 Soiling Loss (Quantified)

The fraction of PR degradation attributable to soiling after removing temperature and irradiance variation.

Soiling loss quantification is the economic basis for cleaning scheduling decisions. The break-even cleaning frequency depends on the soiling rate (kWh/day lost) vs. the cleaning cost per visit.

Observed via: Before/after cleaning PR comparison · Soiling index model · Satellite reflectance
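
Under a simplifying assumption that the daily soiling loss grows linearly after each cleaning (by s kWh every day) and that each cleaning restores full output, the average cost per day over a cleaning interval of T days is C/T + p·s·T/2, for visit cost C and energy price p. Setting the derivative to zero gives T* = √(2C / (p·s)). The sketch below encodes this; the model and all names are illustrative assumptions, and real soiling is often non-linear and reset by rain.

```python
import math


def daily_cost(interval_days, cleaning_cost, energy_price, loss_growth_kwh_per_day2):
    """Average cost per day: cleaning spread over the interval plus average soiling revenue loss."""
    cleaning = cleaning_cost / interval_days
    soiling = energy_price * loss_growth_kwh_per_day2 * interval_days / 2.0
    return cleaning + soiling


def optimal_cleaning_interval_days(cleaning_cost, energy_price, loss_growth_kwh_per_day2):
    """Interval minimizing daily_cost: T* = sqrt(2 C / (p s))."""
    return math.sqrt(2.0 * cleaning_cost / (energy_price * loss_growth_kwh_per_day2))


# Hypothetical site: 2000 per cleaning visit, 0.10 per kWh, daily loss growing 20 kWh each day
t_star = optimal_cleaning_interval_days(2000.0, 0.10, 20.0)
print(round(t_star, 1))  # -> 44.7
```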

6.05 Mismatch Loss

System-wide yield reduction from string-to-string or module-to-module variation forcing a suboptimal collective operating point.

Mismatch loss begins as a design and manufacturing artifact and grows as the fleet ages and modules degrade at different rates. It is quantified from the spread across strings: the gap between the sum of each string's individual maximum power and the power at the shared MPPT operating point.

Observed via: Peer comparison · Statistical spread in string output · String-level MPPT analysis
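
The gap between the sum of string maxima and the combined operating point can be illustrated with an idealized single-diode string model (no series or shunt resistance) evaluated on a voltage grid. The model, its parameters, and the function names are assumptions for illustration. In this idealized model, strings that differ only in current share the same maximum-power voltage and show essentially no mismatch loss, while a voltage shift does create one; real strings with resistance and temperature differences will behave less cleanly.

```python
import math


def string_current(v, isc, voc, a):
    """Ideal single-diode string I(V) with no series/shunt resistance, clipped at 0 A.

    a -- modified ideality voltage for the whole string (cells x n x kT/q), in volts
    Clipping at 0 A assumes no reverse current flows into a string above its Voc.
    """
    i0 = isc / math.expm1(voc / a)  # saturation current that places I = 0 at Voc
    return max(0.0, isc - i0 * math.expm1(v / a))


def mismatch_loss_w(strings, steps=2000):
    """Sum of individual string Pmax minus Pmax of the strings in parallel at one voltage.

    strings -- list of (isc, voc, a) tuples describing each string
    """
    v_max = max(voc for _, voc, _ in strings)
    grid = [v_max * k / steps for k in range(steps + 1)]
    individual = sum(max(v * string_current(v, *s) for v in grid) for s in strings)
    combined = max(v * sum(string_current(v, *s) for s in strings) for v in grid)
    return individual - combined


healthy = (10.0, 800.0, 40.0)
low_voltage = (10.0, 760.0, 38.0)  # e.g. a few sub-strings bypassed: about 5% lower voltage
print(mismatch_loss_w([healthy, healthy]))         # -> 0.0
print(mismatch_loss_w([healthy, low_voltage]) > 0)  # -> True
```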

6.06 Clipping Loss (Quantified)

Energy irreversibly lost because DC production exceeds the inverter's rated AC output capacity during peak hours.

Clipping loss is expected in most optimally-designed plants (DC:AC ratio > 1.0). It becomes a concern when clipping hours exceed the design assumption — indicating the DC field is outperforming projections, or that the inverter is running below its rated capacity.

Observed via: Σ (expected unclipped power − AC rating) over flat-top hours · DC:AC ratio at peak
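
Clipping loss can be sketched by summing, over intervals where measured output sits at the AC limit, the excess of the modelled unclipped power over the rating. The unclipped estimate must come from a DC-side or irradiance model; the sample layout, the helper name, and the 1% flat-top tolerance are assumptions.

```python
def clipping_loss(samples, rated_ac_kw, interval_h, flat_tol=0.01):
    """Return (clipped energy in kWh, clipped hours) from flat-top intervals.

    samples  -- (measured_ac_kw, expected_unclipped_kw) per interval; the unclipped
                estimate must come from a DC-side or irradiance model, not from the meter
    flat_tol -- assumed fraction below rating that still counts as "at the limit"
    """
    energy_kwh, hours = 0.0, 0.0
    for measured_kw, unclipped_kw in samples:
        at_limit = measured_kw >= rated_ac_kw * (1.0 - flat_tol)
        if at_limit and unclipped_kw > rated_ac_kw:
            energy_kwh += (unclipped_kw - rated_ac_kw) * interval_h  # only the excess is lost
            hours += interval_h
    return energy_kwh, hours


print(clipping_loss([(1000.0, 1150.0), (998.0, 1080.0), (900.0, 905.0), (1000.0, 990.0)],
                    rated_ac_kw=1000.0, interval_h=0.25))  # -> (57.5, 0.5)
```

Comparing the resulting clipped hours against the design assumption is what separates expected clipping from the concern cases described above.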

07 — Signal Intelligence

Signal Intelligence Index

Each detection signal maps to one or more physical fault domains. Ellume's engine runs all applicable signals simultaneously — no single signal is sufficient in isolation.

Observable Signal	Fault Domains
String current imbalance	DC Layer
V × I inconsistency	DC Layer, Inverter
Irradiance-normalized output ratio	DC Layer, Environmental, Performance
AC vs DC power ratio	Inverter
Efficiency curve deviation	Inverter
AC voltage / frequency logs	AC & Grid
Phase current comparison	AC & Grid
Weather-corrected pvlib model	Environmental, Performance
Performance Ratio (PR) trend	Performance
Energy loss waterfall	Performance
Physical sanity bounds	Data Integrity
Cross-signal temporal consistency	Data Integrity
Reference sensor / satellite comparison	Data Integrity, Environmental

The Physics Behind Every Alert

DC Layer — Panel & String Physics

String Outage

String Mismatch

Soiling Loss

Partial Shading

Bypass Diode Failure

Connector & Cable Fault

Open Circuit

Short Circuit

Degradation / PID

LID / LETID

Hotspots / Cell Damage

Inverter Layer — Power Conversion Analysis

Inverter Trip / Shutdown

Thermal Derating

Clipping

Low Efficiency

MPPT Failure

Internal Hardware Fault

Start/Stop Oscillation

Aux Power Failure

Anti-Islanding Lockout

AC & Grid Interface

Grid Outage

Overvoltage

Undervoltage

Frequency Instability

Phase Imbalance

Transformer Loss / Fault

AC Cable Losses

Poor Power Factor

Environmental Conditions

Cloud Transients

Soiling (Trend)

Temperature Loss

Seasonal Irradiance Shift

Snow / Dirt Accumulation

Tracker Stall

Data & Telemetry Integrity

Register Mismatch

Stale Data

Scaling Errors

Missing Channels

Duplicate Signals

Sensor Drift

Spikes / Noise

Timestamp Misalignment

CT/PT Misconfiguration

Performance & Economic Attribution

Performance Ratio (PR)

Availability Loss

Curtailment

Soiling Loss (Quantified)

Mismatch Loss

Clipping Loss (Quantified)

Signal Intelligence Index