# KuCoin Normalizer - Implementation Guide

This guide gives step-by-step instructions for implementing the critical validations still missing from the KuCoin normalizer.

## Broker Information

- **Broker ID:** 173 (kucoin)
- **Format:** JSON API
- **Source:** sync (API)
- **Assets:** crypto (perpetual futures)
- **Main file:** `pipeline/p01_normalize/brokers/kucoin/kucoin.py`
- **Tests:** `pipeline/tests/brokers/test_kucoin.py`

## Current Status

### New Implementation (404 lines)
- **Match rate:** 100% (legacy-compatible MD5 hash)
- **Code reduction:** 80% vs legacy (2,015 → 404 lines)
- **Architecture:** Superior, with proper separation of concerns

### Missing Critical Validations
1. ❌ Strict side validation (CRITICAL)
2. ❌ Required fields validation (CRITICAL)
3. ❌ Type filter validation
4. ⚠️ Status filter validation (conditional)
5. ⚠️ Multiplier validation (partial)
6. ❌ Absolute quantity + decimal precision
7. ❌ Price rounding

---

## PHASE 1: Critical Validations (1.5-2 days)

### Prerequisites

```bash
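# Activate the virtual environment first (if you use one) so packages install into it
source venv/bin/activate

# Install dependencies (pytest-cov and pytest-benchmark provide the --cov and
# --benchmark-only options used later in this guide)
pip install polars pytest pytest-cov pytest-benchmark

# Confirm the current tests pass (run from pipeline/, where tests/ lives)
pytest tests/brokers/test_kucoin.py -v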
```

### Step 1.1: Strict Side Validation (0.5 days)

**Goal:** Explicitly reject invalid or empty sides instead of silently treating them as SELL

**Location:** `brokers/kucoin/kucoin.py:123-130`

#### Code to Modify

**BEFORE:**
```python
side_raw = order.get("side", "").lower()
side = "BUY" if side_raw == "buy" else "SELL"  # Silent default: anything other than "buy" becomes SELL!
```

**AFTER:**
```python
# Strict side validation (runs inside the per-order loop, hence `continue`)
# `or ""` also covers an explicit null "side"; str() guards against non-string values
side_raw = str(order.get("side") or "").lower()

if side_raw not in ("buy", "sell"):  # also rejects the empty string
    logger.warning(
        f"Skipping order {order.get('tradeId', 'unknown')}: "
        f"invalid side '{side_raw}'"
    )
    continue  # Skip this order entirely

side = "BUY" if side_raw == "buy" else "SELL"
```
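To unit-test the side rule without building a JSON payload, one option is to pull it into a small pure function. `parse_side` below is a hypothetical helper (it does not exist in `kucoin.py`): it returns the normalized side or `None`, and leaves the logging and the `continue` to the calling loop.

```python
from typing import Any, Optional

# Hypothetical helper, not part of kucoin.py: maps raw KuCoin sides to BUY/SELL
_SIDE_MAP = {"buy": "BUY", "sell": "SELL"}


def parse_side(raw: Any) -> Optional[str]:
    """Return "BUY" or "SELL" for a valid side, or None if the order must be skipped."""
    if not isinstance(raw, str):
        return None  # None, numbers, etc. are rejected rather than defaulted
    return _SIDE_MAP.get(raw.lower())


print(parse_side("Sell"))     # SELL
print(parse_side("INVALID"))  # None
```

In the loop, `side = parse_side(order.get("side"))` followed by a warning and `continue` when it returns `None` gives the same strict behavior.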

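#### Tests to Add

**Location:** `tests/brokers/test_kucoin.py` (add these as methods of the existing `TestKucoinInterpreter` class, which the verification commands below target; they assume `json` and `KucoinInterpreter` are already imported in that module)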

```python
def test_side_validation_rejects_invalid(self):
    """Invalid sides are rejected"""
    interpreter = KucoinInterpreter()
    content = json.dumps({
        "orders": [{
            "tradeId": "test-invalid-side",
            "symbol": "XBTUSDTM",
            "side": "INVALID",
            "price": "50000",
            "size": "1",
            "value": "50000",
            "fee": "25",
            "createdAt": "1736276232000",
        }]
    })
    df = interpreter.parse_json_content(content)
    assert len(df) == 0

def test_side_validation_rejects_empty(self):
    """Empty sides are rejected"""
    interpreter = KucoinInterpreter()
    content = json.dumps({
        "orders": [{
            "tradeId": "test-empty-side",
            "symbol": "XBTUSDTM",
            "side": "",
            "price": "50000",
            "size": "1",
            "value": "50000",
            "fee": "25",
            "createdAt": "1736276232000",
        }]
    })
    df = interpreter.parse_json_content(content)
    assert len(df) == 0
```

#### Verification

```bash
# Run the new tests
pytest tests/brokers/test_kucoin.py::TestKucoinInterpreter::test_side_validation_rejects_invalid -v
pytest tests/brokers/test_kucoin.py::TestKucoinInterpreter::test_side_validation_rejects_empty -v

# Confirm the existing tests still pass
pytest tests/brokers/test_kucoin.py -v
```

---

### Step 1.2: Required Fields Validation (0.5 days)

**Goal:** Validate the required fields (tradeId, symbol, side, price, size)

**Location:** `brokers/kucoin/kucoin.py`, after line 122

#### Code to Add

```python
import math  # at the top of kucoin.py; used for the NaN/inf check below

for row_idx, order in enumerate(orders):
    # Extract required fields
    trade_id = order.get("tradeId", "")
    symbol = order.get("symbol", "")
    side = order.get("side", "")
    price = order.get("price", 0)
    size = order.get("size", 0)

    # Validate tradeId
    if not trade_id:
        logger.warning("Skipping order: missing tradeId")
        continue

    # Validate symbol
    if not symbol:
        logger.warning(f"Skipping order {trade_id}: missing symbol")
        continue

    # Validate side (presence only; Step 1.1 checks the value itself)
    if not side:
        logger.warning(f"Skipping order {trade_id}: missing side")
        continue

    # Validate price: numeric, finite and > 0
    # (NaN <= 0 is False, so NaN/inf would slip past a plain comparison)
    try:
        price_float = float(price or 0)
    except (ValueError, TypeError):
        logger.warning(f"Skipping order {trade_id}: non-numeric price")
        continue
    if not math.isfinite(price_float) or price_float <= 0:
        logger.warning(f"Skipping order {trade_id}: invalid price {price_float}")
        continue

    # Validate size: numeric, finite and > 0
    try:
        size_float = float(size or 0)
    except (ValueError, TypeError):
        logger.warning(f"Skipping order {trade_id}: non-numeric size")
        continue
    if not math.isfinite(size_float) or size_float <= 0:
        logger.warning(f"Skipping order {trade_id}: invalid size {size_float}")
        continue

    # Continue with normal processing...
```
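The same checks can be expressed as a pure function that returns the reason an order is rejected, which keeps the loop body short and makes each rule testable on its own. `rejection_reason` and `_positive_finite` are hypothetical names, not existing functions; unlike the loop above, the sketch collapses the "non-numeric" and "invalid" messages into one.

```python
import math
from typing import Any, Mapping, Optional


def _positive_finite(value: Any) -> Optional[float]:
    """Parse value as a float; return it only if it is finite and > 0."""
    try:
        number = float(value or 0)
    except (ValueError, TypeError):
        return None
    return number if math.isfinite(number) and number > 0 else None


def rejection_reason(order: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: None if the order passes, else a short reason."""
    for field in ("tradeId", "symbol", "side"):
        if not order.get(field):
            return f"missing {field}"
    for field in ("price", "size"):
        if _positive_finite(order.get(field)) is None:
            return f"invalid {field}"
    return None


# Inside the loop (sketch):
# reason = rejection_reason(order)
# if reason:
#     logger.warning(f"Skipping order {order.get('tradeId', 'unknown')}: {reason}")
#     continue
```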

#### Tests to Add

```python
class TestRequiredFieldsValidation:
    """Tests for required fields validation"""

    @staticmethod
    def _parse(**overrides):
        # Start from an otherwise valid order so each test isolates one failing field
        order = {
            "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
            "price": "50000", "size": "1", "value": "50000",
            "fee": "25", "createdAt": "1736276232000",
        }
        order.update(overrides)
        content = json.dumps({"orders": [order]})
        return KucoinInterpreter().parse_json_content(content)

    def test_required_fields_missing_trade_id(self):
        assert len(self._parse(tradeId="")) == 0

    def test_required_fields_missing_symbol(self):
        assert len(self._parse(symbol="")) == 0

    def test_required_fields_zero_price(self):
        assert len(self._parse(price=0)) == 0

    def test_required_fields_zero_size(self):
        assert len(self._parse(size=0)) == 0
```

#### Verification

```bash
pytest tests/brokers/test_kucoin.py::TestRequiredFieldsValidation -v
```

---

### Step 1.3: Type Filter Validation (0.25 days)

**Goal:** Reject non-trade orders (TRANSFER, LIQUIDATION, FUNDING) while keeping `TRADE` and `BUSTTRADE` fills

**Location:** `brokers/kucoin/kucoin.py`, after the required-fields validation

#### Code to Add

```python
# Filter by trade type (orders without a tradeType are kept)
trade_type = (order.get("tradeType") or "").upper()
if trade_type and trade_type not in ("TRADE", "BUSTTRADE"):
    logger.debug(
        f"Skipping order {trade_id}: non-trade type '{trade_type}'"
    )
    continue
```
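A quick way to sanity-check the filter against the cases the tests cover is to run the same condition over a few sample values. `keeps_order` is a hypothetical stand-alone copy of that condition: an order with no `tradeType` passes, and lowercase values such as `busttrade` match after upper-casing.

```python
from typing import Any, Mapping

ACCEPTED_TRADE_TYPES = frozenset({"TRADE", "BUSTTRADE"})


def keeps_order(order: Mapping[str, Any]) -> bool:
    """Hypothetical mirror of the snippet: True if the order survives the filter."""
    trade_type = (order.get("tradeType") or "").upper()
    # Missing/empty tradeType passes; anything else must be allow-listed
    return not trade_type or trade_type in ACCEPTED_TRADE_TYPES


for sample in ({"tradeType": "TRADE"}, {"tradeType": "busttrade"},
               {"tradeType": "TRANSFER"}, {}):
    print(sample, keeps_order(sample))
```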

#### Tests to Add

```python
class TestTypeFilterValidation:
    """Tests for trade type filtering"""

    def test_type_filter_rejects_transfer(self):
        interpreter = KucoinInterpreter()
        content = json.dumps({
            "orders": [{
                "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
                "price": "50000", "size": "1", "tradeType": "TRANSFER",
                "value": "50000", "fee": "0", "createdAt": "1736276232000"
            }]
        })
        df = interpreter.parse_json_content(content)
        assert len(df) == 0

    def test_type_filter_accepts_trade(self):
        interpreter = KucoinInterpreter()
        content = json.dumps({
            "orders": [{
                "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
                "price": "50000", "size": "1", "tradeType": "TRADE",
                "value": "50000", "fee": "25", "createdAt": "1736276232000"
            }]
        })
        df = interpreter.parse_json_content(content)
        assert len(df) == 1

    def test_type_filter_accepts_busttrade(self):
        interpreter = KucoinInterpreter()
        content = json.dumps({
            "orders": [{
                "tradeId": "test1", "symbol": "XBTUSDTM", "side": "sell",
                "price": "50000", "size": "1", "tradeType": "BUSTTRADE",
                "value": "50000", "fee": "25", "createdAt": "1736276232000"
            }]
        })
        df = interpreter.parse_json_content(content)
        assert len(df) == 1
```

---

### Step 1.4: Status Filter Validation (0.25 days - CONDITIONAL)

**Goal:** Check whether the API returns a `status` field and, if it does, filter out incomplete orders

#### Required Investigation

1. **Review the sample data:**
```python
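# In a notebook or throwaway script
import json
from collections import Counter

# Load sample orders from the current JSON
with open("sample_kucoin_orders.json") as f:
    data = json.load(f)
orders = data["orders"]

# Inspect the fields of the first few orders
for order in orders[:10]:
    print(sorted(order.keys()))

# Tally status values across ALL orders: the decision below depends on whether
# the field exists and whether its values vary, which 10 rows cannot show
status_counts = Counter(o["status"] for o in orders if "status" in o)
print(f"Status values across {len(orders)} orders: {dict(status_counts)}")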
```

2. **Decision:**
- If there is NO `status` field: **SKIP the validation**
- If it exists but is always "COMPLETED": **SKIP the validation**
- If it exists with varied values: **IMPLEMENT the filter**
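The decision rules above can be encoded so they run over the whole sample instead of being judged by eye. `status_filter_decision` is a hypothetical helper; it assumes orders are plain dicts loaded from the sample JSON, treats a `null` status the same as a missing field, and compares values case-insensitively.

```python
from typing import Any, Iterable, Mapping


def status_filter_decision(orders: Iterable[Mapping[str, Any]]) -> str:
    """Return "omit" or "implement" following the three decision rules."""
    statuses = {
        str(order["status"]).upper()
        for order in orders
        if order.get("status") is not None
    }
    if not statuses:
        return "omit"       # no status field at all
    if statuses == {"COMPLETED"}:
        return "omit"       # field exists but is always COMPLETED
    return "implement"      # varied values: the filter is needed
```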

#### If Implementing (conditional):

```python
# Only if the API returns a status field
status = (order.get("status") or "").upper()
if status and status not in ("COMPLETED", "DONE", "FILLED"):
    logger.debug(
        f"Skipping order {trade_id}: incomplete status '{status}'"
    )
    continue
```

---

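## PHASE 2: Data Quality (1-1.5 days)

### Step 2.1: Multiplier Validation (0.5 days)

**Goal:** Check that the implied multiplier (`value / (price * size)`) falls within a reasonable range, and fall back to 1.0 otherwise. Verify the 0.01 lower bound against KuCoin's contract specifications before merging: contracts with smaller multipliers (XBTUSDTM is commonly listed at 0.001 BTC per contract) would be forced to the 1.0 fallback.

**Location:** `brokers/kucoin/kucoin.py:300-306`

#### Code to Modify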

**BEFORE:**
```python
pl.when((pl.col("price") > 0) & (pl.col("size") > 0))
.then(pl.col("value") / (pl.col("price") * pl.col("size")))
.otherwise(pl.lit(1.0))
.alias("multiplier"),
```

**AFTER:**
```python
pl.when(
    (pl.col("price") > 0)
    & (pl.col("size") > 0)
    & (pl.col("value") > 0)
    # Implied multiplier must fall within a plausible range
    & (pl.col("value") / (pl.col("price") * pl.col("size"))).is_between(0.01, 1_000_000)
)
.then(pl.col("value") / (pl.col("price") * pl.col("size")))
.otherwise(pl.lit(1.0))  # Fallback for missing or anomalous values
.alias("multiplier"),
```

#### Tests to Add

```python
class TestMultiplierValidation:
    """Tests for multiplier validation"""

    def test_multiplier_validation_normal(self):
        interpreter = KucoinInterpreter()
        content = json.dumps({
            "orders": [{
                "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
                # value / (price * size) = 0.1: in range, and distinguishable from the 1.0 fallback
                "price": "50000", "size": "1", "value": "5000",
                "fee": "25", "createdAt": "1736276232000"
            }]
        })
        df = interpreter.parse_json_content(content)
        result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()
        assert abs(result["multiplier"][0] - 0.1) < 1e-9

    def test_multiplier_validation_anomalous(self):
        interpreter = KucoinInterpreter()
        content = json.dumps({
            "orders": [{
                "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
                "price": "1", "size": "1", "value": "999999999",
                "fee": "25", "createdAt": "1736276232000"
            }]
        })
        df = interpreter.parse_json_content(content)
        result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()
        assert result["multiplier"][0] == 1.0  # Fallback
```

---

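### Step 2.2: Absolute Quantity + Decimal Precision (0.25 days)

**Goal:** Apply abs() and round(8) to quantity. Note that Step 1.2 already skips orders with `size <= 0`, so the negative-size test below only passes if that check is relaxed (for example, to `size != 0`); decide which behavior is intended before implementing both.

**Location:** `brokers/kucoin/kucoin.py:262`

#### Code to Modify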

**BEFORE:**
```python
pl.col("size").alias("quantity"),
```

**AFTER:**
```python
pl.col("size").abs().round(8).alias("quantity"),  # 8 decimal places for crypto
```

#### Tests to Add

```python
def test_quantity_absolute_value_negative(self):
    interpreter = KucoinInterpreter()
    content = json.dumps({
        "orders": [{
            "tradeId": "test1", "symbol": "XBTUSDTM", "side": "sell",
            "price": "50000", "size": "-1.5", "value": "75000",
            "fee": "25", "createdAt": "1736276232000"
        }]
    })
    df = interpreter.parse_json_content(content)
    result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()
    assert result["quantity"][0] == 1.5

def test_quantity_decimal_limit(self):
    interpreter = KucoinInterpreter()
    content = json.dumps({
        "orders": [{
            "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
            "price": "50000", "size": "1.123456789012345", "value": "56172.84",
            "fee": "25", "createdAt": "1736276232000"
        }]
    })
    df = interpreter.parse_json_content(content)
    result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()
    assert abs(result["quantity"][0] - 1.12345679) < 0.00000001
```

---

### Step 2.3: Price Rounding (0.25 days)

**Goal:** Apply round(8) to price

**Location:** `brokers/kucoin/kucoin.py:265`

#### Code to Modify

**BEFORE:**
```python
pl.col("price").alias("price"),
```

**AFTER:**
```python
pl.col("price").round(8).alias("price"),  # 8 decimal places for crypto
```

#### Tests to Add

```python
def test_price_rounding(self):
    interpreter = KucoinInterpreter()
    content = json.dumps({
        "orders": [{
            "tradeId": "test1", "symbol": "XBTUSDTM", "side": "buy",
            "price": "50000.123456789012345", "size": "1", "value": "50000.12",
            "fee": "25", "createdAt": "1736276232000"
        }]
    })
    df = interpreter.parse_json_content(content)
    result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()
    assert abs(result["price"][0] - 50000.12345679) < 0.00000001
```

---

## Final Verification

### Full Test Suite

```bash
# Run all tests
pytest tests/brokers/test_kucoin.py -v

# Run with coverage (requires pytest-cov)
pytest tests/brokers/test_kucoin.py --cov=pipeline.p01_normalize.brokers.kucoin --cov-report=html

# Run only the validation tests (-k selects tests whose names match "validation")
pytest tests/brokers/test_kucoin.py -v -k "validation"
```

### Hash Compatibility

```python
import hashlib
import json


# Verify that the hash formula has not changed
def test_hash_compatibility_maintained():
    """Verify a 100% match with legacy"""
    interpreter = KucoinInterpreter()

    # Sample order with a known tradeId
    content = json.dumps({
        "orders": [{
            "tradeId": "known-trade-id-123",
            "symbol": "XBTUSDTM",
            "side": "buy",
            "price": "50000",
            "size": "1",
            "value": "50000",
            "fee": "25",
            "createdAt": "1736276232000",
        }]
    })

    df = interpreter.parse_json_content(content)
    result = interpreter.normalize(df.lazy(), user_id=1, account_id="test").collect()

    # Expected hash, computed independently of the normalizer. This assumes the
    # legacy hash is the MD5 of the JSON-encoded tradeId; confirm the exact input
    # against old_code_from_legacy/kucoin_export.py before relying on it.
    expected_hash = hashlib.md5(
        json.dumps("known-trade-id-123").encode("utf-8")
    ).hexdigest()

    assert result["file_row"][0] == expected_hash
```

### Performance Testing

```bash
# Benchmark with a large dataset (requires the pytest-benchmark plugin and tests that use its `benchmark` fixture)
python -m pytest tests/brokers/test_kucoin.py --benchmark-only
```

---

## Troubleshooting

### Issue: Tests fail after the changes

**Solution:**
1. Check that the imports are correct
2. Check that `logger` is defined in `kucoin.py` (e.g. `logger = logging.getLogger(__name__)`)
3. Run tests individually to pinpoint the failure

```bash
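# Replace test_specific with the failing test's name; -s disables output capturing so print() output is shown
pytest tests/brokers/test_kucoin.py::TestKucoinInterpreter::test_specific -v -s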
```

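### Issue: Hash mismatch with legacy

**Solution:**
1. Check the key order in the `legacy_order` dict
2. Check that `json.dumps` is called with the same arguments as legacy (e.g. `sort_keys`, `separators`); it preserves dict insertion order unless `sort_keys=True`
3. Compare against `old_code_from_legacy/kucoin_export.py`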
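A small demonstration, using a made-up three-field dict rather than the real `legacy_order` layout, of how key order and `json.dumps` arguments change the resulting MD5. `md5_of` is a hypothetical helper written for this example.

```python
import hashlib
import json

order = {"tradeId": "known-trade-id-123", "symbol": "XBTUSDTM", "side": "buy"}
reordered = {"side": "buy", "symbol": "XBTUSDTM", "tradeId": "known-trade-id-123"}


def md5_of(obj, **dumps_kwargs) -> str:
    """MD5 hex digest of the JSON serialization of obj."""
    return hashlib.md5(json.dumps(obj, **dumps_kwargs).encode("utf-8")).hexdigest()


# Same data, different insertion order: different JSON text, different hash
print(md5_of(order) == md5_of(reordered))                                  # False
# sort_keys=True makes the serialization independent of insertion order
print(md5_of(order, sort_keys=True) == md5_of(reordered, sort_keys=True))  # True
# Compact separators also change the text, and therefore the hash
print(md5_of(order) == md5_of(order, separators=(",", ":")))               # False
```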

### Issue: Performance degradation

**Solution:**
1. Check that the validations skip orders with `continue` rather than raising exceptions
2. Profile with cProfile
3. Consider batch processing for large datasets
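For the profiling step, a minimal harness with the standard-library `cProfile` and `pstats` modules. `validate_orders` and the synthetic order list are hypothetical stand-ins; in practice you would profile the real `parse_json_content` call on a large sample export.

```python
import cProfile
import io
import pstats


def validate_orders(orders):
    """Hypothetical stand-in for the per-order validation loop in kucoin.py."""
    kept = []
    for order in orders:
        side = (order.get("side") or "").lower()
        if side not in ("buy", "sell"):
            continue  # skip, as the real validations do, instead of raising
        kept.append(order)
    return kept


# Synthetic workload: 100,000 orders, half with an invalid side
orders = [{"side": "buy"}, {"side": "INVALID"}] * 50_000

profiler = cProfile.Profile()
profiler.enable()
kept = validate_orders(orders)
profiler.disable()

# Print the 5 most expensive entries by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```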

---

## Implementation Checklist

### Phase 1
- [ ] Strict side validation
- [ ] Required fields validation
- [ ] Type filter validation
- [ ] Status filter (investigate first)
- [ ] Phase 1 tests (15 tests)
- [ ] Verify existing tests pass
- [ ] Verify hash compatibility

### Phase 2
- [ ] Multiplier validation
- [ ] Absolute quantity + decimal precision
- [ ] Price rounding
- [ ] Phase 2 tests (8 tests)
- [ ] Verify match rate is maintained
- [ ] Performance testing

### Deployment
- [ ] Full test suite passes
- [ ] Code review
- [ ] Documentation updated
- [ ] Merge to main
- [ ] Monitor production logs

---

**Last updated:** 2026-01-14
**Maintainer:** Pipeline Team
