t_wの輪郭

Sentence Embedding, cosine similarity, differentiation

Bringing cosine similarity closer to a target value


2023/7/25 9:39:00

To train sentence embeddings, I want to differentiate the squared error on cosine similarity, (cos(X,Y) - target_similarity)^2, but calculus is by now a distant memory.

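Derivative of a loss function built from cosine similarity
Working out the derivative for the gradient that brings cosine similarity closer to a target value (unfinished)
Is it a total derivative? Or a partial derivative with respect to one of the two?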

I was able to confirm that the result matches numerical differentiation.


Cosine similarity: \( cos(X, Y) = \frac{\sum_{j=1}^{N} X_j Y_j}{\sqrt{\sum_{j=1}^{N} X_j^2} \sqrt{\sum_{j=1}^{N} Y_j^2}} \)

Target similarity: \(t\)

Loss function built from cosine similarity: \( L(X, Y, t) = (t - cos(X, Y))^2 \)
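
The two definitions above can be sketched in plain Python. This is a minimal illustration, not code from the original post: the function names `cosine_similarity` and `loss` are assumptions made here, and the vectors are assumed to be non-zero lists of floats of equal length.

```python
import math


def cosine_similarity(x, y):
    """cos(X, Y) = sum(X_j * Y_j) / (sqrt(sum(X_j^2)) * sqrt(sum(Y_j^2)))."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumes neither vector is all zeros; otherwise this divides by zero.
    return dot / (norm_x * norm_y)


def loss(x, y, t):
    """L(X, Y, t) = (t - cos(X, Y))^2, where t is the target similarity."""
    return (t - cosine_similarity(x, y)) ** 2


# Orthogonal vectors have cosine similarity 0, so with target t = 1 the loss is 1.
print(loss([1.0, 0.0], [0.0, 1.0], 1.0))
```

The loss is zero exactly when the cosine similarity equals the target, and grows quadratically as it moves away.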

From these definitions, we want the partial derivative of the loss function with respect to \(X_i\). By the chain rule:

$$ \frac{\partial L(X, Y, t)}{\partial X_i} = \frac{\partial(t-cos(X,Y))^2}{\partial X_i} = \frac{\partial(t-cos(X,Y))^2}{\partial (t-cos(X,Y))} \frac{\partial(t-cos(X,Y))}{\partial cos(X,Y)} \frac{\partial cos(X,Y)}{\partial X_i} $$

Each of the three factors is computed in turn.


$$ \frac{\partial(t-cos(X,Y))^2}{\partial (t-cos(X,Y))} = 2(t-cos(X,Y))$$
$$ \frac{\partial(t-cos(X,Y))}{\partial cos(X,Y)} = -1 $$
For the third factor, \(j\) is the summation index, kept distinct from the fixed component index \(i\). The factor \((\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}}\) does not depend on \(X_i\), so the product rule contributes only two terms.
$$ \frac{\partial cos(X,Y)}{\partial X_i} = \frac{\partial}{\partial X_i} \frac{\sum_{j=1}^{N} X_j Y_j}{\sqrt{\sum_{j=1}^{N} X_j^2} \sqrt{\sum_{j=1}^{N} Y_j^2}} $$
$$ = \frac{\partial}{\partial X_i} \left( (\sum_{j=1}^{N} X_j Y_j) (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} \right) $$
$$ = \frac{\partial (\sum_{j=1}^{N} X_j Y_j)}{\partial X_i} (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} + (\sum_{j=1}^{N} X_j Y_j) \frac{\partial ((\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}})}{\partial X_i} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} $$
$$ = Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} + (\sum_{j=1}^{N} X_j Y_j) \frac{\partial ((\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}})}{\partial (\sum_{j=1}^{N} X_j^2)} \frac{\partial (\sum_{j=1}^{N} X_j^2)}{\partial X_i} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} $$
$$ = Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} + (\sum_{j=1}^{N} X_j Y_j) \left( -\frac{1}{2} (\sum_{j=1}^{N} X_j^2)^{-\frac{3}{2}} \cdot 2X_i \right) (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} $$
$$ = Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} - X_i (\sum_{j=1}^{N} X_j Y_j) (\sum_{j=1}^{N} X_j^2)^{-\frac{3}{2}} (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} $$
$$ = (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} \left( Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} - X_i (\sum_{j=1}^{N} X_j Y_j) (\sum_{j=1}^{N} X_j^2)^{-\frac{3}{2}} \right) $$
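
The last line can also be written more compactly. As a sketch, assume the shorthand \(X \cdot Y = \sum_{j=1}^{N} X_j Y_j\) and \(\|X\| = \sqrt{\sum_{j=1}^{N} X_j^2}\) (notation introduced here for illustration, not used in the derivation above). Then:

$$ \frac{\partial cos(X,Y)}{\partial X_i} = \frac{Y_i}{\|X\| \|Y\|} - \frac{(X \cdot Y) X_i}{\|X\|^3 \|Y\|} = \frac{Y_i}{\|X\| \|Y\|} - cos(X,Y) \frac{X_i}{\|X\|^2} $$

This form gives a quick sanity check: multiplying by \(X_i\) and summing over \(i\) gives \(cos(X,Y) - cos(X,Y) = 0\), so the gradient is orthogonal to \(X\). That is what one would expect, since rescaling \(X\) by a positive factor does not change the cosine similarity.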

Therefore,

$$ \frac{\partial L(X, Y, t)}{\partial X_i} = 2(t-cos(X,Y)) (-1) (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} \left( Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} - X_i (\sum_{j=1}^{N} X_j Y_j) (\sum_{j=1}^{N} X_j^2)^{-\frac{3}{2}} \right) $$
$$ = -2(t-cos(X,Y)) (\sum_{j=1}^{N} Y_j^2)^{-\frac{1}{2}} \left( Y_i (\sum_{j=1}^{N} X_j^2)^{-\frac{1}{2}} - X_i (\sum_{j=1}^{N} X_j Y_j) (\sum_{j=1}^{N} X_j^2)^{-\frac{3}{2}} \right) $$
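
The comparison with numerical differentiation can be reproduced with a short script. This is a minimal sketch rather than the code actually used for the check: the function names, the test vectors, the target `t = 0.9`, and the step size `eps` are all assumptions chosen for illustration. It evaluates the final formula above and compares it with a central-difference approximation of the same partial derivatives.

```python
import math


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def loss(x, y, t):
    return (t - cosine_similarity(x, y)) ** 2


def analytic_grad(x, y, t):
    """dL/dX_i from the closed-form expression derived above."""
    dot = sum(a * b for a, b in zip(x, y))  # sum_j X_j Y_j
    sx = sum(a * a for a in x)              # sum_j X_j^2
    sy = sum(b * b for b in y)              # sum_j Y_j^2
    c = dot / math.sqrt(sx * sy)            # cos(X, Y)
    return [
        -2.0 * (t - c) * sy ** -0.5 * (yi * sx ** -0.5 - xi * dot * sx ** -1.5)
        for xi, yi in zip(x, y)
    ]


def numerical_grad(x, y, t, eps=1e-6):
    """Central difference: (L(x + eps*e_i) - L(x - eps*e_i)) / (2 * eps)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((loss(x_plus, y, t) - loss(x_minus, y, t)) / (2.0 * eps))
    return grad


# Arbitrary illustrative inputs (not taken from the original post).
x = [0.3, -1.2, 0.8]
y = [1.0, 0.5, -0.7]
t = 0.9

ga = analytic_grad(x, y, t)
gn = numerical_grad(x, y, t)
max_diff = max(abs(a - b) for a, b in zip(ga, gn))
print("analytic :", ga)
print("numerical:", gn)
print("max abs difference:", max_diff)
```

If the derivation is right, the two gradients should agree up to the truncation and rounding error of the finite difference. The same check with the roles of `x` and `y` swapped covers the gradient with respect to \(Y\), since the loss is symmetric in the two vectors.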