
Information, entropy
and all that
지동표
UNIST, Seoul National University
The value of an event
• When an event that occurs with probability 𝑝 is observed, what is the value of this observation (this information)?
• The rarer the event, the greater the value should be.
• Is it 1/𝑝?
• When two independent events with probabilities 𝑝 and 𝑞 occur together, the value would be 1/(𝑝𝑞) = the product of the individual values.
• But intuitively, the total value should be the sum of the individual values.
• How about value = − log 𝑝?
Then − log(𝑝𝑞) = − log 𝑝 + (− log 𝑞).
O.K.
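A minimal numeric check of this choice (assuming base-2 logarithms, so values are in bits; the probabilities are illustrative):

```python
# A minimal sketch: the surprisal -log2(p) turns products of probabilities
# into sums, matching the intuition that values of independent events add.
import math

p, q = 0.5, 0.125                      # probabilities of two independent events

value_p = -math.log2(p)                # 1 bit
value_q = -math.log2(q)                # 3 bits
value_joint = -math.log2(p * q)        # surprisal of both occurring together

print(value_joint, value_p + value_q)  # 4.0 4.0 -- the values add
```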
The value of an event
• The average value of information from a source with a given probability distribution = − ∑_{𝑖∈𝑋} 𝑝_𝑖 log 𝑝_𝑖
• Measure of uncertainty = measure of information:
before an event is observed, there is uncertainty;
once it has been observed, information has been obtained.
• Entropy 𝐻(𝑋) ≔ − ∑_{𝑖∈𝑋} 𝑝_𝑖 log 𝑝_𝑖
(the average uncertainty, which is also the average information)
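A minimal sketch of this definition, applied to a fair and a biased coin (log base 2; the distributions are illustrative):

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin is maximally uncertain
print(shannon_entropy([0.9, 0.1]))   # ~0.469 bits: a biased coin is more predictable
```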
The value of an event
• Joint entropy 𝐻(𝑋, 𝑌) = − ∑_𝑥 ∑_𝑦 𝑝(𝑥, 𝑦) log 𝑝(𝑥, 𝑦)
• Conditional entropy
𝐻(𝑌|𝑋) = ∑_𝑥 𝑝(𝑥) 𝐻(𝑌|𝑋 = 𝑥) = − ∑_𝑥 ∑_𝑦 𝑝(𝑥, 𝑦) log 𝑝(𝑦|𝑥)
(the uncertainty about 𝑌 when 𝑋 is known)
• 𝐻(𝑋, 𝑌) = 𝐻(𝑋) + 𝐻(𝑌|𝑋)
The total uncertainty is the uncertainty of 𝑋 plus the uncertainty about 𝑌 once 𝑋 is known.
• 𝐼(𝑋:𝑌) = ∑_𝑥 ∑_𝑦 𝑝(𝑥, 𝑦) log [𝑝(𝑥, 𝑦) / (𝑝(𝑥)𝑝(𝑦))] = 𝐻(𝑋) − 𝐻(𝑋|𝑌)
(mutual information): the information about 𝑋 obtained by observing 𝑌
note: 𝐼(𝑋:𝑌) = 𝐼(𝑌:𝑋) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋, 𝑌)
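A small sketch verifying the chain rule and the identities above on a made-up joint distribution 𝑝(𝑥, 𝑦):

```python
# Sketch: verify H(X,Y) = H(X) + H(Y|X) and the symmetry of mutual
# information on a small, illustrative joint distribution p(x, y).
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(v for (a, _), v in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (_, b), v in pxy.items() if b == y) for y in (0, 1)}

H_XY = H(pxy.values())
H_X, H_Y = H(px.values()), H(py.values())
H_Y_given_X = H_XY - H_X          # chain rule: H(X,Y) = H(X) + H(Y|X)
H_X_given_Y = H_XY - H_Y
I_XY = H_X - H_X_given_Y          # mutual information

print(I_XY, H_Y - H_Y_given_X)    # equal: I(X:Y) = I(Y:X)
print(I_XY, H_X + H_Y - H_XY)     # equal: the third identity on this slide
```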
Asymptotic Equipartition Property (AEP)
• Empirical average
(1/𝑛)[𝑓(𝑥_1) + ⋯ + 𝑓(𝑥_𝑛)],
where the 𝑥_𝑖 are independent samples of (𝑋, 𝑃)
• True average ∑_𝑥 𝑓(𝑥) 𝑝(𝑥)
• Th (Weak Law of Large Numbers)
empirical average → true average as 𝑛 → ∞ (in probability)
• Setting 𝑓(𝑥) = − log 𝑝(𝑥), we get
empirical average = −(1/𝑛) log 𝑝(𝑥_1 ⋯ 𝑥_𝑛)
true average = 𝐻(𝑋)
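A sketch of this convergence: sampling i.i.d. from an illustrative three-letter distribution, −(1/𝑛) log₂ 𝑝(𝑥_1 ⋯ 𝑥_𝑛) settles near 𝐻(𝑋) = 1.5 bits as 𝑛 grows:

```python
# Sketch of the AEP: -(1/n) log2 p(x_1 ... x_n) concentrates around H(X).
import math, random

random.seed(0)
outcomes, probs = ["a", "b", "c"], [0.5, 0.25, 0.25]
H = -sum(p * math.log2(p) for p in probs)          # H(X) = 1.5 bits

for n in (10, 100, 10000):
    xs = random.choices(outcomes, weights=probs, k=n)
    log_p = sum(math.log2(probs[outcomes.index(x)]) for x in xs)
    print(n, -log_p / n)   # approaches H(X) = 1.5 in probability
```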
Asymptotic Equipartition Property (AEP)
• That is, 𝑝(𝑥_1 ⋯ 𝑥_𝑛) ≈ 2^{−𝑛𝐻(𝑋)} in almost all cases
(such an event is called typical)
The number of such sequences 𝑥_1 ⋯ 𝑥_𝑛 is ≈ 2^{𝑛𝐻(𝑋)}
[Diagram: of all |𝑋|^𝑛 = 2^{𝑛 log₂|𝑋|} sequences (|𝑋| = the size of 𝑋), the typical events number only ≈ 2^{𝑛𝐻(𝑋)}, yet probabilistically they are almost everything (total probability ≈ 1).]
Shannon's source coding theorem (Data compression)
• Data 𝑋^𝑛 can be compressed into 𝑛𝐻(𝑋) bits. (optimal)
(idea: keep only the typical elements and discard the rest)
In practice: LZW, MP3, MPEG, JPEG, etc.
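As a rough illustration (not the theorem's construction), a general-purpose compressor applied to a biased i.i.d. bit source approaches, but does not beat, the 𝐻(𝑋) bits-per-symbol bound:

```python
# Rough illustration: zlib on a Bernoulli(0.1) source vs. the entropy bound.
# zlib is a generic compressor, so it only approaches H(X) ~ 0.469 bits/symbol.
import math, random, zlib

random.seed(0)
n, p = 100_000, 0.1
bits = bytes(1 if random.random() < p else 0 for _ in range(n))

H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # ~0.469 bits/symbol
compressed_bits = 8 * len(zlib.compress(bits, 9))

print("entropy bound:", H, "bits/symbol")
print("zlib achieves:", compressed_bits / n, "bits/symbol")
```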
• Channel
X → (noisy channel) → Y
𝑝(𝑏|𝑎) (the probability that 𝑏 comes out when 𝑎 is sent) characterizes the channel.
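A sketch of the simplest example, a binary symmetric channel that flips each bit with probability 𝑓 (the value 𝑓 = 0.1 is illustrative); sampling recovers the transition probabilities 𝑝(𝑏|𝑎):

```python
# Sketch: a binary symmetric channel (BSC) flips each input bit with
# probability f; estimating p(b|a) from samples recovers the channel matrix.
import random

random.seed(0)
f, n = 0.1, 100_000

counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
for _ in range(n):
    a = random.randint(0, 1)
    b = a ^ (random.random() < f)        # flip with probability f
    counts[(a, b)] += 1

for a in (0, 1):
    total = counts[(a, 0)] + counts[(a, 1)]
    print(f"p(b|a={a}) ~", [counts[(a, b)] / total for b in (0, 1)])
```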
Shannon's noisy channel theorem
• Each typical input (there are ≈ 2^{𝑛𝐻(𝑋)}) arrives as a fuzzy ball of ≈ 2^{𝑛𝐻(𝑌|𝑋)} possible outputs inside the ≈ 2^{𝑛𝐻(𝑌)} typical outputs.
• Therefore only about 2^{𝑛𝐻(𝑌)} / 2^{𝑛𝐻(𝑌|𝑋)} messages can be sent reliably:
2^{𝑛𝐻(𝑌)} / 2^{𝑛𝐻(𝑌|𝑋)} = 2^{𝑛(𝐻(𝑌) − 𝐻(𝑌|𝑋))} = 2^{𝑛(𝐻(𝑌) − (𝐻(𝑋,𝑌) − 𝐻(𝑋)))} = 2^{𝑛𝐼(𝑋:𝑌)}
Shannon's noisy channel theorem
• 𝐶 ≔ max_{𝑝(𝑥)} 𝐼(𝑋:𝑌)
Then one can transmit at rate 𝐶 − 𝜀 with almost no error, while at rate 𝐶 + 𝜀 many errors arise.
(many concrete codes have been constructed)
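A sketch computing this maximization numerically for the binary symmetric channel; the result matches the known closed form 𝐶 = 1 − 𝐻(𝑓):

```python
# Sketch: capacity of a BSC(f) by brute-force search over input distributions.
# The maximum matches the closed form C = 1 - H2(f).
import math

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -(p*math.log2(p) + (1-p)*math.log2(1-p))

def mutual_info(px0, f):
    """I(X:Y) = H(Y) - H(Y|X) for a BSC with flip probability f."""
    py0 = px0 * (1 - f) + (1 - px0) * f
    return H2(py0) - H2(f)             # H(Y|X) = H2(f) for every input

f = 0.1
C = max(mutual_info(px0 / 1000, f) for px0 in range(1001))
print(C, 1 - H2(f))                    # both ~0.531; optimum at uniform input
```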
The Slepian-Wolf distributed source coding theorem
• 𝐻(𝑋|𝑌) (= 𝐻(𝑋, 𝑌) − 𝐻(𝑌)) is the amount that someone who holds 𝑌 as prior information still does not know about 𝑋 (partial information)
• Th (Slepian-Wolf)
𝑋 can give 𝑌 only 𝐻(𝑋|𝑌) bits of information and thereby let 𝑌 know 𝑋 completely. (with a concrete protocol)
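A numeric sketch of the savings, assuming an illustrative correlated source where 𝑌 is 𝑋 sent through a BSC(0.1): a decoder who already holds 𝑌 needs only 𝐻(𝑋|𝑌) ≈ 0.469 bits per symbol instead of 𝐻(𝑋) = 1:

```python
# Sketch: Slepian-Wolf rate for X uniform on {0,1} and Y = X xor noise,
# noise ~ Bernoulli(0.1). Then H(X|Y) = H2(0.1) << H(X) = 1.
import math

def H2(p):
    return -(p*math.log2(p) + (1-p)*math.log2(1-p))

f = 0.1                                # correlation noise (illustrative)
H_X = 1.0                              # X is a fair bit
H_X_given_Y = H2(f)                    # by symmetry of the joint distribution
print("send H(X) =", H_X, "bits/symbol without side information")
print("send H(X|Y) =", H_X_given_Y, "bits/symbol with Y at the decoder")
```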
The 2nd law of thermodynamics
• Clausius: No process is possible whose sole result is the transfer of heat
from a colder to a hotter body
• Kelvin: No process is possible whose sole result is the complete conversion of
heat into work
• Carnot: Of all the heat engines working between given temperatures, none is
more efficient than a Carnot engine
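For reference, the efficiency bound behind the Carnot statement, in a standard worked form (not on the original slide; temperatures in kelvin):

```latex
% Carnot bound: the efficiency of any engine between reservoirs at T_h and T_c.
\eta = \frac{W}{Q_h} \le \eta_{\mathrm{Carnot}} = 1 - \frac{T_c}{T_h},
\qquad \text{e.g. } T_h = 600\,\mathrm{K},\ T_c = 300\,\mathrm{K}
\ \Rightarrow\ \eta_{\mathrm{Carnot}} = 0.5 .
```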
Clausius’ theorem
• For any closed cycle, ∮ 𝑑𝑄/𝑇 ≤ 0,
with equality holding for a reversible cycle.
Maxwell’s demon
• The demon lets only the fast molecules pass:
the second law of thermodynamics breaks!
Szilard, Landauer, Bennett, Lloyd, Sagawa, etc.
exorcise Maxwell's demon
• Apply the second law of thermodynamics to the sum of the information-theoretic entropy and the thermodynamic entropy.
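The key step is Landauer's principle, summarized here in its standard form: erasing one bit of the demon's memory dissipates at least 𝑘_B 𝑇 ln 2 of heat, which repays the entropy the demon appeared to destroy:

```latex
% Landauer's bound: minimal heat dissipated when erasing one bit at temperature T,
% and the resulting non-negative total entropy change of the universe.
Q_{\mathrm{erase}} \ge k_B T \ln 2
\qquad\Longrightarrow\qquad
\Delta S_{\mathrm{universe}} = \frac{Q_{\mathrm{erase}}}{T} - k_B \ln 2 \ge 0 .
```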
• Information is physical!
• Physics is information!
• Forgetting increases the entropy of the universe.
• The virtual world and the real world:
• Are they different?
Quantum Information Theory
• Shannon entropy: 𝐻(𝑋) = − ∑_𝑖 𝑝_𝑖 log 𝑝_𝑖, carried by the bit (0/1)
• von Neumann entropy: 𝑆(𝜌) = −tr(𝜌 log 𝜌), carried by the qubit (|0⟩, |1⟩),
far more subtle than the classical case
• The relation between classical information and quantum information
• The measurement problem
• A new physical quantity: entanglement (the ebit)
• Conditional entropy
𝑆(𝐴|𝐵) = 𝑆(𝐴𝐵) − 𝑆(𝐵)
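A minimal numpy sketch of 𝑆(𝜌) (log base 2, so entropy is in bits; the two example states are illustrative):

```python
# Sketch: von Neumann entropy S(rho) = -tr(rho log2 rho) via eigenvalues.
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) in bits, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]            # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(evals * np.log2(evals)))

pure = np.array([[1, 0], [0, 0]], dtype=float)        # |0><0|
mixed = np.eye(2) / 2                                  # maximally mixed qubit
print(von_neumann_entropy(pure))    # 0.0
print(von_neumann_entropy(mixed))   # 1.0
```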
Quantum Information Theory
• 𝑆(𝐴|𝐵) could be negative in quantum information!
• Negative information exists in the quantum world.
• The operational meaning of 𝑆(𝐴|𝐵)
(classically given by the Slepian-Wolf theorem)
• When 𝑆(𝐴|𝐵) is negative:
• Bob knows too much.
• "If I tell you, you'll know less."
• But Bob and I come to share maximally entangled states (−𝑆(𝐴|𝐵) ebits' worth), which can be used
for teleportation later.
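A sketch making the negativity concrete: for the Bell state |Φ⁺⟩ = (|00⟩ + |11⟩)/√2 shared by Alice and Bob, 𝑆(𝐴𝐵) = 0 while 𝑆(𝐵) = 1, so 𝑆(𝐴|𝐵) = −1:

```python
# Sketch: S(A|B) = S(AB) - S(B) = -1 for a Bell state, using numpy only.
import numpy as np

def S(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)     # |00> + |11>, normalized
rho_AB = np.outer(phi, phi)                   # pure state: S(AB) = 0
# Partial trace over A: rho_B[b, b'] = sum_a rho[(a,b), (a,b')]
rho_B = rho_AB.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)

print(S(rho_AB) - S(rho_B))                   # -1.0: negative conditional entropy
```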
Quantum version of the Slepian-Wolf theorem
(Quantum State Merging:
the operational meaning of 𝑆(𝐴|𝐵))
𝜌_{𝐴𝐵}: given
• When 𝑆(𝐴|𝐵) > 0, the merging is possible if and only if 𝑅 > 𝑆(𝐴|𝐵) ebits per
input copy are provided.
• When 𝑆(𝐴|𝐵) < 0, the merging is possible by LOCC alone, and moreover
𝑅 < −𝑆(𝐴|𝐵) ebits are obtained per input copy.
Quantum version of the Slepian-Wolf theorem
(Quantum State Merging:
the operational meaning of 𝑆(𝐴|𝐵))
• Remark: −𝑆(𝐴|𝐵) = 𝑆(𝐵) − 𝑆(𝐴𝐵) =: 𝐼(𝐴⟩𝐵),
called the coherent information
• Previously known (Shor et al.):
the channel capacity of a quantum channel = max_{𝜌_𝐴} 𝐼(𝐴⟩𝐵)
Many Applications of State Merging
1) Distributed Compression:
given 𝜌_{𝐴_1⋯𝐴_𝑚}
objective: each party compresses their share so that the full state can be reconstructed by a
single decoder.
(𝑅_1, ⋯, 𝑅_𝑚) is achievable if there exists an (𝑚+1)-party LOCC protocol taking 𝜌_{𝐴_1⋯𝐴_𝑚}^{⊗𝑛}, whose
purification is 𝜓_{𝑅𝐴_1⋯𝐴_𝑚}^{⊗𝑛}, together with 𝑛(𝑅_𝑖 + 𝜀) ebits between each 𝐴_𝑖 and the decoder 𝐵, to a
final state 𝜌_{𝑅^𝑛𝐵_1′⋯𝐵_𝑚′} with 𝐹(𝜌, 𝜓^{⊗𝑛}) ≥ 1 − 𝜀.
Many Applications of State Merging
2) Quantum source coding with side information at the decoder:
the decoder decodes Alice's state, while Bob's state is used only to help the decoding.
3) Multipartite entanglement of assistance: a pure state is shared by many parties, and the goal is
to distill the maximal amount of entanglement between two of the parties.
4) Capacity region of the multiple access channel: two senders share one quantum channel; find the optimal rates.
5) Solution to the long-standing problem of negative coherent information.
6) Strong subadditivity:
𝑆(𝐴|𝐵𝐶) ≤ 𝑆(𝐴|𝐵)
If Bob has access to an additional register, then Alice does not need to send more partial information for him to
get the full state 𝜌_{𝐴𝐵}. (A numerical check follows below.)
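A sketch of that numerical check: strong subadditivity holds for a random three-qubit mixed state (drawn, for illustration, by tracing an environment qubit out of a random four-qubit pure state):

```python
# Sketch: verify S(A|BC) <= S(A|B) on a random 3-qubit density matrix.
import numpy as np

rng = np.random.default_rng(0)

def S(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def trace_first(rho, d_first, d_rest):
    """Partial trace over the first subsystem."""
    return rho.reshape(d_first, d_rest, d_first, d_rest).trace(axis1=0, axis2=2)

def trace_last(rho, d_rest, d_last):
    """Partial trace over the last subsystem."""
    return rho.reshape(d_rest, d_last, d_rest, d_last).trace(axis1=1, axis2=3)

# Random mixed state on qubits A, B, C = partial trace of a random pure
# state on A, B, C plus one environment qubit E.
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)
rho_ABC = trace_last(np.outer(psi, psi.conj()), 8, 2)

rho_AB = trace_last(rho_ABC, 4, 2)   # trace out C
rho_BC = trace_first(rho_ABC, 2, 4)  # trace out A
rho_B  = trace_last(rho_BC, 2, 2)    # trace out C

S_A_given_BC = S(rho_ABC) - S(rho_BC)
S_A_given_B  = S(rho_AB)  - S(rho_B)
print(S_A_given_BC <= S_A_given_B + 1e-9)  # True: S(A|BC) <= S(A|B)
print(S_A_given_BC, S_A_given_B)
```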
Stronger version of merging protocol
• Called the Fully Quantum Slepian-Wolf protocol
(or mother protocol)
⟨𝑊^{𝑆→𝐴𝐵} : 𝜑^𝑠⟩ + (1/2) 𝐼(𝐴:𝑅)_𝜑 [𝑞→𝑞] ≥ (1/2) 𝐼(𝐴:𝐵)_𝜑 [𝑞𝑞] + ⟨id^{𝑆→𝐵} : 𝜑^𝑠⟩
Note that [𝑞𝑞] + 2[𝑐→𝑐] ≥ [𝑞→𝑞] (teleportation).
Hence we obtain
⟨𝑊^{𝑆→𝐴𝐵} : 𝜑^𝑠⟩ + 𝑆(𝐴|𝐵)_𝜑 [𝑞→𝑞] + 𝐼(𝐴:𝐵)_𝜑 [𝑐→𝑐] ≥ ⟨id^{𝑆→𝐵} : 𝜑^𝑠⟩
Thank you