英文:
Merge two or more tables checking the unique value using python
问题 {#heading}
以下是翻译好的部分:
我是Python的初学者。
我必须检查2或3个表格,看看是否有一些数据缺失。
这些数据是`pandas.core.frame.DataFrame`。
我有这些表格:
start end val accn fy fp form filed frame
0 2016-01-01 2016-12-31 90272000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 CY2016
6 2017-01-01 2017-12-31 110855000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
7 2017-01-01 2017-12-31 110855000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2017
18 2018-01-01 2018-12-31 136819000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
19 2018-01-01 2018-12-31 136819000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2018
30 2019-01-01 2019-12-31 161857000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2019
41 2020-01-01 2020-12-31 182527000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2020
52 2021-01-01 2021-12-31 257637000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2021
61 2022-01-01 2022-12-31 282836000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2022
和
start end val accn fy fp form filed frame
0 2006-12-31 2007-12-29 39474000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 CY2007
4 2007-12-30 2008-12-27 43251000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
5 2007-12-30 2008-12-27 43251000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
13 2008-12-28 2009-12-26 43232000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 CY2009
15 2008-12-28 2009-12-26 43232000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
16 2008-12-28 2009-12-26 43232000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
27 2009-12-27 2010-12-25 57838000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 CY2010
28 2009-12-27 2010-12-25 57838000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
30 2009-12-27 2010-12-25 57838000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
41 2010-12-26 2011-12-31 66504000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
42 2010-12-26 2011-12-31 66504000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
43 2010-12-26 2011-12-31 66504000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 CY2011
54 2012-01-01 2012-12-29 65492000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
56 2012-01-01 2012-12-29 65492000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
57 2012-01-01 2012-12-29 65492000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
68 2012-12-30 2013-12-28 66415000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 CY2013
70 2012-12-30 2013-12-28 66415000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
71 2012-12-30 2013-12-28 66415000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
82 2013-12-29 2014-12-27 66683000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 CY2014
83 2013-12-29 2014
\<details\>
\<summary\>英文:\</summary\>
I am a beginner with Python.
I have to check 2 or 3 tables and see if some data is missing.
The data are `pandas.core.frame.DataFrame`.
I have these tables:
start end val accn fy fp form filed frame
0 2016-01-01 2016-12-31 90272000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 CY2016
6 2017-01-01 2017-12-31 110855000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
7 2017-01-01 2017-12-31 110855000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2017
18 2018-01-01 2018-12-31 136819000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
19 2018-01-01 2018-12-31 136819000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2018
30 2019-01-01 2019-12-31 161857000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2019
41 2020-01-01 2020-12-31 182527000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2020
52 2021-01-01 2021-12-31 257637000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2021
61 2022-01-01 2022-12-31 282836000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2022
and
start end val accn fy fp form filed frame
0 2006-12-31 2007-12-29 39474000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 CY2007
4 2007-12-30 2008-12-27 43251000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
5 2007-12-30 2008-12-27 43251000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
13 2008-12-28 2009-12-26 43232000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 CY2009
15 2008-12-28 2009-12-26 43232000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
16 2008-12-28 2009-12-26 43232000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
27 2009-12-27 2010-12-25 57838000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 CY2010
28 2009-12-27 2010-12-25 57838000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
30 2009-12-27 2010-12-25 57838000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
41 2010-12-26 2011-12-31 66504000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
42 2010-12-26 2011-12-31 66504000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
43 2010-12-26 2011-12-31 66504000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 CY2011
54 2012-01-01 2012-12-29 65492000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
56 2012-01-01 2012-12-29 65492000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
57 2012-01-01 2012-12-29 65492000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
68 2012-12-30 2013-12-28 66415000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 CY2013
70 2012-12-30 2013-12-28 66415000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
71 2012-12-30 2013-12-28 66415000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
82 2013-12-29 2014-12-27 66683000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 CY2014
83 2013-12-29 2014-12-27 66683000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 NaN
85 2013-12-29 2014-12-27 66683000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
96 2014-12-28 2015-12-26 63056000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2015
97 2014-12-28 2015-12-26 63056000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 NaN
98 2014-12-28 2015-12-26 63056000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 NaN
109 2015-12-27 2016-12-31 62799000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2016
110 2015-12-27 2016-12-31 62799000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 NaN
117 2017-01-01 2017-12-30 63525000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2017
So the target is to check and merge the data from all years.
So basically I have to check the 2 tables and create a new table with all CY Years.
For example the first table start from 2016, but the second from 2007. So I need to create a table with `val` and `frame` merged and unique values. The table basically should have all years found from an API.
Example result:
39474000000 2007
43232000000 2009
57838000000 2010
...
110855000000 2017
etc etc
There is a thing I have to check if both tables contain data for `the same year`(example `2017`), then I have to check which `val` is bigger and insert this in the new table.
How can I do an iter for loo year by year and check the val value between the two tables?
\</details\>
答案1
===
得分: 1
如果您想忽略`frame`列中包含`NaN`的行,您可以在应用`max`之前,简单地连接您过滤后的数据框并按`frame`进行分组:
```python
df1 = df1.dropna(subset='frame')[['val', 'frame']]
df2 = df2.dropna(subset='frame')[['val', 'frame']]
pd.concat([
df1[~df1['frame'].str.contains('Q')],
df2[~df2['frame'].str.contains('Q')]
]).groupby('frame').max()
</code></pre>
<p>输出:</p>
<pre tabindex="0" style="color:#f8f8f2;background-color:#272822;"><code><span style="display:flex;"><span> val
</span></span><span style="display:flex;"><span>frame
</span></span><span style="display:flex;"><span>CY2007 39474000000
</span></span><span style="display:flex;"><span>CY2009 43232000000
</span></span><span style="display:flex;"><span>CY2010 57838000000
</span></span><span style="display:flex;"><span>CY2011 66504000000
</span></span><span style="display:flex;"><span>CY2013 66415000000
</span></span><span style="display:flex;"><span>CY2014 66683000000
</span></span><span style="display:flex;"><span>CY2015 63056000000
</span></span><span style="display:flex;"><span>CY2016 90272000000
</span></span><span style="display:flex;"><span>CY2017 110855000000
</span></span><span style="display:flex;"><span>CY2018 136819000000
</span></span><span style="display:flex;"><span>CY2019 161857000000
</span></span><span style="display:flex;"><span>CY2020 182527000000
</span></span><span style="display:flex;"><span>CY2021 257637000000
</span></span><span style="display:flex;"><span>CY2022 282836000000
</span></span></code></pre>
<p>编辑:不包括<code>frame</code>单元格中包含<code>'Q'</code>的行。</p>
<details>
<summary>英文:</summary>
<p>If you want to ignore rows where <code>frame</code> is <code>NaN</code>, you can simply concat your filtered dfs and group by <code>frame</code> before applying <code>max</code>:</p>
<pre tabindex="0" style="color:#f8f8f2;background-color:#272822;"><code><span style="display:flex;"><span>df1 <span style="color:#f92672">=</span> df1<span style="color:#f92672">.</span>dropna(subset<span style="color:#f92672">=&</span><span style="color:#75715e">#39;frame&#39;)[[&#39;val&#39;, &#39;frame&#39;]]</span>
</span></span><span style="display:flex;"><span>df2 <span style="color:#f92672">=</span> df2<span style="color:#f92672">.</span>dropna(subset<span style="color:#f92672">=&</span><span style="color:#75715e">#39;frame&#39;)[[&#39;val&#39;, &#39;frame&#39;]]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pd<span style="color:#f92672">.</span>concat([
</span></span><span style="display:flex;"><span> df1[<span style="color:#f92672">~</span>df1[<span style="color:#f92672">&</span><span style="color:#75715e">#39;frame&#39;].str.contains(&#39;Q&#39;)], </span>
</span></span><span style="display:flex;"><span> df2[<span style="color:#f92672">~</span>df2[<span style="color:#f92672">&</span><span style="color:#75715e">#39;frame&#39;].str.contains(&#39;Q&#39;)]</span>
</span></span><span style="display:flex;"><span>])<span style="color:#f92672">.</span>groupby(<span style="color:#f92672">&</span><span style="color:#75715e">#39;frame&#39;).max()</span>
</span></span></code></pre>
<p>Output:</p>
<pre><code> val
frame
CY2007 39474000000
CY2009 43232000000
CY2010 57838000000
CY2011 66504000000
CY2013 66415000000
CY2014 66683000000
CY2015 63056000000
CY2016 90272000000
CY2017 110855000000
CY2018 136819000000
CY2019 161857000000
CY2020 182527000000
CY2021 257637000000
CY2022 282836000000
</code></pre>
<p>Edit: excluding <code>frame</code> cells containing <code>&#39;Q&#39;</code>.</p>
</details>
<p></p>
</div>
```