Python | Split String and Remove Newline

<div>
<p class="has-global-color-8-background-color has-background"><strong>Summary: </strong>The simplest way to split a string and remove the newline characters is to use a list comprehension with an if condition that eliminates the newline strings. </p>
<h3><strong>Minimal Example</strong></h3>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = '\n-hello\n-Finxter'
words = text.split('-')  # ['\n', 'hello\n', 'Finxter']

# Method 1: list comprehension
res = [x.strip('\n') for x in words if x != '\n']
print(res)  # ['hello', 'Finxter']

# Method 2: map() and filter()
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res)  # ['hello', 'Finxter']

# Method 3: re.findall()
words = re.findall(r'([^-\s]+)', text)
print(words)  # ['hello', 'Finxter']</pre>
<h2>Problem Formulation</h2>
<p class="has-global-color-8-background-color has-background"><strong>Problem: </strong>Say you use the <code>split()</code> method to split a string on all occurrences of a certain pattern. If the pattern appears at the beginning, in the middle, or at the end of the string next to a newline character, the resulting list will contain newline-only strings and items with trailing newlines alongside the required substrings. How do you get rid of these newline characters automatically?</p>
<h3><strong>Example</strong></h3>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t') # ['\n', 'abc\n', 'xyz\n', 'lmn\n']</pre>
<p>Note the newline-only string <code>'\n'</code> and the trailing newline characters in the resulting list.</p>
<p><strong>Expected Output:</strong></p>
<pre class="wp-block-code"><code>['abc', 'xyz', 'lmn']</code></pre>
<h2>Method 1: Use a List Comprehension</h2>
<p>The simplest solution to this problem is to <strong>remove all newline strings</strong> from the resulting list using a <a rel="noreferrer noopener" href="https://blog.finxter.com/list-comprehension/" target="_blank"><strong>list comprehension with a condition</strong></a> such as <code>[x.strip('\n') for x in words if x!='\n']</code> to <a href="https://blog.finxter.com/python-filter/">filter</a> out the newline strings. Specifically, the <code>strip('\n')</code> call in the expression removes leading and trailing newline characters from each item, while the <code>if</code> condition drops any item that consists of exactly one newline character.</p>
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
res = [x.strip('\n') for x in words if x!='\n']
print(res) # ['abc', 'xyz', 'lmn']</pre>
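<p>The condition <code>x!='\n'</code> only drops items that are exactly one newline. As a rough sketch, assuming you also want to drop empty strings and items made of several newlines, you can test the stripped item instead. The input string below is made up for illustration.</p>

```python
# Hypothetical input: the leading tab yields '' and the doubled newline
# yields an item that is not exactly '\n'.
text = '\tabc\n\t\n\n\txyz\n'
words = text.split('\t')
print(words)  # ['', 'abc\n', '\n\n', 'xyz\n']

# The original condition lets '' and '\n\n' through as empty strings.
naive = [x.strip('\n') for x in words if x != '\n']
print(naive)  # ['', 'abc', '', 'xyz']

# Testing the stripped item drops everything that is empty after stripping.
robust = [x.strip('\n') for x in words if x.strip('\n')]
print(robust)  # ['abc', 'xyz']
```

<p>The second version calls <code>strip('\n')</code> twice per item, which is usually acceptable for short lists.</p>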
<h2>Method 2: Use map() and filter()</h2>
<p><strong>Prerequisite</strong></p>
<ul>
<li>The <code>map()</code> function transforms one or more iterables into a new one by applying a transformation function to the i-th elements of each iterable. The arguments are the <em>transformation function object</em> and <em>one or more iterables</em>. If you pass <strong><em>n</em> iterables</strong> as arguments, the transformation function must be an <strong><em>n</em>-ary function</strong> taking <strong><em>n</em></strong> input arguments. The return value is an iterable map object of transformed, and possibly aggregated, elements.</li>
<li>Python’s built-in <code>filter()</code> function keeps only the elements that pass a filtering condition. It takes two arguments: <code>function</code> and <code>iterable</code>. The <code>function</code> returns a Boolean value for each element in the <code>iterable</code> to decide whether the element passes the filter. It returns an iterator with the elements that pass the filtering condition.</li>
</ul>
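<p>To make the two prerequisites concrete, here is a small sketch. The numbers and lambda functions are arbitrary illustrations and not part of the newline problem.</p>

```python
# map() with two iterables: the function must accept two arguments.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]

# filter() keeps the elements for which the function returns a truthy value.
evens = list(filter(lambda n: n % 2 == 0, range(6)))
print(evens)  # [0, 2, 4]

# Both functions return lazy iterators, so list() is used to show the items.
```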
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f30e.png" alt="🌎" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Related Read: <br />(i) <a rel="noreferrer noopener" href="https://blog.finxter.com/python-map/" target="_blank">Python map()</a></strong> <strong><br />(ii) <a rel="noreferrer noopener" href="https://blog.finxter.com/python-filter/" target="_blank">Python filter()</a></strong></p>
<p><strong>Approach: </strong>An alternative solution is to <strong>remove all newline strings</strong> from the resulting list by first using <code>map()</code> with <code>str.strip</code> to remove the leading and trailing whitespace (including newline characters) from each item of the returned list, and then using the <strong><code><a rel="noreferrer noopener" href="https://blog.finxter.com/python-filter/" target="_blank">filter()</a></code></strong> function, as in <code>filter(bool, li)</code>, to <a href="https://blog.finxter.com/python-filter/">filter</a> out any empty string <code>''</code> and other elements that evaluate to <code>False</code> such as <code>None</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res) # ['abc', 'xyz', 'lmn']</pre>
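<p>Note that <code>str.strip</code> without an argument removes all surrounding whitespace, not only newlines. The following sketch, using a made-up list with extra spaces, shows the difference and one possible way to strip newlines only.</p>

```python
# Hypothetical items that carry spaces as well as newlines.
words = ['\n', ' abc\n', 'xyz \n']

# Stripping only newlines keeps the surrounding spaces.
only_newlines = list(filter(bool, map(lambda w: w.strip('\n'), words)))
print(only_newlines)  # [' abc', 'xyz ']

# str.strip with no argument removes spaces, tabs, and newlines.
all_whitespace = list(filter(bool, map(str.strip, words)))
print(all_whitespace)  # ['abc', 'xyz']

# filter(None, ...) behaves like filter(bool, ...).
same_result = list(filter(None, map(str.strip, words)))
print(same_result)  # ['abc', 'xyz']
```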
<h2>Method 3: Use re.findall() Instead</h2>
<p>A simple and Pythonic solution is to use <code>re.findall(pattern, string)</code> with the inverse of the pattern used for splitting the string. If pattern A is used as the split pattern, a pattern that matches everything except A can be passed to <code>re.findall()</code> to retrieve the split items directly.</p>
<p>Here’s an example that uses a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-character-set-regex-tutorial/" target="_blank">negative character class</a> <code>[^\s]+</code> to find all runs of non-whitespace characters, which excludes both the tab used as the split pattern and the newline characters:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = '\n\tabc\n\txyz\n\tlmn\n'
# Raw string so that \s reaches the regex engine unchanged
words = re.findall(r'([^\s]+)', text)
print(words) # ['abc', 'xyz', 'lmn']</pre>
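<p>The inverse-pattern idea can be sketched as a small helper. The function name <code>split_clean</code> and its parameters are hypothetical, and the sketch assumes the delimiters are single characters and that whitespace should always be discarded.</p>

```python
import re

def split_clean(text, delimiters):
    """Hypothetical helper: return the runs of characters that are neither
    whitespace nor one of the given single-character delimiters."""
    # re.escape() keeps characters such as '-' or ']' literal inside [...].
    pattern = '[^' + re.escape(delimiters) + r'\s]+'
    return re.findall(pattern, text)

print(split_clean('\n-hello\n-Finxter', '-'))  # ['hello', 'Finxter']
print(split_clean('_hello_world_', '_'))       # ['hello', 'world']
```

<p>Unlike <code>split()</code>, this approach also breaks items apart at internal whitespace, so it only fits data whose items contain no spaces.</p>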
<p><strong>Note:</strong></p>
<p>The <code>re.findall(pattern, string)</code> method scans <code>string</code> from <strong>left to right</strong>, searching for all <strong>non-overlapping matches</strong> of the <code>pattern</code>. It returns a <strong>list of strings</strong> in the matching order when scanning the string from left to right.</p>
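<p>A short sketch of this behavior: with a single capturing group, <code>re.findall()</code> returns the text of that group, so for this pattern the parentheses are optional, and <code>\S+</code> is a shorthand for <code>[^\s]+</code>. The string <code>'aaaa'</code> is an arbitrary example of non-overlapping matching.</p>

```python
import re

text = '\n\tabc\n\txyz\n\tlmn\n'

with_group = re.findall(r'([^\s]+)', text)
without_group = re.findall(r'[^\s]+', text)
shorthand = re.findall(r'\S+', text)
print(with_group == without_group == shorthand)  # True

# Matches do not overlap: 'aaaa' contains two separate 'aa' matches, not three.
pairs = re.findall(r'aa', 'aaaa')
print(pairs)  # ['aa', 'aa']
```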
<figure class="wp-block-image size-full is-style-default"><img loading="lazy" decoding="async" width="768" height="432" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-229.png" alt="" class="wp-image-983047" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-229.png 768w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 768px) 100vw, 768px" /></figure>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f30e.png" alt="🌎" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Related Read: <a href="https://blog.finxter.com/python-re-findall/" target="_blank" rel="noreferrer noopener">Python re.findall() – Everything You Need to Know</a></strong></p>
<h2><strong>Exercise</strong>: <strong>Split String and Remove Empty Strings</strong></h2>
<p><strong>Problem: </strong>Say you have split a string with the <code>split()</code> method on all occurrences of a given pattern. The pattern appears at the beginning and at the end of the string. How do you get rid of the empty strings automatically? </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">s = '_hello_world_'
words = s.split('_')
print(words) # ['', 'hello', 'world', '']</pre>
<p>Note the empty strings in the resulting list.</p>
<p><strong>Expected Output:</strong></p>
<pre class="wp-block-code"><code>['hello', 'world']</code></pre>
<p><strong><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Hint: <a rel="noreferrer noopener" href="https://blog.finxter.com/python-regex-split-without-empty-string/" target="_blank">Python Regex Split Without Empty String</a></strong></p>
<p><strong>Solution:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

s = '_hello_world_'
words = s.split('_')
# Method 1: Using List Comprehension
print([x for x in words if x != ''])
# Method 2: Using filter
print(list(filter(bool, words)))
# Method 3: Using re.findall
print(re.findall(r'([^_\s]+)', s))</pre>
<h2>Conclusion</h2>
<p>That brings us to the end of this tutorial. We have learned how to eliminate newline characters and empty strings from a split list in Python. I hope it helped you and answered your questions. Please <strong><a href="https://blog.finxter.com/subscribe/" target="_blank" rel="noreferrer noopener">subscribe</a></strong> and stay tuned for more interesting reads. </p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
</div>

https://www.sickgaming.net/blog/2022/12/...e-newline/

xSicKxBot